Hi there,
I was wondering how to get the language mappings between different Wikipedia pages. This information seems to be available on Wikidata, since I can find it when browsing pages such as http://www.wikidata.org/wiki/Q213710, and the MediaWiki manual (https://www.mediawiki.org/wiki/Manual:Langlinks_table) describes a langlinks table, but I can't figure out how to get a dump of it.
The "Wiki interlanguage link records" dump at http://dumps.wikimedia.org/wikidatawiki/20140705/ looked promising, but if I'm not mistaken it mostly contains links to user pages rather than articles. For example, the query

select count(*), ll_title from langlinks group by 2 order by 1 desc limit 20;

returns:
+----------+--------------------------------------+
| count(*) | ll_title                             |
+----------+--------------------------------------+
|      284 | User:تفکر                            |
|      272 | user:OffsBlink                       |
|      215 | User:YourEyesOnly                    |
|      179 | User:MoiraMoira                      |
|       65 | User:AvocatoBot                      |
|       35 | User:Shikai shaw                     |
|       35 | user:Shuaib-bot                      |
|       33 | user:לערי ריינהארט                   |
|       33 | User:Leyo                            |
|       27 | user:Лобачев Владимир                |
|       20 | User:Wagino 20100516                 |
|       18 | user:Gangleri                        |
|       17 | user:I18n                            |
|       16 | user:Meursault2004                   |
|       12 | User:Labant                          |
|       11 | User:Stryn                           |
|       11 | User:angelia2041                     |
|       10 | user:Kelvin                          |
|       10 | User:JCIV                            |
|        9 | Template:Mbox                        |
+----------+--------------------------------------+
On the #mediawiki IRC channel, someone recommended the "Interwiki link tracking records", but those seem to contain all sorts of other links as well, and I don't see a way to isolate the "in other languages" links. It would be great if you could help me out.
Thanks!
Marieke van Erp
-- Computational Lexicology & Terminology Lab (CLTL) The Network Institute, VU University Amsterdam
De Boelelaan 1105 1081 HV Amsterdam, The Netherlands http://www.mariekevanerp.com http://www.newsreader-project.eu
In reply to "Erp, M.G.J. van" <marieke.van.erp@vu.nl>, Jul 17, 2014 9:24 AM:

Hey Marieke,

You can either use the Wikidata Toolkit by Markus Krötzsch, if you want to work on the dump, or the Wikidata web API, if you only need a few such mappings at a time.
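Both routes boil down to reading the "sitelinks" of a Wikidata item, which map site IDs such as enwiki or nlwiki to page titles. A minimal Python sketch, under stated assumptions: the web API route uses the wbgetentities module of https://www.wikidata.org/w/api.php with props=sitelinks, and the dump route assumes a JSON dump laid out as one entity per line inside a top-level array. The Wikidata Toolkit itself is a Java library and is not used here. The helper names (wikipedia_sitelinks, fetch_sitelinks, iter_dump_entities, dump_to_mappings) and the list of non-Wikipedia site IDs are hypothetical and only illustrative.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint of the Wikidata web API.
API_URL = "https://www.wikidata.org/w/api.php"

# Site IDs that end in "wiki" but are not language Wikipedias.
# Assumption: this set is illustrative, not exhaustive.
NON_WIKIPEDIA_SITES = {
    "commonswiki", "specieswiki", "metawiki",
    "mediawikiwiki", "wikidatawiki", "sourceswiki",
}


def wikipedia_sitelinks(entity):
    """Return {site_id: title} for the Wikipedia sitelinks of one Wikidata item."""
    links = {}
    for site, link in entity.get("sitelinks", {}).items():
        # Wikipedias use IDs like "enwiki" or "nlwiki"; IDs such as
        # "enwikisource" do not end in "wiki" and are skipped.
        if site.endswith("wiki") and site not in NON_WIKIPEDIA_SITES:
            links[site] = link["title"]
    return links


def fetch_sitelinks(qid):
    """Web API route: fetch one item's sitelinks with action=wbgetentities."""
    params = urllib.parse.urlencode({
        "action": "wbgetentities",
        "ids": qid,
        "props": "sitelinks",
        "format": "json",
    })
    # Hypothetical User-Agent string; Wikimedia asks clients to identify themselves.
    request = urllib.request.Request(
        f"{API_URL}?{params}",
        headers={"User-Agent": "langlinks-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return wikipedia_sitelinks(data["entities"][qid])


def iter_dump_entities(lines):
    """Dump route: yield entities from a JSON dump with one entity per line."""
    for line in lines:
        line = line.strip().rstrip(",")
        # Skip blank lines and the brackets of the top-level JSON array.
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def dump_to_mappings(lines):
    """Yield (item_id, {site_id: title}) for every item with Wikipedia sitelinks."""
    for entity in iter_dump_entities(lines):
        links = wikipedia_sitelinks(entity)
        if links:
            yield entity["id"], links
```

For a handful of items, something like fetch_sitelinks("Q213710") is enough; for all items, dump_to_mappings can be fed the lines of an uncompressed dump file opened with encoding="utf-8", so the whole dump never has to fit in memory.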
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l