Let me top-post a question to the Wikidata dev team:
Where can we find documentation on what the Wikidata internal language codes actually mean? In particular, how do you map the language selector to the internal codes? I noticed some puzzling details:
* Wikidata uses "be-x-old" as a code, but MediaWiki messages for this language seem to use "be-tarask" as a language code. So there must be a mapping somewhere. Where?
* MediaWiki's http://www.mediawiki.org/wiki/Manual:$wgDummyLanguageCodes provides some mappings. For example, it maps "zh-yue" to "yue". Yet Wikidata uses both of these codes. What does this mean?
Answers to Nemo's points inline:
On 04/08/13 06:15, Federico Leva (Nemo) wrote:
Markus Krötzsch, 03/08/2013 15:48:
(3) Limited language support. The script uses Wikidata's internal language codes for string literals in RDF. In some cases, this might not be correct. It would be great if somebody could create a mapping from Wikidata language codes to BCP47 language codes (let me know if you think you can do this, and I'll tell you where to put it)
These are only a handful, aren't they?
There are about 369 language codes right now. You can see the complete list in langCodes at the bottom of the file
https://github.com/mkroetzsch/wda/blob/master/includes/epTurtleFileWriter.py
Most might be correct already, but it is hard to say. Also, is it okay to create new (sub)language codes for our own purposes? Something like Simple English will hardly have an official code, but it would be bad to export it as "en".
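To make the request more concrete, here is a minimal sketch of what such a mapping could look like in Python. It is an illustration only, not what the script does today: the table name, the helper function, and the "en-x-simple" private-use tag are all assumptions. Only the be-x-old/be-tarask and zh-yue/yue pairs come from the cases raised at the top of this mail.

```python
# Hypothetical override table: Wikidata internal code -> BCP 47 tag.
# Only codes that need changing are listed; all others pass through unchanged.
WIKIDATA_TO_BCP47 = {
    "be-x-old": "be-tarask",  # code used by MediaWiki messages for this language
    "zh-yue": "yue",          # pair listed in $wgDummyLanguageCodes
    "simple": "en-x-simple",  # assumption: private-use subtag, not an official code
}


def to_bcp47(wikidata_code):
    """Return the BCP 47 tag for a Wikidata code, defaulting to the code itself."""
    return WIKIDATA_TO_BCP47.get(wikidata_code, wikidata_code)
```

Keeping the table to overrides only would mean that codes which are already correct need no entry at all.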
(4) Limited site language support. To specify the language of linked wiki sites, the script extracts a language code from the URL of the site. Again, this might not be correct in all cases, and it would be great if somebody had a proper mapping from Wikipedias/Wikivoyages to language codes.
Apart from the above, doesn't wgLanguageCode in https://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php have what you need?
Interesting. However, the list there does not contain all 300 sites that we currently find in Wikidata dumps, and it contains some that we do not find in the dumps (including things like dkwiki, which seem to be outdated). The full list of sites we support is also found in the file I mentioned above, just after the language list (variable siteLanguageCodes).
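For reference, the URL-based extraction described in (4) amounts to roughly the following. This is a simplified sketch rather than the actual code in the script: the function name and the override table are hypothetical, and the override values are assumptions rather than agreed codes.

```python
from urllib.parse import urlparse

# Hypothetical overrides for sites whose subdomain is not a usable language code.
SITE_LANGUAGE_OVERRIDES = {
    "simple": "en-x-simple",  # assumption: private-use subtag, not an official code
    "be-x-old": "be-tarask",
}


def language_from_site_url(url):
    """Guess a language code from the first label of a wiki site's host name."""
    host = urlparse(url).hostname or ""  # hostname is None if the URL has no scheme
    subdomain = host.split(".")[0]
    return SITE_LANGUAGE_OVERRIDES.get(subdomain, subdomain)
```

A proper site-to-language mapping would replace the guess from the host name, which is exactly the part that may be wrong for some sites.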
Markus