"Scott MacLeod" worlduniversityandschool@gmail.com writes:
Hi Joe, Magnus, Andrew, GerardM, Jane, Daniel and Wikidatans, Since "Language fallback is not a luxury like it is for British English, it is essential for all the smaller languages. It is what prevents it from being editable / usable" (per GerardM), and in terms of Reasonator, statements, and careful design (DanielK), what are current Wikidata processes to plan eventually for all 7,106 living languages (plus even dead and invented languages) in the world per "Ethnologue: Languages of the World, Seventeenth edition" (http://www.ethnologue.com/statistics/size), as people add them, and use, for example, the ISO coding system (or similar) for this, to anticipate not yet added languages, and especially for 'smaller' languages that GerardM mentions?
Just FYI, ISO 639 and Ethnologue are grossly incomplete in their coverage of the world's languages. One must assume that 10 to 100 times more natural languages are currently in use than are listed.
A few individual additions have been made through BCP 47 and the IANA Language Subtag Registry. Examples are "en-GB-scouse", representing the Scouse dialect of British English, and "sl-rozaj-lipaw", the Lipovaz dialect of Resian, which is itself a variety of Slovenian spoken in Italy. Elsewhere, proper differentiation is still lacking. For example, in the Swiss Alps almost every village in every valley has its own language variety. These varieties are often barely mutually comprehensible, yet together they have only one language code, "gsw". That code also covers a large part of south-western Germany and of eastern France, along with their local varieties. A look at a map shows thousands of cities, towns, villages, and valleys in this area. Even if only a tenth of them had a variety of their own, "gsw" would represent hundreds of distinct languages, and quite possibly more than 1000. Considering both spelling AND pronunciation, they deserve to be differentiated.
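Tags like these are built by appending subtags to a base language code. A fallback chain of the kind GerardM asks for can therefore be derived by truncating a tag one subtag at a time. Below is a minimal Python sketch of that idea. It assumes the "lookup" truncation scheme of RFC 4647, section 3.4, and a hypothetical `labels` dictionary keyed by tag. It is only an illustration, not how Wikidata actually implements language fallback.

```python
from __future__ import annotations


def fallback_chain(tag: str) -> list[str]:
    """Return progressively less specific tags, RFC 4647 lookup-style."""
    subtags = tag.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()
        # A trailing single-character subtag (a singleton such as "x" for
        # private use) means nothing on its own, so drop it as well.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


def lookup(labels: dict[str, str], tag: str, default: str = "en") -> str | None:
    """Return the first label found along the fallback chain, else the default's.

    `labels` is a hypothetical mapping from language tag to label text.
    Tags are compared case-insensitively, as BCP 47 requires.
    """
    normalized = {key.lower(): value for key, value in labels.items()}
    for candidate in fallback_chain(tag) + [default]:
        if candidate.lower() in normalized:
            return normalized[candidate.lower()]
    return None


if __name__ == "__main__":
    # Hypothetical data: only two generic labels exist.
    labels = {"en": "label (en)", "sl": "label (sl)"}
    print(fallback_chain("en-GB-scouse"))
    print(lookup(labels, "sl-rozaj-lipaw"))
    print(lookup(labels, "gsw"))
```

With only a bare "gsw" code available, every village variety collapses onto the same chain and the same label. That is exactly the lack of differentiation described above. Registered variant subtags, or private-use subtags such as "gsw-x-…", would at least give each variety its own first step in the chain.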
This is not meant to discourage you, or to say that it is unmanageable. You only need to be aware that covering the comparatively few languages currently listed in Ethnologue will not suffice. Coding them must also be expected to be somewhat more complex than it appears at first sight.
In terms of British English (en-gb) and English (en) distinction, why not just code English in Wikidata as "ISO 639-3 eng" per http://www.ethnologue.com/language/eng as part of a careful design for all languages, and then build out for smaller languages? (CC wiki WUaS is planning wiki schools in all 7,106 languages, plus dead and invented languages).
While the current figure of 7,106 is far too low, it does include some "macrolanguages" (i.e. language groups), many extinct languages, and some invented ones.
It seems that using or keying in on the ISO system, or a similar one, would allow for remarkable extensibility and careful design of Wikidata, as well as fallback for other languages such as Hindi, Odia or Malayalam.
Yes, indeed. But blindly following a body like SIL might limit our capacity and speed. SIL is the registration authority for ISO 639-3 and the publisher of Ethnologue, and incidentally a fundamentalist Christian missionary organization. Its process for adding languages is rather slow, taking years. I suggest that we evaluate our own needs first, then determine how best to meet them, and then cooperate with others.
Purodha