On 25.11.2016 at 12:16, David Cuenca Tudela wrote:
If we want to avoid this complexity, we could just go by prefix. So if the language is "de", variants like "de-CH" or "de-DE_old" would be considered OK. Ordering these alphabetically would put the "main" code (with no suffix) first. That may be OK for a start.
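A minimal sketch of the prefix rule, assuming that a variant code counts as "matching" when it equals the Lexeme's language code or extends it with a "-" suffix (so "dex" would not match "de"). The function names `is_allowed_variant` and `order_lemmas` are hypothetical and only illustrate the idea; they are not part of any existing Wikibase API.

```python
def is_allowed_variant(lexeme_language: str, variant: str) -> bool:
    # Hypothetical rule: accept the bare language code, or any code that
    # starts with the language code followed by "-" (e.g. "de-CH", "de-DE_old").
    return variant == lexeme_language or variant.startswith(lexeme_language + "-")


def order_lemmas(lexeme_language: str, lemmas: dict[str, str]) -> list[tuple[str, str]]:
    # Drop lemmas whose variant does not match the Lexeme's language, then sort
    # alphabetically by code. The bare code ("de") is a prefix of every other
    # allowed code, so it always sorts first.
    allowed = [
        (code, text)
        for code, text in lemmas.items()
        if is_allowed_variant(lexeme_language, code)
    ]
    return sorted(allowed)


lemmas = {
    "de-DE_old": "Schiffahrt",
    "de": "Schifffahrt",
    "de-CH": "Schifffahrt",
    "fr": "navigation",
}
print(order_lemmas("de", lemmas))
# [('de', 'Schifffahrt'), ('de-CH', 'Schifffahrt'), ('de-DE_old', 'Schiffahrt')]
```

Plain alphabetical ordering is the simplest possible baseline; any later rule about which variants are allowed or how they are ordered would only replace these two functions.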
I find this issue potentially controversial, and I think that the community at large should be involved in this matter to avoid future dissatisfaction and to promote involvement in the decision-making.
We should absolutely discuss this with Wiktionarians. My suggestion was intended as a baseline implementation. Details about the restrictions on which variants are allowed on a Lexeme, or in what order they are shown, can be changed later without breaking anything.
In my opinion it would be more appropriate to use standardized language codes and then specify the dialect with an item, as that provides greater flexibility. However, as mentioned before, I would prefer that this topic in particular be discussed with Wiktionarians.
Using Items to represent dialects is going to be tricky. We need ISO language codes for use in HTML and RDF. We can somehow map between Items and ISO codes, but that's going to be messy, especially when that mapping changes.
So it seems like we need to further discuss how to represent a Lexeme's language and each lemma's variant. My current thinking is to represent the language as an Item reference, and the variant as an ISO code. But you are suggesting the opposite.
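To make the two options concrete, here is a hypothetical sketch of the model described above: the Lexeme's language is an Item reference (a Q-id string such as "Q188", the Wikidata item for German), while each lemma is keyed by an ISO-style variant code that can go directly into an HTML `lang` attribute. The class and method names are illustrative assumptions, not the actual Wikibase data model.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    # Hypothetical: language stored as an Item reference (Q-id string).
    language_item: str
    # Hypothetical: lemmas keyed by ISO-style variant code, e.g. "de-CH".
    lemmas: dict[str, str] = field(default_factory=dict)

    def add_lemma(self, variant_code: str, text: str) -> None:
        self.lemmas[variant_code] = text

    def lemma_html(self, variant_code: str) -> str:
        # The variant code is usable as-is in HTML, which is why an ISO code
        # is convenient here; an Item would first need mapping to a code.
        text = html.escape(self.lemmas[variant_code])
        return f'<span lang="{html.escape(variant_code)}">{text}</span>'


lexeme = Lexeme("Q188")
lexeme.add_lemma("de-CH", "Strasse")
print(lexeme.lemma_html("de-CH"))  # <span lang="de-CH">Strasse</span>
```

The opposite arrangement (ISO code for the language, Item for the dialect) would swap which field needs a mapping step before it can be emitted in HTML or RDF.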
I can see why one would want items for dialects, but I currently have no good idea for making this work with the existing technology. Further investigation is needed.
I have filed a Phabricator task for investigating this. I suggest we take the discussion about how to represent languages/variants/dialects/etc. there:
https://phabricator.wikimedia.org/T151626