On 25.11.2016 at 12:16, David Cuenca Tudela wrote:
If we want to avoid this complexity, we could just go by prefix. So if the
language is "de", variants like "de-CH" or "de-DE_old" would be considered OK.
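A minimal sketch of the prefix rule, assuming variant codes are hyphen-separated extensions of the Lexeme's language code (the function name is hypothetical). It requires a "-" boundary after the prefix, so that "de" accepts "de-CH" but not an unrelated code such as "del"; this is close to the "basic filtering" scheme for language ranges.

```python
def is_allowed_variant(lexeme_language: str, variant: str) -> bool:
    """Accept a variant if it equals the Lexeme's language code, or is that
    code followed by a hyphen-separated suffix (e.g. "de" -> "de-CH")."""
    # Language tags are compared case-insensitively.
    lang = lexeme_language.lower()
    code = variant.lower()
    # A plain startswith("de") would also accept "del"; require a "-" boundary.
    return code == lang or code.startswith(lang + "-")


for code in ["de", "de-CH", "de-DE_old", "del", "en"]:
    print(code, is_allowed_variant("de", code))
```

With "de" as the language, the first three codes are accepted and "del" and "en" are rejected.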
Ordering these alphabetically would put the "main" code (with no suffix)
first.
That may be OK for a start.
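A quick illustration of why plain alphabetical ordering puts the bare code first. A string sorts before any longer string that it is a prefix of, so "de" precedes "de-CH" and "de-DE_old" (the helper name is hypothetical):

```python
def order_variants(codes: list[str]) -> list[str]:
    """Sort variant codes alphabetically; the bare language code, being a
    prefix of all its variants, ends up first."""
    return sorted(codes)


print(order_variants(["de-DE_old", "de", "de-CH"]))
# ['de', 'de-CH', 'de-DE_old']
```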
I find this issue potentially controversial, and I think that the community at
large should be involved in this matter to avoid future dissatisfaction and to
promote involvement in the decision-making.
We should absolutely discuss this with Wiktionarians. My suggestion was intended
as a baseline implementation. Details about the restrictions on which variants
are allowed on a Lexeme, or in what order they are shown, can be changed later
without breaking anything.
In my opinion it would be more appropriate to use standardized language codes,
and then specify the dialect with an item, as it provides greater flexibility.
However, as mentioned before, I would prefer that this topic in particular be
discussed with Wiktionarians.
Using Items to represent dialects is going to be tricky. We need ISO language
codes for use in HTML and RDF. We can somehow map between Items and ISO codes,
but that's going to be messy, especially when that mapping changes.
So it seems like we need to further discuss how to represent a Lexeme's language
and each lemma's variant. My current thinking is to represent the language as an
Item reference, and the variant as an ISO code. But you are suggesting the
opposite.
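To make the first option concrete, here is a minimal sketch of a Lexeme whose language is an Item reference while each lemma is keyed by a variant code. The class, field names, and values are hypothetical and do not reflect the actual Wikibase data model; "Q188" is only assumed here to be the Item for German. David's proposal would swap the roles, with the language given as an ISO code and the dialect as an Item.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    """Hypothetical model: language as an Item ID, lemmas keyed by variant code."""
    language_item: str  # Item reference, e.g. "Q188" (assumed: German)
    lemmas: dict[str, str] = field(default_factory=dict)  # variant code -> lemma

    def add_lemma(self, variant: str, text: str) -> None:
        # Store (or overwrite) the lemma spelling for one variant code.
        self.lemmas[variant] = text


lex = Lexeme(language_item="Q188")
lex.add_lemma("de", "Straße")
lex.add_lemma("de-CH", "Strasse")  # Swiss Standard German does not use "ß"
print(lex.lemmas)
```

Keeping the variant as a plain code means it can be emitted directly in HTML `lang` attributes and RDF language tags without an Item-to-code lookup.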
I can see why one would want items for dialects, but I currently have no good
idea for making this work with the existing technology. Further investigation is
needed.
I have filed a Phabricator task for investigating this. I suggest we continue the
discussion about how to represent languages/variants/dialects/etc. there:
https://phabricator.wikimedia.org/T151626
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.