https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/Data_Model
The language is a Q-Item; the lemma and the representations are Multilingual Texts. A Multilingual Text is a set of pairs of strings and UserLanguageCodes.
My question is about the relation between representing a language as a Q-Item and as a UserLanguageCode.
A previous proposal treated lemmas and representations as raw strings, with the Q-Item the language points to being the only language information. That is now gone, and the lemma and representations carry their own language information.
How do the two interact? The set of languages that can be referenced through Q-Items is much larger than the set of languages with a UserLanguageCode. Indeed, the intention was for every language to be representable in Wikidata, not only those with a UserLanguageCode.
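To make the mismatch concrete, here is a rough Python sketch of the model as I read it. The class names, the three-entry code list, and the placeholder item IDs are all made up for illustration; this is not the actual Wikibase implementation, and the dict is a simplification of "set of pairs".

```python
from dataclasses import dataclass, field

# Hypothetical, tiny stand-in for the real list of UserLanguageCodes.
USER_LANGUAGE_CODES = {"en", "en-gb", "de"}


@dataclass
class MultilingualText:
    # Maps UserLanguageCode -> string (simplified view of the set of pairs).
    texts: dict = field(default_factory=dict)

    def add(self, code, text):
        # A string can only be stored under a known UserLanguageCode.
        if code not in USER_LANGUAGE_CODES:
            raise ValueError("no UserLanguageCode: " + code)
        self.texts[code] = text


@dataclass
class Lexeme:
    language: str  # Q-Item ID (placeholder); any language item is allowed
    lemma: MultilingualText = field(default_factory=MultilingualText)


# A language with both a Q-Item and a UserLanguageCode works.
ok = Lexeme(language="Q_LANGUAGE_WITH_CODE")
ok.lemma.add("en", "colour")

# A language that only has a Q-Item can be named as the Lexeme's language,
# but there is no code under which its lemma could be stored.
stuck = Lexeme(language="Q_LANGUAGE_WITHOUT_CODE")
try:
    stuck.lemma.add("code-that-does-not-exist", "some lemma")
    lemma_stored = True
except ValueError:
    lemma_stored = False
```

In this sketch the second Lexeme ends up with a language but an empty lemma, which is the situation I am worried about.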
I sense quite a problem here.
I see two possible ways to resolve this:
- return to the original model and use strings instead of Multilingual Texts (with all the negative implications for variants)
- use Q-Items instead of UserLanguageCodes in Multilingual Texts (which would be quite a migration)
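Roughly sketched in Python, the two options would look like this. The class names and the placeholder item IDs are hypothetical and only meant to show the shape of each option, not a proposed implementation.

```python
from dataclasses import dataclass, field


@dataclass
class LexemeWithRawString:
    # Option 1: language is a Q-Item ID, lemma is one untagged string.
    language: str
    lemma: str


@dataclass
class LexemeWithItemKeyedTexts:
    # Option 2: texts are keyed by Q-Item ID instead of UserLanguageCode.
    language: str
    lemma: dict = field(default_factory=dict)


# Option 1 covers any language, but has room for only one spelling.
raw = LexemeWithRawString(language="Q_SOME_LANGUAGE", lemma="colour")

# Option 2 covers any language and keeps variants apart,
# at the cost of needing an item for every variant.
keyed = LexemeWithItemKeyedTexts(language="Q_SOME_LANGUAGE")
keyed.lemma["Q_VARIANT_A"] = "colour"
keyed.lemma["Q_VARIANT_B"] = "color"
```

Neither option depends on the list of UserLanguageCodes, which is the point of both.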
If I understand it correctly, implementing the data model as currently suggested would restrict Wiktionary4Wikidata support to the languages that have a UserLanguageCode. I don't think that is a viable solution.
Cheers,
Denny