The current spec of the data model states that an L-Item has a lemma, a language, and several forms, and the forms in turn have representations.

https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/Data_Model

The language is a Q-Item, while the lemma and the representations are Multilingual Texts. Multilingual Texts are sets of pairs, each consisting of a string and a UserLanguageCode.
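To make the structure concrete, here is a minimal Python sketch of the model as I read the spec. It is only an illustration, not the WikibaseLexeme implementation. The names `multilingual_text`, `LItem`, `Form` and `USER_LANGUAGE_CODES` are hypothetical, the code set is a tiny made-up subset, and `"Q1001"` is a placeholder item ID.

```python
from dataclasses import dataclass, field

# Hypothetical subset of UserLanguageCodes; the real list is much longer.
USER_LANGUAGE_CODES = {"en", "en-gb", "en-us", "de"}


def multilingual_text(*pairs: tuple[str, str]) -> frozenset[tuple[str, str]]:
    """Build a Multilingual Text: a set of (string, UserLanguageCode) pairs."""
    for _, code in pairs:
        if code not in USER_LANGUAGE_CODES:
            raise ValueError(f"not a UserLanguageCode: {code!r}")
    return frozenset(pairs)


@dataclass
class Form:
    representations: frozenset[tuple[str, str]]


@dataclass
class LItem:
    lemma: frozenset[tuple[str, str]]
    language: str  # ID of the language Q-Item (placeholder value below)
    forms: list[Form] = field(default_factory=list)


# One L-Item whose lemma and form carry two spelling variants.
colour = LItem(
    lemma=multilingual_text(("colour", "en-gb"), ("color", "en-us")),
    language="Q1001",  # hypothetical ID standing in for English
    forms=[Form(multilingual_text(("colours", "en-gb"), ("colors", "en-us")))],
)
```

Note that the language is recorded twice: once as the Q-Item in `language`, and again as the UserLanguageCodes inside every pair of the lemma and the representations.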

My question is about the relation between representing a language as a Q-Item and as a UserLanguageCode.

A previous proposal treated lemmas and representations as raw strings, so the language Q-Item was the only language information. That is now gone: the lemma and the representations carry their own language information.

How do the two interact? The set of languages referenceable through Q-Items is much larger than the set of languages that have a UserLanguageCode, and indeed, the intention was to allow every language to be representable in Wikidata, not only those with a UserLanguageCode.
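A small sketch of the mismatch, under the assumption that each language Q-Item maps to at most one UserLanguageCode. The mapping, the item IDs (`Q1001`, `Q1002`, `Q1003`) and the helper names are all hypothetical. `Q1003` stands for a language that has a Q-Item but no UserLanguageCode.

```python
from typing import Optional

# Hypothetical, partial mapping from language Q-Items to UserLanguageCodes.
QITEM_TO_CODE = {"Q1001": "en", "Q1002": "de"}
# Hypothetical set of all language Q-Items; Q1003 has no code.
ALL_LANGUAGE_QITEMS = {"Q1001", "Q1002", "Q1003"}


def code_for(language_qitem: str) -> Optional[str]:
    """Return the UserLanguageCode for a language Q-Item, if one exists."""
    return QITEM_TO_CODE.get(language_qitem)


def lemma_for(language_qitem: str, text: str) -> set[tuple[str, str]]:
    """Build a one-pair Multilingual Text for a lemma in the given language."""
    code = code_for(language_qitem)
    if code is None:
        raise ValueError(
            f"{language_qitem} has no UserLanguageCode; lemma cannot be stored"
        )
    return {(text, code)}


# Languages that can be referenced by a Q-Item but cannot label a lemma.
unrepresentable = sorted(q for q in ALL_LANGUAGE_QITEMS if code_for(q) is None)
```

Every Q-Item that ends up in `unrepresentable` can be named as the language of an L-Item, yet there is no valid pair to put into its lemma.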

I sense quite a problem here.

I see two possible ways to resolve this:

- return to the original model and use strings instead of Multilingual texts (with all the negative implications for variants)

- use Q-Items instead of UserLanguageCodes for Multilingual texts (which would be quite a migration)
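The two options can be sketched as alternative data shapes. This is an illustration only. The class names are hypothetical, all item IDs are placeholders, and option 2 assumes that each spelling variant is itself identified by a Q-Item.

```python
from dataclasses import dataclass


# Option 1: the lemma is a raw string; the only language information is
# the L-Item's language Q-Item. A single string holds a single spelling.
@dataclass(frozen=True)
class LItemWithStrings:
    language: str  # language Q-Item ID (placeholder)
    lemma: str


# Option 2: a Multilingual Text keyed by Q-Item IDs instead of
# UserLanguageCodes, so any language or variant with a Q-Item can be used.
@dataclass(frozen=True)
class LItemWithQItemTexts:
    language: str  # language Q-Item ID (placeholder)
    lemma: frozenset[tuple[str, str]]  # (string, Q-Item ID) pairs


option1 = LItemWithStrings(language="Q1001", lemma="colour")
option2 = LItemWithQItemTexts(
    language="Q1001",
    # Q2001 / Q2002 are hypothetical Q-Items for two spelling variants.
    lemma=frozenset({("colour", "Q2001"), ("color", "Q2002")}),
)
```

Option 1 keeps a single language identifier but loses the variants; option 2 keeps the variants but changes what a Multilingual Text is keyed by.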

If I understand the data model as currently suggested correctly, implementing it would restrict Wiktionary4Wikidata support to the languages that have a UserLanguageCode, and I don't think that is a viable solution.

Cheers,

Denny