Some randomization is a better start than Latin-alphabetization. I don’t think anyone should be guessing what a ZID might be!

I would look at scripts, aiming to have every supported script represented in the first block of reserved ZIDs.

We should also reserve a few ZIDs for international and interlingual labels, as a “language”. This would include “en-GB” as a label for the object labelled “British English” in English, for example, or “m” for the SI unit labelled “mètre” in French. 

After that, we might choose to ensure that ISO 639-1 languages (184 with two-character codes, like “de”) have a lower ZID than other interface languages. This is because ISO 639-1 was intended to include the most common languages.
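
As a rough illustration of that tiering, here is a minimal Python sketch. The function name `tiered_order` is hypothetical, and treating every two-letter alphabetic code as an ISO 639-1 code is a simplifying assumption; a real implementation would check against the ISO 639-1 registry itself.

```python
import random


def tiered_order(codes, seed=0):
    """Order language codes so that two-letter codes come first.

    Assumption: any two-letter alphabetic code is treated as ISO 639-1.
    Within each tier the order is shuffled, reproducibly for a given seed.
    """
    rng = random.Random(seed)
    # Sort first so the shuffle result does not depend on input order.
    two_letter = sorted(c for c in set(codes) if len(c) == 2 and c.isalpha())
    two_letter_set = set(two_letter)
    others = sorted(c for c in set(codes) if c not in two_letter_set)
    rng.shuffle(two_letter)
    rng.shuffle(others)
    return two_letter + others


order = tiered_order(["uz-latn", "de", "ang", "fr", "uz-cyrl"])
```

In this example "de" and "fr" always land in the first two positions, in an order that depends on the seed, and the remaining codes follow.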

No worries, in any event.
Al.

On Sat, 27 Mar 2021 at 00:35, Denny Vrandečić <dvrandecic@wikimedia.org> wrote:
To close this thread:

The current proposal says to give Z1001-Z1006 to the big six UN languages, and then to assign Z1011ff. to all the other language codes based on the alphabetical order of their language codes.

I was wondering whether, for the languages not in the big six, there is an appetite for using a randomized order instead of the alphabetical one.

Advantages of alphabetical order: it enables a bit of guessing and remembering (if you know what "uz" is, you'll be able to guess "uz-cyrl" and "uz-latn" with quite some confidence).

Advantages of random order: no special weight given to Latin script and Latinized names of languages.
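
To make the two options concrete, here is a minimal Python sketch of the assignment. The function name `assign_zids`, the seed, and the order of the big six within Z1001-Z1006 (alphabetical by code here) are assumptions for illustration only; the proposal itself only fixes the ranges.

```python
import random

# The six official UN languages. Their order within Z1001-Z1006 is an
# assumption here (alphabetical by code), not something the proposal states.
BIG_SIX = ["ar", "en", "es", "fr", "ru", "zh"]


def assign_zids(codes, randomize=False, seed=0):
    """Map language codes to ZIDs: Z1001-Z1006 for the big six,
    Z1011 onward for all other codes."""
    zids = {code: f"Z{1001 + i}" for i, code in enumerate(BIG_SIX)}
    # Alphabetical by language code is the baseline order.
    rest = sorted(c for c in set(codes) if c not in zids)
    if randomize:
        # A fixed seed keeps the random order reproducible.
        random.Random(seed).shuffle(rest)
    for i, code in enumerate(rest):
        zids[code] = f"Z{1011 + i}"
    return zids


alphabetical = assign_zids(["uz", "uz-cyrl", "uz-latn", "de"])
randomized = assign_zids(["uz", "uz-cyrl", "uz-latn", "de"], randomize=True)
```

With alphabetical order, "uz", "uz-cyrl" and "uz-latn" receive consecutive ZIDs, which is the guessability mentioned above. With the shuffle they generally do not, but both variants draw from the same set of ZIDs.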


On Thu, Mar 11, 2021 at 3:51 PM Denny Vrandečić <dvrandecic@wikimedia.org> wrote:
Thanks for the thoughts, these are super useful! And yes, I sent that mail out yesterday without enough context.

Thad, thanks, LCID indeed looks very interesting! That could provide a source of numbers to draw from. But looking into it in detail, it also looks like "we took the English language names and assigned numbers in that order until we finished the first batch, and then added more batches chronologically". So at least there's precedent for doing that.

Lucas, I was also thinking about QIDs. But in addition to the point that David raised, there's also the point that Wikidata should be the 'pure representation' of what a language is, whereas the objects in Wikifunctions representing languages should be malleable to exactly what we need and want them to be, irrespective of their independent ontological status in the world (I hope that makes sense?). In Wikifunctions we decide what a language is, and what its fallbacks are, etc., based on product needs. In Wikidata, on the other hand, we decide what is a language and what is not based on what relevant sources state. These hopefully overlap, and ideally are equivalent, but in reality I don't expect them to be, and I don't want to introduce a push for edits in Wikidata to make certain features in Wikifunctions work.

Yes, we should have mappings from the relevant ZIDs to QIDs, and/or the other way around, but that's why I think they shouldn't be the same.

Charles, regarding your point: yes, we should be compatible with and map to external standards as much as possible, but for the same reason as with Wikidata, we shouldn't simply import them wholesale.

I wrote down a few more thoughts on-wiki here:

On Thu, Mar 11, 2021 at 5:44 AM Charles Matthews via Abstract-Wikipedia <abstract-wikipedia@lists.wikimedia.org> wrote:

> On 11 March 2021 at 12:24 David Abián <davidabian@wikimedia.es> wrote:
>
>
> Hi,
>
> Although we usually claim that Wikidata entities and their URIs are
> stable, I believe that, unfortunately, their stability isn't as high as
> desirable in practice, or at least the risk of deletion, merging,
> redirection, redefinition of identity due to confusion or ambiguity,
> etc. exists and materializes too often (I hope I'm not being read by
> Wikidata haters :D). This happens with all kinds of Items, and it can
> happen with minority languages as well as with something as widespread
> as, for instance, Chinese.

> On 11/03/2021 09:50, Lucas Werkmeister wrote:
> > Wouldn’t it be better to refer to them using Wikidata item IDs?

Not an area where I'm an expert. But there seem to be quite a number of standard IDs for languages, including some that are ISO standards, and P424 on Wikidata, which is "Wikimedia language code".

I would say the default solution would be Z-numbers, plus a Wikidata property for the WA ID, so that cross-walking to any other standard ID is just a Wikidata query away. If what is intended is an exact fit with P424, then that would be duplication; but I suppose in the end that won't be the decision.

Charles

_______________________________________________
Abstract-Wikipedia mailing list
Abstract-Wikipedia@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia