Aha, this feels much better to me, if plausible. 
And this is the sort of situation that will come up a lot. This sort of [subnamespace] mechanism could make ZIDs easier to remember, and the namespaces easier to parse, and limit the extent to which a high-cardinality subspace feels like an 'imposition' or skewing of the namespace as a whole.  [since we may harbor fuzzy meta-concepts like wanting to see a random ID]

On Sat, Mar 27, 2021 at 8:14 PM Philippe Verdy <verdyp@gmail.com> wrote:
And I still wonder why ZIDs have to be entirely numeric with just the "Z" prefix, when in fact they are actually page names (on the wiki) that are also mapped to an internal page ID.
ZIDs are just unique identifiers (supposed to be short and easy to process, and probably ASCII-only). But I don't see why ZIDs could not be "scoped".
For me the numeric form "Znnnn" is just the default form for "unscoped" ZIDs. But ZIDs that fall within the same registry could very well have the form "Znnn/" (where "Znnn" is the "unscoped" parent ZID of a type) followed by the appropriate code from that type.

In that case a single unscoped ZID ("Z60") is sufficient as the parent ZID for the type "natural language" encoded according to BCP 47. And then "Z60/en" is the ZID of English. This way we don't "reinvent the wheel", and we can reuse existing standard registries of codes, by scoping them in a parent ZID associated with the type representing this registry standard (here BCP 47 is the registry standard).

To be able to use a "scoped" ZID (i.e. a ZID using subpages separated by "/"), the parent ZID (its base page name in MediaWiki) just has to be a valid type in Wikifunctions. The parent type (here "Z60") may implement its own validator for the sub-ZIDs it can process: checking BCP 47 conformance, and implementing normalization (here the capitalization of codes, since their letter case is not significant, and the unification of "-" and "_" per BCP 47 rules). That way one cannot create non-conforming codes, and if a user enters the ZID "Z60/EN" in the JSON data to edit or import, it gets implicitly normalized to "Z60/en" by the validator implemented in the "Z60" type. The presence of both "Z60/en" and "Z60/EN" in the same "Multilingual text" instance in the same edit or import would then be treated as an error by the "Multilingual text" type validator.
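As a minimal sketch of what such a "Z60"-style normalizer could look like (illustrative only: the function name, the subpage syntax, and the simplified BCP 47 case rules here are assumptions drawn from this proposal, not existing Wikifunctions behavior):

```python
import re

def normalize_scoped_zid(zid: str) -> str:
    """Normalize a hypothetical scoped ZID like "Z60/EN" to "Z60/en".

    Assumes the parent type applies simplified BCP 47 normalization:
    unify "_" to "-", lowercase the primary language subtag, titlecase
    4-letter script subtags, uppercase 2-letter region subtags.
    """
    parent, sep, code = zid.partition("/")
    if not re.fullmatch(r"Z[1-9][0-9]*", parent):
        raise ValueError(f"invalid parent ZID: {parent!r}")
    if not sep:
        return parent  # unscoped ZID, nothing to normalize
    subtags = code.replace("_", "-").split("-")
    out = [subtags[0].lower()]
    for tag in subtags[1:]:
        if len(tag) == 4:
            out.append(tag.title())   # script subtag, e.g. "Cyrl"
        elif len(tag) == 2:
            out.append(tag.upper())   # region subtag, e.g. "CN"
        else:
            out.append(tag.lower())
    return parent + "/" + "-".join(out)
```

With this, "Z60/EN" and "Z60/en" normalize to the same page name, so a "Multilingual text" validator can detect the duplicate by comparing normalized forms.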

And this way, we no longer need to reserve large ranges: we can just as well use scoped ZIDs for Wikifunctions itself (e.g. builtin functions, builtin types...) and we no longer risk any collision.
These scoped ZIDs then work like namespaces in programming languages. And we are still compatible with MediaWiki and no longer need to reinvent our own encoding (pseudo-)standard.

This also allows different encoding standards to coexist, and allows implementing code conversions from one scope to another (when there's no ambiguity and codes are actually exact aliases), so that normalization is possible as well. For example, the "Z60" validator could recognize "Z60/ENG" as valid but implicitly convert it to "Z60/en" during normalization, by also accepting inputs using ISO 639-3 instead of just the ISO 639-1 codes recommended in BCP 47. Optionally, the "Z60" validator could recognize other locale identifiers, such as Android's resource qualifiers like "zh-rCN", as aliases of "zh-CN" in BCP 47, and so normalizable to the "Z60/zh-CN" ZID.
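Such alias handling could be sketched as a small mapping layer in front of the normalizer. The alias table below is a tiny illustrative stub, not a real registry, and the Android-qualifier heuristic is an assumption for the sake of the example:

```python
# Hypothetical alias table: a real implementation would use the full
# ISO 639-3 to ISO 639-1 mapping and the IANA Language Subtag Registry.
ISO_639_3_TO_1 = {"eng": "en", "fra": "fr", "deu": "de"}

def resolve_alias(code: str) -> str:
    """Map a few illustrative alias forms onto canonical BCP 47 codes."""
    subtags = code.replace("_", "-").split("-")
    # ISO 639-3 three-letter code with a two-letter equivalent
    primary = subtags[0].lower()
    primary = ISO_639_3_TO_1.get(primary, primary)
    rest = []
    for tag in subtags[1:]:
        # Android resource qualifier: region prefixed with "r" (e.g. "rCN")
        if len(tag) == 3 and tag[0] == "r" and tag[1:].isupper():
            tag = tag[1:]
        rest.append(tag)
    return "-".join([primary] + rest)
```

Run before normalization, this would map "ENG" to "en" and "zh-rCN" to "zh-CN", so both end up at the same canonical ZID.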

And then we get the best of all worlds! No more need to remember many numeric ZID codes specific to Wikifunctions. And a simple and efficient way to process ZIDs, notably with natural languages that will be used a lot with standard i18n libraries.


On Sat, 27 Mar 2021 at 09:52, Grounder UK <grounderuk@gmail.com> wrote:
Some randomization is a better start than Latin-alphabetization. I don’t think anyone should be guessing what a ZID might be!

I would look at scripts, aiming to have every supported script represented in the first block of reserved ZIDs.

We should also reserve a few ZIDs for international and interlingual labels, as a “language”. This would include “en-GB” as a label for the object labelled “British English” in English, for example, or “m” for the SI unit labelled “mètre” in French. 

After that, we might choose to ensure that ISO 639-1 languages (184 with two-character codes, like “de”) have a lower ZID than other interface languages. This is because ISO 639-1 was intended to include the most common languages.

No worries, in any event.
Al.

On Sat, 27 Mar 2021 at 00:35, Denny Vrandečić <dvrandecic@wikimedia.org> wrote:
To close this thread:

The current proposal says to give Z1001-Z1006 to the big six UN languages, and then to assign Z1011ff. to all the other language codes based on the alphabetical order of their language codes.

I was wondering if, for the languages not in the big six, instead of using alphabetical order, there was an appetite for using a randomized order.

Advantages of alphabetical order: it enables a bit of guessing and remembering (if you know what "uz" is, you'll be able to guess "uz-cyrl" and "uz-latn" with quite some confidence).

Advantages of random order: no special weight given to the Latin script and latinized names of languages.


On Thu, Mar 11, 2021 at 3:51 PM Denny Vrandečić <dvrandecic@wikimedia.org> wrote:
Thanks for the thoughts, these are super useful! And yes, I sent that mail yesterday out with not enough context.

Thad, thanks, LCID looks indeed very interesting! That could provide a source of numbers to draw from. But looking into it in detail, it also looks like "we took the English language names, and in that order assigned numbers, until we finished the first batch, and then added more batches chronologically". So at least there's precedent for doing that.

Lucas, I was also thinking about QIDs. But besides the points that David raised, or rather in addition to them, there's also the point that Wikidata should be the 'pure representation' of what a language is, whereas the objects in Wikifunctions representing languages should be malleable to exactly what we need and want them to be, irrespective of their independent ontological status in the world (I hope that makes sense?). In Wikifunctions we decide what a language is, and its fallbacks, etc., based on product needs. In Wikidata, on the other hand, we decide what is a language and what is not based on what relevant sources are stating. These hopefully overlap, and ideally are equivalent, but in reality I don't expect them to be, and I don't want to introduce a push for edits in Wikidata to make certain features in Wikifunctions work.

Yes, we should have mappings from the relevant ZIDs to QIDs, and/or the other way around, but that's why I think they shouldn't be the same.

Charles, regarding your point: yes, we should be compatible and map to external standards as much as possible, but for the same reason as regarding Wikidata, we shouldn't simply import them wholesale.

I wrote down a few more thoughts on-wiki here:


On Thu, Mar 11, 2021 at 5:44 AM Charles Matthews via Abstract-Wikipedia <abstract-wikipedia@lists.wikimedia.org> wrote:

> On 11 March 2021 at 12:24 David Abián <davidabian@wikimedia.es> wrote:
>
>
> Hi,
>
> Although we usually claim that Wikidata entities and their URIs are
> stable, I believe that, unfortunately, their stability isn't as high as
> desirable in practice, or at least the risk of deletion, merging,
> redirection, redefinition of identity due to confusion or ambiguity,
> etc. exists and materializes too often (I hope I'm not being read by
> Wikidata haters :D). This happens with all kinds of Items, and it can
> happen with minority languages as well as with something as widespread
> as, for instance, Chinese.

> On 11/03/2021 09:50, Lucas Werkmeister wrote:
> > Wouldn’t it be better to refer to them using Wikidata item IDs?

Not an area where I'm an expert. But there seem to be quite a number of standard IDs for languages, including some which are ISO, and P424 on Wikidata, which is "Wikimedia language code".

I would say the default solution would be Z-numbers, and a Wikidata property for WA ID so that cross-walking to any other standard ID is just a Wikidata query away. If what is intended is an exact fit with P424 then that would be duplication; but I suppose in the end that won't be the decision.

Charles

_______________________________________________
Abstract-Wikipedia mailing list
Abstract-Wikipedia@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia


--
Samuel Klein          @metasj           w:user:sj          +1 617 529 4266