[Since this is my first message here, let me quickly introduce myself: math PhD student, hobbyist coder, interested in the semantic universe, but for now I don't know much beyond the general ideas.]
On Mon, 02 Apr 2012 15:56:37 +0200, JFC Morfin jefsey@jefsey.com wrote:
- Then, the third problem, which no one has addressed yet except ISO 3166,
is variance: two identical particulars (effects, names, data, etc.) may be different. E.g. there are many ways to compute and present the same date. Are the results to be stored in Wikidata in all these ways every day, with bridges to be built between them? Or are they to be stored as a single datum with the formulas to compute them, and then how to be sure some parameters have not changed (e.g. death of the Emperor) and the computation was not tampered with? Variance is everywhere (actually variance is most probably Life). ISO 3166 has no variance, because it is the sovereign reference: the list of States and of the languages of their laws (however, Palestine is already in it, and Taiwan is there). ISO documents are in French, English and possibly Russian. ISO 3166-1 states, by reference to ISO 639 (the list of language names), which languages are normative in every country. ISO 3166 defines the ccTLDs and is used in langtags to document languages and cultures. ISO 10646 (supported by Unicode) provides the coded character tables for scripts. At the binary layer it is full of variants (the same graphs being supported by different code points).
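
To make the variance point about dates concrete, here is a minimal sketch (plain Python, purely illustrative and not tied to any actual Wikidata format) of storing one canonical value and computing the presented variants from it, instead of storing each variant:

```python
from datetime import date

# One canonical stored datum (proleptic Gregorian calendar)...
canonical = date(2012, 4, 2)

# ...and several presentations derived on demand, rather than stored separately.
presentations = {
    "iso_8601": canonical.isoformat(),                 # '2012-04-02'
    "us_style": canonical.strftime("%m/%d/%Y"),        # '04/02/2012'
    "french_style": canonical.strftime("%d/%m/%Y"),    # '02/04/2012'
    "julian_day": canonical.toordinal() + 1721424.5,   # Julian day number at 00:00 UTC
}

for name, value in presentations.items():
    print(name, value)
```

The open question raised above remains, of course: who guarantees that the conversion formulas (and their parameters) have not changed or been tampered with.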
I'm interested in this variance question, since one often encounters on Wikipedia uncertainty/variance about some data:

* dates can be known only with some uncertainty (e.g. "born between -345 and -342", or "born in 734 or 736, depending on the sources");
* fixed dates may not be sufficient (e.g. "not born"/"not dead" for some mythological/religious characters, or "eternal" for the Eternal President of the Republic of North Korea);
* some physical constants are defined only up to a given precision (e.g. the Avogadro constant);
* some names have no fixed written form because they come from an oral tradition;
* the nationality of some people changed during their life, so in some specific cases it cannot be considered as "one" datum (e.g. Einstein).
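
To make these cases concrete, here is a hedged sketch of how such values could be typed; every class and field name below is invented for illustration and does not reflect the actual Wikidata data model:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UncertainYear:
    """A year known only within a range, e.g. 'born between -345 and -342'."""
    earliest: int
    latest: int

@dataclass
class SpecialValue:
    """Markers such as 'no value' or 'eternal' for mythological figures, etc."""
    kind: str  # e.g. "no-value", "eternal"

@dataclass
class Quantity:
    """A constant with an explicit uncertainty, e.g. the Avogadro constant."""
    value: float
    uncertainty: float
    unit: str

@dataclass
class QualifiedStatement:
    """A property whose value only holds for part of a lifetime, e.g. a nationality."""
    prop: str
    value: str
    start_year: Optional[int] = None
    end_year: Optional[int] = None

# Example usage (years and figures are illustrative, not exhaustive):
birth = UncertainYear(earliest=734, latest=736)        # "born in 734 or 736"
tenure = SpecialValue(kind="eternal")                  # Eternal President
avogadro = Quantity(6.02214129e23, 2.7e16, "mol^-1")   # value +/- standard uncertainty
einstein_citizenships = [
    QualifiedStatement("citizenship", "Germany", 1879, 1896),
    QualifiedStatement("citizenship", "Switzerland", 1901, 1955),
    QualifiedStatement("citizenship", "United States", 1940, 1955),
]
```

The common thread is that the "value" of a statement is rarely a bare scalar: it may carry a range, a special marker, an uncertainty, or temporal qualifiers.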
Sébastien
PS: just curious: from what I understood, Unicode defines normalization rules to ensure a unique code-point sequence per glyph (Normalization Form C), no?
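
If I understand it correctly, that normalization behaves roughly like this (quick illustration with Python's standard unicodedata module):

```python
import unicodedata

# The glyph 'é' written two ways: one precomposed code point vs. 'e' + combining accent
precomposed = "\u00e9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)   # False: different code-point sequences, same glyph

# Normalization Form C (canonical composition) maps both to the same sequence
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)              # True
```

As far as I know, NFC only unifies canonically equivalent sequences; compatibility variants (handled by NFKC) and visually similar glyphs from different scripts remain distinct.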