A city has a Wikipedia page and a corresponding Wikidata-item-page. One of the item properties is Property:City_mayor.
If the mayor changes, and both mayors have their own pages/items (e.g. from http://de.wikipedia.org/wiki/Eberhard_Diepgen to http://de.wikipedia.org/wiki/Klaus_Wowereit for http://de.wikipedia.org/wiki/Berlin), changing the mayor would mean replacing the item-to-item property value. The change would be clean and logical with respect to translated labels.
However, where the city mayor is not a well-known person (smaller cities), the City_mayor property is most likely a string literal.
Replacing the string (name) for the mayor in this case would require emptying ALL translations/transliterations in all other languages. Unfortunately, the system cannot really know whether an update of a translated label is the result of a correction (the person did not change) or the result of a change of the value itself (a new mayor).
The design of the UI should make this situation as transparent to editors as possible. It may help to provide two edit-buttons for language-sensitive string literals:
[edit translations] [edit new value] (or [replace value] ?)
In the second case, all existing translations would be blanked. Probably more or better ideas can be found... :-)
Gregor
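The two edit actions proposed above can be sketched as follows, assuming a language-sensitive string literal is stored as a plain mapping from language code to text. The class and method names and the mayor names are hypothetical; this is an illustration, not Wikidata code.

```python
class MultilingualLiteral:
    """Hypothetical language-sensitive string literal (language code -> text)."""

    def __init__(self, texts: dict) -> None:
        self.texts = dict(texts)

    def edit_translation(self, lang: str, text: str) -> None:
        # [edit translations]: treated as a correction, so the value stays
        # the same and every other language is kept.
        self.texts[lang] = text

    def replace_value(self, lang: str, text: str) -> None:
        # [replace value]: treated as a new value (e.g. a new mayor), so all
        # existing translations are blanked and only the new text remains.
        self.texts = {lang: text}


mayor = MultilingualLiteral({"de": "Hans Muller", "ru": "Ханс Мюллер"})
mayor.edit_translation("de", "Hans Müller")  # typo fix: Russian text kept
print(mayor.texts)  # {'de': 'Hans Müller', 'ru': 'Ханс Мюллер'}
mayor.replace_value("de", "Erika Schmidt")   # new mayor: Russian text dropped
print(mayor.texts)  # {'de': 'Erika Schmidt'}
```

The sketch only makes the editor's intent explicit; the system still relies on the editor choosing the right button.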
The idea is that the mayor would not be represented as a String value, even for smaller cities, but always by an item. This would possibly lead to items that have no Wikipedia articles associated with them, but there is no problem with that.
But humans (and other entities) should not be represented by strings in the system, but by items.
Hope that helps with this, Denny
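Denny's point can be illustrated with a small sketch, assuming items are simple records carrying per-language labels. The class names and the item identifiers (`item:berlin` etc.) are hypothetical placeholders, not real Wikidata IDs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    """Hypothetical item: an identifier plus per-language labels."""
    item_id: str
    labels: dict = field(default_factory=dict)


@dataclass
class City(Item):
    # The mayor is a reference to a person item, not a string literal.
    mayor: Optional[Item] = None

    def mayor_label(self, lang: str) -> Optional[str]:
        # Translations live on the person's item, so replacing the
        # reference switches the name in every language at once.
        if self.mayor is None:
            return None
        return self.mayor.labels.get(lang)


diepgen = Item("item:diepgen", {"de": "Eberhard Diepgen", "en": "Eberhard Diepgen"})
wowereit = Item("item:wowereit", {"de": "Klaus Wowereit", "en": "Klaus Wowereit"})
berlin = City("item:berlin", {"de": "Berlin", "en": "Berlin"}, mayor=diepgen)

berlin.mayor = wowereit          # the mayor changes: swap the reference
print(berlin.mayor_label("en"))  # Klaus Wowereit
```

No translation ever has to be blanked: a correction edits the person's item, a new mayor replaces the reference.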
2012/8/14 Gregor Hagedorn g.m.hagedorn@gmail.com:
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
But humans (and other entities) should not be represented by strings in the system, but by items.
I wonder whether this would not be too inflexible. It would burden the use of Wikidata with the responsibility of determining entity identity in all cases where only a name string is known.
In the example of the mayor: Assume that the new mayor of a city is named "John Smith". Wikidata already has 500 items for persons named John Smith. The Wikipedia-Wikidata editor must now determine whether it is good practice to simply create wikidata-item 501, not knowing whether it is one of these or not.
I fear that the practice is even more problematic in the reverse case. If in a large percentage of cases there is little doubt about identity, this could lead to the practice of always connecting to a Wikidata item for a person, should there be a person of this name. Henceforth, Wikidata would claim that the mayor of Erewhon previously was councilor in Owd-Negrin, even if there is only a chance identity of a name. Wikipedia disambiguation pages know how many homonymous highly notable persons exist; Wikidata will deal with the non- or less-notable ones as well.
A well-known example: in scientific reference management it is not a good idea to treat authors as person entities, since "reverse engineering" author identity from the n:m relation between person and name string is normally not feasible.
I would prefer that the decision whether entity identity is known, or whether only a name string or other label is known, be left to the Wikidata editor community and not prescribed by the software.
Gregor
Hoi, It being a database, you will find that someone in Germany will take care of a new mayor of a German place and determine the identity once, so that every wiki uses the same instance of that person; consequently such relations are implicitly available. When people choose NOT to use the benefits of a database, on their own head be it. Thanks, Gerard
On 14 August 2012 22:57, Gregor Hagedorn g.m.hagedorn@gmail.com wrote:
Hi Gregor,
I mostly disagree with you on the question of using strings or items for persons, but incidentally, that does not matter. I agree with you fully on the following point:
2012/8/14 Gregor Hagedorn g.m.hagedorn@gmail.com:
I would prefer that the decision whether entity identity is known, or whether only a name string or other label is known, be left to the Wikidata editor community and not prescribed by the software.
I fully agree with that. All I meant is that I would expect that editors will create items for persons, and not use strings. There is indeed nothing in the software to force them to do so.
So, yes, it is up to the community how to do that.
Cheers, Denny
On 14/08/12 22:57, Gregor Hagedorn wrote:
I would prefer that the decision whether entity identity is known, or whether only a name string or other label is known, be left to the Wikidata editor community and not prescribed by the software.
I'm afraid that this will not be really possible in practice, since there is no support for multilingual strings. If you want to display a mayor name in multiple languages, you will have to link to an entity.
We do have a data type that supports multilingual strings.
http://meta.wikimedia.org/wiki/Wikidata/Data_model#Multilingual_texts
It is obviously not implemented yet, but we have already thought about this and it is in the spec. So that solution would indeed be practical.
Cheers, Denny
2012/8/15 Nikola Smolenski smolensk@eunet.rs:
I agree that it is desirable that most persons should be represented by items. I would only say: it might be prohibitive to determine entity identity _prior_ to entering data. In my experience this is the case with reference data: a single person may have published scientific articles as "Wang Lin", "Lin Wang", "L. Wang" or "L. R. Wang", depending on the editorial practices of the journal; but two publications by "Wang Lin" may be authored by different persons.
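The n:m relation between persons and name strings in the Wang Lin example can be made concrete with a small sketch. The person identifiers are invented for illustration; in real reference data they are exactly what is unknown.

```python
from collections import defaultdict

# Hypothetical authorship records: (name string as printed, person ID).
records = [
    ("Wang Lin", "person-A"),
    ("Lin Wang", "person-A"),
    ("L. Wang", "person-A"),
    ("L. R. Wang", "person-A"),
    ("Wang Lin", "person-B"),
]

persons_by_name = defaultdict(set)  # name string -> persons using it
names_by_person = defaultdict(set)  # person -> name strings used
for name, person in records:
    persons_by_name[name].add(person)
    names_by_person[person].add(name)

# One person, many names ...
print(sorted(names_by_person["person-A"]))
# ['L. R. Wang', 'L. Wang', 'Lin Wang', 'Wang Lin']

# ... and one name, many persons: the string alone cannot identify the author.
print(sorted(persons_by_name["Wang Lin"]))  # ['person-A', 'person-B']
```

Given only the left-hand column, neither mapping can be reconstructed, which is why identity may have to be settled after the data is entered.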
Gerard is very optimistic that the community can fix it, but there must be a workflow towards that rather than an all-or-nothing situation. Therefore:
Is it possible to build flexibility into the UI so that a single property can be used with http://wikidata.org/vocabulary/datatype_items as well as http://wikidata.org/vocabulary/datatype_multitext ? I believe this would simplify the community's task of accepting new information in string form while attempting to ultimately normalize it (e.g. persons) to items.
The data model is already type-flexible; the documentation says: "Note that it is not required that Value belongs to the Datatype that is currently given to the Property in the system. In general, the UI and API of Wikidata will only allow Values that match the given Datatype, but if the Datatype is changed, then it will not be possible to update all stored data immediately." However, such a multitype capability would have to be anticipated in the UI (not necessarily implemented in the next phase, but planned for).
Gregor
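The behaviour quoted from the data model documentation can be sketched as follows, assuming a property records its current datatype and each stored value records the type it was entered with. The class, method and short datatype names are hypothetical stand-ins for the vocabulary URIs above.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Property:
    """Hypothetical property whose datatype may change over time."""
    name: str
    datatype: str  # e.g. "items" or "multitext"
    values: list = field(default_factory=list)  # (value_type, value) pairs

    def add_value(self, value_type: str, value: Any) -> None:
        # New values are accepted only if they match the *current* datatype.
        if value_type != self.datatype:
            raise TypeError(f"{self.name} expects {self.datatype}, got {value_type}")
        self.values.append((value_type, value))

    def mismatched_values(self) -> list:
        # Stored values are not rewritten when the datatype changes, so a UI
        # has to be able to display values of other types and flag them.
        return [pair for pair in self.values if pair[0] != self.datatype]


mayor = Property("mayor", "multitext")
mayor.add_value("multitext", {"de": "Hans Müller"})
mayor.datatype = "items"            # the community decides to normalize to items
mayor.add_value("items", "item:hans-mueller")
print(mayor.mismatched_values())    # [('multitext', {'de': 'Hans Müller'})]
```

Even without a multitype property, the UI therefore has to render values whose type differs from the property's current datatype.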
On 15.08.2012 14:25, Gregor Hagedorn wrote:
Is it possible to build flexibility into the UI so that a single property can be used with http://wikidata.org/vocabulary/datatype_items as well as http://wikidata.org/vocabulary/datatype_multitext ? I believe this would simplify the community's task of accepting new information in string form while attempting to ultimately normalize it (e.g. persons) to items.
I think that would be a very bad idea, precisely *because* it would encourage people to use strings instead of item references whenever they are not sure about the item (or too lazy to look). If we have a "mayor" property, it should be an entity reference, and nothing else.
However, the frontend should make it very easy to create new entities on the fly. I.e. if you don't find the mayor when you type in the name, you can just choose to create an item for that name, without even leaving the page.
This may indeed lead to the creation of duplicates (multiple items describing the same thing). But that can easily be detected once people start to connect the entity with Wikipedia pages, because of conflicting interlanguage links. And it can easily be fixed by merging the items.
The same thing may happen, btw, when the system creates a new item in order to link two Wikipedia articles that both had no prior connection to Wikidata. There's no way to tell whether an item describing the concept is already in Wikidata, so a new item is created. When it turns out that this item is a dupe, the two items can be merged into one.
-- daniel
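The create-then-merge workflow Daniel describes can be sketched as follows. The sketch assumes, as his description implies, that a wiki page can be linked from only one item, so an attempt to link an already-linked page reveals a likely duplicate. Function names, item IDs and the dict layout are hypothetical.

```python
from typing import Optional

# Hypothetical layout: item ID -> {"labels": {lang: text}, "sitelinks": {wiki: title}}


def link_page(items: dict, item_id: str, wiki: str, title: str) -> Optional[str]:
    """Link a wiki page to an item, or return the ID of the item already linking it."""
    for other_id, other in items.items():
        if other_id != item_id and other["sitelinks"].get(wiki) == title:
            return other_id  # page is taken: the two items are likely duplicates
    items[item_id]["sitelinks"][wiki] = title
    return None


def merge_items(items: dict, keep_id: str, drop_id: str) -> None:
    """Fold item drop_id into keep_id and delete drop_id."""
    keep, drop = items[keep_id], items[drop_id]
    # Check first, so nothing is modified if the merge is refused.
    for wiki, title in drop["sitelinks"].items():
        if wiki in keep["sitelinks"] and keep["sitelinks"][wiki] != title:
            raise ValueError(f"items link different {wiki} pages; not duplicates?")
    keep["sitelinks"].update(drop["sitelinks"])
    for lang, text in drop["labels"].items():
        keep["labels"].setdefault(lang, text)  # labels of the kept item win
    del items[drop_id]


items = {
    "item:1": {"labels": {"de": "Hans Müller"},
               "sitelinks": {"dewiki": "Hans Müller (Politiker)"}},
    "item:2": {"labels": {"en": "Hans Müller"}, "sitelinks": {}},  # created on the fly
}
conflict = link_page(items, "item:2", "dewiki", "Hans Müller (Politiker)")
if conflict is not None:
    merge_items(items, conflict, "item:2")
print(items)
```

After the merge only `item:1` remains, carrying both the German and the English label.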
Basically, what Daniel proposed is that it would be best practice to create a new Wikidata item for every string that refers to a concept, event, thing or person, unless the editor is certain about the identity of an existing item.
I could imagine this as a possible and perhaps elegant solution. My concern is the handling of unknown identity and a workflow towards improved identity recognition, not a discussion of string versus item.
Could it be that the types http://wikidata.org/vocabulary/datatype_monotext http://wikidata.org/vocabulary/datatype_multitext become redundant then? Is it possible to simplify the wikidata model by specifying that all language-specific strings are to be represented by an item that keeps the translations together? I find this appealing...
Gregor
On 15/08/12 15:03, Gregor Hagedorn wrote:
Basically, what Daniel proposed is that it would be best practice to create a new Wikidata item for every string that refers to a concept, event, thing or person, unless the editor is certain about the identity of an existing item.
I could imagine this as a possible and perhaps elegant solution. My concern is the handling of unknown identity and a workflow towards improved identity recognition, not a discussion of string versus item.
In the proposed case, would there be a problem with creating a "human name" entity that would not be a person, and linking to that? If it is confirmed that a certain link to this entity refers to a known person, it could simply be changed.
Theoretically multitext could be replaced, but I would not like to do that. A property like "Tagline" for a movie or motto for a country might make sense to be a multitext. Yes, you could make the tagline of a movie an item -- but do we really want to require it to be an intermediary item? The subtitle of a book? The ring name of a wrestler?
Monotext is irreplaceable, though, and it means a simple string without a language designation. Something like "Chemical symbol", I guess, would be a monotext, or an ISO 3166 code. An intermediary item could not do the job in that case.
Therefore I think we should not get rid of monotext and multitext.
Cheers, Denny
2012/8/15 Gregor Hagedorn g.m.hagedorn@gmail.com:
Theoretically multitext could be replaced, but I would not like to do that. A property like "Tagline" for a movie or motto for a country might make sense to be a multitext. Yes, you could make the tagline of a movie an item -- but do we really want to require it to be an intermediary item? The subtitle of a book? The ring name of a wrestler?
In one of the examples in the Wikidata notes, the role played by an actor is already assumed to be an item. But I think the other examples are good cases where an item is not fully convincing. The only caveat might be that such examples are pretty rare. It clearly is a trade-off between structural simplicity (no extra type) and optimality in terms of access and storage.
Monotext is irreplaceable, though, and it means a simple string without a language designation. Something like "Chemical symbol", I guess, would be a monotext, or an ISO 3166 code. An intermediary item could not do the job in that case.
I think this would be xsd:String in the Wikidata model (which has 3 string types). Monotext was defined to be text with a language designation. (Note: I have not fully understood the use cases for Monotext, see my comment on the wiki; perhaps they can be elaborated on the data model page and contrasted with language-neutral/zxx String and multilingual text.)
With a language-neutral/zxx string I mainly see the problem that as soon as you want to provide an audio pronunciation, even chemical symbols or ISO codes become language-dependent again.
So one may need either a String (lang:zxx) with audio nested within it:
* Audio (lang:en)
* Audio (lang:fr)
* Audio (lang:it)
or a multilingual String-Audio combination:
* String + Audio (lang:en)
* String + Audio (lang:fr)
* etc.
At present neither seems optimal; I clearly don't have the solution.
Gregor
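The two alternatives for combining a string with audio can be written down as data structures for comparison. This is only a sketch: the class names, the chemical symbol example and the audio file names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AudioFile:
    lang: str
    url: str  # hypothetical file location


# Option 1: one language-neutral string (zxx) with per-language audio nested in it.
@dataclass
class NeutralStringWithAudio:
    text: str
    lang: str = "zxx"
    audio: list = field(default_factory=list)  # AudioFile entries


# Option 2: a multilingual value whose entries each pair a string with audio.
@dataclass
class StringAudioEntry:
    lang: str
    text: str
    audio_url: str


symbol_v1 = NeutralStringWithAudio(
    "Fe", audio=[AudioFile("en", "fe-en.ogg"), AudioFile("fr", "fe-fr.ogg")]
)
symbol_v2 = [
    StringAudioEntry("en", "Fe", "fe-en.ogg"),
    StringAudioEntry("fr", "Fe", "fe-fr.ogg"),
]
```

Option 1 stores the symbol once but nests language-specific parts inside a language-neutral value; option 2 keeps every entry uniformly language-tagged but repeats the identical string for each language.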
2012/8/17 Gregor Hagedorn g.m.hagedorn@gmail.com:
Monotext is irreplaceable, though, and it means a simple string without a language designation. Something like "Chemical symbol", I guess, would be a monotext, or an ISO 3166 code. An intermediary item could not do the job in that case.
I think this would be xsd:String in the Wikidata model (which has 3 string types). Monotext was defined to be text with a language designation. (Note: I have not fully understood the use cases for Monotext, see my comment on the wiki; perhaps they can be elaborated on the data model page and contrasted with language-neutral/zxx String and multilingual text.)
You are right, I mixed them up (that comes from not checking).
The use cases for monolingual text are a bit rare; I am thinking of things like an official motto (which is usually not translated), etymological annotations, or the official name of a company (also usually not translated), maybe the species name? We have to think carefully about how to delineate them from each other in the entry forms, or otherwise it might end up a bit messy.
I hope it is a bit clearer. I think that there are use cases for all three text types (monotext, multitext, string) and that unifying some of them does not make sense. As usual, I might be wrong :)
Cheers, Denny
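The three text types can be sketched as separate value classes. The class names, the `languages` helper and the example values are illustrative assumptions; the company names are taken from the Sanyo example in this thread.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class StringValue:
    """Language-neutral string, e.g. an ISO 3166 code or a chemical symbol."""
    text: str


@dataclass
class MonolingualText:
    """Text in exactly one language, e.g. an untranslated official motto."""
    text: str
    lang: str


@dataclass
class MultilingualText:
    """One value with several per-language texts (language code -> text)."""
    texts: dict


TextValue = Union[StringValue, MonolingualText, MultilingualText]


def languages(value: TextValue) -> set:
    # The types differ in how many languages a single value carries.
    if isinstance(value, StringValue):
        return set()
    if isinstance(value, MonolingualText):
        return {value.lang}
    return set(value.texts)


country_code = StringValue("DE")
motto = MonolingualText("Liberté, égalité, fraternité", "fr")
company = MultilingualText({
    "de": "Sanyo Denki K.K.",
    "ja": "San’yō Denki Kabushiki-gaisha",
    "en": "SANYO Electric Co. Ltd.",
})
print(languages(country_code), languages(motto), sorted(languages(company)))
# set() {'fr'} ['de', 'en', 'ja']
```

Which class a property uses decides what an entry form must ask for: no language, exactly one, or any number.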
You are right, I mixed them up (that comes from not checking).
The use cases for monolingual text are a bit rare; I am thinking of things like an official motto (which is usually not translated),
I think if it is only "usually not", but sometimes indeed translated, using multilingual text for the property would be a better choice. If only one language is available, the language fallback would always end up with it.
etymological annotations, or the official name of a company (also, usually not translated),
"usually". Companies sometimes do run under local names (or variations): de: Sanyo Denki K.K. ja: San’yō Denki Kabushiki-gaisha, en: SANYO Electric Co. Ltd.
carefully about how to delineate them from each other in the entry forms, or otherwise it might end up a bit messy.
I think when adding the option for non-linguistic content (= ISO zxx) for language-neutral entities (e.g. for scientific species names, post codes), this type is the least needed.
(If anything, it may be more valuable to add a default flag indicating a primary name that should be used before the first fallback language of a language fallback chain. This would be valuable in "mixed" cases, where a string is translated into a few languages but not into the majority (the "usually not" case). Otherwise, in a rare border case such as a German company that provides translations into Japanese and Chinese but not into English, a language fallback chain that does contain German may accidentally end up with Chinese. This solves a border case within the multilingual type which I believe cannot be solved with monolingual text.)
Gregor
(Often wrong but never in doubt :-) )
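The proposed primary-name flag can be sketched as a small resolution function. The ordering used here (the reader's own language, then the flagged primary language, then the rest of the fallback chain, then any available text) is one possible reading of the proposal; the function name, the fallback chain and the company names are hypothetical placeholders.

```python
from typing import Optional


def resolve(texts: dict, chain: list, primary: Optional[str] = None) -> Optional[str]:
    """Pick the text of a multilingual value to display (hypothetical sketch).

    texts:   available texts, language code -> text
    chain:   the reader's language fallback chain, most preferred first
    primary: language whose text is flagged as the primary name, if any
    """
    if not texts:
        return None
    # 1. The reader's own language.
    if chain and chain[0] in texts:
        return texts[chain[0]]
    # 2. The flagged primary name, used before any fallback language.
    if primary in texts:
        return texts[primary]
    # 3. The remaining fallback languages, in order.
    for lang in chain[1:]:
        if lang in texts:
            return texts[lang]
    # 4. Last resort: any available text, so a value that exists in only
    #    one language always ends up being shown in that language.
    return next(iter(texts.values()))


company = {"de": "Beispiel GmbH", "ja": "<Japanese name>", "zh": "<Chinese name>"}
chain = ["en", "zh", "de"]                    # hypothetical English reader's chain
print(resolve(company, chain))                # <Chinese name>
print(resolve(company, chain, primary="de"))  # Beispiel GmbH
print(resolve({"la": "E pluribus unum"}, ["en", "de"]))  # E pluribus unum
```

Without the flag the German company is shown under its Chinese name even though the chain contains German; with the flag the primary name wins.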