Hi all,
I'm a relative newcomer to Wikidata but a long-time OpenStreetMap contributor.
Recently, OpenStreetMap has had a situation in which large numbers of translated names were added to OSM objects. When I asked about the origin of these names, I was pointed to a number of sources, one of which is Wikidata.
What appears to be happening, from what I've seen, is that a small number of users are copying data from other (not so reliable) sources and putting that data into Wikidata as aliases for place names. Then, when asked where the place names come from, they say "Wikidata".
The problem here is that many of these "translations" are simply made up. They're either transliterations or word-for-word translations, rather than genuine names in another language.
Unfortunately, it's my understanding that Wikidata aliases can't be sourced (i.e., they can't be validated or invalidated like other facts).
If this is the case, it's a problem for both our projects.
I'd planned my own implementation that uses Wikidata names for places in OSM to create custom renderings, but we need to know that the place names are something we can trace back and source properly.
- Serge
Hi Serge,
The short answer to this is that the purpose of aliases in Wikidata is to help searching for items, and nothing more. Aliases may include nicknames that are in no way official, and abbreviations that are not valid if used in another context. Therefore, they seem to be a poor source of data to import into other projects.
Wikidata has properties such as birth name (https://www.wikidata.org/wiki/Property:P1477) that are used to provide properly sourced multi-lingual text data for items.
Cheers,
Markus
On 10.03.2015 16:09, Serge Wroclawski wrote:
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
On 10.03.2015 at 16:55, Markus Krötzsch wrote:
Wikidata has properties such as birth name (https://www.wikidata.org/wiki/Property:P1477) that are used to provide properly sourced multi-lingual text data for items.
Note: Wikidata doesn't yet support multilingual property values, only "monolingual" (language + value). However, multiple statements about the respective property can be used to provide values in different languages. That's actually desirable in this case, since it allows different sources to be given for different languages.
For towns and streets, the best property to use would probably be "official name" https://www.wikidata.org/wiki/Property:P1448
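As a rough illustration of the point above, the sketch below walks one entity in Wikidata's JSON format and lists its monolingual-text statements for a property, together with the number of references attached to each statement. It assumes the entity has already been obtained as JSON (for example via Special:EntityData) and parsed into a dict. The function name `sourced_names` and the sample entity, including its names, are made up for illustration and are not real Wikidata content.

```python
from typing import Any, Dict, List


def sourced_names(entity: Dict[str, Any], prop: str = "P1448") -> List[Dict[str, Any]]:
    """List monolingual-text values of `prop` with their reference counts."""
    names = []
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        # Skip "somevalue"/"novalue" snaks, which carry no datavalue.
        if snak.get("snaktype") != "value":
            continue
        value = snak["datavalue"]["value"]  # {"text": ..., "language": ...}
        names.append({
            "language": value["language"],
            "text": value["text"],
            "reference_count": len(statement.get("references", [])),
        })
    return names


# Made-up sample entity (not real Wikidata content), trimmed to the relevant keys.
SAMPLE_ENTITY = {
    "labels": {"en": {"language": "en", "value": "Example Town"}},
    "claims": {
        "P1448": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P1448",
                    "datavalue": {
                        "type": "monolingualtext",
                        "value": {"text": "Beispielstadt", "language": "de"},
                    },
                },
                # One (empty, hypothetical) reference group on this statement.
                "references": [{"snaks": {"P854": []}}],
            },
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P1448",
                    "datavalue": {
                        "type": "monolingualtext",
                        "value": {"text": "Ville-Exemple", "language": "fr"},
                    },
                },
                # No "references" key: this statement is unsourced.
            },
        ]
    },
}

for name in sourced_names(SAMPLE_ENTITY):
    print(name)
```

Each language gets its own statement, so each can carry its own references. The label entry in the same entity holds only a language and a value, with no place to attach a source.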
On 10.03.2015 17:09, Daniel Kinzler wrote:
Note: Wikidata doesn't yet support multilingual property values, only "monolingual" (language + value). However, multiple statements about the respective property can be used to provide values in different languages. That's actually desirable in this case, since it allows different sources to be given for different languages.
Good point; my formulation was ambiguous. In fact, alias-like properties are a case where you really want "monolingual text" data, since there is no one-to-one correspondence between aliases in different languages.
Markus
I suspect these are not Wikidata aliases. They are probably labels in other languages.
While Wikidata doesn't have a multilingual datatype, it does allow you to add labels (and aliases) in any language, and these labels, if they are correct, are the appropriate thing to use to localise OSM place names.
Hope this helps.
On 10 Mar 2015 22:21, "Markus Krötzsch" markus@semantic-mediawiki.org wrote:
Good point; my formulation was ambiguous. In fact, alias-like properties are a case where you really want "monolingual text" data, since there is no one-to-one correspondence between aliases in different languages.
Markus
On Wed, Mar 11, 2015 at 9:59 AM, Joe Filceolaire filceolaire@gmail.com wrote:
I suspect these are not Wikidata aliases. They are probably labels in other languages.
While Wikidata doesn't have a multilingual datatype, it does allow you to add labels (and aliases) in any language, and these labels, if they are correct, are the appropriate thing to use to localise OSM place names.
Do labels provide attribution/sources? That seems to be the key missing ingredient in the OP's query.
Tom
No, they don't.
Having said that, we should be aware that many of the 20 million items on Wikidata don't have generally accepted names in all of the over 200 languages on Wikidata. Names generated mechanically by transliteration bots or by direct word-for-word translation (depending on the custom generally used in the target language) may well be appropriate in many cases.
On 12 Mar 2015 02:39, "Tom Morris" tfmorris@gmail.com wrote:
Do labels provide attribution/sources? That seems to be the key missing ingredient in the OP's query.
Mechanically generated names by transliteration bots or by direct word for word translation (depending on the custom generally used in the target language) may well be appropriate in many cases
Remember that I'm talking about place names rather than other types of names. Within that scope, I don't understand how a made-up name can be appropriate; moreover, I don't understand how a made-up name can be given equal footing with the correct name.
Imagine a situation where someone transliterates a name into, say, French, but the established French name for the place is different. How are we to distinguish between the two?
- Serge
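One possible way to keep the two apart, sketched here under assumptions that do not come from this thread, is to record where each name came from. The letter table below is deliberately tiny, and `transliterate`, `name_for` and `ESTABLISHED_NAMES` are hypothetical names used only for illustration; a real system would need a proper transliteration scheme and a sourced list of names.

```python
from typing import Dict, Tuple

# Deliberately tiny, illustrative table: a handful of Russian Cyrillic letters only.
CYRILLIC_TO_LATIN: Dict[str, str] = {
    "а": "a", "в": "v", "к": "k", "м": "m", "о": "o", "с": "s",
}

# Hypothetical lookup of established names, keyed by (native name, language code).
ESTABLISHED_NAMES: Dict[Tuple[str, str], str] = {
    ("Москва", "fr"): "Moscou",
    ("Москва", "en"): "Moscow",
}


def transliterate(text: str) -> str:
    """Map each known Cyrillic letter to Latin; leave other characters as-is."""
    out = []
    for ch in text:
        latin = CYRILLIC_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


def name_for(native: str, lang: str) -> Tuple[str, str]:
    """Return (name, provenance) so a fallback is never mistaken for a real name."""
    established = ESTABLISHED_NAMES.get((native, lang))
    if established is not None:
        return established, "established"
    return transliterate(native), "transliterated"


print(name_for("Москва", "fr"))
print(name_for("Москва", "de"))
```

In this sketch the German lookup falls back to the transliteration "Moskva", although the established German name is "Moskau". That is exactly the case in question, and the provenance tag is what lets a consumer treat such a value as provisional.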
Hi,
What would you do with the many, many Chinese place names in Wikidata where we have nothing but Chinese? They are completely useless to me in this form. A good transliteration works for me. Like most people, beyond that I do not care much about it being "official" or sourced.
Thanks,
GerardM
On 12 March 2015 at 09:37, Serge Wroclawski emacsen@gmail.com wrote:
Imagine a situation where someone transliterates a name into, say, French, but the established French name for the place is different. How are we to distinguish between the two?
On 12.03.2015 at 10:03, Gerard Meijssen wrote:
Hi, what would you do with the many, many Chinese place names in Wikidata where we have nothing but Chinese? They are completely useless to me in this form. A good transliteration works for me. Like most people, beyond that I do not care much about it being "official" or sourced.
Decent automatic transliteration is fine, I think. Automatic word-for-word *translation*, however, seems rather problematic.
I agree that word-for-word translations are not appropriate for English.
If there are languages which traditionally do use word-for-word translation, then that might be appropriate for those languages.
On 12 Mar 2015 10:24, "Daniel Kinzler" daniel.kinzler@wikimedia.de wrote:
Decent automatic transliteration is fine, I think. Automatic word-for-word *translation*, however, seems rather problematic.
Daniel Kinzler Senior Software Developer
Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
I completely forgot that we already had the excellent transliteration gadget (https://www.wikidata.org/wiki/MediaWiki:Gadget-SimpleTransliterate.js) by Ebraminio (https://www.wikidata.org/wiki/User:Ebraminio). I just made a rough patch (https://www.wikidata.org/w/index.php?title=MediaWiki:Gadget-SimpleTransliterate.js&diff=203702222&oldid=155995810) to make it work with the new UI. Enjoy!
On 12/03/2015 11:24, Daniel Kinzler wrote:
Decent automatic transliteration is fine, I think. Automatic word-for-word *translation*, however, seems rather problematic.