Hallo,
Preamble 1: This email probably falls under this FAQ question: Q: How will Wikidata change the way articles are edited? A: That’s part of what we have to figure out during the development, together with the community.
Preamble 2: It's possible that there's an answer to this issue already, but I couldn't find it.
A popular example of using Wikidata is that it makes maintaining articles about cities easier: When the mayor of a city changes, the name only needs to be updated once.
The problem is that the mayor's name can be written differently in other languages. I didn't actually try running it myself, but as far as I understand, Wikidata supports translating names. But what happens when the mayor changes? It is likely that the name will be updated in the language spoken in that city. At that point articles in Wikipedia in other languages will probably show the name in the language of the city, which may be unreadable.
Let's take Haifa for an example. Its previous mayor was: he: עמרם מצנע en: Amram Mitzna ru: Амрам Мицна hr: Amram Micna etc.
Now it changes to: he: יונה יהב
And then suddenly all the articles about Haifa in all the languages will show the mayor's name as "יונה יהב", which most people won't be able to read. Maybe the Wikidata community will develop some kind of a policy that will discourage adding names in local scripts without any translation to a more common script. Maybe at some point software should even show a warning if somebody tries to do it.
The scenario can be even simpler: Somebody will vandalize Wikidata and change the mayor's name to some nonsense.
The most practical way to solve this is to show changes to any piece of data that affects a Wikipedia article in the watchlist, as if they were changes to the article itself. Is it possible? If not, is it planned?
It's a problem with Commons, too: An image that is used in an article can change in Commons and it won't appear in the watchlist. But I expect that it will happen a lot more often with Wikidata items and that the changes would be a lot more subtle and hard to notice: It's easy to notice that an image changed, but it's harder to notice a change in a number or a name of a mayor.
Another question is: What is the fallback mechanism if a name was not translated? The usual MediaWiki fallback rules can be reused, but there's a twist, because in Wikidata the usual fallback language may be unavailable. So in this case it will probably be:
my language -> my fallback language -> English -> the language in which it is written
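The chain above can be sketched in a few lines. This is only an illustration under stated assumptions: the `FALLBACKS` table and the `resolve_label` function are hypothetical names, and the fallback pairs in the table are made up for the example rather than taken from MediaWiki's actual configuration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-language fallback table (illustrative values only,
# not MediaWiki's real fallback configuration).
FALLBACKS: Dict[str, List[str]] = {
    "hr": ["sh"],
    "uk": ["ru"],
}


def resolve_label(
    labels: Dict[str, str], user_lang: str, source_lang: str
) -> Optional[Tuple[str, str]]:
    """Return (language, label) for the first language in the chain
    my language -> my fallback language(s) -> English -> source language
    that has a label, or None if none of them has one."""
    chain = [user_lang, *FALLBACKS.get(user_lang, []), "en", source_lang]
    for lang in chain:
        if lang in labels:
            return lang, labels[lang]
    return None


old_mayor = {
    "he": "עמרם מצנע",
    "en": "Amram Mitzna",
    "ru": "Амрам Мицна",
    "hr": "Amram Micna",
}
new_mayor = {"he": "יונה יהב"}

print(resolve_label(old_mayor, "de", "he"))  # falls back to the English label
print(resolve_label(new_mayor, "de", "he"))  # only the Hebrew label is left
```

With the old mayor a German reader gets the English label; with the new mayor the chain runs all the way to the Hebrew label, which is exactly the readability problem described above.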
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
On Mon, Aug 13, 2012 at 4:37 PM, Amir E. Aharoni amir.aharoni@mail.huji.ac.il wrote:
Let's take Haifa for an example. Its previous mayor was: he: עמרם מצנע en: Amram Mitzna ru: Амрам Мицна hr: Amram Micna etc.
Now it changes to: he: יונה יהב
That's a very good point. I would expect Wikidata to link the field not as a plain property, but as an entity with (he: עמרם מצנע, en: Amram Mitzna, ru: Амрам Мицна, hr: Amram Micna).
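A minimal sketch of that idea, assuming a toy in-memory store: the city does not hold the mayor's name, only a reference to a person entity, and the entity carries the name in each language. The `Item` class, the `mayor_label` helper and the identifiers `city-1`, `person-1` and `person-2` are all hypothetical and are not Wikidata's actual data model or IDs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Item:
    item_id: str
    labels: Dict[str, str] = field(default_factory=dict)  # language -> name
    claims: Dict[str, str] = field(default_factory=dict)  # property -> target item id


store: Dict[str, Item] = {}
for item in (
    Item("person-1", labels={
        "he": "עמרם מצנע",
        "en": "Amram Mitzna",
        "ru": "Амрам Мицна",
        "hr": "Amram Micna",
    }),
    Item("person-2", labels={"he": "יונה יהב"}),
    Item("city-1", labels={"en": "Haifa"}, claims={"mayor": "person-1"}),
):
    store[item.item_id] = item


def mayor_label(city_id: str, lang: str) -> Optional[str]:
    """Follow the city's 'mayor' link and return the person's label in
    the requested language, or None if that label is missing."""
    person = store[store[city_id].claims["mayor"]]
    return person.labels.get(lang)


print(mayor_label("city-1", "ru"))          # label of the linked person entity
store["city-1"].claims["mayor"] = "person-2"  # the mayor changes: one edit
print(mayor_label("city-1", "ru"))          # None until someone adds a label
```

Changing the mayor is a single edit on the city, but the new person entity has no Russian label yet, so the question of what to display in the meantime remains.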
And then suddenly all the articles about Haifa in all the languages will show the mayor's name as "יונה יהב", which most people won't be able to read. Maybe the Wikidata community will develop some kind of a policy that will discourage adding names in local scripts without any translation to a more common script. Maybe at some point software should even show a warning if somebody tries to do it.
That's unfortunate, but if the position changed, it seems better to show the Hebrew name than the outdated mayor. It may be appropriate to do some filtering, though, so that one language is marked as the source and has to be present for that item. It would be very suspicious if we only knew the name in Russian when dealing with Haifa.
In the best case, we would already have the entity for the running candidate (suppose it were a presidential election instead), and the translation would be available at the time of the switch.
The scenario can be even simpler: Somebody will vandalize Wikidata and change the mayor's name to some nonsense.
Wikidata would amplify the effect, but it would be no different from any other vandalism.
(...)
Another question is: What is the fallback mechanism if a name was not translated? The usual MediaWiki fallback rules can be reused, but there's a twist, because in Wikidata the usual fallback language may be unavailable. So in this case it will probably be:
my language -> my fallback language -> English -> the language in which it is written
As a side note, it would be interesting to make the MediaWiki fallback rules able to work with a non-English base language.
On 13/08/12 16:37, Amir E. Aharoni wrote:
The problem is that the mayor's name can be written differently in other languages. I didn't actually try running it myself, but as far as I understand, Wikidata supports translating names. But what happens when the mayor changes? It is likely that the name will be updated in the language spoken in that city. At that point articles in Wikipedia
This is correct, although in general the mayor will not be a name, but a link to an item, and the item will contain the mayor's name (in various languages).
in other languages will probably show the name in the language of the city, which may be unreadable.
Let's take Haifa for an example. Its previous mayor was: he: עמרם מצנע en: Amram Mitzna ru: Амрам Мицна hr: Amram Micna etc.
Now it changes to: he: יונה יהב
And then suddenly all the articles about Haifa in all the languages will show the mayor's name as "יונה יהב", which most people won't be able to read. Maybe the Wikidata community will develop some kind of a policy that will discourage adding names in local scripts without any translation to a more common script. Maybe at some point software should even show a warning if somebody tries to do it.
I believe it should be possible to alleviate this problem to an extent by introducing automatic transcription between languages and specifying what language the mayor's "default" name is in. If automatic transcription gets it wrong, it could still be overridden when someone enters the name in another language.
Relevant bugs are https://bugzilla.wikimedia.org/show_bug.cgi?id=37461 and https://bugzilla.wikimedia.org/show_bug.cgi?id=36430
2012/8/14 Nikola Smolenski smolensk@eunet.rs:
I believe it should be possible to alleviate this problem to an extent by introducing automatic transcription between languages and specifying what language the mayor's "default" name is in. If automatic transcription gets it wrong, it could still be overridden when someone enters the name in another language.
It is guaranteed to be profoundly broken. The above-mentioned Hebrew names will be transliterated as <'mrm mcn'> (the apostrophes are part of the transliteration!) and <ywnh yhb>. The same problem applies to Arabic, Punjabi and many other languages. Without manual maintenance it will perpetuate horrendously wrong transliteration.
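To make the failure concrete, here is a small sketch of what a purely letter-by-letter transliteration does with these two names. The table is an assumption: it covers only the letters that occur in the two names and was chosen to reproduce the outputs quoted above, so it is not any particular romanization standard. Hebrew is normally written without vowels, which is why nothing in the input tells the software that the first name is "Amram".

```python
# Hypothetical consonant-by-consonant table, limited to the letters used
# in the two example names. Hebrew spelling carries no vowel information
# here, so a table like this cannot produce "Amram" or "Yona".
NAIVE_TABLE = {
    "ע": "'",
    "מ": "m",
    "ם": "m",  # final form of mem
    "ר": "r",
    "צ": "c",
    "נ": "n",
    "י": "y",
    "ו": "w",
    "ה": "h",
    "ב": "b",
}


def naive_transliterate(text: str) -> str:
    """Replace each known letter by its table entry; leave everything
    else (spaces, unknown letters) unchanged."""
    return "".join(NAIVE_TABLE.get(ch, ch) for ch in text)


print(naive_transliterate("עמרם מצנע"))  # 'mrm mcn'
print(naive_transliterate("יונה יהב"))   # ywnh yhb
```

Getting from these outputs to "Amram Mitzna" requires knowledge that is not in the spelling, such as a dictionary of names or a human editor.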
Some very limited auto-transliteration is OK, but just as a suggestion. I was actually going to write an email about that. But it must not be automatic all the way and propagate to all wikis.
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
Hebrew is rather problematic when it comes to transliteration, so maybe there should be a section where the community can monitor automatic transliterations ("latest transliterations") or transliterate themselves (something like translatewiki.net but on a smaller scale).
Best Regards / Mit freundlichen Grüßen, Orel Beilinson
On 14/08/12 08:57, Amir E. Aharoni wrote:
2012/8/14 Nikola Smolenski smolensk@eunet.rs:
I believe it should be possible to alleviate this problem to an extent by introducing automatic transcription between languages and specifying what language the mayor's "default" name is in. If automatic transcription gets it wrong, it could still be overridden when someone enters the name in another language.
It is guaranteed to be profoundly broken. The above-mentioned Hebrew names will be transliterated as <'mrm mcn'> (the apostrophes are part
Would it? How many Hebrew names are there that are spelled "עמרם"? If the transliteration software knows it's a human name it can transliterate it as "Amram".
of the transliteration!) and <ywnh yhb>. The same problem applies to Arabic, Punjabi and many other languages. Without manual maintenance it will perpetuate horrendously wrong transliteration.
Some very limited auto-transliteration is OK, but just as a suggestion. I was actually going to write an email about that. But it must not be automatic all the way and propagate to all wikis.
On the other hand, we don't even need transliteration for a huge number of languages which keep the spelling from the original language, and then there is a somewhat smaller number of languages where transliteration is unambiguous. In these cases we may use transliteration freely.
2012/8/14 Nikola Smolenski smolensk@eunet.rs:
Would it? How many Hebrew names are there that are spelled "עמרם"? If the transliteration software knows it's a human name it can transliterate it as "Amram".
What you say is kinda true, but in practice it's much more complicated. I worked for a few years at a company that makes software that does this, and I was the lead developer. There are two software packages that do it for Hebrew; both are proprietary and very expensive. It's not that making a Free package is impossible, but you need a team for every language that has such problems, you need several full-time people to maintain the words, and what's worse is that most words have six or so possible pronunciations. Sure, crowdsourcing in Wikidata may change that, but it's too early to talk about this.
AFAIK the situation is even worse in Arabic, which is a much bigger language than Hebrew.
What I'm getting at is, again, that some limited helping transliteration may be OK, but it must not be automatically propagated. Naïve people may think that that's how the name is actually written, and in such matters most people are very naïve.
-- Amir
On 14/08/12 09:28, Amir E. Aharoni wrote:
What I'm getting at is, again, that some limited helping transliteration may be OK, but it must not be automatically propagated. Naïve people may think that that's how the name is
For Hebrew, Arabic and a few similar cases. In a large number of language combinations we will not have such problems.
In general I am a strong believer in "let's start with the simple thing", which is to let editors add transliterations (that is why we have a label field for every entity in every language).
I can see a use case for a transliteration bot that does some of the transliterations (semi-)automatically, but I think this is probably something that should be left to the community.
There might be some simple cases for language fallbacks (including transliterations), but we have not touched that development item yet. We have to see how this works out.
But in short, I am wary of automatic systems and would rather count on the knowledge of the editors.
I hope that makes sense, Cheers, Denny
2012/8/14 Denny Vrandečić denny.vrandecic@wikimedia.de:
I hope that makes sense,
This makes perfect sense and I agree.
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
I think the topic is relevant for the Wikidata editing UI.
At the hackathon in Berlin we had discussions about a chain of fallback languages. I have reworked the notes and added some potential user-interface behaviour at
http://meta.wikimedia.org/wiki/Wikidata/Notes/Language_fallback
Gregor