Language links and double language links on the Wikipedias - Wikimedia-l

Bence Damokos

25 Jun

10:52 a.m.

Hi Denny,

This is a really interesting list. Looking at the Hungarian list, I find that in many instances the duplicate interwiki links are actually commented out (in the form of " or ), and are not real duplicate links. (In some cases there are indeed duplicate links, where one concept covers two concepts in other languages.)

Maybe you could refine your search algorithm to exclude commented-out links, and improve your listing page by including not only the second interwiki link found for a given language, but also the first one, so that it is easier to assess without having to check the article pages or source code?

In any case, the village pumps might be a good place to post a link to the lists. The "Global message delivery" system might help you with that: http://meta.wikimedia.org/wiki/Global_message_delivery

Best regards,
Bence

On Mon, Jun 25, 2012 at 12:29 PM, Denny Vrandečić <denny.vrandecic(a)wikimedia.de> wrote:

...

Hi all,

I ran some analysis last week, to get some numbers out of the Wikipedia language links. One type of report that was generated was the list of all articles in the main namespaces of the Wikipedias that link to more than one article in another language edition of Wikipedia (so-called double language links). There are not that many of them (about 19,000 in total), split by language, all available here:

<http://simia.net/languagelinks/>

Double language links are not errors per se, but they cause a few nuisances:

* they lead to two links in the language links list that just look the same (you have to hover over them to see that they link to different languages), which is not really optimal from the user experience side
* they are not saved in the langlinks table and thus are ignored in certain reports and also in the respective export

I am not sure how to reach out to the respective Wikipedia communities, or if I should at all. Should I post to their respective version of the village pump? Remembering from the time I was active on the Croatian Wikipedia, I would have appreciated that list to check the entries.

I reckoned the wikipedia-l list would be the right place, but that list looks rather dead.

Cheers,
Denny

--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

_______________________________________________
Wikimedia-l mailing list
Wikimedia-l(a)lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l
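Editorial sketch of the refinement Bence asks for: ignore commented-out links and report every target found for a language, not only the second one. This is not Denny's actual analysis code. The function name, the regular expressions, and the small sample set of language codes are all assumptions made for illustration, and "commented out" is assumed here to mean an HTML comment in the wikitext.

```python
import re
from collections import defaultdict

# All names below are hypothetical; the language set is a tiny illustrative sample.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
LANGLINK_RE = re.compile(r"\[\[\s*([a-z-]+)\s*:\s*([^\]|]+?)\s*\]\]")
KNOWN_LANGS = {"en", "de", "hu", "hr", "da", "he", "zh-yue"}


def double_language_links(wikitext, langs=KNOWN_LANGS):
    """Return {lang: [targets, ...]} for languages linked to more than once."""
    # Remove HTML comments first so commented-out links are not counted.
    visible = COMMENT_RE.sub("", wikitext)
    targets = defaultdict(list)
    for lang, title in LANGLINK_RE.findall(visible):
        # Keep only known language prefixes and skip exact repeats.
        if lang in langs and title not in targets[lang]:
            targets[lang].append(title)
    # Report all targets (first and later ones) for each doubled language.
    return {lang: titles for lang, titles in targets.items() if len(titles) > 1}
```

Returning the full list of targets per language is what would let a reader of the listing page judge a case without opening the article source.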


Amir E. Aharoni

25 Jun

11:21 a.m.

Hi Denny,

TL;DR: It's a very important question, but don't worry about it too much. Just do Wikidata well as it is currently planned.

Now, the full reply.

I wrote a bit of an essay about it in 2008:
https://meta.wikimedia.org/wiki/Tips_for_resolving_interwiki_conflicts

I also started a page to coordinate the efforts to resolve such conflicts:
https://meta.wikimedia.org/wiki/Interwiki_synchronization

It started out nicely, but didn't really scale, so I had no choice but to neglect it. There are two main reasons that it didn't scale:

1. Fixing interlanguage link conflicts is an exhausting manual process. The Interlanguage extension or Wikidata are supposed to make it centralized and easier.
2. Almost all Wikipedians are very, very reluctant about doing anything outside their home projects.

So, Wikidata is supposed to resolve #1. Once it becomes active, #2 will kick in again. At this stage, all I can say is our old motto: "Be Bold". There's a rumor about me, which says that I know a lot of languages. I don't; I'm just bold about trying to edit Wikipedias in languages that I don't know. Everybody can do it. Most of the time it turns out to be correct and people don't complain.

Trying to talk to people about this on village pumps and using global message delivery is not very efficient. In many languages, even in some major ones, the village pumps are not as active as in English, and even when they are, people very often ignore messages in English.

Anyway, my proposal is this:

* As discussed at bug 15607 [1], the best strategy for rolling out centralized language links is to enable them in articles without conflicts and to leave articles with conflicts without any change at first.
* After initial roll-out, a list of conflicts for every project should be created. That is, there should be one list of articles with conflicts in the English Wikipedia, another list for the Hebrew Wikipedia, another one for Croatian, etc. This will make it relatively more accessible for people, because it will look like a problem in their project. Most people like solving local problems more than global problems.[2]
* Profit.

I believe that this crowdsourcing model may work. It won't be immediately perfect or very fast. It's just a sensible start.

[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=15607

[2] A technical implementation comment about the "list of pages with conflicts": it would be most efficient if it were implemented as a special page in each project. If updating it immediately is too burdensome in terms of performance, it can be updated in batches every week or so. The reason it should be a special page is that it will look like an integrated site feature and that it will be easy to localize its interface.

2012/6/25 Denny Vrandečić <denny.vrandecic(a)wikimedia.de>:

...

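Editorial sketch of the "one list of conflicts per project" idea from Amir's proposal. It only illustrates the grouping step; it is not a MediaWiki special page, and the record layout (project, page, target language, targets) and the function name are assumptions made for this example. Such a function could be run in a periodic batch, as footnote [2] of the message suggests.

```python
from collections import defaultdict


def conflicts_by_project(records):
    """Group conflict records into one sorted list per project.

    Each record is a hypothetical tuple:
    (project, page_title, target_language, list_of_targets).
    """
    per_project = defaultdict(list)
    for project, page, target_lang, targets in records:
        # A page counts as a conflict only if it has more than one target.
        if len(targets) > 1:
            per_project[project].append((page, target_lang, tuple(targets)))
    # Sort projects and rows so that each batch run produces a stable list.
    return {project: sorted(rows) for project, rows in sorted(per_project.items())}
```

Keeping one list per project matches the argument in the message that a conflict is more likely to be fixed when it looks like a local problem.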

Delirium

25 Jun

5:15 p.m.


Thanks for this list. For the languages I know, I've started going through and fixing ones that are clearly wrong. If a number of people do that, that should improve the general quality/consistency of interwiki links. I second the other comment that it'd be nice if the parsing could be re-run to exclude commented-out links, but the list is still useful as is.

There are some difficult cases, though, when languages make different choices on how to group subjects, so the articles aren't actually in 1-to-1 correspondence. For example, the English article [[en: Móði and Magni]] unsurprisingly has two outgoing interwiki links, when linking to languages that split them, such as [[da:Magni]] and [[da:Modi]]. It's not clear what to do about these cases.

Best,
Mark

On 6/25/12 12:29 PM, Denny Vrandečić wrote:

...


Denny Vrandečić

26 Jun

12:45 p.m.

Ziko,

it does not jeopardize the Wikidata goal -- the current language link system won't be switched off, but can be further used. Everything that is working currently will still be possible afterwards.

Wikidata can still be used to represent the 99.2% of language links that are simple -- this would still be a huge improvement over the current state. As soon as these are out of the way, we can think about if and how to extend the system in order to deal with the rest.

Cheers,
Denny

2012/6/25 Ziko van Dijk <vandijk(a)wmnederland.nl>:

...

Hello,

So may I guess that "double links" are usually the result of a Wikipedian who was not sure which language link to set, so in doubt, he simply put in the language links for two different articles?

And in general, is it imaginable that different languages divide the knowledge in different ways, which could jeopardize the whole goal of Wikidata unifying the language links?

Kind regards
Ziko

2012/6/25 Delirium <delirium(a)hackish.org>:


--
Vereniging Wikimedia Nederland
dr. Ziko van Dijk, voorzitter
http://wmnederland.nl/

Wikimedia Nederland
Postbus 167
3500 AD Utrecht


Denny Vrandečić

12:56 p.m.

I got the number from Brent Hecht, a researcher at Northwestern, who has a number of great papers published on Wikipedia-related topics. CC-ing him, so he knows I am blam.., er, referencing him :)

Cheers,
Denny

2012/6/26 Martijn Hoekstra <martijnhoekstra(a)gmail.com>:

...

This number, 99.2%, was also mentioned at the Berlin Hackathon. It sounds much higher than what my (very scientifically relevant, obviously) gut feeling tells me. Could you indicate where this number is coming from?

On Tue, Jun 26, 2012 at 2:45 PM, Denny Vrandečić <denny.vrandecic(a)wikimedia.de> wrote:


Deryck Chan

1:46 p.m.

One major problem with double language links I've encountered before was that they confuse interwiki bots and therefore break things. Several articles on the Cantonese Wikipedia (zh-yue.wp) pertaining to local political and cultural issues in the Cantonese-speaking world have __NOBOT__ on them simply because they have topic splits that are significantly different from other Wikipedias, and bot interwiki manipulation needs to be prevented to maintain topic correspondence between languages.

In short: double language links are a possible idea, but only if we can upgrade the interwiki bots first.

On 25 June 2012 11:29, Denny Vrandečić <denny.vrandecic(a)wikimedia.de> wrote:

...

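Editorial sketch of the opt-out behaviour Deryck describes: an interwiki bot leaving alone any page that carries the __NOBOT__ marker. The marker string is taken from the message above; the function name and the (title, wikitext) input shape are assumptions for illustration, and this does not reproduce how any particular bot framework implements the check.

```python
NOBOT_MARKER = "__NOBOT__"  # marker named in the message above


def pages_safe_for_bot(pages):
    """Yield titles of pages whose wikitext lacks the opt-out marker.

    `pages` is a hypothetical iterable of (title, wikitext) pairs.
    """
    for title, wikitext in pages:
        # Skip pages that opted out, so deliberate topic splits stay untouched.
        if NOBOT_MARKER not in wikitext:
            yield title
```

A plain substring test is the simplest possible check; a real bot would probably also need to ignore markers that appear inside comments or nowiki sections.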