First of all, thanks a lot for your replies.
Let me clarify a couple of assumptions that I've made:
i) there should be at most one article on any given topic in a language edition (this does not hold in sh:, az:, ku: and possibly others);
ii) taken together, the interwiki links should form an equivalence relation;
iii) an interwiki link to a redirect is treated as an interwiki link to the target of the redirect.
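To make the three assumptions concrete, here is a minimal Python sketch of how they could be checked on a set of links. It is only an illustration: the function names and the data layout (pairs of "lang:Title" strings plus a redirect table) are assumptions of mine, and the sample data is the Mother-in-law case from pattern 4 in this message. Links are treated as undirected and grouped with union-find (assumption ii), redirects are resolved first (assumption iii), and any group with two articles from the same edition is reported as a violation of assumption i.

```python
from collections import defaultdict

def lang(title):
    """Return the language prefix of a title such as 'en:Kinship'."""
    return title.split(":", 1)[0]

def resolve(title, redirects):
    """Follow redirects to their final target (assumption iii)."""
    seen = set()
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title

def find_conflicts(links, redirects):
    """Group articles connected by links and report groups that
    contain two or more articles from the same language edition."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the two endpoints of every link into one group.
    for src, dst in links:
        a, b = find(resolve(src, redirects)), find(resolve(dst, redirects))
        if a != b:
            parent[a] = b

    groups = defaultdict(set)
    for title in list(parent):
        groups[find(title)].add(title)

    conflicts = []
    for members in groups.values():
        by_lang = defaultdict(list)
        for title in members:
            by_lang[lang(title)].append(title)
        clashes = {l: sorted(t) for l, t in by_lang.items() if len(t) > 1}
        if clashes:
            conflicts.append(clashes)
    return conflicts

# Sample data: the Mother-in-law / Kinship case (hypothetical layout).
links = [("en:Mother-in-law", "ru:Тёща"), ("ru:Родство", "en:Kinship")]
redirects = {"ru:Тёща": "ru:Родство"}
print(find_conflicts(links, redirects))
# prints [{'en': ['en:Kinship', 'en:Mother-in-law']}]
```

Without the redirect table the same two links produce no conflict, which shows how much assumption iii matters for what counts as an error.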
My rationale: from a user's perspective, the links form a dictionary, so a user might expect that following an interlanguage link will not carry them to a page on a (slightly) different subject. Also, a number of scientific projects seem to be based on this assumption; see for instance the articles "A Bilingual Dictionary Extracted from the Wikipedia Link Structure" (Springer) or "Analyzing Interlanguage Links of Wikipedias" (Wikimania 2008).
Some of you may say that the interpretation of interwiki links as a dictionary is incorrect, but the problem exists nevertheless, as I will try to show below.
Let me list a couple of popular patterns, to show that some of the problems with interlanguage links are independent of the questioned assumptions (the list is not exhaustive).
1) absurd links, incorrect by any conceivable definition, resulting either from vandalism, like ro:Nicolae Steinhardt -> de:Penis, or complete ignorance regarding the target language: fr:Rick Ankiel -> ja:日本語
2) systematic errors, like this off-by-ten: http://wikitools.icm.edu.pl/show/en:36898/en:12681205/, or editor's laziness during copy&paste edits, like wuu:5月26号 -> bn:মে ১, wuu:5月27号 -> bn:মে ১. Generally, automatically generated articles on days of the year and years of the current era tend to introduce conflicts, but fortunately these are easy to detect and fix.
3a) links to disambiguation pages: la:Benedictus (nomen) -> en:Benedict
3b) links to incorrect meanings of a homonym: it:Rubinetto -> es:Grifo
4) combination of redirects and interwiki en:Mother-in-law -> ru:Тёща + redirect ru:Тёща -> ru:Родство + interwiki ru:Родство -> en:Kinship
5) a series of links widening and narrowing a meaning (excluding disambigs): pl:Województwo krakowskie (I Rzeczpospolita) -> en:Kraków Voivodeship (14th century-1795) -> pt:Voivodia da Cracóvia -> pl:Województwo krakowskie. The first two cover the period 14th century-1795, the third: 14th century-1998, the fourth: 1945-1998.
6) Problems stemming from the cultural and linguistic limits of the translation process, for example regarding food: http://wikitools.icm.edu.pl/show/en:57572/ or meals: http://wikitools.icm.edu.pl/show/en:71691/
Note that types 1, 2 and 3b are clearly incorrect, while the other ones are disputable. In my experience, all the types occur quite often. I encourage you to explore the inconsistencies yourself: take a random path and find the source(s) of semantic drift.
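If you want to trace such a path programmatically, a breadth-first search between two articles of the same edition gives the shortest chain of links connecting them, and the drift has to happen somewhere along that chain. The sketch below is only an illustration under my own assumptions (links as pairs of "lang:Title" strings, treated as undirected; the function name is hypothetical), and it uses the Kraków Voivodeship chain from pattern 5 as input.

```python
from collections import defaultdict, deque

def shortest_path(links, start, goal):
    """Breadth-first search over links treated as undirected edges."""
    graph = defaultdict(set)
    for src, dst in links:
        graph[src].add(dst)
        graph[dst].add(src)

    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back from the goal to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # the two articles are not connected

links = [
    ("pl:Województwo krakowskie (I Rzeczpospolita)",
     "en:Kraków Voivodeship (14th century-1795)"),
    ("en:Kraków Voivodeship (14th century-1795)", "pt:Voivodia da Cracóvia"),
    ("pt:Voivodia da Cracóvia", "pl:Województwo krakowskie"),
]
path = shortest_path(
    links,
    "pl:Województwo krakowskie (I Rzeczpospolita)",
    "pl:Województwo krakowskie",
)
print(" -> ".join(path))
```

Each hop on the returned path is a candidate for the link that widens or narrows the meaning, so a human only has to inspect a handful of edits rather than the whole connected group.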
As you can see, most of the examples contain links between the major language editions. I would hypothesize that only the second category is "generated" by the small language editions, while the rest is dominated by the larger editions, simply because they offer more opportunities to make an inconsistent edit. I'll run the statistics for the top 10 editions to test Gregory's hypothesis.
Finally, let me write a few words about the possible large-scale solutions to the problem. I'm afraid that "centralization" of interlanguage links, i.e. a separate service where all the interlanguage links would be stored and manipulated, would inadvertently impose an Anglocentric ontology, which is (presumably) not desired. The "decentralized interwikis + lots of bots" model, with all its flaws (for example: it's not feasible to find the incoming interlanguage links), will probably reflect the ontologies of smaller editions better.
Introducing two flavors of interwikis, "exact" and "approximate", would help both to retain the valuable interlanguage links that are incorrect under my narrow definition of correctness, and to express equivalence where it occurs. Cf. the concepts of defined meaning and relations in OmegaWiki: http://www.omegawiki.org/DefinedMeaning
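A rough sketch of what the two flavors could mean in practice, again in Python and again only under my own assumptions (the `Kind` enum, the triple layout and the function name are hypothetical, not an existing MediaWiki feature): only "exact" links are merged into equivalence classes, while "approximate" links are kept for navigation but do not merge anything. The labels on the Kraków chain below are my reading of the periods listed in pattern 5.

```python
from enum import Enum

class Kind(Enum):
    EXACT = "exact"
    APPROXIMATE = "approximate"

def exact_classes(links):
    """Return the groups of articles connected by EXACT links only."""
    neighbours = {}
    for src, dst, kind in links:
        neighbours.setdefault(src, set())
        neighbours.setdefault(dst, set())
        if kind is Kind.EXACT:  # approximate links never merge groups
            neighbours[src].add(dst)
            neighbours[dst].add(src)

    classes, seen = [], set()
    for title in neighbours:
        if title in seen:
            continue
        # Depth-first traversal collecting one equivalence class.
        stack, members = [title], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(neighbours[node] - members)
        seen |= members
        classes.append(sorted(members))
    return classes

links = [
    ("pl:Województwo krakowskie (I Rzeczpospolita)",
     "en:Kraków Voivodeship (14th century-1795)", Kind.EXACT),
    ("en:Kraków Voivodeship (14th century-1795)",
     "pt:Voivodia da Cracóvia", Kind.APPROXIMATE),  # meaning widens
    ("pt:Voivodia da Cracóvia",
     "pl:Województwo krakowskie", Kind.APPROXIMATE),  # meaning narrows
]
for members in exact_classes(links):
    print(members)
```

With this labeling the two pl: articles end up in different classes, so the chain no longer violates the one-article-per-edition assumption, yet none of the links has to be deleted.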
As a side note, adding semantics to interlanguage links fits very nicely with the model used in the Semantic MediaWiki extension, which is unfortunately not integrated into Wikipedia (yet).
Regards, Lukasz