-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
First of all, thanks a lot for your replies.
Let me clarify a couple of assumptions that I've made:
i) there should be at most one article on any given topic in a language
edition, which does not hold for sh:, az:, ku: and possibly others.
ii) the union of all interwiki links should form an equivalence relation
(reflexive, symmetric and transitive)
iii) an interwiki link to a redirect is treated as an interwiki link to
the target of the redirect
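Under assumptions ii) and iii), incoherences can be found mechanically:
resolve redirects, close the links under the equivalence relation, and
look for classes that contain two articles from the same edition, which
contradicts assumption i). A minimal sketch, assuming the links and
redirects are already available as (language, title) pairs; this input
format and the function names are hypothetical, not a MediaWiki API.

```python
from collections import defaultdict

# Hypothetical representation: a page is a (language code, title) pair.
Page = tuple[str, str]


def resolve(page: Page, redirects: dict[Page, Page]) -> Page:
    """Follow redirects to their target (assumption iii); stop on cycles."""
    seen = set()
    while page in redirects and page not in seen:
        seen.add(page)
        page = redirects[page]
    return page


def conflicting_classes(links: list[tuple[Page, Page]],
                        redirects: dict[Page, Page]) -> list[set[Page]]:
    """Merge all interwiki links into equivalence classes (assumption ii)
    with union-find, and return the classes that hold two or more
    articles from the same language edition (violating assumption i)."""
    parent: dict[Page, Page] = {}

    def find(x: Page) -> Page:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, dst in links:
        root_src = find(resolve(src, redirects))
        root_dst = find(resolve(dst, redirects))
        parent[root_src] = root_dst

    classes: dict[Page, set[Page]] = defaultdict(set)
    for page in parent:
        classes[find(page)].add(page)

    # A class with fewer distinct languages than members repeats an edition.
    return [members for members in classes.values()
            if len({lang for lang, _ in members}) < len(members)]


if __name__ == "__main__":
    # Pattern 4 from the list below: a redirect joins two English articles.
    links = [(("en", "Mother-in-law"), ("ru", "Тёща")),
             (("ru", "Родство"), ("en", "Kinship"))]
    redirects = {("ru", "Тёща"): ("ru", "Родство")}
    print(conflicting_classes(links, redirects))
```

On pattern 4 this reports a single class containing en:Mother-in-law,
en:Kinship and ru:Родство.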
My rationale: from a user's perspective, the links form a dictionary, so
a user might expect that following an interlanguage link will not take
them to a page on a (slightly) different subject. Also, a number of
scientific projects seem to be based on this assumption; see, for
instance, the articles "A Bilingual Dictionary Extracted from the
Wikipedia Link Structure" (Springer) or "Analyzing Interlanguage Links
of Wikipedias" (Wikimania 2008).
Some of you may say that the interpretation of interwiki links as a
dictionary is incorrect, but the problem exists nevertheless, as I will
try to show below.
Let me list a few common patterns to show that some of the problems
with interlanguage links are independent of the assumptions in question
(the list is not exhaustive).
1) absurd links, incorrect by any conceivable definition, resulting
either from vandalism, like ro:Nicolae Steinhardt -> de:Penis, or from
complete ignorance of the target language: fr:Rick Ankiel -> ja:日本語
2) systematic errors, like this off-by-ten:
http://wikitools.icm.edu.pl/show/en:36898/en:12681205/, or careless
copy&paste edits, like wuu:5月26号 -> bn:মে ১ and wuu:5月27号 -> bn:মে ১.
Generally, automatically generated articles on days of the year and
years of the current era tend to introduce conflicts, but fortunately
these are easy to detect and fix.
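The copy&paste errors above (wuu:5月26号 and wuu:5月27号 both linking to
bn:মে ১) leave a simple signature: two different pages of one edition
link to the same target. A minimal sketch that flags such targets,
assuming links are given as ((lang, title), (lang, title)) pairs (a
hypothetical input format). On its own it will not catch errors that
keep the mapping one-to-one.

```python
from collections import defaultdict

Page = tuple[str, str]  # hypothetical: (language code, title)


def many_to_one(links: list[tuple[Page, Page]]) -> dict[tuple[Page, str], set[Page]]:
    """Map (target page, source edition) to the set of source pages,
    keeping only targets reached from two or more pages of one edition."""
    incoming: dict[tuple[Page, str], set[Page]] = defaultdict(set)
    for src, dst in links:
        incoming[(dst, src[0])].add(src)
    return {key: srcs for key, srcs in incoming.items() if len(srcs) > 1}


if __name__ == "__main__":
    links = [(("wuu", "5月26号"), ("bn", "মে ১")),
             (("wuu", "5月27号"), ("bn", "মে ১"))]
    for (target, lang), sources in many_to_one(links).items():
        print(target, "is linked from", len(sources), "pages in", lang)
```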
3a) links to disambiguation pages: la:Benedictus (nomen) -> en:Benedict
3b) links to the wrong meaning of a homonym: it:Rubinetto -> es:Grifo
4) a combination of redirects and interwikis: en:Mother-in-law ->
ru:Тёща + redirect ru:Тёща -> ru:Родство + interwiki ru:Родство ->
en:Kinship
5) a series of links widening and narrowing a meaning (excluding
disambiguation pages): pl:Województwo krakowskie (I Rzeczpospolita) ->
en:Kraków Voivodeship (14th century-1795) -> pt:Voivodia da Cracóvia ->
pl:Województwo krakowskie. The first two cover the period from the 14th
century to 1795, the third from the 14th century to 1998, and the fourth
from 1945 to 1998.
6) Problems stemming from the cultural and linguistic limits of the
translation process, for example regarding food:
http://wikitools.icm.edu.pl/show/en:57572/ or meals:
http://wikitools.icm.edu.pl/show/en:71691/
Note that types 1, 2 and 3b are clearly incorrect, while the others are
debatable. In my experience, all of these types occur quite often. I
encourage you to explore the incoherences yourself: take a random path
and find the source(s) of semantic drift.
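The suggested exploration can also be scripted. A minimal sketch,
assuming outgoing interwiki links are available as a dict from each
(lang, title) page to its link targets (a hypothetical format): the walk
follows random links and stops at the first edition it reaches twice.
If it arrives at a different page than on the first visit, the path
contains at least one step of semantic drift. The function name is
illustrative only.

```python
import random

Page = tuple[str, str]  # hypothetical: (language code, title)


def random_drift_path(out_links: dict[Page, list[Page]], start: Page,
                      rng: random.Random,
                      max_steps: int = 50) -> tuple[list[Page], bool]:
    """Walk random interwiki links from `start` until an edition repeats.
    Returns the path and whether the repeated edition was reached at a
    different page than on the first visit (a sign of semantic drift)."""
    path = [start]
    first_visit = {start[0]: start}
    current = start
    for _ in range(max_steps):
        targets = out_links.get(current)
        if not targets:
            break  # dead end: no outgoing interwiki links
        # Sort first so that a seeded rng gives reproducible walks.
        current = rng.choice(sorted(targets))
        path.append(current)
        lang = current[0]
        if lang in first_visit:
            return path, first_visit[lang] != current
        first_visit[lang] = current
    return path, False


if __name__ == "__main__":
    # Pattern 5 from the list above, with one outgoing link per page.
    a = ("pl", "Województwo krakowskie (I Rzeczpospolita)")
    b = ("en", "Kraków Voivodeship (14th century-1795)")
    c = ("pt", "Voivodia da Cracóvia")
    d = ("pl", "Województwo krakowskie")
    print(random_drift_path({a: [b], b: [c], c: [d]}, a, random.Random(0)))
```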
As you can see, most of the examples involve links between the major
language editions. I would hypothesize that only the second category is
"generated" by the small language editions, while the rest is dominated
by the larger editions, simply because they offer more opportunities to
make an incoherent edit. I'll run the statistics for the top 10 editions
to test Gregory's hypothesis.
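Since larger editions also have more links overall, it may help to
report both absolute counts and per-link rates when testing this. A
minimal sketch, assuming a list of links and a set of links already
judged incoherent by some separate check (both hypothetical inputs):

```python
from collections import Counter

Page = tuple[str, str]  # hypothetical: (language code, title)
Link = tuple[Page, Page]


def per_edition_stats(links: list[Link], incoherent: set[Link],
                      editions: set[str]) -> dict[str, tuple[int, int, float]]:
    """For each edition in `editions`, return (incoherent links, total
    outgoing links, share of incoherent links). Editions with no
    outgoing links are omitted."""
    total = Counter()
    bad = Counter()
    for link in links:
        lang = link[0][0]  # edition of the source page
        if lang in editions:
            total[lang] += 1
            if link in incoherent:
                bad[lang] += 1
    return {lang: (bad[lang], total[lang], bad[lang] / total[lang])
            for lang in editions if total[lang]}
```

Comparing the share column rather than the raw counts would separate
"more opportunities" from "more errors per link".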
Finally, let me write a few words about the possible large-scale
solutions to the problem. I'm afraid that "centralization" of
interlanguage links, i.e. a separate service where all the interlanguage
links would be stored and manipulated, would inadvertently impose an
Anglocentric ontology, which is (presumably) not desired. The
"decentralized interwikis + lots of bots" model, with all its flaws (for
example: it's not feasible to find the incoming interlanguage links),
will probably reflect the ontologies of smaller editions better.
Introducing two flavors of interwikis, "exact" and "approximate", would
help both to retain the valuable interlanguage links that are incorrect
under my narrow definition of correctness and to express the equivalence
where it occurs. Cf. the concepts of defined meaning and relations in
OmegaWiki:
http://www.omegawiki.org/DefinedMeaning
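One way the two flavors could be modeled, assuming each link carries a
hypothetical "exact" or "approximate" tag: only exact links are merged
into equivalence classes, while approximate links are kept as relations
between classes. A minimal sketch; the tags and data layout are
illustrative, not an existing MediaWiki feature.

```python
from collections import defaultdict

Page = tuple[str, str]  # hypothetical: (language code, title)
EXACT, APPROXIMATE = "exact", "approximate"  # hypothetical link tags


def build_model(typed_links: list[tuple[Page, Page, str]]):
    """Merge pages joined by exact links into equivalence classes and keep
    approximate links as relations between distinct classes."""
    parent: dict[Page, Page] = {}

    def find(x: Page) -> Page:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, dst, kind in typed_links:
        root_src, root_dst = find(src), find(dst)  # also registers both pages
        if kind == EXACT:
            parent[root_src] = root_dst

    groups: dict[Page, set[Page]] = defaultdict(set)
    for page in parent:
        groups[find(page)].add(page)
    class_of = {page: frozenset(groups[find(page)]) for page in parent}

    classes = set(class_of.values())
    relations = {(class_of[s], class_of[d]) for s, d, kind in typed_links
                 if kind == APPROXIMATE and class_of[s] != class_of[d]}
    return classes, relations
```

For instance, tagging only the first link of the Kraków chain in pattern
5 as exact would keep all four links while no longer implying that the
two pl: articles are equivalent.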
As a side note, adding semantics to interlanguage links fits very nicely
into the model used by the Semantic MediaWiki extension, which
unfortunately is not integrated into Wikipedia (yet).
Regards,
Lukasz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla -
http://enigmail.mozdev.org
iEYEARECAAYFAkk9GNkACgkQqPt6S1UzhapDfwCdGhDAUsy0N2Bgw/2ioCmxY2dP
XmgAn18EiV3XrIgRd3bg3Q9jIEpA4co8
=4yjt
-----END PGP SIGNATURE-----