[WikiEN-l] Serious problems with interlanguage links

Lukasz Bolikowski L.Bolikowski at icm.edu.pl
Fri Dec 5 15:34:00 UTC 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I've done some research on the network of interlanguage links as a
whole; you can see the results here:
  http://wikitools.icm.edu.pl/

I wrote to this list earlier this year about inconsistencies in the
interlanguage links, but two things have changed since then: the problem
has become more serious, and I've developed a more usable tool to correct it.

A short introduction: let's say that two articles are connected if there
is an interlanguage link from one to the other in at least one
direction.  Next, let's say that if A and B are connected, and B and C
are connected, then A and C are connected too.  Finally, for each group
of connected articles, let's check whether it is coherent, i.e. whether
it contains at most one article from each language.
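
The definition above can be sketched in a few lines of Python.  This is
only an illustration, not the code behind the tool: the function names
and the sample links are made up, and articles are assumed to be written
as "lang:Title" strings.  It groups articles with a union-find structure
and then flags groups that contain two articles from the same language.

```python
from collections import defaultdict


def connected_groups(links):
    """Group articles that are linked, directly or transitively.

    `links` is an iterable of (source, target) pairs; direction is
    ignored, matching the "at least one direction" rule above.
    """
    parent = {}

    def find(x):
        # Register unseen articles as their own group, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)  # merge the two groups

    groups = defaultdict(set)
    for article in list(parent):
        groups[find(article)].add(article)
    return list(groups.values())


def is_coherent(group):
    """A group is coherent if it has at most one article per language."""
    langs = [title.split(":", 1)[0] for title in group]
    return len(langs) == len(set(langs))


# Hypothetical sample data, invented for this illustration.
links = [
    ("en:December", "de:Dezember"),
    ("de:Dezember", "fr:Décembre"),
    ("fr:Décembre", "en:City"),  # a bad link that merges two topics
    ("en:Moon", "pl:Księżyc"),
]

groups = connected_groups(links)
incoherent = [g for g in groups if not is_coherent(g)]
```

Here the first group ends up with two English articles, so it is
reported as incoherent, while the second group is fine.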

It turns out that about 5% of articles belong to incoherent groups.  The
largest such group is growing quite fast: it had 48,000 articles in
March 2008, and now it has over 76,000!  With over 3,000,000 links to
check, it has to be corrected semi-automatically.  There are tens of
thousands of other incoherent groups to fix, too.

Right now, you can find some really absurd connections using the
interlanguage links alone, like "en:December" to "en:City", or
"en:Alpine Ibex" to "en:Western culture".  The site I've created lets
you see a path connecting any two given articles, and suggests a course
of action.  The suggestions are the result of a heuristic and should be
taken with a grain of salt, but maybe you'll find them useful.
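
As a rough idea of how a path between two articles can be found, here is
a small Python sketch using breadth-first search over the links, treated
as undirected.  It is an assumption about one possible approach, not a
description of how the site works; the function name and the sample
links are invented, and the heuristic for suggestions is not covered.

```python
from collections import defaultdict, deque


def find_path(links, start, goal):
    """Return a shortest chain of articles from start to goal, or None."""
    neighbours = defaultdict(set)
    for a, b in links:
        # Links count in both directions for connectivity.
        neighbours[a].add(b)
        neighbours[b].add(a)

    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in sorted(neighbours[current]):
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return None


# Hypothetical sample data, invented for this illustration.
sample_links = [
    ("en:December", "de:Dezember"),
    ("de:Dezember", "fr:Décembre"),
    ("fr:Décembre", "en:City"),
    ("en:Moon", "pl:Księżyc"),
]

path = find_path(sample_links, "en:December", "en:City")
```

Each step in the returned path is one interlanguage link, so a reviewer
can inspect the chain and look for the link that joins unrelated topics.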

Regards,
Lukasz Bolikowski

PS. Last time, my replies showed up several days after I had posted them.
If I don't respond, it's probably because my reply is still held for
moderation.  Anyway, I guess
http://meta.wikimedia.org/wiki/Interwiki_synchronization
is the best place to discuss this matter.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkk5SeQACgkQqPt6S1UzhapbQgCeL4zKLmBH9Mp2uA1EFcniXcS/
i0wAn3J/cOERYhZgsaiwTQUXRm/y9+EB
=FIoB
-----END PGP SIGNATURE-----


