[WikiEN-l] Interwiki links analysis tool

Lukasz Bolikowski bolo at icm.edu.pl
Mon Mar 17 22:46:00 UTC 2008


Hi,

I've written a visual tool for analyzing the graph
of interlanguage links between all 256 editions of
Wikipedia.

Its main advantages, compared to bots, are:
* it analyzes a whole inconsistent component
at once, while bots tend to work "locally" (in
some neighborhood of an article);
* it offers a cool (IMHO) graph visualization;
* it gives concrete recommendations: remove a link,
split an article, merge articles, remove redirects.

To illustrate the advantage of "global" over "local"
analysis of a component: the largest connected
component in the graph contains over 48,000
articles, mixing over 2,500 different subjects.
Some of the sources of semantic drift in such
components are not visible "locally".
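
A minimal sketch of what whole-component analysis can
look like. It assumes a component counts as inconsistent
when it contains two articles from the same language
edition; that criterion, the function names and the toy
links are assumptions for illustration, not the tool's
actual code or data.

```python
from collections import defaultdict


def find(parent, x):
    """Return the representative of x's set (union-find with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def components(links):
    """Group articles into connected components, treating links as undirected."""
    parent = {}
    for a, b in links:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
    groups = defaultdict(set)
    for node in parent:
        groups[find(parent, node)].add(node)
    return list(groups.values())


def is_inconsistent(component):
    """Assumed criterion: some language edition appears more than once."""
    langs = [lang for lang, _title in component]
    return len(langs) != len(set(langs))


# Hypothetical toy data: each article is a (language, title) pair.
links = [
    (("en", "Mercury (planet)"), ("pl", "Merkury")),
    (("pl", "Merkury"), ("de", "Merkur (Planet)")),
    # A single stray link that merges two subjects into one component:
    (("de", "Merkur (Planet)"), ("en", "Mercury (element)")),
    (("en", "Venus"), ("pl", "Wenus")),
]

inconsistent = [c for c in components(links) if is_inconsistent(c)]
for comp in inconsistent:
    print(sorted(comp))
```

Each link in the toy data looks plausible next to its
neighbours; the conflict between the two "en" articles
only shows up once the whole component is collected.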

Main disadvantages:
* it works on preprocessed dumps instead of "live"
Wikipedia, so the recommendations may be outdated;
* (for the moment) it does not recognize some of the
redirects, due to the poor quality of redirect dumps.
Apparently I'm not the only one affected by the
problem, and the guys at wikitech-l are aware
of the issue;
* it requires Java 6 and eats a lot of resources
(512 MB of memory seems to be enough even for the
largest case);
* it doesn't change anything in Wikipedia (it only
points to the possible sources of problems).

The tool is far from complete; "prototype"
would be a more appropriate name (its
original purpose was to help me evaluate some
ideas for my PhD).  Please try it and send me
your feedback; I'd like to make it more useful
for the community.

You can find the tool here:
   http://wikitools.icm.edu.pl/

Regards,
Bolo1729



