On Sat, Dec 6, 2008 at 9:22 AM, Eugene van der Pijll <eugene@vanderpijll.nl> wrote:
Lukasz Bolikowski wrote:
A short introduction: let's say that two articles are connected if there is an interlanguage link from one to the other in at least one direction. Next, let's say that if A and B are connected and B and C are connected, then A and C are connected too.
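To make the rule concrete (a sketch only, not Lukasz's actual tool): treating each interlanguage link as an undirected edge and chaining A-B and B-C into A-C amounts to taking connected components of the link graph, here with union-find. A component that holds two pages from the same wiki is the kind of conflict under discussion. The page titles are illustrative, and `xx:Example` is a hypothetical page standing in for whatever bad link causes the merge.

```python
from collections import defaultdict


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def components(links):
    """Group pages like "en:December" into connected components.

    A link in either direction connects the pair; chaining
    (A-B and B-C gives A-C) falls out of taking components.
    """
    parent = {}
    for a, b in links:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
    groups = defaultdict(set)
    for page in parent:
        groups[find(parent, page)].add(page)
    return list(groups.values())


def conflicts(links):
    """Components containing two or more pages from the same wiki."""
    result = []
    for comp in components(links):
        langs = [page.split(":", 1)[0] for page in comp]
        if len(langs) != len(set(langs)):
            result.append(comp)
    return result


# Illustrative data; "xx:Example" is a hypothetical page.
links = [
    ("en:December", "fr:Décembre"),
    ("fr:Décembre", "xx:Example"),
    ("xx:Example", "en:City"),
    ("de:Stadt", "en:City"),
    ("en:Alpine Ibex", "de:Alpensteinbock"),
]

for comp in conflicts(links):
    print(sorted(comp))
```

Here the chain through `xx:Example` puts en:December and en:City in one component, which is flagged, while the ibex pair forms a separate, clean component.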
Hold it right there. That assumes that every Wikipedia divides its content into pages in the same way. Since content policy is made at the level of individual projects, this assumption is incorrect.
With over 3,000,000 links to check, it has to be corrected semi-automatically.
It doesn't have to be corrected at all, since you've not actually demonstrated that it is incorrect now.
"en:December" to "en:City", or "en:Alpine Ibex" to "en:Western culture"
Correct? Really?
I think he's demonstrated that problems exist, but not yet that there are many. So long as we support interlinking Wikipedias of dramatically different sizes, there will be some drift, because the larger projects will make finer subdivisions.
Lukasz's analysis depends on linking being commutative, but this can only be true when there is only one kind of link (x covers the same subject as y, no more and no less). If we limited ourselves to that, it would preclude the "x is covered by broader article y" link, which is absolutely necessary if we want to produce useful interwiki links from bigger projects to smaller ones.
(And I'd argue that interwiki links from big projects such as en, es, fr, and de to smaller projects like hi are important for getting native speakers of those smaller projects' languages to discover that the projects exist on the predominantly English-speaking internet.)
It would be interesting to re-run the analysis including only links among the largest few Wikipedias, and to resolve those first: those really should be much closer to the ideal "x == y" behavior.
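A rough sketch of such a re-run, assuming pages are named "language:Title" and using breadth-first search for the components: a link is kept only when both ends belong to a chosen set of large wikis. The set {en, de, fr, es} is an arbitrary assumption for illustration, and `hi:Example` is a hypothetical page standing in for a "covered by broader article" link from a smaller project.

```python
from collections import defaultdict, deque


def lang(page):
    # "en:December" -> "en"
    return page.split(":", 1)[0]


def conflicting_components(links, allowed_langs):
    """BFS components of the undirected link graph, restricted to links
    whose endpoints are both in allowed_langs; returns only components
    with two or more pages from the same wiki."""
    adj = defaultdict(set)
    for a, b in links:
        if lang(a) in allowed_langs and lang(b) in allowed_langs:
            adj[a].add(b)
            adj[b].add(a)
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            page = queue.popleft()
            comp.append(page)
            for nxt in adj[page]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        langs = [lang(p) for p in comp]
        if len(langs) != len(set(langs)):
            result.append(sorted(comp))
    return result


# Illustrative data; "hi:Example" is a hypothetical broader article.
links = [
    ("en:December", "de:Dezember"),
    ("de:Dezember", "fr:Décembre"),
    ("en:December", "hi:Example"),
    ("hi:Example", "en:City"),
]
BIG = {"en", "de", "fr", "es"}  # assumed "largest few" wikis
all_langs = {lang(p) for link in links for p in link}

print(conflicting_components(links, all_langs))
print(conflicting_components(links, BIG))
```

With every language included, the broader-article link through hi merges en:December and en:City into one flagged component; restricted to the large wikis, the remaining component has one page per wiki and nothing is flagged.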