Gregory Maxwell wrote:
It would be interesting to re-run the analysis including only linkages among the largest few Wikipedias and resolving those first: those really should be much closer to the ideal "x==y" behavior.
Hi, I've rerun the analysis as you proposed. I've taken all the articles from the 10 largest editions (de, en, es, fr, it, ja, nl, pl, pt, ru) and the interlanguage links between them. I was looking for incoherent components, as defined in my previous posts.
In this setting, there are 44245 incoherent components containing, in total, 436529 articles from the 10 editions. This shows that the problem is not limited to links to or from small wikis. Also, if you try the engine at http://wikitools.icm.edu.pl/, you'll see that the differences between the ontologies of the large and the small wikis are not the only issue (as Eugene suggested).
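For readers who want to reproduce this kind of count, here is a minimal sketch of the check. The exact definition of "incoherent" is in my previous posts and is not repeated here, so the code ASSUMES that a component is incoherent when it contains two or more articles from the same edition, and that links are treated as undirected. The function name, the (language, title) tuple format and the toy links are hypothetical illustrations, not the actual tool or data.

```python
from collections import defaultdict

def incoherent_components(links):
    """Return connected components holding >1 article from one edition.

    links: iterable of ((lang, title), (lang, title)) pairs.
    ASSUMPTION: links are undirected, and "incoherent" means that some
    edition contributes at least two articles to the component.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Merge the two endpoints of every link into one component.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Group the articles by the representative of their component.
    components = defaultdict(set)
    for node in list(parent):
        components[find(node)].add(node)

    # Keep only components in which some language appears twice.
    result = []
    for members in components.values():
        langs = [lang for lang, _ in members]
        if len(langs) != len(set(langs)):
            result.append(members)
    return result

# Hypothetical toy links, made up for illustration only.
toy_links = [
    (("en", "Company"), ("de", "Unternehmen")),
    (("de", "Unternehmen"), ("en", "Business")),
    (("en", "Guild"), ("pl", "Cech")),
]
bad = incoherent_components(toy_links)
```

On the toy data, the first two links put two English articles into one component, so that component is reported, while the Guild/Cech pair is not.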
Let me rephrase my concerns: on the one hand, the policies state that interlanguage links represent equivalence. Meta says they connect "corresponding" articles; the English edition says they connect articles "on the same subject". There are third-party projects which assume this (I've given two examples before), not to mention an army of bots.
On the other hand, editors don't respect that strict interpretation since they want to show (valuable) relations between non-equivalent articles. Without seeing the "big picture", any such inexact link seems OK: what could possibly go wrong? And the global view does not seem to be commonly known...
I don't have a ready solution, although, as I've written before, we could take a closer look at the way OmegaWiki is dealing with the issue (in my perception, the project's existence is motivated solely by the existence of the issue in question), and the potential offered by the SemanticMediaWiki extension.
My main goal is to convince the community (or be convinced otherwise) that this is a serious, growing problem which requires attention, and to stimulate a discussion which might lead to a reasonable solution.
Regards, Łukasz
PS. An example: the following English articles are mutually accessible using only the interlanguage links between the top 10 editions (assuming that a link A -> B also makes A accessible from B, i.e. links are treated as undirected, which doesn't necessarily match users' experience, but bots and harvesters "see" it):

Administration
Administration (business)
Administrator
Aktiebolag
Aktiengesellschaft
Aktieselskab
Apostolic Administrator
Besloten Vennootschap
Brother (disambiguation)
Brotherhood
Brotherhood (album)
Business
Businessperson
Compagnons du Tour de France
Companies law
Company
Company (disambiguation)
Contract
Corporate law
Corporation
Corporation (university)
Entrepreneur
Entrepreneurship
Fraternities and sororities
Fraternity
Fraternity (disambiguation)
General partnership
German Student Corps
Gesellschaft mit beschränkter Haftung
Government-owned corporation
Guild
Hermano
Hermano (band)
Incorporation (business)
Joint stock company
Journeyman
Junior Chamber International
Kabushiki kaisha
Legal name (business)
Limited company
Limited liability company
List of general fraternities
Management
Management science
Maszoperia
Naamloze Vennootschap
Public company
Public limited company
S.A. (corporation)
Sibling
Sister (disambiguation)
Société à responsabilité limitée
Société par actions simplifiée
Society
Society (disambiguation)
Sole proprietorship
Studentenverbindung
Student society
Trade name
Types of business entity
Yugen kaisha
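The accessibility notion used in this PS (a link A -> B can also be followed back from B to A) can be sketched as a breadth-first search over the links. This is only an illustration under that assumption: the function name, the (language, title) tuple format and the two toy links are hypothetical, not taken from the real data set.

```python
from collections import defaultdict, deque

def reachable_titles(links, start, lang):
    """List titles in edition `lang` reachable from `start`.

    links: iterable of directed ((lang, title), (lang, title)) pairs.
    ASSUMPTION: every link can also be followed backwards, which is how
    a bot or harvester merging link sets effectively "sees" the graph.
    """
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)  # the reverse direction, per the assumption

    # Standard breadth-first search from the starting article.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    return sorted(title for node_lang, title in seen if node_lang == lang)

# Hypothetical toy links: two English articles pointing at one German one.
toy_links = [
    (("en", "Company"), ("de", "Gesellschaft")),
    (("en", "Society"), ("de", "Gesellschaft")),
]
titles = reachable_titles(toy_links, ("en", "Company"), "en")
```

In the toy case, "Society" is reached from "Company" only by following the second link backwards, which is exactly the step a reader clicking through the pages would not be able to take.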