On Tue, Jan 6, 2009 at 10:52 AM, Lars Aronsson lars@aronsson.se wrote:
So, my question:
Has anybody mapped exactly how many such interwiki conflicts we have? Or how many interwiki sets do we have without conflicts? Could/should someone make a list of current conflicts and try to rank them by importance, so we can get started in fixing them?
As you already noted, pywikipediabot, when run autonomously, adds a remark on each such conflict, so that would be an easy way to harvest a large number of them. There are many of them: although many people work on interwiki links, they usually either just add links or run autonomous bots. Correcting incorrect links happens much less often.
Resolving them is easy in some cases, but in many it is not. Different Wikipedias often have different ways of 'subdividing' the 'universe' of possible meanings. This means that the two assumptions the framework is based on, that 'interwiki is an equivalence relation' and that 'any page can interwiki to only one page in a single language', are often not met, or are met only in artificial ways.
Examples of problems are:
* Closely connected subjects (for example, a biological order and the only family in it, a municipality and its main town of the same name, a fruit tree and its fruit, a computer game and the series of which it is the first game, two scientific terms that are each other's opposites) have two pages on some Wikipedias and one page on others, and that one page is sometimes more about one subject, sometimes more about the other, and sometimes really about both
* Words that denote a general term in one language being used for a more specific one in another language, for example [[en:Autobahn]] being about highways in Germany while [[de:Autobahn]] is about highways in general, or the name of a Japanese traditional dagger meaning that specific type of dagger in Western languages but 'dagger' in general in Japanese, or countries using their own mythical small creature as the best translation of 'dwarf', while the page elsewhere is about dwarves in one specific mythology
* Slight shifts of meaning from one language to the next, causing a sequence of 'closest connections' that leads to another word in the same language
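To make the 'equivalence relation' point concrete, here is a rough sketch (not how pywikipediabot does it) of how harvested links could be checked for conflicts: treat every interwiki link as undirected, group the pages into connected sets, and report any set that contains two pages in the same language. The function name `find_conflicts` and the sample titles are made up for illustration; the sample links are not real harvested data.

```python
from collections import defaultdict


def find_conflicts(links):
    """Return one dict per conflicting interwiki set.

    links: iterable of ((lang, title), (lang, title)) pairs.
    Each returned dict maps a language to the sorted titles that
    ended up in the same set for that language (two or more).
    """
    parent = {}

    def find(page):
        # Union-find lookup with path halving.
        parent.setdefault(page, page)
        while parent[page] != page:
            parent[page] = parent[parent[page]]
            page = parent[page]
        return page

    # Treat every link as symmetric and merge the two pages' sets.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Group the titles of each set by language.
    sets = defaultdict(lambda: defaultdict(set))
    for page in list(parent):
        lang, title = page
        sets[find(page)][lang].add(title)

    # A set is in conflict if some language has more than one page in it.
    conflicts = []
    for by_lang in sets.values():
        clashes = {lang: sorted(titles)
                   for lang, titles in by_lang.items() if len(titles) > 1}
        if clashes:
            conflicts.append(clashes)
    return conflicts


# Illustrative (invented) chain of 'closest connections' that drifts
# back to a second English page.
sample_links = [
    (("en", "Autobahn"), ("de", "Autobahn")),
    (("de", "Autobahn"), ("nl", "Autosnelweg")),
    (("nl", "Autosnelweg"), ("en", "Controlled-access highway")),
]
conflicts = find_conflicts(sample_links)
print(conflicts)
```

A list like this only says where the two assumptions break down. Deciding which page is the 'right' target, or whether both are, still needs a human who can read the languages involved.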