I just recently started to play with interwiki.py (Pywikipedia bot framework) for propagating interwiki links. My interest comes from organizing the category tree, so I'm focusing on interwiki links between categories. Interwiki bots normally run in autonomous mode, but this means they give up on complicated cases.
If I run this script under manual supervision, without the "-autonomous" option, it stops and asks me how to resolve each conflict. This happens ever so often. I have now (manually) sorted out the interwiki links between all languages of Category:Knowledge, which was intertwined with Category:Science, and Category:Austrian writers which was mixed up with Category:Austrian literature. Such mistakes easily happen, of course. Who can spot errors in all these languages?
Many languages had interwiki links from their category for Austrian writers to the Japanese category for Austrian literature. I'm not sure exactly when or where this error originated. But on June 19, 2007, the English and Spanish Wikipedias' interwiki links to Japanese changed from Austrian novelists to Austrian literature, i.e. from one error to another. Ten days later, this link was copied to the Dutch Wikipedia. The error was corrected on en.wikipedia on October 1, 2007, but remained on other languages. Yes, that's 15 months ago.
The circular interwiki link structure from en:Category:Austrian writers to es:Categoría:Escritores de Austria to ja:... and back to en:Category:Austrian literature is the kind of conflict that makes interwiki.py give up when it runs in autonomous mode.
Thus, corrections (as on October 1) do not propagate. Instead a report about the conflict is written to a logfile, but apparently nobody has fixed this problem in the last 15 months. This conflict also blocked new interwiki links from propagating.
After I cleared up the mess, 21 new interwiki links were added to the category on the Russian Wikipedia (one where I have a bot flag). That means 21 languages of Wikipedia had created categories (or announced them to the interwiki system) for Austrian writers in the last 15 months, and they all added their interwiki link to the English Wikipedia. But these additions did not propagate because of the conflict.
So, my question:
Has anybody mapped exactly how many such interwiki conflicts we have? Or how many interwiki sets do we have without conflicts? Could/should someone make a list of current conflicts and try to rank them by importance, so we can get started in fixing them?
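One way such a count could be approached is sketched below. It treats interwiki links as undirected edges, groups pages into connected clusters with a union-find structure, and flags every cluster that contains more than one page in the same language. This is only an illustration and not how interwiki.py works internally; the function name `find_conflicts`, the input format, and the parenthesized placeholder titles (for the elided ja: category and for an sv: category) are assumptions.

```python
from collections import defaultdict


def find_conflicts(links):
    """Return one dict per conflicting cluster: {lang: [titles, ...]}.

    `links` is an iterable of ((lang, title), (lang, title)) pairs.
    A cluster conflicts if it holds two or more pages in one language.
    """
    parent = {}

    def find(page):
        # Union-find lookup with path halving.
        parent.setdefault(page, page)
        while parent[page] != page:
            parent[page] = parent[parent[page]]
            page = parent[page]
        return page

    # Merge the clusters of every linked pair of pages.
    for source, target in links:
        root_s, root_t = find(source), find(target)
        if root_s != root_t:
            parent[root_s] = root_t

    # Collect the pages of each cluster under its root.
    clusters = defaultdict(set)
    for page in list(parent):
        clusters[find(page)].add(page)

    conflicts = []
    for pages in clusters.values():
        by_lang = defaultdict(list)
        for lang, title in pages:
            by_lang[lang].append(title)
        clashes = {lang: sorted(t) for lang, t in by_lang.items() if len(t) > 1}
        if clashes:
            conflicts.append(clashes)
    return conflicts


# Parenthesized titles are placeholders, not real page names.
links = [
    (("en", "Category:Austrian writers"), ("es", "Categoría:Escritores de Austria")),
    (("es", "Categoría:Escritores de Austria"), ("ja", "(Austrian literature)")),
    (("ja", "(Austrian literature)"), ("en", "Category:Austrian literature")),
    (("en", "Category:Knowledge"), ("sv", "(Knowledge)")),
]
conflicts = find_conflicts(links)
print(conflicts)
```

The number of pages in a conflicting cluster could serve as a rough first ranking of importance, since a large cluster blocks propagation for many languages at once.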
In the longer term, we need to redesign the interwiki links into a centralized system, that can be maintained. I think the way to do this is to use Wikimedia Commons. Instead of copying all the interwiki links to every language of Wikipedia, it should be enough to add {{commons|Category:Writers from Austria}}, and the rest should happen automatically.
2009/1/6 Lars Aronsson lars@aronsson.se:
(Lars' interesting insights about Interwiki conflicts...) ...
Have you seen:
* http://meta.wikimedia.org/wiki/A_newer_look_at_the_interlanguage_link
* http://meta.wikimedia.org/wiki/Interwiki_synchronization
The last one is my own creation, which has surprisingly caught on. Unfortunately, I haven't had much time to maintain it lately, and I'd be very glad if someone could help with that.
On Tue, Jan 6, 2009 at 10:52 AM, Lars Aronsson lars@aronsson.se wrote:
In the longer term, we need to redesign the interwiki links into a centralized system, that can be maintained. I think the way to do this is to use Wikimedia Commons. Instead of copying all the interwiki links to every language of Wikipedia, it should be enough to add {{commons|Category:Writers from Austria}}, and the rest should happen automatically.
Commons has enough to do with keeping metafiles up to date; it would be overwhelmed by also having to maintain IW links. I'd propose a new wiki, to which editors have to apply to get write access, so that vandalism in this critical part is prevented. In the meantime, one could theoretically launch a huge query on the toolserver that scans for conflicts. I'm not sure whether this would be feasible performance-wise, though.
Marco
2009/1/6 Marco Schuster marco@harddisk.is-a-geek.org:
On Tue, Jan 6, 2009 at 10:52 AM, Lars Aronsson lars@aronsson.se wrote:
In the longer term, we need to redesign the interwiki links into a centralized system, that can be maintained. I think the way to do this is to use Wikimedia Commons. Instead of copying all the interwiki links to every language of Wikipedia, it should be enough to add {{commons|Category:Writers from Austria}}, and the rest should happen automatically.
Commons has enough to do with keeping metafiles up to date; it would be overwhelmed by also having to maintain IW links. I'd propose a new wiki, to which editors have to apply to get write access, so that vandalism in this critical part is prevented.
The technology already exists:
* http://meta.wikimedia.org/wiki/A_newer_look_at_the_interlanguage_link
But it is not enabled yet:
* https://bugzilla.wikimedia.org/show_bug.cgi?id=15607
Marco Schuster wrote:
Commons has enough to do with keeping metafiles up to date; it would be overwhelmed by also having to maintain IW links. I'd propose a new wiki, to which editors have to apply to get write access, so that vandalism in this critical part is prevented.
I think the system needs to work like the categories. Very few people need to edit the category page, so we don't really have to worry about who can access that central storage. If I write an article about a president, I copy a category link from another president biography. I don't have to update the category page and I don't have to update other articles in the same category.
From a biography of the same president in another language, I can copy a list of interwiki links. But instead I should just copy a single, global interwiki pointer. As far as I understand, this is how the interlanguage extension should work.
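The idea of a single global pointer can be sketched as a small data structure: each page stores only a group identifier, and the per-language titles live in one central table. The sketch below is not the interlanguage extension's actual design; `CENTRAL_GROUPS`, `register`, `interwiki_links` and the parenthesized Dutch title are hypothetical names chosen for illustration.

```python
# Hypothetical central store: one record per subject, keyed by a global id.
CENTRAL_GROUPS = {
    "Category:Writers from Austria": {
        "en": "Category:Austrian writers",
        "es": "Categoría:Escritores de Austria",
    },
}


def register(group_id, lang, title, groups=CENTRAL_GROUPS):
    """Attach a page to a group; refuse a second page in the same language."""
    members = groups.setdefault(group_id, {})
    existing = members.get(lang)
    if existing is not None and existing != title:
        raise ValueError(f"{lang} already has {existing!r} in group {group_id!r}")
    members[lang] = title


def interwiki_links(group_id, own_lang, groups=CENTRAL_GROUPS):
    """Return 'lang:Title' links for all group members except own_lang."""
    members = groups.get(group_id, {})
    return [f"{lang}:{title}" for lang, title in sorted(members.items())
            if lang != own_lang]


# A new language joins by registering once; every other page sees it.
register("Category:Writers from Austria", "nl", "(Dutch category)")
links_for_en = interwiki_links("Category:Writers from Austria", "en")
print(links_for_en)
```

Because a group holds at most one title per language, the kind of conflict described earlier in the thread would surface as an error at registration time and could not silently propagate.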
What stops us from trying that out? Could it be introduced in small steps, or is it a big scary change?
On Wed, Jan 7, 2009 at 6:20 AM, Lars Aronsson lars@aronsson.se wrote:
From a biography of the same president in another language, I can copy a list of interwiki links. But instead I should just copy a single, global interwiki pointer. As far as I understand, this is how the interlanguage extension should work.
What stops us from trying that out? Could it be introduced in small steps, or is it a big scary change?
I think it would be a big change. At the moment we have a single database per wiki, and no actual connection between the various databases. As far as I know the only exception to that is the images from Commons, but your idea goes further than that, because I cannot change a picture on Commons by editing a wiki page elsewhere. This would be both a large conceptual change and a technical issue (suppose the 'interwiki database' is down for writing, what do we do when someone tries to edit a page?)
Apart from that there is the issue of naming of pages on this central depository. It seems you'd have to have an interwiki consensus about that... And then there's initial population - what do we do with the currently existing problems? I guess that's a point that could be done in small steps though (allow the 'old' and the 'new' system to exist in parallel for some time). What do you do with new problems? That is, what if the same subject is linked from 2 pages in one language? And what if A is of the opinion that a group of pages should all be the same 'interwiki group' and B that they should be two? Will we be getting cross-wiki edit wars?
There's one problem with these interwiki links that has not yet been mentioned in this thread: quite often, when I have finally sorted out two subjects and kept only those interwiki links that point to the same subject, someone comes along and tells me that I should not be removing correct interwiki links.
2009/1/7 Andre Engels andreengels@gmail.com:
As far as I know the only exception to that is the images from Commons,
You missed CentralAuth. :)
(suppose the 'interwiki database' is down for writing, what do we do when someone tries to edit a page?)
Central editing of those would solve that.
Will we be getting cross-wiki edit wars?
Probably, and the people may not even be able to talk about it, due to the language differences. It'll be a pain to sort out.
On Tue, Jan 6, 2009 at 10:52 AM, Lars Aronsson lars@aronsson.se wrote:
So, my question:
Has anybody mapped exactly how many such interwiki conflicts we have? Or how many interwiki sets do we have without conflicts? Could/should someone make a list of current conflicts and try to rank them by importance, so we can get started in fixing them?
As you already noted, pywikipediabot, when run autonomously, adds a remark for each such conflict, so that would be an easy way to harvest a large number of them. There are many of them: although many people work on interwiki links, they usually either just add links or run autonomous bots. Correcting incorrect links happens much less often.
Resolving them is easy in some cases, but in many cases it is not. Different Wikipedias often have different ways of 'subdividing' the 'universe' of possible meanings. This means that the two assumptions the framework is based on, that 'interwiki is an equivalence relation' and that 'any page can interwiki to only one page in a single language', are often not met, or are met only in artificial ways.
Examples of problems are:
* Closely connected subjects (for example, a biological order and the only family in it, a municipality and its main town by the same name, a fruit tree and its fruit, a computer game and the series of which it is the first game, two scientific terms which are each other's opposite) have two pages on some Wikipedias, one page on other, and that one page is sometimes more one subject, sometimes more the other, and sometimes really about both
* Words that mean a general term in one language being used for a more specific one in another language, for example [[en:Autobahn]] being about highways in Germany, [[de:Autobahn]] about highways in general, or the name of a Japanese traditional dagger being used to mean that specific type of dagger in western language, but more generally 'dagger' in Japanese, or countries using their own mythical small creature as the best translation of 'dwarf', but being about dwarves in a specific mythology elsewhere
* Slight shifts of meaning from one language to the other causing a sequence of 'closest connections' leading to another word in the same language
I might have sounded too negative by including all those problems. I think it would be good to do a search for such conflicts, and I know that several of them CAN be easily corrected. But one should not close one's eyes to the fact that there are clear problems.
Once again, I'd like to point the interested reader to my own take on the issue of interlanguage links: http://brightbyte.de/page/Ideas_for_a_smarter_inter-language_link_system.
I still believe that that would be better than a central place for managing interwikis. In a nutshell: edit locally, like now, but compare globally, and show also *incoming* interwiki links.
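The "show incoming links" part of this can be illustrated with a few lines: given each page's outgoing interwiki links, invert the mapping and report the pages that link in without being linked back. This is only a sketch and not code from the linked proposal; the function names and the generic page labels are assumptions.

```python
from collections import defaultdict


def incoming_links(outgoing):
    """Invert {page: set of targets} into {target: set of sources}."""
    incoming = defaultdict(set)
    for source, targets in outgoing.items():
        for target in targets:
            incoming[target].add(source)
    return incoming


def one_way_incoming(outgoing, page):
    """Pages that link to `page` although `page` does not link back."""
    return sorted(incoming_links(outgoing)[page] - set(outgoing.get(page, ())))


# Hypothetical pages: fr:A points at en:A, but en:A does not point back.
outgoing = {
    "en:A": {"de:A"},
    "de:A": {"en:A"},
    "fr:A": {"en:A"},
}
unreciprocated = one_way_incoming(outgoing, "en:A")
print(unreciprocated)
```

Showing such a list on the local page would let an editor decide locally whether the incoming link is a missing connection or an error on the other wiki.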
-- daniel
Andre Engels wrote:
Examples of problems are:
- Closely connected subjects (for example, a biological order and the only family in it, a municipality and its main town by the same name, a fruit tree and its fruit, a computer game and
Such problems certainly exist. But they are not our worst problem at the moment. Today I sorted out "Calvin Klein" (the company) and "Calvin Klein (fashion designer)" (the person). They now form two separate interwiki clusters, without conflicts. But sorting this out was harder work than it needs to be. With some improved tools, we can make this work a little easier.
The Category:Politicians in many languages has an interwiki link to the Armenian (hy:) category for political scientists. I fixed the English Wikipedia (manually) and the North European languages (by bot), but some 50 languages remain to be edited.
If interwiki.py supported SUL and if I had a truly global bot flag, I could do it. But I'm reluctant to edit 50 languages manually, especially since there are hundreds of such conflicts.
One problem here is that interwiki.py only adds links. Both correct ones and errors are quickly propagated. But corrections are not propagated, because the conflicts make it give up. An easy way to remove that hy: interwiki link would be a great help.
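As a rough idea of what such a removal helper could look like on the text level, the sketch below strips interwiki links for one language code from a page's wikitext with a regular expression. It is not part of interwiki.py; `remove_interwiki` is a hypothetical name, the parenthesized titles are placeholders, and fetching and saving pages is left out entirely.

```python
import re


def remove_interwiki(wikitext, lang, title=None):
    """Remove [[lang:...]] interwiki links from wikitext.

    If `title` is given, only links to exactly that title are removed.
    Inline links with a leading colon ([[:lang:...]]) are left alone.
    """
    target = re.escape(title) if title is not None else r"[^\]\|]+"
    pattern = re.compile(
        r"\[\[\s*" + re.escape(lang) + r"\s*:\s*" + target + r"\s*\]\][ \t]*\n?"
    )
    return pattern.sub("", wikitext)


# Placeholder titles; the real hy: and de: category names are not assumed.
page_text = (
    "Category description.\n"
    "\n"
    "[[en:Category:Politicians]]\n"
    "[[hy:(political scientists)]]\n"
    "[[de:(politicians)]]\n"
)
cleaned = remove_interwiki(page_text, "hy")
print(cleaned)
```

Passing a `title` restricts the removal to one known-bad target, which would be the safer choice when other pages legitimately link to a different page in the same language.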
On Wed, Jan 7, 2009 at 6:08 AM, Lars Aronsson lars@aronsson.se wrote:
The Category:Politicians in many languages has an interwiki link to the Armenian (hy:) category for political scientists. I fixed the English Wikipedia (manually) and the North European languages (by bot), but some 50 languages remain to be edited.
If interwiki.py supported SUL and if I had a truly global bot flag, I could do it. But I'm reluctant to edit 50 languages manually, especially since there are hundreds of such conflicts.
You can do it by bot as things are. I myself use Robbot on all languages; the only thing that could be improved regarding SUL is that I have to type in its password once for each language rather than one time for all, and as regards bot flags - it seems it has one on every language where it needs it.
One problem here is that interwiki.py only adds links. Both correct ones and errors are quickly propagated. But corrections are not propagated, because the conflicts make it give up. An easy way to remove that hy: interwiki link would be a great help.
Well, as said, I use Robbot on all languages, the code I use for that is:
from family import Family
for lang in Family().alphabetic:
    usernames['wikipedia'][lang] = 'Robbot'
This gives me 2 warnings every time I start the bot, but I just ignore them. With such a setting, whenever I get to a conflict of which I know the resolution, I start a separate interwiki.py with the necessary -ignore or -neverlink and -force, and the bot will remove at least that problem everywhere it exists.
Hi!
See http://ru.wikipedia.org/wiki/User:VolkovBot/conflicts for a list of conflicts. VolkovBot is pretty active, so the list should be more or less comprehensive.
Eugene.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
2009/1/6 Lars Aronsson lars@aronsson.se:
Has anybody mapped exactly how many such interwiki conflicts we have? Or how many interwiki sets do we have without conflicts? Could/should someone make a list of current conflicts and try to rank them by importance, so we can get started in fixing them?
Someone actually did this, it was discussed a few months ago on wikien-l. (I'm writing this in my lunch hour, so haven't time to track down the thread in the archive right now, sorry.)
But basically: treating interwiki links as a 1-1 relationship even from one wiki to another is horribly unreliable, and assuming you can go from wiki A to wiki B to wiki C with interwiki links is just not doable reliably with robots.
It's not quite as horrible as trying to make ontological sense of the category tree (where the only relationship that can be presumed is "has something to do with" - one of the reasons that making cats work like tags with a good complex Boolean query frontend would be so useful), but it's in the same realms of hair-tearing horror for the same reasons, i.e. people are a problem.
- d.
David Gerard schrieb:
But basically: treating interwiki links as a 1-1 relationship even from one wiki to another is horribly unreliable, and assuming you can go from wiki A to wiki B to wiki C with interwiki links is just not doable reliably with robots.
If you only look at language links that go *both* ways, you get a decent 1-to-1 mapping. I used this as part of my thesis, and wrote a short paper about it: http://brightbyte.de/repos/papers/2008/LangLinks-paper.pdf.
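The filtering step described here can be sketched in a few lines: keep a pair of pages only if each links to the other. This illustrates the general idea and is not taken from the paper; `mutual_links` and the generic page labels are assumptions.

```python
def mutual_links(links):
    """Return the set of page pairs that link to each other.

    `links` is an iterable of (source, target) pairs; each result is a
    frozenset of two pages, so direction and order no longer matter.
    """
    link_set = set(links)
    return {
        frozenset((a, b))
        for a, b in link_set
        if a != b and (b, a) in link_set
    }


# Hypothetical pages: en:A and de:A link both ways, fr:A only one way.
links = [
    ("en:A", "de:A"),
    ("de:A", "en:A"),
    ("fr:A", "en:A"),
]
pairs = mutual_links(links)
print(pairs)
```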
I can also recommend the studies of Rainer Hammwöhner about Wikipedia, especially "Interlingual Aspects of Wikipedia’s Quality" http://mitiq.mit.edu/iciq/PDF/INTERLINGUAL%20ASPECTS%20OF%20WIKIPEDIAS%20QUALITY.pdf, which studies the quality of language links and the category system, among other things.
-- daniel
Amir E. Aharoni wrote:
The last one is my own creation, which has surprisingly caught on. Unfortunately, I haven't had much time to maintain it lately, and I'd be very glad if someone could help with that.
I think it's popular because it is easy to collaborate and do something with the problem. As opposed to discussing 'interwikis are broken', on which all of us agree.
However, I'd still improve the interface. Why edit a page to change the group, instead of choosing the meaning with one click? I'd move it to the toolserver with an interface to view the interwiki groups, split and define them, move interwikis between groups... All of that then backed by some bot.
Moving to the Wikipedia scenario, the interwikis could be shown in a different state, "conflicted". Thus, during normal wiki interaction, Wikipedians would notice the conflict *on their wiki*, be led to the interwiki management tool (through a link in p-lang) and help to fix it (and I say help instead of fix because this work has to be collaborative).
The briefs for the article groups also could and should be reused for something else (WiktionaryZ, simplewiki, yahoo abstracts...).