Hello to both the wikitech and pywikipedia lists -- please keep both informed when replying. Thanks.
A few days ago, we - the pywikipedia developers - received alarming reports of interwiki bots removing content from pages. This does not seem to happen often, and we have not been able to reproduce the conditions in which this happens.
However, the common denominator is that it seems to happen only on the Wikipedias running MediaWiki 1.18. As such, I think this topic might be relevant for wikitech-l, too. In addition, no one on the pywikipedia team has a clear idea of why this is happening, so we would appreciate any ideas.
1. What happens?

Essentially, the interwiki bot does its job: it retrieves the graph and determines the correct interwiki links. It should then add these to the page, but instead /only/ the interwiki links are stored. For example:

http://nl.wikipedia.org/w/index.php?title=Blankenbach&diff=next&oldi...
http://eo.wikipedia.org/w/index.php?title=Anton%C3%ADn_Kl%C3%A1%C5%A1tersk%C...
http://simple.wikipedia.org/w/index.php?title=Mettau%2C_Switzerland&acti...
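To illustrate the failure mode: a minimal sketch of the rewrite step such a bot performs. This is NOT the actual interwiki.py code; the regex, function name and call shape are assumptions for illustration only.

```python
import re

# Hypothetical sketch of an interwiki bot's rewrite step (not actual
# interwiki.py code): strip the old interwiki links from the wikitext,
# then append the freshly computed set.
IW_RE = re.compile(r'\[\[[a-z-]{2,12}:[^\]]+\]\]\n?')

def rewrite_page(page_text, new_links):
    """Remove old interwiki links and append the new ones."""
    body = IW_RE.sub('', page_text).rstrip()
    links = '\n'.join('[[%s:%s]]' % (lang, title)
                      for lang, title in sorted(new_links.items()))
    return (body + '\n\n' if body else '') + links + '\n'

# If page_text arrives empty -- e.g. a fetch that failed without raising
# an exception -- the saved result contains only interwiki links, which
# matches the damage visible in the diffs above.
print(rewrite_page('', {'de': 'Blankenbach', 'nl': 'Blankenbach'}))
# → [[de:Blankenbach]]
#   [[nl:Blankenbach]]
```

The point of the sketch: nothing in this code path distinguishes "the page is genuinely empty" from "the fetch silently returned nothing", which is consistent with the broken-error-handling hypothesis below.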
2. Why does this happen?

This is unclear. On the one hand, interwiki.py is somewhat black magic: none of the current developers intimately knows its workings. On the other hand, the bug is not reproducible: running it on the exact same page with the exact same page text does not result in a cleared page. It could very well be something like broken network error handling - but mainly, we have no idea. Did anything change in Special:Export (which is still used in interwiki.py) or the API which might cause something like this? I couldn't find anything in the release notes.
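Whatever the root cause turns out to be, a defensive check before saving would limit the damage. A minimal sketch, assuming a made-up function name and threshold (this is not existing pywikipedia code):

```python
# Hypothetical safeguard (not currently in interwiki.py): refuse to save
# when the new text is drastically shorter than the old text, since every
# reported failure replaced a full article with interwiki links only.
def safe_to_save(old_text, new_text, ratio=0.5):
    """Return False if the edit would shrink the page suspiciously.

    ratio=0.5 is an arbitrary illustrative threshold; a real bot would
    want this configurable and probably namespace-aware.
    """
    if not old_text:
        return True  # creating a new page is fine
    return len(new_text) >= len(old_text) * ratio

# A normal interwiki update barely changes the page size:
assert safe_to_save('x' * 1000, 'x' * 990 + '[[de:Foo]]')
# The observed failure mode would be caught:
assert not safe_to_save('x' * 1000, '[[de:Foo]]\n[[nl:Foo]]')
```

This would not explain the bug, but it would turn a destructive edit into a skipped one until the real cause is found.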
3. Reasons for relating it to MW 1.18

To find out on which wikis this problem happens, I used a quick-and-dirty heuristic:

  SELECT rc_comment, rc_cur_time, rc_user, rc_namespace, rc_title,
         rc_old_len, rc_new_len
  FROM recentchanges
  LEFT JOIN user_groups ON ug_user = rc_user
  WHERE rc_new_len < rc_old_len * 0.1
    AND ug_group = 'bot'
    AND rc_namespace = 0
  LIMIT 10 /* SLOW OK */;
This is a slow query (~30s for nlwiki_p on the toolserver), but it gives some interesting results:

- nlwiki: 9 rows, all broken interwiki bots
- eowiki: 25 rows, all interwiki bots
- simplewiki: 3 rows, of which 2 are interwiki bots
- dewiki: 0 rows; using rc_old_len * 0.3: 14 rows, all double-redirect fixes
- frwiki: 9 rows, but *none* from interwiki bots (all edits are by the same anti-vandalism bot)
- itwiki: 0 rows
- ptwiki: 0 rows
All ideas and hints are very welcome. Hopefully we will be able to solve this before Tuesday...
Best regards, Merlijn van Deen