[Pywikipedia-l] serious interwiki.py issues on MW 1.18 wikis
Merlijn van Deen
valhallasw at arctus.nl
Thu Sep 29 20:08:08 UTC 2011
Hello to both the wikitech and pywikipedia lists -- please keep both
informed when replying. Thanks.
A few days ago, we - the pywikipedia developers - received alarming
reports of interwiki bots removing content from pages. This does not
seem to happen often, and we have not been able to reproduce the
conditions under which it happens.
However, the common denominator is that it seems to happen only on
the Wikipedias that run MediaWiki 1.18. As such, I think this topic
might be relevant for wikitech-l, too. In addition, no-one in the
pywikipedia team has a clear idea of why this is happening, so we
would appreciate any ideas.
1. What happens?
Essentially, the interwiki bot does its job: it retrieves the
interwiki graph and determines the correct interwiki links. It should
then add those links to the existing page text, but instead /only/ the
interwiki links are stored and the rest of the page is lost (a sketch
of the suspected failure mode follows the diffs below). For example:
http://nl.wikipedia.org/w/index.php?title=Blankenbach&diff=next&oldid=10676248
http://eo.wikipedia.org/w/index.php?title=Anton%C3%ADn_Kl%C3%A1%C5%A1tersk%C3%BD&action=historysubmit&diff=3855198&oldid=1369139
http://simple.wikipedia.org/w/index.php?title=Mettau%2C_Switzerland&action=historysubmit&diff=3060418&oldid=1249270
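To illustrate the failure mode we suspect, here is a hypothetical
Python sketch -- this is not actual interwiki.py code; the function
and regex are made up for illustration, with interwiki prefixes
simplified to two-letter language codes:

import re

IW_RE = re.compile(r'\[\[[a-z]{2}:[^\]]+\]\]\n?')

def replace_interwiki_links(old_text, links):
    # Strip the existing interwiki links, then append the new set.
    body = IW_RE.sub('', old_text).rstrip()
    new_links = '\n'.join('[[%s:%s]]' % (lang, title)
                          for lang, title in links)
    if not body:
        return new_links
    return body + '\n\n' + new_links

# If old_text arrives as '' -- say, a fetch error that was silently
# swallowed somewhere upstream -- the "updated" page is just the link
# block, which is exactly the damage visible in the diffs above:
print(replace_interwiki_links('', [('de', 'Blankenbach')]))
# prints: [[de:Blankenbach]]

In other words, if the page text ever comes back empty while the link
computation still succeeds, the bot would happily save a page
consisting of nothing but interwiki links.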
2. Why does this happen?
This is unclear. On the one hand, interwiki.py is somewhat black
magic: none of the current developers intimately knows its workings.
On the other hand, the bug is not reproducible: running it on the
exact same page with the exact same page text does not result in a
cleared page. It could very well be something like broken network
error handling, but mainly, we have no idea. Did anything change in
Special:Export (which interwiki.py still uses) or in the API that
might cause something like this? I couldn't find anything in the
release notes.
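In the meantime, one stopgap on the bot side would be a sanity check
before saving: a pure interwiki update should never shrink a page
dramatically, so the bot could simply refuse such saves. A minimal
sketch (made-up names, untested, threshold chosen arbitrarily):

def safe_to_save(old_text, new_text, ratio=0.5):
    # Reject any save that would remove more than half of the page.
    return len(new_text) >= len(old_text) * ratio

old = 'some long article text ... ' * 40
new = '[[de:Blankenbach]]'
if not safe_to_save(old, new):
    print('refusing to save: page would shrink from %d to %d bytes'
          % (len(old), len(new)))

This would not fix the underlying bug, but it would turn silent page
blanking into a loud, loggable refusal.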
3. Reasons for relating it to MW 1.18
To find out on which wikis this problem happens, I used a
quick-and-dirty heuristic: flag bot edits in the main namespace that
shrank a page to less than 10% of its previous size.

select rc_comment, rc_cur_time, rc_user, rc_namespace, rc_title,
       rc_old_len, rc_new_len
from recentchanges
left join user_groups on ug_user = rc_user
where rc_new_len < rc_old_len * 0.1
  and ug_group = 'bot'
  and rc_namespace = 0
limit 10 /* SLOW OK */;
This is a slow query (~30s for nlwiki_p on the toolserver), but it
gives some interesting results:
nlwiki: 9 rows, all broken interwiki bots
eowiki: 25 rows, all interwiki bots
simplewiki: 3 rows, of which 2 are interwiki bots
dewiki: 0 rows
  (with rc_old_len * 0.3 instead: 14 rows, all double redirect fixes)
frwiki: 9 rows, but *none* from interwiki bots (all edits are by the
same antivandalism bot)
itwiki: 0 rows
ptwiki: 0 rows
All ideas and hints are very welcome. Hopefully we will be able to
solve this before Tuesday...
Best regards,
Merlijn van Deen