It seems that somewhere along the way pywikipedia-l got dropped from the
recipients, so I am forwarding this message, which might be interesting: the
problem does not seem to be bot-specific; if a 'normal' user had happened to
reach the page before a bot, they would have seen something very unusual too.
---------- Forwarded message ----------
From: Andre Engels <andreengels(a)gmail.com>
Date: Fri, Sep 30, 2011 at 11:13 AM
Subject: Re: [Wikitech-l] serious interwiki.py issues on MW 1.18 wikis
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
On Fri, Sep 30, 2011 at 9:39 AM, Ariel T. Glenn <ariel(a)wikimedia.org> wrote:
> Out of curiosity... If the new revisions of one of these badly edited
> pages are deleted, leaving the top revision as the one just before the
> bad iw bot edit, does a rerun of the bot on the page fail?
I did a test, and the result was very interesting, which might point to the
cause of this bug:
I deleted the page [[nl:Blankenbach]], then restored the 2 versions before
the problematic bot edit. When I now look at the page, instead of the page
content I get:
No content was found in the database for the page with .
This can occur if you follow an outdated link to the difference between two
versions of a page, or request a version that has been deleted.
If that is not the case, you may have found a bug in the software.
Please report this to a
Wikipedia and include the URL of this page.
The specific version that, after the deletion-and-partial-restore, should be
the newest (
) claims that there is a newer version, but when going to the newer version
or the newest version, I get the above-mentioned message again.
As an extra test, I did the
delete-then-restore-some-versions-but-not-the-most-recent action with
another page (http://nl.wikipedia.org/wiki/Gebruiker:Andre_Engels/Test), and
there I found no such problem. From this I conclude that the bug has not
been caused by that process, but that for some reason the page had a wrong
(or empty) version number for its 'most recent' version, or something like
that.
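As a sketch of how that hypothesis could be checked from the outside
(Python 2, standard library only; the wiki and page are taken from the
example above, everything else is illustrative, not part of the original
test): compare the revision id the wiki records as 'latest' with the newest
revision the API will actually hand out.

import json
import urllib
import urllib2

API = 'http://nl.wikipedia.org/w/api.php'
TITLE = 'Blankenbach'

params = urllib.urlencode({
    'action': 'query',
    'titles': TITLE,
    'prop': 'info|revisions',
    'rvprop': 'ids',
    'rvlimit': 1,
    'format': 'json',
})
data = json.load(urllib2.urlopen(API + '?' + params))
page = data['query']['pages'].values()[0]

recorded = page.get('lastrevid')                  # what the page record claims
revs = page.get('revisions', [])
retrievable = revs[0]['revid'] if revs else None  # newest revision actually returned

print 'recorded latest revision:   ', recorded
print 'newest retrievable revision:', retrievable
if recorded != retrievable:
    print "mismatch - the page's 'most recent' pointer looks broken"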
André Engels, andreengels(a)gmail.com
On Fri, Sep 30, 2011 at 11:12 AM, Max Semenik <maxsem.wiki(a)gmail.com> wrote:
> On Fri, Sep 30, 2011 at 12:56 PM, Andre Engels <andreengels(a)gmail.com> wrote:
> > The interwiki links are retrieved from page content. The page content has
> > been received through a call to Special:Export.
> > > I.e. would receiving no content (from the bot POV) produce that?
> > Yes, the only reasonable explanation seems to be that the bot interprets
> > what it gets from the server as an empty page.
> So you screen-scrape? No surprise it breaks - for example, due to
> protocol-relative URLs, or some other changes to the HTML output. Why not
> just use the API?
Basically, because most of the core functionality comes from before the API
came into existence. At least, that would be my explanation.
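For comparison, fetching the current wikitext through api.php instead of
scraping takes only a few lines. This is just an illustrative sketch
(Python 2, standard library only; the wiki and page name are placeholders),
not the pywikipedia code itself:

import json
import urllib
import urllib2

API = 'http://nl.wikipedia.org/w/api.php'

def fetch_wikitext(title):
    params = urllib.urlencode({
        'action': 'query',
        'titles': title,
        'prop': 'revisions',
        'rvprop': 'content',
        'format': 'json',
    })
    data = json.load(urllib2.urlopen(API + '?' + params))
    page = data['query']['pages'].values()[0]
    if 'missing' in page or not page.get('revisions'):
        # "no content returned" is not the same as "the page is empty";
        # a bot should never silently treat this case as an empty page.
        raise RuntimeError('no revision content returned for %r' % title)
    return page['revisions'][0]['*']

print fetch_wikitext('Blankenbach').encode('utf-8')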
André Engels, andreengels(a)gmail.com
Hello to both the wikitech and pywikipedia lists -- please keep both
informed when replying. Thanks.
A few days ago, we - the pywikipedia developers - received alarming
reports of interwiki bots removing content from pages. This does not
seem to happen often, and we have not been able to reproduce the
conditions in which this happens.
However, the common denominator is that it seems to be happening only on
the Wikipedias that run MediaWiki 1.18. As such, I think this topic might
be relevant for wikitech-l, too. In addition, no-one in the pywikipedia
team has a clear idea of why this is happening, so we would appreciate any
ideas.
1. What happens?
Essentially, the interwiki bot does its job, retrieves the graph and
determines the correct interwiki links. It should then add them to the
page, but instead /only/ the interwiki links are stored. For example:
2. Why does this happen?
This is unclear. On the one hand, interwiki.py is somewhat black
magic: none of the current developers intimately knows its workings.
On the other hand, the bug is not reproducible: running it on the
exact same page with the exact same page text does not result in a
cleared page. It could very well be something like broken network
error handling - but mainly, we have no idea. Did anything change in
Special:Export (which is still used in interwiki.py) or the API which
might cause something like this? I couldn't find anything in the
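To make the 'broken error handling' hypothesis concrete: a bot that scrapes
Special:Export has to distinguish "the export gave me no <text> element"
from "the page is empty". A rough sketch of a defensive check (Python 2,
standard library only; illustrative, not the actual interwiki.py code):

import urllib
import urllib2
from xml.dom import minidom

def export_wikitext(lang, title):
    # Fetch the page through Special:Export and only return text if the
    # response really contains a <text> element for the page.
    url = 'http://%s.wikipedia.org/wiki/Special:Export/%s' % (
        lang, urllib.quote(title.encode('utf-8')))
    xml = urllib2.urlopen(url).read()
    doc = minidom.parseString(xml)
    texts = doc.getElementsByTagName('text')
    if not texts:
        # Bail out instead of pretending the page is empty.
        raise RuntimeError('Special:Export returned no content for %r' % title)
    return u''.join(node.data for node in texts[0].childNodes)

print export_wikitext('nl', u'Blankenbach').encode('utf-8')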
3. Reasons for relating it to MW 1.18
To find out on which wikis this problem happens, I used a toolserver query:

select rc_comment, rc_cur_time, rc_user, rc_namespace, rc_title,
       rc_old_len, rc_new_len
from recentchanges
left join user_groups on ug_user = rc_user
where rc_new_len < rc_old_len * 0.1
  and ug_group = 'bot'
  and rc_namespace = 0
limit 10 /* SLOW OK */;
This is a slow query (~30s for nlwiki_p on the toolserver), but it
gives some interesting results:
nlwiki: 9 rows, all broken interwiki bots
eowiki: 25 rows, all interwiki bots
simplewiki: 3 rows, of which 2 are interwiki bots
dewiki: 0 rows (using rc_old_len * 0.3 instead: 14 rows, all double
redirect fixes)
frwiki: 9 rows, but *none* from interwiki bots (all edits are by the
same antivandalism bot)
itwiki: 0 rows
ptwiki: 0 rows
All ideas and hints are very welcome. Hopefully we will be able to
solve this before Tuesday...
Merlijn van Deen
Thank you for your attention.
The user-config information is attached, and I will send you the script I
used. Please check the posted files and help solve this problem.
On 18 September 2011 13:51, majid <magidred(a)yahoo.com> wrote:
I have been trying to run the interwiki command for a long time, but I have
never been successful in editing with the robot.
Please check the problem.
First of all, it's annoying that the bot is not doing what it should be doing. However, with your current report it is hard for other people to understand what is going on.
Could you please send the following information:
* what you are trying to do (e.g. 'fixing interwiki links on enwiki, srwiki, itwiki')
* which command (with which parameters) you are using for that
* the result you are expecting
* the actual result, and the text output by the script
and then, to help debugging the issue:
* the output of python version.py
* the family you are working on (i.e. wikipedia, wikibooks, etc). If it's a custom family, please also post the family file.
* your user-config.py
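For reference, a minimal user-config.py for the standard wikipedia family
looks roughly like the sketch below; the language code and account name are
placeholders, not real values:

# Minimal user-config.py sketch for the standard 'wikipedia' family.
# This file is read by pywikipedia's config.py, which predefines the
# 'usernames' dictionary; the values below are placeholders.
family = 'wikipedia'                      # family file: wikipedia_family.py
mylang = 'en'                             # default language/site for the bot
usernames['wikipedia']['en'] = u'ExampleBot'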
Hi! First, I do realize that the problem I'm going to show you below is
probably easy to fix, but, well, I'm not that into Python yet. Second, it
seems to me that this is a problem that could affect more people than just
me, since the task I'm trying to execute is not that uncommon, I think, so
fixing it would help... more than one. =)
Here is the deal:
I'm trying to upload all the images from this page:
to another wiki website.
The bot seems to recognize the images, but instead of going to
"pt.wikipedia.org/Ficheiro:name-of-image.xxx", where they actually are,
it retypes the given URL of the portal and adds the
"Ficheiro:name-of-image.xxx" part. For example, the first image of the
archive should be found at
but instead the bot tries to upload from
which obviously does not work. Here is the result in my terminal: "HTTP
Error 404: Not Found"
I'm wondering what I should do to fix it: which script to edit and what to
change in it, since I am a Python newbie.
* "Ficheiro" is the portuguese equivalent to "file", in this case
Today DrTrigonBot complained about a
"HTTPError: HTTP Error 400: Bad Request"
that was originating from botlist.py. I was able to track this
down to the point where "タチコマ robot" should be used in a
URL as offset (line 96), which caused the error when putting in
this bot name. As far as I can see, a urllib.quote() call
should solve the problem, as done in the attached file. Could
someone please verify this (if there is time) and then commit
this change to SVN.
Btw: I was not able to find anything similar to "urllib.quote"
in the bot framework, so I had to import urllib.
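The change described above would look roughly like this; the URL and
parameter name are illustrative only, not the exact code built in
botlist.py around line 96:

import urllib

# Percent-encode a non-ASCII bot name before putting it into a URL as the
# offset parameter. urllib.quote expects a byte string (Python 2), so the
# unicode name is encoded to UTF-8 first.
botname = u'\u30bf\u30c1\u30b3\u30de robot'    # u'タチコマ robot'
offset = urllib.quote(botname.encode('utf-8'))

# Illustrative URL only - not the exact one used by botlist.py.
url = 'http://en.wikipedia.org/w/index.php?title=Special:ListUsers&offset=' + offset
print url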
Thanks a lot and Greetings