[Toolserver-l] [OT] Python coding help needed

Siebrand Mazeland s.mazeland at xs4all.nl
Tue May 22 06:04:30 UTC 2007


This may be off topic, but asking on Wikimedia Commons and on the
Pywikipediabot mailing list didn't get me any replies. Please flame me if
you think I should not make a request like the one below on this list again.
----
I would like to request a volunteer to assist in stabilising and fine-tuning
the CommonsDelinker bots, replacer.py and delinker.py. (Well, "assist": I'm
afraid you would have to do the dirty work, as I am just not skilled enough.
I can write the specification, though.)

Orgullomoore, the previous maintainer of the code, has gone backpacking in
Europe and is no longer able to maintain the bots. About two weeks ago I
took over running the bots from the toolserver. Since then I have run into
a few flaws in the bots that I would like to see fixed.

If you are an experienced Python coder, feel at home with pywikipediabot,
threads and a little MySQL, and would like to make a difference on this bot,
which is crucial in supporting all users of Wikimedia Commons, please contact
me. A toolserver account is not required, but it is a definite advantage.

Below is a description of the current functionality and of the flaws I have
found.

Thanks for your help.

Siebrand

Delinker: removes links from any Wikimedia wiki once a file has been deleted
from Wikimedia Commons. If a special string is added to the deletion
message, the links are not removed. Successes and failures of edits are
logged in a MySQL database on the toolserver. Edit messages can be
localised and are read from the local wiki.
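
One way to keep the edit summary from being fetched again for every edit (a
flaw listed below) is to read it once per wiki and cache it. This is a
minimal sketch, not existing CommonsDelinker code: fetch_summary is a
hypothetical callable standing in for however the bot reads the localised
message from the local wiki, and it is not a pywikipediabot API. The lock is
there because the bots run several threads.

```python
import threading
from typing import Callable, Dict, Optional


class SummaryCache:
    """Fetches each wiki's localised edit summary at most once.

    fetch_summary is hypothetical: it should return the local message text,
    or None/"" if the wiki has no localised message.
    """

    def __init__(self, fetch_summary: Callable[[str], Optional[str]],
                 default: str) -> None:
        self._fetch = fetch_summary
        self._default = default
        self._cache: Dict[str, str] = {}
        self._lock = threading.Lock()  # shared by all worker threads

    def get(self, wiki: str) -> str:
        with self._lock:
            if wiki not in self._cache:
                text = (self._fetch(wiki) or "").strip()
                # Fall back to the default summary when there is no local one.
                self._cache[wiki] = text or self._default
            return self._cache[wiki]
```

Holding the lock during the first fetch serialises first-time lookups across
wikis; that keeps the sketch simple at the cost of a little concurrency.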

Flaws:
*	Fails if CheckUsage is too busy, in which case delinking is not
performed. This should be recognised, and the request should be retried
until a complete CheckUsage result has been obtained.
*	Each toolserver user is limited to 15 connections to the MySQL
database. Sometimes this limit is reached, and in that case successes and
failures are not logged. This should not happen.
*	If many files that are heavily used on one wiki are deleted, the bot
can make a lot of edits to that wiki in a short time. This can upset (and
has upset) users in a community. A limit of 3 edits per minute per wiki
should be in place.
*	From the bot output, it looks like the bot fetches the edit summary
again for every edit on a wiki. This causes unnecessary server load and
should not happen.
*	Most main projects are supported, but the meta projects in particular
can cause problems because of the way project recognition is implemented.
*	TypeError: unsupported operand type(s) for -: 'unicode' and 'int'
*	Sometimes when toolserver is really busy (or whatever the reason may
be), a thread cannot be created. In this case a task is not executed. This
should not happen.
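
The requested limit of 3 edits per minute per wiki could be enforced with a
sliding-window limiter, kept per wiki and consulted right before every save.
This is a sketch under assumptions: the class and the idea of calling wait()
before saving are hypothetical, not existing CommonsDelinker code. The clock
and sleep functions are injectable so the logic can be tested without
actually waiting.

```python
import threading
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerWikiRateLimiter:
    """Sliding-window limit: at most max_edits edits per window seconds, per wiki."""

    def __init__(self, max_edits: int = 3, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_edits = max_edits
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._stamps: Dict[str, Deque[float]] = defaultdict(deque)
        self._lock = threading.Lock()

    def wait(self, wiki: str) -> None:
        """Block until an edit to `wiki` is allowed, then record it."""
        while True:
            with self._lock:
                now = self._clock()
                stamps = self._stamps[wiki]
                # Drop timestamps that have left the window.
                while stamps and now - stamps[0] >= self.window:
                    stamps.popleft()
                if len(stamps) < self.max_edits:
                    stamps.append(now)
                    return
                # Time until the oldest edit in the window expires.
                delay = self.window - (now - stamps[0])
            # Sleep outside the lock so threads editing other wikis are not blocked.
            self._sleep(delay)
```

If all worker threads share one limiter instance, the limit holds across
threads, while edits to different wikis do not slow each other down.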

Replacer: replaces image links on any Wikimedia wiki. Tasks are fetched from
a sysop-only page on Wikimedia Commons. Replacing an image of any type other
than SVG with an SVG (AnyImageTypeButSVG to SVG) is not supported because of
a pending SupersededSVG deletion debate. Successes and failures of edits are
logged in a MySQL database on the toolserver. Edit messages can be localised
and are read from the local wiki. If {{stop}} is present on the wiki page,
the bot stops working until that text has been removed.
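
The {{stop}} behaviour amounts to re-reading the page until the template is
gone. A minimal sketch, assuming a hypothetical fetch_page() callable that
returns the current wikitext of the page in question, and an arbitrary
five-minute poll interval; neither comes from the existing code.

```python
import re
import time
from typing import Callable

# Matches {{stop}}, also {{Stop}} and {{ stop }}: MediaWiki (on most wikis)
# ignores the case of a template name's first letter and trims whitespace.
STOP_RE = re.compile(r"\{\{\s*[Ss]top\s*\}\}")


def wait_while_stopped(fetch_page: Callable[[], str],
                       poll_seconds: float = 300.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Re-read the page until it no longer contains {{stop}}; return its text."""
    while True:
        text = fetch_page()
        if not STOP_RE.search(text):
            return text
        sleep(poll_seconds)
```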

Flaws:
The bots share a lot of code, so some of the issues below are the same as
those listed for Delinker.
*	Fails if CheckUsage is too busy. This should be recognised, and the
request should be retried until a complete CheckUsage result has been
obtained.
*	Each toolserver user is limited to 15 connections to the MySQL
database. Sometimes this limit is reached, and in that case successes and
failures are not logged. This should not happen.
*	If many files that are heavily used on one wiki have to be replaced,
the bot can make a lot of edits to that wiki in a short time. This can
upset (and has upset) users in a community. A limit of 3 edits per minute
per wiki should be in place.
*	From the bot output, it looks like the bot fetches the edit summary
again for every edit on a wiki. This causes unnecessary server load and
should not happen.
*	Most main projects are supported, but the meta projects in particular
can cause problems because of the way project recognition is implemented.
*	Image name recognition is such that errors can occur in special
cases. Example: a page contains the images "Test 1.jpg" and "1.jpg". If
"1.jpg" is removed, "Test 1.jpg" may end up as "Test ". (The real example
has unfortunately been lost.)
*	Although there is code to prevent it, it has happened that a
replacement was made while a local image with the same name was present.
This should not happen. [1]
*	The bot crashes on 'Unhandled exception in thread started'.
*	Sometimes when toolserver is really busy (or whatever the reason may
be), a thread cannot be created. In this case a task is not executed. This
should not happen.
*	If an image is used more than 50 times on a wiki and all uses come
from a template that is not among the first 50 pages listed, no text
replacement will occur. (Can this be fixed at all?)
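
The "Test 1.jpg" problem comes from matching the file name as a bare
substring. A sketch of a stricter match, under assumptions: the name only
counts when it starts right after a wikitext delimiter ([, :, |, = or a
newline) and ends at one (|, ], } or a newline), spaces and underscores are
treated alike, and the first letter is case-insensitive, as on most
Wikimedia wikis. The delimiter sets are assumptions and will not cover every
way a file can be referenced; the function names are hypothetical.

```python
import re


def filename_pattern(name: str) -> "re.Pattern[str]":
    """Regex matching `name` only as a complete file name in wikitext."""
    name = name.replace("_", " ").strip()
    first, rest = name[0], name[1:]
    # First letter case-insensitive; spaces and underscores interchangeable.
    head = "[%s%s]" % (re.escape(first.upper()), re.escape(first.lower()))
    body = "[ _]+".join(re.escape(part) for part in rest.split(" "))
    # pre: start of text or a delimiter that can precede a file name, plus spaces.
    # post: trailing spaces, then a delimiter that can end a file name.
    return re.compile(
        r"(?P<pre>(?:^|[\[:|=\n])[ \t]*)" + head + body
        + r"(?P<post>[ \t]*)(?=[|\]}\n]|$)"
    )


def replace_filename(text: str, old: str, new: str) -> str:
    """Replace whole-name occurrences of file `old` with `new`."""
    pattern = filename_pattern(old)
    # A function replacement keeps backslashes in `new` from being interpreted.
    return pattern.sub(lambda m: m.group("pre") + new + m.group("post"), text)
```

With this, replacing "1.jpg" leaves "[[Image:Test 1.jpg|thumb]]" untouched,
because the "1" there follows a space rather than a delimiter.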

[1]
http://en.wikipedia.org/w/index.php?title=User:TomStar81/World_War_II&diff=prev&oldid=129050292

General wishes
*	Simple web interface for database logs (last edits, last edits per
language, all edits for a particular image)
*	Some kind of registration for the bots, so that they can be restarted
from cron if they are no longer running
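
A common way to let cron restart a bot is a PID file: the bot records its
process ID when it starts, and a small launcher run from cron every few
minutes only starts a new instance if that process is gone. A sketch
assuming a Unix host (os.kill with signal 0 only checks that the process
exists) and a hypothetical start_bot callable; the file location is up to
the operator. Note that a recycled PID can make a dead bot look alive.

```python
import os
from typing import Callable


def is_running(pid_file: str) -> bool:
    """True if the PID stored in pid_file belongs to a live process (Unix only)."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False  # no PID file, or unreadable contents
    if pid <= 0:
        return False  # 0 and negative values address process groups
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks the PID
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def launch_if_needed(pid_file: str, start_bot: Callable[[], None]) -> bool:
    """Start the bot unless it is already running; return True if started."""
    if is_running(pid_file):
        return False
    with open(pid_file, "w") as fh:
        fh.write(str(os.getpid()))
    start_bot()  # hypothetical entry point, e.g. the delinker's main loop
    return True
```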




