On 3/29/06, Gregory Maxwell <gmaxwell(a)gmail.com> wrote:
I also run a Wikipedia bot that does a number of tasks. It does not
run on toolserver itself, but it operates based on queries
performed on toolserver. Almost all the tasks I have it do are ones
which would require reading hundreds of pages via http were I unable
to drive it with queries.
http://tools.wikimedia.de/~kate/cgi-bin/count_edits?user=Roomba&dbname=…
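To make "driven by queries" concrete, here is a minimal sketch of finding orphaned fair use images with one query instead of many page fetches. It uses Python's built-in sqlite3 with a toy, simplified schema loosely modeled on MediaWiki's image and imagelinks tables. The `file_categories` table and the `Fair_use_images` category name are hypothetical stand-ins. This is not the real toolserver schema or Roomba's actual query.

```python
import sqlite3

# Toy schema (simplified, hypothetical), loosely modeled on MediaWiki:
#   image(img_name)             one row per uploaded file
#   imagelinks(il_from, il_to)  page il_from uses file il_to
#   file_categories(name, cat)  hypothetical stand-in for categorylinks
SCHEMA = """
CREATE TABLE image (img_name TEXT PRIMARY KEY);
CREATE TABLE imagelinks (il_from INTEGER, il_to TEXT);
CREATE TABLE file_categories (name TEXT, cat TEXT);
"""

# Files in the given category with no row in imagelinks, i.e. used nowhere.
ORPHANS_IN_CATEGORY = """
SELECT i.img_name
FROM image AS i
JOIN file_categories AS c ON c.name = i.img_name AND c.cat = ?
LEFT JOIN imagelinks AS il ON il.il_to = i.img_name
WHERE il.il_to IS NULL
ORDER BY i.img_name
"""


def find_orphaned_fair_use(conn, category="Fair_use_images"):
    """Return names of files in `category` that no page uses (one query)."""
    return [row[0] for row in conn.execute(ORPHANS_IN_CATEGORY, (category,))]


def demo_connection():
    """Build an in-memory database with a few made-up files."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO image VALUES (?)",
                     [("A.jpg",), ("B.jpg",), ("C.jpg",)])
    conn.executemany("INSERT INTO file_categories VALUES (?, ?)",
                     [("A.jpg", "Fair_use_images"),
                      ("B.jpg", "Fair_use_images"),
                      ("C.jpg", "Free_images")])
    # Only A.jpg is used on a page (page id 1).
    conn.executemany("INSERT INTO imagelinks VALUES (?, ?)", [(1, "A.jpg")])
    return conn


if __name__ == "__main__":
    print(find_orphaned_fair_use(demo_connection()))  # ['B.jpg']
```

The LEFT JOIN / IS NULL pattern lets the database do the "what links here" check for every file at once, which is the work a screen scraper would have to do one page at a time.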
(Based on the sort of activity Roomba does, the workload savings vs.
screen scraping are remarkable. For example, one of its more frequent
activities is tagging orphaned fair use images. Once a day it finds
all of them and keeps a running list; images which are persistently
orphaned for a while get tagged. The query typically takes a couple of
seconds. Without toolserver, Roomba would need to walk the categories
to find all the image pages, then load each one to find what links to
it. This would result in about a million additional HTTP requests
a week. Limited to 5 requests/second, it just wouldn't work at all.)
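The running list described above could look something like the following minimal sketch. Roomba's actual implementation isn't described here. The 7-day `GRACE_PERIOD` is a hypothetical stand-in for "for a while", and `update_running_list` is a made-up name. The daily orphan set is assumed to come from a query like the one toolserver answers in a couple of seconds.

```python
from datetime import date, timedelta

# Hypothetical threshold; the post only says "for a while".
GRACE_PERIOD = timedelta(days=7)


def update_running_list(first_seen, orphans_today, today, grace=GRACE_PERIOD):
    """Update the running list in place and return files due for tagging.

    first_seen maps file name -> date it was first found orphaned.
    orphans_today is the set of orphaned files found by today's query.
    """
    # Drop files that are no longer orphaned (something links to them now).
    for name in list(first_seen):
        if name not in orphans_today:
            del first_seen[name]
    # Record the first sighting of each newly orphaned file.
    for name in orphans_today:
        first_seen.setdefault(name, today)
    # Only files orphaned for the whole grace period are due to be tagged.
    # (This sketch does not remember which files were already tagged.)
    return sorted(n for n, seen in first_seen.items() if today - seen >= grace)


if __name__ == "__main__":
    running = {}
    d0 = date(2006, 3, 1)
    print(update_running_list(running, {"B.jpg", "D.jpg"}, d0))  # []
    # D.jpg gained a link, so it drops off the list.
    print(update_running_list(running, {"B.jpg"}, d0 + timedelta(days=3)))  # []
    print(update_running_list(running, {"B.jpg"}, d0 + timedelta(days=7)))  # ['B.jpg']
```

For scale on the scraping alternative: a million requests at 5 requests/second is 200,000 seconds, or roughly 55.6 hours of continuous fetching every week, for this one task alone.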