On 3/29/06, Gregory Maxwell gmaxwell@gmail.com wrote:
I also run a Wikipedia bot that performs a number of tasks. The bot itself does not run on toolserver, but it is driven by queries performed on toolserver. Almost all of the tasks I have it do would require reading hundreds of pages via HTTP if I were unable to drive it with queries.
http://tools.wikimedia.de/~kate/cgi-bin/count_edits?user=Roomba&dbname=e...
(Given the sort of activity Roomba does, the workload savings versus screen scraping are remarkable. For example, one of its more frequent activities is tagging orphaned fair use images. Once a day it finds all of them and keeps a running list; images which stay orphaned for a while get tagged. The query typically takes a couple of seconds. Without toolserver, Roomba would need to walk the categories to find all the image pages, then load each one to find what links to it. This would result in about a million additional HTTP requests a week. Limited to 5 requests/second, it just wouldn't work at all.)
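
As a rough back-of-envelope check on those figures, here is a small Python sketch. It only uses the two numbers quoted above (about a million requests a week, a cap of 5 requests/second) and assumes, purely for illustration, that the requests are issued back to back at exactly the cap and split evenly across seven daily runs. The function name is made up for this example.

```python
# Figures taken from the message above; everything else is an assumption.
REQUESTS_PER_WEEK = 1_000_000   # "about a million" extra HTTP requests a week
RATE_LIMIT_PER_SEC = 5          # the 5 requests/second cap


def fetch_hours(requests: int, rate_per_sec: float) -> float:
    """Hours needed to issue `requests` back to back at the given rate."""
    return requests / rate_per_sec / 3600


# Total fetching time per week, and per daily run if spread evenly (assumed).
weekly_hours = fetch_hours(REQUESTS_PER_WEEK, RATE_LIMIT_PER_SEC)
daily_hours = weekly_hours / 7

print(f"{weekly_hours:.1f} h/week, {daily_hours:.1f} h/day")
# prints "55.6 h/week, 7.9 h/day"
```

Under those assumptions each daily pass would spend roughly eight hours just fetching pages, compared with a query that typically takes a couple of seconds.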