Purodha Blissenbach wrote:
with the beginning of next year (2011) I will no longer accept new interwiki-bot accounts, if: [...]
While we're at it - in the future, we should have interwiki bots read the replicated databases as much as possible when gathering information about existing and presumably missing interwiki links. That would spare lots of requests to the WMF servers, which would then only be bothered when wiki pages are actually altered.
Using the replicated data instead of making HTTP (API) requests should speed up the data collection phase for large interwiki groups from several minutes to a second or so.
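Just to illustrate the data-collection side, here is a rough Python sketch of reading a page's stored interwiki links from a Toolserver replica instead of via the API. The host and database names and the ~/.my.cnf credentials are only placeholders for a typical Toolserver setup; the page and langlinks tables are the standard MediaWiki schema:

import os
import pymysql  # MySQLdb works much the same way on older setups

def replica_langlinks(host, dbname, title, namespace=0):
    """Return {language: title} for the interwiki links recorded for one page."""
    conn = pymysql.connect(host=host, database=dbname, charset="utf8mb4",
                           read_default_file=os.path.expanduser("~/.my.cnf"))
    try:
        with conn.cursor() as cur:
            cur.execute(
                """SELECT ll.ll_lang, ll.ll_title
                     FROM page p
                     JOIN langlinks ll ON ll.ll_from = p.page_id
                    WHERE p.page_namespace = %s AND p.page_title = %s""",
                (namespace, title.replace(" ", "_")))
            rows = cur.fetchall()
    finally:
        conn.close()
    # MediaWiki stores titles in binary columns, so decode defensively.
    return {(k.decode("utf-8") if isinstance(k, bytes) else k):
            (v.decode("utf-8") if isinstance(v, bytes) else v)
            for k, v in rows}

# e.g. replica_langlinks("dewiki-p.db.toolserver.org", "dewiki_p", "Berlin")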
Another approach to making interwiki bots use the replicated data would be to pre-process the interwiki data into a list or table of versioned change requests published on the Toolserver. Interwiki worker bots running elsewhere would pick requests from the list and process them. Picked requests would be set aside for a while, until the replicated data shows them done or until a timeout (greater than the replication lag) is exhausted.
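A very rough sketch of such a request queue, just to make the pick/postpone cycle concrete - the iw_requests table, its columns, and the links_in_replica callback are all hypothetical here, not an existing interface:

import time

PICK_TIMEOUT = 900  # seconds; must be comfortably larger than the replication lag

def pick_request(conn):
    """Claim the oldest pending change request so other workers skip it for now."""
    with conn.cursor() as cur:
        cur.execute("""SELECT req_id, req_wiki, req_page, req_links
                         FROM iw_requests
                        WHERE req_status = 'pending'
                        ORDER BY req_version LIMIT 1""")
        row = cur.fetchone()
        if row:
            cur.execute("UPDATE iw_requests SET req_status = 'picked', "
                        "req_picked_at = %s WHERE req_id = %s",
                        (int(time.time()), row[0]))
    conn.commit()
    return row

def reconcile(conn, links_in_replica):
    """Close picked requests once the replica shows the change, or re-release them."""
    now = int(time.time())
    with conn.cursor() as cur:
        cur.execute("SELECT req_id, req_wiki, req_page, req_links, req_picked_at "
                    "FROM iw_requests WHERE req_status = 'picked'")
        for req_id, wiki, page, links, picked_at in cur.fetchall():
            if links_in_replica(wiki, page, links):
                new_status = 'done'       # replicated data confirms the edit landed
            elif now - picked_at > PICK_TIMEOUT:
                new_status = 'pending'    # timed out, hand it back to the pool
            else:
                continue                  # still waiting for replication to catch up
            cur.execute("UPDATE iw_requests SET req_status = %s WHERE req_id = %s",
                        (new_status, req_id))
    conn.commit()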
My current understanding is that a variety of different bot operators run interwiki.py from different accounts (both Toolserver accounts and Wikimedia accounts), using different lists and very inefficient code, and that the operators do not check the edits manually. Is that correct? If so, there is an underlying, fundamental problem with interwiki.py that using database connections rather than HTTP API requests cannot fix.
Do you know the status of getting a solution built into MediaWiki (either in core or in an extension) that could make interwiki.py completely obsolete? It's my _strong_ recommendation that development effort be put into a real solution rather than focusing on ways to make interwiki.py suck less.
MZMcBride