Purodha Blissenbach wrote:
with the beginning of next year (2011) I will no longer accept new
interwiki-bot accounts, if:
[...]
While we're at it: in the future, we shall have interwiki bots reading the
replicated databases to a great extent while gathering information about
existing and presumably missing interwiki links. This will spare the WMF
servers lots of requests; they will then be bothered only when wiki pages
are actually altered.
Using the replicated data instead of making HTTP (API) requests should speed
up the data collection phase for large interwiki groups from several minutes
to a second or so.
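As a rough illustration of that idea, a bot on the Toolserver could read the
existing links for a page with a single query against the replicated
langlinks table, roughly like this (the server and database names and the
~/.my.cnf credentials file are assumptions about the usual Toolserver setup,
not existing interwiki.py code):

    import os
    import MySQLdb

    # Connect to the replicated English Wikipedia database on the Toolserver
    # (host and database names are assumptions about the usual setup).
    conn = MySQLdb.connect(host='sql-s1', db='enwiki_p',
                           read_default_file=os.path.expanduser('~/.my.cnf'))
    cur = conn.cursor()

    # All interwiki links currently recorded for one article.
    cur.execute("""
        SELECT ll_lang, ll_title
          FROM langlinks
          JOIN page ON ll_from = page_id
         WHERE page_namespace = %s AND page_title = %s
    """, (0, 'Toolserver'))
    existing_links = dict(cur.fetchall())

Collecting the links for a whole interwiki group this way costs one SQL
round trip per wiki instead of one API request per page.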
Another approach to making interwiki bots use the replicated data would
be to pre-process their interwiki data into a list or table of versioned
change requests, published on the Toolserver.
Interwiki worker bots running elsewhere would pick requests from the list
and process them. Picked requests would be postponed for a while, until the
replicated data shows them as done or until a timeout (greater than the
replication lag) is exhausted.
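A minimal sketch of how such a request list could behave; the class names,
fields and timeout value are invented purely for illustration:

    import time

    REPLICATION_TIMEOUT = 3600  # seconds; must exceed the expected replication lag

    class ChangeRequest(object):
        def __init__(self, version, page, links):
            self.version = version    # version of the interwiki data it was built from
            self.page = page          # (wiki, title) that should be edited
            self.links = links        # interwiki links the page should end up with
            self.picked_at = None

    class RequestList(object):
        """Versioned change requests, as they might be published on the Toolserver."""

        def __init__(self, requests):
            self.pending = list(requests)
            self.postponed = []

        def pick(self):
            """Hand the oldest pending request to a worker bot and postpone it."""
            if not self.pending:
                return None
            request = self.pending.pop(0)
            request.picked_at = time.time()
            self.postponed.append(request)
            return request

        def expire(self, replica_shows_done):
            """Drop postponed requests the replica already reflects; requeue timed-out ones."""
            still_waiting = []
            for request in self.postponed:
                if replica_shows_done(request):
                    continue                      # edit has arrived in the replicated data
                elif time.time() - request.picked_at > REPLICATION_TIMEOUT:
                    self.pending.append(request)  # apparently lost; offer it again
                else:
                    still_waiting.append(request)
            self.postponed = still_waiting

A worker bot would call pick(), make the edit through the API, and leave it
to expire() (run periodically against the replica) to decide whether the
request is done or needs to be offered again.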
My current understanding is that a variety of different bot operators run
interwiki.py from different accounts (both Toolserver accounts and Wikimedia
accounts), using different lists and very inefficient code, and that the bot
operators do not check the edits manually. Is that correct? If so, there is
an underlying, fundamental problem with interwiki.py that using database
connections rather than HTTP API requests cannot fix.
Do you know the status of getting a solution built into MediaWiki (either
in core or in an extension) that could make interwiki.py completely
obsolete? It's my _strong_ recommendation that development effort be put
into a real solution rather than focusing on ways to make interwiki.py suck
less.
MZMcBride