[Toolserver-l] Interwiki handling on the toolserver. (Was: FYI: Changing of interwiki-bots-account-approval)
MZMcBride
z at mzmcbride.com
Fri Dec 3 04:18:58 UTC 2010
Purodha Blissenbach wrote:
>> from the beginning of next year (2011), I will no longer accept new
>> interwiki-bot accounts, if:
>> [...]
>
> While we're at it - in the future, we should have interwiki bots read the
> replicated databases to a great extent when gathering information about
> existing and presumably missing interwiki links. This would spare the WMF
> servers lots of requests; they would only be bothered when wiki pages are
> actually altered.
>
> Using the replicated data instead of making HTTP (API) requests should
> speed up the data-collection phase for large interwiki groups from several
> minutes to a second or so.
>
> Another approach to making interwiki bots use the replicated data would
> be to pre-process their interwiki data into a list or table of versioned
> change requests, published on the toolserver. Interwiki worker bots
> running elsewhere would pick requests from the list and process them. A
> picked request would stay postponed until the replicated data shows it as
> done, or until a timeout (greater than the replication lag) is exhausted.
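(For concreteness, here is a minimal sketch of the first idea: reading the
existing interwiki links of a page straight from a replica. It assumes
MySQLdb, credentials in ~/.my.cnf, and a replica host name along the lines
of the Toolserver's naming scheme; the langlinks and page tables are the
standard MediaWiki schema.)

    import os
    import MySQLdb

    def fetch_langlinks(dbname, page_title,
                        host="enwiki-p.rrdb.toolserver.org"):
        # Read the interwiki (language) links of one article directly
        # from the replicated database - no HTTP/API request needed.
        conn = MySQLdb.connect(
            host=host, db=dbname,
            read_default_file=os.path.expanduser("~/.my.cnf"))
        try:
            cur = conn.cursor()
            cur.execute(
                """SELECT ll_lang, ll_title
                   FROM langlinks
                   JOIN page ON ll_from = page_id
                   WHERE page_namespace = 0 AND page_title = %s""",
                (page_title,))  # titles use underscores, e.g. "New_York"
            return cur.fetchall()
        finally:
            conn.close()

    # e.g. fetch_langlinks("enwiki_p", "Berlin")

A whole interwiki group could then be walked in memory, with the API
touched only when an edit is actually needed.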
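(And a sketch of a worker loop for such a change-request list. The
pick_request, remove, requeue, check_replica, and apply_edit helpers are
assumptions for illustration, not an existing interface.)

    import time

    TIMEOUT = 600  # seconds; must exceed the expected replication lag

    def run_worker(pick_request, remove, requeue, check_replica, apply_edit):
        while True:
            req = pick_request()        # take the next request off the list
            if req is None:
                break                   # list is empty
            apply_edit(req)             # perform the interwiki edit
            deadline = time.time() + TIMEOUT
            while time.time() < deadline:
                if check_replica(req):  # replicated data shows the change
                    remove(req)         # request is done for good
                    break
                time.sleep(30)          # wait for replication to catch up
            else:
                requeue(req)            # timeout exhausted; offer it again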
My current understanding is that a variety of bot operators run interwiki.py
from different accounts (both Toolserver accounts and Wikimedia accounts),
with different page lists and very inefficient code, and that the operators
do not check the edits manually. Is that correct? If so, there is an
underlying, fundamental problem with interwiki.py that using database
connections rather than HTTP API requests cannot fix.
Do you know the status of getting a solution built into MediaWiki (either
in core or in an extension) that could make interwiki.py completely
obsolete? It's my _strong_ recommendation that development effort be put
into a real solution rather than focusing on ways to make interwiki.py suck
less.
MZMcBride