[Toolserver-l] A blog comment.

Purodha toolserver-l.wikipedia.org at publi.purodha.net
Wed Jun 11 11:02:57 UTC 2008


I could not store this comment on the blog server.
Feel free to put it there if you can, or forward it elsewhere,
if you see fit.

Since an interwiki link needing propagation may exist only once,
in one specific page of one specific wiki, all pages with the
potential for interwiki linking in each language of a project need
to be read. There is no reason not to have a single bot doing
this, but as pywikipediabot is currently structured, it is always
operated starting from a selection of pages of one individual wiki
only. These selections may be huge, such as all articles in the
English Wikipedia (but no non-article pages, such as templates or
category pages, and no other language). So with the current
structure, it is advisable, for each language wiki, to have at
least one bot starting from it regularly, propagating the links
set "here only" to the remaining wikis.

There is another sad thing to mention. If only one link could not
be set - be it because of an edit conflict, a transient network
error, server overload, or because a bot is not allowed to access
a specific wiki - the entire bot run for all linked articles in
this interwiki class has to be repeated just to add this single
missing link. The majority of interwiki bots serve only a
comparatively small number of wikis. It's hard to get a single bot
to serve all language wikis. It requires a lot of labour due to
the sheer number of wikis there are: each and every wiki requires
an individual account to be set up and an individual bot
application following rules individual to each wiki, which you
have to find, read, understand, and obey. Proceedings and
procedures vary, and are in part contradictory between wikis. Even
if you follow their rules, some wiki communities, or their
bureaucrats, just don't process the application, for one reason
or another, or for none at all.

An "interwiki class" is a set of pages that are (or need to be)
all linked to each other. Such classes can be as small as two
pages, and as big as one page from each wiki in a family.
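
To make the definition concrete: if every interwiki link is
treated as an undirected edge between two pages, an interwiki
class is a connected component of the resulting graph. What
follows is only a minimal sketch under that reading. The function
names, the (language, title) tuples and the sample data are made
up for illustration and are not part of pywikipediabot.

```python
from collections import defaultdict


def interwiki_classes(links):
    """Group pages into classes, given (source, target) link pairs.

    Each page is a hypothetical (language, title) tuple. A link is
    treated as undirected: a single link in a single wiki is enough
    to join two pages into the same class.
    """
    parent = {}

    def find(page):
        # Union-find lookup with path halving.
        parent.setdefault(page, page)
        while parent[page] != page:
            parent[page] = parent[parent[page]]
            page = parent[page]
        return page

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    classes = defaultdict(set)
    for page in list(parent):
        classes[find(page)].add(page)
    return list(classes.values())


def missing_links(cls, links):
    """Return the directed (source, target) links not yet set in a class."""
    existing = set(links)
    return sorted(
        (a, b) for a in cls for b in cls if a != b and (a, b) not in existing
    )
```

With three pages joined by only two links, interwiki_classes()
returns one class of three, and missing_links() lists the four
directed links a bot would still have to set.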

A slightly redesigned interwiki bot reading replicated databases
and tables on the toolserver could collect class information
much more efficiently than interwiki.py currently does by
exporting groups of articles from each wiki. Provided there is no
significant replication lag, it would be even more up to date when
it comes to updating pages, because it collects the members of a
class so much faster. Such a redesign would also make it easier
to implement various helpful new ways of selecting which pages to
look at, e.g. "language='all', title='Amadeus Mozart'", or ones
using SQL wildcards or regular expressions, etc.
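
The kind of selection meant here might look roughly like the
following. This is only a sketch: it uses an in-memory SQLite
table as a stand-in for the replicated databases, and the table
name "pages", its columns, the function name and the sample rows
are all invented for illustration. The real toolserver schema and
access method differ.

```python
import sqlite3


def select_pages(conn, language="all", title_pattern="%"):
    """Return (lang, title) rows matching a language and a LIKE pattern.

    language="all" means no restriction on the wiki language.
    """
    query = "SELECT lang, title FROM pages WHERE title LIKE ?"
    params = [title_pattern]
    if language != "all":
        query += " AND lang = ?"
        params.append(language)
    query += " ORDER BY lang, title"
    return conn.execute(query, params).fetchall()


# Hypothetical stand-in for the replicated data: one row per page.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (lang TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [
        ("de", "Wolfgang Amadeus Mozart"),
        ("en", "Wolfgang Amadeus Mozart"),
        ("en", "Leopold Mozart"),
        ("ksh", "Wolfgang Amadeus Mozart"),
    ],
)

# Same title pattern across every language, then one language only.
all_langs = select_pages(conn, language="all", title_pattern="%Amadeus Mozart")
english_only = select_pages(conn, language="en", title_pattern="%Mozart")
```

Here "%" is the SQL wildcard for any run of characters, so the
first call picks the matching page in every language, while the
second is limited to one wiki.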

Greetings.
Purodha.




