I'm not sure I understand you.
Searching for "Amadeus Mozart" in the replicated databases could help,
yes, but the number of articles that share a common string across
different languages is quite small, isn't it?
It works for some specific concepts and personalities, but most of the
article titles need to be translated, and a search using wildcards or
regexps is not going to help for these.
Honestly, the pywikipedia team has changed a bit these last few
months, and the edit API will soon be available: I've been telling
myself for days that interwiki.py will sooner or later need a
rewrite. But it is not that easy.
I understand your concept of an "interwiki class", but finding such a
class does not appear to be that obvious.
If you have a general pseudo-algorithm able to outline a specific
class of articles on the same subject, please share it. But I think
that the current behavior -- starting from a specific page, building
the interwiki links graph, and indexing the cycles -- cannot be
avoided that easily, even if it is not optimal.
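For reference, the current behavior boils down to collecting the
connected component of the interwiki link graph around a start page.
A minimal sketch on a toy in-memory graph (the function name and data
are mine, not actual interwiki.py code, which fetches live pages):

```python
from collections import deque

def collect_interwiki_class(start, links):
    """Collect every page reachable from `start` by following
    interwiki links -- a breadth-first traversal of the link graph.

    `links` maps a (lang, title) page to the list of (lang, title)
    pages its interwiki links point to.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Toy graph: fr links only to en, en links only to de -- the whole
# class is still found from the French page.
links = {
    ("fr", "Wolfgang Amadeus Mozart"): [("en", "Wolfgang Amadeus Mozart")],
    ("en", "Wolfgang Amadeus Mozart"): [("de", "Wolfgang Amadeus Mozart")],
}
cls = collect_interwiki_class(("fr", "Wolfgang Amadeus Mozart"), links)
```

Note that the traversal only follows edges forward; a page linked *to*
but linking back to nothing else would still need every other member
visited to be discovered, which is exactly why whole classes have to
be walked.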
2008/6/11 Purodha <toolserver-l.wikipedia.org(a)publi.purodha.net>:
I could not store this comment on the blog server.
Feel free to put it there if you can, or forward it elsewhere,
if you see fit.
Since an interwiki link needing propagation may exist only once in
one specific wiki in one specific page, all pages having the
potential for interwiki linking in each language of a project need
to be read. There is no reason not to have a single bot doing
this, but as pywikipediabot is currently structured, it is always
operated starting from a selection of pages of one individual wiki
only. These selections may be huge, such as all articles in the
English Wikipedia (but no non-article pages, such as templates or
category pages, and no other language). So with the current
structure, it is advisable, for each language wiki, to have at
least one bot starting from it regularly, propagating the links
set "here only" to the remaining wikis.
There is another sad thing to mention. If only one link could not
be set - be it because of an edit conflict, a transient network
error, server overload, or because a bot is not allowed to access
a specific wiki - the entire bot run for all linked articles in
this interwiki class has to be repeated just to add this single
missing link. The majority of interwiki bots serve only a
comparatively small number of wikis. It's hard to get a single bot
to serve all language wikis: it requires a lot of labour due to
the sheer number of wikis there are. Each and every wiki requires
an individual account to be set up and an individual bot
application under rules individual to each wiki, which you have to
find, read, understand, and obey; proceedings and procedures
vary, and are in part contradictory between wikis. Even if you
follow their rules, some wiki communities, or their bureaucrats,
simply refuse, for one reason or another, or for none at all.
An "interwiki class" is the set of pages that are (or need to be)
linked to each other. Such classes can be as small as two pages,
and as big as one page from each wiki in a family.
A slightly redesigned interwiki bot reading replicated databases
and tables on the toolserver could be collecting class information
much more efficiently than interwiki.py currently does by
exporting groups of articles from each wiki. Provided there is no
significant replication lag, it would even be more up to date when
it comes to updating pages, because of its vastly higher speed of
collecting the members of a class. Such a redesign would also make
it easier to implement various helpful new ways of selecting which
pages to look at, e.g. "language='all',
title='Amadeus Mozart'", or ones using SQL wildcards or regular
expressions, etc.
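A minimal sketch of how such a class-seeding query against the
replicated databases could look, assuming the standard MediaWiki
schema (`langlinks` and `page` tables) and the toolserver's `_p`
database naming convention -- the function name is mine:

```python
def class_seed_query(dbname):
    """SQL listing every interwiki link recorded for one article in a
    replicated wiki database (e.g. dbname='enwiki_p').  The article
    title would be bound as a parameter at execution time; running the
    same query per language database seeds the whole interwiki class
    without exporting any pages.
    """
    return (
        "SELECT ll_lang, ll_title "
        "FROM {db}.langlinks "
        "JOIN {db}.page ON ll_from = page_id "
        "WHERE page_namespace = 0 AND page_title = %s"
    ).format(db=dbname)

q = class_seed_query("enwiki_p")
```

Selections like "language='all'" would then just mean iterating this
over every replicated database, and SQL `LIKE` patterns or `REGEXP`
could replace the exact-title condition.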
Greetings.
Purodha.
_______________________________________________
Toolserver-l mailing list
Toolserver-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
--
Nicolas Dumazet — NicDumZ [ nIk.d̪ymz ]
pywikipedia & mediawiki
Second year, ENSIMAG.
06 03 88 92 29