Hi Daniel, thanks for your answer - you wrote:
>> a) Are we now allowed - from a toolserver account - to iterate over
>> German and English articles (first all, then only the new ones) - or not?
> In theory yes, in practice no. As this would currently mean pulling
> every single article via HTTP, it is discouraged, because it creates a
> lot of load. Doing it through WikiProxy would mean that each revision is
> only loaded once, which makes this a bit better. But it's still slow
> (more than 1 second per article).
Where can I find the token? I'd like to process some 10 to 1000 articles
based on the templates used in them.
> Currently, the best way to bulk-process article text is to read from an
> XML dump. You can adapt the existing importers to fit your purpose; code
> is available in PHP, Java and C#, I believe.
Well, I think this means that Stefan's team has to recode a lot. Pulling
the titles and texts out of the XML dump is easy, but you only get a new
dump every one or two months. On the other hand, XML is more robust,
while the database structure changes with every MediaWiki version - for
instance, I was not aware of the external text storage before.
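For what it's worth, pulling titles and texts out of a dump and filtering by template really is only a few lines. This is just a sketch, not one of the existing importers: it assumes the export-0.3 dump schema (the namespace URI may differ for your dump version), and the template check is a naive regex rather than a real wikitext parser.

```python
# Sketch: stream pages from a MediaWiki XML dump and keep only those
# that transclude a given template. Assumes the export-0.3 schema;
# adjust NS for the dump version you actually have.
import re
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.3/}"  # assumption: 0.3 dump

def pages_with_template(dump_file, template):
    """Yield (title, wikitext) for pages transcluding {{template}}."""
    # Naive match for "{{Template|" or "{{Template}}"; a real parser
    # would also handle redirects, whitespace and parameters properly.
    pattern = re.compile(r"\{\{\s*%s\s*[|}]" % re.escape(template), re.I)
    for _event, elem in ET.iterparse(dump_file):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title") or ""
            text = elem.findtext(NS + "revision/" + NS + "text") or ""
            if pattern.search(text):
                yield title, text
            elem.clear()  # free memory -- full dumps are large
```

Because it streams with `iterparse` and clears each page element after use, memory stays flat even on a full dewiki dump; the bottleneck is just disk I/O.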
>> b) Is there a technical solution (in PHP, WikiProxy?) to our problem of
>> accessing all pages - even those residing in external storage?
> WikiProxy solves the problem of accessing external storage, for any page
> you want. It does not solve it very efficiently, so it should not be used
> to access *all* pages in one run.
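For the "10 to 1000 articles" case this per-page HTTP approach is fine. I don't have the WikiProxy interface in front of me, so as an illustration here is the same idea against the stock Special:Export endpoint (which any MediaWiki provides) - one request per page, so good for a handful of pages, hopeless for the whole wiki:

```python
# Sketch: fetch one page's current wikitext over HTTP via Special:Export.
# One request per page -- use only for small batches, not a full crawl.
from urllib.parse import quote
from urllib.request import urlopen
import xml.etree.ElementTree as ET

def export_url(lang, title):
    """Build the Special:Export URL for a single page."""
    return "https://%s.wikipedia.org/wiki/Special:Export/%s" % (
        lang, quote(title.replace(" ", "_")))

def fetch_wikitext(lang, title):
    """Return the current wikitext of one page, or None if not found."""
    with urlopen(export_url(lang, title)) as resp:
        tree = ET.parse(resp)
    # Match on the local tag name so the export schema version
    # (which changes between MediaWiki releases) doesn't matter.
    for elem in tree.iter():
        if elem.tag.endswith("}text"):
            return elem.text or ""
    return None
```

Matching `}text` by suffix sidesteps the schema-version problem mentioned above: the XML namespace URI changes between MediaWiki releases, but the element names stay the same.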
>> c) In order to mirror those pages on the toolserver, can perhaps Kate
>> or Brion come to the rescue?
> Again, in theory, yes. In practice, both are quite busy; maybe we should
> try asking someone else (like, I don't know... JeLuF, perhaps?). I
> imagine this would involve setting up a second MySQL server instance,
> and replication for it. There are probably some other tricky things to
> take care of. Perhaps we should officially request technical help with
> this from the e.V. I have already talked to elian about it.
What do you mean? The e.V. can support us with money and fame, but it's
pretty inexperienced in setting up MySQL servers ;-)
> On a slightly related note: we still do not get updates for any data on
> the Asian cluster (the databases we have are stuck in October).
> Apparently, it would be possible to resolve this, but it's tricky. The
> *real* solution would be to have multi-master replication, which (I
> am told) is expected to be supported by MySQL 5.2.
Sounds like there will be no definite solution before MySQL 5.2, then.
Greetings,
Jakob