Hi Daniel, thanks for your answer - you wrote:
a) Are we now allowed - from a toolserver account - to iterate over German and English articles (first all, then only the new ones) - or not?
In theory yes, in practice no. As this would currently mean pulling every single article via HTTP, it is discouraged, because it creates a lot of load. Doing it through WikiProxy would mean that each revision is only loaded once, which makes this a bit better. But it's still slow (more than 1 sec per article).
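The once-per-revision behaviour described here is essentially a cache keyed by revision ID. A minimal sketch of that idea in Python - the fetch callback and its return value are hypothetical stand-ins, not WikiProxy's actual interface:

```python
# Sketch of per-revision caching, the pattern WikiProxy reportedly uses.
# fetch_revision() and its default backend are illustrative, not real APIs.

cache = {}

def fetch_revision(rev_id, fetch=lambda r: "<text of revision %d>" % r):
    """Return revision text, hitting the backend only once per revision ID."""
    if rev_id not in cache:
        cache[rev_id] = fetch(rev_id)   # the expensive HTTP round-trip (>1 s each)
    return cache[rev_id]
```

Since revisions are immutable, the cache never needs invalidation - which is why the scheme helps at all.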
Where can I find the token? I'd like to process some 10 to 1000 articles based on the templates used in them.
Currently, the best way to bulk-process article text is to read from an XML dump. You can adapt the existing importers to fit your purpose; code is available in PHP, Java and C#, I believe.
Well, I think this means that Stefan's team has to recode a lot. Pulling the titles and texts out of the XML dump is easy, but you only get a new dump every 1 or 2 months. On the other hand, XML is more robust, while the database structure will change with every MediaWiki version - for instance, I was not aware of the external text storage before.
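For the template-based selection mentioned above, scanning a dump as a stream is straightforward. A sketch in Python (tag names follow the MediaWiki export format; the namespace URI varies between dump versions, so it is stripped here):

```python
import xml.etree.ElementTree as ET

def pages_with_template(dump_file, template):
    """Yield (title, text) for dump pages whose wikitext uses {{template...}}.

    Streams the dump with iterparse, so memory use stays flat no matter
    how large the dump file is.
    """
    needle = "{{" + template
    title, text = None, ""
    for _event, elem in ET.iterparse(dump_file):
        tag = elem.tag.rsplit("}", 1)[-1]   # strip the export-format namespace
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            if needle in text:
                yield title, text
            title, text = None, ""          # reset for the next page
            elem.clear()                    # free the finished subtree
```

The naive substring match on "{{" + template is of course an approximation; real wikitext allows whitespace and case variations after the braces.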
b) Is there a technical solution (in PHP, WikiProxy?) to solve our problem trying to access all pages - even those residing in external storage?
WikiProxy solves the problem of accessing external storage, for any page you want. It does not solve it very efficiently, so it should not be used to access *all* pages in a run.
c) In order to mirror those pages on the toolserver, can perhaps Kate or Brion come to the rescue?
Again, in theory, yes. In practice, both are quite busy; maybe we should try asking someone else (like, I don't know... JeLuF, perhaps?). I imagine this would involve setting up a second MySQL server instance and replication for it. There are probably some other tricky things to take care of. Perhaps we should officially request technical help with this from the e.V. I have already talked to elian about it.
What do you mean? The e.V. can support with money and fame, but it's pretty inexperienced in setting up MySQL servers ;-)
On a slightly related note: we still do not get updates for any data on the Asian cluster (the databases we have are stuck in October). Apparently, it would be possible to resolve this, but it's tricky. The *real* solution would be to have multi-master replication, which (I am told) is expected to be supported by MySQL 5.2.
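For context, the replication in question is classic MySQL single-master replication, which is configured with my.cnf fragments roughly like this (server IDs and log names below are placeholders):

```ini
# master my.cnf (illustrative values only)
[mysqld]
server-id = 1
log-bin   = mysql-bin   ; the master must write a binary log

# slave my.cnf
[mysqld]
server-id = 2           ; must differ from the master's
```

Each slave then points at one master with CHANGE MASTER TO. Since a slave can follow only a single master, a second cluster like the Asian one cannot feed the same replica - which is why multi-master replication is the real fix here.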
Sounds like no definite solution before MySQL 5.2, then.
Greetings, Jakob