Hi Daniel, thanks for your answer - you wrote:
a) Are we now allowed - from a toolserver account - to iterate over German and English articles (first all, then only the new ones) - or not?
In theory yes, in practice no. As this would currently mean pulling every single article via HTTP, it is discouraged, because it creates a lot of load. Doing it through WikiProxy would mean that each revision is only loaded once, which makes this a bit better. But it's still slow (more than 1 sec per article).
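The once-per-revision behaviour described here is essentially a cache keyed by revision ID. A minimal sketch of that idea in Python - the fetch callback and its return value are hypothetical stand-ins, not WikiProxy's actual interface:

```python
# Sketch of per-revision caching, the pattern WikiProxy reportedly uses.
# fetch_revision() and its default backend are illustrative, not real APIs.

cache = {}

def fetch_revision(rev_id, fetch=lambda r: "<text of revision %d>" % r):
    """Return revision text, hitting the backend only once per revision ID."""
    if rev_id not in cache:
        cache[rev_id] = fetch(rev_id)   # the expensive HTTP round-trip (>1 s each)
    return cache[rev_id]
```

Since revisions are immutable, the cache never needs invalidation - which is why the scheme helps at all.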
Where can I find the token? I'd like to process some 10 to 1000 articles based on the templates used in them.
Currently, the best way to bulk-process article text is to read from an XML dump. You can adapt the existing importers to fit your purpose; code is available in PHP, Java and C#, I believe.
Well, I think this means that Stefan's team has to recode a lot. Pulling the titles and texts out of the XML dump is easy, but you only get a new dump every 1 or 2 months. On the other hand, XML is more robust, while the database structure will change with every MediaWiki version - for instance, I was not aware of the external text storage before.
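For the template-based selection mentioned above, scanning a dump as a stream is straightforward. A sketch in Python (tag names follow the MediaWiki export format; the namespace URI varies between dump versions, so it is stripped here):

```python
import xml.etree.ElementTree as ET

def pages_with_template(dump_file, template):
    """Yield (title, text) for dump pages whose wikitext uses {{template...}}.

    Streams the dump with iterparse, so memory use stays flat no matter
    how large the dump file is.
    """
    needle = "{{" + template
    title, text = None, ""
    for _event, elem in ET.iterparse(dump_file):
        tag = elem.tag.rsplit("}", 1)[-1]   # strip the export-format namespace
        if tag == "title":
            title = elem.text
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            if needle in text:
                yield title, text
            title, text = None, ""          # reset for the next page
            elem.clear()                    # free the finished subtree
```

The naive substring match on "{{" + template is of course an approximation; real wikitext allows whitespace and case variations after the braces.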
b) Is there a technical solution (in PHP, WikiProxy?) to solve our problem trying to access all pages - even those residing in external storage?
WikiProxy solves the problem of accessing external storage, for any page you want. It does not solve it very efficiently, so it should not be used to access *all* pages in a run.
c) In order to mirror those pages on the toolserver, can perhaps Kate or Brion come to the rescue?
Again, in theory, yes. In practice, both are quite busy; maybe we should try asking someone else (like, I don't know... JeLuF, perhaps?). I imagine this would involve setting up a second MySQL server instance and replication for it. There are probably some other tricky things to take care of. Perhaps we should officially request technical help with this from the e.V. I have already talked to elian about it.
What do you mean? The e.V. can support with money and fame, but it's pretty inexperienced in setting up MySQL servers ;-)
On a slightly related note: we still do not get updates for any data on the Asian cluster (the databases we have are stuck in October). Apparently, it would be possible to resolve this, but it's tricky. The *real* solution would be to have multi-master replication, which (I am told) is expected to be supported by MySQL 5.2.
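For context, the replication in question is classic MySQL single-master replication, which is configured with my.cnf fragments roughly like this (server IDs and log names below are placeholders):

```ini
# master my.cnf (illustrative values only)
[mysqld]
server-id = 1
log-bin   = mysql-bin   ; the master must write a binary log

# slave my.cnf
[mysqld]
server-id = 2           ; must differ from the master's
```

Each slave then points at one master with CHANGE MASTER TO. Since a slave can follow only a single master, a second cluster like the Asian one cannot feed the same replica - which is why multi-master replication is the real fix here.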
Sounds like no definite solution before MySQL 5.2, then.
Greetings, Jakob