On Saturday, March 25, 2006 1:48 PM Daniel wrote:
Right now, no token is necessary to use WikiProxy - the "lock" will
become active when I update my tools next time. Then, you can get an
access token by asking me :)
Note that you will *never* need a token to access WikiProxy locally from
the toolserver. The IPs are whitelisted.
Thank you and Jakob!
> Well, I think this means that Stefan's team has to recode a lot. Pulling
> the titles and texts out of the XML dump is easy but you only get a new
> dump every 1 or 2 months. On the other hand XML is more robust while the
[...]
For the analysis of large volumes of texts doing it "live" isn't really
an option anyway, I think. And being able to handle XML dumps is a good
idea anyway :)
If you assume that one has to run this process over and over, you are right.
In our case we only need to run it _once_ (Ok, I admit: twice, because of
some testing). After that we try to visit only those articles which have
changed within, say, the last one or several days. We would even accept the
lowest priority while our process runs.
On the other hand: A dump could serve for this first 'full access', but only
if it's a recent one... (because we would then try to iterate only over the
delta since the timestamp of the dump).
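
To make the "delta since the timestamp of the dump" idea concrete, here is a
minimal sketch. It assumes we already hold a mapping from article title to
last-edit timestamp (how that mapping is obtained is left open), and that
timestamps are UTC strings like 2006-03-20T00:00:00Z. The function names and
the sample data are hypothetical, for illustration only.

```python
from datetime import datetime, timezone


def parse_ts(ts: str) -> datetime:
    """Parse a UTC timestamp such as '2006-03-20T12:00:00Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def changed_since(last_edits: dict, dump_ts: str) -> list:
    """Return the titles whose last edit is strictly newer than the dump."""
    cutoff = parse_ts(dump_ts)
    return sorted(t for t, ts in last_edits.items() if parse_ts(ts) > cutoff)


# Hypothetical sample data: title -> timestamp of the latest revision.
last_edits = {
    "Berlin": "2006-03-24T08:15:00Z",
    "Hamburg": "2006-03-01T10:00:00Z",
    "Köln": "2006-03-25T13:48:00Z",
}

# Only articles edited after the (assumed) dump timestamp need a live visit.
to_visit = changed_since(last_edits, "2006-03-20T00:00:00Z")
```

With a recent dump the cutoff is close to "now", so the list of articles to
visit live stays short; with an old dump it grows towards a full crawl.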
And: Pulling text out of the XML dump is not really that easy; it needs lots
of additional code (dumps from several tables, re-indexing, etc.) compared
to online access. And as far as I'm aware it's not repeatable so far: either
the path/filename of the most recent dewiki dump needs to be constant, or
the online request for the XML dump should tell us its timestamp.
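
For reference, a rough sketch of what the extraction side could look like in
Python, streaming the file so that memory use stays flat. It assumes a dump
layout with <page>, <title>, <revision>, <timestamp> and <text> elements and
one revision per page; the tiny inline sample stands in for a real dump and
is made up. It does not solve the re-indexing or the "which dump is the
latest" problem mentioned above.

```python
import io
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip an XML namespace prefix like '{uri}page' down to 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, timestamp, text) per <page> without loading the whole file."""
    title = timestamp = text = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "timestamp":
            timestamp = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, timestamp, text
            title = timestamp = text = None
            elem.clear()  # release the finished page's subtree


# Made-up stand-in for a dump file; element names are an assumption.
SAMPLE = b"""<mediawiki>
  <page>
    <title>Berlin</title>
    <revision>
      <timestamp>2006-03-24T08:15:00Z</timestamp>
      <text>Berlin ist die Hauptstadt.</text>
    </revision>
  </page>
  <page>
    <title>Hamburg</title>
    <revision>
      <timestamp>2006-03-01T10:00:00Z</timestamp>
      <text>Hamburg liegt an der Elbe.</text>
    </revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

For a real dump one would pass an open (possibly decompressed) file object
instead of the BytesIO sample.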
-- Stefan