On Saturday, March 25, 2006 1:48 PM Daniel wrote:
Right now, no token is necessary to use WikiProxy - the "lock" will become active when I update my tools next time. Then, you can get an access token by asking me :)
Note that you will *never* need a token to access WikiProxy locally from the toolserver. The IPs are whitelisted.
Thanks to you and Jakob!
Well, I think this means that Stefan's team has to recode a lot. Pulling the titles and texts out of the XML dump is easy, but you only get a new dump every 1 or 2 months. On the other hand XML is more robust while the
[...]
For the analysis of large volumes of text, doing it "live" isn't really an option anyway, I think. And being able to handle XML dumps is a good idea in any case :)
If you assume that one has to run this process repeatedly, you are right. In our case we only need to run it _once_ (OK, I admit: twice, because of some testing). After that we try to visit only those articles which have changed since, say, one or several days ago. We would even accept the lowest priority while our process runs.
On the other hand: a dump could serve for this first 'full access', but only if it's a recent one... ('cause we'd then try to iterate only over the delta since the timestamp of the dump).
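To make the delta idea concrete, here is a minimal sketch. It assumes we already have, from somewhere, a mapping of article title to last-modified timestamp in ISO 8601 UTC form; the names `parse_ts`, `changed_since`, `LAST_MODIFIED` and `DUMP_TIME`, and the sample data, are hypothetical and only for illustration.

```python
from datetime import datetime, timezone


def parse_ts(ts):
    """Parse a timestamp like '2006-03-20T08:15:00Z' into an aware UTC datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def changed_since(last_modified, dump_time):
    """Return the titles whose last modification is newer than dump_time."""
    return sorted(
        title for title, ts in last_modified.items() if parse_ts(ts) > dump_time
    )


# Hypothetical sample data: title -> last-modified timestamp.
LAST_MODIFIED = {
    "Alpha": "2006-03-01T10:00:00Z",
    "Beta": "2006-03-24T18:30:00Z",
    "Gamma": "2006-03-25T09:00:00Z",
}

# Assumed timestamp of the dump used for the first full pass.
DUMP_TIME = parse_ts("2006-03-20T00:00:00Z")

TO_REVISIT = changed_since(LAST_MODIFIED, DUMP_TIME)
```

Only the titles in `TO_REVISIT` would need to be fetched again; everything older than the dump is already covered by the first full pass.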
And: pulling text out of the XML dump is not really that easy; it needs lots of additional code (dumps from several tables, re-indexing, etc.) compared to online access. And as far as I'm aware it's not repeatable so far: e.g. either the path/filename of the most recent dewiki dump needs to be constant, or the online request for the XML dump should tell us its timestamp.
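For reference, the pure parsing part might look roughly like the sketch below. It assumes the dump follows the MediaWiki XML export layout (`<page>` containing `<title>` and `<revision>` with `<timestamp>` and `<text>`); the helper names and the sample document are made up for illustration, and it covers neither locating the most recent dump nor any re-indexing.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, timestamp, text) per <page>, streaming through the file.

    If a page carries several revisions, the values of the last one win.
    """
    title = timestamp = text = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "timestamp":
            timestamp = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, timestamp, text
            title = timestamp = text = None
            elem.clear()  # drop the page's children to keep memory use low


# Tiny made-up sample; the namespace URL is illustrative and is ignored anyway.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Beispiel</title>
    <revision>
      <timestamp>2006-03-01T12:00:00Z</timestamp>
      <text>Erster Text</text>
    </revision>
  </page>
  <page>
    <title>Leer</title>
    <revision>
      <timestamp>2006-03-02T08:00:00Z</timestamp>
      <text></text>
    </revision>
  </page>
</mediawiki>"""

PAGES = list(iter_pages(io.BytesIO(SAMPLE.encode("utf-8"))))
```

The timestamp of each revision comes out alongside the text, so the same pass could feed the "changed since the dump" comparison; the timestamp of the dump itself would still have to come from elsewhere.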
-- Stefan