-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On Wed, Jan 28, 2009 at 1:13 AM, Daniel Kinzler wrote:
Marco Schuster wrote:
Fetch them from the toolserver (there's a tool by duesentrieb for that). It will catch almost all of them from the toolserver cluster, and make a request to wikipedia only if needed.
I highly doubt this is "legal" use for the toolserver, and I pretty much guess that 800k revisions to fetch would be a huge resource load.
Thanks, Marco
PS: CC-ing toolserver list.
It's a legal use; the only problem is that the tool I wrote for it is quite slow. You shouldn't hit it at full speed. So it might actually be better to query the main server cluster, which can distribute the load more evenly.
What is the best speed, actually? 2 requests per second? Or can I go up to 4?
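To make the question concrete, here is a rough sketch of the kind of throttled loop I have in mind. The rate of 2 per second is only the number I'm asking about, not an agreed limit, and `fetch` is a hypothetical callable standing in for whatever actually retrieves a revision (WikiProxy or the main cluster); none of these names come from an existing tool.

```python
import time


class Throttle:
    """Space out calls so that at most `rate` of them start per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate   # minimum gap between two requests
        self.clock = clock           # injectable so the loop can be tested
        self.sleep = sleep
        self.next_slot = None        # earliest time the next call may start

    def wait(self):
        now = self.clock()
        if self.next_slot is not None and now < self.next_slot:
            self.sleep(self.next_slot - now)
            now = max(self.clock(), self.next_slot)
        self.next_slot = now + self.interval


def fetch_all(rev_ids, fetch, throttle):
    """Fetch each revision in turn, never faster than the throttle allows.

    `fetch` is a placeholder (assumption): any callable taking a revision
    id and returning its content.
    """
    results = {}
    for rev_id in rev_ids:
        throttle.wait()
        results[rev_id] = fetch(rev_id)
    return results
```

Usage would be something like `fetch_all(ids, my_fetch, Throttle(2))`. At 2 requests per second, 800k revisions would take roughly 400,000 seconds, so about 4.6 days; at 4 per second, about 2.3 days.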
One day I'll rewrite WikiProxy and everything will be better :)
:)
But by then, I do hope we have revision flags in the dumps, because that would be The Right Thing to use.
Still, using the dumps would require me to get the full history dump because I only want flagged revisions and not current revisions without the flag.
Marco