Marco Schuster wrote:
Fetch them from the toolserver (there's a tool by duesentrieb for that). It will fetch almost all of them from the toolserver cluster and make a request to Wikipedia only if needed.
I highly doubt this is a "legal" use of the toolserver, and I suspect that fetching 800k revisions would be a huge resource load.
Thanks, Marco
PS: CC-ing toolserver list.
It's a legal use; the only problem is that the tool I wrote for it is quite slow. You shouldn't hit it at full speed. So it might actually be better to query the main server cluster, since they can distribute the load more nicely.
One day I'll rewrite WikiProxy and everything will be better :)
But by then, I do hope we have revision flags in the dumps, because that would be The Right Thing to use.
-- daniel