On Fri, Sep 24, 2010 at 1:19 PM, Max Semenik <maxsem.wiki(a)gmail.com> wrote:
On 24.09.2010, 14:32 Robin wrote:
I would like to collect data on interlanguage links for academic research
purposes. I really do not want to use the dumps, since I would need to
download dumps of all language Wikipedias, which would be huge.
I have written a script which goes through the API, but I am wondering how
often it is acceptable for me to query the API. Assuming I do not run
parallel queries, do I need to wait between each query? If so, how long?
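(For illustration, a serial crawl like the one described could be sketched as
below. This is a minimal sketch: the one-second delay and maxlag=5 are polite
defaults I am assuming, not documented limits, and `fetch` is a hypothetical
callable standing in for whatever HTTP client the script uses.)

```python
# A minimal sketch of a serial langlinks crawl against the standard
# MediaWiki API endpoint; the 1-second delay and maxlag=5 are assumed
# polite defaults, not documented requirements.
import time
import urllib.parse

API_URL = "https://fr.wikipedia.org/w/api.php"

def build_langlinks_query(titles, llcontinue=None):
    """Parameters for action=query&prop=langlinks on a batch of titles."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": "|".join(titles),
        "lllimit": "500",   # maximum links returned per request
        "format": "json",
        "maxlag": "5",      # ask the server to refuse us when it is lagged
    }
    if llcontinue is not None:
        params["llcontinue"] = llcontinue
    return params

def crawl(title_batches, fetch, delay=1.0):
    """Fetch one batch at a time, sleeping `delay` seconds in between.

    `fetch` is any callable taking a URL and returning the response body;
    injecting it keeps the pacing logic testable without network access.
    """
    results = []
    for batch in title_batches:
        url = API_URL + "?" + urllib.parse.urlencode(build_langlinks_query(batch))
        results.append(fetch(url))
        time.sleep(delay)  # no parallel queries: one request at a time
    return results
```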
Crawling all the Wikipedias is not an easy task either; toolserver.org would
probably be more suitable. What data do you need, exactly?
Full dumps are not required for retrieving interlanguage links. For example,
the latest fr dump contains a dedicated file for them:
It will be a lot faster to download this file (only 75M) than to make more
than 1 million calls to the API for the fr wiki.
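(For anyone taking the dump route: the langlinks file is distributed as a
gzipped MySQL dump of multi-row INSERT statements. A rough sketch of pulling
(page_id, language, title) triples out of it might look like the following;
the regex is my simplified assumption about the tuple format and ignores
corner cases such as exotic escaping in titles.)

```python
# A rough sketch of reading a langlinks SQL dump, assuming the usual
# gzipped MySQL format of multi-row INSERT statements; the regex is a
# simplification and may miss unusual escaping.
import gzip
import re

# One tuple per language link: (ll_from, 'll_lang', 'll_title'),
# allowing backslash-escaped characters inside the quoted strings.
TUPLE_RE = re.compile(r"\((\d+),'((?:[^'\\]|\\.)*)','((?:[^'\\]|\\.)*)'\)")

def parse_langlinks(lines):
    """Yield (page_id, lang, title) triples from dump lines."""
    for line in lines:
        if not line.startswith("INSERT INTO"):
            continue
        for m in TUPLE_RE.finditer(line):
            yield int(m.group(1)), m.group(2), m.group(3)

def parse_dump(path):
    """Stream a .sql.gz dump without decompressing it to disk first."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        yield from parse_langlinks(f)
```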
Nico