Hi,
You don't need the full dumps. Look at, for example, the tr.wp dump
that is running at the moment. You'll see the text dumps and also
dumps of various SQL tables. Look at the one that is labelled
"Wiki interlanguage link records."
You ought to be able to download those for all of the 'pedias that
you are interested in within reason; it will certainly be better than
trawling with the API. If I understand correctly what you are asking,
they have just the data you want.
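
In case it helps, here is a rough sketch of how fetching and reading
those table dumps might look. The URL layout in DUMP_URL, the "wiki"
database suffix, and the three-column row shape (source page id,
target language, target title) are my assumptions, so check them
against the dump index page and the file header before relying on
this. The function names are made up for illustration.

```python
import gzip
import re
import urllib.request

# ASSUMPTION: URL layout of the public dump server; verify on the index page.
DUMP_URL = "https://dumps.wikimedia.org/{wiki}/latest/{wiki}-latest-langlinks.sql.gz"


def langlinks_url(lang):
    """Return the assumed langlinks dump URL for a language code such as 'tr'."""
    wiki = lang.replace("-", "_") + "wiki"
    return DUMP_URL.format(wiki=wiki)


# ASSUMPTION: each row is (page id, 'language code', 'title') with
# MySQL-style backslash escapes inside the quoted strings.
ROW = re.compile(r"\((\d+),'((?:[^'\\]|\\.)*)','((?:[^'\\]|\\.)*)'\)")


def parse_insert_line(line):
    """Yield (page_id, target_lang, target_title) tuples from one INSERT line."""
    if not line.startswith("INSERT INTO"):
        return
    for page_id, lang, title in ROW.findall(line):
        # Undo backslash escapes such as \' and \\ in the title.
        yield int(page_id), lang, re.sub(r"\\(.)", r"\1", title)


def iter_langlinks(path):
    """Stream rows from a downloaded .sql.gz file without loading it whole."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield from parse_insert_line(line)


def download(lang, dest):
    """Fetch one wiki's langlinks dump to a local file (one request per wiki)."""
    urllib.request.urlretrieve(langlinks_url(lang), dest)
```

Calling download("tr", "trwiki-langlinks.sql.gz") and then looping over
iter_langlinks("trwiki-langlinks.sql.gz") would give one tuple per
interlanguage link, which is one download per wiki instead of one API
request per page.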
Cheers,
Robert
On 9/24/10, Max Semenik <maxsem.wiki(a)gmail.com> wrote:
> On 24.09.2010, 14:32 Robin wrote:
>> I would like to collect data on interlanguage links for academic
>> research purposes. I really do not want to use the dumps, since I
>> would need to download dumps of all language Wikipedias, which
>> would be huge.
>> I have written a script which goes through the API, but I am
>> wondering how often it is acceptable for me to query the API.
>> Assuming I do not run parallel queries, do I need to wait between
>> each query? If so, how long?
>
> Crawling all the Wikipedias is not an easy task either. Probably,
> toolserver.org would be more suitable. What data do you need, exactly?
>
> --
> Best regards,
> Max Semenik ([[User:MaxSem]])
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l