Hi,
Thanks for the quick answers, and for the useful link.
My previous e-mail was not detailed enough; sorry about that. Let me clarify:
- I don't need to crawl all of Wikipedia, only (for example) the articles in a category. ~1,000 articles would be a good start, and I definitely won't go above ~40,000 articles.
- For every article in the data set, I need to follow every interlanguage link and get the creation date of each article (i.e. the creation dates of [[en:Brad Pitt]], [[fr:Brad Pitt]], [[it:Brad Pitt]], etc.). As far as I can tell, this means one query per language link; a rough sketch of the queries is below.
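For concreteness, here is roughly what my script does, in Python with the requests library (the User-Agent string and e-mail address are placeholders, and "Brad Pitt" is just an example title; the article list itself would come from list=categorymembers):

import requests

HEADERS = {"User-Agent": "InterlanguageResearch/0.1 (robin@example.org)"}

def creation_date(lang, title):
    """Timestamp of the first revision of a page on the given language Wikipedia."""
    r = requests.get(
        "https://%s.wikipedia.org/w/api.php" % lang,
        params={
            "action": "query",
            "titles": title,
            "prop": "revisions",
            "rvprop": "timestamp",
            "rvdir": "newer",   # oldest revision first
            "rvlimit": 1,       # only the first (creation) revision
            "format": "json",
        },
        headers=HEADERS,
    )
    page = next(iter(r.json()["query"]["pages"].values()))
    return page["revisions"][0]["timestamp"]

def langlinks(title):
    """All interlanguage links of an en.wikipedia article."""
    r = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "titles": title,
            "prop": "langlinks",
            "lllimit": 500,
            "format": "json",
        },
        headers=HEADERS,
    )
    page = next(iter(r.json()["query"]["pages"].values()))
    return [(ll["lang"], ll["*"]) for ll in page.get("langlinks", [])]

# One query for the langlinks, then one per language for the creation date.
print(creation_date("en", "Brad Pitt"))
for lang, title in langlinks("Brad Pitt"):
    print(lang, title, creation_date(lang, title))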
The data are reasonably easy to get through the API. If my queries risk overloading the server, I am obviously happy to go through the toolserver (once my account gets approved!).
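On pacing: everything runs serially, and I back off whenever the servers report lag, roughly like this (polite_get is my own wrapper, and the 1-second pause is just my guess at a reasonable rate rather than anything documented):

import time
import requests

def polite_get(url, params):
    """GET with maxlag: back off whenever the servers report replication lag."""
    params = dict(params, maxlag=5, format="json")
    while True:
        r = requests.get(url, params=params,
                         headers={"User-Agent": "InterlanguageResearch/0.1 (robin@example.org)"})
        data = r.json()
        if data.get("error", {}).get("code") == "maxlag":
            time.sleep(5)       # servers are lagged: wait and retry
            continue
        time.sleep(1)           # serial queries, ~1 request per second
        return data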
Robin Ryder
----
Postdoctoral researcher
CEREMADE - Paris Dauphine and CREST - INSEE
On 24.09.2010, 14:32 Robin wrote:
I would like to collect data on interlanguage links for academic research purposes. I really do not want to use the dumps, since I would need to download dumps of all language Wikipedias, which would be huge. I have written a script which goes through the API, but I am wondering how often it is acceptable for me to query the API. Assuming I do not run parallel queries, do I need to wait between each query? If so, how long?
Crawling all the Wikipedias is not an easy task either; toolserver.org would probably be more suitable. What data do you need, exactly?
--
Best regards,
Max Semenik ([[User:MaxSem]])