2010/9/24 Robin Ryder <robin.ryder@ensae.fr>:
- I don't need to crawl the entire Wikipedia, only (for example) articles in a category. ~1,000 articles would be a good start, and I definitely won't be going above ~40,000 articles.
- For every article in the data set, I need to follow every interlanguage link, and get the article creation date (i.e. creation date of [[en:Brad Pitt]], [[fr:Brad Pitt]], [[it:Brad Pitt]], etc). As far as I can tell, this means that I need one query for every language link.
Unfortunately, this is true. You can't use a generator because those don't work with interwiki titles, and you can't query multiple titles in one request because prop=revisions only allows that in get-only-the-latest-revision mode (and you want the earliest revision).
Hitting the API repeatedly without waiting between requests is considered acceptable usage AFAIK, as long as you make the requests one at a time rather than in parallel, but I do think that the Toolserver would better suit your needs.
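
To make the one-query-per-language-link approach concrete, here is a rough Python sketch, not a tested crawler. It fetches the langlinks for one article, then asks each wiki for the oldest revision with prop=revisions, rvdir=newer and rvlimit=1, one request at a time. The function names, the User-Agent string and the assumption that every language code maps directly to a <lang>.wikipedia.org host are mine (a few codes do not map that way), and continuation of long langlinks lists is not handled.

```python
import json
import urllib.parse
import urllib.request

# Assumption: each interlanguage prefix is also the wiki's subdomain.
API = "https://{lang}.wikipedia.org/w/api.php"


def first_revision_url(lang, title):
    """URL asking for the oldest revision's timestamp of a single title."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": 1,          # only one revision...
        "rvdir": "newer",      # ...listed oldest first
        "rvprop": "timestamp",
        "format": "json",
    }
    return API.format(lang=lang) + "?" + urllib.parse.urlencode(params)


def langlinks_url(lang, title):
    """URL asking for the interlanguage links of a single title."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllimit": "max",
        "format": "json",
    }
    return API.format(lang=lang) + "?" + urllib.parse.urlencode(params)


def parse_creation_timestamp(response):
    """Return the first revision's timestamp, or None if the page is missing."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        revisions = page.get("revisions")
        if revisions:
            return revisions[0]["timestamp"]
    return None


def parse_langlinks(response):
    """Return a list of (lang, title) pairs from a prop=langlinks response."""
    pages = response.get("query", {}).get("pages", {})
    links = []
    for page in pages.values():
        for link in page.get("langlinks", []):
            links.append((link["lang"], link["*"]))
    return links


def fetch_json(url, user_agent="creation-date-sketch/0.1 (hypothetical contact)"):
    """Perform one GET request and decode the JSON body."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def creation_dates(lang, title):
    """Map language code -> creation timestamp, using serial requests only."""
    dates = {lang: parse_creation_timestamp(fetch_json(first_revision_url(lang, title)))}
    links = parse_langlinks(fetch_json(langlinks_url(lang, title)))
    for other_lang, other_title in links:
        url = first_revision_url(other_lang, other_title)
        dates[other_lang] = parse_creation_timestamp(fetch_json(url))
    return dates
```

Calling creation_dates("en", "Brad Pitt") would issue one langlinks request plus one revisions request per language, which is where the cost for ~40,000 articles comes from.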
Roan Kattouw (Catrope)