2010/9/24 Robin Ryder <robin.ryder(a)ensae.fr>fr>:
- I don't need to crawl the entire Wikipedia, only
(for example) articles in
a category. ~1,000 articles would be a good start, and I definitely won't be
going above ~40,000 articles.
- For every article in the data set, I need to follow every interlanguage
link, and get the article creation date (i.e. creation date of [[en:Brad
Pitt]], [[fr:Brad Pitt]], [[it:Brad Pitt]], etc). As far as I can tell, this
means that I need one query for every language link.
Unfortunately, this is true. You can't use a generator because those
don't work with interwiki titles, and you can't query multiple titles
in one request because prop=revisions only allows that in
get-only-the-latest-revision mode (and you want the earliest
revision).
Hitting the API repeatedly without waiting between requests and
without making parallel requests is considered acceptable usage AFAIK,
but I do think that the Toolserver would better suit your needs.
Roan Kattouw (Catrope)