James Linden wrote:
Why do you need to access the live Wikipedia for this? Using categorylinks.sql and page.sql, you should be able to fetch the same data, probably faster.
In my research, the answer to this question is two-fold:
A) Creating a local copy of Wikipedia (using MediaWiki and various import tools) is quite a process, and requires a significant investment of time and research in itself.
You don't need a full copy to, e.g., fetch infoboxes.
B) A few months ago, I pulled 333 semi-random articles from the live API -- of those, 329 had changed significantly since the 20100312 dump (the newest dump at the time). A new check against the 20110115 dump shows a similar percentage.
Getting updated data may be a reason, but I don't think that's what Ramesh wanted. Plus, you wanted 333 articles, not all 3 million...
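For reference, fetching the current wikitext of a page from the live MediaWiki API (the approach discussed above) only takes a few lines. This is a minimal sketch: the article title is just an example, and there is no rate limiting, User-Agent header, or error handling, all of which you would want for real crawling.

```python
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"

def revision_query_url(title):
    """Build an API URL requesting the latest revision's wikitext for `title`."""
    params = {
        "action": "query",        # standard MediaWiki query module
        "prop": "revisions",      # ask for revision data
        "rvprop": "content",      # include the wikitext itself
        "rvslots": "main",        # main content slot
        "titles": title,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)

# Example title only; any article name works.
url = revision_query_url("Python (programming language)")
print(url)
```

Fetching `url` (e.g. with `urllib.request.urlopen`) returns JSON containing the page's current wikitext, which is how you would compare live articles against a dump as described in B) above.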