James Linden wrote:
>> Why do you need to access the live wikipedia for this?
>> Using categorylinks.sql and page.sql you should be able to fetch the
>> same data. Probably faster.
> In my research, the answer to this question is two-fold:
> A) Creating a local copy of Wikipedia (using MediaWiki and various
> import tools) is quite a process, and requires a significant
> investment of time and research in itself.
You don't need to do a full copy to, e.g., fetch infoboxes.
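For what it's worth, once page.sql and categorylinks.sql are loaded into a local database, pulling category membership is a single join on the standard MediaWiki schema (`page` and `categorylinks` tables). A minimal sketch — shown here with SQLite and made-up sample rows purely so it is self-contained; the real dumps are MySQL dumps:

```python
import sqlite3

# Sketch only: SQLite with invented rows stands in for a local MySQL
# database loaded from the page.sql and categorylinks.sql dumps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_title TEXT)")
conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
    (1, 0, "Albert_Einstein"),    # article (namespace 0)
    (2, 0, "Quantum_mechanics"),  # article (namespace 0)
    (3, 14, "Physics"),           # category page (namespace 14)
])
conn.executemany("INSERT INTO categorylinks VALUES (?, ?)", [
    (1, "Physicists"),
    (2, "Physics"),
    (3, "Science"),
])

# The join that answers "which articles are in category X?" straight
# from the dump tables -- no live API involved.
QUERY = """
    SELECT p.page_title
    FROM page AS p
    JOIN categorylinks AS cl ON cl.cl_from = p.page_id
    WHERE cl.cl_to = ? AND p.page_namespace = 0
"""

def pages_in_category(category):
    return [title for (title,) in conn.execute(QUERY, (category,))]

result = pages_in_category("Physics")
print(result)  # ['Quantum_mechanics']
```

Against a real MySQL load the query is the same apart from the driver's parameter style.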
> B) A few months ago, I pulled 333 semi-random articles from the live
> API -- of those, 329 have significant enough changes since the
> 20100312 dump (which was the newest dump at the time). A new check
> against the 20110115 dump shows a similar percentage.
Getting updated data may be a reason, but I don't think that's what
Ramesh wanted.
Plus, you wanted 333 articles, not all 3 million...
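The staleness check described in B) reduces to comparing each article's latest revision timestamp (e.g., as returned by the MediaWiki API with prop=revisions&rvprop=timestamp) against the dump's date. A hedged sketch of just that comparison step — the API call itself and the "significant change" judgment are omitted:

```python
from datetime import datetime

def changed_since_dump(last_rev_iso, dump_date):
    """True if the article was edited after the dump was taken.

    last_rev_iso: e.g. "2010-06-01T12:00:00Z" (MediaWiki API timestamp format)
    dump_date:    e.g. "20100312" (the dump's YYYYMMDD date string)
    """
    rev = datetime.strptime(last_rev_iso, "%Y-%m-%dT%H:%M:%SZ")
    dump = datetime.strptime(dump_date, "%Y%m%d")
    return rev > dump

print(changed_since_dump("2010-06-01T12:00:00Z", "20100312"))  # True
print(changed_since_dump("2010-03-01T08:30:00Z", "20100312"))  # False
```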