On 3/10/11 6:29 AM, Paul Houle wrote:
I can say, positively, that you'll get the job done faster by
downloading the dump file and cracking into it directly. I've got scripts that can download and extract stuff from the XML dump in an hour or so. I still have some processes that use the API, but I'm increasingly using the dumps because it's faster and easier.
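For a sense of what "cracking into the dump directly" can look like: below is a minimal sketch (not Paul's actual scripts, which aren't shown here) that streams pages out of a MediaWiki XML export with only the Python standard library. `iterparse` avoids loading the multi-gigabyte file into memory; for the compressed dump you'd open it with `bz2.open(path, "rt")` and pass that handle in. The sample XML at the bottom stands in for a real dump file.

```python
# Sketch: stream (title, wikitext) pairs from a MediaWiki XML export
# without loading the whole file into memory.
import io
import xml.etree.ElementTree as ET

def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki export stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Match the namespaced </page> tag; the {*} wildcard in the
        # path queries works across export schema versions (Python 3.8+).
        if elem.tag.endswith("}page") or elem.tag == "page":
            title = elem.findtext("{*}title")
            text = elem.findtext("{*}revision/{*}text")
            yield title, text
            elem.clear()  # free the subtree we just consumed

# Tiny inline sample standing in for a real dump file:
sample = io.StringIO(
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    '<page><title>Example</title>'
    '<revision><text>Hello world</text></revision></page>'
    '</mediawiki>'
)
print(list(iter_pages(sample)))  # [('Example', 'Hello world')]
```

The same loop works unchanged on the full enwiki `pages-articles.xml.bz2` dump, since the element structure is identical; only the file handle changes.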
You're likely correct. Also, I've recently been exposed to the 'wikipedia offline patch' extension (http://code.google.com/p/wikipedia-offline-patch/), which I believe allows you to use a compressed dump as your db storage, saving you the pain/space of uncompressing the dump file. Probably worth a look.
Arthur