I want to find out the length of a bunch of articles. I have done this before for the Swedish Wikipedia by importing the page.sql dump into a local MySQL instance, which works just fine.
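Concretely, once the dump is imported, the length I'm after is just page_len in the page table. A minimal sketch of what I do, assuming DBI with DBD::mysql is installed; the database name, credentials and titles are placeholders:

    #!/usr/bin/perl
    # Read page_len straight from the imported page table.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:database=svwiki', 'user', 'password',
                           { RaiseError => 1 });
    my $sth = $dbh->prepare(
        'SELECT page_title, page_len FROM page
         WHERE page_namespace = 0 AND page_title = ?'
    );
    for my $title ('Sverige', 'Stockholm') {   # placeholder titles
        $sth->execute($title);
        while (my ($t, $len) = $sth->fetchrow_array) {
            print "$t\t$len\n";
        }
    }
    $dbh->disconnect;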
But now that I'm trying the same for the English Wikipedia, the database import (some 10 million rows, averaging 94 bytes each) looks like it will take somewhere between 24 and 48 hours: with keys disabled, I'm importing around 4,500 rows per minute. That seems like overkill just to find the length of some 1,000 articles, especially if I want to do it again when the next dump becomes available. Is there some API on the Toolserver that I can use instead? Or should I consider retrieving pages with action=raw from the live server and just counting the bytes? Where do I start?
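If fetching raw pages is the way to go, I imagine something along these lines. This is only a sketch: the user agent string and titles are made up, and I'm assuming the byte count of the action=raw wikitext corresponds to page_len for the current revision:

    #!/usr/bin/perl
    # Fetch the raw wikitext of each article and count the bytes.
    use strict;
    use warnings;
    use LWP::UserAgent;
    use URI::Escape qw(uri_escape_utf8);

    my $ua = LWP::UserAgent->new(agent => 'article-length-check/0.1');  # placeholder UA

    for my $title ('Sweden', 'Stockholm') {   # placeholder titles
        my $url = 'https://en.wikipedia.org/w/index.php?action=raw&title='
                . uri_escape_utf8($title);
        my $res = $ua->get($url);
        next unless $res->is_success;
        # ->content is the undecoded response body, so length() counts bytes,
        # which should match what page_len stores for the page's source text
        printf "%s\t%d\n", $title, length($res->content);
    }

That is one HTTP request per article, though, which I'd rather avoid for a large batch.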
I could even write a Perl script that parses the INSERT statements in page.sql and extracts the information I need in a single pass, but that is not really what a MySQL dump is meant for.
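Roughly, I have something like this in mind. The titles are placeholders, and the field positions (page_len as the last column) follow the CREATE TABLE statement at the top of page.sql, so the regex would need adjusting if the schema differs:

    #!/usr/bin/perl
    # Scan the INSERT statements in page.sql and pull out page_title and
    # page_len without loading anything into MySQL.
    use strict;
    use warnings;

    # placeholder titles; note that page_title stores spaces as underscores
    my %wanted = map { $_ => 1 } ('Sweden', 'Albert_Einstein');

    open my $fh, '<', 'page.sql' or die "page.sql: $!";
    while (my $line = <$fh>) {
        next unless $line =~ /^INSERT INTO/;
        # each row looks like (page_id,page_namespace,'page_title',...,page_len)
        while ($line =~ /\((\d+),(\d+),'((?:[^'\\]|\\.)*)',[^)]*?,(\d+)\)/g) {
            my ($ns, $title, $len) = ($2, $3, $4);
            $title =~ s/\\(.)/$1/g;   # undo MySQL escaping, e.g. \' -> '
            next unless $ns == 0 && $wanted{$title};
            print "$title\t$len\n";
        }
    }
    close $fh;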