The April run of the English history dumps is incomplete. There is at
least one file that will need to be regenerated. When it's ready I'll
send an email update. I expect a delay of 4-5 days for that.
I'm a student looking to work on MediaWiki during this year's Google
Summer of Code, and one of the ideas I've been interested in involves
various formats for the data dumps (and dump work in general).
How useful would dumps from Wikipedia be if they were in SQLite
databases? Would it be useful to have all the dumps as SQLite
(history, stubs, current, etc.)? Or are there certain dumps (current,
for example) which would be especially useful as databases?
The dumps wouldn't be direct dumps from the MySQL database (unlike the
old SQL dumps); they'd be in a format optimized for data processing
and imports. I'd also write supporting code, such as libraries for
reading the databases.
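
To make the idea concrete, here's a minimal sketch of what a reading
library might look like in Python. The file name, table name (page),
and columns (title, text) are all hypothetical; designing the actual
schema would be part of the project.

    import sqlite3

    # Sketch of a reader for a hypothetical SQLite dump. The schema
    # (a "page" table with "title" and "text" columns) is an
    # assumption, not an actual dump format.
    def get_page_text(db_path, title):
        """Return the wikitext of a page by title, or None if absent."""
        conn = sqlite3.connect(db_path)
        try:
            row = conn.execute(
                "SELECT text FROM page WHERE title = ?", (title,)
            ).fetchone()
            return row[0] if row else None
        finally:
            conn.close()

    # Example usage (hypothetical file name):
    # print(get_page_text('enwiki-current.sqlite', 'MediaWiki'))
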
What do you folks think?
Yuvi Panda
Observant dump watchers will notice that history piece 9 of the April
en Wikipedia dumps suddenly got tiny. That's because I restarted it; the
compressor had gone out to lunch. I'll keep an eye on it over the next
few days and see what happens; the rest continue to run uninterrupted.
It seems to me that you have to start each big dump manually. At the
moment only two big dumps are being produced, de and ru; pl is finished,
but no new dump has been started. Couldn't you start them automatically,
like the non-big dumps?