Dear Yuvi,
I do not think that getting the data in SQLite format is going to be very valuable. People
can already get the data as MySQL databases (although that is not that easy either), so
getting it in SQLite will not add much in terms of querying capabilities.
I am also not sure whether SQLite can handle databases of this size.
What I do think might be valuable is working on a text format (JSON, CSV) for storing the
dumps. The reason is that we are looking at a NoSQL datastore solution (for example
Hadoop), and storing the data in a non-XML but still textual format would be really
useful.
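For illustration, a conversion along those lines might look like the sketch below: it turns
MediaWiki-style XML page records into one JSON object per line (JSON Lines), a shape that
line-oriented tools like Hadoop split cleanly. The field names and the sample input are
assumptions for the demo, not the actual dump schema.

```python
import json
import xml.etree.ElementTree as ET

# A tiny stand-in for a MediaWiki XML dump; real dumps are far larger
# and have many more fields.
SAMPLE_DUMP = """<mediawiki>
  <page>
    <title>Example</title>
    <id>42</id>
    <revision>
      <id>1001</id>
      <text>Hello, world.</text>
    </revision>
  </page>
</mediawiki>"""

def pages_to_jsonl(xml_text):
    """Yield one JSON line per <page> element."""
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        rev = page.find("revision")
        record = {
            "title": page.findtext("title"),
            "page_id": int(page.findtext("id")),
            "rev_id": int(rev.findtext("id")),
            "text": rev.findtext("text"),
        }
        yield json.dumps(record)

for line in pages_to_jsonl(SAMPLE_DUMP):
    print(line)
```

For a real multi-gigabyte dump you would stream with `ET.iterparse` (clearing elements as
you go) rather than parsing the whole document into memory.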
Just my 2 cents.
Best,
Diederik
On 2011-03-28, at 6:10 PM, Yuvi Panda wrote:
I'm a student looking to work on MediaWiki during this year's Google
Summer of Code, and one of the ideas I've been interested in involves
various formats for the data dumps (and dump work in general).
How useful would dumps from Wikipedia be if they were SQLite
databases? Would it be useful to have all the dumps as SQLite
(history, stubs, current, etc.)? Or are there certain dumps (current,
for example) that would be especially useful as databases?
The dumps wouldn't be direct dumps from the MySQL database (unlike the
old SQL dumps); they'll be in a format optimized for data processing
and imports. I'll also write supporting code, such as libraries for
reading the databases.
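A reader library along those lines might be sketched as follows. The schema (a single
`page` table) and the in-memory database are assumptions made for the demo, not the actual
dump format.

```python
import sqlite3

# Build a throwaway in-memory database standing in for a
# hypothetical SQLite-packaged dump with a simple `page` table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page (id INTEGER PRIMARY KEY, title TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?)",
    [(1, "Alpha", "First article."), (2, "Beta", "Second article.")],
)

def iter_pages(conn):
    """Stream (title, text) pairs; the cursor fetches rows lazily,
    so the whole table is never loaded at once."""
    for title, text in conn.execute(
        "SELECT title, text FROM page ORDER BY id"
    ):
        yield title, text

for title, text in iter_pages(conn):
    print(title, "->", text)
```

The appeal of this shape is that consumers get indexed, queryable access from a single
file with no server setup.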
What do you folks think?
--
Yuvi Panda
http://yuvi.in/
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l