Date: Tue, 16 Feb 2010 09:34:41 -0800
From: Brion Vibber brion@pobox.com
Subject: Re: [Wikitech-l] [mwdumper] new maintainer?
To: wikitech-l@lists.wikimedia.org
Message-ID: hlekvf$nl0$1@ger.gmane.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 2/16/10 7:03 AM, Jamie Morken wrote:
Ok, the simple question: how many people prefer XML or SQL dumps?
I think we have a FAQ on this...
http://meta.wikimedia.org/wiki/Download#What_happened_to_the_SQL_dumps.3F
You *do* realize that such "SQL dumps" would have to be invented from whole cloth and couldn't just be dumped from the actual databases, right?
The raw databases include dozens of alternate clusters and have data from different revisions compressed together, including deleted items and private data, and can't simply be released by WMF even if someone actually wanted to figure out how to replicate Wikimedia's exact storage cluster layout to do a data import.
Most likely, if they were created, they'd simply be created by running the XML through a tool like mwdumper...
-- brion
Hi Brion,
I have not tried mwdumper yet. I have been looking at the various XML-to-SQL conversion tools and reading about people's use of them, and I will have to give it a try to see for myself, but in my opinion recreating an SQL database this way seems like an overly complex task. Also, when the Wikimedia dumps used to be in SQL format, I think there were fewer dump problems than there are now, although maybe the main issue is the growth of the file sizes. I bet it is simpler to make an SQL dump than an XML dump, and the older MediaWiki dumps were in SQL format too. To make the Wikimedia dumps into SQL directly, I think the process would be to do SQL database merges and then make sure the private data is erased? This might be simpler than converting to XML and then using mwdumper to get back to SQL. Also, there is a bottleneck somewhere in the dump system (dump fails, etc.); maybe it is the XML part? I will get back to you after I try mwdumper and/or:
php importDump.php <17gigabytefail> :)
cheers, Jamie