Date: Tue, 16 Feb 2010 09:34:41 -0800
From: Brion Vibber <brion(a)pobox.com>
Subject: Re: [Wikitech-l] [mwdumper] new maintainer?
To: wikitech-l(a)lists.wikimedia.org
Message-ID: <hlekvf$nl0$1(a)ger.gmane.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 2/16/10 7:03 AM, Jamie Morken wrote:
> Ok, the simple question: how many people prefer XML or
> sql dumps?
I think we have a FAQ on this...
http://meta.wikimedia.org/wiki/Download#What_happened_to_the_SQL_dumps.3F
You *do* realize that such "SQL dumps" would have to be invented from
whole cloth and couldn't just be dumped from the actual databases, right?
The raw databases include dozens of alternate clusters and have data
from different revisions compressed together, including deleted items
and private data, and can't simply be released by WMF even if someone
actually wanted to figure out how to replicate Wikimedia's exact storage
cluster layout to do a data import.
Most likely, if they were created, they'd simply be created by running the
XML through a tool like mwdumper...
-- brion
Hi Brion,
I have not tried mwdumper yet. I have been looking at the various XML-to-SQL conversion
tools and reading about people's experience with them, and I will have to try it to see
for myself, but recreating an SQL database this way seems like an overly complex task in
my opinion. Also, when the Wikimedia dumps were in SQL format, as the older MediaWiki
dumps were, I think there were fewer dump problems than there are now, although maybe the
main issue is the growth of the file sizes. I bet it is simpler to make an SQL dump than
an XML dump. To make the Wikimedia dumps directly in SQL, I think the process would be to
merge the SQL databases and then make sure the private data is erased? This might be
simpler than converting to XML and then using mwdumper to get back to SQL. Also, there is
a bottleneck somewhere in the dump system (dumps failing, etc.); maybe it is the XML
part? I will get back to you after I try mwdumper and/or:
php importDump.php <17gigabytefail> :)
cheers,
Jamie
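
For reference, the XML-to-SQL route discussed in this thread can be roughly sketched as a
streaming conversion. This is only an illustration, not how mwdumper or importDump.php
actually work: the table name demo_page, its two columns, and the helper names are all
made up for the example, the real MediaWiki schema spreads this data over several tables,
and the quoting shown is the plain standard-SQL kind (it does not handle MySQL backslash
escapes).

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace prefix, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def sql_quote(value):
    """Quote a string as a standard SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def xml_to_sql(stream):
    """Yield one INSERT statement per <page>, reading the dump incrementally.

    Hypothetical target table: demo_page(title, body). If a page carries
    several revisions, only the text of the last one is kept.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        yield "INSERT INTO demo_page (title, body) VALUES (%s, %s);" % (
            sql_quote(title),
            sql_quote(text),
        )
        # Drop the page's children so a multi-gigabyte dump is not held in memory.
        elem.clear()
```

Because iterparse reads the file incrementally, a loop such as
`for stmt in xml_to_sql(open("dump.xml", "rb")): print(stmt)` (dump.xml being a
placeholder file name) never needs the whole dump in memory at once, which is the main
practical concern with dumps of this size.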