For folks who have not been following the saga on
http://wikitech.wikimedia.org/view/Dataset1
we were able to get the RAID array back in service last night on the XML
data dumps server, and we are now busily copying data off of it to
another host. There's about 11 TB of dumps to copy over; once that's done
we will start serving these dumps read-only to the public again.
Because the state of the server hardware is still uncertain, we don't
want to do anything that might put the data at risk until that copy has
been made.
The replacement server is on order and we are watching that closely.
We have also been working on deploying a server to run one round of
dumps in the interim.
Thanks for your patience (which is a way of saying, I know you are all
out of patience, as am I, but hang on just a little longer).
Ariel
I've just discovered this and thought other people parsing MediaWiki
dump files could also benefit.
Dump files contain a version number in the root element, e.g.:
<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/ http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3"
xml:lang="vo">
A description of the changes to the format can be found at
http://www.mediawiki.org/xml/export-0.4.xsd :
Version 0.2 adds optional basic file upload info support,
which is used by our OAI export/import submodule.
Version 0.3 adds some site configuration information such
as a list of defined namespaces.
Version 0.4 adds per-revision delete flags, log exports,
discussion threading data, and a per-page redirect flag.
Notice that per-page redirect flags are documented to begin with
version 0.4 dump files.
In fact, the per-page redirect flag seems to have been in use since 28 July 2009
and can be found in dump files marked as version 0.3.
Given this, there are surely other features that occur in versions
earlier than documented, so it would be wise to allow for this when
parsing dump files rather than relying on the version declaration.
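As a rough illustration of that advice, here is a minimal Python sketch that reads the declared version but decides whether a page is a redirect by looking for the element itself. It assumes the flag appears as a <redirect> child of <page>; the function names and the sample data are made up for the example and are not taken from any real dump.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def scan_dump(stream):
    """Return (declared_version, titles_of_pages_carrying_a_redirect_flag).

    The version is reported but never used to decide which elements
    to look for; features are detected by their presence.
    """
    version = None
    redirects = []
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        name = local_name(elem.tag)
        if event == "start":
            if name == "mediawiki":
                version = elem.get("version")
        elif name == "page":
            title = None
            is_redirect = False
            for child in elem:
                child_name = local_name(child.tag)
                if child_name == "title":
                    title = child.text
                elif child_name == "redirect":
                    is_redirect = True
            if is_redirect:
                redirects.append(title)
            elem.clear()  # drop the page's children to keep memory use low
    return version, redirects


# Hypothetical sample: declared as 0.3 but already carrying a redirect flag.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/" version="0.3" xml:lang="vo">
  <page>
    <title>Foo</title>
    <redirect />
    <revision><text>#REDIRECT [[Bar]]</text></revision>
  </page>
  <page>
    <title>Bar</title>
    <revision><text>Some text</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    print(scan_dump(io.BytesIO(SAMPLE)))
```

Matching on the local element name rather than the full namespaced tag means the same code works whichever export-0.x namespace the file declares.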
Andrew Dunbar (hippietrail)
I'm trying to import the categorylinks table.
After MySQL finishes the inserts, it reaches the line:
ALTER TABLE `categorylinks` ENABLE KEYS
Running "show full processlist" shows that MySQL is executing this statement
using "Repair with keycache". From what I've read about this online, it
takes 20-30 times longer to build the indexes this way than with "Repair by
sorting".
Any ideas how to get around this?