John Grohol wrote:
> Hi, excuse my ignorance... But after searching for information on this for
> the past 2 hours, I can't seem to find any FAQ or simple instruction set on
> what to do with the new XML dumps provided at
> http://download.wikimedia.org/
> The sole link provided on the page mentions nothing of the new XML format,
> or what to do with it. Searching through this mailing list hasn't shed much
> more light.
Well, you can do anything you like with it. ;) The XML source should be
pretty self-explanatory about its contents; you should also find a
description of the format at
http://meta.wikimedia.org/wiki/Help:Export
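If you just want a quick look at what's in a dump before deciding what to do with it, something like the following should work. This is only a sketch: the helper names are made up for illustration, and it assumes each page in the export is opened by a <page> tag, as in the format described on the Help:Export page.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a gzipped XML dump without unpacking it to disk.

# Print the first N lines (default 40) of the decompressed dump.
peek_dump() {
    gzip -dc "$1" | head -n "${2:-40}"
}

# Count the lines that contain an opening <page> tag, one per exported page.
count_pages() {
    gzip -dc "$1" | grep -c '<page>'
}

# Example:
#   peek_dump pages_full.xml.gz 20
#   count_pages pages_full.xml.gz
```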
If you want to import it into a MediaWiki 1.5 instance, the currently
canonical way would be:
* Install a fresh wiki running MediaWiki 1.5rc4.
* Set up AdminSettings.php if necessary; command-line access is required.
* Pipe the file to importDump.php:
    gzip -dc pages_full.xml.gz | php importDump.php
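Since a full import can run for a long time, it may be worth checking the archive before starting. The wrapper below is a rough sketch, not part of MediaWiki: import_dump is a made-up name, and it assumes you run it from the directory that holds importDump.php (normally the maintenance directory).

```shell
#!/bin/sh
# Hypothetical wrapper: verify the gzip archive, then stream it into an importer.
# Usage: import_dump DUMPFILE COMMAND [ARGS...]

import_dump() {
    dump=$1
    shift
    # Refuse to start if the archive is missing or corrupt.
    if ! gzip -t "$dump" 2>/dev/null; then
        echo "error: $dump is missing or not a valid gzip file" >&2
        return 1
    fi
    # Decompress to stdout and pipe the XML into the importer command.
    gzip -dc "$dump" | "$@"
}

# Example:
#   import_dump pages_full.xml.gz php importDump.php
```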
> Any help would be very much appreciated (even a pointer to a
> page that explains how to import these files into a 1.3.x or 1.4.x system...)!
I've been working on a utility for filtering the XML backup dump files;
it'll be used as part of our backup procedure for creating filtered
dumps as well as the complete ones.
For extra fun I've included an output mode to generate SQL statements in
1.4 or 1.5 schema which could be then read into an already-initialized
database.
At the moment this is command-line only, and the SQL mode is basically
completely untested. In the future it should be better tested and I may
add a GUI front-end and a helper for creating and initializing a database.
The tool is written in C# and runs on the cross-platform Mono runtime,
and should also run on .NET 1.1 if you're a Microsoft person. As it's
new I don't have a binary release yet, but source and a makefile for
compiling with Mono are in our CVS repository:
http://cvs.sourceforge.net/viewcvs.py/wikipedia/mwdumper/
You might run it something like this:
    gzip -dc pages_full.xml.gz | mono mwdumper.exe --format=sql:1.4 > curold.sql
Then import the SQL into an installed MediaWiki 1.4 database. (The cur
and old tables should be _empty_ at this point; if you used the installer
they will already contain data, so delete it first.)
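One way to do the cleanup and the import in a single client session is sketched below. Treat it as an assumption-laden example: prepare_import_sql is a made-up helper, wikiuser and wikidb are placeholder names, and it assumes the tables are named plainly cur and old with no table prefix.

```shell
#!/bin/sh
# Hypothetical helper: emit statements that empty cur and old, followed by
# the generated SQL, so the whole stream can be piped into a database client.

prepare_import_sql() {
    sqlfile=$1
    if [ ! -r "$sqlfile" ]; then
        echo "error: cannot read $sqlfile" >&2
        return 1
    fi
    # Clear the rows the installer created (assumes unprefixed table names).
    printf 'DELETE FROM cur;\nDELETE FROM old;\n'
    # Then pass through the generated SQL unchanged.
    cat "$sqlfile"
}

# Example (wikiuser and wikidb are placeholders):
#   prepare_import_sql curold.sql | mysql -u wikiuser -p wikidb
```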
-- brion vibber (brion @ pobox.com)