John Grohol wrote:
> Hi, excuse my ignorance... But after searching for information on this for the past 2 hours, I can't seem to find any FAQ or simple instruction set on what to do with the new XML dumps provided at http://download.wikimedia.org/. The sole link provided on the page mentions nothing of the new XML format, or what to do with it. Searching through this mailing list hasn't shed much more light.
Well, you can do anything you like with it. ;) The XML source should be pretty self-explanatory about its contents; you should also find a description of the format at http://meta.wikimedia.org/wiki/Help:Export
If you want to import it into a MediaWiki 1.5 instance, the currently canonical way would be:
* Install a fresh wiki running MediaWiki 1.5rc4.
* Set up AdminSettings.php if necessary; command-line access is required.
* Pipe the file to importDump.php:

    gzip -dc pages_full.xml.gz | php importDump.php
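Before starting a long import it can be worth checking that the download is intact and roughly the size you expect. The following is only a sketch: the helper name count_pages is made up for illustration, and it assumes each opening <page> tag sits on its own line, as it does in the dumps I have seen.

```shell
#!/bin/sh
# count_pages is a hypothetical helper, not part of MediaWiki.
# It verifies the gzip archive, then counts lines containing an
# opening <page> tag (assumes at most one such tag per line).
count_pages() {
    gzip -t "$1" || return 1          # fail early on a truncated download
    gzip -dc "$1" | grep -c '<page>'  # closing </page> tags do not match
}

# Example (file name taken from the import command above):
#   count_pages pages_full.xml.gz
```

If the count looks plausible, the same file can then be piped to importDump.php as described.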
> Any help would be very much appreciated (even a pointer to a page that explains how to import these files into a 1.3.x or 1.4.x system...)!
I've been working on a utility for filtering the XML backup dump files; it'll be used as part of our backup procedure for creating filtered dumps as well as the complete ones.
For extra fun I've included an output mode to generate SQL statements in 1.4 or 1.5 schema which could then be read into an already-initialized database.
At the moment this is command-line only, and the SQL mode is basically completely untested. In the future it should be better tested and I may add a GUI front-end and a helper for creating and initializing a database.
The tool is written in C# and runs on the cross-platform Mono runtime, and should also run on .NET 1.1 if you're a Microsoft person. As it's new I don't have a binary release yet, but source and a makefile for compiling with Mono are in our CVS repository:
http://cvs.sourceforge.net/viewcvs.py/wikipedia/mwdumper/
You might run it something like this:

    gzip -dc pages_full.xml.gz | mono mwdumper.exe --format=sql:1.4 > curold.sql
Then import the SQL into an installed MediaWiki 1.4 database. (The cur and old tables should be _empty_ at this point; if using the installer there will be data in there, so delete it first.)
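The emptying and loading step could be sketched roughly as below, using the standard mysql command-line client. The database name (wikidb), user (wikiuser), and the helper names are placeholders for illustration, and the sketch assumes the tables have no prefix; adjust the table names if your wiki uses one.

```shell
#!/bin/sh
# Hypothetical helpers; wikidb and wikiuser below are placeholder names.
MYSQL=${MYSQL:-mysql}   # client command, overridable for testing

# Emit SQL that clears the 1.4 page tables (assumes no table prefix).
empty_tables_sql() {
    printf 'DELETE FROM cur;\nDELETE FROM old;\n'
}

# Clear cur and old, then feed the generated SQL file to the client.
load_dump() {
    db=$1; user=$2; sqlfile=$3
    { empty_tables_sql; cat "$sqlfile"; } | "$MYSQL" -u "$user" -p "$db"
}

# Example:
#   load_dump wikidb wikiuser curold.sql
```

Since the SQL mode is untested, trying this against a scratch database first seems prudent.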
-- brion vibber (brion @ pobox.com)