-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
As some of you may know, a German company (DirectMedia) is currently working on publishing a snapshot of the German Wikipedia on DVD. They use XML as their internal format.
To help them out, I patched together a database-to-XML converter. Turns out, though, that they've just written their own.
So, this converter is sitting there [1] without a job ;-)
If you have a Windoze installation running, with access to a MediaWiki (pre-1.5) database, you might want to try it. It takes the database parameters and generates a single XML file (out.xml in the installation directory) containing the XML of all the articles in the main namespace from that database. It can also automatically resolve templates, if desired. Installing and running it is dead easy.
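For anyone wondering what such a conversion boils down to, here is a rough Python sketch. It is not the converter's actual code. It assumes the pre-1.5 `cur` table (columns `cur_namespace`, `cur_title`, `cur_text`), uses an in-memory SQLite database as a stand-in for the real MySQL one, and the element names `articles` and `article` as well as both function names are made up for illustration. It also only wraps the raw wikitext in XML instead of parsing the markup.

```python
import sqlite3
import xml.etree.ElementTree as ET


def dump_main_namespace(conn, out_path):
    """Write every main-namespace article to one XML file; return the count."""
    root = ET.Element("articles")  # hypothetical element name
    rows = conn.execute(
        "SELECT cur_title, cur_text FROM cur "
        "WHERE cur_namespace = 0 ORDER BY cur_title"
    )
    count = 0
    for title, text in rows:
        article = ET.SubElement(root, "article", title=title)
        article.text = text  # raw wikitext; ElementTree escapes <, > and &
        count += 1
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
    return count


def make_demo_db():
    """Build a tiny stand-in database with the assumed `cur` layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cur (cur_namespace INTEGER, cur_title TEXT, cur_text TEXT)"
    )
    conn.executemany(
        "INSERT INTO cur VALUES (?, ?, ?)",
        [
            (0, "Berlin", "'''Berlin''' is the capital of Germany."),
            (0, "Hamburg", "Hamburg & its harbour."),
            (1, "Berlin", "Talk page, not exported."),
        ],
    )
    return conn
```

Calling `dump_main_namespace(make_demo_db(), "out.xml")` should write the two main-namespace articles and skip the talk page.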
It does *not* use the Flex/Bison parser (I have developed a slight Bison allergy when exposed to it ;-) but works quite well otherwise. It also generates XML extremely similar to what the Bison parser would produce. I ran it on about 10000 articles from a de database dump, and the generated XML was valid.
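If you want to repeat that kind of check on your own output, a minimal Python sketch could look like the one below. Note the assumption: it only tests well-formedness with the standard library parser, which is weaker than validating against a DTD or schema, and the helper name `is_well_formed` is made up here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```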
The software is GPL, but I haven't uploaded the source anywhere yet. I can mail you the source and instructions if desired.
A slight variation of this software might be quite useful for a number of projects. For example, it could easily be altered to take a number of article titles and generate XML only for these. The XML could then be converted to PDF or whatever.
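To make the title-list idea concrete, here is a small Python sketch of how such a variation might select articles. It is purely illustrative and not taken from the converter: it assumes the pre-1.5 `cur` table, any DB-API connection that accepts `?` placeholders (SQLite, for instance), and the made-up names `export_titles`, `articles` and `article`.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_titles(conn, titles):
    """Return an XML string holding only the requested main-namespace articles."""
    # MediaWiki stores titles with underscores in place of spaces.
    wanted = [t.replace(" ", "_") for t in titles]
    root = ET.Element("articles")  # hypothetical element name
    if wanted:
        # One placeholder per title keeps the query parameterized.
        placeholders = ", ".join("?" for _ in wanted)
        rows = conn.execute(
            "SELECT cur_title, cur_text FROM cur "
            f"WHERE cur_namespace = 0 AND cur_title IN ({placeholders}) "
            "ORDER BY cur_title",
            wanted,
        )
        for title, text in rows:
            ET.SubElement(root, "article", title=title).text = text
    return ET.tostring(root, encoding="unicode")
```

The returned string could then be handed to whatever does the PDF (or other) conversion.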
Just letting you know, Magnus
[1] http://www.magnusmanske.de/stuff/StaticWikiInstaller.exe