Note: I cross-posted this to several lists, because I think this is of interest to many; please reply on wikitech-l only.
A long, long time ago, I started writing a PHP script to convert MediaWiki markup into XML. I believe it is now feature-complete and relatively reliable. It can process not only a single piece of wiki text but also a list of articles, taking the text from any MediaWiki-based site online. It uses the same method to replace templates.
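To illustrate what "taking the text from any MediaWiki-based site" can look like, here is a minimal Python sketch (not the PHP code of wiki2xml itself). It assumes the site exposes the standard index.php entry point with action=raw; the function names raw_url and fetch_wikitext are made up for this example, and the real script may fetch pages differently.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def raw_url(index_php, title):
    """Build the URL that returns a page's raw wiki markup (hypothetical helper)."""
    # MediaWiki titles use underscores in place of spaces.
    query = urlencode({"title": title.replace(" ", "_"), "action": "raw"})
    return f"{index_php}?{query}"


def fetch_wikitext(index_php, titles):
    """Download the markup of every title; returns a dict of title -> text."""
    texts = {}
    for title in titles:
        with urlopen(raw_url(index_php, title)) as response:
            texts[title] = response.read().decode("utf-8")
    return texts
```

Calling fetch_wikitext("https://en.wikipedia.org/w/index.php", ["Biology", "Cell biology"]) would need network access; templates could be fetched the same way by asking for titles in the Template: namespace.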
The generated XML can now be converted into other formats. For demonstration [1], I offer "plain text" and DocBook XML.
What I cannot demonstrate (due to limitations of my hosting service) is the subsequent conversion to HTML or PDF from the DocBook XML. However, it is quite easy to set up an automatic conversion locally if you have the necessary DocBook files installed.
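A local conversion could be scripted roughly as follows. This is only a sketch under assumptions: it uses xsltproc with the DocBook XSL stylesheets (html/docbook.xsl and fo/docbook.xsl) and Apache FOP for the PDF step, which may not be the toolchain used for the example PDF. The stylesheet directory and the function names are placeholders for this example.

```python
import subprocess


def docbook_commands(docbook_xml, stylesheet_dir, output_stem):
    """Return the commands for DocBook -> HTML and DocBook -> FO -> PDF."""
    html_cmd = ["xsltproc", "-o", f"{output_stem}.html",
                f"{stylesheet_dir}/html/docbook.xsl", docbook_xml]
    # PDF needs an intermediate XSL-FO file, which FOP then renders.
    fo_cmd = ["xsltproc", "-o", f"{output_stem}.fo",
              f"{stylesheet_dir}/fo/docbook.xsl", docbook_xml]
    pdf_cmd = ["fop", "-fo", f"{output_stem}.fo", "-pdf", f"{output_stem}.pdf"]
    return [html_cmd, fo_cmd, pdf_cmd]


def convert(docbook_xml, stylesheet_dir, output_stem):
    """Run each step in order; stop with an exception if a tool fails."""
    for cmd in docbook_commands(docbook_xml, stylesheet_dir, output_stem):
        subprocess.run(cmd, check=True)
```

For instance, convert("Biology_topics.xml", "/usr/share/xml/docbook-xsl", "Biology_topics") would write the HTML, FO, and PDF files next to each other, assuming the stylesheets are installed at that (hypothetical) path.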
As an example, I have generated a PDF [2] by:
1. Entering the titles of the articles I want to have
2. Choosing "DocBook PDF" as output format
3. Clicking "Convert"
4. Waiting for the PDF to open
Really, that easy! :-)
I am well aware of some shortcomings of the example PDF. However, most of them (no left margin, gigantic tables, misshapen images) are flaws of DocBook, or of the default stylesheets I use. I'm not really familiar with DocBook and hope for help from people who are.
While the converter seems to work pretty well, I'm sure there are lots of fun bugs to find. If you do find a page that breaks, please mail me the title so I can find the bug, or even better, fix it yourself! The code is in CVS, "wiki2xml" module, "php" directory (ignore the old C code in the main directory;-)
A word about speed: Yes, the process of creating a PDF takes some time. However, most of it is DocBook at work, and of course the loading times for articles and templates. Converting the example from wiki markup to XML to DocBook XML to PDF takes 2 minutes 20 seconds total, but the actual conversion wiki-to-XML is done in just 8 seconds.
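To put those numbers in proportion, a quick back-of-the-envelope calculation (using only the two timings quoted above; the variable names are mine):

```python
# Timings quoted in the text for the example PDF.
total_seconds = 2 * 60 + 20        # whole pipeline: 140 s
wiki_to_xml_seconds = 8            # wiki markup -> XML only

# Everything else: article/template loading, DocBook, PDF rendering.
remaining_seconds = total_seconds - wiki_to_xml_seconds
share = wiki_to_xml_seconds / total_seconds

print(f"wiki-to-XML: {wiki_to_xml_seconds} s of {total_seconds} s ({share:.1%})")
print(f"everything else: {remaining_seconds} s")
```

So the wiki-to-XML step accounts for under 6% of the total run time; the remaining 132 seconds go to loading and the DocBook side.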
Apart from bug fixing, my next priority is ODT (OpenOffice) format output. Also, I would like to extend Special:Export in MediaWiki so it can return a list of authors, which can then be added automagically to all converted files.
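Until Special:Export can return authors directly, one could approximate the list from an export that includes the page history. The sketch below is an assumption about how that might work, not part of wiki2xml: it collects the username or ip of every contributor element in an export XML document. The sample XML is hand-written and simplified, and its namespace URI is only illustrative.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def list_authors(export_xml):
    """Return contributors in order of first appearance, without repeats."""
    root = ET.fromstring(export_xml)
    authors = []
    for element in root.iter():
        if local_name(element.tag) != "contributor":
            continue
        for child in element:
            # Logged-in editors have <username>, anonymous ones have <ip>.
            if local_name(child.tag) in ("username", "ip") and child.text:
                if child.text not in authors:
                    authors.append(child.text)
    return authors


# Hand-written, simplified sample; real exports carry more fields.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Biology</title>
    <revision><contributor><username>Alice</username></contributor></revision>
    <revision><contributor><ip>192.0.2.1</ip></contributor></revision>
    <revision><contributor><username>Alice</username></contributor></revision>
  </page>
</mediawiki>"""

print(list_authors(SAMPLE))
```

Matching on the local tag name keeps the sketch independent of the export schema version in the namespace.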
Awaiting your feedback, Magnus
[1] http://magnusmanske.de/wiki2xml/w2x.php
[2] http://magnusmanske.de/wiki2xml/Biology_topics.pdf (3.7 MB!)