I finally got around to working on the XML parser that has been promised
for so long. Good news first: it already works, so it can render real
wiki pages (in CVS HEAD).
Bad news: it is still incredibly un-debugged and incomplete.
For those of you who think "tell me when it works perfectly", you can
stop reading now ;-)
So, what do you need to test it? You'll need the "flexbisonparse" module
from CVS, which contains Timwi's lex/bison stuff. I haven't actually
worked on the bison files so far. I couldn't get it to "make" on my Linux
box (some Zend libs were not found, or something like that), so I wrote a
"Makefile.cli", which you should rename to "Makefile". It will "make" a
command-line parser that converts Wikipedia markup to XML: you pipe the
wiki markup in and get the XML out.
Then, follow the three-line instructions at the top of ParserXML.php. It
should work now.
The output will look strange, because it includes three copies of the
article text as debug information: the rendered XHTML, the "dumped" XML,
and a structured XML tree. You can turn the debug information off by
editing the very end of the ParserXML.php file (you'll see where).
Because I had some trouble passing the wiki markup to the command-line
version of the wiki2xml parser, I currently write it to a temporary file,
pipe that file into the parser, capture the output, and delete the
temporary file afterwards. I am aware that this is incredibly ugly, and
that Timwi's default makefile, which builds a shared PHP object and
passes the data through it, is a lot cooler (and faster) than mine.
However:
1. As I said, it didn't compile on my box, so I guess I won't be the
only one; the CLI version should compile pretty much everywhere.
2. The shared object thingy limits the parser's use to MediaWiki.
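For illustration, here is a minimal sketch of the temporary-file round
trip described above. It is written in Python rather than PHP, and the
function name `wiki_to_xml` and the `./wiki2xml` command are assumptions
made up for this example, not names taken from ParserXML.php or the
flexbisonparse module.

```python
import os
import subprocess
import tempfile


def wiki_to_xml(markup: str, parser_cmd: list) -> str:
    """Convert wiki markup to XML by piping a temporary file into a CLI parser.

    parser_cmd is a hypothetical command line, e.g. ["./wiki2xml"].
    """
    # mkstemp returns an open descriptor plus a path we can reopen and delete
    fd, path = tempfile.mkstemp(suffix=".wiki")
    try:
        # Write the markup to the temporary file and close it
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(markup)
        # Pipe the temporary file into the parser and capture its stdout;
        # check=True raises CalledProcessError if the parser exits non-zero
        with open(path, "rb") as stdin:
            result = subprocess.run(parser_cmd, stdin=stdin,
                                    stdout=subprocess.PIPE, check=True)
        return result.stdout.decode("utf-8")
    finally:
        # Remove the temporary file again, even if the parser failed
        os.remove(path)
```

For example, `wiki_to_xml("'''bold''' text", ["./wiki2xml"])` would
return whatever the (hypothetical) `./wiki2xml` binary prints. Passing
the markup through `subprocess.run(..., input=...)` instead would avoid
the temporary file altogether.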
My thoughts on #2: I hope that the wiki2xml parser will be beneficial to
many projects. One thing I thought of is a C/C++ program that can
directly access an SQL dump, read the articles, have them parsed to XML,
and then write them out as whatever you like: XHTML (static versions),
PDF (WikiReader), or DigiBib format (another XML format, for the German
CD).
As for the XML2XHTML parser itself: basic wiki markup, tables, links,
images, basic HTML, and nowiki are working. *Not* working are template
inclusion and things I didn't think of ;-)
I probably won't get much more done this week, but I'll be at the
Berlin conference, so we can talk about this there, or even have a
hacking session...
Magnus