Hello, it looks like many people have done almost the same thing: I've also begun to write a parser. Mine is written in Python.
It's far from completion, but I don't want to spend my time writing useless code, so I am asking how we want to continue: I don't think it's wise to have four parsers that all do essentially the same thing. Of course, everybody can do what he/she wants, but I personally would rather work on one program than duplicate effort. As Magnus Manske already pointed out, such a parser could be the base of several desired tools. We could write a library that these applications could share.
The difference between my program and all the others, AFAIK, is how the wiki data actually gets loaded. As someone suggested on meta.wikipedia.org, I have written a file called 'raw.php' that retrieves the raw wiki data. In my opinion that's quite useful for offline-editing applications and similar tools.
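To illustrate how a client might consume such a raw-data endpoint, here is a minimal Python sketch. The URL layout and the 'title' parameter are assumptions for illustration; they would need to be adjusted to match the actual raw.php interface.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def raw_url(base, title):
    # Build the request URL for the hypothetical raw.php endpoint.
    # Both the path and the 'title' parameter name are assumptions,
    # not the verified raw.php interface.
    return base + "/raw.php?" + urlencode({"title": title})

def fetch_raw(base, title, encoding="utf-8"):
    # Download the raw wiki markup of one article, so an offline
    # editor or standalone parser can work on it locally.
    with urlopen(raw_url(base, title)) as resp:
        return resp.read().decode(encoding)

if __name__ == "__main__":
    print(raw_url("http://example.org/wiki", "Main Page"))
```

A client like this is all an offline editor would need in order to feed articles into any of the parsers under discussion.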
You can get my parser, along with my adapted 'Article.php' and the file 'raw.php', here: www.fms-engel.de/buildHTML.tar.bz2 (Sorry for not providing a patch; I'll do it when I find time.)
I don't want to start a language flamewar, but I would probably prefer Magnus Manske's version. C++ is quite fast, and there are several GUI libraries available on each platform for writing nice GUIs. I haven't taken a look at his program, but I guess it's much more mature than, for example, mine.
I would really like to discuss this topic, as a parser, and the possibilities it opens up, is one of the most wanted features, at least for me :) There is surely a better solution than having four parsers that all do the same thing...
Regards, Frithjof