Thanks for the information about pywikipedia. Python is my favorite programming language. :)
My parser consists of a lex file for scanning and a C++ file for parsing. Like you, I wanted to use a grammar file for parsing, but it turned out to be difficult, so I resorted to writing a parser by hand that processes the tokens returned by the scanner. This part now works for the basic MediaWiki markup listed on http://www.mediawiki.org/wiki/Help:Formatting including links, images and tables, although the exact URLs produced for links and images may not be correct yet.
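To give a rough idea of the scanner-plus-hand-written-parser split described above, here is a minimal sketch in Python. It is not the actual lex/C++ code: the token names, the `scan` and `to_html` functions, and the `/wiki/` link prefix are all assumptions made up for illustration, and it covers only bold, italic and internal links.

```python
import re
from html import escape

# Scanner: token patterns for a tiny subset of wikitext.
# Order matters: ''' (bold) must be tried before '' (italic).
TOKEN_RE = re.compile(r"""
    (?P<BOLD>''')
  | (?P<ITALIC>'')
  | (?P<LINK>\[\[[^\]]+\]\])
  | (?P<TEXT>[^'\[]+|.)
""", re.VERBOSE)


def scan(text):
    """Yield (kind, value) tokens, much as a lex-generated scanner would."""
    for m in TOKEN_RE.finditer(text):
        yield m.lastgroup, m.group()


def to_html(text, link_base="/wiki/"):
    """Hand-written parser: walk the tokens and emit HTML.

    link_base is a hypothetical URL prefix; the real URL scheme
    depends on the wiki's configuration.
    """
    tags = {"BOLD": "b", "ITALIC": "i"}
    is_open = {"BOLD": False, "ITALIC": False}
    out = []
    for kind, value in scan(text):
        if kind in tags:
            # Toggle the tag: open it if closed, close it if open.
            out.append(("</%s>" if is_open[kind] else "<%s>") % tags[kind])
            is_open[kind] = not is_open[kind]
        elif kind == "LINK":
            # [[target|label]]; the label defaults to the target.
            target, _, label = value[2:-2].partition("|")
            href = link_base + target.strip().replace(" ", "_")
            out.append('<a href="%s">%s</a>'
                       % (escape(href, quote=True), escape(label or target)))
        else:
            out.append(escape(value, quote=False))
    return "".join(out)


print(to_html("'''Hello''' [[Main Page|home]] and ''you''"))
```

The sketch simply toggles tags and does not attempt to resolve ambiguous apostrophe runs or unclosed markup, which is where most of the real difficulty lies.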
I'll find a server to post them later today.
cheers, Ping
On 8/2/07, Merlijn van Deen valhallasw@arctus.nl wrote:
"Ping Yeh" ping.nsr.yeh@gmail.com wrote: So, with an HTML formatter and a MediaWiki parser (a draft version already exists), it can already show HTML for MediaWiki content.
For your information, and possibly some inspiration: I have been working on a Python-based wikitext parser to be used with pywikipedia; the source is available at http://svn.wikimedia.org/viewvc/pywikipedia/trunk/pywikiparser/ . How far has your parser been developed already, and how does it parse? I have been trying to fit wikitext into a grammar parseable by an LL(k) parser, but this was not as easy as it looked. Hence, I have started building a parser from scratch.
I'll attend the Hacking Days Extra in the afternoon tomorrow. Maybe I can show you what I have so far and get your comments. :)
Unfortunately, not all of us are at Wikimania ;) Have you got some online resource where we can find more information?
--valhallasw
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l