Hello, all.
I'm posting here at the suggestion of Ævar Bjarmason, who told me about the recent discussion on this list about the parser.
Recently I've been working on parsing wikitext, partly for some projects of my own and partly because I think that a general-purpose parser is just a good thing to have. In any case, I've made significant progress on a Perl implementation. I'll link to the code below, but what I think is more interesting is the English. As far as I can tell, it is possible to parse wikitext in a "single pass" fashion, and it's possible to do it quickly (more on that once I finish the last few features I need to enable benchmarking). In fact, I believe that if one is willing to forgo the TOC at the top, it's possible to parse and render incrementally (though for various reasons that's probably not such a great idea). With regard to http://www.usemod.com/cgi-bin/mb.pl?ConsumeParseRenderVsMatchTransform, my code is of the "consume/parse/render" variety.
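To make "single pass" and "incremental" concrete, here is a minimal sketch in Python (not the Perl of parser.pl, and not Andrew's code) of consume/parse/render output. It assumes a tiny, hypothetical subset of wikitext: == headings ==, '''bold''', ''italic'' and blank-line-separated paragraphs. The names render_inline and render are illustrative only.

```python
import html
from typing import Iterable, Iterator


def render_inline(text: str) -> str:
    """Consume one line left to right, emitting HTML as markup is met.

    Only '''bold''' and ''italic'' are handled (an illustrative subset);
    mis-nested markup is not repaired.
    """
    out: list[str] = []
    open_tags: list[str] = []  # stack of currently open inline tags
    i = 0
    while i < len(text):
        if text.startswith("'''", i):
            tag, i = "b", i + 3
        elif text.startswith("''", i):
            tag, i = "i", i + 2
        else:
            out.append(html.escape(text[i], quote=False))
            i += 1
            continue
        if open_tags and open_tags[-1] == tag:
            out.append(f"</{tag}>")  # closes the innermost open tag
            open_tags.pop()
        else:
            out.append(f"<{tag}>")
            open_tags.append(tag)
    while open_tags:  # close anything left open at end of line
        out.append(f"</{open_tags.pop()}>")
    return "".join(out)


def render(lines: Iterable[str]) -> Iterator[str]:
    """Yield HTML fragments as each input line is consumed."""
    in_para = False
    for raw in lines:
        stripped = raw.strip()
        lead = len(stripped) - len(stripped.lstrip("="))
        trail = len(stripped) - len(stripped.rstrip("="))
        level = min(lead, trail, 6)
        is_heading = level >= 1 and len(stripped) > 2 * level
        if in_para and (is_heading or not stripped):
            yield "</p>\n"
            in_para = False
        if is_heading:
            title = render_inline(stripped[level:-level].strip())
            yield f"<h{level}>{title}</h{level}>\n"
        elif stripped:
            # Consecutive non-blank lines join into one paragraph.
            yield " " if in_para else "<p>"
            in_para = True
            yield render_inline(stripped)
    if in_para:
        yield "</p>\n"
```

Because render is a generator, each fragment can be sent to the client as soon as its line has been consumed. A table of contents at the top does not fit this model: it needs every heading before the first line of the body can be written, which is why incremental rendering means giving up the TOC.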
Anyway, I'll stop rambling and get to the point. Right now, I doubt that my code is clean enough or useful enough for anyone else to use, but I'm mentioning it in case it provides any insight or grounds for discussion, or in case anyone would like to base work on it or make suggestions.
The code is part of the BerliOS project "wikioncd"; svn is at svn://svn.berlios.de/wikioncd/trunk/wikioncd (the parser, with a temporary driver, is in parser.pl); the ViewCVS interface for it is at http://svn.berlios.de/viewcvs/wikioncd/trunk/wikioncd/ .
Cheers,
Andrew
On 5/10/05, Andrew Rodland arodland@entermail.net wrote:
Recently I've been working on parsing wikitext, partly for some projects of my own and partly because I think that a general-purpose parser is just a good thing to have. In any case, I've made significant progress on a Perl implementation. I'll link to the code below, but what I think is more interesting is the English. As far as I can tell, it is possible to parse wikitext in a "single pass" fashion, and it's possible to do it quickly (more on that once I finish the last few features I need to enable benchmarking). In fact, I believe that if one is willing to forgo the TOC at the top, it's possible to parse and render incrementally (though for various reasons that's probably not such a great idea). With regard to http://www.usemod.com/cgi-bin/mb.pl?ConsumeParseRenderVsMatchTransform, my code is of the "consume/parse/render" variety.
Anyway, I'll stop rambling and get to the point. Right now, I doubt that my code is clean enough or useful enough for anyone else to use, but I'm mentioning it in case it provides any insight or grounds for discussion, or in case anyone would like to base work on it or make suggestions.
The code is part of the BerliOS project "wikioncd"; svn is at svn://svn.berlios.de/wikioncd/trunk/wikioncd (the parser, with a temporary driver, is in parser.pl); the ViewCVS interface for it is at http://svn.berlios.de/viewcvs/wikioncd/trunk/wikioncd/ .
Could you please add your parser implementation to this list: http://meta.wikimedia.org/wiki/Alternative_parsers
Andrew Rodland wrote:
Recently I've been working on parsing wikitext, partly for some projects of my own and partly because I think that a general-purpose parser is just a good thing to have. In any case, I've made significant progress on a Perl implementation.
I've taken a look at your code and I'm quite impressed. However, I have only seen the code; I would be delighted to see the thing in action as well! Can you set up a web server where I can just type or paste some wikitext into a textarea and have your parser output the HTML? Once I can do this, I think I can send you some meaningful bug reports. :)
With regard to http://www.usemod.com/cgi-bin/mb.pl?ConsumeParseRenderVsMatchTransform, my code is of the "consume/parse/render" variety.
The first time I looked at your code I thought, "Not really; it does use regular expressions, so it has bits of a 'Match/Transform' algorithm too," but closer inspection reveals that you are not using regular expressions any more than a standard lexer would. Good work!
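To illustrate the distinction drawn here, the following hypothetical Python lexer (not code from parser.pl) uses regular expressions only the way a lexer does: each pattern is anchored at the current position with Pattern.match(text, pos), and the input is consumed left to right exactly once. A match/transform renderer would instead run whole-string substitutions (e.g. repeated re.sub calls) over the page. The token names and patterns are assumptions covering a tiny wikitext subset.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token set for a tiny wikitext subset. Order matters:
# "'''" must be tried before "''", and "[[" before a lone "[".
TOKEN_SPEC = [
    ("BOLD", r"'''"),
    ("ITALIC", r"''"),
    ("LINK_OPEN", r"\[\["),
    ("LINK_CLOSE", r"\]\]"),
    ("PIPE", r"\|"),
    # Runs of ordinary characters, or a single stray quote/bracket.
    ("TEXT", r"[^'\[\]|]+|['\[\]]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text: str) -> Iterator[Token]:
    """Consume `text` left to right, one anchored match per token."""
    pos = 0
    while pos < len(text):
        # Pattern.match(text, pos) only matches starting exactly at pos,
        # so nothing is ever rescanned or rewritten in place.
        m = MASTER.match(text, pos)
        if m is None:  # defensive; TEXT covers every other character
            raise ValueError(f"no token at offset {pos}")
        yield Token(m.lastgroup, m.group())
        pos = m.end()
```

A parser would then consume these tokens, for example toggling a bold tag on each BOLD token, rather than searching the whole page for matching pairs of quotes.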
So, if you can set up a demo, I'd be most grateful :) Greetings, Timwi
Sure! Let me iron out a few more bugs / known issues, and then I'll do my best to set it up somewhere. I'd be happy to have someone else banging on it for me. :)
On 5/13/05, Timwi timwi@gmx.net wrote:
Andrew Rodland wrote:
Recently I've been working on parsing wikitext, partly for some projects of my own and partly because I think that a general-purpose parser is just a good thing to have. In any case, I've made significant progress on a Perl implementation.
I've taken a look at your code and I'm quite impressed. However, I have only seen the code; I would be delighted to see the thing in action as well! Can you set up a web server where I can just type or paste some wikitext into a textarea and have your parser output the HTML? Once I can do this, I think I can send you some meaningful bug reports. :)
With regard to http://www.usemod.com/cgi-bin/mb.pl?ConsumeParseRenderVsMatchTransform, my code is of the "consume/parse/render" variety.
The first time I looked at your code I thought, "Not really; it does use regular expressions, so it has bits of a 'Match/Transform' algorithm too," but closer inspection reveals that you are not using regular expressions any more than a standard lexer would. Good work!
So, if you can set up a demo, I'd be most grateful :) Greetings, Timwi
Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l
wikitech-l@lists.wikimedia.org