Hello, <span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; border-collapse: collapse; ">Andreas,</span><div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="border-collapse: collapse;"><br>
</span></font></div><div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="border-collapse: collapse;">I am interested in your project.</span></font></div><div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="border-collapse: collapse;"><br>
</span></font></div><div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="border-collapse: collapse;">But I cannot download the source. Could you send it to me by email (mingli.yuan AT <a href="http://gmail.com">gmail.com</a>)?</span></font></div>
<div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="border-collapse: collapse;"><br></span></font></div><div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="border-collapse: collapse;">Thanks a lot.</span></font></div>
<div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="border-collapse: collapse;"><br></span></font></div><div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="border-collapse: collapse;">Regards,</span></font></div>
<div><font class="Apple-style-span" face="arial, sans-serif"><span class="Apple-style-span" style="border-collapse: collapse;">Mingli<br></span></font><br><div class="gmail_quote">On Wed, Aug 4, 2010 at 6:10 AM, Andreas Jonsson <span dir="ltr"><<a href="mailto:andreas.jonsson@kreablo.se">andreas.jonsson@kreablo.se</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hello,<br>
<br>
I am initiating yet another attempt at writing a new parser for<br>
MediaWiki. It seems that more than six months have passed since the<br>
last attempt, so it's about time. :)<br>
<br>
Parser functions, magic words and html comments are better handled by<br>
a preprocessor than trying to integrate them with the parser (at least<br>
if you want to preserve the current behavior). So I am only aiming at<br>
implementing something that can be plugged in after the preprocessing<br>
stages.<br>
<br>
In the wikimodel project (<a href="http://code.google.com/p/wikimodel/" target="_blank">http://code.google.com/p/wikimodel/</a>) we are<br>
using a parser design that works well for wiki syntax; a front end<br>
(implemented using an LL-parser generator) scans the text and feeds<br>
events to a context object, which can be queried by the front end to<br>
enable context-sensitive parsing. The context object will in turn<br>
feed a well-formed sequence of events to a listener that may build a<br>
tree structure, generate xml, or any other format.<br>
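<br>
The front end / context / listener split described above can be sketched<br>
roughly as follows. This is only an illustration in Python, not code from<br>
wikimodel or libmwparser: the class names, the toy markers ('*' for bold,<br>
'_' for italic) and the event tuples are all assumptions made up for the<br>
example, and the hand-written scanner stands in for a generated LL front end.<br>

```python
class Listener:
    """Hypothetical listener: records the event sequence it receives."""

    def __init__(self):
        self.events = []

    def begin(self, name):
        self.events.append(("begin", name))

    def end(self, name):
        self.events.append(("end", name))

    def text(self, s):
        self.events.append(("text", s))


class Context:
    """Tracks open elements so the listener only sees properly nested events."""

    def __init__(self, listener):
        self.listener = listener
        self.stack = []

    def is_open(self, name):
        # Queried by the front end to make context-sensitive decisions.
        return name in self.stack

    def begin(self, name):
        self.stack.append(name)
        self.listener.begin(name)

    def end(self, name):
        if name not in self.stack:
            return  # ignore a close with no matching open
        # Close any elements nested inside `name` first, then `name` itself.
        while self.stack:
            top = self.stack.pop()
            self.listener.end(top)
            if top == name:
                break

    def text(self, s):
        self.listener.text(s)

    def close_all(self):
        # End of input: close whatever is still open.
        while self.stack:
            self.listener.end(self.stack.pop())


def scan(src, ctx):
    """Toy front end: each marker toggles its element, depending on context."""
    markers = {"*": "b", "_": "i"}  # assumed toy syntax, not MediaWiki markup
    buf = []
    for ch in src:
        if ch in markers:
            if buf:
                ctx.text("".join(buf))
                buf = []
            name = markers[ch]
            if ctx.is_open(name):
                ctx.end(name)
            else:
                ctx.begin(name)
        else:
            buf.append(ch)
    if buf:
        ctx.text("".join(buf))
    ctx.close_all()


def parse(src):
    listener = Listener()
    scan(src, Context(listener))
    return listener.events
```

Calling parse("a *b _c* d") shows the point of the context object: the<br>
italic element is still open when bold is closed, so the context closes it<br>
first and the listener never sees overlapping elements.<br>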
<br>
As for parser generators, Antlr seems to be the best choice. It has<br>
support for semantic predicates and rather sophisticated options for<br>
backtracking. I'm peeking at Steve Bennet's antlr grammar<br>
(<a href="http://www.mediawiki.org/wiki/Markup_spec/ANTLR" target="_blank">http://www.mediawiki.org/wiki/Markup_spec/ANTLR</a>), but I cannot really<br>
use that one, since the parsing algorithm is fundamentally different.<br>
<br>
There are two problems with Antlr:<br>
<br>
1. No php back-end<br>
<br>
Writing a php back-end to antlr is a matter of providing a set of<br>
templates and porting the runtime. It's a lot of work, but seems<br>
fairly straightforward.<br>
<br>
The parser can, of course, be written in C and be deployed as a php<br>
extension. The drawback is that it will be harder to deploy it,<br>
while the advantage is the performance. For MediaWiki it might be<br>
worth maintaining both a php and a C version though, since both<br>
speed and deployability are important.<br>
<br>
2. No UTF-8 support in the C runtime in the latest release of antlr.<br>
<br>
In trunk it has support for various character encodings, though, so<br>
it will probably be there in the next release.<br>
<br>
My implementation is just at the beginning stages, but I have<br>
successfully reproduced the exact behavior of MediaWiki's parsing of<br>
apostrophes, which seems to be by far the hardest part. :)<br>
<br>
I put it up right here if anyone is interested in looking at it:<br>
<br>
<a href="http://kreablo.se:8080/x/bin/download/Gob/libmwparser/libwikimodel%2D0.1.tar.gz" target="_blank">http://kreablo.se:8080/x/bin/download/Gob/libmwparser/libwikimodel%2D0.1.tar.gz</a><br>
<br>
<br>
Best regards,<br>
<br>
Andreas Jonsson<br>
<br>
_______________________________________________<br>
Wikitext-l mailing list<br>
<a href="mailto:Wikitext-l@lists.wikimedia.org">Wikitext-l@lists.wikimedia.org</a><br>
<a href="https://lists.wikimedia.org/mailman/listinfo/wikitext-l" target="_blank">https://lists.wikimedia.org/mailman/listinfo/wikitext-l</a><br>
</blockquote></div><br></div>