2010-09-27 20:58, Chad wrote:
On Mon, Sep 27, 2010 at 1:42 PM, Aryeh Gregor <Simetrical+wikilist@gmail.com> wrote:
On Mon, Sep 27, 2010 at 3:38 AM, Andreas Jonsson <andreas.jonsson@kreablo.se> wrote:
Point me to one that has.
Maybe I'm wrong. I've never looked at them in depth. I don't mean to be discouraging here. If you can replace the MediaWiki parser with something sane, my hat is off to you. But if you don't receive a very enthusiastic response from established developers, it's probably because we've had various people trying to replace MediaWiki's parser with a more conventional one since like 2003, and it's never produced anything usable in practice. The prevailing sentiment is reflected pretty well in Tim's commit summary from shortly before giving you commit access:
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/71620
Maybe we're just pessimistic, though. I'd be happy to be proven wrong!
This. Tim sums up the consensus very well with that commit summary. He also made some comments on the history of wikitext and alternative parsers on foundation-l back in Jan '09[0]. Worth a read (starting mainly at '"Parser" is a convenient and short name for it').
While a real parser is a nice pipe dream, in practice not a single project to "rewrite the parser" has succeeded in the years of people trying. Like Aryeh says, if you can pull it off and make it practical, hats off to you.
-Chad
[0] http://article.gmane.org/gmane.org.wikimedia.foundation/35876/
So, Tim raises three objections to a more formalized parser:
1. Formal grammars are too restricted for wikitext.
My implementation handles a larger class of grammars than the context-free grammars, and I believe this gives sufficient room for wikitext.
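As a toy illustration (this is not code from my parser), a recognizer that is allowed to consult context, i.e. a semantic predicate, accepts the classic non-context-free language a^n b^n c^n, which no context-free grammar can describe:

    # Toy recognizer for a^n b^n c^n (n >= 1), a language that is not
    # context-free.  A parser that may count and compare handles it easily.
    def accepts(s):
        n = 0
        while n < len(s) and s[n] == 'a':   # count the leading a's
            n += 1
        if n == 0:
            return False
        # predicate: demand exactly n b's followed by exactly n c's
        return s[n:] == 'b' * n + 'c' * n

    assert accepts('aabbcc')
    assert not accepts('aabbbcc')

The point is only that a parser which may consult context in this way recognizes languages outside the context-free class.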
2. Previous parser implementations had performance issues.
I have not rigorously tested the performance of my parser, but its running time is linear in the size of the input and seems comparable to the original parser on plain text. With an increasing amount of markup, the original parser seems to degrade in performance, while my implementation maintains a fairly constant speed regardless of input. It is possible to construct malicious input that degrades my parser's performance by a constant factor (the same content is scanned up to 13 times), but this is not a situation that would occur on a normal page.
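A rough sketch of the kind of throughput measurement I mean (parse_page is a placeholder for either parser's entry point, not a real function name):

    # Time a parser entry point over a set of pages; report ms per page.
    import time

    def benchmark(parse_page, pages, repeat=5):
        best = float('inf')
        for _ in range(repeat):
            start = time.perf_counter()
            for text in pages:
                parse_page(text)
            best = min(best, time.perf_counter() - start)
        return 1000.0 * best / len(pages)

Running both parsers over the same page set with increasing markup density is what exposes the degradation described above.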
3. Some aspects of the existing parser follow well-known parsing algorithms but are better optimized; in particular, the preprocessor.
My parser implementation does not preprocess the content, and I acknowledge that preprocessing is better done by the current preprocessor. One just needs to disentangle the parser-independent preprocessing (parser functions, transclusion, magic words, etc.) from the parser-preparation preprocessing (e.g., replacing <nowiki> ... </nowiki> spans with "magic" strings).
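To make the "magic string" idea concrete, here is a rough sketch of the parser-preparation half only (the marker scheme and function names are illustrative, not MediaWiki's actual ones):

    # Stash <nowiki>...</nowiki> spans behind unique marker strings before
    # parsing, and put the literal content back afterwards.
    import re

    def prepare(text, saved):
        def stash(m):
            key = '\x7fNOWIKI-%d\x7f' % len(saved)
            saved[key] = m.group(1)
            return key
        return re.sub(r'<nowiki>(.*?)</nowiki>', stash, text, flags=re.S)

    def restore(text, saved):
        for key, literal in saved.items():
            text = text.replace(key, literal)
        return text

    saved = {}
    prepared = prepare("''a'' <nowiki>''not italic''</nowiki>", saved)
    # ... parse `prepared`, then call restore() on the rendered output.

The parser-independent expansion (templates, parser functions, magic words) would run before this step, exactly as the current preprocessor does today.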
Regarding optimization, it matters little that the current parser is "optimized" if my unoptimized implementation still outperforms it.
/Andreas