2010-09-27 20:58, Chad wrote:
> On Mon, Sep 27, 2010 at 1:42 PM, Aryeh Gregor
> <Simetrical+wikilist(a)gmail.com> wrote:
>> On Mon, Sep 27, 2010 at 3:38 AM, Andreas Jonsson
>> <andreas.jonsson(a)kreablo.se> wrote:
>>> Point me to one that has.
>>
>> Maybe I'm wrong. I've never looked at them in depth. I don't mean to
>> be discouraging here. If you can replace the MediaWiki parser with
>> something sane, my hat is off to you. But if you don't receive a very
>> enthusiastic response from established developers, it's probably
>> because we've had various people trying to replace MediaWiki's parser
>> with a more conventional one since like 2003, and it's never produced
>> anything usable in practice. The prevailing sentiment is reflected
>> pretty well in Tim's commit summary from shortly before giving you
>> commit access:
>>
>> http://www.mediawiki.org/wiki/Special:Code/MediaWiki/71620
>>
>> Maybe we're just pessimistic, though. I'd be happy to be proven wrong!
>
> This. Tim sums up the consensus very well with that commit summary.
> He also made some comments on the history of wikitext and alternative
> parsers on foundation-l back in Jan '09[0]. Worth a read (starting mainly
> at '"Parser" is a convenient and short name for it').
>
> While a real parser is a nice pipe dream, in practice not a single project
> to "rewrite the parser" has succeeded in the years of people trying. Like
> Aryeh says, if you can pull it off and make it practical, hats off to you.
>
> -Chad
>
> [0] http://article.gmane.org/gmane.org.wikimedia.foundation/35876/
So, Tim raises three objections against a more formalized parser:
1. Formal grammars are too restricted for wikitext.
   My implementation handles a larger class of grammars than the
   class of context-free grammars. I believe that this gives
   sufficient room for wikitext.
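To illustrate why wikitext sits outside a plain context-free grammar, consider the apostrophe markup: whether a run of apostrophes opens or closes bold/italic depends on state accumulated across the whole line. The sketch below is a toy model of that state-dependence, not Andreas's parser and not MediaWiki's actual doQuotes() logic (which additionally rebalances unmatched runs after scanning the full line):

```python
import re

def render_quotes(line):
    """Toy toggle-based rendering of '' (italic) and ''' (bold).

    Whether a given apostrophe run opens or closes a tag depends on
    state carried from the start of the line, so the token alone does
    not determine the output -- the kind of context-sensitivity a
    pure per-token CFG rule cannot express.  Illustrative only."""
    out, italic, bold = [], False, False
    pos = 0
    for m in re.finditer(r"'{2,}", line):
        out.append(line[pos:m.start()])
        run = len(m.group(0))
        if run >= 3:
            out.append("</b>" if bold else "<b>")
            bold = not bold
            run -= 3
        if run == 2:
            out.append("</i>" if italic else "<i>")
            italic = not italic
            run = 0
        out.append("'" * run)  # stray apostrophes stay literal
        pos = m.end()
    out.append(line[pos:])
    if italic:  # close tags left open at end of line
        out.append("</i>")
    if bold:
        out.append("</b>")
    return "".join(out)
```

For example, render_quotes("''hi''") yields "<i>hi</i>", while the same token `''` at a different position would instead emit the closing tag.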
2. Previous parser implementation had performance issues.
   I have not rigorously tested the performance of my parser, but
   its running time is linear in the size of the input, and it seems
   comparable to the original parser on plain text. With an
   increasing amount of markup, the original parser's performance
   seems to degrade, while my implementation maintains a fairly
   constant speed regardless of input. It is possible to construct
   malicious input that slows my parser down by a constant factor
   (the same content is scanned up to 13 times), but this is not a
   situation that would occur on a normal page.
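One way to substantiate the linearity claim is to time the parser on inputs of growing size and check that the per-byte cost stays flat. A minimal harness sketch, where `parse` and `make_input` are placeholders for the real parser entry point and a generator of markup-heavy test pages:

```python
import time

def benchmark(parse, make_input, sizes):
    """Time `parse` on inputs of increasing size.

    Returns (size, seconds, ns_per_byte) rows.  For a linear-time
    parser the ns/byte column should stay roughly constant;
    superlinear behaviour shows up as a steadily growing ratio.
    `parse` and `make_input` are stand-ins, not real MediaWiki APIs."""
    results = []
    for n in sizes:
        text = make_input(n)
        start = time.perf_counter()
        parse(text)
        elapsed = time.perf_counter() - start
        results.append((n, elapsed, 1e9 * elapsed / len(text)))
    return results
```

Running it with both a plain-text generator and a markup-heavy generator would make the "degrades with markup" comparison against the original parser concrete.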
3. Some aspects of the existing parser follow well-known parsing
   algorithms but are better optimized; in particular, the
   preprocessor.
   My parser implementation does not preprocess the content. I
   acknowledge that preprocessing is better done by the current
   preprocessor. One just needs to disentangle the independent
   preprocessing (parser functions, transclusion, magic words, etc.)
   from the parser-preparation preprocessing (e.g., replacing <nowiki>
   ... </nowiki> with a "magic" string).
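The "magic string" step is the strip-marker idea MediaWiki's own preprocessor uses: protected spans are swapped for unique markers before parsing and substituted back afterwards. A sketch of that mechanism (the marker format here is illustrative, not MediaWiki's actual one):

```python
import re

def strip_nowiki(text, state):
    """Replace <nowiki>...</nowiki> spans with unique marker strings,
    saving the protected content in `state` so the parser never sees
    it.  Illustrative sketch of the strip-marker technique, not the
    real MediaWiki implementation."""
    def repl(m):
        marker = "\x7fUNIQ-nowiki-%08d\x7f" % len(state)
        state[marker] = m.group(1)
        return marker
    return re.sub(r"<nowiki>(.*?)</nowiki>", repl, text, flags=re.S)

def unstrip(text, state):
    """Substitute the protected spans back after parsing."""
    for marker, content in state.items():
        text = text.replace(marker, content)
    return text
```

Because the markers contain a control character that cannot occur in wiki input, the parser proper can treat them as opaque tokens, which is what makes this step separable from the grammar.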
Regarding optimization, it doesn't matter that the current parser
is "optimized" if my unoptimized implementation outperforms the
existing optimized one.
/Andreas