2010-09-27 20:58, Chad wrote:
On Mon, Sep 27, 2010 at 1:42 PM, Aryeh Gregor <Simetrical+wikilist@gmail.com> wrote:
On Mon, Sep 27, 2010 at 3:38 AM, Andreas Jonsson <andreas.jonsson@kreablo.se> wrote:
Point me to one that has.
Maybe I'm wrong. I've never looked at them in depth. I don't mean to be discouraging here. If you can replace the MediaWiki parser with something sane, my hat is off to you. But if you don't receive a very enthusiastic response from established developers, it's probably because we've had various people trying to replace MediaWiki's parser with a more conventional one since like 2003, and it's never produced anything usable in practice. The prevailing sentiment is reflected pretty well in Tim's commit summary from shortly before giving you commit access:
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/71620
Maybe we're just pessimistic, though. I'd be happy to be proven wrong!
This. Tim sums up the consensus very well with that commit summary. He also made some comments on the history of wikitext and alternative parsers on foundation-l back in Jan '09[0]. Worth a read (starting mainly at '"Parser" is a convenient and short name for it').
While a real parser is a nice pipe dream, in practice not a single project to "rewrite the parser" has succeeded in the years of people trying. Like Aryeh says, if you can pull it off and make it practical, hats off to you.
-Chad
[0] http://article.gmane.org/gmane.org.wikimedia.foundation/35876/
So, Tim raises three objections to a more formalized parser:
1. Formal grammars are too restricted for wikitext.
My implementation handles a larger class of grammars than the context-free grammars, and I believe this gives sufficient room for wikitext.
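As a toy illustration (this is not code from my parser), a recognizer that is allowed to consult context, i.e. a semantic predicate, accepts the classic non-context-free language a^n b^n c^n, which no context-free grammar can describe:

    # Toy recognizer for a^n b^n c^n (n >= 1), a language that is not
    # context-free.  A parser that may count and compare handles it easily.
    def accepts(s):
        n = 0
        while n < len(s) and s[n] == 'a':   # count the leading a's
            n += 1
        if n == 0:
            return False
        # predicate: demand exactly n b's followed by exactly n c's
        return s[n:] == 'b' * n + 'c' * n

    assert accepts('aabbcc')
    assert not accepts('aabbbcc')

The point is only that a parser which may consult context in this way recognizes languages outside the context-free class.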
2. Previous parser implementations had performance issues.
I have not rigorously tested the performance of my parser, but its running time is linear in the size of the input and seems comparable to the original parser on plain text. With an increasing amount of markup, the original parser seems to degrade in performance, while my implementation maintains a fairly constant speed regardless of input. It is possible to construct malicious input that degrades my parser's performance by a constant factor (the same content is scanned up to 13 times), but this is not a situation that would occur on a normal page.
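A rough sketch of the kind of throughput measurement I mean (parse_page is a placeholder for either parser's entry point, not a real function name):

    # Time a parser entry point over a set of pages; report ms per page.
    import time

    def benchmark(parse_page, pages, repeat=5):
        best = float('inf')
        for _ in range(repeat):
            start = time.perf_counter()
            for text in pages:
                parse_page(text)
            best = min(best, time.perf_counter() - start)
        return 1000.0 * best / len(pages)

Running both parsers over the same page set with increasing markup density is what exposes the degradation described above.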
3. Some aspects of the existing parser follow well-known parsing algorithms but are better optimized; in particular, the preprocessor.
My parser implementation does not preprocess the content, and I acknowledge that preprocessing is better done by the current preprocessor. One just needs to disentangle the parser-independent preprocessing (parser functions, transclusion, magic words, etc.) from the parser-preparation preprocessing (e.g., replacing <nowiki> ... </nowiki> spans with "magic" strings).
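To make the "magic string" idea concrete, here is a rough sketch of the parser-preparation half only (the marker scheme and function names are illustrative, not MediaWiki's actual ones):

    # Stash <nowiki>...</nowiki> spans behind unique marker strings before
    # parsing, and put the literal content back afterwards.
    import re

    def prepare(text, saved):
        def stash(m):
            key = '\x7fNOWIKI-%d\x7f' % len(saved)
            saved[key] = m.group(1)
            return key
        return re.sub(r'<nowiki>(.*?)</nowiki>', stash, text, flags=re.S)

    def restore(text, saved):
        for key, literal in saved.items():
            text = text.replace(key, literal)
        return text

    saved = {}
    prepared = prepare("''a'' <nowiki>''not italic''</nowiki>", saved)
    # ... parse `prepared`, then call restore() on the rendered output.

The parser-independent expansion (templates, parser functions, magic words) would run before this step, exactly as the current preprocessor does today.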
Regarding optimization, it matters little that the current parser is "optimized" if my unoptimized implementation still outperforms it.
/Andreas