Re: [Wikitext-l] any new progress of the parser?

14 Jul 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Mingli
I guess everyone has given up on the dream of defining the current
syntax in any sane, well-defined form ;)
A while ago I tried to build a parser similar to flexbisonparse, using
flex and bison to produce an XML parse tree. Of course, I failed
miserably after two weeks of work and went back to the Perl regex
monstrosity we use at the company. But I did learn a few things that
may be useful for any future effort:
I believe it's wrong to attempt to build a single parser for MediaWiki
syntax (as flexbisonparse attempted). A better and much simpler
approach is to define a separate formal grammar for each step of the
parsing. That way you can get around problems such as an XML-like tag
being assembled from the output of several different templates. My
attempt included separate flex/bison parsers for:
* <noinclude>, <includeonly>, ... parts
* template transclusion (e.g. the {{{ and {{ constructs)
* text formatting
* possibly more steps for tables, etc., but I didn't get that far.
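The multi-pass idea can be sketched as a pipeline in which each pass sees only the output of the previous one. This is a minimal illustration under loud assumptions, not the flex/bison parsers described above: Python regular expressions stand in for each per-step grammar, templates take no parameters, and `TEMPLATES`, `bold_start`, `bold_end` and all function names are hypothetical. It shows why the split helps: bold markup opened in one template and closed in another only becomes visible after expansion, so the formatting pass never needs to know where it came from.

```python
import re

# Hypothetical template store; on a real wiki these bodies would be wiki pages.
TEMPLATES = {
    "bold_start": "'''",
    "bold_end": "'''<noinclude>[[Category:Formatting templates]]</noinclude>",
}


def strip_inclusion_tags(text: str, transcluded: bool) -> str:
    """Pass 1: resolve <noinclude>/<includeonly> sections."""
    if transcluded:
        # Being transcluded: drop <noinclude> blocks, keep <includeonly> content.
        text = re.sub(r"<noinclude>.*?</noinclude>", "", text, flags=re.S)
        text = re.sub(r"</?includeonly>", "", text)
    else:
        # Viewed directly: drop <includeonly> blocks, keep <noinclude> content.
        text = re.sub(r"<includeonly>.*?</includeonly>", "", text, flags=re.S)
        text = re.sub(r"</?noinclude>", "", text)
    return text


def expand_templates(text: str, depth: int = 0) -> str:
    """Pass 2: replace {{name}} with the (parameterless) template body."""
    if depth > 10:  # guard against runaway recursive templates
        return text

    def replace(m: re.Match) -> str:
        name = m.group(1).strip()
        if name not in TEMPLATES:
            return m.group(0)  # unknown template: leave the call untouched
        body = strip_inclusion_tags(TEMPLATES[name], transcluded=True)
        return expand_templates(body, depth + 1)

    return re.sub(r"\{\{([^{}|]+)\}\}", replace, text)


def format_text(text: str) -> str:
    """Pass 3: bold, then italics, so ''' is not misread as '' plus '."""
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)
    return text


def parse(page: str) -> str:
    text = strip_inclusion_tags(page, transcluded=False)
    text = expand_templates(text)
    return format_text(text)
```

Here `parse("{{bold_start}}Hello{{bold_end}} and ''world''")` gives `<b>Hello</b> and <i>world</i>`; a single grammar would have had to see through both template calls to pair up the `'''` markers.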
The biggest problem in defining these is graceful degradation on broken
input. It's not that hard to get the parser to work in simple,
well-defined cases. But if you want to get anywhere near the way the
current parser degrades on ambiguous input, the parser definitions start
to get out of hand, and parsing speed suffers badly. You're essentially
trying to cram context-dependent behavior into a context-free grammar.
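The graceful-degradation problem can be illustrated with a small hand-written matcher for template braces. This is a sketch, not how the MediaWiki parser actually recovers: the function name `match_braces` and its recovery policy (unmatched braces silently become literal text) are assumptions made for illustration, and triple-brace parameters are deliberately ignored. A strict context-free grammar would simply reject `{{b` with no closing braces; tolerating it takes policy code outside the grammar, like the stack handling below.

```python
def match_braces(text: str) -> list[tuple[int, int]]:
    """Find balanced {{ ... }} pairs with a stack, tolerating broken input.

    Unmatched '{{' or '}}' produce no span instead of an error, so a
    later pass can emit them as literal text. Triple braces ({{{ ... }}})
    are deliberately not handled here.
    """
    stack: list[int] = []              # offsets of currently open '{{'
    spans: list[tuple[int, int]] = []  # (start, end) of matched pairs
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            stack.append(i)
            i += 2
        elif pair == "}}" and stack:
            spans.append((stack.pop(), i + 2))
            i += 2
        else:
            # Ordinary text, or a stray '}}' with nothing to close.
            i += 1
    # Anything still on the stack was never closed and is left as text.
    return spans
```

For example, `match_braces("{{a}} {{b")` returns `[(0, 5)]`, leaving the dangling `{{b` for the caller to print verbatim. Supporting `{{{...}}}` parameters, or braces produced by other templates, each needs another special case like the stray-`}}` branch, which is the kind of growth described above.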
From my observations, I believe the only way any formal grammar will
replace the current PHP parser is if the MediaWiki team is prepared to
change its current philosophy of desperately trying to make sense of
whatever broken string of characters the user provides, i.e. if
MediaWiki were allowed to throw a syntax error on invalid input and/or
the number of valid constructs were significantly reduced (all the
horrible combinations of bold/italics markup come to mind).
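The difference between throwing a syntax error and guessing can be sketched with a toy bold/italics pass over a single line. Everything here is an assumption for illustration: `format_line`, `WikiSyntaxError` and the lenient rules (keep 4+ apostrophes literally, re-nest out-of-order closes, auto-close at end of line) are made up and are not MediaWiki's actual quote-handling rules. The point is the shape of the code: each strict branch is a one-line `raise`, while each lenient branch needs its own recovery policy.

```python
import re


class WikiSyntaxError(ValueError):
    """Raised in strict mode on input the parser refuses to guess about."""


def format_line(line: str, strict: bool = True) -> str:
    """Convert '' (italics) and ''' (bold) on one line into HTML tags."""
    out: list[str] = []
    open_tags: list[str] = []  # currently open tags, innermost last
    pos = 0
    for m in re.finditer(r"'{2,}", line):
        out.append(line[pos:m.start()])
        pos = m.end()
        run = len(m.group())
        if run > 3:
            if strict:
                raise WikiSyntaxError(f"{run} apostrophes at column {m.start()}")
            out.append(m.group())  # lenient guess: keep them as literal text
            continue
        tag = "b" if run == 3 else "i"
        if tag not in open_tags:
            open_tags.append(tag)
            out.append(f"<{tag}>")
        elif open_tags[-1] == tag:
            open_tags.pop()
            out.append(f"</{tag}>")
        elif strict:
            raise WikiSyntaxError(f"'{tag}' closed out of order at column {m.start()}")
        else:
            # Lenient guess: close the inner tags, close this one, reopen the inner ones.
            inner = open_tags[open_tags.index(tag) + 1:]
            out.extend(f"</{t}>" for t in reversed(inner))
            out.append(f"</{tag}>")
            out.extend(f"<{t}>" for t in inner)
            open_tags.remove(tag)
    out.append(line[pos:])
    if open_tags:
        if strict:
            raise WikiSyntaxError(f"unclosed {', '.join(open_tags)} at end of line")
        out.extend(f"</{t}>" for t in reversed(open_tags))  # lenient: auto-close
    return "".join(out)
```

With these rules, `format_line("'''bold", strict=False)` auto-closes to `<b>bold</b>`, while the default strict mode raises `WikiSyntaxError` for the same input.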
Given my understanding of the project I find this extremely unlikely.
But then I'm not a MediaWiki developer, so I might be completely wrong here.
Best regards
Tomaž Šolc
---
Tomaž Šolc, Research & Development
Zemanta Ltd, London, Ljubljana
www.zemanta.com
mail: tomaz@zemanta.com
blog: http://www.tablix.org/~avian/blog
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIeygjyJ/LzBrnoEgRAjIxAKCnrR+nBb5R43c7nJc+JJbokQvojwCgjt7F
Y8+Pajt/fmGF4KO48SGSYAE=
=WL/Z
-----END PGP SIGNATURE-----
