Re: [Wikitech-l] EBNF grammar project status?

8 Nov 2007

On 11/8/07, Simetrical <Simetrical+wikilist@gmail.com> wrote:
...

Now that we have a grammar, a yacc parser is compiled, and
appropriate rendering bits are added to get it to render to HTML.
People have already done this, at least once, haven't they? Do we have
a list of attempts?
3) The stuff the BNF grammar doesn't cover is tacked on with some
other methods.  In practice, it seems like a two-pass parser would be
ideal: one recursive pass to deal with templates and other
substitution-type things, then a second pass with the actual grammar
of most of the language.  The first pass is of necessity recursive, so
there's probably no point in having it spend the time to repeatedly
parse italics or whatever, when it's just going to have to do it again
when it substitutes stuff in.  Further rendering passes are going to
be needed, e.g., to insert the table of contents.  Further parsing
passes may or may not be needed.
Ouch, now you're up to about 4 passes, which isn't far off the current
version. Two passes would be good, like a C compiler: once for
meta-markup (templates, parser functions), and once for content. Would
it be possible to perhaps have an in-place pattern-based parser for the
first phase, then a proper recursive descent parser for the content?
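
To make the two-phase idea concrete, here is a rough Python sketch, not
anything from the MediaWiki code base. The first phase expands templates
in place with a pattern, innermost call first, and repeats until nothing
is left to substitute. The second phase then handles content markup
exactly once. The TEMPLATES table, the function names and the round limit
are all made up for illustration, and the second phase uses two regexes
as a stand-in where a real recursive descent parser would go.

```python
import re

# Hypothetical template store; names and bodies are invented for this sketch.
TEMPLATES = {
    "greet": "''Hello'', {{{1}}}!",
    "sig": "{{greet|Steve}}",
}

# An innermost template call: {{name|arg|...}} with no braces inside it.
_INNERMOST = re.compile(r"\{\{([^{}|]+)((?:\|[^{}|]*)*)\}\}")
# A positional parameter inside a template body: {{{1}}}, {{{2}}}, ...
_PARAM = re.compile(r"\{\{\{(\d+)\}\}\}")


def expand_templates(text, max_rounds=20):
    """Phase 1: pattern-based, in-place expansion of meta-markup only."""

    def substitute(match):
        name = match.group(1).strip()
        args = match.group(2).split("|")[1:]
        body = TEMPLATES.get(name)
        if body is None:
            # Assumption: unknown templates degrade to a plain link.
            return "[[Template:" + name + "]]"

        def fill(param):
            index = int(param.group(1)) - 1
            # Assumption: missing arguments expand to the empty string.
            return args[index] if 0 <= index < len(args) else ""

        return _PARAM.sub(fill, body)

    # Repeat because an expansion may itself contain template calls.
    for _ in range(max_rounds):
        text, count = _INNERMOST.subn(substitute, text)
        if count == 0:
            break
    return text


def render_content(text):
    """Phase 2: content markup, run once on fully expanded text."""
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)  # bold before italics
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)
    return text


def render(text):
    return render_content(expand_templates(text))
```

With these definitions, render("{{sig}} wrote '''this'''") expands the
nested templates first and only then touches the italics and bold, so the
quote markup is never parsed more than once.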
Unfortunately the deliberate apparent similarity of lots of very different
language features ({{foo}} vs {{foo:blah}}, [[Project:Link]] vs
[[Category:Link]] etc) makes much of this very complex.
I guess there's no possibility of making wholesale changes to the grammar
and then implementing a migration script?
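
A small Python sketch of why the lookalike constructs above are awkward
for a grammar: the delimiters are identical, so the meaning can only be
decided by looking the prefix up in site-specific tables. The two tables
and both function names here are hypothetical, and a real wiki defines far
more entries than this.

```python
# Hypothetical lookup tables; a real installation defines many more entries.
PARSER_FUNCTIONS = {"#if", "lc", "uc"}
SPECIAL_LINK_NAMESPACES = {"category"}


def classify_braces(inner):
    """Classify the text found between {{ and }}."""
    prefix, colon, _rest = inner.partition(":")
    if colon and prefix.strip().lower() in PARSER_FUNCTIONS:
        return "parser function"
    # Anything else, with or without a colon, is treated as a template.
    return "template"


def classify_link(target):
    """Classify the text found between [[ and ]]."""
    prefix, colon, _rest = target.partition(":")
    if colon and prefix.strip().lower() in SPECIAL_LINK_NAMESPACES:
        return "category tag"
    return "link"
```

Here classify_link("Project:Link") and classify_link("Category:Link")
give different answers although the syntax is the same, and the answer
changes whenever the tables do, which a context-free grammar on its own
cannot express.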
4) All of this breaks a thousand different corner cases and half the
parser tests.  The implementers carefully go through every failed
parser test, rewrite it to the actual output, and carefully justify
why this is the correct course of action.  Or just assume it is,
depending on the level of care.
Sounds good to me.  I wonder also if there is any chance of implementing two
parsers and migrating slowly from one to the next. Perhaps all Wikipedia
pages starting with Ab... could be rendered with the new parser while others
use the old? Pages using the new parser could have a warning displayed like
"Are there problems with the way the content is displayed? Click here...".
And wait for people to actually report perceived problems - as opposed to
the page failing a regression test.
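
The staged rollout could be as simple as the following Python sketch. It
assumes both parsers are available as callables that take wikitext and
return HTML. The prefix tuple, the function names and the notice markup
are invented for illustration.

```python
# Hypothetical rollout slice: titles starting with these prefixes use the new parser.
NEW_PARSER_PREFIXES = ("Ab",)

# Hypothetical notice appended to pages rendered by the new parser.
PROBLEM_NOTICE = (
    '<p class="parser-notice">Are there problems with the way '
    "the content is displayed? Click here...</p>"
)


def choose_parser(title):
    """Return "new" for titles in the rollout slice, otherwise "old"."""
    return "new" if title.startswith(NEW_PARSER_PREFIXES) else "old"


def render_page(title, wikitext, old_parser, new_parser):
    """Render with whichever parser the title is routed to."""
    if choose_parser(title) == "new":
        return new_parser(wikitext) + PROBLEM_NOTICE
    return old_parser(wikitext)
```

Widening the rollout is then just a matter of adding prefixes to the
tuple, and rolling back means emptying it.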
5) A PHP implementation of the exact same grammar is implemented.  How
practical this is, I don't know, but it's critical unless we want
pretty substantially different behavior for people using the PHP
module versus not.  It is not acceptable to force third parties to use
a PHP module, nor to grind their parser to a halt (which a naive
compilation of the grammar into PHP would probably do).
Wasn't there a move to get away from PHP for the parser? Is that not
feasible?
6) Everything is rolled out live.  Pages break left and right.  Large
complaint threads are started on the Village Pump, people fix it, and
everyone forgets about it.  Developers get a warm fuzzy feeling for
having finally succeeded at destroying Parser.php.
I have trouble picturing this. It could be horrendous. But if it could
be managed so there were perhaps a few dozen complaints a day and not
more, that might be doable.
This is if it's to be done properly.  A semi-formal specification
that's not directly useful for parsing pages would involve a lot less
work and perhaps correspondingly less benefit.  It could still improve
operability with third parties dramatically; perhaps that's the only
goal other people have in mind, not the ability to compile a parser
with some yacc equivalent.  I don't know.
The parser moves though. I don't see a semi-formal grammar which isn't used
for anything keeping pace.
Steve
