[Wikitext-l] "When Is A New Parser Done?" ( was: Re: [Wikitech-l] Parser bugs and their priority)

Jay Ashworth jra at baylink.com
Sat Jul 9 21:54:57 UTC 2011


[cross-posted]

----- Original Message -----
> From: "Mark A. Hershberger" <mhershberger at wikimedia.org>

> I suppose these are all linked to the parser work that Brion & co are
> currently working on, but with the arrival of the new parser 6 months to a
> year or more away (http://www.mediawiki.org/wiki/Future/Parser_plan ),
> I'd like to get these sorts of parser issues sorted out now.

My particular hobby horse, the last time that wikitext-l was really active
(which is nearly, but not quite, the same as the last time I was heavily
involved with it), was this question, which that wiki page does not seem to
address, but the Etherpad might.  If not, I still think it's a question that's
fundamental to the implementation of a replacement parser, so I'm going to ask
it again so everyone's thinking about it as work progresses down that path:

How good is good enough?

How many pages is a replacement parser allowed to break, and still be
certified?

That is: what is the *real* spec for mediawikitext?  If we say "the formal
grammar", then we are *guaranteed* to break some articles.  That's the
"Right Answer", from up here at 40,000 feet, where I watch from (having
the luxury of not being responsible in any way for any of this :-), but
it will involve breaking some eggs.

I bring this back up because, the last time we had this conversation, the
answer was "nope; the new parser will have to be bug-for-bug compatible
with the current one".  Or something pretty close to that.

I just think this is a question -- and answer -- that people should be
slowly internalizing as we proceed down this path.

1) Formal Spec
2) Multiple Implementations
3) Test Suite

I don't think it's completely unreasonable that we might have a way to 
grind articles against the current parser, and each new parser, and diff
the output.  Something like that's the only way I can see that we *will*
be able to tell how close new parsers come, and on which constructs they
break (not that this means that I think The Wikipedia Corpus constitutes
a valid Test Suite :-).
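
To make the "grind and diff" idea concrete, here is a minimal sketch in
Python.  Everything in it is an assumption for illustration only: the
`Parser` interface (wikitext in, rendered HTML out), the function names,
and the two toy parsers, which stand in for the current parser and a
hypothetical replacement.  Neither toy resembles the real MediaWiki
parser; the "old" one simply tolerates an unclosed bold marker and the
"new" one does not.

```python
import difflib
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical interface: a parser takes wikitext and returns rendered HTML.
Parser = Callable[[str], str]


def compare_parsers(
    pages: Iterable[Tuple[str, str]],
    reference: Parser,
    candidate: Parser,
) -> Dict[str, List[str]]:
    """Return {title: unified-diff lines} for each page whose output differs."""
    regressions: Dict[str, List[str]] = {}
    for title, wikitext in pages:
        expected = reference(wikitext).splitlines()
        actual = candidate(wikitext).splitlines()
        if expected != actual:
            regressions[title] = list(
                difflib.unified_diff(
                    expected,
                    actual,
                    fromfile="reference",
                    tofile="candidate",
                    lineterm="",
                )
            )
    return regressions


def break_rate(total_pages: int, regressions: Dict[str, List[str]]) -> float:
    """Fraction of pages whose rendering changed under the candidate parser."""
    return len(regressions) / total_pages if total_pages else 0.0


# Toy stand-ins, NOT real parsers: they only know about '''bold'''.
def old_parser(text: str) -> str:
    # Quirk: an unclosed ''' is treated as bold through the end of the text.
    return re.sub(r"'''(.+?)(?:'''|$)", r"<b>\1</b>", text)


def new_parser(text: str) -> str:
    # Stricter: only properly closed '''...''' becomes bold.
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)


if __name__ == "__main__":
    corpus = [("Good", "'''bold''' text"), ("Quirk", "'''unclosed")]
    broken = compare_parsers(corpus, old_parser, new_parser)
    for title, diff in broken.items():
        print(title)
        print("\n".join(diff))
    print("break rate:", break_rate(len(corpus), broken))
```

The break rate this produces is one possible number to hang a "good
enough" threshold on, and the per-page diffs are what would tell you
which constructs a new parser breaks.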

Cheers,
-- jra

-- 
Jay R. Ashworth                  Baylink                       jra at baylink.com
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates     http://baylink.pitas.com         2000 Land Rover DII
St Petersburg FL USA      http://photo.imageinc.us             +1 727 647 1274


