On 11/12/07, Nick Jenkins <nickpj@gmail.com> wrote:
> - For the 1% that doesn't render the same, provide a list of what
> constructs don't render the same, and an explanation of whether support
> for that construct is planned to be added, or whether you think it should
> not be supported because it's a corner-case or badly-thought-out
> construct, or something else.
That seems reasonable.
> * Should be implemented in the same language (i.e. PHP) so that any
> comparisons are comparing apples with apples, and so that it can run on
> the current installed base of servers as-is. Having other implementations
> in other languages is fine (e.g. you could have a super-fast version in C
> too); just provide one in PHP that can be directly compared with the
> current parser for performance and backwards-compatibility.
That condition seems bizarre. The parser is either faster or it's slower. Whether it's faster because it's implemented in C is irrelevant: it's faster.
In any case I thought it had been decided that it had to be in PHP?
> * Should have a worst-case render time no worse than 2x slower on any
> given input.
Any given? That's not reasonable. Perhaps "Any given existing Wikipedia page"? It would be too easy to find some construct that is rendered quickly by the existing parser but is slow with the new one, then create a page that contained 5000 examples of that construct.
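For concreteness, here's a minimal sketch of the kind of comparison I have in mind, run over a sample of existing pages. $oldParser/$newParser and their parse() method are placeholders here, not the real MediaWiki API:

    <?php
    // Sketch only: $pages maps page titles to the wikitext of existing
    // Wikipedia pages; the parser objects are hypothetical wrappers.
    function renderTime($parser, $wikitext) {
        $start = microtime(true);
        $parser->parse($wikitext);
        return microtime(true) - $start;
    }

    $worstRatio = 0.0;
    $worstPage  = '';
    foreach ($pages as $title => $wikitext) {
        $ratio = renderTime($newParser, $wikitext) / renderTime($oldParser, $wikitext);
        if ($ratio > $worstRatio) {
            $worstRatio = $ratio;
            $worstPage  = $title;
        }
    }
    printf("Worst slowdown: %.2fx on [[%s]]\n", $worstRatio, $worstPage);

Anything that only shows up as a pathological hand-crafted page, rather than somewhere in a corpus like that, shouldn't count against the 2x figure.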
> * Should use no more run-time memory than the current parser on average,
> and no more than 2x more in the worst case.
As above.
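Peak memory is also a bit fiddly to measure, since memory_get_peak_usage() only ever increases within a process; a rough sketch, assuming each page is parsed in its own PHP process and buildParser() is a hypothetical factory for whichever parser is under test:

    <?php
    // Sketch only: invoke once per page so the peak reflects a single parse.
    // $argv[1] is a file containing that page's wikitext.
    $wikitext = file_get_contents($argv[1]);
    $parser   = buildParser();
    $before   = memory_get_peak_usage(true);
    $parser->parse($wikitext);
    $after    = memory_get_peak_usage(true);
    printf("%s\t%d bytes of additional peak memory\n", $argv[1], $after - $before);

Again, averages and worst cases would have to be taken over existing pages, not over arbitrary constructed input.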
> * Any source code should be documented. The grammar used should be
> documented (since this relates to the core driving reason for
> implementing a new parser).
Err, yes. I have to say, the current parser is very nicely written and very well commented.
> * When running parserTests, should introduce a net total of no more than
> (say) 2 regressions (e.g. if you break 5 parser tests, then you have to
> fix 3 or more parser tests that are currently broken).
I'm not familiar enough with the current set of tests to comment on that.
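The counting part at least seems mechanical; a sketch, assuming each parserTests run can be made to dump the names of its passing tests to a file (old_passes.txt and new_passes.txt are made-up names):

    <?php
    // Sketch only: one passing test name per line in each file.
    // "Net regressions" = newly broken minus newly fixed.
    $old = file('old_passes.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $new = file('new_passes.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $broken = array_diff($old, $new);   // passed with the old parser, fail now
    $fixed  = array_diff($new, $old);   // failed with the old parser, pass now
    printf("broken: %d, fixed: %d, net regressions: %d\n",
           count($broken), count($fixed), count($broken) - count($fixed));

Whether "no more than 2 net regressions" is the right threshold I'll leave to people who know the test suite.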
> (*) = I'm using the English Wikipedia here as a test corpus as it's a
> large enough body of work, written by enough people, that it's
> statistically useful when comparing average and worst-case performance
> and compatibility of wikitext as used by people in the real world. Any
> other large body of human-generated wikitext of equivalent size, with an
> equivalent number of authors, would do equally well for comparison
> purposes.
Ok I think that answers my concerns there.
Thanks for the feedback.
Steve
> If you can provide an implementation that has the above characteristics,
> and which has a documented grammar, then I think it's reasonable to
> assume that people would be willing to take a good look at that
> implementation.
> > I'm not sure who all the angry comments in parser.php belong to
>
> svn praise includes/Parser.php | less
> -- All the best, Nick.