- Should be implemented in the same language (i.e. PHP) so that any
comparisons are comparing apples with apples, and so that it can run on the current installed base of servers as-is. Having other implementations in other languages is fine (e.g. you could have a super-fast version in C too); just provide one in PHP that can be directly compared with the current parser for performance and backwards-compatibility.
That condition seems bizarre. The parser is either faster or it's slower. Whether it's faster because it's implemented in C is irrelevant: it's faster.
In any case I thought it had been decided that it had to be in PHP?
No, I think if we can get a 20:1 speedup for a C version, they'd take it. :-0
I don't doubt it in the case of most large wiki farms - but numerically, most installations of MediaWiki are small wikis, probably running on shared hosts, and in those situations using a C-based parser is either not possible or significantly more complicated than running a PHP script. So for those installs, if the speed of a PHP parser suddenly gets much worse, then I expect those admins would complain. Whilst a faster parser is a faster parser, if it requires running code that you can't run, then it ain't going to do you much good. A custom super-fast wiki-farm parser is great, but the general-case parser should have similar performance characteristics and the same software requirements (i.e. the test is that nobody should be noticeably worse off).
- Should have a worst-case render time no more than 2x slower than the
current parser on any given input.
Any given? That's not reasonable. Perhaps "Any given existing Wikipedia page"? It would be too easy to find some construct that is rendered quickly by the existing parser but is slow with the new one, then create a page that contained 5000 examples of that construct.
Sure; pathological cases are always possible. Let's say "on any 10 randomly chosen already extant pages of wikitext."
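To be concrete, the kind of measurement I have in mind is just a rough timing comparison along these lines. This is a PHP sketch only - OldParser, NewParser and the pages/ directory are placeholders for whatever the real classes and test data end up being, not existing MediaWiki code:

<?php
// Rough sketch only: OldParser and NewParser stand in for the current
// parser and the candidate replacement; pages/*.wiki would hold the
// wikitext of the 10 randomly chosen pages.

function timeParse( $parser, array $texts ) {
    $start = microtime( true );
    foreach ( $texts as $text ) {
        $parser->parse( $text );
    }
    return microtime( true ) - $start;
}

$texts = array_map( 'file_get_contents', glob( 'pages/*.wiki' ) );

$oldTime = timeParse( new OldParser(), $texts );
$newTime = timeParse( new NewParser(), $texts );

printf( "old: %.3fs  new: %.3fs  ratio: %.2f\n",
    $oldTime, $newTime, $newTime / $oldTime );
// The requirement would be that the ratio stays at or below 2.0.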
The current parser (from my perspective) seems to cope quite well with malformed input, so all I'm saying is that if a replacement parser could behave similarly then that would be good. I take your point, though, that what counts as pathological input could differ between parsers - so let's say that the render time on randomly generated malformed input should be equivalent on average.
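And by "randomly generated malformed input" I don't mean anything more sophisticated than random runs of markup-significant characters - something as crude as this sketch would do:

<?php
// Crude generator of "malformed" wikitext for fuzz-style timing runs:
// just a random stream of markup-significant characters and filler.

function randomMalformedWikitext( $length = 2000 ) {
    $chars = array( '[', ']', '{', '}', '|', '=', "'", '*', '#', ':',
        '<', '>', '!', '-', ' ', "\n", 'a', 'b', 'c' );
    $out = '';
    for ( $i = 0; $i < $length; $i++ ) {
        $out .= $chars[mt_rand( 0, count( $chars ) - 1 )];
    }
    return $out;
}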
The English Wikipedia does provide an excellent environment for testing the English-language case. It does not do the same for other languages. Remember that MediaWiki supports over 250 languages.
Indeed - it's only intended as a test for performance and most functionality. For a more complete compatibility test with a variety of languages, you'd probably need to test against all the database dumps at: http://download.wikimedia.org/
- When running parserTests, should introduce a net total of no more than
(say) 2 regressions (e.g. if you break 5 parser tests, then you have to fix 3 or more parser tests that are currently broken).
I'm not familiar enough with the current set of tests to comment on that.
The core tests are in maintenance/parserTests.txt ( http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/parserTes... ) and generally follow a structure with the name of the test, the wikitext input, and the expected XHTML output, for example:
!! test
Preformatted text
!! input
 This is some
 Preformatted text
 With ''italic''
 And '''bold'''
 And a [[Main Page|link]]
!! result
<pre>This is some
Preformatted text
With <i>italic</i>
And <b>bold</b>
And a <a href="/wiki/Main_Page" title="Main Page">link</a>
</pre>
!! end
It's probably a pretty good place to start with writing a parser, in terms of what the expected behaviour is. Then probably after that comes testing against user-generated input versus the current parser.
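For what it's worth, the file format is simple enough to read yourself - something along these lines would do as a rough sketch (the real runner is maintenance/parserTests.php, which also handles sections like !! article and !! options that I've ignored here):

<?php
// Rough sketch of reading parserTests.txt-style cases; only the test
// name, input and result sections are collected.

function readParserTests( $path ) {
    $tests = array();
    $current = null;
    $section = null;
    foreach ( file( $path ) as $line ) {
        if ( preg_match( '/^!!\s*(\w+)/', $line, $m ) ) {
            $keyword = strtolower( $m[1] );
            if ( $keyword === 'test' ) {
                $current = array( 'test' => '', 'input' => '', 'result' => '' );
                $section = 'test';
            } elseif ( $keyword === 'end' ) {
                if ( $current !== null ) {
                    $tests[] = $current;
                }
                $current = null;
                $section = null;
            } else {
                $section = $keyword; // e.g. 'input' or 'result'
            }
        } elseif ( $current !== null && isset( $current[$section] ) ) {
            $current[$section] .= $line;
        }
    }
    return $tests;
}

// For each case: run the candidate parser on $test['input'] and diff
// the output against $test['result'].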
--
All the best,
Nick.