You're right Steve - I missed the pound symbol in my retort.
But some good came out of it. These two constructs produce identical HTML:
;#what does: this render?
;#how about this?
;#or this?
;what does
:# this render
:# how about this?
:# or this?
So we conclude that:
;#A: B
is shorthand for:
;A
:# B
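
And if I'm reading the rendered output right, both forms come out as the same
nested structure, roughly like this (a sketch of the structure only - I haven't
diffed the parser's exact byte-for-byte output, so whitespace and tag details
may differ):

<dl>
<dt>A</dt>
<dd>
<ol>
<li>B</li>
</ol>
</dd>
</dl>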
-- Jim
> > On Nov 12, 2007 7:54 PM, Nick Jenkins <nickpj(a)gmail.com> wrote:
> > > * Should be implemented in the same language (i.e. PHP) so that any
> > > comparisons are comparing-apples-with-apples, and so that it can run
> > > on the current installed base of servers as-is. Having other
> > > implementations in other languages is fine (e.g. you could have a
> > > super-fast version in C too) just provide one in PHP that can be
> > > directly compared with the current parser for performance and
> > > backwards-compatibility.
> >
> > That condition seems bizarre. The parser is either faster or it's
> > slower. Whether it's faster because it's implemented in C is
> > irrelevant: it's faster.
> > In any case I thought it had been decided that it had to be in PHP?
> No, I think if we can get a 20:1 speedup for a C version, they'd take
> it. :-0
I don't doubt it in the case of most large wiki farms - but numerically
most installations of MediaWiki are on small wikis, probably running on
shared hosts, and in those situations using a C-based parser is either
not possible, or significantly more complicated than running a PHP
script. So for those installs, if the speed of a PHP parser suddenly
gets much worse, then I expect those admins would complain. So whilst a
faster parser is a faster parser, if it requires running code that you
can't run, then it ain't going to do you much good. A custom super-fast
wiki-farm parser is great, but the general-case parser should have
similar performance characteristics and the same software requirements
(i.e. the test is that nobody should be noticeably worse off).
> > > * Should have a worst-case render time no worse than 2x slower on
> > > any given input.
> > Any given? That's not reasonable. Perhaps "Any given existing
> > Wikipedia page"? It would be too easy to find some construct that is
> > rendered quickly by the existing parser but is slow with the new one,
> > then create a page that contained 5000 examples of that construct.
> Sure; pathological cases are always possible. Let's say "on any 10
> randomly chosen already extant pages of wikitext."
The current parser (from my perspective) seems to cope quite well with malformed
input. So all I'm saying is that if a replacement parser could behave similarly
then that would be good - although I take your point that the input that is
considered pathological could be different for different parsers, so let's say that
the render time on randomly generated malformed input should be equivalent on average.
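
To make that concrete, the sort of comparison I have in mind is roughly the
following (just a sketch - renderWithCurrentParser(), renderWithNewParser()
and loadRandomPages() are placeholder names for whatever the two
implementations and the test harness actually expose, not existing MediaWiki
functions):

<?php
// Rough timing sketch: render the same sample of existing pages with both
// parsers and compare the totals. All three helper functions are placeholders.

function timeParser( $render, $pages ) {
    $total = 0.0;
    foreach ( $pages as $wikitext ) {
        $start = microtime( true );
        call_user_func( $render, $wikitext );
        $total += microtime( true ) - $start;
    }
    return $total;
}

$pages = loadRandomPages( 10 );  // e.g. 10 randomly chosen existing pages

$old = timeParser( 'renderWithCurrentParser', $pages );
$new = timeParser( 'renderWithNewParser', $pages );

printf( "current: %.3fs  candidate: %.3fs  ratio: %.2fx\n",
    $old, $new, $new / $old );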
> The English Wikipedia does provide an excellent environment to test
> the English language environment. It does not do the same for other
> languages. Remember that MediaWiki supports over 250 languages?
Indeed - it's only intended as a test for performance and most functionality. For
a more complete compatibility test with a variety of languages, you'd probably need
to test against all the database dumps at:
http://download.wikimedia.org/
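
For example, something along these lines could be pointed at any of those
pages-articles dumps (again just a sketch - the two render*() functions are
placeholders, and you'd decompress the dump first):

<?php
// Sketch: stream <text> nodes out of a pages-articles XML dump and count
// pages where a candidate parser's output differs from the current parser's.
// renderWithCurrentParser() and renderWithNewParser() are placeholders.

$reader = new XMLReader();
$reader->open( 'pages-articles.xml' );  // same approach for any language's dump

$mismatches = 0;
while ( $reader->read() ) {
    if ( $reader->nodeType == XMLReader::ELEMENT && $reader->name == 'text' ) {
        $wikitext = $reader->expand()->textContent;
        if ( renderWithCurrentParser( $wikitext ) !== renderWithNewParser( $wikitext ) ) {
            $mismatches++;
        }
    }
}
echo "$mismatches pages rendered differently\n";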
> > > * When running parserTests should introduce a net total of no more
> > > than (say) 2 regressions (e.g. if you break 5 parser tests, then
> > > you have to fix 3 or more parser tests that are currently broken).
> > I'm not familiar enough with the current set of tests to comment on that.
The core tests are in maintenance/parserTests.txt
( http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/parserTe… )
and generally follow a structure with the name of the test, the wiki text
input, and the expected XHTML output, for example:
!! test
Preformatted text
!! input
 This is some
 Preformatted text
 With ''italic''
 And '''bold'''
 And a [[Main Page|link]]
!! result
<pre>This is some
Preformatted text
With <i>italic</i>
And <b>bold</b>
And a <a href="/wiki/Main_Page" title="Main Page">link</a>
</pre>
!! end
It's probably a pretty good place to start with writing a parser, in terms of what
the expected behaviour is. Then probably after that comes testing against user-generated
input versus the current parser.
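
If it helps, pulling the tests out of that file is only a few lines of PHP -
something like this (a sketch of the sectioning only; the real runner,
maintenance/parserTests.php, does a lot more, e.g. article setup and test
options):

<?php
// Sketch: split parserTests.txt into its !! test / !! input / !! result
// sections. Only handles the basic layout shown above; the real runner in
// maintenance/parserTests.php handles far more (options, articles, etc.).

function readParserTests( $filename ) {
    $tests = array();
    $current = array();
    $section = null;
    foreach ( file( $filename ) as $line ) {
        if ( preg_match( '/^!!\s*(\w+)/', $line, $m ) ) {
            $section = strtolower( $m[1] );
            if ( $section == 'end' ) {
                $tests[] = $current;
                $current = array();
                $section = null;
            } else {
                $current[$section] = '';
            }
        } elseif ( $section !== null ) {
            $current[$section] .= $line;
        }
    }
    return $tests;  // entries keyed by 'test' (name), 'input' and 'result'
}

foreach ( readParserTests( 'maintenance/parserTests.txt' ) as $t ) {
    // feed $t['input'] to the candidate parser and diff against $t['result'] here
}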
--
All the best,
Nick.
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l