Re: [Wikitech-l] GSoC project advice: port texvc to Python?

28 Mar 2010


      On 28/03/10 18:59, Aryeh Gregor wrote:
...
On Fri, Mar 26, 2010 at 10:48 PM, Damon Wang <damonwang@uchicago.edu> wrote:
...
(You also suggested doing this as a MediaWiki extension rather than a core feature; I'm going
to do that, but I won't say anything more because it seems fairly
uncontroversial.)
I actually disagree with this pretty strongly.  It would be a
regression in functionality for existing users -- if they upgrade,
their wiki breaks unless they install a new extension.  There's no
reason to remove it from core that I see that outweighs this
disadvantage.
...
Since the subset of TeX you need parsed has a context-free grammar, it
needs an LALR parser, not just a bunch of regexes. I know three ways to
get an LALR parser:
(1) write a pushdown automaton manually (i.e., be yacc)
(2) write input for a parser-generator
(3) write a parser-generator, and give it input
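
To illustrate why nesting pushes this beyond plain regexes, here is a minimal sketch of a hand-written parser for a toy TeX-like subset (control words, brace groups, single characters). It is a recursive-descent parser rather than a true LALR automaton: the Python call stack stands in for the pushdown stack. The names `tokenize` and `parse` and the token pattern are assumptions made for this example and are not taken from texvc.

```python
import re

# Toy token pattern (an assumption, not texvc's lexer): a control word
# such as \frac, a single brace, or any other non-space character.
TOKEN = re.compile(r"\\[A-Za-z]+|[{}]|[^\\{}\s]")


def tokenize(src):
    """Split the input into control words, braces and single characters."""
    return TOKEN.findall(src)


def parse(tokens):
    """Turn a token list into nested lists; each {...} group becomes a sub-list."""
    pos = 0

    def parse_items(depth):
        nonlocal pos
        items = []
        while pos < len(tokens):
            tok = tokens[pos]
            if tok == "{":
                pos += 1
                # Recursion is what a flat regex cannot express.
                items.append(parse_items(depth + 1))
                if pos >= len(tokens) or tokens[pos] != "}":
                    raise ValueError("unbalanced '{'")
                pos += 1
            elif tok == "}":
                if depth == 0:
                    raise ValueError("unbalanced '}'")
                # Leave the closing brace for the caller to consume.
                return items
            else:
                items.append(tok)
                pos += 1
        return items

    return parse_items(0)
```

Calling `parse(tokenize(r"\frac{a}{\sqrt{b}}"))` yields a nested list in which the inner `\sqrt{b}` group sits inside the second argument of `\frac`. Even this tiny version needs explicit bookkeeping for brace depth, which is the maintenance cost a parser-generator grammar avoids.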


Option (2) is the most maintainable and feasible option, and it's
precisely the one that cannot be done in PHP. As far as I know, PHP has
no parser-generator package. (Please, please let me know if that's
incorrect so I can stop embarrassing myself and get on with writing a
GSoC proposal.)
I could probably do (1), or some hackish kludge at half of it, by
throwing custom control structures into a bucketload of regexes, but I
don't think that's in the project's best interests. As has been pointed
out, the OCaml implementation is really concise and elegant. A large
fraction of that concision and elegance comes from not actually being a
parser but rather only a context-free grammar written in a BNF-like
syntax common to most parser-generators.
Okay, well, maybe you're right.  I'd be interested to hear Tim
Starling's opinion on this (using parser generators vs. writing by
hand).  Writing it in Python would certainly be a big step forward
from OCaml -- any site with LaTeX accessible to MediaWiki will almost
certainly have Python available, so Python vs. PHP should make no
difference to end-users.  And Python is probably the second-best-known
language among MediaWiki hackers.
Have you had a look at pyparsing, which is a ready-made 
all-singing-all-dancing Python parser package with a large amount of 
syntactic sugar built in to allow the more-or-less direct input of 
grammar notations?
Given that the texvc source already has a grammar encoded into it in
machine-executable form, it might be an idea to consider mechanically
extracting that grammar from the texvc OCaml source, and then reformatting
it into a grammar in pyparsing's natural format.
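
As a rough idea of what "pyparsing's natural format" looks like, here is a small sketch. The grammar below is a hypothetical toy (control words, brace groups, single characters), not the grammar from the texvc source, and `parse_tex` is a name invented for this example. It uses the long-standing `parseString` spelling; pyparsing 3 also offers it as `parse_string`.

```python
from pyparsing import Forward, Group, ParseException, Regex, Suppress, ZeroOrMore

# Hypothetical toy grammar, NOT the texvc grammar:
#   expr  ::= item*
#   item  ::= command | group | char
#   group ::= "{" expr "}"
expr = Forward()                      # declared first so it can be used recursively
command = Regex(r"\\[A-Za-z]+")       # control word such as \frac
char = Regex(r"[^\\{}\s]")            # any other single non-space character
group = Group(Suppress("{") + expr + Suppress("}"))
item = command | group | char
expr <<= ZeroOrMore(item)


def parse_tex(src):
    """Parse src against the toy grammar and return nested Python lists."""
    return expr.parseString(src, parseAll=True).asList()
```

Each rule reads much like a BNF production, and `parse_tex(r"\frac{a}{\sqrt{b}}")` returns the brace groups as nested lists. Unbalanced input such as `"{a"` raises `ParseException` because `parseAll=True` requires the whole string to match.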
-- Neil
