Re: [Wikitech-l] GSoC project advice: port texvc to Python?

28 Mar 2010

On Fri, Mar 26, 2010 at 10:48 PM, Damon Wang <damonwang(a)uchicago.edu> wrote:
...
> (You also suggested doing it as a MediaWiki extension rather than a
> core feature; I'm going to do that, but I won't say anything more
> because it seems fairly uncontroversial.)
I actually disagree with this pretty strongly.  It would be a
regression in functionality for existing users -- if they upgrade,
their wiki breaks unless they install a new extension.  I don't see
any reason to remove it from core that outweighs this disadvantage.

...
> Since the subset of TeX you need parsed has a context-free grammar, it
> needs an LALR parser, not just a bunch of regexes. I know three ways to
> get an LALR parser:
>
>    (1) write a pushdown automaton manually (i.e., be yacc)
>    (2) write input for a parser-generator
>    (3) write a parser-generator, and give it input
>
> Option (2) is the most maintainable and feasible option, and it's
> precisely the one that cannot be done in PHP. As far as I know, PHP has
> no parser-generator package. (Please, please let me know if that's
> incorrect so I can stop embarrassing myself and get on with writing a
> GSoC proposal.)
>
> I could probably do (1), or a hackish half-implementation of it, by
> throwing custom control structures into a bucketload of regexes, but I
> don't think that's in the project's best interests. As has been pointed
> out, the OCaml implementation is really concise and elegant. A large
> fraction of that concision and elegance comes from not actually being a
> parser but rather only a context-free grammar written in a BNF-like
> syntax common to most parser-generators.
Okay, well, maybe you're right.  I'd be interested to hear Tim
Starling's opinion on this (using parser generators vs. writing by
hand).  Writing it in Python would certainly be a big step forward
from OCaml -- any site with LaTeX accessible to MediaWiki will almost
certainly have Python available, so Python vs. PHP should make no
difference to end-users.  And Python is probably the second-best-known
language among MediaWiki hackers.
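For comparison with the parser-generator route, here is a minimal sketch of the hand-written route in Python: a recursive-descent parser whose methods mirror the rules of a small grammar. Note that this is not LALR (recursive descent is a top-down, LL-style technique), and the grammar, the `ARITY` table, and the tuple node shapes are all hypothetical illustrations, not texvc's actual grammar. It only shows that each grammar rule can become one small function rather than a pile of regexes.

```python
import re

# Hypothetical arity table for a toy TeX subset; not texvc's real grammar.
ARITY = {"alpha": 0, "beta": 0, "sqrt": 1, "frac": 2}

# Tokens: control words (\frac), braces and script operators,
# or any other single non-space character.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|[{}^_]|\S")


class ParseError(ValueError):
    pass


class Parser:
    """Recursive-descent parser for the toy grammar:

        expr   := script*
        script := atom (('^' | '_') atom)*
        atom   := '{' expr '}' | COMMAND atom (repeated arity times) | CHAR
    """

    def __init__(self, src: str):
        self.toks = TOKEN_RE.findall(src)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None:
            raise ParseError("unexpected end of input")
        if expected is not None and tok != expected:
            raise ParseError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def parse(self):
        nodes = self.expr()
        if self.peek() is not None:  # e.g. a stray closing brace
            raise ParseError(f"unexpected {self.peek()!r}")
        return nodes

    def expr(self):
        # expr := script*  (stops at a closing brace or end of input)
        nodes = []
        while self.peek() not in (None, "}"):
            nodes.append(self.script())
        return nodes

    def script(self):
        # script := atom (('^' | '_') atom)*
        node = self.atom()
        while self.peek() in ("^", "_"):
            kind = "sup" if self.take() == "^" else "sub"
            node = (kind, node, self.atom())
        return node

    def atom(self):
        tok = self.take()
        if tok == "{":
            inner = self.expr()
            self.take("}")
            return ("group", inner)
        if tok.startswith("\\") and len(tok) > 1:
            name = tok[1:]
            if name not in ARITY:
                raise ParseError(f"unknown command \\{name}")
            return ("cmd", name, [self.atom() for _ in range(ARITY[name])])
        if tok in ("}", "^", "_"):
            raise ParseError(f"unexpected {tok!r}")
        return ("char", tok)
```

For example, `Parser(r"\frac{a}{b}^2").parse()` returns a single `sup` node whose base is the `frac` command, while unbalanced braces or unknown commands raise `ParseError` instead of slipping through.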

...
> I think it'd be easier to find a programmer who has worked with a
> parser-generator and can learn a little bit of OCaml than to find a
> PHP programmer who has to get up to speed on a manually implemented
> parser. After all, how many PHP programmers do you know who have
> experience mucking around inside an LALR parser?
The parsing part is unlikely to need much maintenance.  There are
other things currently in OCaml that make more sense to modify from
time to time -- like the whitelist of commands, and (some of?) the
code for non-image output formats.  For instance, MathML output is
theoretically supported, but I don't know how good the support is.
That might become more important in the future, since Firefox is
likely to support inline MathML in text/html not too long from now.
This sort of thing would be harder for MediaWiki hackers to modify if
it were in Python rather than PHP.

I don't think it would be a big deal if it were rewritten entirely in
Python, though.  It would be a big step forward in any case, and if
it's easier for you, great.  So personally I'd be okay with it,
although it's perhaps not ideal.
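The command whitelist is the kind of data that changes from time to time. A minimal sketch of such a check in Python, assuming a hypothetical `ALLOWED_COMMANDS` set and a hypothetical `disallowed_commands` helper (texvc's real whitelist is far larger and currently lives in its OCaml source), shows how little code it needs once the list is kept as plain data:

```python
import re

# Hypothetical whitelist for illustration only; texvc's real list is far
# longer and is defined in its own source, not here.
ALLOWED_COMMANDS = frozenset({"alpha", "beta", "frac", "int", "sqrt", "sum"})

# A TeX control sequence is a backslash followed by either a run of letters
# (a control word, e.g. \frac) or exactly one other character (a control
# symbol, e.g. \\ or \,). Matching both keeps "\\frac" from being misread
# as the control word \frac.
CONTROL_SEQ_RE = re.compile(r"\\([A-Za-z]+|.)", re.DOTALL)


def disallowed_commands(tex: str) -> list[str]:
    """Return, sorted, the control words in `tex` that are not whitelisted.

    Control symbols (a backslash followed by one non-letter) are ignored
    in this sketch.
    """
    words = {name for name in CONTROL_SEQ_RE.findall(tex) if name.isalpha()}
    return sorted(words - ALLOWED_COMMANDS)
```

A renderer built this way would presumably refuse any formula for which the returned list is non-empty, and adding a command becomes a one-line edit to the set rather than a change to the parser.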

...
> Also, would anyone be interested in mentoring this project?
I probably wouldn't be of any help for this particular project, since
I don't know anything about parsers, and my Python and TeX are
passable but not great.  We could probably come up with a mentor,
though.
