On Fri, Mar 26, 2010 at 10:48 PM, Damon Wang <damonwang(a)uchicago.edu> wrote:
> (You also suggested doing it as a MediaWiki extension rather than a core
> feature; I'm going to do that, but I won't say anything more because it
> seems fairly uncontroversial.)
I actually disagree with this pretty strongly. It would be a
regression in functionality for existing users -- if they upgrade,
their wiki breaks unless they install a new extension. I don't see any
reason to remove it from core that outweighs this disadvantage.
> Since the subset of TeX you need parsed has a context-free grammar, it
> needs an LALR parser, not just a bunch of regexes. I know three ways to
> get an LALR parser:
> (1) write a pushdown automaton manually (i.e., be yacc)
> (2) write input for a parser-generator
> (3) write a parser-generator, and give it input
> Option (2) is the most maintainable and feasible option, and it's
> precisely the one that cannot be done in PHP. As far as I know, PHP has
> no parser-generator package. (Please, please let me know if that's
> incorrect so I can stop embarrassing myself and get on with writing a
> GSoC proposal.)
> I could probably do (1), or some hackish kludge at half of it, by
> throwing custom control structures into a bucketload of regexes, but I
> don't think that's in the project's best interests. As has been pointed
> out, the OCaml implementation is really concise and elegant. A large
> fraction of that concision and elegance comes from not actually being a
> parser but rather only a context-free grammar written in a BNF-like
> syntax common to most parser-generators.
Okay, well, maybe you're right. I'd be interested to hear Tim
Starling's opinion on this (using parser generators vs. writing by
hand). Writing it in Python would certainly be a big step forward
from OCaml -- any site with LaTeX accessible to MediaWiki will almost
certainly have Python available, so Python vs. PHP should make no
difference to end-users. And Python is probably the second-best-known
language among MediaWiki hackers.
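
To make the by-hand option concrete, here is a minimal sketch in Python of a
hand-written parser for a toy TeX-like language. Everything in it is an
assumption made for illustration: the grammar is invented and far smaller
than what the OCaml code accepts, the names (tokenize, parse, ParseError) are
hypothetical, and it is a recursive-descent parser rather than an LALR one.
The point is only that nested braces need a stack (here, the call stack),
which a flat list of regexes does not provide.

```python
import re

# Toy grammar (hypothetical, for illustration only):
#   expr    ::= item*
#   item    ::= group | command | TEXT
#   group   ::= "{" expr "}"
#   command ::= COMMAND group*
# Regexes are enough for the tokens, but not for the nesting.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|[{}]|[^\\{}]+")


class ParseError(ValueError):
    """Raised when braces do not balance."""


def tokenize(src):
    # Toy tokenizer: a stray backslash that matches nothing is silently
    # skipped; a real tokenizer would reject it instead.
    return TOKEN.findall(src)


def parse(src):
    tokens = tokenize(src)
    tree, pos = _parse_expr(tokens, 0)
    if pos != len(tokens):
        # _parse_expr stopped early, so it hit a "}" with no open group.
        raise ParseError("unmatched '}'")
    return tree


def _parse_expr(tokens, pos):
    items = []
    while pos < len(tokens) and tokens[pos] != "}":
        tok = tokens[pos]
        if tok == "{":
            group, pos = _parse_group(tokens, pos)
            items.append(group)
        elif tok.startswith("\\"):
            pos += 1
            args = []
            # Treat every brace group directly after a command as an argument.
            while pos < len(tokens) and tokens[pos] == "{":
                group, pos = _parse_group(tokens, pos)
                args.append(group)
            items.append(("cmd", tok[1:], args))
        else:
            items.append(("text", tok))
            pos += 1
    return items, pos


def _parse_group(tokens, pos):
    # Caller guarantees tokens[pos] == "{"; recurse for the body.
    body, pos = _parse_expr(tokens, pos + 1)
    if pos >= len(tokens):
        raise ParseError("missing '}'")
    return ("group", body), pos + 1
```

For example, parse(r"\frac{a}{\sqrt{b}}") yields one "frac" command with two
group arguments, the second of which contains a nested "sqrt" command. With a
parser generator, the four grammar lines in the comment would be roughly the
whole input, which is the concision argument made above.
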
> I think it'd be easier to find a programmer who has worked with a
> parser-generator and can learn a little bit of OCaml, than it would be
> to find a PHP programmer who has to read himself into a manually
> implemented parser. After all, how many PHP programmers do you know who
> have experience mucking around inside an LALR parser?
The parsing part is unlikely to need much maintenance. There are
other things currently in OCaml that make more sense to modify from
time to time -- like the whitelist of commands, and (some of?) the
code for non-image output formats. So for instance, MathML output is
theoretically supported, but I don't know how good the support is.
That might become more important in the future, since Firefox is
likely to support inline MathML in text/html not too long from now.
This sort of thing would be harder if it were Python rather than PHP.
I don't think it would be a big deal if it were rewritten entirely in
Python, though. It would be a big step forward in any case, and if
it's easier for you, great. So personally I'd be okay with it,
although it's perhaps not ideal.
> Also, would anyone be interested in mentoring this project?
I probably wouldn't be of any help for this particular project, since
I don't know anything about parsers, and my Python and TeX are
passable but not great. We could probably come up with a mentor,
though.