On Fri, Mar 26, 2010 at 10:48 PM, Damon Wang <damonwang(a)uchicago.edu> wrote:
(You also suggested doing this as a MediaWiki extension rather than a
core feature; I'm going to do that, but I won't say anything more
because it seems fairly uncontroversial.)
I actually disagree with this pretty strongly. It would be a
regression in functionality for existing users -- if they upgrade,
their wiki breaks unless they install a new extension. There's no
reason to remove it from core that I see that outweighs this
disadvantage.
Since the subset of TeX you need parsed has a context-free grammar, it
needs an LALR parser, not just a bunch of regexes. I know three ways to
get an LALR parser:
(1) write a pushdown automaton manually (i.e., be yacc)
(2) write input for a parser-generator
(3) write a parser-generator, and give it input
Option (2) is the most maintainable and feasible option, and it's
precisely the one that cannot be done in PHP. As far as I know, PHP has
no parser-generator package. (Please, please let me know if that's
incorrect so I can stop embarrassing myself and get on with writing a
GSoC proposal.)
I could probably do (1), or some hackish kludge at half of it, by
throwing custom control structures into a bucketload of regexes, but I
don't think that's in the project's best interests. As has been pointed
out, the OCaml implementation is really concise and elegant. A large
fraction of that concision and elegance comes from not actually being a
parser but rather only a context-free grammar written in a BNF-like
syntax common to most parser-generators.
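To make the "regexes are not enough" point concrete, here is a minimal
sketch in Python of a hand-written parser for nested brace groups. It
is a recursive-descent parser, which is a simpler technique than the
LALR approach discussed above, and the toy grammar and the names
tokenize, parse_expr and parse are hypothetical; none of this is taken
from texvc.

```python
import re

# Toy grammar (hypothetical, far smaller than what texvc accepts):
#   expr ::= item*
#   item ::= COMMAND | CHAR | '{' expr '}'
# Regexes handle the tokens; the recursion handles the nesting.
TOKEN = re.compile(r"\\[A-Za-z]+|[{}]|[^\\{}\s]")


def tokenize(src):
    """Split input into commands, braces, and single characters.

    Whitespace and anything matching no pattern (for example a lone
    backslash) is silently skipped in this sketch.
    """
    return TOKEN.findall(src)


def parse_expr(tokens, pos=0, depth=0):
    """Parse item* starting at pos; return (tree, next position)."""
    items = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == "{":
            # Recurse for the group body, then require the closing brace.
            group, pos = parse_expr(tokens, pos + 1, depth + 1)
            if pos >= len(tokens) or tokens[pos] != "}":
                raise ValueError("unclosed '{'")
            items.append(group)
            pos += 1
        elif tok == "}":
            if depth == 0:
                raise ValueError("unmatched '}'")
            break  # let the caller consume the closing brace
        else:
            items.append(tok)
            pos += 1
    return items, pos


def parse(src):
    """Return a nested-list parse tree for src."""
    tree, _ = parse_expr(tokenize(src))
    return tree
```

Calling parse(r"\frac{a}{b^{2}}") yields a nested list whose depth
follows the braces, which a flat set of regexes cannot produce for
arbitrary nesting.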
Okay, well, maybe you're right. I'd be interested to hear Tim
Starling's opinion on this (using parser generators vs. writing by
hand). Writing it in Python would certainly be a big step forward
from OCaml -- any site with LaTeX accessible to MediaWiki will almost
certainly have Python available, so Python vs. PHP should make no
difference to end-users. And Python is probably the second-best-known
language among MediaWiki hackers.
Have you had a look at pyparsing, which is a ready-made
all-singing-all-dancing Python parser package with a large amount of
syntactic sugar built in to allow the more-or-less direct input of
grammar notations?
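As a rough illustration of what that looks like, here is a minimal
sketch assuming the third-party pyparsing package is installed. The
grammar is a hypothetical toy (commands, brace groups, single
characters), not the texvc grammar, and the name parse is made up for
the example.

```python
from pyparsing import Forward, Group, Regex, Suppress, ZeroOrMore

# Hypothetical toy grammar, written close to how the BNF would read:
#   expr  ::= item*
#   item  ::= command | group | char
#   group ::= '{' expr '}'
expr = Forward()                      # placeholder so group can refer to expr
command = Regex(r"\\[A-Za-z]+")       # e.g. \frac, \alpha
char = Regex(r"[^\\{}\s]")            # any single non-special character
group = Group(Suppress("{") + expr + Suppress("}"))
item = command | group | char
expr <<= ZeroOrMore(item)


def parse(src):
    """Parse src fully and return the result as nested Python lists."""
    return expr.parseString(src, parseAll=True).asList()
```

With parseAll=True, input such as "a}" raises a ParseException
instead of being silently truncated.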
Given that the texvc source already has a grammar encoded into it in
machine-executable form, it might be an idea to consider mechanically
extracting that grammar from the texvc OCaml source, and then
reformatting it into a grammar in pyparsing's natural format.
-- Neil