On Fri, Mar 26, 2010 at 7:48 PM, Damon Wang <damonwang@uchicago.edu> wrote:
There are a few Python-based things that might be interesting, but I think you'll get a lot more love for doing something in PHP or C. Since this is a student internship, you shouldn't be bashful about using this as a learning opportunity.
I'd only caution against convincing yourself (and us) that you'll be more interested in learning something like PHP than you truly are. It might help you land a spot, but it will work against you in having a successful project, and this has such high visibility that you'll really want to be successful.
What visibility does this have? I thought it was some abandoned corner of the wiki that nobody has touched in the seven years since it was first written. What happens if I make a hash of this?
Hi Damon,
Oops... that was a little ambiguous and probably applied a little more pressure than intended. What I meant to say is that Google Summer of Code in general is pretty high visibility, not this project in particular. Projects often go back and review results from previous years (just as we did: http://www.mediawiki.org/wiki/Summer_of_Code_Past_Projects ). There are plenty of ways to have a noble failure that won't reflect poorly on you, but that's probably not what you should aim for. There's nothing especially high profile about this project relative to other GSoC work.
Anyway, in response to the specifics about Python/texvc: I was looking around for ideas on how to approach replacing texvc with a Python implementation, and stumbled onto this: http://www.mediawiki.org/wiki/Texvc_PHP_Alternative
That implementation seems to punt on the whole parsing thing and, as near as I can tell from a cursory reading, just passes everything through to latex, so that probably won't do. However, there may be something I'm missing.
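One reason a straight pass-through is worrying is that latex will happily act on commands such as \input (which reads files) or \write18 (which can run shell commands when shell escape is enabled), so some front end has to decide what input is acceptable. As a rough illustration only, here is a minimal Python sketch of a whitelist check on control sequences. The `ALLOWED_COMMANDS` set and the function names are hypothetical, a tiny stand-in rather than texvc's actual accepted set; a real validator would also need to check arguments, environments, and the overall structure.

```python
import re

# Hypothetical, deliberately tiny whitelist for illustration; not texvc's real list.
ALLOWED_COMMANDS = {"frac", "sqrt", "alpha", "beta", "sum", "int", "left", "right", "cdot"}

# A control sequence is a backslash followed by a run of ASCII letters,
# or by any single other (non-newline) character such as "\," or "\{".
CONTROL_SEQ = re.compile(r"\\([A-Za-z]+|.)")


def rejected_commands(tex):
    """Return the letter-named commands in `tex` that are not on the whitelist."""
    names = (m.group(1) for m in CONTROL_SEQ.finditer(tex))
    # Single-symbol escapes like "\," are skipped; only named commands are checked.
    return sorted({n for n in names if n.isalpha() and n not in ALLOWED_COMMANDS})


def is_safe(tex):
    """True if every named command in `tex` is on the whitelist."""
    return not rejected_commands(tex)
```

For example, `is_safe(r"\frac{1}{2} + \alpha")` passes, while `rejected_commands(r"\input{/etc/passwd} + \alpha")` flags `input`. A check like this is only a first line of defence; it says nothing about whether the expression is well formed.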
Interestingly enough, though, looking at the Talk page for that leads you here: http://sourceforge.net/projects/latex2mathml/
This *does* have a parser. As you might expect, the code looks pretty involved, and it seems to handle Parsing 101 with nothing but the trusty substr and strpos functions. There's enough character-by-character manipulation in there that it makes me fear for the performance. Still, some serious work has actually been done, so it merits some investigation.
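For contrast with the substr/strpos style, here is a minimal Python sketch of the more conventional split between a regex tokenizer and a small recursive-descent parser. The token rules and function names are hypothetical and not taken from latex2mathml or texvc; it only understands brace grouping and escaped braces, not TeX's full grammar.

```python
import re

# Tokens: a control sequence (backslash + letters, or backslash + one other char),
# a literal brace, or any other single non-whitespace character.
TOKEN = re.compile(r"\\(?:[A-Za-z]+|.)|[{}]|\S")


def tokenize(tex):
    """Split TeX math source into a flat list of tokens."""
    return TOKEN.findall(tex)


def parse(tex):
    """Parse `tex` into nested lists, one list per {...} group.

    Raises ValueError on unbalanced braces.
    """
    tokens = tokenize(tex)
    pos = 0

    def parse_group(depth):
        nonlocal pos
        items = []
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok == "{":
                items.append(parse_group(depth + 1))  # descend into a nested group
            elif tok == "}":
                if depth == 0:
                    raise ValueError("unexpected '}'")
                return items  # close the current group
            else:
                items.append(tok)
        if depth > 0:
            raise ValueError("missing '}'")
        return items

    return parse_group(0)
```

For example, `parse(r"\frac{1}{x^2}")` yields `['\\frac', ['1'], ['x', '^', '2']]`. Keeping tokenizing and grouping separate is what lets later stages (validation, MathML output) work on a tree instead of re-scanning strings.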
Anyway, I hear what you're saying about Python's much better parsing support (it wasn't long ago that I was gushing about the simpleparse module on my blog[1]). Given the number of other external dependencies that would likely remain even with a PHP implementation, the additional Python dependency probably isn't worth sweating in the grand scheme of things. Python seems like a much less daunting dependency than OCaml, but I know far too little about OCaml to assert that with any confidence.
Regardless of which path you choose, I'd be happy to be your mentor assuming we have enough slots for this project.