Python is a nice language. PHP (for portability) or C/C++ (for speed) would be better, but Python is still preferable to OCaml.
You mention ANTLR. Something like that could be a good choice, because it should make it possible to generate the same parser in a different language without much effort (you probably won't have enough time during GSoC for that, but a design that takes that option into account would be interesting).
So you could do the following (please don't take this as a list of requirements):
*Figure out what exactly the current texvc is doing.
*Document it heavily.
*Design how the next texvc should be built.
*Build whatever parser you need for it.
*Do the actual implementation.
You seem to be thinking about creating a PHP extension. I don't think you should go that route. A standalone binary is good enough; we don't need it to be a PHP extension. That glue could be added later if needed, but it would make the code harder to write and debug.
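To illustrate what I mean by a standalone binary, here's a minimal sketch. Everything in it is hypothetical: the script name `texcheck.py`, the `check_tex` function, and the checking logic, which only verifies balanced braces as a placeholder for whatever the real parser would do. The point is the interface: the tool takes the TeX input as an argument, prints the result, and signals failure through its exit status, so it can be run and debugged straight from a shell with no PHP involved.

```python
#!/usr/bin/env python3
"""Hypothetical standalone checker: takes a TeX fragment on the command line
and reports the result on stdout, so it can be tested without PHP."""
import sys


def check_tex(tex: str) -> tuple[bool, str]:
    """Placeholder validation: only checks that braces are balanced.

    It ignores escaped braces such as \\{ and does no real parsing; a real
    texvc replacement would put its parser here.
    """
    depth = 0
    for ch in tex:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False, "unexpected '}'"
    if depth != 0:
        return False, "missing '}'"
    return True, tex


def main(argv: list[str]) -> int:
    if len(argv) != 2:
        print("usage: texcheck.py TEX", file=sys.stderr)
        return 2
    ok, message = check_tex(argv[1])
    print(("OK " if ok else "ERROR ") + message)
    # A non-zero exit status lets the caller detect failure without parsing output.
    return 0 if ok else 1


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

On the PHP side, calling it would just be something like `proc_open()` or `shell_exec()` with the input passed through `escapeshellarg()`, which is all the glue a binary needs.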