2010/3/23 Roan Kattouw roan.kattouw@gmail.com:
2010/3/23 Aryeh Gregor Simetrical+wikilist@gmail.com:
This much I know, but is LaTeX actually a regular language?
I don't know; I was just making the point that writing a DFA parser in PHP is probably not very useful.
Sorry, I got confused and wrote DFA when I should have written LALR. DFAs cannot parse even the allowed subset of AMS-LaTeX, because there are some permitted environments.
Without claiming to know much formal language theory, a rule of thumb is that languages with matched delimiters were never regular, because of the pumping lemma: http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages
So, for example, it's theoretically impossible to check that parentheses nested correctly using regular expressions, and similarly it'd be impossible to check that the \begin and \end commands matched up.
In practice there might be ways to hack around that by using multiple regular expressions and manually tracking how they nest, but at that point we're basically writing half of a bad LALR parser.
Fortunately, though, Python has parser generators! And if we're really concerned about speed, there's PyBison, which does the parsing in C and apparently produces (at least) five-fold improvements over Python-native alternatives.
Yours, Damon Wang