Conrad Irwin wrote:
On 03/23/2010 05:23 PM, Aryeh Gregor wrote:
On Tue, Mar 23, 2010 at 1:00 PM, Roan Kattouw roan.kattouw@gmail.com wrote:
DFAs parse regular languages, which means those languages can also be expressed as regexes. In fact, the regexes accepted by the preg_*() functions allow certain extensions to the language theory definition of regular expressions, allowing them to describe certain non-regular languages as well. In short: preg_split() can do everything a DFA can do, and more. The only reason to use a DFA parser would be performance, but since the preg_*() functions are so heavily optimized I don't think that'll be an issue.
This much I know, but is LaTeX actually a regular language?
It's not even context free, luckily the subset we are interested in is (as clearly shown by the texvc parser :p).
Just because a language is context-sensitive doesn't mean it will be hard to write a parser for it. That's just a myth propagated by computer scientists who, strangely enough given their profession, have a disdain for the algorithm as a descriptive framework.
In the last few decades, pure mathematicians have been exploring the power of algorithms as a general description of an axiomatic system. And simultaneously, computer scientists have embraced the idea that the best way to process text is by trying to shoehorn all computer languages into some Chomsky-inspired representation, regardless of how awkward that representation is, or how inefficient the resulting algorithm becomes, when compared to an algorithm constructed a priori.
-- Tim Starling