On 11/9/07, Steve Sanbeg <ssanbeg(a)ask.com> wrote:
> But some constructs in MW require an FSM to tokenize, not a regex.
> Clearly, properly tokenizing bold/italics requires complex processing
> on an entire paragraph of text. Even templates and links are a little
> complex, but should be doable by maintaining states with a stack.
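The stack idea for templates and links can be sketched as follows. This is a minimal illustration in Python, not MediaWiki's (PHP) parser. It assumes a simplified wikitext where only `{{`/`}}` and `[[`/`]]` matter and any mismatch is an error. Real MediaWiki treats stray brackets as literal text and also has `{{{...}}}` parameters, both of which this ignores. The function `tokenize` and its tuple format are hypothetical.

```python
import re

# Hypothetical simplified wikitext: only {{ }} (templates) and [[ ]] (links) nest.
TOKEN_RE = re.compile(r"\{\{|\}\}|\[\[|\]\]")
CLOSER = {"{{": "}}", "[[": "]]"}

def tokenize(text):
    """Return (kind, value, depth) tuples; raise ValueError on mismatched delimiters."""
    tokens = []
    stack = []  # currently open delimiters, innermost last
    pos = 0
    for m in TOKEN_RE.finditer(text):
        # Plain text between delimiters keeps the current nesting depth.
        if m.start() > pos:
            tokens.append(("text", text[pos:m.start()], len(stack)))
        delim = m.group()
        if delim in CLOSER:
            stack.append(delim)
            tokens.append(("open", delim, len(stack)))
        else:
            # A closer must match the innermost open construct.
            if not stack or CLOSER[stack[-1]] != delim:
                raise ValueError(f"unmatched {delim!r} at offset {m.start()}")
            tokens.append(("close", delim, len(stack)))
            stack.pop()
        pos = m.end()
    if pos < len(text):
        tokens.append(("text", text[pos:], len(stack)))
    if stack:
        raise ValueError(f"unclosed {stack[-1]!r}")
    return tokens

print(tokenize("a {{b|[[c]]}} d"))
```

The stack is exactly what a plain FSM lacks: with unbounded nesting depth, a finite set of states cannot remember how many constructs are still open.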
FSMs accept exactly the regular languages, and by Kleene's theorem the set
of things an FSM can recognize is precisely equal to that which can be
specified by a regex. :)
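The regex/FSM equivalence can be illustrated with a minimal Python sketch. It takes the regex `b*(?:ab*ab*)*`, which matches strings over {a, b} with an even number of `a`s, and a hand-built two-state DFA for the same language, then checks that they agree on every string up to a bounded length. The names `dfa_accepts` and `agree_up_to` are hypothetical. A bounded brute-force check is only a sanity check, not a proof; the general result is Kleene's theorem.

```python
import itertools
import re

# Regex for strings over {a, b} containing an even number of 'a's.
EVEN_A_REGEX = re.compile(r"b*(?:ab*ab*)*")

# Equivalent two-state DFA: state 0 = even number of a's so far (accepting), 1 = odd.
TRANSITIONS = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 0, (1, "b"): 1,
}
ACCEPTING = {0}

def dfa_accepts(s):
    """Run the DFA over s and report whether it ends in an accepting state."""
    state = 0
    for ch in s:
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING

def agree_up_to(n):
    """True if the DFA and the regex agree on every {a, b} string of length <= n."""
    for length in range(n + 1):
        for chars in itertools.product("ab", repeat=length):
            s = "".join(chars)
            if dfa_accepts(s) != (EVEN_A_REGEX.fullmatch(s) is not None):
                return False
    return True

print(agree_up_to(10))
```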
In fact, regexes as seen in PHP etc. are more powerful than FSMs, since they
can include backreferences and suchlike. But I presume PHP compiles
regexes down to efficient FSMs if they don't include such constructs, so
it probably doesn't make much difference in performance terms.
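The backreference point can be shown with a small sketch. It uses Python's `re` as a stand-in for PHP's PCRE-based `preg_match`, since both support `\1`. The language of strings a^n b a^n (n >= 1) is not regular by the pumping lemma, so no FSM recognizes it, yet a single backreference does. The helper `matches` is hypothetical.

```python
import re

# (a+) captures a run of a's; \1 must repeat exactly that run after the 'b'.
# {a^n b a^n : n >= 1} is not a regular language, so no FSM can accept it.
SAME_RUN = re.compile(r"(a+)b\1")

def matches(s):
    """True if s is a^n b a^n for some n >= 1."""
    return SAME_RUN.fullmatch(s) is not None

print(matches("aaabaaa"), matches("aaabaa"))
```

An engine matching this pattern has to remember an unbounded amount (the length of the captured run), which a finite set of states cannot do.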
Soo Reams