Simetrical wrote:
On 11/9/07, Steve Sanbeg ssanbeg@ask.com wrote:
But some constructs in MW require an FSM to tokenize, not a regex. Clearly, properly tokenizing bold/italics requires complex processing on an entire paragraph of text. Even templates and links are a little complex, but should be doable by maintaining states with a stack.
FSMs accept regular languages by definition, so the set of things an FSM can recognize is precisely equal to that which can be specified by a regex. :)
In fact regexes as seen in PHP etc are more powerful than FSMs, since they can include back references and suchlike. But I presume PHP compiles regexes down to efficient FSMs if they don't include such constructs, so it probably doesn't make much difference in performance terms.
Soo Reams