On 11/9/07, Steve Sanbeg <ssanbeg(a)ask.com> wrote:
Wouldn't regexes always be compiled to FSMs,
regardless of language or
constructs?
Not if they include back-references, assertions, or other things not
part of "real" regexes. A real regex can be defined as follows:
1) Any character is a regex.
2) Any regex followed by a * is a regex.
3) Any two regexes concatenated together are regexes.
4) Any two regexes concatenated together with a | character are regexes.
Stuff like [a-z] is just shorthand for (a|b|c|...|z), so that's still
a proper regex, but things like back-references aren't possible to
translate into those four rules, so you have to use something more
sophisticated than a FSM.