Re: [Wikitech-l] EBNF grammar project status?

9 Nov 2007


      On Fri, 09 Nov 2007 19:23:44 +0000, Soo Reams wrote:
...
Simetrical wrote:
...
On 11/9/07, Steve Sanbeg ssanbeg@ask.com wrote:
...
But some constructs in MW require an FSM to tokenize, not a regex. 
Clearly, properly tokenizing bold/italics requires complex processing
on an entire paragraph of text.  Even templates and links are a little
complex, but should be doable by maintaining states with a stack.
FSMs accept regular languages by definition, so the set of things an FSM
can recognize is precisely equal to that which can be specified by a
regex.  :)
In fact regexes as seen in PHP etc are more powerful than FSMs, since they
can include back references and suchlike. But I presume PHP compiles
regexes down to efficient FSMs if they don't include such constructs, so
it probably doesn't make much difference in performance terms.
Soo Reams
IIRC, accept means that if the language is tokenized correctly, it can
give a yes/no whether the input stream is valid.  I don't think this helps
much when trying to tokenize it to begin with.
Wouldn't regexes always be compiled to FSMs, regardless of language or
constructs?

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Re: [Wikitech-l] EBNF grammar project status?