Steve Sanbeg wrote:
> IIRC, accept means that if the language is tokenized correctly, it can
> give a yes/no whether the input stream is valid. I don't think this
> helps much when trying to tokenize it to begin with.
That can only happen at a level above tokenization, i.e. parsing. To
take a C example, "! &&;" is a perfectly legal sequence of tokens, but
clearly not a valid program in the language. Also, as noted elsewhere,
wikitext is effectively the set of all strings, since we never want to
produce "compilation errors".
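To make the distinction concrete, here is a minimal sketch in Python
(the token set is hypothetical, not a real C lexer): every chunk of
"! &&;" lexes as a valid token, yet no parser would accept the
sequence as a statement.

```python
import re

# Hypothetical mini-lexer for a few C operators; not a real C tokenizer.
TOKEN = re.compile(r'\s*(&&|\|\||[!;])')

def tokenize(src):
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise ValueError(f"lexical error at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

print(tokenize("! &&;"))  # ['!', '&&', ';'] -- lexically fine, syntactically nonsense
```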
> Wouldn't regexes always be compiled to FSMs, regardless of language or
> constructs?
Not FSMs, no. Perl-style regexes can do things that no FSM can do. For
example, since FSMs are memoryless, they can't include back-references.
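For instance, Python's re module (like Perl's engine) supports
back-references. The pattern below matches only a doubled word, and the
language of doubled words is a standard pumping-lemma example of a
language no FSM can recognize, because the machine would have to
remember an arbitrarily long first half.

```python
import re

# \1 must repeat exactly what group 1 captured earlier in the same match.
# A memoryless FSM has no way to "remember" the captured text.
doubled = re.compile(r'^(\w+) \1$')

print(bool(doubled.match("wiki wiki")))  # True
print(bool(doubled.match("wiki text")))  # False
```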
I imagine they are compiled to something, but I couldn't say what. My
argument was that PHP is probably smart enough to recognize regexes
which don't include these extra features, and compile them to FSMs,
since an FSM is such an efficient implementation.
Anyway, this is getting off topic, since the discussion was about
whether an FSM is adequate to tokenize wikitext. I don't think that
question has been answered yet, but if the answer is yes, then even a
true (Kleene-style) regular expression is also powerful enough. So
that's fine.
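As a sketch of what that would look like, here is a toy tokenizer for a
tiny, hypothetical subset of wikitext built only from Kleene-style
constructs (alternation, concatenation, repetition), which an FSM could
implement directly. The token names are illustrative, not MediaWiki's.

```python
import re

# Each token is a plain regular pattern; CHAR is a catch-all so that,
# in the spirit of wikitext, every input string tokenizes successfully.
TOKENS = [
    ("HEADING",    r'={2,6}'),
    ("LINK_OPEN",  r'\[\['),
    ("LINK_CLOSE", r'\]\]'),
    ("BOLD",       r"'''"),
    ("ITALIC",     r"''"),
    ("TEXT",       r"[^=\[\]']+"),
    ("CHAR",       r"."),
]
MASTER = re.compile('|'.join(f'(?P<{name}>{pat})' for name, pat in TOKENS))

def tokenize(src):
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(src)]

print(tokenize("==Title== [[Page]]"))
```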
Soo Reams