That was quite amusing: I read the "Welcome to your new list" message before the wikitech-l message. Anyway, a list just for parser discussion is good.
Here's a bit of ANTLR grammar I wrote to handle basic article structure: paragraph blocks and "special blocks", where two consecutive blocks of the same type need an extra linefeed. Since I haven't written any Lex or Yacc before, I'm still wrestling a bit with what are probably fairly basic problems. In this case, I found the requirement of an extra linefeed quite challenging to implement without ambiguity problems.
As it is, this does work, but it spews out a huge number of warnings and even an apparently non-fatal "fatal error". I presume some of these problems can be avoided through semantic and syntactic predicates, if not backtracking and memoization (no, that's not a typo). Any ANTLR experts here?
Steve
--
grammar paras;
article : pseries? (sseries (EOF | pseries))*;
pseries : para (N+ para)* N*;
sseries : specialblock (N+ specialblock)* N*;
specialblock : (spaceblock|listblock)+;
spaceblock : spaceline+;
spaceline : SPECIALCHAR char* N;
listblock : (listitem)+;
listitem : (bulletitem | numberitem | indentitem | defitem);
bulletitem : BULLETCHAR (listitem | (nonlistchar char*)? N);
numberitem : NUMBERCHAR (listitem | (nonlistchar char*)? N);
indentitem : INDENTCHAR (listitem | (nonlistchar char*)? N);
defitem : DEFCHAR (nonindentchar)* (definition | INDENTCHAR? N);
definition : ':' char+ N;
BULLETCHAR : '*';
NUMBERCHAR : '#';
INDENTCHAR : ':';
DEFCHAR : ';';
para : (nonspecialchar char* N)+;
listchar: BULLETCHAR | NUMBERCHAR | INDENTCHAR | DEFCHAR;
SPECIALCHAR : ' ';
nonlistchar : SPECIALCHAR | nonspecialchar;
char : nonlistchar | listchar;
nonindentchar : nonlistchar | BULLETCHAR | NUMBERCHAR | DEFCHAR;
N : '\r'? '\n';
nowiki : NOWIKI;
NOWIKI : '<nowiki>' ( options {greedy=false;} : . )* '</nowiki>';
nonspecialchar : NONSPECIALCHAR | nowiki;
NONSPECIALCHAR : ('A'..'Z' | 'a'..'z' | '0'..'9' | '\'' | '"' | '(' | ')')+;
--
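For anyone who would rather read the intent than the grammar, here is a rough Python sketch of the block rule the grammar is trying to express. It is not a translation of the ANTLR: the names classify and split_blocks are made up for illustration, and unlike NONSPECIALCHAR it assumes that any line not starting with a space or a list character is a paragraph line.

```python
LIST_CHARS = "*#:;"  # BULLETCHAR, NUMBERCHAR, INDENTCHAR, DEFCHAR


def classify(line):
    """Return 'blank', 'special' or 'para' for one line of wikitext."""
    if line == "":
        return "blank"
    if line[0] == " " or line[0] in LIST_CHARS:
        return "special"
    return "para"


def split_blocks(text):
    """Group lines into (kind, lines) blocks.

    Adjacent lines of the same kind join one block. A change of kind starts
    a new block on its own, so a blank line is only needed to separate two
    consecutive blocks of the same kind.
    """
    blocks = []
    current_kind, current = None, []
    for line in text.splitlines():
        kind = classify(line)
        # Flush the open block whenever the kind changes (a blank line counts).
        if kind != current_kind and current:
            blocks.append((current_kind, current))
            current = []
        if kind == "blank":
            current_kind = None
        else:
            current_kind = kind
            current.append(line)
    if current:
        blocks.append((current_kind, current))
    return blocks
```

Calling split_blocks on two paragraphs followed directly by a list gives two 'para' blocks and one 'special' block, with no blank line required between the last paragraph and the list.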
PS: You might notice the above grammar implements two "improvements" to the ;term:definition notation:
1. The ; marker has to be the last one in a list prefix. Constructs like ##;## are worthless.
2. A trailing : is treated literally.
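To make the two rules in the PS concrete, here is a small Python sketch of how I read them. The function names split_prefix and parse_list_line are hypothetical and only mirror the defitem rule loosely; treat it as an illustration, not as what the generated parser does.

```python
def split_prefix(line):
    """Return (markers, rest): leading '*', '#', ':' markers, optionally
    ended by a single ';'. Anything after the ';' is ordinary text."""
    i = 0
    while i < len(line) and line[i] in "*#:":
        i += 1
    if i < len(line) and line[i] == ";":
        i += 1
    return line[:i], line[i:]


def parse_list_line(line):
    """Return (markers, text, definition) for one list line.

    definition is None unless the prefix ends in ';' and the text contains
    a ':' followed by at least one character.
    """
    markers, rest = split_prefix(line)
    if not markers.endswith(";"):
        return markers, rest, None
    term, sep, definition = rest.partition(":")
    if sep and definition:
        return markers, term, definition
    # No ':' at all, or a trailing ':' with nothing after it: keep it literal.
    return markers, rest, None
```

Under these assumptions "##;## x" yields the prefix "##;" with "## x" as plain term text, and ";term:" keeps its colon as part of the term.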