That was quite amusing: I read the "Welcome to your new list" message before the wikitech-l message. Anyway, a list just for parser discussion is good.

Here's a bit of ANTLR grammar I wrote to handle basic article structure: paragraph blocks and "special blocks", where two consecutive blocks of the same type must be separated by an extra linefeed (a blank line). Since I haven't written any Lex or Yacc before, I'm still wrestling a bit with what are probably fairly basic problems. In this case, I found the extra-linefeed requirement quite challenging to implement without ambiguity problems.
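To make the intended block structure concrete, here is a rough Python sketch of the same grouping rule, written independently of ANTLR. It is only an illustration under a few assumptions: the names `line_kind` and `split_blocks` are made up for this example, space blocks and list blocks are lumped together as "special", and the character-set restrictions of the grammar are ignored.

```python
LIST_CHARS = "*#:;"  # the four list markers used by the grammar


def line_kind(line):
    """Classify a line as 'blank', 'special' (leading space or list marker) or 'para'."""
    if line == "":
        return "blank"
    if line[0] == " " or line[0] in LIST_CHARS:
        return "special"
    return "para"


def split_blocks(text):
    """Group lines into (kind, lines) blocks.

    A new block starts when the line kind changes, or after a blank line.
    Two blocks of the same kind are therefore only distinguishable when
    a blank line (the "extra linefeed") sits between them.
    """
    blocks = []
    after_blank = True  # the start of the input acts as a block boundary
    for line in text.splitlines():
        kind = line_kind(line)
        if kind == "blank":
            after_blank = True
            continue
        if after_blank or blocks[-1][0] != kind:
            blocks.append((kind, [line]))
        else:
            blocks[-1][1].append(line)
        after_blank = False
    return blocks
```

For example, `split_blocks("one\ntwo\n\nthree\n* a\n")` yields two paragraph blocks followed by one special block; the list needs no blank line before it because its kind differs from the paragraph's.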

As it stands, the grammar does work, but ANTLR spews out a huge number of warnings and even an apparently non-fatal "fatal error". I presume some of these problems can be avoided with semantic and syntactic predicates, or failing that with backtracking and memoization (no, that's not a typo). Any ANTLR experts here?

Steve

--
grammar paras;

article : pseries? (sseries (EOF| pseries))*;
pseries : para (N+ para)* N*;
sseries : specialblock (N+ specialblock)* N*;


specialblock
: (spaceblock|listblock)+;

spaceblock
: spaceline+;

spaceline
: SPECIALCHAR char* N;

listblock
: (listitem)+;
listitem: (bulletitem | numberitem | indentitem | defitem);

bulletitem
: BULLETCHAR (listitem | (nonlistchar char*)? N);

numberitem
: NUMBERCHAR (listitem | (nonlistchar char*)? N);

indentitem
: INDENTCHAR (listitem | (nonlistchar char*)? N);

defitem
: DEFCHAR (nonindentchar)* (definition | INDENTCHAR? N );
definition
: ':' char+ N;

BULLETCHAR: '*';
NUMBERCHAR: '#';
INDENTCHAR: ':';
DEFCHAR : ';';

para : (nonspecialchar char* N)+;

listchar: BULLETCHAR | NUMBERCHAR | INDENTCHAR | DEFCHAR;

SPECIALCHAR
: ' ';
nonlistchar
: SPECIALCHAR | nonspecialchar;
char : nonlistchar | listchar;
nonindentchar
: nonlistchar | BULLETCHAR | NUMBERCHAR | DEFCHAR;
N : '\r'? '\n' ;

nowiki : NOWIKI;
NOWIKI : '<nowiki>'( options {greedy=false;} : . )*'</nowiki>';

nonspecialchar
: NONSPECIALCHAR | nowiki;

NONSPECIALCHAR
: ('A'..'Z'| 'a'..'z' | '0'..'9' | '\'' | '"' | '(' | ')')+;
--

PS: you might notice the above grammar implements two "improvements" to the ;term:definition notation:

1. The ; has to be the last marker in the list prefix.  Constructs like ##;## are worthless.
2. A trailing : is treated literally.
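
As a rough illustration of those two rules, here is a small Python sketch of how a single list line would be split if it followed the `defitem` and `definition` rules above. The function name `parse_list_line` and the returned tuple layout are invented for this example; it is not generated from the grammar and skips its character-set restrictions.

```python
def parse_list_line(line):
    """Split one list line into (prefix, text, definition).

    prefix     -- the leading list markers, ending at ';' if one is present
    text       -- the item text, or the term of a definition item
    definition -- the part after the first ':' of a definition item, else None
    """
    i = 0
    # '*', '#' and ':' markers may nest freely at the start of the line.
    while i < len(line) and line[i] in "*#:":
        i += 1
    if i < len(line) and line[i] == ";":
        # Rule 1: ';' closes the prefix, so anything after it is plain text.
        prefix = line[: i + 1]
        rest = line[i + 1:]
        term, sep, definition = rest.partition(":")
        if sep and definition:
            return prefix, term, definition
        # Rule 2: a ':' with nothing after it stays part of the term.
        return prefix, rest, None
    return line[:i], line[i:], None
```

With this sketch, `parse_list_line("##;##")` gives the prefix `##;` and treats the trailing `##` as ordinary term text, while `parse_list_line(";term:")` keeps the colon as part of the term.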