Hi all, I've published what I'm calling (for no good reason) "draft 10" here:
http://www.mediawiki.org/wiki/Markup_spec/ANTLR/draft
Mostly, I got to a certain level of feature completeness. Specifically, the 4 features that were previously missing (tables, magic words, categories and inline HTML) have been implemented.
I redid the table stuff - turns out I was getting too fancy for my own good. I now do less semantic checking, and am thus much more tolerant of borderline input.
I've also cleaned it up a bit and have roughly grouped all the rules into levels, thus:
Top level, block elements:
line:
      (table)      => table^
    | (headerline) => headerline^
    | (listmarker) => listline^
    | (hrline)     => hrline^
    | (spaceline)  => spaceline^
    | paragraph^
    ;
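To illustrate how the predicates in that rule behave: each alternative is tried in order, the first one whose predicate succeeds wins, and paragraph is the fallback when none do. Below is a rough Python sketch of that ordered choice. The prefix checks are simplified stand-ins I am assuming for illustration (the real table, headerline etc. rules match far more than a line prefix), and `BLOCK_RULES` and `classify_line` are hypothetical names, not part of the grammar.

```python
import re

# Hypothetical stand-ins for the grammar's syntactic predicates.
# Each entry is (rule name, cheap lookahead test); order matters.
BLOCK_RULES = [
    ("table",      lambda s: s.startswith("{|")),
    ("headerline", lambda s: re.match(r"=+.*=+\s*$", s) is not None),
    ("listline",   lambda s: s[:1] in ("*", "#", ":", ";")),
    ("hrline",     lambda s: s.startswith("----")),
    ("spaceline",  lambda s: s.startswith(" ")),
]

def classify_line(line):
    """Return the name of the first block rule whose predicate accepts the line."""
    for name, predicate in BLOCK_RULES:
        if predicate(line):
            return name
    # No predicate matched: fall through to the unguarded alternative.
    return "paragraph"
```

Because paragraph has no predicate, borderline input never fails to parse at this level; it just ends up as a paragraph.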
Next level, inline text (generally, stuff that appears within a line, and doesn't contain new lines)
inline_text
@init { text_levels++; }
    : ( ( (LEFT_BRACKET LEFT_BRACKET LEFT_BRACKET) => literal_left_bracket
        | (literal_left_bracket bracketed_url) => literal_left_bracket
        | (image) => image
        | (category) => category
        | (external_link) => external_link
        | (internal_link) => internal_link
        | (magic_link) => magic_link
        | (magic_word) => magic_word
        | pre_block
        | (formatted_text_elem) => formatted_text_elem
        )
        ((nbsp_before_punctuation) => nbsp_before_punctuation)?
        ((ws) => printing_ws)?
      )+
    ;
finally { text_levels--; }
The exception there is <pre> blocks which really do contain newlines.
Next level down is formatted text, which can appear in places like link captions:
formatted_text
@init { text_levels++; }
    : ( (formatted_text_elem) => formatted_text_elem
        ((nbsp_before_punctuation) => nbsp_before_punctuation)*
        ((printing_ws) => printing_ws)?
      )+
    ;
finally { text_levels--; }

formatted_text_elem:
    ( (accidental_magic_link) => accidental_magic_link
    | ((punctuation_before_nbsp) => punctuation_before_nbsp)
    | (APOSTROPHES) => bold_and_italics
    | angle_tag
    | ((html_entity) => html_entity)
    | unformatted_characters
    );
And the very lowest level is unformatted characters:
unformatted_characters:
    ( html_dangerous
    | punctuation
    | meaningless_characters
    | digits
    );
Anyway, when I say "feature complete", I mean that most of the major features that I know of are present in some form. None of them is complete in itself (except perhaps images), but it's a start.
So what next: suggestions for more features to add would be handy. Also, I need to get around to making it do more than just generate an AST. Theoretically it's not too much work to take the AST and spit out some kind of XHTML.
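As a rough idea of what the AST-to-XHTML step could look like, here is a minimal sketch of a recursive tree walk. It is in Python rather than the parser's target language, and everything in it is an assumption for illustration: the `Node` class, the node kinds ("paragraph", "bold", "italics", "text", "hrline") and the tag mapping are hypothetical and do not reflect the tree the grammar actually builds.

```python
from dataclasses import dataclass, field
from html import escape

@dataclass
class Node:
    # Hypothetical AST node: a kind, child nodes, and text for leaves.
    kind: str
    children: list = field(default_factory=list)
    text: str = ""

# Assumed mapping from node kinds to XHTML elements.
TAGS = {"paragraph": "p", "bold": "b", "italics": "i", "listline": "li"}

def to_xhtml(node):
    """Render a node and its children as an XHTML fragment."""
    if node.kind == "text":
        # Leaf text is escaped so <, > and & cannot break the markup.
        return escape(node.text)
    if node.kind == "hrline":
        return "<hr />"
    inner = "".join(to_xhtml(child) for child in node.children)
    tag = TAGS.get(node.kind)
    if tag is None:
        # Unknown kinds contribute only their children's output.
        return inner
    return f"<{tag}>{inner}</{tag}>"

tree = Node("paragraph", [
    Node("text", text="a < b and "),
    Node("bold", [Node("text", text="bold")]),
])
print(to_xhtml(tree))
```

A real emitter would also need to handle links, images, tables and list nesting, which is where most of the work would be.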
It would also be nifty if someone could figure out a way of embedding wikitext into the grammar to mark it up somehow. Does section inclusion work yet? If so, would it be possible to insert comments somehow that would allow other pages to transclude sections? Then some of the documentation could be stored outside the grammar itself, yet shown alongside...
Steve