[Wikitext-l] Progress

Steve Bennett stevagewp at gmail.com
Tue Dec 11 01:21:48 UTC 2007


I'm about to head off for a week and a half, so here's a quick
progress report. My ANTLR grammar so far is here:

http://www.mediawiki.org/wiki/User:Stevage/ANTLR

It handles many features, though most of them aren't really complete.

Supports:
* Internal links
* External links (limited range of characters allowed)
* Images (all options)
* Headings (limits on ='s in the text)
* Nowiki, pre
* French punctuation (foo ? -> foo&nbsp;?)
* HTML entities (&nbsp; is recognised, &foo; is converted to literals)
* Dangerous HTML, < -> &lt; etc
* Bold, italics (supports the basic rules, not the single-character stuff)
* Paragraphs
* Space-indented blocks
* Lists (intentionally doesn't support nested ; lists, does support ;foo:blah)
* ISBN, RFC, PMID (fully, I think)
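
To make two of the items above concrete (French punctuation and dangerous HTML), here is a rough sketch in Python rather than ANTLR. It is not taken from the grammar. The function names are made up for illustration, and it assumes the French-punctuation rule is simply "a space directly before ?, !, : or ; becomes &nbsp;". Ampersands are left alone on the assumption that entities are handled by a separate step.

```python
import re

# Assumption: only a single space directly before one of ? ! : ; is affected.
_FRENCH_SPACE = re.compile(r" (?=[?!:;])")


def escape_dangerous_html(text: str) -> str:
    """Turn raw angle brackets into entities so they cannot open a tag."""
    return text.replace("<", "&lt;").replace(">", "&gt;")


def armor_french_spaces(text: str) -> str:
    """Replace the space before French-style punctuation with &nbsp;."""
    return _FRENCH_SPACE.sub("&nbsp;", text)


def render_text(text: str) -> str:
    """Escape angle brackets first, then protect the punctuation spaces."""
    return armor_french_spaces(escape_dangerous_html(text))
```

Escaping runs first, so the ; inside a freshly produced &lt; is never preceded by a space and is left untouched by the second step.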

Does not support:
* Categories
* Tables
* Inline HTML (<b>, <div> etc)
* __TOC__ etc
* HTML comments

Other limitations:
* Very reduced ranges of characters for many things; for instance, it
doesn't know that é is a letter rather than punctuation
* Case sensitivity in some places (<NOWIKI> is not recognised)

At the moment, it simply builds an AST, but converting from that AST
to HTML should be pretty trivial. I have in mind some simple
tree-cleaning steps first, like concatenating consecutive P blocks
into one (I'm using BR to indicate a gap of two or more new lines),
concatenating consecutive OL etc.
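
A tree-cleaning pass of that kind could look roughly like the following Python sketch. The Node class, the MERGEABLE set and the merge_adjacent name are all hypothetical, and the real AST produced by ANTLR will have a different shape. It only shows the idea of folding neighbouring P (or OL) nodes into one while leaving a BR between them as a hard boundary.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """Hypothetical AST node: a tag plus child nodes or text strings."""
    tag: str
    children: List[Union["Node", str]] = field(default_factory=list)


# Assumption: these are the block types worth concatenating.
MERGEABLE = frozenset({"P", "OL"})


def merge_adjacent(children, mergeable=MERGEABLE):
    """Return a cleaned copy of `children` with adjacent same-tag
    mergeable nodes folded into a single node, applied recursively."""
    merged = []
    for child in children:
        if isinstance(child, Node):
            # Clean the subtree first, working on a fresh copy of the node.
            child = Node(child.tag, merge_adjacent(child.children, mergeable))
            prev = merged[-1] if merged else None
            if (isinstance(prev, Node)
                    and prev.tag == child.tag
                    and child.tag in mergeable):
                # Fold into the previous node, re-cleaning across the seam.
                prev.children = merge_adjacent(
                    prev.children + child.children, mergeable)
                continue
        merged.append(child)
    return merged
```

For example, the sequence P, P, BR, P would come out as P, BR, P: the first two paragraphs are joined, and the BR keeps the third one separate.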

I offer this up just for curiosity's sake - no one should try and hack on it ;)

[hrm, on closer inspection, that's not the latest version of that
file. oh well.]

Steve


