Hi,
I'm going on holiday for the next week, so I will not be able to work on
the lex/yacc parser that I have been writing over the past few weeks. I
will check my work so far into CVS, and anyone interested can continue
the work while I am away.
So far, the parser can do:
* paragraphs
* pre-lines (lines beginning with spaces)
* lists (* and # only)
* extensions (<math>, <hiero>)
* headings
* bold and italics
I am sorry I took so long to do bold and italics but, just as I
originally anticipated, it was quite hard. I discarded two failed
attempts before the third one finally worked out. There is one special
case in which I had to apply a bit of a hack, but I am sure that this is
okay, given that it works pretty much perfectly now.
As for "extensions", the parser currently treats as an extension
anything that consists of an HTML tag without attributes followed by its
corresponding closing tag. Under this rule, <nowiki> and <pre> also count
as "extensions" for the purposes of the parser.
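To make the rule concrete, here is a rough Python sketch of what "a tag
without attributes plus its matching closing tag" means. This is only an
illustration and not the lex/yacc code itself; the names EXTENSION_RE and
find_extensions are made up for this example, and nesting of the same tag
is not handled.

```python
import re

# Hypothetical stand-in for the extension rule: an opening tag with no
# attributes, any content (non-greedy), then the matching closing tag.
EXTENSION_RE = re.compile(r"<([A-Za-z][A-Za-z0-9]*)>(.*?)</\1>", re.DOTALL)


def find_extensions(text):
    """Return (tag_name, raw_content) pairs for every extension-like span."""
    return [(m.group(1), m.group(2)) for m in EXTENSION_RE.finditer(text)]
```

With this sketch, find_extensions("a <math>x^2</math> b") yields one pair,
("math", "x^2"), while a tag that carries attributes, such as
<div class="x">, is not matched at all.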
What is missing:
* links, images, categories (everything in [[ ... ]])
* template inclusions and variables ({{...}} and {{{...}}})
* tables
* HTML tags that should be allowed but are not extensions (esp. div)
The lexer already recognises tokens for the first two, but not for
tables or HTML tags. As a consequence, it will currently recognise
something like <b>''something''</b> as an "extension" and not parse the
'' as italics. Obviously, this needs to be fixed.
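One possible direction, sketched here in Python purely as an illustration
and not taken from the parser, would be to check the tag name against a
list of known extensions, so that everything else is treated as ordinary
HTML whose content still gets parsed as wiki markup. The names
KNOWN_EXTENSIONS and classify, and the contents of the list, are
assumptions made for this example.

```python
import re

# Same attribute-less tag-pair pattern as the extension rule.
TAG_PAIR_RE = re.compile(r"<([A-Za-z][A-Za-z0-9]*)>(.*?)</\1>", re.DOTALL)

# Assumed list, for illustration only; a real list would depend on
# which extensions are actually installed.
KNOWN_EXTENSIONS = {"math", "hiero", "nowiki", "pre"}


def classify(text):
    """Label each attribute-less tag pair as 'extension' or 'html'."""
    result = []
    for m in TAG_PAIR_RE.finditer(text):
        name = m.group(1).lower()
        kind = "extension" if name in KNOWN_EXTENSIONS else "html"
        result.append((kind, name, m.group(2)))
    return result
```

Here <b>''something''</b> comes out labelled "html", so its content could
be handed back to the wiki-markup rules, whereas <math>x</math> is still
labelled "extension" and left untouched.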
If anything is unclear about how things work, please drop me an e-mail
and I will document the relevant bits when I am back.
Timwi