On 11/17/07, Steve Bennett <stevagewp(a)gmail.com> wrote:
The problem I have here is the options for the image: you'd like the word
"thumbnail" to be a token, but then if you get a case like:
[[image:finger.jpg|Note the impressive thumbnails.]]
you get one token for "thumbnail" rather than "t", "h", etc.
Solutions I can think of so far:
1) Explicitly make the match for text to be
'a'..'z' | 'A'..'Z' | MW_img_thumbnail | ...
2) Make tokens for individual letters (Aa, Bb...) then make the parser
recognise a pattern like Tt + Hh + Uu + Mm...
3) Make a token which is '|thumbnail', then use some trick to distinguish
'|thumbnailblah' from '|thumbnail|'.
4) Like 1), but use a localised lexer so that those words are only tokens
in this specific context.
5) Just match text, then use special markup at the parser level to look
into the text that was matched.
Omg it's so much easier than that.
6) Use a syntactic predicate:
    option
        : (magicword '|') => magicword
        | caption
        ;
    magicword
        : 'magicword'
        ;
Translation: If the next two tokens are some magicword and the pipe, then
match the magic word. Otherwise, treat it as a caption.
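
For anyone who doesn't read ANTLR, here is a rough Python sketch of the same decision. It is only an illustration, not what ANTLR generates: it splits on pipes instead of working on a token stream, and both the MAGIC_WORDS set and the classify_options name are made up for the example (the real list of image options lives in MediaWiki).

```python
# Hypothetical set of image-option magic words, for illustration only.
MAGIC_WORDS = {"thumbnail", "thumb", "frame", "left", "right"}


def classify_options(link_body):
    """Classify each pipe-separated part of an image link's option text.

    Mimics the predicate (magicword '|') => magicword: a part counts as a
    magic word only if it is exactly a magic word AND a pipe follows it.
    Everything else is treated as a caption.
    """
    parts = link_body.split("|")
    result = []
    for i, part in enumerate(parts):
        followed_by_pipe = i < len(parts) - 1
        if part in MAGIC_WORDS and followed_by_pipe:
            result.append(("magicword", part))
        else:
            result.append(("caption", part))
    return result


if __name__ == "__main__":
    # Option text taken from after the image name in a link such as
    # [[image:finger.jpg|thumbnail|Note the impressive thumbnails.]]
    for kind, text in classify_options("thumbnail|Note the impressive thumbnails."):
        print(kind, repr(text))
```

Note that, taken literally, the predicate only accepts a magic word that is followed by a pipe, so in this sketch a trailing "thumbnail" with nothing after it comes out as a caption.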
That was easy. Woot. I thought things were a lot more complicated because
ANTLRWorks sneakily doesn't support predicates in its Interpreter mode, only
in its Debugger mode. I say "sneakily" because the error it reports looks
like an error in your code...
But: if it can produce a parser in *any* language, then we have
something to run the test suite against, with a little harness
rewiring, which makes it easier to sell both the retargeting work and
the switch-MW-to-this work.
Oh, that's a good benefit too: we can regression test the new *grammar*
against the old *parser*. Obviously it won't all work, and will require
hacks to get all those magic words and stuff into the grammar. Perhaps
someone could look into creating some tests that don't require the
preprocessor (no templates, no magic variables) and that focus on specific
language features...or maybe they already exist, I haven't looked.
Steve