Re: [Wikitext-l] Image grammar

18 Nov 2007

On 11/18/07, Steve Bennett <stevagewp@gmail.com> wrote:
...
...
Solutions I can think of so far:

1) Explicitly make the match for text to be 'a'..'z' | 'A'..'Z' |
MW_img_thumbnail | ...
...

2) Make tokens for individual letters (Aa, Bb...) then make the parser
recognise a pattern like Tt + Hh + Uu + Mm...
...

3) Make a token which is '|thumbnail', then use some
trick to distinguish '|thumbnailblah' from
'|thumbnail|'.
...

4) Like 1), but use a localised lexer so that those words are only tokens
in this specific context.
...

5) Just match text, then use special markup at the parser level to look
into the text that was matched.
...
Omg it's so much easier than that.
6) Use a syntactic predicate:
option : (magicword '|') => magicword
 | caption;
magicword
 : 'magicword';
Spoken with blissful ignorance. It turns out that that solution
doesn't work. Incorporating a literal string in the parser creates a
new lexer token, which means that this type of thing doesn't parse:
[[Image:thumbnail.jpg]].
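
To illustrate the failure, here is a toy lexer sketch in Python (not ANTLR, and not the real grammar). The token names and rules are made up for the example. The only point is that once "thumbnail" exists as its own token type, the lexer emits it everywhere, including inside a file name where the parser wants plain text.

```python
import re

# Hypothetical token set for illustration only; the real lexer rules differ.
TOKEN_SPEC = [
    ("LINK_START", r"\[\["),
    ("LINK_END", r"\]\]"),
    ("PIPE", r"\|"),
    ("COLON", r":"),
    # A literal in a parser rule silently becomes a lexer token like this one.
    ("MAGICWORD", r"thumbnail"),
    ("LETTERS", r"[A-Za-z]+"),
    ("OTHER", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_type, text) pairs for the input string."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(text)]


if __name__ == "__main__":
    for tok in tokenize("[[Image:thumbnail.jpg]]"):
        print(tok)
```

In this sketch the file name comes out as MAGICWORD, OTHER, LETTERS rather than as ordinary text, so a parser rule that only accepts text tokens in the file name position has nothing to match.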
Correct solution:
optionorcaption
        :       (mw_img_thumbnail (PIPE | LINK_END)) =>
...
        |       caption;
mw_img_thumbnail       : {textis("thumbnail") | textis("thumb")}? mwletters;
Where 'textis' is an actual function that looks at the text of the token.
This solution uses both syntactic and semantic predicates:
Syntactic predicate: if our text is an "mwletters" that matches a
semantic predicate, followed by a PIPE or LINK_END, then parse it as
an mw_img_thumbnail, not a caption.
Semantic predicate: if the text of the mwletters is "thumbnail" or
"thumb", then it's a valid "mw_img_thumbnail" word.
The syntactic predicate stops "...|thumbnail blah|" from parsing as a
thumbnail option.
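
The same decision can be sketched outside ANTLR, as a rough Python model of the two predicates. Everything here is an assumption made for illustration: the token names, the text_is helper (standing in for 'textis') and classify_option are not taken from the grammar itself.

```python
import re

# Hypothetical tokens: no keyword token, "thumbnail" is just letters.
TOKEN_RE = re.compile(
    r"(?P<LINK_END>\]\])|(?P<PIPE>\|)|(?P<MWLETTERS>[A-Za-z]+)|(?P<OTHER>.)"
)


def tokenize(text):
    """Return a list of (token_type, text) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def text_is(token, word):
    """Stand-in for 'textis': compare the text of a letters token to a word."""
    kind, text = token
    return kind == "MWLETTERS" and text == word


def is_thumbnail_word(token):
    """Semantic predicate: the token's text is 'thumbnail' or 'thumb'."""
    return text_is(token, "thumbnail") or text_is(token, "thumb")


def classify_option(tokens, i):
    """Syntactic predicate: a thumbnail word only counts as the option
    when the very next token is PIPE or LINK_END; otherwise it is caption."""
    follows = tokens[i + 1][0] if i + 1 < len(tokens) else None
    if is_thumbnail_word(tokens[i]) and follows in ("PIPE", "LINK_END"):
        return "thumbnail"
    return "caption"


if __name__ == "__main__":
    for sample in ["thumbnail|", "thumb]]", "thumbnail blah|", "thumbnailblah|"]:
        print(repr(sample), "->", classify_option(tokenize(sample), 0))
```

Under these assumptions "thumbnail|" and "thumb]]" come out as the option, while "thumbnail blah|" and "thumbnailblah|" fall through to caption.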
Whee, that was a bit harder than I expected.
Anyway, the mostly-complete image parsing code is here:
http://www.mediawiki.org/wiki/Markup_spec/ANTLR
It parses all image options except "page", supports infinitely nested
images and tolerates links in captions.
Steve
