Re: [Wikitext-l] Image grammar

19 Nov 2007

On 11/18/07, Steve Bennett <stevagewp(a)gmail.com> wrote:
...
 Solutions I can think of so far:
  1) Explicitly make the match for text to be 'a'..'z' | 'A'..'Z' |
     MW_img_thumbnail | ...
  2) Make tokens for individual letters (Aa, Bb...) then make the parser
     recognise a pattern like Tt + Hh + Uu + Mm...
  3) Make a token which is '|thumbnail', then use some trick to
     distinguish '|thumbnailblah' from '|thumbnail|'.
  4) Like 1), but use a localised lexer so that those words are only
     tokens in this specific context.
  5) Just match text, then use special markup at the parser level to
     look into the text that was matched.

 Omg it's so much easier than that.
 6) Use a syntactic predicate:

 option : (magicword '|') => magicword
  | caption;

 magicword
  : 'magicword';

Spoken with blissful ignorance. It turns out that solution doesn't
work. Incorporating a literal string in the parser creates a new lexer
token, so the word is lexed as that token everywhere, not just in the
option position. That means this type of thing doesn't parse:
[[Image:thumbnail.jpg]].
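
To make the failure concrete, here is a rough Python sketch of the effect, not ANTLR itself. The names tokenize, is_plain_filename and the KEYWORD/LETTERS/OTHER token types are made up for illustration; the only point is that once a word becomes its own token type, a rule that expects ordinary letters no longer sees it.

```python
import re

def tokenize(text, keywords=()):
    """Split text into (type, text) pairs.

    Any word listed in `keywords` gets its own token type, which is
    roughly what a literal string in a parser rule ends up doing.
    """
    tokens = []
    for match in re.finditer(r"[A-Za-z]+|.", text, re.S):
        piece = match.group()
        if piece in keywords:
            tokens.append(("KEYWORD", piece))
        elif piece.isalpha():
            tokens.append(("LETTERS", piece))
        else:
            tokens.append(("OTHER", piece))
    return tokens

def is_plain_filename(tokens):
    """Toy 'file name' rule: accepts only LETTERS and OTHER tokens."""
    return all(kind in ("LETTERS", "OTHER") for kind, _ in tokens)

# Without a keyword token the file name is ordinary text.
print(is_plain_filename(tokenize("thumbnail.jpg")))
# With 'thumbnail' promoted to a token, the same rule rejects it.
print(is_plain_filename(tokenize("thumbnail.jpg", keywords=("thumbnail",))))
```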

Correct solution:
optionorcaption
        :       (mw_img_thumbnail (PIPE | LINK_END)) =>
...
        |       caption;

mw_img_thumbnail       : {textis("thumbnail") | textis("thumb")}? mwletters;

Where 'textis' is an actual function that looks at the text of the token.

This solution uses both syntactic and semantic predicates:
Syntactic predicate: if the upcoming input is an "mwletters" that
satisfies the semantic predicate and is followed by a PIPE or LINK_END,
then parse it as an mw_img_thumbnail, not a caption.
Semantic predicate: if the text of the mwletters is "thumbnail" or
"thumb", then it's a valid "mw_img_thumbnail" word.

The syntactic predicate stops "...|thumbnail blah|" from parsing as a
thumbnail option.
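
For anyone who wants to poke at the logic outside ANTLR, the two predicates can be approximated in plain Python. This is only a sketch under assumptions: the tokenizer, parse_image and is_thumbnail_option are hypothetical helpers, the token names only loosely follow the grammar, and it handles neither nested images nor links in captions. The textis here takes the token explicitly, unlike the grammar's version.

```python
import re

# Hypothetical token types; only PIPE and LINK_END come from the grammar.
TOKEN_RE = re.compile(
    r"(?P<LINK_START>\[\[)|(?P<LINK_END>\]\])|(?P<PIPE>\|)"
    r"|(?P<LETTERS>[A-Za-z]+)|(?P<OTHER>.)",
    re.S,
)

def tokenize(text):
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]

def textis(token, word):
    """Semantic predicate: look at the token's text, not its type."""
    return token[0] == "LETTERS" and token[1] == word

def is_thumbnail_option(tokens, i):
    """Syntactic predicate: magic word directly followed by PIPE or LINK_END."""
    if i + 1 >= len(tokens):
        return False
    word_ok = textis(tokens[i], "thumbnail") or textis(tokens[i], "thumb")
    return word_ok and tokens[i + 1][0] in ("PIPE", "LINK_END")

def parse_image(link):
    """Return (target, options) for a simple, non-nested image link."""
    tokens = tokenize(link)
    if len(tokens) < 2 or tokens[0][0] != "LINK_START" or tokens[-1][0] != "LINK_END":
        raise ValueError("expected [[...]]")
    i, target = 1, []
    while tokens[i][0] not in ("PIPE", "LINK_END"):
        target.append(tokens[i][1])
        i += 1
    options = []
    while tokens[i][0] == "PIPE":
        i += 1
        if is_thumbnail_option(tokens, i):
            options.append(("thumbnail", tokens[i][1]))
            i += 1
        else:
            # Anything else up to the next PIPE or LINK_END is a caption.
            caption = []
            while tokens[i][0] not in ("PIPE", "LINK_END"):
                caption.append(tokens[i][1])
                i += 1
            options.append(("caption", "".join(caption)))
    return "".join(target), options

print(parse_image("[[Image:thumbnail.jpg]]"))
print(parse_image("[[Image:x.jpg|thumbnail|thumbnail blah]]"))
```

Because "thumbnail" stays an ordinary letters token, the file name in the first call is untouched, and in the second call only the bare word before a pipe is treated as the option.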

Whee, that was a bit harder than I expected.

Anyway, the mostly-complete image parsing code is here:
http://www.mediawiki.org/wiki/Markup_spec/ANTLR

It parses all image options except "page", supports infinitely nested
images and tolerates links in captions.

Steve
