On 2/12/08, Daniel Kinzler <daniel(a)brightbyte.de> wrote:
> 1) "parser hook" extensions (aka tag hooks, aka extension tags), which conform
> to a (fuzzy) XML syntax: <name foo="bar" bla=12 blubb>...</name>. The ... in
> between the tags should be completely opaque; the parser should skip everything
> up to the closing tag. There is no support for nesting, no expansion of
> templates or template parameters, nothing. Also, the text *returned* by the
> extension is expected to be HTML, and should be passed through the generation
> stage untouched.
The trouble there is that <ref> for example can contain
wikitext...which needs to be parsed. e.g.:
<ref>''The origin of species'', Darwin</ref>
So at a minimum I think we would need to distinguish those extensions
whose internal text needs to be parsed?
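The "opaque body" behaviour described above can be sketched roughly as follows. This is an illustrative toy, not MediaWiki's actual implementation; the class and method names (`TagHookScanner`, `extractBody`) are invented for the example. It shows the key property: the scanner takes everything up to the *first* matching closing tag, so there is no nesting, and the body is returned verbatim for the extension to handle (or, per the <ref> point above, to hand back for further parsing).

```java
// Toy sketch of tag-hook scanning: skip to the first closing tag,
// no nesting, body returned verbatim. Names are hypothetical.
public class TagHookScanner {

    /**
     * Returns the raw text between an already-consumed opening tag and
     * the first matching closing tag, or null if the tag is unclosed.
     *
     * @param src     full source text
     * @param name    extension tag name, e.g. "ref"
     * @param openEnd index just past the ">" of the opening tag
     */
    public static String extractBody(String src, String name, int openEnd) {
        String close = "</" + name + ">";
        int end = src.indexOf(close, openEnd); // first close wins: no nesting
        if (end < 0) return null;              // unclosed tag
        return src.substring(openEnd, end);    // opaque: returned unparsed
    }

    public static void main(String[] args) {
        String wikitext = "<ref>''The origin of species'', Darwin</ref>";
        // openEnd = index just past "<ref>"
        String body = extractBody(wikitext, "ref", "<ref>".length());
        System.out.println(body); // ''The origin of species'', Darwin
    }
}
```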
> 2) "parser functions", which conform to an extended template syntax:
> {{#name: param|param|param...}}. In this case, all parameters have to be fully
> parsed and expanded, so this needs to work:
> {{#foo:xx|{{#bar|{{{bla|frob}}}}}|{{something}}}}
> The output of parser functions may be wikitext that has to be further processed
> in context (just as if it were a normal template), or it may be HTML that has
> to be passed through (and a few more minor options). This is determined by each
> extension when registering the hook.
Afaik, these are converted by the preprocessor (recently rewritten by
Tim), and are completely invisible to the parser?
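One concrete difficulty the nested example above illustrates: the argument separator "|" must only be honoured at the top brace-nesting level, or the inner {{#bar|...}} and {{{bla|frob}}} would be split apart. A minimal sketch of that splitting step (again a toy, not the real preprocessor; `ParamSplitter` and `splitParams` are invented names):

```java
// Toy sketch: split parser-function arguments on "|" only at brace
// depth zero, so pipes inside nested {{...}} / {{{...}}} stay intact.
import java.util.ArrayList;
import java.util.List;

public class ParamSplitter {

    public static List<String> splitParams(String args) {
        List<String> out = new ArrayList<>();
        int depth = 0, start = 0;
        for (int i = 0; i < args.length(); i++) {
            char c = args.charAt(i);
            if (c == '{') depth++;
            else if (c == '}') depth--;
            else if (c == '|' && depth == 0) { // top-level separator only
                out.add(args.substring(start, i));
                start = i + 1;
            }
        }
        out.add(args.substring(start)); // final argument
        return out;
    }

    public static void main(String[] args) {
        // the arguments of {{#foo:...}} from the example above
        System.out.println(splitParams("xx|{{#bar|{{{bla|frob}}}}}|{{something}}"));
    }
}
```

Each returned argument would then itself be recursively expanded before the parser function sees it.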
> Extensions may also introduce arbitrary magic words. Such extensions are
> impossible to make compatible with a new ANTLR-based parser; they would have to
> be rewritten as plugins to such a parser. Would it be possible to allow such
> plugins? I'm thinking of allowing a way for extensions to redefine individual
> bits of the grammar.
It depends a bit on the limits of these "arbitrary magic words". I
think it's actually surprisingly feasible to allow magic words that,
say, consist of strings of letters surrounded by space, or certain
predefined punctuation.
At first I thought that would be a nightmare, but in practice it
isn't. As the second-to-last rule before rendering a string of letters
literally, I would simply add a (Java/PHP) check to see whether the
string matches any registered extension, and parse it as an extension
magic word instead. Here's how that happens with __TOC__ etc.:
magic_word: UNDERSCORE UNDERSCORE magic_word_text UNDERSCORE UNDERSCORE
    -> ^(MAGIC_WORD magic_word_text);

magic_word_text: {is_magic_word()}? letters;

@members {
    ....
    boolean is_magic_word() {
        return
            input.LT(1).getText().equalsIgnoreCase("NOTOC") ||
            input.LT(1).getText().equalsIgnoreCase("TOC") ||
            input.LT(1).getText().equalsIgnoreCase("FORCETOC") ||
            input.LT(1).getText().equalsIgnoreCase("NOGALLERY") ||
            input.LT(1).getText().equalsIgnoreCase("NOEDITSECTION");
    }
}
It would only be a problem if the contents of the magic word
interfered with the lexer, say a combination of letters and other
punctuation. But if the available combinations were predefined (e.g.,
hyphen hyphen letters digit hyphen hyphen) then they could be dealt
with, and the letters themselves defined at runtime.
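To make the "defined at runtime" part concrete: instead of hardcoding NOTOC, TOC, etc. in the grammar predicate, the is_magic_word() helper could consult a set that extensions populate at setup time. A small sketch of that idea (the class and method names here are illustrative, not an existing MediaWiki API):

```java
// Sketch: runtime-registered magic words, so the grammar predicate
// {is_magic_word()}? checks a set instead of a hardcoded list.
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class MagicWordRegistry {

    private static final Set<String> WORDS = new HashSet<>();

    /** Called by an extension at setup time to register its magic word. */
    public static void register(String word) {
        WORDS.add(word.toUpperCase(Locale.ROOT));
    }

    /** The grammar predicate would call this with the lexed letters. */
    public static boolean isMagicWord(String letters) {
        // Case-insensitive, matching the equalsIgnoreCase checks above.
        return WORDS.contains(letters.toUpperCase(Locale.ROOT));
    }
}
```

The is_magic_word() member above would then reduce to a single call like MagicWordRegistry.isMagicWord(input.LT(1).getText()).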
Steve