2010-08-12 09:30, Andreas Jonsson wrote: [...]
However, requiring a link to be properly closed in order to count as a link is fairly complex. What should the parser do with the link title if it decides that it is not really a link title after all? It may itself contain tokens. The lexer must therefore use lookahead and avoid producing any spurious link-open tokens. To avoid the n^2 worst case, a full extra pass to compute hints would be necessary before doing the actual lexing.
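To make the n^2 worst case concrete, here is a hypothetical toy model in Python (naive_scan_work is my own illustration, not part of the lexer): a lexer that, on every '[[' it meets, scans the whole remaining input for a matching ']]'. On adversarial input such as '[[x' repeated n times, every scan runs to the end of the input, so the total work grows quadratically.

```python
def naive_scan_work(text):
    """Count character inspections done by unbounded lookahead."""
    work = 0
    for i in range(len(text)):
        if text.startswith('[[', i):
            # Unbounded lookahead: search the entire remainder for ']]'.
            j = i + 2
            while j < len(text) and not text.startswith(']]', j):
                work += 1
                j += 1
    return work

# Doubling the input roughly quadruples the work:
ratio = naive_scan_work('[[x' * 200) / naive_scan_work('[[x' * 100)
# ratio is about 4, i.e. O(n^2) behavior
```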
Replying to myself: I might be wrong about the complexity of finding the closing token. The lexer hack below may actually do the trick: a rule that matches the empty string if there is a valid closing tag ahead. Since it does not search past '[[' tokens, no content will be scanned more than once by this rule, so the worst-case running time is still linear.
fragment LINK_CLOSE_LOOKAHEAD
@init{ bool success = false; }:
    ( (
        /*
         * List of all other lexer rules that may contain the strings
         * ']]' or '[['.
         */
          BEGIN_TABLE
        | TABLE_ROW_SEPARATOR
        | TABLE_CELL
        | TABLE_CELL_INLINE
        /*
         * Alternative: don't search beyond other block elements:
         */
        // ({BOL}?=> '{|')=> '{|' {false}?=>
        // | (LIST_ELEMENT)=> LIST_ELEMENT {false}?=>
        // | (NEWLINE NEWLINE)=> NEWLINE NEWLINE {false}?=>
        /*
         * Otherwise, anything goes except ']]' or '[['.
         */
        | ~('['|']')
        | {!PEEK(2, '[')}?=> '['
        | {!PEEK(2, ']')}?=> ']'
      )+
      ( ']]' {(success = true), false}?=>
      | {false}?=>
      )
    )
    | {success}?=>
    ;
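The core of the rule can be sketched outside ANTLR. In plain Python (close_ahead is a hypothetical helper of my own, not generated from the grammar), the lookahead simply answers: is there a ']]' ahead of this position before the next '[['? Because the search always stops at the next '[[', the scans of successive link-open candidates cover disjoint stretches of input, which is why the total work stays linear.

```python
def close_ahead(text, pos):
    """Return True iff a ']]' occurs at or after pos, before any '[['.

    Mirrors the intent of LINK_CLOSE_LOOKAHEAD: the search never
    crosses a '[[' token, so no character is scanned by more than
    one invocation of the lookahead.
    """
    j = pos
    while j < len(text) - 1:
        if text[j] == ']' and text[j + 1] == ']':
            return True
        if text[j] == '[' and text[j + 1] == '[':
            return False
        j += 1
    return False

# A lexer would call this just after consuming a '[[':
close_ahead('[[a|b]]', 2)    # True: ']]' found, emit a link-open token
close_ahead('[[a [[b]]', 2)  # False: a new '[[' intervenes, no link here
```

Note that this sketch omits the table rules (BEGIN_TABLE etc.) from the grammar, which exist only because those tokens may themselves contain the strings '[[' or ']]'.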