2010-08-12 09:30, Andreas Jonsson wrote: [...]
However, requiring a link to be properly closed in order to count as a link is fairly complex. What should the parser do with the link title if it decides that it is not really a link title after all? It may itself contain tokens. The lexer must therefore use lookahead and avoid producing any spurious link-open tokens. To avoid the n^2 worst case, a full extra pass to compute hints would be necessary before doing the actual lexing.
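To make the n^2 worst case concrete, here is a hypothetical toy model in Python (naive_scan_work is my own illustration, not part of the lexer): a lexer that, on every '[[' it meets, scans the whole remaining input for a matching ']]'. On adversarial input such as '[[x' repeated n times, every scan runs to the end of the input, so the total work grows quadratically.

```python
def naive_scan_work(text):
    """Count character inspections done by unbounded lookahead."""
    work = 0
    for i in range(len(text)):
        if text.startswith('[[', i):
            # Unbounded lookahead: search the entire remainder for ']]'.
            j = i + 2
            while j < len(text) and not text.startswith(']]', j):
                work += 1
                j += 1
    return work

# Doubling the input roughly quadruples the work:
ratio = naive_scan_work('[[x' * 200) / naive_scan_work('[[x' * 100)
# ratio is about 4, i.e. O(n^2) behavior
```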
Replying to myself: I might be wrong about the complexity of finding the closing token. The lexer hack below may actually do the trick: a rule that matches the empty string if there is a valid closing tag ahead. Since it does not search past '[[' tokens, no content will be scanned more than once by this rule, so the worst-case running time is still linear.
fragment LINK_CLOSE_LOOKAHEAD
@init{ bool success = false; }:
    ( (
        /*
         * List of all other lexer rules that may contain the strings
         * ']]' or '[['.
         */
          BEGIN_TABLE
        | TABLE_ROW_SEPARATOR
        | TABLE_CELL
        | TABLE_CELL_INLINE
        /*
         * Alternative: don't search beyond other block elements:
         */
        // ({BOL}?=> '{|')=> '{|' {false}?=>
        // | (LIST_ELEMENT)=> LIST_ELEMENT {false}?=>
        // | (NEWLINE NEWLINE)=> NEWLINE NEWLINE {false}?=>
        /*
         * Otherwise, anything goes except ']]' or '[['.
         */
        | ~('['|']')
        | {!PEEK(2, '[')}?=> '['
        | {!PEEK(2, ']')}?=> ']'
      )+
      ( ']]' {(success = true), false}?=>
      | {false}?=>
      )
    )
    | {success}?=>
    ;
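The core of the rule can be sketched outside ANTLR. In plain Python (close_ahead is a hypothetical helper of my own, not generated from the grammar), the lookahead simply answers: is there a ']]' ahead of this position before the next '[['? Because the search always stops at the next '[[', the scans of successive link-open candidates cover disjoint stretches of input, which is why the total work stays linear.

```python
def close_ahead(text, pos):
    """Return True iff a ']]' occurs at or after pos, before any '[['.

    Mirrors the intent of LINK_CLOSE_LOOKAHEAD: the search never
    crosses a '[[' token, so no character is scanned by more than
    one invocation of the lookahead.
    """
    j = pos
    while j < len(text) - 1:
        if text[j] == ']' and text[j + 1] == ']':
            return True
        if text[j] == '[' and text[j + 1] == '[':
            return False
        j += 1
    return False

# A lexer would call this just after consuming a '[[':
close_ahead('[[a|b]]', 2)    # True: ']]' found, emit a link-open token
close_ahead('[[a [[b]]', 2)  # False: a new '[[' intervenes, no link here
```

Note that this sketch omits the table rules (BEGIN_TABLE etc.) from the grammar, which exist only because those tokens may themselves contain the strings '[[' or ']]'.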