-----Original Message-----
From: wikitext-l-bounces@lists.wikimedia.org
[mailto:wikitext-l-bounces@lists.wikimedia.org] On Behalf Of
Daniel Kinzler
Sent: 13 February 2008 22:30
To: Wikitext-l
Subject: Re: [Wikitext-l] Draft 10 published
> You're not going to get 100% compatibility moving from the multiple
> search/replace method into a single parse. Hooks embedded within the
> parser, like InternalParseBeforeLinks and ParserBeforeTidy, become
> impossible to do.
True. I was thinking of "clean" tag hooks and parser functions. These
should continue to work with a minimum of modification. I don't mind
the black magic breaking.
> That is, the grammar should NOT know about <ref>, not what it does,
> not even that it exists. It should simply have a facility that allows
> external (php) code to handle the characters (unchanged!) between
> (some specific) tags.
Agreed, the grammar should know how to pass along and correct tag-soup
style HTML/XML that gets handed off to be dealt with.
Yes, though for the parser, there are three cases to consider for
HTML/XML style tags:

1) (whitelisted) HTML tags, which can occur "soupy", and are more or
less passed through (or "tidied" into valid XHTML).

2) Other tags (potentially handled by an extension) which must match
in pairs exactly and cause the parser to take anything *in between*
LITERALLY, and pass it to the extension for processing.

3) In case there is no such extension, it needs to go back, read the
*tags* literally, and then parse the text between the tags.
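For what it's worth, that three-way dispatch could be sketched roughly
like this (the tag lists, handler names and the toy parse_wikitext()
below are my own illustration, not MediaWiki's actual code):

```python
# Sketch of the three-way tag dispatch described above.
# Tag lists and handler names are hypothetical, not MediaWiki's real ones.

HTML_WHITELIST = {"b", "i", "div", "span"}  # case 1: passed through / tidied
EXTENSION_HOOKS = {"ref": lambda text, attrs: "[note: %s]" % text}  # case 2

def parse_wikitext(text):
    # stand-in for the rest of the parser: e.g. strip bold markers
    return text.replace("'''", "")

def handle_tag(name, attrs, inner):
    if name in HTML_WHITELIST:
        # case 1: soupy HTML, pass it through (a real parser would tidy it)
        return "<%s>%s</%s>" % (name, parse_wikitext(inner), name)
    if name in EXTENSION_HOOKS:
        # case 2: hand the *literal*, unparsed inner text to the extension
        return EXTENSION_HOOKS[name](inner, attrs)
    # case 3: no extension registered -> emit the tags literally,
    # but still parse the text between them
    return "&lt;%s&gt;%s&lt;/%s&gt;" % (name, parse_wikitext(inner), name)
```

Note that only case 2 sees the raw characters; cases 1 and 3 still run
the inner text through the normal parse.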
All tag attributes are parsed by Sanitizer::decodeTagAttributes(), I
believe, so things like attributes with missing values (<foo bar>) are
possible for all tags.
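A rough illustration of that attribute behaviour (a toy regex of my
own, not Sanitizer's actual implementation): an attribute with no
value still decodes, rather than being rejected:

```python
import re

# Toy stand-in for Sanitizer::decodeTagAttributes(): attributes may
# have quoted values, bare values, or no value at all (<foo bar>).
ATTR_RE = re.compile(r'(\w+)(?:\s*=\s*(?:"([^"]*)"|(\S+)))?')

def decode_tag_attributes(attr_text):
    attrs = {}
    for m in ATTR_RE.finditer(attr_text):
        name, quoted, bare = m.groups()
        # a missing value is recorded as the empty string
        attrs[name.lower()] = quoted if quoted is not None else (bare or "")
    return attrs
```

So decode_tag_attributes('bar baz="1"') keeps "bar" with an empty
value instead of dropping it.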
On 2, I'm not sure they must always be matched in pairs. I think
something (possibly in Parser::extractTagsAndParams()) allows
unterminated tags to run to the end of the text.

On 3, unrecognised tags should just cause the parser to output a < and
carry on parsing.
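That unterminated-tag fallback might look something like this (a guess
at the behaviour, not extractTagsAndParams() itself):

```python
def extract_tag_body(text, pos, name):
    """Return (body, next_pos) for a tag body starting at pos, i.e.
    just past the opening <name>.

    Sketch only: if no closing tag is found, the body runs to the end
    of the text, as the mail suggests extractTagsAndParams() allows.
    """
    close = text.find("</%s>" % name, pos)
    if close == -1:
        return text[pos:], len(text)  # unterminated: run to end of text
    return text[pos:close], close + len(name) + 3  # skip past "</name>"
```

The grammar would need the same escape hatch if it is to match the
existing parser here.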
There's even a fourth case, namely magic tags like <nowiki> that have
to be known to the parser for special handling - these may also
include <includeonly>, <onlyinclude> and <noinclude>, though those
might be handled by the preprocessor; I'm not sure about that.
I believe (I haven't looked into it or implemented it yet) that
onlyinclude and noinclude are essentially filters that occur at
transclusion time. Includeonly is a filter at save time? Preventing a
template being associated with a category, for example.
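If that reading is right, the transclusion-time side could be sketched
like this (simplified and regex-based; this is my own illustration of
the semantics, not how the preprocessor actually does it):

```python
import re

def filter_for_transclusion(template_text):
    # <noinclude>...</noinclude> is dropped when the page is transcluded
    text = re.sub(r"<noinclude>.*?</noinclude>", "", template_text,
                  flags=re.S)
    # <includeonly> tags are stripped but their content is kept
    text = re.sub(r"</?includeonly>", "", text)
    # if <onlyinclude> appears, only those sections are transcluded
    only = re.findall(r"<onlyinclude>(.*?)</onlyinclude>", text, flags=re.S)
    return "".join(only) if only else text
```

The mirror-image filter (dropping includeonly content, keeping
noinclude content) would apply when the template page itself is
rendered.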
In the case of (some!) parser functions, it has to be considered that
the *output* of the extension would have to be parsed too, inline. But
that stuff is probably handled by the preprocessor - if that is indeed
the case, there's nothing to worry about.
-- Daniel
_______________________________________________
Wikitext-l mailing list
Wikitext-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitext-l