Rowan Collins wrote:
> I know the current version doesn't do anything, but I've been meaning
> for a while to finalise a patch to show a message saying "This is a
> redirect to [[foo]]".
This has already been done in 1.4.
>> * Things like __NOTOC__ and stuff can be handled like this:
>> * Regard *everything* of the form __CAPITALLETTERS__ as a special
>>   token
> Actually, it can be lower case currently. Unless we're going to hunt
> the database for examples where it is, best just treat
> __anystringofletters__ as needing to be investigated.
Indeed. I didn't know that. But it isn't a problem at all. Even with it
being case-insensitive, I don't think it's asking too much of the users
to put <nowiki> around anything that looks like one of these tokens,
since they are rarely intended to be actual text. I highly doubt that
any significant number of articles currently relies on them being
rendered as plain text.
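To illustrate what I have in mind (in Python for brevity; the real thing
would of course be PHP, and the set of known magic words here is just an
invented subset):

```python
import re

# Sketch: treat any __letters__ run (case-insensitive) as a candidate
# magic word; anything inside <nowiki>...</nowiki> is left untouched.
MAGIC_RE = re.compile(r'__([A-Za-z]+)__')
KNOWN_MAGIC = {'notoc', 'toc', 'forcetoc'}  # illustrative subset only

def scan_magic(wikitext):
    """Return (text with recognised magic words stripped, list of words found)."""
    found = []

    def repl(m):
        word = m.group(1).lower()
        if word in KNOWN_MAGIC:
            found.append(word)
            return ''           # strip recognised magic words
        return m.group(0)       # unknown __foo__ stays as literal text

    # crude <nowiki> protection: split out protected spans first
    parts = re.split(r'(<nowiki>.*?</nowiki>)', wikitext, flags=re.S)
    out = []
    for part in parts:
        if part.startswith('<nowiki>'):
            out.append(part)
        else:
            out.append(MAGIC_RE.sub(repl, part))
    return ''.join(out), found
```

So `__NOTOC__` gets stripped and recorded, `__Random__` survives as
literal text, and a <nowiki>-wrapped token is never touched.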
>> * The template pseudo-variables (e.g. CURRENTMONTH) are similarly
>>   handled in post-processing.
> By which, do you mean they are treated as templates and then
> recognised as magic after? Just curious.
Yep, that's right.
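In other words, something like this (Python sketch; the function names
and the variable table are invented, not the real implementation):

```python
import datetime

# {{CURRENTMONTH}} is tokenised like any other template call; only in a
# post-processing step is the name recognised as a magic variable.
MAGIC_VARIABLES = {
    'CURRENTMONTH': lambda: '%02d' % datetime.date.today().month,
}

def lookup_template(name):
    # placeholder for normal template inclusion (hypothetical)
    return '{{%s}}' % name

def expand_template(name):
    if name in MAGIC_VARIABLES:       # post-processing: magic, not template
        return MAGIC_VARIABLES[name]()
    return lookup_template(name)      # otherwise, ordinary inclusion
```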
>> * HTML tags and extension names are either not internationalised, or
>>   all translations of them are made to work on all Wikipedias.
> That seems a bit of a step backwards to me. Actually, everything that
> looks like an SGML tag has to be treated in one of three ways:
> a) it is an extension, and everything from there to its partner should
>    be unparsed / sent somewhere else for parsing
> b) it's an allowed HTML tag, and should be put in the parse tree as
>    that kind of element, with its contents parsed "independently"
>    (sort of)
> c) it is neither of the above, and needs entity escaping so that it
>    doesn't get as far as the browser still looking like HTML
I am perfectly happy with this, but since the parser is a stand-alone
module, I cannot treat a particular word as case (a) on one Wikipedia
but case (c) on another.
I'm not sure why you think allowing all translations on all Wikipedias
would be a "step backwards"? Or do you seriously think someone would use
the Chinese translation of <math> on the English Wikipedia? :)
But if you still insist on this, then I have two suggestions:
* We could replace the "other-language" words with the
"this-language"
words upon save. I.e. if someone wrote <math> on the Chinese
Wikipedia, it would automatically be changed into "<" + some Chinese
characters + ">" before storing it in the DB.
* Alternatively, we could have the parser recognise only the canonical
(English) words, and have the PHP software replace non-English magic
words with the canonical (English) words before invoking the parser.
I am uncomfortable with this solution because it resorts to the same
kind of patchwork that is irking me about the current not-a-parser.
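For clarity, the second suggestion would amount to a pre-pass like this
(Python sketch; the alias table is entirely invented, not real
MediaWiki localisation data):

```python
import re

# Map localised tag spellings to their canonical (English) forms before
# the text ever reaches the parser. These aliases are hypothetical.
ALIASES = {
    'formel': 'math',
    'mathe':  'math',
}

def canonicalise(wikitext):
    def repl(m):
        slash, name = m.group(1), m.group(2).lower()
        return '<%s%s>' % (slash, ALIASES.get(name, name))
    return re.sub(r'<(/?)(\w+)>', repl, wikitext)
```

The parser itself would then only ever need to know the canonical
English words.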
> Perhaps extensions could be made to return a parse sub-tree (even if
> it only has one element). Then we could use an HTML "extension" bound
> to all allowed HTML tags, which just called the original parser back
> on the contents of the tags.
This is an interesting thought, but I think it is inefficient in terms
of performance. If the parser knows about the allowed HTML tags (and
the difference between an HTML tag and an extension) beforehand, this
extra step is saved. Additionally, your idea works only for tags
that are independent of other tags; it would not work well with tables.
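Just to make sure we mean the same thing, here is how I understand your
suggestion (a toy Python sketch; Node and parse_wikitext are invented
stand-ins for the real machinery):

```python
# Every extension returns a parse sub-tree; allowed HTML tags are handled
# by a generic "extension" that calls the main parser back on its contents.

class Node:
    def __init__(self, kind, children=None, text=''):
        self.kind = kind
        self.children = children or []
        self.text = text

def parse_wikitext(text):
    # stand-in for the real parser: everything becomes one text node
    return [Node('text', text=text)]

def html_extension(tag, inner):
    # case (b) implemented as an extension: recurse into the parser
    return Node(tag, children=parse_wikitext(inner))

def math_extension(tag, inner):
    # case (a): contents are passed through unparsed
    return Node('math', text=inner)
```

The extra indirection through html_extension is exactly the step that
would be saved if the parser knew the allowed HTML tags up front, and a
table cell cannot be parsed in isolation from its surrounding table this
way.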
Timwi