[Wikitech-l] Re: xml wiki representation

16 Feb 2005


Brion Vibber wrote:
...
Our present "parser" is a hack with a series of regexps and other
horrors, whose steps often stomp on each other and produce hard to fix
errors. It's not something to be emulated; rather it is our greatest
shame. Currently we cannot guarantee that XHTML output will be
well-formed, so changing it to a custom XML format would be a waste of
time, as it would not be transformable.
Still, a parser written in PHP is necessary, albeit a better one.
...
A character-by-character parser that can go from the beginning to the
end and churn something out that's guaranteed to be well-formed should
be less error-prone and easier to maintain. Whether flex/bison is the
best route I cannot say, but it's worth exploring.
A proof-of-concept implementation might be a good thing to have around.
But, if I may, I can't see how a simple flex/bison parser could, for
instance, adequately parse a set of varying extension languages, like the
one used in <math> tags, into valid XML (in this case MathML, I guess).
The parser would have to be modular, with each parser module translating
one language. Well, this sparks some ideas.
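
To make the modular idea concrete, here is a minimal sketch in Python (not PHP, and not flex/bison) of a single-pass scanner that hands the body of each extension tag to its own module. Everything named here (MODULES, math_module, nowiki_module, the <page> and <text> elements) is hypothetical and not taken from MediaWiki. The math module only wraps the raw TeX, where a real one would presumably emit MathML. Well-formedness comes from building a tree and serializing it, rather than from concatenating strings.

```python
import re
import xml.etree.ElementTree as ET

OPEN_TAG = re.compile(r"<([a-z]+)>")

def math_module(body):
    # Hypothetical module: a real one would translate TeX to MathML.
    el = ET.Element("math")
    el.set("notation", "tex")
    el.text = body.strip()
    return el

def nowiki_module(body):
    # Hypothetical module: keeps the body as literal text.
    el = ET.Element("nowiki")
    el.text = body
    return el

# Assumed registry: extension tag name -> module for that language.
MODULES = {"math": math_module, "nowiki": nowiki_module}

def parse(text):
    """Scan once, left to right, and return an XML tree for the page."""
    root = ET.Element("page")
    buf = []

    def flush():
        # Emit pending plain characters as one <text> element.
        if buf:
            ET.SubElement(root, "text").text = "".join(buf)
            buf.clear()

    pos = 0
    while pos < len(text):
        if text[pos] == "<":
            m = OPEN_TAG.match(text, pos)
            if m and m.group(1) in MODULES:
                name = m.group(1)
                close = "</" + name + ">"
                end = text.find(close, m.end())
                if end != -1:
                    flush()
                    root.append(MODULES[name](text[m.end():end]))
                    pos = end + len(close)
                    continue
        # Anything else, including a stray "<" or an unclosed tag,
        # is treated as ordinary text and escaped on output.
        buf.append(text[pos])
        pos += 1
    flush()
    return root

def to_xml(text):
    return ET.tostring(parse(text), encoding="unicode")
```

Calling to_xml("a < b <math>x^2 & y</math> c") yields one <text> element, one <math> element, and another <text> element, with the stray "<" and "&" escaped by the serializer. Supporting another extension language would mean adding one more entry to the registry.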
...
Having this parser output an internal XML format instead of XHTML
directly means a) we can maintain semantic information that would be
lost in HTML and b) we can keep the base _parser_ separate from the code
that does things like check for page existence, format the URLs for
local links, and perhaps template transclusions. This allows
transformation to other formats (XHTML, DocBook?) with less crap than, e.g.,
trying to rewrite all the HTML into DocBook.
I completely agree. My question was about the best way of doing that 
parsing.
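
As a rough illustration of the separation described above, the following Python sketch renders a small internal XML fragment to XHTML. The <para> and <link target="..."> elements, the "/wiki/" URL prefix, and the "new" class for missing pages are all assumptions made up for this example. The page-existence check is passed in as a function, so the base parser that produced the XML never needs to know about it.

```python
import xml.etree.ElementTree as ET

def render_xhtml(para, page_exists):
    """Render a hypothetical <para> element as an XHTML <p> string.

    page_exists is a callable taking a page title and returning a bool;
    it stands in for whatever database lookup the wiki would do.
    """
    out = ET.Element("p")
    out.text = para.text
    for child in para:
        if child.tag != "link":
            continue  # this sketch only handles links
        target = child.get("target")
        a = ET.SubElement(out, "a")
        # Assumed URL scheme: spaces become underscores under /wiki/.
        a.set("href", "/wiki/" + target.replace(" ", "_"))
        if not page_exists(target):
            a.set("class", "new")
        a.text = child.text or target
        a.tail = child.tail
    return ET.tostring(out, encoding="unicode")

internal = ET.fromstring(
    '<para>See <link target="Main Page"/> and '
    '<link target="No Such Page">this</link>.</para>'
)
html = render_xhtml(internal, lambda title: title == "Main Page")
```

A DocBook renderer could walk the same internal tree and emit different elements, without first having to undo any HTML.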
Cheers,
Pedro.
