Pedro Medeiros wrote:
Speaking of which, why have a flex/bison parser? Wouldn't it be better if MediaWiki created XML pages directly, like an "atom feed" or "RSS" button? MediaWiki already carries an HTML engine for rendering wikitext to HTML; wouldn't it be easy, with little modification, to make it output XML (or even DocBook/XML) instead of HTML?
Our present "parser" is a hack built from a series of regexps and other horrors, whose steps often stomp on each other and produce hard-to-fix errors. It's not something to be emulated; rather, it is our greatest shame. Currently we cannot guarantee that the XHTML output will be well-formed, so changing it to a custom XML format would be a waste of time: output that isn't well-formed isn't transformable.
A character-by-character parser that makes a single pass from beginning to end and emits output that is guaranteed to be well-formed should be less error-prone and easier to maintain. Whether flex/bison is the best route I cannot say, but it's worth exploring.
Having this parser output an internal XML format instead of XHTML directly means a) we can maintain semantic information that would be lost in HTML and b) we can keep the base _parser_ separate from the code that does things like check for page existence, format the URLs for local links, and perhaps handle template transclusion. This allows transformation to other formats (XHTML, DocBook?) with less crap than, e.g., trying to rewrite all the HTML into DocBook.
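As a sketch of what such an intermediate format might preserve (element names here are invented for illustration, not a proposal): a wiki link like [[Main Page|the main page]] could be recorded with its target intact, rather than as an already-resolved anchor:

```xml
<!-- Hypothetical intermediate form for: See [[Main Page|the main page]]. -->
<paragraph>
  See <link target="Main Page">the main page</link>.
</paragraph>
```

A later XHTML pass would check whether Main Page exists and turn <link> into a blue or red <a href="...">, while a DocBook pass could map it to its own linking elements; the base parser never needs to touch the database.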
-- brion vibber (brion @ pobox.com)