Re: [Wikitech-l] Wikitext to HTML translator and Wikitext language specification

21 Oct 2003

      On Tue, Oct 14, 2003 at 11:16:19AM +1300, Richard Grevers wrote:
...
On Mon, 13 Oct 2003 17:21:15 -0400, David Friedland david@nohat.net gave 
utterance to the following:
...
There seems to be a lot of disjoint discussion on Meta about this. Viz:

There is work that has been done by Taw on an OCaml lexer at
http://meta.wikipedia.org/wiki/Wikipedia_lexer

My suggestions would be "the broken wikitext language", or the "invalid 
wikitext language".
Because of its UseMod ancestry, the current parser produces some very bad 
HTML code*, and in particular handles lists and nesting of blocks really 
badly.

* not so bad if HTML 3.2 or 4 is our target, but it would be nice to be
able to produce clean XHTML.
A few months back I started work on a ValidWiki parser, which has a much 
stronger concept of block and line elements, and uses both block and line 
stacks to open and close all elements correctly.
I think I'm about 2/3 of the way through the block parser, and hadn't yet 
written the line parser. I have no idea how the code would compare for 
efficiency.
Unfortunately the only language I know how to code in is MivaScript, so it 
would need porting. (Miva performs okay for your mid-level merchant 
application, but doesn't have the efficiency for something with the 
workload of Wikipedia.)
Uhm, my parser has block stack + line stack architecture too.
But the sources at http://meta.wikipedia.org/wiki/Wikipedia_lexer aren't
the most recent.
Newer sources attached.
It's not complete but it wasn't really meant to be.
It was meant to be a proof of concept that a mix of wiki markup and HTML can
be parsed in an XHTML-correct and DWIM way extremely efficiently.
Concept proven, but integrating the parser with the rest of Wikipedia would
take much more time than I'm willing to spend right now.
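
For readers who haven't seen the "block stack + line stack" idea, here is a
minimal sketch in Python. It is not taken from either parser discussed in this
thread: the function names (render, render_inline) and the tiny subset of
markup handled (* and # lists, '' and ''' emphasis, plain paragraphs) are
assumptions chosen only for illustration. The block stack tracks which lists
are open so nested lists close in the right order; the line stack does the
same for inline tags within one line.

```python
from html import escape

# Hypothetical mapping for this sketch: only bullet and numbered lists.
LIST_TAGS = {"*": "ul", "#": "ol"}

def render_inline(text):
    """Line stack: toggle ''' (bold) and '' (italic) with correct nesting."""
    out, stack = [], []  # stack holds open inline tags, innermost last
    i = 0
    while i < len(text):
        if text.startswith("'''", i):
            tag, width = "b", 3
        elif text.startswith("''", i):
            tag, width = "i", 2
        else:
            out.append(escape(text[i], quote=False))
            i += 1
            continue
        if tag in stack:
            # Close down to the matching tag, then reopen whatever was
            # interrupted so the output never has overlapping elements.
            interrupted = []
            while True:
                top = stack.pop()
                out.append(f"</{top}>")
                if top == tag:
                    break
                interrupted.append(top)
            for t in reversed(interrupted):
                out.append(f"<{t}>")
                stack.append(t)
        else:
            out.append(f"<{tag}>")
            stack.append(tag)
        i += width
    while stack:  # close anything left open at end of line
        out.append(f"</{stack.pop()}>")
    return "".join(out)

def render(wikitext):
    """Block stack: open and close nested lists so every <li> is balanced."""
    out, stack = [], []  # stack holds open list tags, outermost first
    for line in wikitext.splitlines():
        depth = len(line) - len(line.lstrip("*#"))
        wanted = [LIST_TAGS[c] for c in line[:depth]]
        body = render_inline(line[depth:].strip())
        # How many currently open lists does this line share?
        common = 0
        while (common < len(stack) and common < len(wanted)
               and stack[common] == wanted[common]):
            common += 1
        while len(stack) > common:  # close lists this line has left
            out.append(f"</li></{stack.pop()}>")
        if not wanted:  # plain text line (blank lines are skipped)
            if body:
                out.append(f"<p>{body}</p>")
            continue
        if len(wanted) == common:  # next item in the same list
            out.append("</li><li>")
        else:  # open the deeper lists inside the current item
            for tag in wanted[common:]:
                out.append(f"<{tag}><li>")
                stack.append(tag)
        out.append(body)
    while stack:  # end of input: close everything still open
        out.append(f"</li></{stack.pop()}>")
    return "".join(out)

print(render("* a\n** b\n* c\ntext"))
```

The point of keeping explicit stacks is that a nested list is always emitted
inside its parent <li>, and badly interleaved emphasis such as
''a '''b'' c''' is repaired rather than passed through as overlapping tags.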
