On Mon, 13 Oct 2003 17:21:15 -0400, David Friedland <david(a)nohat.net> gave
utterance to the following:
There seems to be a lot of disjoint discussion on Meta
about this. Viz:
* There is work that has been done by Taw on an OCaml lexer at
<http://meta.wikipedia.org/wiki/Wikipedia_lexer>
* There are some links at
<http://meta.wikipedia.org/wiki/Wikitext_syntax>
* A proposal for a radically different Wiki text language at
<http://meta.wikipedia.org/wiki/Wikitax>
* A brief take at
<http://meta.wikipedia.org/wiki/Wiki_markup_syntax>
* A nearly content-free page at
<http://meta.wikipedia.org/wiki/Wiki_syntax>
* A draft XML syntax of Wikitext at
<http://meta.wikipedia.org/wiki/Wikipedia_DTD>
Clearly there needs to be some kind of centralized place for work on
formalizing the language. I would suggest the recently-created
<http://meta.wikipedia.org/wiki/Wikitext_standard>
Right now what we should work on is, as Ed says, describing and
formalizing a 1.0 version of the Wikitext language, based on what is used
currently. In other words, this work should not (for right now) involve
incorporating improvements or changes to the Wikitext language.
Moving on...
First, a couple of issues of nomenclature that we should probably get out
of the way:
(1) We need to decide on a name for the wiki markup language or Wiki
text. I would advocate calling the language "Wikitext" (and calling it
"The Wikitext language" when usage might be ambiguous, like "C" or
"The C Language"). This seems to be common usage.
My suggestions would be "the broken wikitext language", or the "invalid
wikitext language".
Because of its UseMod ancestry, the current parser produces some very bad
HTML code*, and in particular handles lists and nesting of blocks really
badly.
* not so bad if HTML 3.2 or 4 is our target, but it would be nice to be
able to produce clean XHTML.
A few months back I started work on a ValidWiki parser, which has a much
stronger concept of block and line elements, and uses both block and line
stacks to open and close all elements correctly.
I think I'm about 2/3 of the way through the block parser, and haven't yet
written the line parser. I have no idea how the code would compare for
efficiency.
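To illustrate the block-stack idea described above, here is a minimal sketch
in Python. It is not the ValidWiki code (which is in MivaScript and is not
shown in this message); the names render_blocks and LIST_TAGS are
hypothetical, and it assumes only the usual "*" and "#" list markers plus
plain paragraphs. Inline (line-level) markup is left out entirely.

```python
from html import escape

# Hypothetical mapping from wikitext list markers to XHTML list elements.
LIST_TAGS = {"*": "ul", "#": "ol"}


def render_blocks(lines):
    """Render paragraphs and nested lists, closing every element it opens."""
    out = []
    stack = []          # open list elements, outermost first; each has an open <li>
    para_open = False

    def close_lists(depth):
        # Pop the block stack down to `depth`, closing each list and its item.
        while len(stack) > depth:
            out.append("</li></%s>" % stack.pop())

    for raw in lines:
        line = raw.strip()
        n = 0
        while n < len(line) and line[n] in LIST_TAGS:
            n += 1
        wanted = [LIST_TAGS[c] for c in line[:n]]   # list nesting this line asks for
        text = escape(line[n:].strip())

        if wanted:
            if para_open:
                out.append("</p>")
                para_open = False
            # Keep the outer lists that are already open and still match.
            common = 0
            while (common < len(stack) and common < len(wanted)
                   and stack[common] == wanted[common]):
                common += 1
            close_lists(common)
            if common == len(wanted):
                # Same list as before: end the previous item, start a new one.
                out.append("</li><li>")
            else:
                # Open the missing inner lists inside the currently open item.
                for tag in wanted[common:]:
                    out.append("<%s><li>" % tag)
                    stack.append(tag)
            out.append(text)
        else:
            close_lists(0)
            if not text:
                # A blank line ends the current paragraph.
                if para_open:
                    out.append("</p>")
                    para_open = False
            elif para_open:
                out.append(" " + text)
            else:
                out.append("<p>" + text)
                para_open = True

    # End of input: unwind whatever is still open.
    close_lists(0)
    if para_open:
        out.append("</p>")
    return "".join(out)
```

With this sketch, render_blocks(["* a", "** b", "* c"]) returns
<ul><li>a<ul><li>b</li></ul></li><li>c</li></ul>, so the nested list sits
inside its parent item and every element is closed in reverse order of
opening. A line stack for inline elements would work the same way within
each line.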
Unfortunately the only language I know how to code in is MivaScript, so it
would need porting. (Miva performs okay for your mid-level merchant
application, but doesn't have the efficiency for something with the
workload of Wikipedia.)
--
Richard Grevers
Between two evils always pick the one you haven't tried