On Friday 13 August 2004 20:59, Brion Vibber wrote:
Magnus Manske wrote:
I therefore suggest a new structure:
- Preprocessor
- Wiki markup to XML
- XML to (X)HTML
This doesn't actually solve any of the issues with the current parser, since it merely has it produce a different output format.
The main problem is that we have a mess of regexps that stomp on each other all the time.
Are you kidding? That is exactly what it would solve! If you let the preprocessor be generated by a lex/yacc-type tool, then for the first time you would have decent formal documentation of the wiki syntax in the form of a context-free grammar. Not only would that give you a better idea of what the wiki syntax exactly is and tell you exactly whether any new mark-up interferes with old mark-up, but you could also more easily add context-sensitive rules (such as replacing two dashes with an em dash, but only in normal text). Moreover, it would give you the power to make small changes to the mark-up language, because you could easily generate a parser that translates all old texts into the new mark-up. Finally, having an explicit grammar also makes it easier to ensure that you actually generate well-formed and valid XHTML, or anything else you would like to generate from it that needs to satisfy a certain syntax.
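To make that concrete, here is a minimal sketch in Python, using the PLY (Python Lex-Yacc) toolkit as a stand-in for a lex/yacc-type generator; the toy grammar, token names and the tiny mark-up fragment it covers are invented purely for illustration and are not MediaWiki code. PLY builds an LALR(1) parser from productions written as docstrings, and it is exactly that generated-from-a-grammar property that gives you conflict detection and a formal record of the syntax:

import ply.lex as lex
import ply.yacc as yacc

# ---- lexer: tokens for a toy fragment of wiki mark-up ----
tokens = ('TEXT', 'QUOTE3')

t_QUOTE3 = r"'''"       # bold delimiter
t_TEXT = r"[^']+"       # any run of characters without quotes

def t_error(t):
    t.lexer.skip(1)

lexer = lex.lex()

# ---- grammar: each docstring is one production of the CFG ----
def p_document(p):
    "document : elements"
    p[0] = "<p>%s</p>" % p[1]

def p_elements_many(p):
    "elements : elements element"
    p[0] = p[1] + p[2]

def p_elements_one(p):
    "elements : element"
    p[0] = p[1]

def p_element_text(p):
    "element : TEXT"
    # context-sensitive rule: two dashes become an em dash,
    # but only here, where the grammar says we are in plain text
    p[0] = p[1].replace('--', '&mdash;')

def p_element_bold(p):
    "element : QUOTE3 TEXT QUOTE3"
    p[0] = "<b>%s</b>" % p[2]

def p_error(p):
    raise SyntaxError("parse error at %r" % (p,))

parser = yacc.yacc(write_tables=False, debug=False)

print(parser.parse("plain -- text with '''bold''' words", lexer=lexer))
# -> <p>plain &mdash; text with <b>bold</b> words</p>

If two productions conflict, the generator tells you so at build time, which is precisely the "does new mark-up interfere with old mark-up" check that the current pile of regexps cannot give you.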
It's simply a brilliant idea, and frankly I think it is, in the long run, as unavoidable as the step to a database back-end. If there is a performance problem, you could even consider storing the XML in the database, so that you only need to do the raw parse at write time and the XML parse at read time.
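Roughly, that write/read split would look like this (a self-contained toy in Python; the in-memory dict stands in for the database, and the two conversion functions are trivial placeholders, not real parsers):

import html

DB = {}   # title -> {'wikitext': ..., 'xml': ...}; stands in for the database

def wikitext_to_xml(text):
    # placeholder for the expensive grammar-driven raw parse
    return "<page><text>%s</text></page>" % html.escape(text)

def xml_to_xhtml(xml_doc):
    # placeholder for the cheap XML -> XHTML serialization
    return (xml_doc.replace("<page>", "<div class=\"page\">")
                   .replace("</page>", "</div>")
                   .replace("<text>", "<p>")
                   .replace("</text>", "</p>"))

def save_page(title, wikitext):
    # write time: parse once, store both the source and the XML
    DB[title] = {'wikitext': wikitext, 'xml': wikitext_to_xml(wikitext)}

def render_page(title):
    # read time: only the XML -> XHTML step runs on every view
    return xml_to_xhtml(DB[title]['xml'])

save_page("Sandbox", "Hello & welcome")
print(render_page("Sandbox"))   # <div class="page"><p>Hello &amp; welcome</p></div>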
The hard part, of course, is coming up with the context-free grammar (which should probably be LALR(1) at that). Since I used to teach compiler theory, I might be of some help there.
-- Jan Hidders
PS. You could even get rid of the OCaml code, since the LaTeX parsing could be integrated into the general parser.