Ivan Krstic wrote:
Pedro de Medeiros wrote:
To make this change into a valid Summer of Code project, I propose to
do a wiki parser, for which I have already designed some draft rules
in a yacc/bison manner.
Have you looked at the existing parser attempts in SVN (I don't remember
if they're all still there)? Getting the first 90% of a real parser for
MediaWiki syntax will take a small fraction of the time required to get
a full parser. This makes it easy to create another
almost-but-not-quite-finished parser by the end of the summer, and we'd
be no better off for it.
I strongly recommend investigating the existing parser attempts, and
finishing one of them.
SVN module "wiki2xml", directory "php". Includes a mostly-working,
reasonably fast converter (almost-parser) to XML, and several subsequent
converters to XHTML, DocBook, OpenDocument, plain text. Includes a
script to convert a Wikipedia dump to lots'o'text files, which can then
be browsed offline based on the wiki-to-XML-to-(X)HTML converters. I'm
currently working on plugging in the Lucene engine to add offline