Hello everybody. I haven't participated on this list for a while, but
my attention has been drawn to this thread.
With that said, I am interested in such a project if it involves coding.
To make this change into a valid Summer of Code project, I propose to
write a wiki parser, for which I have already designed some draft rules
in a yacc/bison manner.
I am interested in continuing to develop my parser (flexbisonparse) if
other people are interested in helping me. I am happy to explain what I
remember of how it works, because I know it's really hard to figure out,
but I'm sure it's not that hard to explain.
I am disappointed that people are *still* trying to re-start the effort
from scratch. Surely the plethora of existing parsers has shown that
every new effort will end up the same, especially if no effort is made
to understand the existing unfinished products and to recognise their
flaws and faults. You'll just make the same mistakes again and again.
As this is probably aimed at me :-) I tried to improve the
flexbisonparse software, but I don't have that much experience with the
matter, so I failed to understand the inner workings, especially of the
lexer, which frankly looks like crypto to me ;-)
My wiki2xml actually rebuilds the workings of a "real" parser, except it
is not generated by a compiler-compiler but manually. While this
potentially adds another error source, it is not that different from a
real parser IMHO.
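
To illustrate what "a parser written manually rather than generated" can look like, here is a minimal sketch in Python. It is not wiki2xml's code; the token set (only ''' for bold and '' for italic), the element names and the function names are all assumptions made up for this example.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical token set: ''' (bold), '' (italic), runs of plain text,
# and a lone apostrophe that is treated as ordinary text.
TOKEN = re.compile(r"'''|''|[^']+|'")

def tokenize(text):
    """Split wiki text into markup tokens and text runs (the 'lexer')."""
    return TOKEN.findall(text)

def parse(text):
    """Turn the token stream into XML, tracking open elements on a stack."""
    out = []
    open_tags = []  # elements that are currently open, innermost last
    for tok in tokenize(text):
        if tok in ("'''", "''"):
            tag = "bold" if tok == "'''" else "italic"
            if tag in open_tags:
                # Close everything up to and including the matching element,
                # so the output stays well-formed even for mis-nested input.
                while True:
                    top = open_tags.pop()
                    out.append("</%s>" % top)
                    if top == tag:
                        break
            else:
                open_tags.append(tag)
                out.append("<%s>" % tag)
        else:
            out.append(escape(tok))
    # Close whatever the author of the wiki text left open.
    while open_tags:
        out.append("</%s>" % open_tags.pop())
    return "<text>" + "".join(out) + "</text>"

print(parse("'''bold''' and ''italic''"))
```

The stack does the job that grammar rules and the generated parser tables would do in a bison-based parser; the error handling for unclosed markup is written by hand instead of being derived from the grammar.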
My mistake was to fail to recognise the importance of
HTML and HTML-like tags in the wiki mark-up. My parser can parse
everything non-HTML/SGML-based that was part of the syntax at the time I
wrote it. With co-operation, I'm sure we can do the rest. Without, I'm
sure no-one can.
I think you'll also find that "template hell" has gotten worse by the
day. While improving wiki2xml, I found lots of constructs that cannot be
resolved by putting the template name and parameters in a neat XML tag.
Live transclusion and changing of the very text that is parsed is IMHO
the only solution to generate valid XML while maintaining the intended
result. This might prove hard to do in bison, though.
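
A small Python sketch may make the template problem concrete. Everything in it is hypothetical: the two templates (one that only opens bold markup, one that only closes it), the simplified {{name|arg}} call syntax, and the toy bold-only converter. It only shows why expanding templates in the text before parsing gives a different result than parsing first.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical templates: neither one is balanced markup on its own.
TEMPLATES = {
    "hl-start": "'''",
    "hl-end": "'''",
    "greet": "Hello {{{1}}}",
}

# Simplified call syntax: {{name}} or {{name|arg1|arg2}}, no nesting in args.
CALL = re.compile(r"\{\{([^{}|]+)((?:\|[^{}|]*)*)\}\}")

def expand(text, templates, max_depth=10):
    """Textually replace template calls, repeating while the text changes."""
    def substitute(match):
        name = match.group(1).strip()
        args = match.group(2).split("|")[1:]
        body = templates.get(name)
        if body is None:
            return match.group(0)  # unknown template: leave the call as-is
        for i, arg in enumerate(args, start=1):
            body = body.replace("{{{%d}}}" % i, arg)
        return body

    for _ in range(max_depth):
        expanded = CALL.sub(substitute, text)
        if expanded == text:
            break
        text = expanded
    return text

def to_xml(wikitext):
    """Toy converter that understands bold markup only."""
    body = re.sub(r"'''(.*?)'''", r"<b>\1</b>", escape(wikitext))
    return "<text>" + body + "</text>"

source = "{{hl-start}}important{{hl-end}}"

# Parsing first: the bold markup is hidden inside the templates and is lost.
print(to_xml(source))

# Expanding first: the parser sees the final text and emits balanced XML.
xml = to_xml(expand(source, TEMPLATES))
ET.fromstring(xml)  # raises ParseError if the result were not well-formed
print(xml)
```

In this sketch the expansion is a plain text rewrite that runs before any tokenizing, which is the part that does not map naturally onto a single bison grammar.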