Timwi wrote:
Hello everybody. I haven't participated on this list for a while, but my attention has been drawn to this thread.
With that said, I am interested in such a project if it involves coding. To make this change into a valid Summer of Code project, I propose to write a wiki parser, for which I have already drafted some rules in a yacc/bison manner.
I am interested in continuing to develop my parser (flexbisonparse) if other people are interested in helping me. I am happy to explain what I remember of how it works, because I know it's really hard to figure out, but I'm sure it's not that hard to explain.
I am disappointed that people are *still* trying to re-start the effort from scratch. Surely the plethora of existing parsers has shown that every new effort will end up the same, especially if no effort is made to understand the existing unfinished products and to recognise their flaws and faults. You'll just make the same mistakes again and again.
As this is probably aimed at me :-) I tried to improve the flexbisonparse software, but I don't have that much experience with the matter, so I failed to understand the inner workings, especially of the lexer, which frankly looks like crypto to me ;-)
My wiki2xml actually rebuilds the workings of a "real" parser, except that it is written by hand rather than generated by a compiler-compiler. While this potentially adds another source of errors, it is not that different from a real parser IMHO.
My mistake was to fail to recognise the importance and complexity of HTML and HTML-like tags in the wiki mark-up. My parser can parse everything non-HTML/SGML-based that was part of the syntax at the time I wrote it. With co-operation, I'm sure we can do the rest. Without, I'm sure no-one can.
I think you'll also find that "template hell" has gotten worse since then. While improving wiki2xml, I found lots of constructs that cannot be resolved by putting the template name and parameters in a neat XML tag. Live transclusion, that is, changing the very text that is being parsed, is IMHO the only way to generate valid XML while maintaining the intended result. This might prove hard to do in bison, though.
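To illustrate the kind of construct meant here, below is a minimal Python sketch. It is not code from wiki2xml or flexbisonparse; the template store, the template names and the functions wrap_as_tag and transclude are all hypothetical, and the regular expressions cover only the simplest unnested calls. The template "tablestart" opens a wiki table that the page text itself closes, so turning the call into an XML tag leaves orphaned table rows, whereas substituting the template text before parsing gives the parser a complete table.

```python
import re

# Hypothetical template store; names and contents are invented for illustration.
TEMPLATES = {
    "tablestart": '{| class="wikitable"',
    "greet": "Hello, {{{1}}}!",
}

# Matches only simple, unnested calls such as {{name}} or {{name|arg1|arg2}}.
TEMPLATE_CALL = re.compile(r"\{\{([^{}|]+)((?:\|[^{}|]*)*)\}\}")
# Matches positional parameters such as {{{1}}} inside a template body.
PARAM = re.compile(r"\{\{\{(\d+)\}\}\}")


def wrap_as_tag(text):
    """Naive approach: replace each template call with an XML tag."""
    return TEMPLATE_CALL.sub(
        lambda m: '<template name="%s"/>' % m.group(1).strip(), text
    )


def transclude(text, depth=0):
    """Substitute template bodies into the text before any parsing happens."""
    if depth > 10:  # crude guard against templates that include themselves
        return text

    def expand(match):
        name = match.group(1).strip()
        args = match.group(2).split("|")[1:]
        body = TEMPLATES.get(name)
        if body is None:
            return match.group(0)  # unknown template: leave the call untouched

        def fill(param):
            index = int(param.group(1)) - 1
            return args[index] if 0 <= index < len(args) else ""

        return PARAM.sub(fill, body)

    expanded = TEMPLATE_CALL.sub(expand, text)
    return expanded if expanded == text else transclude(expanded, depth + 1)


page = "{{tablestart}}\n| cell\n|}"
print(wrap_as_tag(page))  # the "| cell" and "|}" lines no longer belong to any table
print(transclude(page))   # the table opener is now part of the text to be parsed
```

A grammar-driven parser would need this substitution to run before, or interleaved with, the lexer, which is presumably where the difficulty with bison comes in.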
Magnus