Hi,
I'm going on holiday for the next week, so I will not be able to work on the lex/yacc parser that I have been writing over the past few weeks. I will check my work so far into CVS, and anyone interested can continue the work while I am away.
So far, the parser can do:
* paragraphs
* pre-lines (lines beginning with spaces)
* lists (* and # only)
* extensions (<math>, <hiero>)
* headings
* bold and italics
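To make the line-level constructs in this list concrete, here is a minimal sketch in C of how a line could be classified by its first character. It is not taken from wikiparse.y or wikilex.l: the enum, the function name classify_line, and the heading test (a line that starts and ends with '=', as in MediaWiki markup) are all assumptions made for illustration.

```c
#include <string.h>

/* Hypothetical line kinds; these names do not come from the real parser. */
typedef enum {
    LINE_PARAGRAPH,
    LINE_PRE,       /* line beginning with a space */
    LINE_BULLET,    /* list item introduced by '*' */
    LINE_NUMBERED,  /* list item introduced by '#' */
    LINE_HEADING    /* assumed form: == Title == */
} LineKind;

/* Classify one line of wikitext by looking at its first character. */
static LineKind classify_line(const char *line)
{
    size_t len = strlen(line);

    if (len == 0)
        return LINE_PARAGRAPH;

    switch (line[0]) {
    case ' ':
        return LINE_PRE;
    case '*':
        return LINE_BULLET;
    case '#':
        return LINE_NUMBERED;
    case '=':
        /* Assumption: a heading also ends with '='. */
        return (len >= 3 && line[len - 1] == '=') ? LINE_HEADING
                                                  : LINE_PARAGRAPH;
    default:
        return LINE_PARAGRAPH;
    }
}
```

A real lexer would also have to deal with nested list markers and heading levels; this only shows the first-character dispatch.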
I am sorry I took so long to do bold and italics, but, just as I originally anticipated, it was quite hard. I discarded two failed attempts before the third one finally worked. There is one special case in which I had to apply a bit of a hack, but I am sure that this is okay, given that it works pretty much perfectly now.
As for "extensions", the parser currently treats any HTML-style tag without attributes, together with its corresponding closing tag, as an extension. Using this mechanism, <nowiki> and <pre> can be considered "extensions" for the purposes of the parser.
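The extension rule described above can be sketched roughly as follows. This is a hand-written C illustration, not the actual flex rule; the helper name match_extension, the 64-byte limit on tag names, and the restriction to alphabetic, case-sensitive names are assumptions.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from wikilex.l.
 * If s starts with an attribute-less tag such as <math> and a matching
 * </math> follows, return the length of the whole span including both
 * tags; otherwise return 0. */
static size_t match_extension(const char *s)
{
    char closing[64];
    size_t n = 0;          /* length of the tag name */
    const char *end;

    if (s[0] != '<')
        return 0;

    /* The name must be letters only and be followed directly by '>',
     * so tags with attributes and closing tags are rejected here. */
    while (isalpha((unsigned char)s[1 + n]))
        n++;
    if (n == 0 || n + 4 > sizeof closing || s[1 + n] != '>')
        return 0;

    /* Build "</name>" and search for it after the opening tag. */
    closing[0] = '<';
    closing[1] = '/';
    memcpy(closing + 2, s + 1, n);
    closing[2 + n] = '>';
    closing[3 + n] = '\0';

    end = strstr(s + n + 2, closing);
    return end ? (size_t)(end - s) + n + 3 : 0;
}
```

Under this rule the whole of <b>''something''</b> matches as one span, so the '' inside it never reaches the italics handling, while a tag with attributes such as <div class="x"> does not match at all.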
What is missing:
* links, images, categories (everything in [[ ... ]])
* template inclusions and variables ({{...}} and {{{...}}})
* tables
* HTML tags that should be allowed but are not extensions (esp. div)
The lexer already recognises tokens for the first two, but not for tables or HTML tags. In particular, it will recognise something like <b>''something''</b> as an "extension" and will not parse the '' as italics. Obviously, this needs to be fixed.
If anything is unclear about how things work, please drop me an e-mail and I will document the relevant bits when I am back.
Timwi
On Fri, 20 Aug 2004 22:07:51 +0100 Timwi timwi@gmx.net wrote:
[...]
Hi
I can't compile your source files. I get:
wikiparse.y: conflicts: 119 shift/reduce, 13 reduce/reduce
Could you tell us which versions of flex and bison are required?
thx
Emmanuel
Emmanuel Engelhart wrote:
I can't compile your source files. I get:
wikiparse.y: conflicts: 119 shift/reduce, 13 reduce/reduce
It *does* work (you should now have a file called wikiparse.tab.c)
"Conflicts" means that the grammar isn't perfect yet.
Regards, Stephan
Stephan Walter wrote:
It *does* work (you should now have a file called wikiparse.tab.c) "Conflicts" means that the grammar isn't perfect yet.
While I won't claim my grammar is perfect, the conflicts don't mean there's anything wrong with it. The grammar is written in such a way that the default disambiguation rules do the right thing. There is no need to get rid of the conflicts. It would only make the grammar less legible.
Timwi
Emmanuel Engelhart wrote:
I can't compile your source files. I get:
wikiparse.y: conflicts: 119 shift/reduce, 13 reduce/reduce
Here's how you compile it:
bison -d wikiparse.y
flex wikilex.l
gcc wikiparse.tab.c parsetree.c lex.yy.c -o wikiparse
Regards, Stephan
wikitech-l@lists.wikimedia.org