[Wikitech-l] lex/yacc parser

21 Aug 2004


Hi,
I'm going on holiday for the next week, so I will not be able to work on 
the lex/yacc parser that I have been writing over the past few weeks.  I 
will check my work so far into CVS, and anyone interested can continue 
the work while I am away.
So far, the parser can do:
* paragraphs
* pre-lines (lines beginning with spaces)
* lists (* and # only)
* extensions (<math>, <hiero>)
* headings
* bold and italics
I am sorry I took so long to do bold and italics, but, just as I 
originally anticipated, it was quite hard.  I discarded two failed 
attempts before the third one finally worked out.  There is one special 
case in which I had to apply a bit of a hack, but I am sure that this is 
okay, given that it works pretty much perfectly now.
As for "extensions", the parser currently treats any HTML tag without 
attributes, together with its corresponding closing tag, as an 
extension.  Under this mechanism, <nowiki> and <pre> can be considered 
"extensions" for the purposes of the parser.
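
As a rough illustration of the rule described above (not the actual 
lex/yacc code), the following Python sketch finds every attribute-free 
tag that has a matching closing tag.  The regular expression and the 
name find_extensions are assumptions made up for this example; the real 
lexer may tokenise this quite differently, and the sketch does not 
handle a tag nested inside another tag of the same name.

```python
import re

# Assumed shape of an "extension": <name>, any body, </name>.
# The opening tag must have no attributes, so "<div class=...>" is skipped.
EXTENSION_RE = re.compile(r"<([A-Za-z][A-Za-z0-9]*)>(.*?)</\1>", re.DOTALL)


def find_extensions(text):
    """Return (tag_name, raw_body) for each attribute-free tag pair in text."""
    return [(m.group(1), m.group(2)) for m in EXTENSION_RE.finditer(text)]


if __name__ == "__main__":
    sample = "a <math>x^2</math> b <nowiki>''raw''</nowiki>"
    for name, body in find_extensions(sample):
        print(name, repr(body))
```

Under this rule the body of the tag pair is kept as raw text, which is 
why <nowiki> and <pre> fit the same mechanism.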
What is missing:
* links, images, categories (everything in [[ ... ]])
* template inclusions and variables ({{...}} and {{{...}}})
* tables
* HTML tags that should be allowed but are not extensions (esp. div)
The lexer already recognises tokens for the first two, but not for 
tables or HTML tags.  In particular, it will recognise something like 
<b>''something''</b> as an "extension" and will not parse the '' as 
italics.  Obviously, this needs to be fixed.
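
To make the <b> problem concrete, here is a small Python sketch of one 
possible direction: telling opaque extensions apart from ordinary HTML 
tags by name, so that the body of an ordinary tag can still be handed 
back to the wiki-markup parser.  This is only an assumption about how a 
fix might look, not what the parser does today.  The set 
KNOWN_EXTENSIONS and the function classify are hypothetical names, and 
the sketch only looks at a string that consists of a single tag pair.

```python
import re

TAG_PAIR_RE = re.compile(r"<([A-Za-z][A-Za-z0-9]*)>(.*?)</\1>", re.DOTALL)

# Hypothetical allowlist: tags whose body must be left unparsed.
KNOWN_EXTENSIONS = {"math", "hiero", "nowiki", "pre"}


def classify(text):
    """Classify a string as ("extension" | "html" | "wikitext", name, body)."""
    m = TAG_PAIR_RE.fullmatch(text)
    if m is None:
        return ("wikitext", None, text)
    name, body = m.group(1).lower(), m.group(2)
    if name in KNOWN_EXTENSIONS:
        # Body is opaque: pass it through untouched.
        return ("extension", name, body)
    # Ordinary HTML tag: body should still be parsed as wiki markup.
    return ("html", name, body)


if __name__ == "__main__":
    print(classify("<b>''something''</b>"))
    print(classify("<math>x^2</math>"))
```

With a distinction like this, the '' inside <b>...</b> would reach the 
italics handling instead of being swallowed as extension content.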
If anything is unclear about how things work, please drop me an e-mail 
and I will document the relevant bits when I am back.
Timwi
