Hi, thanks to everybody for the responses. Some comments.
+ Using C in Java makes no sense for my project, and I personally have had a lot of bad experiences with JNI.
+ JavaCC would be the best choice from a performance point of view; however, it would be very hard to write a JavaCC parser for the Wikipedia syntax, since the syntax is not very strict.
+ Axel's parser looks interesting since - first of all, it already exists :-) and the concept of using the Radeox Wiki Engine makes sense. However, as Axel already mentioned, the code is not very 'contribution' friendly and it is not shipped as its own 'jar'.
Before I heard of Axel's parser I spent some hours writing a parser based on Apache NekoHTML: http://www.apache.org/~andyc/neko/doc/html/
The advantage is that NekoHTML can be extended with filters, uses the Xerces Native Interface (XNI) framework, and will already parse any HTML snippets that appear in the text. Note: the result of the NekoHTML parser extended by a Wikipedia filter set is not HTML but an in-memory DOM object, which can then be transformed to XHTML.
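The last step of that pipeline (in-memory DOM to XHTML string) needs no NekoHTML-specific code at all; a minimal sketch using only the JDK's built-in identity transformer, with a hand-built DOM standing in for the parser output (class and method names are mine, not from my actual code):

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToXhtml {

    // Serialize an in-memory DOM (as a parser like NekoHTML would produce)
    // to an XHTML string using the JDK's identity transformer.
    static String serialize(Document doc) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny DOM by hand to stand in for real parser output.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element p = doc.createElement("p");
        Element b = doc.createElement("b");
        b.setTextContent("bold");
        p.appendChild(b);
        doc.appendChild(p);
        System.out.println(serialize(doc)); // <p><b>bold</b></p>
    }
}
```

The point is just that once the filter set has produced a DOM, the XHTML output step is trivial and standard.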
I have more or less written the basic stuff and defined an interface that needs to be implemented to handle the content of a 'tagged' text snippet. The downside is that I now need to implement as many classes as there are tags in the Wikipedia syntax. :-o
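To make the "one class per tag" idea concrete, here is a hypothetical sketch of such an interface plus a registry that dispatches by tag name. All names here are made up for illustration; my actual interface differs:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-tag handler: one small implementation per wiki
// construct, looked up in a registry by tag name.
interface TagHandler {
    String handle(String content);
}

public class WikiTagRegistry {
    private final Map<String, TagHandler> handlers = new HashMap<>();

    public WikiTagRegistry() {
        // One implementation per wiki tag; only two shown here.
        handlers.put("bold", content -> "<b>" + content + "</b>");
        handlers.put("italic", content -> "<i>" + content + "</i>");
    }

    public String render(String tag, String content) {
        TagHandler h = handlers.get(tag);
        // Unknown tags pass their content through unchanged.
        return h != null ? h.handle(content) : content;
    }

    public static void main(String[] args) {
        WikiTagRegistry r = new WikiTagRegistry();
        System.out.println(r.render("bold", "hello"));   // <b>hello</b>
        System.out.println(r.render("nowiki", "plain")); // plain
    }
}
```

The registry keeps the parser core ignorant of individual tags, so contributing a new tag means adding one class and one `put` call.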
For me, two possibilities would be interesting: refactor and separate the parser from Axel's code, or I can contribute my code to a CVS somewhere and we glue things together.
Anyway, I am personally looking for a quick solution. ;-)
Stefan
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net