Stefan Groschupf wrote:
Before I heard of Axel's parser, I spent some hours writing a parser based on Apache NekoHTML: http://www.apache.org/~andyc/neko/doc/html/
The advantage is that NekoHTML can be extended with filters, uses the Xerces Native Interface (XNI) framework, and already parses any HTML snippets that may occur in the text. Note that the output of the NekoHTML parser extended with a Wikipedia filter set is not HTML but an in-memory DOM object that can be transformed to XHTML.
I have more or less written the basic parts and defined an interface that needs to be implemented to handle the content of a 'tagged' text snippet. However, I now need to implement as many classes as there are tags in the Wikipedia syntax. :-o
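To make the "one class per tag" idea concrete, here is a minimal sketch of what such a handler interface might look like. It is not the actual code discussed in this thread: the names TagHandler, BoldHandler, ItalicHandler and convert are hypothetical, and the sketch uses only the JDK's built-in DOM API rather than NekoHTML or XNI filters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class WikiTagSketch {

    /** Hypothetical interface: one implementation per wiki markup tag. */
    interface TagHandler {
        /** Builds the DOM element for the text enclosed by the tag. */
        Element handle(Document doc, String content);
    }

    /** Handles '''bold''' markup (assumed mapping to a b element). */
    static class BoldHandler implements TagHandler {
        public Element handle(Document doc, String content) {
            Element b = doc.createElement("b");
            b.setTextContent(content);
            return b;
        }
    }

    /** Handles ''italic'' markup (assumed mapping to an i element). */
    static class ItalicHandler implements TagHandler {
        public Element handle(Document doc, String content) {
            Element i = doc.createElement("i");
            i.setTextContent(content);
            return i;
        }
    }

    // Registry keyed by delimiter. Insertion order matters:
    // the longer delimiter ''' must be tried before ''.
    static final Map<String, TagHandler> HANDLERS = new LinkedHashMap<>();
    static {
        HANDLERS.put("'''", new BoldHandler());
        HANDLERS.put("''", new ItalicHandler());
    }

    /** Converts a single tagged snippet into a DOM element. */
    static Element convert(Document doc, String snippet) {
        for (Map.Entry<String, TagHandler> entry : HANDLERS.entrySet()) {
            String d = entry.getKey();
            if (snippet.length() >= 2 * d.length()
                    && snippet.startsWith(d) && snippet.endsWith(d)) {
                String content =
                    snippet.substring(d.length(), snippet.length() - d.length());
                return entry.getValue().handle(doc, content);
            }
        }
        // No handler matched: wrap the raw text in a span.
        Element span = doc.createElement("span");
        span.setTextContent(snippet);
        return span;
    }

    /** Creates an empty in-memory DOM document. */
    static Document newDocument() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();
        Element e = convert(doc, "'''bold text'''");
        System.out.println(e.getTagName() + ": " + e.getTextContent()); // prints "b: bold text"
    }
}
```

With a layout like this, supporting another piece of wiki syntax would mean adding one more handler class and one registry entry, which is why the number of classes grows with the number of tags.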
For me, two possibilities would be interesting: either refactor and separate the parser from Axel, or I contribute my code to CVS somewhere and we glue things together.
Sounds interesting. Could you please send me the current sources so that I can run some tests with the parser inside the Eclipse plugin?
If you like I can also give you CVS access.