Tim Starling wrote:
If the only thing missing from JAMWiki was ParserFunctions, that would be very impressive. ParserFunctions is simple. And indeed, there's a lot of really impressive code in there, although it's easy to find edge cases that don't work the same way.
True; it was just one of the first things we ran into with basic rendering of Wikipedia pages.
For str_repeat("[http://a] ", 1000), it took so long that I gave up waiting. MediaWiki does either of these things in linear time, on the order of hundreds of microseconds per loop.
[...]
It's unfortunate that a modern parser generator for a supposedly fast language like Java can't match hand-optimised PHP for speed. It's not like we've set a high bar here.
I'm not sure about not having set a high bar... However, we can confirm the parser generator vs. hand-optimized parser issue. You just showed that JFlex, the parser generator used by JAMWiki, doesn't scale up nicely. We found the same for ANTLR, another parser generator for Java: it also doesn't perform as well as MediaWiki when run against stripped-down pages (our parser parses Wiki Creole, which at a stripped-down level is equivalent to MediaWiki syntax) [1]. MediaWiki performed equally well or better. In general, I think the advantage of a parser generator is easier maintainability and clarity of the language (you can view the grammar as a domain-specific language for describing acceptable syntax), but not performance :-(
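For anyone who wants to repeat this kind of scaling check, here is a minimal sketch in Python (not PHP or Java). It builds the same repeated input as str_repeat("[http://a] ", N) and reports the time per repeat for growing N. The names count_external_links and time_per_repeat are hypothetical, and the scanner is only a stand-in that makes a single left-to-right pass; it is not MediaWiki's, JAMWiki's, or our parser. To test a real parser, swap its entry point in for the stand-in.

```python
import time


def count_external_links(text):
    """Hypothetical stand-in scanner: counts "[http://...]" links in one pass.

    Every character is examined a bounded number of times, so the running
    time grows linearly with the length of the input.
    """
    count = 0
    i = 0
    n = len(text)
    while i < n:
        if text.startswith("[http://", i):
            close = text.find("]", i)
            if close == -1:
                break  # no closing bracket anywhere ahead, so no more links
            count += 1
            i = close + 1
        else:
            i += 1
    return count


def time_per_repeat(unit, repeats):
    """Return the seconds spent per repeated unit when scanning unit * repeats."""
    text = unit * repeats  # Python equivalent of PHP's str_repeat
    start = time.perf_counter()
    count_external_links(text)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    # For a linear-time scanner the per-repeat time should stay roughly flat
    # as the input doubles; steady growth points to superlinear behaviour.
    for repeats in (1000, 2000, 4000, 8000):
        print(repeats, time_per_repeat("[http://a] ", repeats))
```

Reporting time per repeat rather than total time makes the scaling easy to read off: a flat column means linear behaviour, a growing one does not.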
Thanks for your insights!
Dirk
[1] http://www.riehle.org/2008/07/19/a-grammar-for-standardized-wiki-markup/