----- Original Message -----
From: "Mihály Héder" hedermisi@gmail.com
By following this list, I hope I have gathered how they plan to tackle this really hard problem:
- A functional decomposition of what the current parser does into a separate tokenizer, an AST (aka WOM, or now just DOM) builder, and a serializer. AST building might be further decomposed into the builder proper and error handling according to the HTML specs.
- In architectural terms, all of this will be a separate component, unlike the old PHP parser, which is really hard to extract from the rest of the code.
In this setup there is hope that the tokenizing task can be specified with a set of rules, thus effectively creating a wikitext tokenizing standard (already a great leap forward!). Then the really custom stuff (because wikitext still lacks a formal grammar) can be encapsulated in AST building.
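For readers who want the quoted decomposition made concrete, here is a minimal sketch of a tokenizer, tree builder, and serializer as three separate stages. It is a toy under stated assumptions, not code from MediaWiki or from any of the projects mentioned on this list: it handles only the ''' bold marker and plain text, and every name in it (TOKEN_RE, tokenize, Node, build_tree, serialize) is hypothetical.

```python
import html
import re
from dataclasses import dataclass, field

# Assumption: a toy rule set with only two token kinds,
# the ''' bold toggle and runs of plain text.
TOKEN_RE = re.compile(r"(?P<bold>''')|(?P<text>(?:(?!''').)+)", re.DOTALL)


def tokenize(source):
    """Stage 1: rule-based tokenizer, returns (kind, value) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]


@dataclass
class Node:
    kind: str                      # "root", "bold" or "text"
    text: str = ""
    children: list = field(default_factory=list)


def build_tree(tokens):
    """Stage 2: tree builder; the custom logic lives here."""
    root = Node("root")
    stack = [root]
    for kind, value in tokens:
        if kind == "text":
            stack[-1].children.append(Node("text", value))
        elif stack[-1].kind == "bold":
            stack.pop()            # a second ''' closes the open bold
        else:
            node = Node("bold")    # a first ''' opens a bold node
            stack[-1].children.append(node)
            stack.append(node)
    # Error handling: a bold node left open is implicitly closed
    # at end of input, so the tree is always well-formed.
    return root


def serialize(node):
    """Stage 3: serializer, turns the tree into HTML."""
    if node.kind == "text":
        return html.escape(node.text, quote=False)
    inner = "".join(serialize(child) for child in node.children)
    return f"<b>{inner}</b>" if node.kind == "bold" else inner


if __name__ == "__main__":
    print(serialize(build_tree(tokenize("plain '''bold''' plain"))))
```

Because each stage only consumes the previous stage's output, the tokenizer rules could in principle be specified on their own while the tree builder absorbs the irregular cases, which is the separation the quoted message describes.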
As I noted in a reply I wrote on this thread a few minutes ago (but it was kinda buried): there are between 4 and 7 projects, at varying stages of seriousness, that are already in progress, and some of them have posted to this list one or more times.
At least a couple of them had, as a serious goal, producing a formalized, architecturally cleaner parser that could be dropped into MediaWiki.
The framing of your reply suggests that you needed to know that and didn't.
Cheers, -- jra