On Thu, Sep 23, 2010 at 8:47 AM, Andreas Jonsson <andreas.jonsson@kreablo.se> wrote:
> . . . You can come up with thousands of situations like this, and without a consistent plan for how to handle them, you will need to add thousands of corner cases to the code to handle them all.
> I have avoided this by simply disabling all HTML block tokens inside wikitext list items. Of course, it may be that someone is actually relying on being able to mix markup in this way, but it doesn't seem likely, as the result tends to be strange.
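For concreteness, I take that to mean input along these lines (a made-up snippet, not one of your test cases):

    * first item
    * second item <div>some block content that
      runs on to the next line</div>
    * third item

As far as I can tell, the existing PHP parser lets the <div> open inside a list item that is implicitly closed at the end of the line, so the nesting of the output tends to come out strange, which is presumably why refusing to recognize HTML block tokens there loses little in practice.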
The way the parser is used in real life is that people just write random stuff until it looks right. They wind up hitting all sorts of bizarre edge cases, and these are propagated to thousands of pages by templates. A pure-PHP parser is needed for end users who can't install binaries, and any replacement parser must be compatible with it in practice, not just on the cases where the pure-PHP parser behaves sanely. In principle, we might be able to change parser behavior in lots of edge cases and let users fix the broken stuff, if the benefit is large enough. But we'd have to have a pure-PHP parser that implements the new behavior too.
The parts you considered to be the hard parts are not that hard; we've had lots of parser projects, and I'm sure some of them have handled those. The hard part is coming up with a practical way to integrate the new, simplified parser into MediaWiki so that it can actually be used, at some point, on sites like Wikipedia. Do you have any plans for how to do that?