Well, whatever the problem is, I suspect I know one way that would fix it: rewriting the parser in C(++). Unfortunately, that's a whole lot easier said than done. Even rewriting part of it, though (say, replaceVariables), might be a big benefit.
Working out what the parser is actually meant to do would be required first, though. At the moment it does what it does, and that's the best anyone can say. Trying to translate that idiosyncratic behaviour into a new language would be a nightmare.
For now it might be best to refine our heuristics for what's slow to render. Currently we use a simple text-length heuristic, but it might make more sense to incorporate additional criteria. Maximum number of template inclusions? Maximum template depth? It would take testing to see which of these would be effective.
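As a rough illustration of combining criteria, here's a minimal Python sketch. The names (PageStats, Limits, slow_reasons) and the threshold numbers are made up for the example; choosing real limits is exactly what the testing would be for. Returning which criteria tripped, rather than a bare yes/no, might make it easier to see which ones actually predict slow pages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageStats:
    length: int       # size of the wikitext, in characters
    inclusions: int   # total template inclusions after expansion
    depth: int        # deepest template nesting level


@dataclass(frozen=True)
class Limits:
    # Hypothetical thresholds, for illustration only; not real MediaWiki values.
    max_length: int = 100_000
    max_inclusions: int = 500
    max_depth: int = 20


def slow_reasons(stats: PageStats, limits: Limits = Limits()) -> list[str]:
    """Return the criteria a page exceeds; an empty list means 'probably fast'."""
    reasons = []
    if stats.length > limits.max_length:
        reasons.append("length")
    if stats.inclusions > limits.max_inclusions:
        reasons.append("inclusions")
    if stats.depth > limits.max_depth:
        reasons.append("depth")
    return reasons
```

A short page can still be flagged: slow_reasons(PageStats(5_000, 40, 25)) reports only "depth", which a length-only check would have missed.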
I suspect depth would be the best one to try. People can tell from an article's source how many templates there are, and can keep that under control. Telling how deep templates go is often impossible for anyone who isn't an expert on MediaWiki template syntax, so they could easily end up with hundreds of templates being processed without noticing.
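To show why depth and total inclusions can't be read off the article source, here's a rough Python sketch that follows template calls recursively. It assumes a hypothetical `templates` dict mapping template names to their wikitext, and it uses a crude regex instead of the real preprocessor: parser functions, magic words and argument substitution are all ignored, so it models the idea rather than MediaWiki's actual behaviour.

```python
import re

# Matches the name in a template call such as {{Name}} or {{Name|...}}.
# The lookbehind/lookahead skip triple-brace parameter references like {{{1}}}.
TEMPLATE_CALL = re.compile(r"(?<!\{)\{\{(?!\{)\s*([^{}|]+?)\s*(?=[|}])")


def template_calls(wikitext):
    """Return the template names called directly in a piece of wikitext.

    Parser functions such as {{#if:...}} are skipped (a simplification).
    """
    return [name for name in TEMPLATE_CALL.findall(wikitext)
            if not name.startswith("#")]


def expansion_stats(wikitext, templates, _stack=frozenset()):
    """Return (depth, total_inclusions) for a piece of wikitext.

    `templates` is a hypothetical name -> source mapping. Unknown templates
    are treated as leaves, and a template already being expanded higher up
    the call chain (a loop) is counted but not expanded again.
    """
    depth = 0
    total = 0
    for name in template_calls(wikitext):
        total += 1
        if name in _stack:
            sub_depth, sub_total = 0, 0
        else:
            sub_depth, sub_total = expansion_stats(
                templates.get(name, ""), templates, _stack | {name})
        depth = max(depth, 1 + sub_depth)
        total += sub_total
    return depth, total
```

For example, with templates = {"A": "{{B}}{{B}}{{B}}", "B": "{{C}}{{C}}{{C}}", "C": "text"}, an article whose source visibly contains just two calls, "Intro {{A}} {{A}}", comes out at depth 3 with 26 inclusions processed.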