The trouble there is that <ref>, for example, can contain wikitext, which needs to be parsed. E.g.:
<ref>''The origin of species'', Darwin</ref>
So at a minimum I think we would need to distinguish those extensions whose internal text needs to be parsed?
No. If a tag-style extension wants to support wikitext, it has to explicitly invoke a new parser pass on the text contained between the tags. The text MUST NOT be parsed or transformed before being passed to the extension, and what the extension returns must not be parsed either (the latter is only partially true for the current parser, but I would call that a bug, not a feature; see bug 8997).
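To illustrate that contract, here is a minimal Python sketch. It is not MediaWiki's actual API: parse_wikitext, render, HANDLERS and both handlers are hypothetical names made up for this example, and the toy "parser" only understands ''italics''.

```python
import html
import re

def parse_wikitext(text):
    """Toy stand-in for a real parser pass: only handles ''italic''."""
    return re.sub(r"''(.+?)''", r"<i>\1</i>", text)

def ref_handler(raw):
    # Hypothetical extension that wants wikitext support: it receives the
    # raw text and explicitly invokes a new parser pass on it.
    return "<sup>" + parse_wikitext(raw) + "</sup>"

def verbatim_handler(raw):
    # Hypothetical extension that does not want wikitext: it only escapes.
    return html.escape(raw, quote=False)

HANDLERS = {"ref": ref_handler, "verbatim": verbatim_handler}
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.S)

def render(source):
    out, pos = [], 0
    for m in TAG_RE.finditer(source):
        handler = HANDLERS.get(m.group(1))
        if handler is None:
            continue  # unknown tag: leave it in the ordinary text
        # Text outside extension tags gets the normal parser pass.
        out.append(parse_wikitext(source[pos:m.start()]))
        # Raw text goes in untouched; the result is final and not re-parsed.
        out.append(handler(m.group(2)))
        pos = m.end()
    out.append(parse_wikitext(source[pos:]))
    return "".join(out)

result = render(
    "See <ref>''The origin of species'', Darwin</ref>"
    " and <verbatim>''x''</verbatim>."
)
print(result)
```

The italics inside <ref> are converted only because ref_handler asked for a parser pass; the ''x'' inside the made-up <verbatim> tag comes out exactly as written.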
A <ref> essentially changes the output destination of the parser.
If you're building an XHTML DOM document, the ref handler just needs to switch the output destination to an <li> of a references list and let the parser continue. </ref> resets it back to wherever it was.
And when the parser sees a <references/> tag, the list is inserted into the main document.
That's how I've implemented it anyway.
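As a rough illustration of the output-switching idea, here is a hypothetical Python sketch using xml.etree.ElementTree. It is not the implementation mentioned above; the Builder class, its method names, and the [n] marker left in the main text are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

class Builder:
    def __init__(self):
        self.body = ET.Element("body")
        self.refs = ET.Element("ol")   # pending references list
        self.stack = [self.body]       # stack[-1] is the current output destination
        self.saved = []                # destinations saved while inside <ref>

    def text(self, s):
        # Append text to the current destination (after its last child, if any).
        dest = self.stack[-1]
        if len(dest):
            dest[-1].tail = (dest[-1].tail or "") + s
        else:
            dest.text = (dest.text or "") + s

    def start(self, tag):
        self.stack.append(ET.SubElement(self.stack[-1], tag))

    def end(self):
        self.stack.pop()

    def ref_open(self):
        # Leave a numbered marker in the main text (an assumption of this
        # sketch), then redirect output to a new <li> in the references list.
        marker = ET.SubElement(self.stack[-1], "sup")
        marker.text = "[%d]" % (len(self.refs) + 1)
        self.saved.append(self.stack)
        self.stack = [ET.SubElement(self.refs, "li")]

    def ref_close(self):
        # Reset the destination to wherever it was before <ref>.
        self.stack = self.saved.pop()

    def references(self):
        # Insert the collected list into the main document.
        self.stack[-1].append(self.refs)
        self.refs = ET.Element("ol")

b = Builder()
b.text("Evolution")
b.ref_open()
b.start("i"); b.text("The origin of species"); b.end()
b.text(", Darwin")
b.ref_close()
b.text(" is old.")
b.references()
result = ET.tostring(b.body, encoding="unicode")
print(result)
```

The parser itself never needs to know it is inside a <ref>: it keeps emitting nodes into whatever the current destination is, and only the three tag handlers touch the destination.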
Jared