Hi Roan & Daniel!
Do you need the clean DOM just for reading, or for writing as well?
Read and write. WikiTrust does this very quickly, before the other scripts run. It then releases the ready lock for all other scripts and adds its user interface.
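In code, the pattern is roughly the following sketch. It assumes jQuery's $.holdReady is the locking mechanism (an assumption on my part, not confirmed WikiTrust code), and the two helper functions are hypothetical:

```typescript
// Sketch of the "ready lock" pattern, assuming jQuery's $.holdReady
// is the mechanism; the two helper functions are hypothetical.
declare const $: { holdReady(hold: boolean): void };

async function rewriteDomForBlame(): Promise<void> {
  // read and write the clean DOM here, before any other script sees it
}

function addWikiTrustUi(): void {
  // attach the WikiTrust interface once the page is released
}

$.holdReady(true);               // pause other scripts' ready handlers
rewriteDomForBlame().then(() => {
  $.holdReady(false);            // release the ready lock for everyone else
  addWikiTrustUi();
});
```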
use AJAX to fetch the HTML source of the page, and work with that
I tried that. It doubled the traffic and broke most of the other scripts: Toggle TableOfContents, Maps, and so on.
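For reference, such a fetch looks roughly like this sketch against the standard MediaWiki action=parse API (illustrative, not the exact code that was tried; error handling omitted):

```typescript
// Sketch: fetching a page's rendered HTML through the MediaWiki
// action API instead of scraping the displayed page.
async function fetchParsedHtml(title: string): Promise<string> {
  const url = '/w/api.php?action=parse&format=json&prop=text&page='
    + encodeURIComponent(title);
  const response = await fetch(url);
  const data = await response.json();
  return data.parse.text['*'];   // the page body as rendered HTML
}
```

The doubled traffic follows directly: the browser downloads the rendered page once, and this call downloads essentially the same HTML a second time.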
It would be really nice if your blame engine didn't rely on character offsets in the HTML, but used something more robust.
I have not seen any errors caused by this so far. As long as we know what MediaWiki does, we can adapt to it.
As you said, the preferred implementation would be something that's close to the parser and puts extra annotations (like <span> tags) in the parser-generated HTML.
But then we are talking about up to several megabytes per page.
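For illustration, per-token annotation might look like this sketch (the data-rev attribute is an invented name, not an actual WikiTrust or MediaWiki format); one wrapping <span> per token is what multiplies the markup on long articles:

```typescript
// Sketch of parser-side annotation: each token is wrapped in a <span>
// naming the revision that introduced it. data-rev is illustrative only.
interface Token { text: string; rev: number }

function renderAnnotated(tokens: Token[]): string {
  return tokens
    .map(t => `<span data-rev="${t.rev}">${t.text}</span>`)
    .join(' ');
}

// renderAnnotated([{ text: 'Hello', rev: 41 }, { text: 'world', rev: 57 }])
// => '<span data-rev="41">Hello</span> <span data-rev="57">world</span>'
```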
Server-side blaming doesn't have to be expensive as long as you use an incremental blame-as-you-go implementation where you store a blame map for each revision, and after each edit you use the edit diff and the previous revision's blame map to generate the new revision's blame map.
This is what Collaborativetrust already does. Unfortunately, it does not do it well.
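The idea, as a sketch (illustrative names and a word-level diff as assumptions, not Collaborativetrust's actual code):

```typescript
// Sketch of incremental blame-as-you-go with a word-level blame map.
type BlameMap = number[]; // for each word, the revision that introduced it

type DiffOp =
  | { kind: 'keep'; oldIndex: number }  // word carried over from the old text
  | { kind: 'insert' };                 // word newly added by this edit

function nextBlameMap(prev: BlameMap, ops: DiffOp[], newRev: number): BlameMap {
  // kept words inherit their blame; inserted words are blamed on newRev
  return ops.map(op => (op.kind === 'keep' ? prev[op.oldIndex] : newRev));
}

// Example: old map [3, 3, 7]; revision 9 keeps words 0 and 2 and inserts
// one new word in between => new map [3, 9, 7].
```

Each revision's map is derived in one pass over the edit diff, so the cost per edit stays proportional to the page length rather than to the page's whole history.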
Thomas