For some time now we've been using HTML Tidy to do additional correction and clean-up on output. Normally this is quick compared to the database overhead on page edits, though it can take a noticeable amount of time on very large pages.
More seriously, forking and spawning an external tidy program can be a bigger problem when the system's under heavy load.
I've checked into CVS HEAD the ability to use the PECL extension which exposes an interface to the tidy library in-process. This speeds things up a bit:
Short page ([[Stuff]], about 2.5k of HTML):
    0.010ms  no-op
    2.891ms  internalTidy
    8.710ms  externalTidy
Long page (a village pump page, 450k+ of HTML):
    1.783ms  no-op
  266.066ms  internalTidy
  306.098ms  externalTidy
Testing on a heavily loaded system the difference can go waaay up!
10 simultaneous tidy test threads:
Short page:
    0.010ms  no-op
    2.736ms  internalTidy
  213.108ms  externalTidy
Long page:
    2.343ms  no-op
  565.822ms  internalTidy
 5868.871ms  externalTidy
Heavy disk seeking (make clean on a GCC build) + 10 simultaneous tidy test threads:
Short page:
    0.010ms  no-op
    2.637ms  internalTidy
  928.098ms  externalTidy
Long page:
    2.353ms  no-op
 4305.380ms  internalTidy
 6686.658ms  externalTidy
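For anyone curious what the two paths being timed look like, here is a rough sketch. It is not the code checked into CVS: the function names (chooseTidyBackend, tidyInternal, tidyExternal, tidyHtml) are made up for illustration, and it uses the modern tidy_repair_string() call and PHP 7 type hints rather than the PHP 4.3.x API. The point is only that the internal path is a plain function call, while the external path has to spawn a process and push the page through pipes, which is where load hurts.

```php
<?php
// Illustrative sketch only; all function names here are hypothetical.

/** Pick a backend: 'none' if tidy is off, else prefer the in-process extension. */
function chooseTidyBackend(bool $useTidy, bool $extensionLoaded): string
{
    if (!$useTidy) {
        return 'none';
    }
    return $extensionLoaded ? 'internal' : 'external';
}

/** In-process clean-up through the tidy extension: no fork, no exec. */
function tidyInternal(string $html): string
{
    $out = tidy_repair_string($html, ['show-body-only' => true], 'utf8');
    return $out === false ? $html : $out;
}

/** External clean-up: spawn the tidy binary and talk to it over pipes. */
function tidyExternal(string $html, string $command = 'tidy -quiet -utf8 --show-body-only yes'): string
{
    $spec = [
        0 => ['pipe', 'r'],              // child's stdin
        1 => ['pipe', 'w'],              // child's stdout
        2 => ['file', '/dev/null', 'a'], // discard warnings
    ];
    $pipes = [];
    $proc = proc_open($command, $spec, $pipes);
    if (!is_resource($proc)) {
        return $html; // could not spawn; fall back to the untouched input
    }
    fwrite($pipes[0], $html);
    fclose($pipes[0]);
    $out = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($proc);
    return ($out === false || $out === '') ? $html : $out;
}

/** Run the page through whichever backend is available. */
function tidyHtml(string $html, bool $useTidy): string
{
    $backend = chooseTidyBackend($useTidy, function_exists('tidy_repair_string'));
    switch ($backend) {
        case 'internal':
            return tidyInternal($html);
        case 'external':
            return tidyExternal($html);
        default:
            return $html;
    }
}
```

Keeping the backend choice in its own small function means the fallback logic can be checked without tidy being installed at all.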
This is coded for the PHP 4.3.x version of the extension, and may not work on PHP 5. Once the extension is installed ('pear install tidy', then add 'extension=tidy.so' to php.ini), it should be picked up automatically if you've got $wgUseTidy on.
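Spelled out, the setup amounts to something like the following. The php.ini line is shown as a comment because it belongs in php.ini, not in PHP code; putting the $wgUseTidy setting in LocalSettings.php is an assumption based on where site settings normally go.

```php
<?php
// php.ini (not PHP code; shown here as a comment):
//   extension=tidy.so

// LocalSettings.php (assumed location for site settings)
$wgUseTidy = true;
```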
The changes are localized and don't alter the code interface, so I'll backport this to 1.4 as a performance fix.
-- brion vibber (brion @ pobox.com)