Robert Rohde wrote:
Some years back I was importing a large number of complex templates to a wiki that didn't have Tidy enabled. The results were nothing short of horrendous in a substantial number of cases. Wiki authors will generally stop worrying about their code as long as the results look right. For good or ill, Tidy does a remarkable job of localizing unclosed tags, and often that is enough to effectively fix the appearance of broken HTML syntax so it doesn't spill over into other sections. Without Tidy (or its equivalent) there will be a lot of template garbage that needs to be repaired.
As we get saner input mechanisms (CodeEditor, VisualEditor, ScoreEditor, etc.), we'll likely see a reduction in direct HTML editing, which seems to be the most common source of layout-disrupting invalid input.
The garbage in -> garbage out approach might seem appealing in principle, but any transition to such a condition is going to dredge up a lot of malformed HTML code created by wiki editors that we've been hiding for many years. If one is going to replace Tidy with something substantially different in execution, I would suggest that one needs a significant test suite of complex pages in order to judge how bad the collateral damage is likely to be, and ideally some set of tools to help editors fix it.
I think dredging up bad input in order to fix it is appropriate. A transition period could include the ability to temporarily render a page without Tidy enabled to see what issues present themselves. As I said previously, browsers are fairly resilient to moderately bad input, but even the really bad code should probably be properly addressed via the wiki process instead of being glossed over with magical fixes and replacements in the form of Tidy.
In addition to following the garbage in, garbage out principle, we would also be following the idea of failing fast and loudly: if the layout gets borked by a missing tag, for example, the breakage is immediately visible and can be fixed at the source.
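To make the "fail loudly" idea a bit more concrete, here is a rough sketch of the kind of checker that could flag unclosed or stray tags in a chunk of HTML during a transition period. It is only an illustration under assumptions: the names (TagBalanceChecker, find_unbalanced_tags) are made up, it uses Python's standard html.parser rather than anything in MediaWiki, and it treats every non-void element as needing an explicit close, so tags with optional end tags (p, li, and friends) would be reported too.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are not tracked.
# (Deliberately short list; extend as needed.)
VOID = {"br", "hr", "img", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: records tags that are unclosed or stray."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open (tag, line) pairs
        self.problems = []  # human-readable reports

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag not in [t for t, _ in self.stack]:
            # Closing tag with no matching open tag.
            self.problems.append(f"line {self.getpos()[0]}: stray </{tag}>")
            return
        # Everything opened after the matching tag was never closed.
        while self.stack:
            open_tag, line = self.stack.pop()
            if open_tag == tag:
                break
            self.problems.append(f"line {line}: unclosed <{open_tag}>")


def find_unbalanced_tags(html):
    """Return a list of problem reports; an empty list means balanced."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    # Anything still open at the end of input was never closed.
    for open_tag, line in reversed(checker.stack):
        checker.problems.append(f"line {line}: unclosed <{open_tag}>")
    return checker.problems
```

Calling find_unbalanced_tags on a page's HTML returns an empty list when everything is balanced, and one report per problem otherwise. Something along these lines could feed a list of pages needing attention, rather than silently repairing them.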
(In continuing to think about this problem generally and how other sites/platforms have solved or mitigated it, it's amusing to me that we allow div, span, inline styling, and arbitrary attributes (the latter two of which each require separate sanitization), and yet we continue to disallow rendering of the anchor element.)
MZMcBride