Robert Rohde wrote:
Some years back I was importing a large number of complex templates to a
wiki that didn't have tidy enabled. The results were nothing short of
horrendous in a substantial number of cases. Wiki authors will generally
stop worrying about their code as long as the results look right. For
good or ill, tidy does a remarkable job of localizing unclosed tags, and
often that is enough to effectively fix the appearance of broken HTML
syntax so it doesn't spill over into other sections. Without Tidy (or
its equivalent) there will be a lot of template garbage that needs to be
repaired.
As we get saner input mechanisms (CodeEditor, VisualEditor, ScoreEditor,
etc.), we'll likely see a reduction in direct HTML editing, which seems to
be what most often results in introducing layout-disrupting invalid input.
The garbage in -> garbage out approach might seem appealing in principle,
but any transition to such a condition is going to dredge up a lot of
malformed HTML code created by wiki editors that we've been hiding for
many years. If one is going to replace Tidy with something substantially
different in execution, I would suggest that one needs a significant test
suite of complex pages in order to judge how bad the collateral damage is
likely to be, and ideally some set of tools to help editors fix it.
I think dredging up bad input in order to fix it is appropriate. A
transition period could include the ability to temporarily render a page
without Tidy enabled to see what issues present themselves. As I said
previously, browsers are fairly resilient to moderately bad input, but
even the really bad code should probably be properly addressed via the
wiki process instead of being glossed over with magical fixes and
replacements in the form of Tidy.
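To make the "see what issues present themselves" idea a bit more concrete, here is a rough sketch of the kind of check an editor-facing tool could run over a page's HTML. It is only an illustration built on Python's standard html.parser module, not anything MediaWiki or Tidy actually does; the names UnclosedTagFinder and find_unclosed are made up for this example. It also treats every non-void element as needing an explicit close tag, so elements whose end tag is optional in HTML (li, p, td and so on) would be reported too.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class UnclosedTagFinder(HTMLParser):
    """Hypothetical helper: records unclosed and stray tags while parsing."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag, line) for tags still open
        self.problems = []   # human-readable reports

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        # Search the stack from the top for the matching opening tag.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                # Anything opened after it was left unclosed.
                for name, line in self.open_tags[i + 1:]:
                    self.problems.append(
                        f"<{name}> opened on line {line} was never closed"
                    )
                del self.open_tags[i:]
                return
        self.problems.append(f"stray </{tag}> on line {self.getpos()[0]}")


def find_unclosed(html):
    """Return a list of problem descriptions for the given HTML fragment."""
    finder = UnclosedTagFinder()
    finder.feed(html)
    finder.close()
    # Whatever is still on the stack at the end was never closed.
    for name, line in finder.open_tags:
        finder.problems.append(
            f"<{name}> opened on line {line} was never closed"
        )
    return finder.problems


if __name__ == "__main__":
    for problem in find_unclosed("<div><span>text</div>"):
        print(problem)
```

Something along these lines could be pointed at a page rendered without Tidy to list the tags an editor needs to fix, rather than leaving people to guess from a broken layout.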
In addition to following the garbage in -> garbage out principle, we would
also be following the idea of failing fast and loudly (when the layout gets
borked by a missing tag, for example).
(In continuing to think about this problem generally and how other
sites/platforms have solved or mitigated it, it's amusing to me that we
allow div and span, plus inline styling and arbitrary attributes (both of
which require separate sanitization), and yet we continue to disallow
rendering of the anchor element.)
MZMcBride