On 08/19/2015 08:22 AM, MZMcBride wrote:
And, as several others have noted, you can't just disable Tidy, since
the effects of unclosed tags are not confined to the content area, and
there is a large amount of existing content that depends on it. I have
seen the effects of Tidy being accidentally disabled on the English
Wikipedia; it is not pleasant.
Am I correct in saying that MZMcBride is the only person in this
thread in favour of the idea of getting rid of HTML cleanup?
I think it depends on what you mean by "HTML cleanup." Are you
referring only to "fixing" mismatched HTML elements or are you also
referring to reimplementing all of the other behavior that Tidy brings in?
Bartosz wrote:
We really do need this feature. Not anything else that Tidy does (most
of its behavior is actually damaging), but we need to match the open and
close tags to prevent the interface from getting jumbled.
My reading of this thread is that this is the consensus view. The problem,
as I see it, is that Tidy has been deployed long enough that some users
are also relying on all of its other bad behaviors. It seems to me that a
replacement for Tidy either has to reimplement all of its unwanted
behaviors to avoid breakage with current wikitext or it has to break an
unknown amount of current wikitext.
In response to both these queries, see this snippet from my earlier post
on this thread
( https://lists.wikimedia.org/pipermail/wikitech-l/2015-August/082806.html ):
"Even replacing it with an HTML5 parser (as per the current
plan) is not entirely straightforward simply because of all the other
unrelated-to-html5-semantics behavior. Part of the task of replacing
Tidy is to figure out all the ways those pages might break and the best
way to handle that breakage."
Also see https://phabricator.wikimedia.org/T89331#1499979 about how we
might go about evaluating this.
So, we aren't saying we'll implement those Tidy behaviors here. Part of
the solution might very well be to break some of that Tidy behavior and
have the pages be fixed up (bots, manually, however). In any case, the
first step is to understand those impacts.
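
To make "understand those impacts" a little more concrete, here is a
rough sketch of one way such a comparison could be set up. It is an
illustration only, not the approach described in the linked task. The
names TagCollector, tag_events and pages_that_differ are made up for
this example, and cleanup_a / cleanup_b are stand-ins for whatever two
cleanup passes are being compared (for example Tidy and an HTML5-based
replacement). It only uses Python's standard html.parser to reduce each
output to a stream of open/close/text events and reports the pages where
the two streams differ.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record open tags, close tags and whitespace-normalised text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored here; only the tag structure is compared.
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

    def handle_data(self, data):
        # Collapse whitespace so formatting-only differences are ignored.
        text = " ".join(data.split())
        if text:
            self.events.append(("text", text))


def tag_events(html):
    """Return the event stream for one HTML string."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.events


def pages_that_differ(pages, cleanup_a, cleanup_b):
    """List titles whose output differs between two cleanup passes.

    `pages` maps a title to raw HTML; `cleanup_a` and `cleanup_b` are
    hypothetical callables taking and returning an HTML string.
    """
    return [
        title
        for title, html in pages.items()
        if tag_events(cleanup_a(html)) != tag_events(cleanup_b(html))
    ]
```

A list like this would only say which pages change, not whether the
change is visible or harmful; deciding that still needs rendering
comparisons or human review.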
Subbu.