On Mon, Jul 2, 2012 at 7:36 PM, Rob Lanphier <robla@wikimedia.org> wrote:
> That plan may be more conservative than we need to be, given it's been
> enabled on mediawiki.org for so long. At the time Aryeh wrote that,
> the feature hadn't been as well tested as it is now. That's not to
> say that we won't find bugs, but that I don't think there will be as
> many, that they aren't likely to be severe, and it seems we're in a
> better position to address them quickly than we were when that was
> written. I wouldn't mind going that route if a lot of other people
> feel we should, but it seems likely to me that we might accidentally
> introduce production glitches in the process of implementing the
> interim steps, and that there could very well be bugs in the interim
> states that don't occur in the final stage.
Just to clarify the history here, I originally suggested just turning
it on. I expected (and expect) that there will be a bit of fallout,
but not a lot -- it should be quickly fixable. The stuff that carries
bigger compatibility risks is behind separate switches such as
$wgWellFormedXml and $wgExperimentalHtmlIds.
> Are you sure that $wgHtml5 is distinct from the doctype? It looks
> like mediawiki.org also has the doctype set, and it looks as though
> Html.php sets it based on that variable.
IIRC, I added a separate variable that allows changing the doctype
separately from $wgHtml5 in case anyone wanted to experiment with
changing the doctype and rest of the page separately. This is because
changing the doctype will affect rendering in certain cases, moving
from "almost-standards" to "standards" rendering, while changing the
rest of the markup might have unrelated effects. But the doctype
should change along with $wgHtml5 if you don't override it.
>> It's also unclear whether every issue reported in the comments of
>> bug 27478 was filed as a separate bug. In particular, I'm unsure if
>> Cite was ever properly fixed (or if the alternate stop-gap solution
>> Aryeh mentioned was implemented). As I recall, the Cite breakage
>> was breaking links in articles.
> This is what I'm hoping we can get some clarity on. How many of those
> comments are still relevant?
Comments 0-5 are still relevant. r82413 will likely need to be
reinstated and enforced in review if you don't want to break XML
processors. Named entities like &nbsp; will no longer work in XML
parsers with no DTD in the doctype -- except for the five predefined
ones: &amp; &lt; &gt; &quot; &apos;. This is likely to be a big
issue, because it will be a headache to make sure extensions don't
output such entities in raw HTML. (The parser/sanitizer will already
take care of them in user input or parsed HTML, though.) If auditing
isn't put in place, I'd expect XML parsers to break as soon as the
change is deployed, and to keep breaking thereafter as people
accidentally introduce new entities.
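To make the failure mode concrete, here is a minimal sketch using Python's standard-library XML parser (any non-validating XML parser behaves the same way): the predefined entities parse fine, but an HTML named entity such as &nbsp; is a fatal error when no DTD is declared.

```python
import xml.etree.ElementTree as ET

# The five predefined XML entities need no DTD, so this parses.
ok = ET.fromstring('<p>Fish &amp; chips &lt;tasty&gt;</p>')
print(ok.text)  # Fish & chips <tasty>

# &nbsp; is an HTML named entity, not an XML one.  With no DTD in the
# doctype, a conforming XML parser must treat it as a fatal error.
try:
    ET.fromstring('<p>non&nbsp;breaking</p>')
except ET.ParseError as err:
    print('parse error:', err)  # undefined entity
```

This is exactly what a screen-scraping bot built on an XML library would hit the first time an extension emits a stray entity.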
The way around this would be either to use a non-HTML5 doctype (see
end of post), or just give up on XML scrapers and tell them that their
bots will break until they switch to an HTML5 parser or the API. In
the latter case, $wgWellFormedXml can be set to false also, if people
like.
Comment 12 is no longer relevant, because $wgExperimentalHtmlIds was
turned off by default.
http://lists.wikimedia.org/pipermail/wikitech-l/2011-June/053775.html
is still a good summary of possible issues, particularly the emphasis
on issue 2.
I don't know if comment 27 is still relevant -- probably, but it
should be trivial to fix. There are likely to be some pages using
table-based layout and images that will start displaying badly and
that users will have to add a few extra style rules to fix.
The major issue that I see is still the named-entities problem, which
is what led to $wgHtml5 being rapidly disabled both previous times it was turned
on. To avoid breaking XML tools, the doctype could be set to XHTML
1.0 Strict or such with $wgHtml5 on, so HTML5 features would still
work. This would make the page valid HTML5, since HTML5 allows some
legacy doctypes that do specify a DTD:
http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#obs…
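For instance, one of the "obsolete permitted" doctypes that HTML5 allows, and one that does name a DTD that XML tools can use, is the XHTML 1.0 Strict declaration:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```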
The issue is that it would confuse validator.w3.org into trying to
validate as XHTML 1.0 etc., which would make people complain that the
pages are invalid. You would have to set it specifically to validate
as HTML5 for it to pass. (HTML5 validators are generally much
pickier, though, so expect a lot of pages not to validate as HTML5
either.)
The alternative, as I said, would be to just let XML screen-scraper
bots break. Most languages provide some type of HTML parser that the
bots could be switched to, I believe. Python has a particularly good
HTML5 parser, I think, which will parse the page the same way
browsers do.
In this case, switching off $wgWellFormedXml won't hurt anything and
will decrease page size slightly.
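As a sketch of that alternative, even Python's standard-library HTMLParser (the third-party HTML5 parser alluded to above is the more rigorous option) happily tokenizes markup that is fatal to an XML parser, reporting &nbsp; as just another entity reference rather than aborting:

```python
from html.parser import HTMLParser

class EntityCollector(HTMLParser):
    """Collect text and named entity references instead of dying on them.
    (EntityCollector is a hypothetical name for this sketch.)"""
    def __init__(self):
        super().__init__(convert_charrefs=False)  # report entities separately
        self.pieces = []

    def handle_data(self, data):
        self.pieces.append(data)

    def handle_entityref(self, name):
        # The "undefined entity" that kills an XML parser is just
        # another token here.
        self.pieces.append(f'[entity:{name}]')

p = EntityCollector()
p.feed('<p>non&nbsp;breaking &amp; more</p>')
print(''.join(p.pieces))  # non[entity:nbsp]breaking [entity:amp] more
```

A bot ported this way keeps working regardless of which named entities the wiki's HTML happens to contain.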