Plans for shifting Wikimedia to HTML5, probably starting with en:wp Main Page.
(Simetrical is quite keen on this change, as apart from anything else it'll cut our served page size *after gzipping* by 5-20%.)
HTML5 is the new HTML standard. It's specifically been written to be backward compatible with most of the horrible quirks in all past browsers - it's a vendor-driven standard - and now it's the W3C's official future of HTML. So nothing should break for anyone. Note the provisions in the plan below in case something does.
- d.
---------- Forwarded message ----------
From: Aryeh Gregor Simetrical+wikilist@gmail.com
Date: 2009/7/10
Subject: Re: [Wikitech-l] Proposal: switch to HTML 5
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Apparently something ate my last post here. (I think it was my Chromium nightly build.) Okay, reposting from memory:
After discussion with Brion on IRC, I've provisionally enabled an HTML 5 doctype in r53034:
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/53034
My thoughts on what we should do in the immediate future are:
1) Get at least the enwiki Main Page set up so it will validate as HTML 5 when we scap: http://validator.w3.org/check?uri=http://en.wikipedia.org/wiki/&charset=(detect+automatically)&doctype=HTML5&group=0
1a) Remove border="0" from Wikimedia's $wgCopyrightIcon (it does nothing anyway).
1b) Rope some enwiki sysops into getting rid of all cellpadding, cellspacing, align, and clear attributes on the Main Page (converting them to CSS).
2) Scap (whenever this happens -- maybe not so immediate future :) ).
3) Wait a couple of hours to see if anything breaks.
4) Make a tech blog post and post a notice to the whatwg list (I'll do this). We'll have our front page validating as HTML 5 at this point, hopefully, to make a more positive impact.
5) See what happens!
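For anyone who wants to check other pages the same way, here is a minimal sketch of how the validator link in step 1 could be assembled. The query parameters (uri, charset, doctype, group) are copied from that link; the function name `validator_url` is made up for illustration, and `urlencode` percent-encodes some characters that the hand-written link leaves bare, which should be equivalent.

```python
from urllib.parse import urlencode

# Base address and parameters are taken from the validator link in step 1.
VALIDATOR = "http://validator.w3.org/check"


def validator_url(page_url: str) -> str:
    """Build a validator link that forces HTML5 checking of page_url.

    Hypothetical helper, not part of MediaWiki or the validator itself.
    """
    params = {
        "uri": page_url,
        "charset": "(detect automatically)",
        "doctype": "HTML5",
        "group": "0",
    }
    return VALIDATOR + "?" + urlencode(params)


if __name__ == "__main__":
    print(validator_url("http://en.wikipedia.org/wiki/"))
```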
I expect this will pick up some interest, since we'll probably be increasing the number of HTML 5 page views by a factor of -- oh, ten thousand? (Is there any top *1000* site that uses HTML 5 for all its primary content?) We can see how things develop, and if all goes well start using more HTML 5 features.
I'd recommend that until the code goes live, this should be considered an *experimental* *development* change. People shouldn't go around announcing this everywhere until it's actually live. For one thing, some unknown problem might crop up and we'd have to temporarily roll back, which would cause confusion and bad press for both us and HTML 5. For another thing, it would be nice if we could link to a validating main page in the announcement. I'm sure people can hold off posting stories to Slashdot for a week or two, right? :)
_______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
2009/7/10 David Gerard dgerard@gmail.com
<snip HTML5 stuff>
If we can do it, this sounds like a great idea. Forgive my ignorance though, but what does "scap" mean?
Pete / the wub
On Fri, Jul 10, 2009 at 4:22 PM, Peter Coombe thewub.wiki@googlemail.com wrote:
<snip HTML5 stuff>
If we can do it, this sounds like a great idea. Forgive my ignorance though, but what does "scap" mean?
At a wild guess, following a look at the SCAP disambiguation page:
http://en.wikipedia.org/wiki/SCAP
I'd say "Separation of presentation and content":
http://en.wikipedia.org/wiki/Separation_of_presentation_and_content
That is only a guess though.
That led me to this:
http://en.wikipedia.org/wiki/What_You_See_Is_What_You_Mean
Which is the point where I stop writing this e-mail and start reading.
Carcharoth
On Fri, Jul 10, 2009 at 2:29 PM, David Gerard dgerard@gmail.com wrote:
Plans for shifting Wikimedia to HTML5, probably starting with en:wp Main Page.
Clarification: the plan is to have MediaWiki output HTML 5 by default, always. (I.e., it will serve an HTML 5 doctype.) Once it's doing that, we want to have it be *valid* HTML 5 -- if it's invalid it will still work perfectly fine, but we want to be compliant with the standard to the extent practical, all things being equal.
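To make "serve an HTML 5 doctype" concrete: the HTML 5 doctype is just `<!DOCTYPE html>`, with no public identifier or DTD URL. A rough sketch of a check a bot operator might run against page source follows; the function name and sample strings are illustrative only, and the check ignores edge cases such as a byte-order mark or comments before the doctype.

```python
import re

# The HTML 5 doctype is "<!DOCTYPE html>", matched case-insensitively.
HTML5_DOCTYPE = re.compile(r"\s*<!DOCTYPE\s+html\s*>", re.IGNORECASE)


def serves_html5_doctype(document: str) -> bool:
    """Return True if the document starts with the short HTML 5 doctype."""
    return HTML5_DOCTYPE.match(document) is not None


# Illustrative inputs, not real MediaWiki output.
XHTML_PAGE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n<html></html>'
)
HTML5_PAGE = "<!DOCTYPE html>\n<html></html>"

if __name__ == "__main__":
    print(serves_html5_doctype(XHTML_PAGE))
    print(serves_html5_doctype(HTML5_PAGE))
```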
I and others have changed things so on most pages the *software* should generate only valid HTML 5. But user input might still be invalid. For the initial announcement, it would be nice if this link could report that we're providing valid HTML 5 on the Main Page:
http://validator.w3.org/check?uri=http://en.wikipedia.org/wiki/Main_Page&...
If that doesn't happen, I can give a link to Special:AllPages or something instead, but the Main Page is obviously more visible. The following errors are fixed in the software and/or must be fixed by sysadmins, and do *not* need to be fixed by enwiki admins:
Bad value Content-Style-Type for attribute http-equiv on element meta.
Text not allowed in element script in this context.
Attribute name not allowed on element a at this point.
Attribute border not allowed on element img at this point.
The following errors are caused by the content on the Main Page, and *must* be fixed by enwiki sysops:
Attribute cellpadding not allowed on element table at this point.
Attribute cellspacing not allowed on element table at this point.
Attribute align not allowed on element div at this point.
Attribute clear not allowed on element br at this point.
These attributes are among those removed in HTML 5. They were already deprecated in XHTML 1.0 Transitional, which we currently use, and are removed in XHTML 1.0 Strict. (The software doesn't need significant updates to be conformant HTML 5, because it already avoided deprecated elements in almost all cases.)
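If someone wants to find the offending spots without round-tripping through the validator each time, a small scanner along these lines might help. It is only a sketch: it looks for the four element/attribute pairs named in the errors above, uses Python's standard `html.parser` (a tokenizer, not a validator), and the class name, function name and sample markup are invented for the example.

```python
from html.parser import HTMLParser

# (element, attribute) pairs taken from the validator errors quoted above.
REMOVED = {
    ("table", "cellpadding"),
    ("table", "cellspacing"),
    ("div", "align"),
    ("br", "clear"),
}


class RemovedAttributeFinder(HTMLParser):
    """Collect (line, element, attribute) for each removed attribute seen."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        for name, _value in attrs:
            if (tag, name) in REMOVED:
                line, _column = self.getpos()
                self.problems.append((line, tag, name))


def find_removed_attributes(markup):
    finder = RemovedAttributeFinder()
    finder.feed(markup)
    finder.close()
    return finder.problems


# Made-up sample, not the real Main Page source.
SAMPLE = """<table cellpadding="2" cellspacing="0">
<tr><td><div align="right">text</div></td></tr>
</table>
<br clear="all" />"""

if __name__ == "__main__":
    for line, tag, name in find_removed_attributes(SAMPLE):
        print(f"line {line}: attribute {name} on element {tag}")
```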
So these just need to be replaced by CSS. align="right" should be basically the same as style="float: right"; clear="all" should be identical to style="clear: both"; cellpadding and cellspacing may require a little more fidgeting. Someone should do this who's good with CSS -- maybe I'll rope a dev-sysop like Werdna into it. (There are these rare cases when enwiki sysophood will be useful . . . perhaps I'll try RFA again someday.)
It should be easy enough to do for someone who knows enough about HTML and CSS. It's not a big project that needs a lot of help, just something we need one person to spend half an hour fiddling with.
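The two straightforward replacements mentioned above can be written down as a lookup table. This is a sketch for whoever does the edit, not a conversion tool: it covers only the two cases stated in this message, and it deliberately returns nothing for cellpadding and cellspacing, since those need hand-written CSS (my assumption is that this usually means padding on the cells and border-spacing on the table, but that has to be checked against the actual layout). The name `suggest_css` is hypothetical.

```python
# Direct replacements stated in the message above.
SIMPLE = {
    ("align", "right"): "float: right",
    ("clear", "all"): "clear: both",
}

# cellpadding and cellspacing are intentionally absent: they need
# per-table CSS worked out by hand.


def suggest_css(attr, value):
    """Return an inline CSS declaration for a removed attribute, or None
    when no simple one-to-one replacement is known."""
    return SIMPLE.get((attr.lower(), value.lower()))


if __name__ == "__main__":
    for attr, value in (("align", "right"), ("clear", "all"), ("cellpadding", "4")):
        css = suggest_css(attr, value)
        if css is None:
            print(f'{attr}="{value}": needs manual CSS')
        else:
            print(f'{attr}="{value}" -> style="{css}"')
```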
(Simetrical is quite keen on this change, as apart from anything else it'll cut our served page size *after gzipping* by 5-20%.)
First of all, this is only one of a number of reasons to switch to HTML 5, and not nearly the biggest. The major reason is so we can use new functionality as it's added to the spec and supported by browsers, in the long term. And since we're going to switch *eventually* anyway, we may as well do it now and get some modest benefits immediately.
Second of all, we're not doing anything to reduce page size now. For the time being we're still serving well-formed XML to avoid breaking bots, so we don't get size savings from killing quotes/closing tags/etc. So this is a non-reason at present, although in the medium term it may be relevant.
Third of all, I *really* doubt it will be close to 20% savings. I'm sure I never said anything close to that. I think I estimated 5%, and that's only if we're really thorough -- which would require a large amount of work (with not so much benefit, compared to the work required and the potential for introducing bugs) and won't happen quickly even when we do break XML well-formedness.
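Anyone curious about the size question can measure it rather than guess. The sketch below compares raw and gzipped sizes of the same list written XML-style (quoted attributes, explicit closing tags) and HTML 5-style (unquoted attributes, optional `</li>` omitted). The markup is synthetic and highly repetitive, so whatever it prints says nothing reliable about real Wikipedia pages; the function names and the row count are arbitrary choices for the example.

```python
import gzip

ROWS = 200  # arbitrary size for the synthetic sample


def xhtml_style(rows):
    """Well-formed XML-style markup: quoted attributes, every tag closed."""
    items = "".join(
        f'<li class="item"><a href="/wiki/Page_{i}">Page {i}</a></li>\n'
        for i in range(rows)
    )
    return f'<ul class="toc">\n{items}</ul>\n'


def html5_style(rows):
    """Same content with unquoted attributes and the optional </li> dropped."""
    items = "".join(
        f"<li class=item><a href=/wiki/Page_{i}>Page {i}</a>\n"
        for i in range(rows)
    )
    return f"<ul class=toc>\n{items}</ul>\n"


def sizes(markup):
    """Return (raw byte count, gzipped byte count) for the markup."""
    raw = markup.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


if __name__ == "__main__":
    for label, markup in (("XML-style", xhtml_style(ROWS)),
                          ("HTML 5-style", html5_style(ROWS))):
        raw, gz = sizes(markup)
        print(f"{label}: {raw} bytes raw, {gz} bytes gzipped")
```

Running the same `sizes` function over saved copies of real pages would give a more honest number than any estimate in this thread.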
If/when the change goes live, I'll put up a blog post on the tech blog explaining things in a practical manner. The short answer is: there is likely to be very little practical difference to the average user right now. It's more of a single step in a longer-term process. Users of recent non-IE browsers will likely see some modest improvements in the very near future -- like slightly better video support, slightly more helpful forms for Opera 9.6x users -- but most of the benefit is long-term.
I'd also like to emphasize that for the time being, this change is still experimental, and it's still possible at this point that it will be reverted for some reason. We have not yet deployed HTML 5 and probably should not bother announcing the switch to the broader world until we know it's not going to be vaporware.
On Fri, Jul 10, 2009 at 3:22 PM, Peter Coombe thewub.wiki@googlemail.com wrote:
If we can do it, this sounds like a great idea. Forgive my ignorance though, but what does "scap" mean?
It's a Wikimedia-specific term, an abbreviation for the script sync-common-all-php. Basically it means update the site software to the latest version, bringing all new features/bug fixes live that were made since the last scap and weren't synced individually. Scaps require that all recent software changes be reviewed, which takes time, and a careful eye needs to be kept on the site for a while afterward to fix any breakage. These days we tend to have a scap every few weeks.
On Fri, Jul 10, 2009 at 5:56 PM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
<snip>
The following errors are caused by the content on the Main Page, and *must* be fixed by enwiki sysops:
Attribute cellpadding not allowed on element table at this point.
Attribute cellspacing not allowed on element table at this point.
Attribute align not allowed on element div at this point.
Attribute clear not allowed on element br at this point.
These attributes are among those removed in HTML 5. They were already deprecated in XHTML 1.0 Transitional, which we currently use, and are removed in XHTML 1.0 Strict. (The software doesn't need significant updates to be conformant HTML 5, because it already avoided deprecated elements in almost all cases.)
So these just need to be replaced by CSS. align="right" should be basically the same as style="float: right"; clear="all" should be identical to style="clear: both"; cellpadding and cellspacing may require a little more fidgeting. Someone should do this who's good with CSS -- maybe I'll rope a dev-sysop like Werdna into it. (There are these rare cases when enwiki sysophood will be useful . . . perhaps I'll try RFA again someday.)
<me thought you were already an admin>...
It should be easy enough to do for someone who knows enough about HTML and CSS. It's not a big project that needs a lot of help, just something we need one person to spend half an hour fiddling with.
Any of the admins who worked on the Main Page (re)designs should be confident enough to do that. Almost any en-wiki admin you ask to do this will sandbox it first, as causing the main page to go wrong gets en-wiki admins sent to the stocks to be pelted with rotten fruit. :-)
I suppose you could ask here:
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Usability
http://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Usability/Main_Page/...
There was a redesign proposal in 2008, but I never quite kept up with that.
Anyway, as you only want tweaks, you could also ask at:
http://en.wikipedia.org/wiki/Talk:Main_Page
Strangely enough, even though posting in the "errors" reporting section of the talk page for the Main Page is the wrong place to post, you are most likely to find admins there who are not scared of editing the main page.
Carcharoth
On Fri, Jul 10, 2009 at 1:48 PM, Charles Matthews charles.r.matthews@ntlworld.com wrote:
I have been looking around to see what effect this might have on rendering of mathematics. Is that potentially good, but only if everyone agrees to use the right browsers?
It's easier to embed MathML (and SVG) into HTML 5 than into XHTML, yes -- with XHTML you need to serve as XML before it works in any browser, AFAIK. And we don't want to serve as XML due to fatal errors on malformedness. Currently I think only Firefox 3.6 nightlies support embedded MathML, and only if you set html5.enable to true in about:config, but sooner or later widespread support is likely.
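To illustrate the "fatal errors on malformedness" point: an XML parser gives up entirely on a single unclosed tag, while an HTML parser carries on. The sketch below uses Python's standard library as a stand-in; `html.parser` is only a tokenizer and does not implement the HTML 5 parsing algorithm that browsers use, and the sample markup and helper names are invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The <b> element is never closed, so this is not well-formed XML.
MALFORMED = "<p>Unclosed <b>bold text</p>"


def parses_as_xml(markup):
    """Return True if the markup is well-formed XML, False on a parse error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Record start tags in the order they appear, ignoring malformedness."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags_as_html(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print("well-formed XML:", parses_as_xml(MALFORMED))
    print("tags seen by HTML parser:", start_tags_as_html(MALFORMED))
```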
There's a MathML output option in the math preferences. I don't know if it actually works. But it could be made to, undoubtedly, without too much trouble, if we have a safe and reliable LaTeX -> MathML converter. Obviously it couldn't be the default until we get a lot more uptake of MathML. It would be neat if we could get it working well and then use some kind of sniffing to determine whether to use MathML or PNG, but that would be tricky.
I'd like to emphasize again, though, that these benefits are *long-term*. There will be *no* immediate user-visible difference between XHTML 1.0 and HTML 5 for users of web browsers, or nearly all bot operators.