[WikiEN-l] Fwd: [Wikitech-l] Proposal: switch to HTML 5

Aryeh Gregor Simetrical+wikilist at gmail.com
Fri Jul 10 16:56:28 UTC 2009


On Fri, Jul 10, 2009 at 2:29 PM, David Gerard <dgerard at gmail.com> wrote:
> Plans for shifting Wikimedia to HTML5, probably starting with en:wp Main Page.

Clarification: the plan is to have MediaWiki output HTML 5 by default,
always.  (I.e., it will serve an HTML 5 doctype.)  Once it's doing
that, we want to have it be *valid* HTML 5 -- if it's invalid it will
still work perfectly fine, but we want to be compliant with the
standard to the extent practical, all things being equal.

Others and I have changed things so that on most pages the *software*
should generate only valid HTML 5.  But user input might still be
invalid.  For the initial announcement, it would be nice if this link
could report that we're providing valid HTML 5 on the Main Page:

http://validator.w3.org/check?uri=http://en.wikipedia.org/wiki/Main_Page&charset=(detect+automatically)&doctype=HTML5&group=0

If that doesn't happen, I can give a link to Special:AllPages or
something instead, but the Main Page is obviously more visible.  The
following errors are fixed in the software and/or must be fixed by
sysadmins, and do *not* need to be fixed by enwiki admins:

Bad value Content-Style-Type for attribute http-equiv on element meta.
Text not allowed in element script in this context.
Attribute name not allowed on element a at this point.
Attribute border not allowed on element img at this point.

The following errors are caused by the content on the Main Page, and
*must* be fixed by enwiki sysops:

Attribute cellpadding not allowed on element table at this point.
Attribute cellspacing not allowed on element table at this point.
Attribute align not allowed on element div at this point.
Attribute clear not allowed on element br at this point.

These attributes are among those removed in HTML 5.  align and clear
were already deprecated in XHTML 1.0 Transitional, which we currently
use, and are removed in XHTML 1.0 Strict; cellpadding and cellspacing
are still allowed there but are dropped in HTML 5.  (The software
doesn't need significant updates to be conformant HTML 5, because it
already avoided deprecated elements and attributes in almost all
cases.)

So these just need to be replaced by CSS.  align="right" on a div
should be basically the same as style="text-align: right" (or
style="float: right" if the intent was to float the box itself);
clear="all" should be identical to style="clear: both"; cellpadding
and cellspacing may require a little more fidgeting.  Someone should
do this who's good
with CSS -- maybe I'll rope a dev-sysop like Werdna into it.  (There
are these rare cases when enwiki sysophood will be useful . . .
perhaps I'll try RFA again someday.)

It should be easy enough to do for someone who knows enough about HTML
and CSS.  It's not a big project that needs a lot of help, just
something we need one person to spend half an hour fiddling with.
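
To make the attribute-to-CSS replacements above concrete, here is a
minimal Python sketch.  The function name presentational_to_css and the
div_align_property parameter are hypothetical, invented for this
illustration; nothing like this exists in MediaWiki.  The align mapping
defaults to the float suggestion from this message, but align on a div
historically set the alignment of the div's contents, so "text-align"
may be the closer match in some layouts.  The cellpadding and
cellspacing mappings (cell padding and border-spacing) are one common
approach and assume the table uses separated borders.

```python
def presentational_to_css(tag, attrs, div_align_property="float"):
    """Return (remaining_attrs, element_css, cell_css) for one element.

    element_css goes on the element itself; cell_css belongs on the
    table's td/th cells, because cellpadding describes the cells rather
    than the table box.
    """
    remaining, element_css, cell_css = {}, [], []

    def length(value):
        # Bare numbers in these attributes mean pixels; keep "10%" as-is.
        return value + "px" if value.isdigit() else value

    for name, value in attrs.items():
        if tag == "br" and name == "clear":
            # clear="all" becomes clear: both; left/right/none carry over.
            element_css.append("clear: " + ("both" if value == "all" else value))
        elif tag == "div" and name == "align":
            # Assumption: the caller picks "float" or "text-align".
            # Note that float only accepts left/right, not center.
            element_css.append(f"{div_align_property}: {value}")
        elif tag == "table" and name == "cellspacing":
            element_css.append("border-spacing: " + length(value))
        elif tag == "table" and name == "cellpadding":
            cell_css.append("padding: " + length(value))
        else:
            remaining[name] = value
    return remaining, element_css, cell_css


print(presentational_to_css("table", {"class": "x", "cellpadding": "4", "cellspacing": "0"}))
print(presentational_to_css("br", {"clear": "all"}))
```

The cell_css part is the "fidgeting": it has to end up in a rule that
targets the table's cells (or on each cell), not on the table element.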

> (Simetrical is quite keen on this change, as apart from anything else
> it'll cut our served page size *after gzipping* by 5-20%.)

First of all, this is only one of a number of reasons to switch to
HTML 5, and not nearly the biggest.  The major reason is so we can use
new functionality as it's added to the spec and supported by browsers,
in the long term.  And since we're going to switch *eventually*
anyway, we may as well do it now and get some modest benefits
immediately.

Second of all, we're not doing anything to reduce page size now.  For
the time being we're still serving well-formed XML to avoid breaking
bots, so we don't get size savings from killing quotes/closing
tags/etc.  So this is a non-reason at present, although in the medium
term it may be relevant.
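
To illustrate why dropping quotes and closing tags would break bots
that parse pages as XML, here is a small sketch using Python's standard
xml.etree parser.  The two markup snippets and the is_well_formed
helper are made up for the example; they are not MediaWiki output.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, else False."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# XML-style serialization: quoted attributes, every element closed.
xml_style = '<p class="intro">Hello<br /></p>'

# Terse HTML 5 serialization: unquoted attribute, omitted closing tags.
# An HTML 5 parser accepts this, but an XML parser does not.
terse_html5 = '<p class=intro>Hello<br>'

print(is_well_formed(xml_style))
print(is_well_formed(terse_html5))
```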

Third of all, I *really* doubt it will be close to 20% savings.  I'm
sure I never said anything close to that.  I think I estimated 5%, and
that's only if we're really thorough -- which would require a large
amount of work (with not so much benefit, compared to the work
required and the potential for introducing bugs) and won't happen
quickly even when we do break XML well-formedness.
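
For anyone who wants to check an estimate like this themselves, the
comparison that matters is between gzipped sizes, not raw sizes.  A
rough sketch follows; the list markup is invented for the example and
is far more repetitive than a real page, so the number it prints says
nothing about actual Wikipedia savings.

```python
import gzip


def make_page(terse, rows=200):
    """Build a toy list page in XML style or in terse HTML 5 style."""
    lines = ["<ul>"]
    for i in range(rows):
        if terse:
            # Unquoted attribute values and omitted </li> are valid HTML 5.
            lines.append(f"<li class=item><a href=/wiki/Page_{i} title=Page_{i}>Page {i}</a>")
        else:
            lines.append(f'<li class="item"><a href="/wiki/Page_{i}" title="Page_{i}">Page {i}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)


def gzipped_size(markup):
    """Size in bytes after gzip compression, as served to most browsers."""
    return len(gzip.compress(markup.encode("utf-8")))


def relative_saving(before, after):
    """Fraction saved when going from `before` bytes to `after` bytes."""
    return (before - after) / before


xml_page = make_page(terse=False)
terse_page = make_page(terse=True)

print("raw saving:    ", relative_saving(len(xml_page), len(terse_page)))
print("gzipped saving:", relative_saving(gzipped_size(xml_page), gzipped_size(terse_page)))
```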

If/when the change goes live, I'll put up a blog post on the tech blog
explaining things in a practical manner.  The short answer is: there
is likely to be very little practical difference to the average user
right now.  It's more of a single step in a longer-term process.
Users of recent non-IE browsers will likely see some modest
improvements in the very near future -- like slightly better video
support, slightly more helpful forms for Opera 9.6x users -- but most
of the benefit is long-term.

I'd also like to emphasize that for the time being, this change is
still experimental, and it's still possible at this point that it will
be reverted for some reason.  We have not yet deployed HTML 5 and
probably should not bother announcing the switch to the broader world
until we know it's not going to be vaporware.

On Fri, Jul 10, 2009 at 3:22 PM, Peter Coombe <thewub.wiki at googlemail.com> wrote:
> If we can do it, this sounds like a great idea. Forgive my ignorance though,
> but what does "scap" mean?

It's a Wikimedia-specific term, an abbreviation for the script
sync-common-all-php.  Basically it means update the site software to
the latest version, bringing all new features/bug fixes live that were
made since the last scap and weren't synced individually.  Scaps
require that all recent software changes be reviewed, which takes
time, and a careful eye needs to be kept on the site for a while
afterward to fix any breakage.  These days we tend to have a scap
every few weeks.


