Great, it looks like the HTML5 vs. XHTML fight is infecting everything.
Just my 2 cents - I don't think that switching to a spec that isn't yet a W3C Recommendation is a good idea. Many extensions and features are not yet finished (e.g. RDFa support for it), and considering the huge commotion in this area, it might not be a very good decision.
Thank you,
Sergey
--
Sergey Chernyshev
http://www.sergeychernyshev.com/
On Tue, Jul 7, 2009 at 9:38 AM, Aryeh Gregor <Simetrical+wikilist@gmail.com> wrote:
On Tue, Jul 7, 2009 at 2:37 AM, Remember the dot <rememberthedot@gmail.com> wrote:
That page clearly says that there will be an XHTML 5. XHTML is not going away.
By "XHTML" I meant "the family of standards including XHTML 1.0, 1.1, 2.0, etc.". XHTML 5 is identical to HTML 5 except with a different serialization. Practically speaking, however, it looks like no one will use XHTML 5 either, because it's impossible to deploy on the current web. (See below.) As far as I can tell, it was thrown in as a sop to XML fans, on the basis that it cost very little to add it to the spec (given the definition in terms of DOM plus serializations), without any expectation that anyone will use it in practice.
What's to prevent a malicious user from manually posting an invalid submission? If there are no server-side checks, will the servers crash?
Obviously there will be server-side checks as well! This will just serve to inform the user immediately that they're missing a required field, without having to wait for the server or use JavaScript.
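For example, HTML 5 lets the markup itself ask the browser to enforce a required field -- roughly like this (the field names here are made up for illustration, not what MediaWiki actually emits):

<form method=post action=/w/index.php>
  <input name=summary required>
  <input type=submit value=Save>
</form>

A supporting browser refuses to submit until the field is filled in and tells the user why; anything else just ignores the attribute and the server-side checks catch it as they do now.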
Why be cruel to our bot operators? XHTML is simpler and more consistent than tag soup HTML, and it's a lot easier to find a good XML parser than a good HTML parser.
Because it will make the markup easier to read and write for humans, and smaller. Things like leaving off superfluous closing tags do not make for "tag soup". One of the great features of HTML 5 is that it very carefully defines the text/html parsing model in painstaking backward-compatible detail. For example, the description of unquoted attributes is as follows:
"The attribute name, followed by zero or more space characters, followed by a single U+003D EQUALS SIGN character, followed by zero or more space characters, followed by the attribute value, which, in addition to the requirements given above for attribute values, must not contain any literal space characters, any U+0022 QUOTATION MARK (") characters, U+0027 APOSTROPHE (') characters, U+003D EQUALS SIGN (=) characters, U+003C LESS-THAN SIGN (<) characters, or U+003E GREATER-THAN SIGN (>) characters, and must not be the empty string.
"If an attribute using the unquoted attribute syntax is to be followed by another attribute or by one of the optional U+002F SOLIDUS (/) characters allowed in step 6 of the start tag syntax above, then there must be a space character separating the two." http://dev.w3.org/html5/spec/Overview.html#attributes
Given that browsers need to implement all these complicated algorithms anyway, there's no reason to prohibit the use of convenient shortcuts for authors. They're absolutely well-defined, and even if they're more complicated for machines to parse, they're easier for humans to use than the theoretically simpler XML rules.
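To illustrate the kind of shortcut I mean (this is just an example, not actual MediaWiki output), the following is fully conformant HTML 5 -- unquoted attribute values, omitted </li> end tags -- and it parses to exactly one well-defined DOM in any HTML 5 parser:

<ul class=portlet>
  <li><a href=/wiki/Main_Page>Main Page</a>
  <li><a href=/wiki/Special:RecentChanges title="Recent changes">Recent changes</a>
</ul>

The only attribute that needs quotes is the one whose value contains a space.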
Anyway. Bots should not be scraping the site. They should be using the bot API, which is *vastly* easier to parse for useful data than any variant of HTML or XHTML. We could use this as an opportunity to push bot operators toward using the API -- screen-scraping has always been fragile and should be phased out anyway. Bot operators who screen-scrape will already break on other significant changes anyway; how many screen-scrapers will keep working when Vector becomes the default skin?
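For instance, a bot that wants the current wikitext of a page never has to touch HTML at all; a query along these lines (en.wikipedia.org shown as an example) returns it in a small, predictable structure:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Sandbox&rvprop=content&format=xml

Swap format=xml for format=json (or one of the other output formats) depending on what the bot finds easiest to parse.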
So I view the added difficulty of screen-scraping as a long-term side benefit of switching to HTML 5, like validation failures for presentational elements. It makes behavior that was already undesirable more *obviously* undesirable.
Clearly we can't break all the bots, though. So try breaking XML well-formedness. If there are only a few isolated complaints, go ahead with it. If it causes large-scale breakage, revert and tell all the bot operators to switch to the API, then try again in a few months or a year. Or when we enable Vector, which will probably break all the bots anyway.
So, while I see some benefit to switching to HTML 5, I'd prefer to use XHTML 5 instead.
XHTML 5, by definition, must be served under an XML MIME type. Anything served as text/html is not XHTML 5, and is required to be an HTML (not XHTML) serialization. We cannot serve content under non-text/html MIME types, because that would break IE, so we can't use XHTML 5. Even if we could, it would still be a bad idea. In XHTML 5, as in all XML, well-formedness errors are fatal. And we can't ensure that well-formedness errors are impossible without rewriting a lot of the parser *and* UI code.
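Concretely, which parser a browser applies hinges on the Content-Type header the pages are served with -- something like:

Content-Type: application/xhtml+xml; charset=UTF-8   (XHTML 5: IE won't render it, and any well-formedness error is fatal)
Content-Type: text/html; charset=UTF-8               (HTML 5: what we can actually serve today)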
We can, however, serve HTML 5 that happens to also be well-formed XML. This will allow XML parsers to be used, and is what I propose we do to start with.
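In other words, output along these lines (just a sketch) is simultaneously conformant HTML 5 and well-formed XML, so XML-based tools keep working even though the pages are served as text/html:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Example</title></head>
<body>
<p>Every element explicitly closed, every attribute value quoted.<br /></p>
</body>
</html>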
On Tue, Jul 7, 2009 at 2:48 AM, Gregory Maxwell <gmaxwell@gmail.com> wrote:
What do you think we're doing now? A JPEG 'poster' is displayed. When the user clicks, the poster is replaced by the appropriate playback mechanism.
I'm confused. What we're currently doing (correct me if I'm wrong) is displaying a JPEG <img> as a poster, and replacing it via JavaScript with the appropriate content when it's clicked. What we should do, ideally, is use something like <video src=foo.ogg poster=bar.jpg>, which will cause the poster to be displayed in place of the video on conformant browsers (including Firefox 3.6, but not 3.5). Of course, the <img> can be put in the fallback content for the <video>.
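That is, something roughly like this (same placeholder file names as above):

<video src=foo.ogg poster=bar.jpg controls>
  <img src=bar.jpg alt="Still frame from the video">
</video>

Browsers that understand <video> show bar.jpg until playback starts; everything else falls through to the plain <img> (or to whatever other fallback we put there, such as the Java player).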
I said it needed to be weighed, not that the weighing would come out any particular way. I'm a fan of using Video natively. The fact that it makes save-page work the way it should is really great.
Okay, great.
I'm not sure how you think it currently works, but there is currently zero need to load Cortado for HTML5-supporting browsers.
I was probably confused about what "Cortado" is -- apparently it's only the Java-based player, not the whole JavaScript framework? I never looked into our implementation of this very much. Anyway, the point is we won't have to load the JavaScript logic even if the user does have JavaScript enabled, which is a plus.