I don't suppose that the members of this list appreciate the epic Microdata vs. RDFa battle leaking into this mailing list, but I want to address a few inaccuracies below.
Introduction: I work for Opera Software and have been active in the WHATWG and W3C HTML WG developing HTML5 for the last year and a half. I believe I have a good understanding of what browser vendors are likely or unlikely to support, although I don't speak for or make any promises on behalf of Opera Software in this mail.
I have also worked on implementing the microdata DOM API in JavaScript, an ongoing experiment at http://gitorious.org/microdatajs and will be able to answer any technical questions about the processing of microdata. In short, I can only say that it is really quite intuitive and simple, with few surprises. It maps well to the RDF model if you want it, but doesn't force authors to think in terms of subject, predicate, object triples.
On Sat, Jan 16, 2010 at 06:32, Manu Sporny <msporny@digitalbazaar.com> wrote:
Aryeh Gregor <Simetrical+wikilist <at> gmail.com> writes:
[snip]
The compactness of the markup between Microdata and RDFa is more or less the same in this particular example. There are some things that are easier to express in Microdata and there are some things that are easier to express in RDFa. We get the following Microdata out:
type http://n.whatwg.org/work
work http://upload.wikimedia.org/...terrestrialglobe-1592-20061127.jpg
title "Emery Molyneux Terrestrial Globe"
author "Bob Smith"
license http://creativecommons.org/licenses/by-sa/3.0/us/
So, we get more-or-less the same number of data items out, but there is a problem. What does "title" mean in the semantic sense? Does it mean "job title" or does it mean "work title"? The term "title" in this case is ambiguous.
No, as long as an item type is used (http://n.whatwg.org/work) there is no ambiguity. This particular item type is defined at http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#l...
There, "title" is defined as "Gives the name of the work.", which leaves no ambiguity.
Furthermore, for this particular vocabulary the mapping to RDF is defined as follows:
title: http://purl.org/dc/elements/1.1/title
author: http://creativecommons.org/ns#attributionName
license: http://www.w3.org/1999/xhtml/vocab#license
In other words, you express exactly the same information as with RDFa, but without the mental overhead of triples or of mixing multiple vocabularies.
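To make the mapping concrete, here is a minimal Python sketch of how a consumer could turn the example item into triples. The dict layout of the item and the function name are made up for illustration, and using the value of the "work" property as the subject is my reading of the vocabulary rather than something stated in this thread.

```python
# Property-to-predicate table, copied from the mapping quoted above.
WORK_PREDICATES = {
    "title": "http://purl.org/dc/elements/1.1/title",
    "author": "http://creativecommons.org/ns#attributionName",
    "license": "http://www.w3.org/1999/xhtml/vocab#license",
}


def work_item_to_triples(item):
    """Return (subject, predicate, object) tuples for one work item.

    Assumption: the first value of the "work" property is the subject.
    """
    props = item["properties"]
    subject = props["work"][0]
    triples = []
    for name, predicate in WORK_PREDICATES.items():
        for value in props.get(name, []):
            triples.append((subject, predicate, value))
    return triples


# The example item from this thread; the truncated URL is kept as quoted.
example_item = {
    "type": "http://n.whatwg.org/work",
    "properties": {
        "work": ["http://upload.wikimedia.org/...terrestrialglobe-1592-20061127.jpg"],
        "title": ["Emery Molyneux Terrestrial Globe"],
        "author": ["Bob Smith"],
        "license": ["http://creativecommons.org/licenses/by-sa/3.0/us/"],
    },
}

triples = work_item_to_triples(example_item)
for triple in triples:
    print(triple)
```

The author only ever writes plain names such as "title"; the predicate URLs live in the vocabulary definition and are applied by the consumer.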
Concern #2:
Getting Microdata and RDFa markup correct is easier if there are templates or if the semantic markup is performed automatically by the CMS based on a pre-defined form. For example, http://en.wikipedia.org/wiki/Augustus, note the Infobox on the right. It would be much better for the RDFa markup to happen automatically via MediaWiki's template process, than for it to be marked up by hand.
Certainly, but if wiki editors are *able* to do it by hand, then IMHO microdata is much less error-prone.
However - XHTML1+RDFa is a published W3C Recommendation and it is safe
Is Wikipedia using XHTML served as application/xhtml+xml? It seems that RDFa in "XHTML" as deployed only works because consumers pretend that the data is XHTML even though it is served as text/html and treated as such by browsers. I would assume that most pages using RDFa today are neither valid XHTML nor served with the XHTML MIME type. Any attempt to use browser DOM APIs to access the data will have surprising or confusing results, because HTML doesn't have namespaces, yet RDFa uses the namespace prefix syntax.
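A rough illustration of the namespace point, using Python's standard library parsers as stand-ins (they are not a browser DOM, and the markup snippet is invented for the example): an HTML parser reports `xmlns:dc` as an ordinary attribute whose name happens to contain a colon, while an XML parser consumes it as a namespace declaration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical RDFa-style snippet, invented for this illustration.
MARKUP = (
    '<div xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<span property="dc:title">Globe</span>'
    '</div>'
)


class AttrCollector(HTMLParser):
    """Record the attributes of each start tag, keyed by tag name."""

    def __init__(self):
        super().__init__()
        self.attrs_by_tag = {}

    def handle_starttag(self, tag, attrs):
        self.attrs_by_tag[tag] = dict(attrs)


collector = AttrCollector()
collector.feed(MARKUP)
collector.close()

# HTML view: "xmlns:dc" is just another attribute name.
html_div_attrs = collector.attrs_by_tag["div"]

# XML view: the declaration is handled by the parser and is not
# reported as an attribute of the element.
xml_div_attrs = ET.fromstring(MARKUP).attrib

print(html_div_attrs)
print(xml_div_attrs)
```

In both cases the value "dc:title" remains an opaque string, so a text/html consumer has to find and resolve the prefix itself.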
Concern #4:
While I can't fault Aryeh's enthusiasm, I am now concerned that there may be questions in this community that are going unanswered related to RDFa and Microdata. I hope this will be a deliberate process as it is easy to get semantic data markup wrong (regardless of the implementation language - Microformats, Microdata or RDFa).
Agreed.
The microdata spec for the curious: http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html
Finally, I will note that it is very likely that the microdata DOM APIs will get implemented in browsers, making the semantic data available to scrapers, to native browser interfaces, and to browser extensions such as user JavaScript. As an example, you might see an icon in the address bar for saving events to a calendar, or the license information of an image displayed in the native properties dialog. I stress again that I don't make any promises on behalf of Opera or any other browser vendor; these are just my predictions.
In other goodies, microdata already has a defined mapping to JSON, so dumping all embedded data as JSON via a web interface would be quite trivial, and it would use the same format that you will get from browsers once they have implemented some of the DOM APIs.
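As a sketch of what such a dump might look like, the Python below serialises the example item from this thread. The exact JSON shape (a top-level "items" array, with each property holding a list of values) is an assumption modelled on the spec's conversion; check the spec before relying on it. The function name is made up for illustration.

```python
import json


def items_to_json(items):
    """Serialise already-extracted microdata items as one JSON document.

    Assumption: the output is an object with a single "items" array.
    """
    return json.dumps({"items": items}, indent=2, sort_keys=True)


# The example item from this thread; the truncated URL is kept as quoted.
example_item = {
    "type": "http://n.whatwg.org/work",
    "properties": {
        "work": ["http://upload.wikimedia.org/...terrestrialglobe-1592-20061127.jpg"],
        "title": ["Emery Molyneux Terrestrial Globe"],
        "author": ["Bob Smith"],
        "license": ["http://creativecommons.org/licenses/by-sa/3.0/us/"],
    },
}

json_text = items_to_json([example_item])
print(json_text)
```

Extracting the items from a page is the harder part and is not shown here; once they are extracted, the dump itself is a single call.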