Trying my best to limit the length of my reply.
On Sat, Jan 16, 2010 at 23:16, Manu Sporny <msporny@digitalbazaar.com> wrote:
Philip Jägenstedt wrote:
[ed: Microdata] maps well to the RDF model if you want it, but doesn't force authors to think in terms of subject, predicate, object triples.
Well, Microdata /almost/ maps to the RDF model. Microdata doesn't support RDF literal typing, which is basically a fancy way of saying that you can't verify that weights, volumes, speeds, the full range of dates in different calendars, encodings such as chemical compositions, and various other kinds of typed information are expressed cleanly by the Wikipedia contributors.
So, if you wanted to say something like this:
The speed of light is 299792458 m/s.
You would do this in RDFa:
<div about="#light"> The speed of light is <span property="measure:speed" datatype="measure:meters-per-second">299792458</span> m/s. </div>
which would generate the following triple:
<#light> measure:speed "299792458"^^measure:meters-per-second .
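For readers less familiar with RDFa, here is a minimal Python sketch of that mapping: the about, property and datatype attributes plus the element's text content become one typed-literal triple. It is a simplification, not a real RDFa processor: prefixes such as measure: are not expanded to full IRIs, and typed_literal_triple is a hypothetical helper.

```python
def typed_literal_triple(about: str, prop: str, datatype: str, text: str) -> str:
    """Build a Turtle-style triple for a typed literal.

    Hypothetical helper: prefixed names (e.g. measure:speed) are kept
    as written instead of being expanded to full IRIs.
    """
    # Escape backslashes and double quotes inside the literal value.
    value = text.strip().replace("\\", "\\\\").replace('"', '\\"')
    return f'<{about}> {prop} "{value}"^^{datatype} .'


print(typed_literal_triple("#light", "measure:speed",
                           "measure:meters-per-second", "299792458"))
# → <#light> measure:speed "299792458"^^measure:meters-per-second .
```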
AFAIK, there is no way to do the equivalent in Microdata, is there, Philip?
The datatype is part of the vocabulary: if you want to validate your data, you validate it against the vocabulary, not against what the author claims. For example, you'll see that the vCard vocabulary defines its own datatypes: http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#v...
Allowing a mix of different units (like m/s and km/h) for the same property seems risky, but you are correct that this is one of the things in the RDF model that can't be expressed directly using microdata.
The above is how you would do it in RDFa. Philip, I haven't seen any work related to this in Microdata - have there been any recent developments with regard to data validation in Microdata?
There is nothing like automatic validation; your software has to understand a certain vocabulary to be able to say whether the data conforms to the constraints of that particular vocabulary. (I don't know if this is any different from the RDF model or if RDF software is able to "automatically" learn how to validate measure:meters-per-second from just seeing the string "measure:meters-per-second".)
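To illustrate the point that validation comes from knowing the vocabulary rather than from the datatype label itself, here is a minimal Python sketch. The VALIDATORS registry, the validate() function and the rule that a speed is a non-negative decimal number are all assumptions made for the example; a datatype the software was never taught simply cannot be checked.

```python
from decimal import Decimal, InvalidOperation
from typing import Callable, Dict, Optional


def _is_non_negative_decimal(lexical: str) -> bool:
    # Assumed rule, for illustration only: a speed is a non-negative decimal.
    try:
        return Decimal(lexical) >= 0
    except InvalidOperation:  # not a number at all (or NaN)
        return False


# Hypothetical registry: software only "knows" the datatypes someone taught it.
VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "measure:meters-per-second": _is_non_negative_decimal,
}


def validate(lexical: str, datatype: str) -> Optional[bool]:
    """Return True/False for a known datatype, None for an unknown one."""
    check = VALIDATORS.get(datatype)
    if check is None:
        return None  # the label alone tells the software nothing
    return check(lexical)


print(validate("299792458", "measure:meters-per-second"))       # True
print(validate("fast", "measure:meters-per-second"))            # False
print(validate("299792458", "measure:furlongs-per-fortnight"))  # None
```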
So, we get more-or-less the same number of data items out, but there is a problem. What does "title" mean in the semantic sense? Does it mean "job title" or does it mean "work title"? The term "title" in this case is ambiguous.
No, as long as an item type is used (http://n.whatwg.org/work) there is no ambiguity. This particular item type is defined at http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#l...
Here "title" is defined as "Gives the name of the work.", without ambiguity.
This is new! I'm glad this issue was addressed in Microdata as it was one of my criticisms of it when I last read the Microdata spec about six months ago. Looks like that section of the spec was last changed on October 23rd 2009? Do you know when this was put in there, Philip?
Originally microdata used item="http://n.whatwg.org/work", but even then there was no ambiguity about what a particular property meant.
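A small Python sketch of why the item type removes the ambiguity: a consumer looks a property up by the pair (item type, property name), never by the bare name. The DEFINITIONS table and meaning() function are hypothetical, and http://example.org/person is an invented vocabulary; only the "work" definition comes from the WHATWG spec quoted above.

```python
from typing import Dict, Optional, Tuple

# Property definitions keyed by (item type, property name).
# The first entry quotes the WHATWG "work" vocabulary; the second is a
# made-up vocabulary used only to show that "title" can mean something else.
DEFINITIONS: Dict[Tuple[str, str], str] = {
    ("http://n.whatwg.org/work", "title"): "Gives the name of the work.",
    ("http://example.org/person", "title"): "The person's job title.",  # hypothetical
}


def meaning(itemtype: str, prop: str) -> Optional[str]:
    """Return the definition of prop within the vocabulary named by itemtype."""
    return DEFINITIONS.get((itemtype, prop))


print(meaning("http://n.whatwg.org/work", "title"))   # Gives the name of the work.
print(meaning("http://example.org/person", "title"))  # The person's job title.
```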
What happens when an author forgets to include itemtype? So, if somebody does this:
<div itemscope> <span itemprop="title">Emery Molyneux Terrestrial Globe</span> </div>
There's nothing to ground the "title" property. The way I'm reading the spec, it becomes ambiguous at that point, right?
Like Aryeh said, it's not ambiguous, it's meaningless. Microdata allows typeless items for site-private use (much like data-*), but such data *should not* be used by external parties and is in fact ignored by the RDF extraction algorithm.
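As a rough illustration of typeless items being ignored, here is a minimal Python extractor built on the standard library's html.parser. It is an assumption-laden sketch, not the spec's extraction algorithm: it handles only flat, non-nested items, takes a single property name per itemprop, uses plain text content as the value (ignoring the special rules for elements such as a, img, meta and time), and assumes well-formed markup. FlatMicrodataExtractor is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Elements that never get an end tag in HTML; the sketch skips them entirely.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class FlatMicrodataExtractor(HTMLParser):
    """Collect (itemtype, property, text) tuples from flat, typed items.

    Sketch only: no nested items, itemref, itemid, multiple names per
    itemprop, or element-specific value rules; assumes well-formed markup.
    """

    def __init__(self) -> None:
        super().__init__()
        self.triples: List[Tuple[str, str, str]] = []
        self._depth = 0                         # nesting depth of open elements
        self._item_depth: Optional[int] = None  # depth of the open itemscope
        self._itemtype: Optional[str] = None
        self._prop: Optional[str] = None
        self._prop_depth: Optional[int] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        self._depth += 1
        a = dict(attrs)
        if "itemscope" in a and self._item_depth is None:
            self._item_depth = self._depth
            self._itemtype = a.get("itemtype")  # None for a typeless item
        elif "itemprop" in a and self._item_depth is not None and self._prop is None:
            self._prop = a["itemprop"]
            self._prop_depth = self._depth
            self._text = []

    def handle_data(self, data):
        if self._prop is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self._prop is not None and self._depth == self._prop_depth:
            # Typeless items are site-private, so their properties are dropped.
            if self._itemtype:
                value = "".join(self._text).strip()
                self.triples.append((self._itemtype, self._prop, value))
            self._prop = None
        if self._depth == self._item_depth:
            self._item_depth = None
            self._itemtype = None
        self._depth -= 1


page = (
    '<div itemscope itemtype="http://n.whatwg.org/work">'
    '<span itemprop="title">Emery Molyneux Terrestrial Globe</span></div>'
    '<div itemscope>'
    '<span itemprop="title">Emery Molyneux Terrestrial Globe</span></div>'
)
parser = FlatMicrodataExtractor()
parser.feed(page)
parser.close()
print(parser.triples)
# → [('http://n.whatwg.org/work', 'title', 'Emery Molyneux Terrestrial Globe')]
```

Only the typed item survives; the second, typeless copy of the same markup contributes nothing.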
... and with the added danger of expressing ambiguous data. This is not the real danger, though. While data ambiguity is really bad when it comes to data stores, centralized vocabulary management is even worse.
Anyone can make up a vocabulary, just point to it in itemtype. The WHATWG maintains a few core vocabularies, but I expect that new vocabularies will be developed independently by communities like microformats.
Philip, could you give us an update on what the WHATWG sees as the publishing process for Microdata vocabularies? For example, if Wikipedia wanted to start expressing royal bloodlines using a vocabulary specific to Wikipedia, how would they go about getting that vocabulary into the HTML5 Microdata specification?
No process, just do it :)
Finally, I will note that it is very likely that the microdata DOM APIs will get implemented in browsers, making the semantic data available to scrapers, to native browser interfaces and to browser extensions such as user JavaScript. As an example, you might see an icon in the address bar for saving events to a calendar, or the license information of an image displayed in the native properties dialog. I stress again that I don't make any promises on behalf of Opera or any other browser vendor; these are just my predictions.
Again, this is exciting news and while I don't think Microdata is the proper solution for the Web, for the same reasons that are outlined above and many more, I'm delighted to hear that Opera is taking in-browser semantic data expression very seriously. How far we have come in just 18 months! :)
I will stress again that I don't speak for Opera in these matters, but I do think that microdata in many ways bridges the gap between the "browsable web" and the "semantic web" (actually, there is only one web). Browsers already do add some UI features based on the data in documents (apart from rendering), e.g. exposing RSS feeds in the address bar or navigating to the next page based on rel="next". Microdata isn't really new in that regard, it just adds some new data for browsers to expose.