Trying my best to limit length of reply.
On Sat, Jan 16, 2010 at 23:16, Manu Sporny <msporny(a)digitalbazaar.com> wrote:
Philip Jägenstedt wrote:
[ed: Microdata] maps well to the
RDF model if you want it, but doesn't force authors to think in terms
of subject, predicate, object triples.
Well, Microdata /almost/ maps to the RDF model. Microdata doesn't
support RDF literal typing, which is basically a fancy way of saying
that you can't verify that weights, volumes, speeds, the full range of
dates in different calendars, encodings such as chemical compositions,
and varying other typed information is expressed cleanly by the
Wikipedia contributors.
So, if you wanted to say something like this:
The speed of light is 299792458 m/s.
You would do this in RDFa:
<div about="#light">
The speed of light is <span property="measure:speed"
datatype="measure:meters-per-second">299792458</span> m/s.
</div>
which would generate the following triple:
<#light>
measure:speed
"299792458"^^measure:meters-per-second .
AFAIK, there is no way to do the equivalent in Microdata, is there Philip?
The datatype is part of the vocabulary. If you want to validate your
data, you validate it against the vocabulary, not against what the
author claims. For example, you'll see that the vCard vocabulary
defines its own datatypes:
http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#…
Allowing different types to be mixed (like m/s and km/h) seems risky,
but it is one of the things that exist in the RDF model that can't be
expressed directly using microdata; that is correct.
The above is how you would do it in RDFa. Philip, I haven't seen any
work related to this in Microdata - have there been any recent
developments with regard to data validation in Microdata?
There is nothing like automatic validation; your software has to
understand a certain vocabulary to be able to say if the data conforms
to the constraints of that particular vocabulary. (I don't know if
this is any different from the RDF model or if RDF software is able to
"automatically" learn how to validate measure:meters-per-second from
just seeing the string "measure:meters-per-second".)
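As a rough illustration of validating against a vocabulary rather than against an author-supplied datatype, here is a minimal Python sketch. The vocabulary, the property name "speed" and the helper names are all hypothetical; nothing here comes from the Microdata or RDFa specifications.

```python
import re


def is_non_negative_number(value):
    """Accept plain decimal numbers such as "299792458" or "3.5"."""
    return re.fullmatch(r"\d+(\.\d+)?", value.strip()) is not None


# Hypothetical vocabulary: each property name maps to a check that the
# consuming software already knows, independent of what the page claims.
MEASURE_VOCABULARY = {
    "speed": is_non_negative_number,  # assumed to be metres per second
}


def validate(properties, vocabulary):
    """Return a list of problems found in one item's properties."""
    errors = []
    for name, value in properties.items():
        check = vocabulary.get(name)
        if check is None:
            errors.append(f"{name}: not defined by this vocabulary")
        elif not check(value):
            errors.append(f"{name}: invalid value {value!r}")
    return errors


print(validate({"speed": "299792458"}, MEASURE_VOCABULARY))
print(validate({"speed": "fast"}, MEASURE_VOCABULARY))
```

The point of the sketch is only that the constraint lives with the consumer's knowledge of the vocabulary; software that has never seen the vocabulary has nothing to check against.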
So, we get more-or-less the same number of data items out, but there is
a problem. What does "title" mean in the semantic sense? Does it mean
"job title" or does it mean "work title"? The term "title" in this case
is ambiguous.
No, as long as an item type is used (http://n.whatwg.org/work) there
is no ambiguity. This particular item type is defined at
http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#…
Title here "Gives the name of the work." without ambiguity.
This is new! I'm glad this issue was addressed in Microdata as it was
one of my criticisms of it when I last read the Microdata spec about six
months ago. Looks like that section of the spec was last changed on
October 23rd 2009? Do you know when this was put in there, Philip?
Originally microdata used item="http://n.whatwg.org/work", but even
then there was no ambiguity about what a particular property meant.
What happens when an author forgets to include itemtype? So, if somebody
does this:
<div itemscope>
<span itemprop="title">Emery Molyneux Terrestrial Globe</span>
</div>
There's nothing to ground the "title" property. The way I'm reading the
spec, it becomes ambiguous at that point, right?
Like Aryeh said it's not ambiguous, it's meaningless. Microdata allows
typeless items for site-private use (much like data-*), but such data
*should not* be used by external parties and is in fact ignored by the
RDF extraction algorithm.
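To make the typed versus typeless distinction concrete, here is a simplified Python sketch of a consumer that collects items and exports only those with an itemtype. It is an assumption-laden illustration, not the extraction algorithm from the spec: it handles only flat, non-nested items with text-valued properties, and the names ItemCollector, collect_items and exportable are made up for this example.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not affect depth tracking.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ItemCollector(HTMLParser):
    """Collects flat microdata items (no nesting, text values only)."""

    def __init__(self):
        super().__init__()
        self.items = []      # finished items
        self.depth = 0       # number of currently open elements
        self.current = None  # item being filled, if any
        self.prop = None     # (name, depth) of the property being read
        self.buffer = []     # text collected for that property

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        self.depth += 1
        attrs = dict(attrs)
        if "itemscope" in attrs and self.current is None:
            self.current = {"type": attrs.get("itemtype"),
                            "props": {}, "depth": self.depth}
        elif "itemprop" in attrs and self.current is not None:
            self.prop = (attrs["itemprop"], self.depth)
            self.buffer = []

    def handle_data(self, data):
        if self.prop is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.prop is not None and self.prop[1] == self.depth:
            self.current["props"][self.prop[0]] = "".join(self.buffer).strip()
            self.prop = None
        if self.current is not None and self.current["depth"] == self.depth:
            self.items.append(self.current)
            self.current = None
        self.depth -= 1


def collect_items(html):
    parser = ItemCollector()
    parser.feed(html)
    parser.close()
    return parser.items


def exportable(items):
    """Return (item type, property, value) tuples, skipping typeless items."""
    return [(item["type"], name, value)
            for item in items if item["type"]
            for name, value in item["props"].items()]


SAMPLE = (
    '<div itemscope>'
    '<span itemprop="title">Emery Molyneux Terrestrial Globe</span></div>'
    '<div itemscope itemtype="http://n.whatwg.org/work">'
    '<span itemprop="title">Emery Molyneux Terrestrial Globe</span></div>'
)

print(exportable(collect_items(SAMPLE)))
```

Both items are parsed, but only the one carrying an itemtype is exported; the typeless item stays available for site-private use and is simply skipped.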
... and with the added danger of expressing ambiguous data. This is not
the real danger, though. While data ambiguity is really bad when it
comes to data stores, centralized vocabulary management is even worse.
Anyone can make up a vocabulary, just point to it in itemtype. The
WHATWG maintains a few core vocabularies, but I expect that new
vocabularies will be developed independently by communities like
microformats.
Philip, could you give us an update on what the WHATWG sees as the
publishing process for Microdata vocabularies? For example, if Wikipedia
wanted to start expressing royal bloodlines using a vocabulary specific
to Wikipedia, how would they go about getting that vocabulary into the
HTML5 Microdata specification?
No process, just do it :)
Finally, I will note that it is very likely that the microdata DOM APIs
will get implemented in browsers, making the semantic data available
to scrapers, to native browser interfaces and to browser
extensions such as user JavaScript. As an example, you might see an
icon in the address bar for saving events to a calendar, or the
license information of an image displayed in the native properties
dialog. I stress again that I don't make any promises on behalf of
Opera or any other browser vendor, these are just my predictions.
Again, this is exciting news and while I don't think Microdata is the
proper solution for the Web, for the same reasons that are outlined
above and many more, I'm delighted to hear that Opera is taking
in-browser semantic data expression very seriously. How far we have come
in just 18 months! :)
I will stress again that I don't speak for Opera in these matters, but
I do think that microdata in many ways bridges the gap between the
"browsable web" and the "semantic web" (actually, there is only one
web). Browsers already do add some UI features based on the data in
documents (apart from rendering), e.g. exposing RSS feeds in the
address bar or navigating to the next page based on rel="next".
Microdata isn't really new in that regard, it just adds some new data
for browsers to expose.
--
Philip Jägenstedt