On Mon, Jan 18, 2010 at 23:34, Manu Sporny <msporny@digitalbazaar.com> wrote:
You cannot, however, express RDF fully in Microdata - it is impossible in cases where it matters to Wikipedia (like data-typing).
I'm not a Wikipedia developer or a particularly active editor, but it seems quite doubtful that XML Schema datatypes matter to Wikipedia. Perhaps I haven't understood RDFa, but surely the vocabulary must define the datatype? If not, is @datatype a mandatory attribute that just adds dead weight all over the place? And if vocabularies do define the datatypes, why do you need to override them?
I do also think that Microdata has made several really big mistakes that we made in the Microformats community that were corrected in the RDFa community. Namely, not using CURIEs and adding the requirement that all URLs are repeated as many times as they're used. It's fine as an option, but not that great if one has to repeat a URL 50 times in a web page... which Wikipedia will eventually have to do if it is using Microdata.
There are other solutions to the "URLs are long" problem than prefix schemes. Incidentally http://n.whatwg.org/work is rather short, and I hope future vocabularies will have the good taste to use even shorter URLs.
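For readers unfamiliar with the prefix scheme being debated, here is a minimal Python sketch of what CURIE expansion amounts to: a short prefix is declared once and then stands in for a long vocabulary URL. The prefix table and the function name expand_curie are illustrative assumptions, not taken from either specification, and real RDFa processing has more rules (default prefixes, safe CURIEs) than this shows.

```python
# Hypothetical prefix table: each prefix is declared once per document.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' into a full URL using the prefix table.

    Simplified: raises ValueError when there is no colon or the prefix
    has not been declared.
    """
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError("undeclared or missing prefix in %r" % curie)
    return prefixes[prefix] + reference


if __name__ == "__main__":
    # The long URL appears once in the table, however often "dc:" is used.
    print(expand_curie("dc:title", PREFIXES))
    print(expand_curie("foaf:name", PREFIXES))
```

The trade-off is the one argued above: the markup gets shorter, but a consumer has to track the prefix declarations to know what a property name means.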
That's provably false. Microdata vocabulary validation is hard-coded in the specification. Dan Brickley and Ian Hickson had an IRC conversation about just this today[1]. In order to validate Microdata, you must first convert it to RDF, and even if you do, it will fail attempts to validate the literals that should have a datatype.
Is the only kind of validation RDF provides a check that a value is the kind of data it claims to be? That sounds similar to doctypes, and just as unhelpful. What if the author doesn't set the datatype?
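To make the question concrete, this is roughly what I understand a datatype check on a typed literal to be, sketched in Python. It is an assumption-laden illustration, not how any RDFa tool actually works: the function name check_literal is made up, only three XML Schema datatypes are covered, and the lexical rules are simplified (no time zones on dates, for instance).

```python
import re
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"


def _is_integer(text):
    # Simplified xsd:integer lexical form: optional sign, then digits.
    return re.fullmatch(r"[+-]?[0-9]+", text) is not None


def _is_boolean(text):
    return text in {"true", "false", "1", "0"}


def _is_date(text):
    # Simplified xsd:date: YYYY-MM-DD only, no time zone, no negative years.
    match = re.fullmatch(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", text)
    if match is None:
        return False
    try:
        date(int(match.group(1)), int(match.group(2)), int(match.group(3)))
    except ValueError:
        return False  # e.g. February 30th
    return True


CHECKERS = {
    XSD + "integer": _is_integer,
    XSD + "boolean": _is_boolean,
    XSD + "date": _is_date,
}


def check_literal(value, datatype):
    """Return True or False if the datatype is known, else None.

    None covers both an unknown datatype and the case where the author
    set no datatype at all: there is then nothing to check against.
    """
    checker = CHECKERS.get(datatype)
    if checker is None:
        return None
    return checker(value)


if __name__ == "__main__":
    print(check_literal("2010-01-18", XSD + "date"))
    print(check_literal("2010-02-30", XSD + "date"))
    print(check_literal("some text", None))
```

Note that the last case is exactly my question: with no declared datatype, a check like this has nothing to say.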
If you want a Microdata vocabulary validator, you have to create one for each vocabulary... just like we had to do in the Microformats community, which some of us now recognize as a catastrophic mistake.
RDFa, via RDF, allows arbitrary data validation - one validator with any number of vocabularies. Microdata does not allow arbitrary validation - there must be one hard-coded validator per vocabulary.
What are the exact mechanisms here? Does an RDFa validator dereference all predicates and try to get an RDF Schema to validate against? Doesn't that destroy any web server which hosts schemas for popular vocabularies (like with W3C doctypes)? On the other hand, if only the document itself is used, what kind of validation can be meaningful?
In any case, validators for microdata are something to be worked on, but I don't think either dereferencing vocabulary URLs or an official schema language is likely to be part of the solution (the latter because you need a full programming language to validate certain types of data, not just grammar rules).
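As one illustration of the last point, consider a property whose value is an ISBN-13. This example is my own choice, not something from a particular microdata vocabulary. A grammar rule can require 13 digits, but whether the check digit is right depends on arithmetic over the other twelve, which is far more naturally written as code. A minimal Python sketch, with a made-up function name:

```python
def is_valid_isbn13(text):
    """Check an ISBN-13 string, ignoring hyphens and spaces.

    The shape test (13 ASCII digits) is what a grammar could express;
    the weighted checksum below is the part that needs arithmetic.
    """
    cleaned = text.replace("-", "").replace(" ", "")
    if len(cleaned) != 13 or not all(c in "0123456789" for c in cleaned):
        return False
    # Weights alternate 1, 3, 1, 3, ... across all 13 digits; a valid
    # ISBN-13 gives a total that is divisible by 10.
    total = sum(
        int(c) * (1 if index % 2 == 0 else 3)
        for index, c in enumerate(cleaned)
    )
    return total % 10 == 0


if __name__ == "__main__":
    print(is_valid_isbn13("978-0-306-40615-7"))  # correct check digit
    print(is_valid_isbn13("978-0-306-40615-8"))  # wrong check digit
```

Both inputs have the right shape; only the computation tells them apart.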