On Mon, Jan 18, 2010 at 5:34 PM, Manu Sporny <msporny@digitalbazaar.com> wrote:
> Not necessarily. JavaScript can use the RDFa on the page to generate more intuitive interfaces for the page.
Sure, but if we're providing the JavaScript, we could do it without RDFa just as well. Or can you provide a specific case where you think it would be easier for MediaWiki to implement some feature via RDFa (or microdata) than via any other means, not counting communication with outside software? Such cases might exist (like if there's a library to do it that already happens to use RDFa), but they'd be hard to find and debatable at best, I suspect.
> Placing this data in your HTML documents has a direct impact on browsers and search engines. Browsers can collect triples and use them later to help you answer questions that you may have about a particular subject. Search engines can crawl the HTML and make their indexes more accurate based on semantic data that Wikipedia's pages expose.
*Can*. Yes, in theory. But do they? Will they? If not, then it's probably not worth putting much work into it so speculatively, especially if it increases the complexity of editing. On the other hand, if they do implement feature X when you provide in-page metadata, would they be equally willing to use a separate RDF stream?
> RDF/XML, which was largely unsuccessful, was designed to be used for publishing in a dual-stream setup. It was expected that web publishers would publish semantic data beside web page data, just as you've proposed that Wikipedia do, but managing both types of serialization proved far too difficult for most sites.
Is it managing two serializations that was the problem? Or just that most sites aren't willing to encode data in the hope that some consumer somewhere might use it for something in the future? Personally, I don't think it would be hard at all to maintain multiple data streams. The content is all script-generated anyway. We already have multiple ways to access the same data or subsets thereof in various formats, like:
http://en.wikipedia.org/wiki/RDFa
http://en.wikipedia.org/wiki/RDFa?action=raw
http://en.wikipedia.org/w/api.php?action=query&prop=categories&title...
http://en.wikipedia.org/w/api.php?action=query&prop=extlinks&titles=...
http://en.wikipedia.org/w/api.php?action=query&prop=templates&titles=...
and many others. You can append &format=xml to the API queries to get them in proper XML, or &format=json for JSON, php for PHP array syntax, yaml for YAML, txt for plaintext, etc. It would be pretty simple to write a new API module or query prop or whatever that would retrieve any type of data from the wikitext of the page and format it as RDF or whatever else you liked.
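For example, a hypothetical prop module (prop=rdf is a name I'm inventing here; no such module exists today) could be queried the same way as the ones above:

  http://en.wikipedia.org/w/api.php?action=query&prop=rdf&titles=RDFa&format=xml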
> Wikipedia is already short on developers; creating a new data stream is just going to exacerbate the problem.
No, it would be pretty simple, in my opinion as a MediaWiki developer.
> There are new features going into RDFa 1.1, but classifying them as "major" changes makes it sound like RDFa 1.1 isn't going to be backwards-compatible with RDFa 1.0, when it most definitely is going to be backwards-compatible (except possibly for XMLLiterals, which was our bad).
I apologize if I inadvertently misrepresented the status of RDFa 1.1. I'm not familiar with RDFa, as I said.
> There's a huge difference in both stability and standard-ness - XHTML+RDFa is a W3C REC - it's a standard. Microdata and HTML+RDFa aren't even close to becoming a W3C REC. That's very important information for this community to consider.
> When do you think that Microdata is going to be a REC at the W3C?
I don't really care about formal status at the W3C. I care about providing useful features to users of Wikipedia and other MediaWiki wikis. Both RDFa and microdata are stable and usable enough right now that I think it's appropriate to evaluate them on their technical merits, not their theoretical spec status. We use plenty of things that aren't specified by any conventional standards body, like rel="canonical", OpenSearch, RSS, and so on. As long as they're well-specified de facto standards, it doesn't really matter who specifies them or what that group labels them -- why should it?
> There were changes to the Microdata spec made by Ian less than 12 hours ago (January 18th 2010). If a spec is being actively edited, I don't think it's a good idea to say that it's stable and ready for deployment:
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2010-January/024760.html
I don't see why not, as long as the editor is committed to avoiding backward-incompatible changes if possible. In the unlikely event something major comes up and there is such a change, it's not the end of the world -- we can deal with it when it comes up.
> Microdata doesn't support data typing (via @datatype),
More precisely, it leaves it up to each vocabulary to determine how to handle data typing.
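For comparison, typed data in RDFa 1.0 looks roughly like this (assuming the dc: and xsd: prefixes are declared via xmlns: on some ancestor element):

  <span property="dc:date" datatype="xsd:date">2010-01-18</span>

A microdata consumer would instead have to learn from the vocabulary's own definition that the value is a date.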
> data value overriding (via @content),
<meta itemprop="foo" content="bar">?
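That seems to cover the same ground: where RDFa would override the visible text with something like <span property="dc:date" content="2010-01-18">January 18</span>, microdata moves the machine-readable value into a <meta> (or a <link>, for URLs) inside the item. Same capability, different mechanism.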
> doesn't support URI short-handing via CURIEs (via @xmlns:PREFIX),
It doesn't require URIs to be used for anything except one itemtype per item, so this isn't a big deal if you only have a few items of any given type per page (which would usually be the case for, e.g., image licenses).
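An image description page might need no more than this (the itemtype URL is invented, just for illustration):

  <div itemscope itemtype="http://example.org/vocab#Image">
    <a itemprop="license" href="http://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA 3.0</a>
  </div>

One full URI for the item type; the property name is a plain word, and the license URL is a link we'd be displaying anyway.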
> and it doesn't support anonymous subjects via bnodes (blank nodes).
I'm not sure what this even means. :)
> I do also think that Microdata repeats several really big mistakes that we made in the Microformats community and that were corrected in the RDFa community. Namely, not using CURIEs, and requiring that all URLs be repeated as many times as they're used. It's fine as an option, but not that great if one has to repeat a URL 50 times in a web page... which Wikipedia will eventually have to do if it is using Microdata.
Not if we only use it for a few things, like image licenses. Those are only displayed on the image description page, so it would be once per page in that case. I don't propose we use it for anything where we'd have fifty items per page.
RDFa seems longer even if you don't count the xmlns: stuff, anyway. Above, I found that a microdata example added 145 characters to the base markup, while equivalent RDFa (with xmlns:) added 305 characters. If you remove the two xmlns: declarations, I count only 86 characters saved, so RDFa still adds 219 characters, 50% more than microdata. So dropping the xmlns: declarations saves some space at best, but microdata is still significantly shorter than RDFa, at least for this example.
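To make that concrete, here's a rough side-by-side sketch (invented example.org vocabulary, not the exact example from earlier in the thread):

Microdata:

  <div itemscope itemtype="http://example.org/vocab#Photo">
    <span itemprop="title">Rotunda</span>
    <a itemprop="license" href="http://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA</a>
  </div>

RDFa 1.0:

  <div xmlns:ex="http://example.org/vocab#" typeof="ex:Photo">
    <span property="ex:title">Rotunda</span>
    <a rel="ex:license" href="http://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA</a>
  </div>

The structure is the same, but RDFa pays for the xmlns: declaration plus a prefix on every property, while microdata pays for exactly one full URI per item.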
> That's provably false. Microdata vocabulary validation is hard-coded in the specification. Dan Brickley and Ian Hickson had an IRC conversation about just this today[1]. In order to validate Microdata, you must first convert it to RDF, and even if you do, it will fail attempts to validate the literals that should have a datatype. If you want a Microdata vocabulary validator, you have to create one for each vocabulary... just like we had to do in the Microformats community, which some of us now recognize as a catastrophic mistake.
> RDFa, via RDF, allows arbitrary data validation - one validator with any number of vocabularies. Microdata does not allow arbitrary validation - there must be one hard-coded validator per vocabulary.
I think you agreed with what I said. Both microdata and RDFa allow validation. RDFa allows some validation constraints to be expressed in a standard form, so they can be checked by generic RDFa validators. Microdata does not.
But it's not clear to me that this is a disadvantage in practice. Presumably anything that actually uses the data will have to be smart enough to discard invalid data anyway, at no extra cost, so why not just do it at that stage? Or, if you're using a very small set of vocabularies as I propose MediaWiki does, you can assume that validators will exist for them anyway.