On Mon, Jan 18, 2010 at 7:47 AM, Henri Sivonen hsivonen@iki.fi wrote:
> It's true that both HTML+RDFa and Microdata have been published in Working Drafts at the W3C. However, Microdata has never been through a Working Group Decision to publish as a First Public Working Draft while HTML+RDFa has. Microdata was added to a Working Draft after FPWD and there has since been a Working Group decision to take Microdata out of that spec.
> It is reasonable to expect that soon HTML+RDFa and Microdata could be in the same stage Process-wise, but it's inaccurate to portray them as being at the same stage Process-wise right now.
I simplified a bit, yes. The current Working Draft of HTML contains microdata and the next one won't, but it seems almost certain that a microdata FPWD will be published concurrently with the next HTML WD. So they're about the same -- both currently at WD, both almost certain to still be in WD at the next publication (microdata a bit less certain, but not much IMO).
On the other hand, RDFa+XHTML is a REC at the W3C, and currently we do output well-formed valid XHTML 1.0, albeit with a text/html MIME type. On the other other hand, microdata is at Last Call at the WHATWG, and we have an assurance of stability from its editor, independent of its status at the W3C. On the other other other hand, RDFa 1.1 is under development and looks like it will make major changes, so from that perspective microdata is arguably more stable.
So, it's complicated. :) But from our perspective, I don't think there's a big difference in terms of stability or standard-ness, so I skipped over all this.
On Mon, Jan 18, 2010 at 8:52 AM, Neil Harris neil@tonal.clara.co.uk wrote:
> Since both RDFa and Microdata support the same underlying data model, and it's likely to take some time to resolve which will be the eventual winner, perhaps we should decouple the generation of the final HTML output from the markup of semantic text in articles.
> Since it makes no sense to implement yet another incompatible "semantic wikitext" format for internal use, we will probably end up using something that is pretty close to one or the other, buried inside templates, to perform the actual in-wiki markup. Given this, is it worth considering which is easier for template authors to write, and which is easier to convert to the other -- RDFa to microdata or vice-versa?
AFAIK, Microdata is slightly less expressive than RDFa (it can't express cycles or something like that -- maybe someone else could clarify?), so converting the microdata graph to RDFa might be easier than the reverse. I also think microdata is much easier to author for people with an HTML (not RDF) background -- template editors tend to have a good working knowledge of HTML, but not web-data technologies. I'd be interested in what Manu (or other RDFa supporters) has to say here.
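To make the microdata-to-RDF direction concrete, here is a rough Python sketch. It assumes a microdata item has already been parsed into a nested dict (a hypothetical shape with `type` and `properties` keys, not any parser's real output) and flattens it into subject/predicate/object triples, giving each item a fresh blank node. The `rdf:type` predicate and the bare property names are placeholders; a real converter would have to map them to full URIs.

```python
from itertools import count


def item_to_triples(item, _ids=None):
    """Flatten a nested microdata-style item into (subject, predicate, object) triples.

    `item` is a hypothetical dict: {"type": str or None, "properties": {name: [values]}},
    where each value is either a string or another item dict.
    Returns (subject_id, triples).
    """
    ids = _ids if _ids is not None else count()
    subject = f"_:b{next(ids)}"  # every item becomes a new blank node
    triples = []
    if item.get("type"):
        triples.append((subject, "rdf:type", item["type"]))
    for name, values in item["properties"].items():
        for value in values:
            if isinstance(value, dict):
                # Nested item: link to it, then append its own triples.
                child_subject, child_triples = item_to_triples(value, ids)
                triples.append((subject, name, child_subject))
                triples.extend(child_triples)
            else:
                triples.append((subject, name, value))
    return subject, triples


work = {
    "type": "http://n.whatwg.org/work",
    "properties": {
        "title": ["Example"],
        "author": [{"type": None, "properties": {"name": ["A"]}}],
    },
}
subject, triples = item_to_triples(work)
for triple in triples:
    print(triple)
```

Going the other way would need extra decisions (which node becomes the top-level item, what to do with a node referenced from two places), which this sketch does not attempt.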
On Mon, Jan 18, 2010 at 9:46 AM, Daniel Kinzler daniel@brightbyte.de wrote:
> Perhaps the right approach for us would be to have "some" syntax for providing this info, and then generating html5 microdata and/or rdfa into the rendered html, write the triple into a smw backend store, and provide rdf/xml/n3/whatever output via the api.
> there are three aspects here: specify, store, output. perhaps we should look at them separately.
There are two separate things we want to do here, IMO:
1) Output a very few pieces of metadata that would be useful to HTML consumers, like license metadata. For these, we should use microdata or RDFa, maybe just with one or two vocabularies whitelisted, and it would be simplest to just let people type it into templates via wikitext. I'm pretty certain about this.
2) Output more generic metadata extracted from infoboxes and such. For this, I think we should use a separate RDF stream. I don't think we need to do conversion here -- we should be able to just publish template parameter triples as RDF, and let consumers convert it to something conventional via OWL or things like that. I'm less certain about this, because I know less about web data technology. I'd want to look more deeply into DBpedia or such before trying to solve this. We don't want to use RDFa or microdata here, that I'm quite sure of. I also don't think we need any kind of in-band semantics here; we should be able to just use template parameters so template authors don't have to be bothered.
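As an illustration of what "publish template parameter triples as RDF" in point 2 could look like, here is a minimal Python sketch that serializes one template call as N-Triples. The function name, the `example.org` base URIs, and the predicate scheme (template name plus parameter name) are all assumptions made up for this example, not an existing MediaWiki interface. Values are emitted as plain literals with no interpretation of the wikitext.

```python
from urllib.parse import quote


def escape_literal(text):
    """Escape a string for use inside an N-Triples double-quoted literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def template_to_ntriples(page_title, template_name, params,
                         page_base="http://example.org/wiki/",
                         prop_base="http://example.org/property/"):
    """Turn one template call on a page into N-Triples lines.

    Subject: the page URI. Predicate: a URI built from the template and
    parameter names (hypothetical scheme). Object: the raw parameter value.
    """
    subject = f"<{page_base}{quote(page_title.replace(' ', '_'))}>"
    template_path = quote(template_name.replace(" ", "_"))
    lines = []
    for key, value in params.items():
        predicate = f"<{prop_base}{template_path}/{quote(key)}>"
        lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return "\n".join(lines)


print(template_to_ntriples(
    "Douglas Adams",
    "Infobox writer",
    {"birth_place": "Cambridge", "occupation": 'Writer, "humorist"'},
))
```

Consumers would then be the ones mapping these template-specific predicates onto conventional vocabularies, which matches the "no conversion on our side" idea above.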
So for license data specifically, I think our current best option is to use microdata for the wikitext input, and microdata for the HTML output. On the input side, this is
* Simpler for template editors to author.
* More conveniently tailored to our precise use-case (at least when comparing the microdata license vocabulary with the RDFa vocabs used so far; more convenient RDFa vocabs might exist).
On the output side, microdata
* Uses fewer bytes.
* Validates better (I think -- there's an HTML5+microdata validator, but I know of no HTML5+RDFa validator).
* Looks like it will have better support in browsers (e.g., Opera might be interested in exposing license metadata through a GUI).
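For a sense of what the microdata output for license data might look like, here is a small Python sketch that builds the markup for one work. The item type `http://n.whatwg.org/work` and the `work`, `title`, and `license` property names are assumed from the licensing vocabulary in the WHATWG draft and should be checked against the spec; the function name and parameters are hypothetical. The attribute is written as `itemscope=""` rather than a bare `itemscope` so that the result stays well-formed XML.

```python
from html import escape


def license_microdata(work_url, title, license_url, license_name):
    """Return an HTML fragment marking up a work and its license as microdata."""
    return (
        '<div itemscope="" itemtype="http://n.whatwg.org/work">'
        f'<a itemprop="work" href="{escape(work_url, quote=True)}">'
        f'<span itemprop="title">{escape(title)}</span></a> '
        f'<a itemprop="license" href="{escape(license_url, quote=True)}">'
        f'{escape(license_name)}</a>'
        '</div>'
    )


print(license_microdata(
    "http://example.org/images/photo.jpg",
    "Example photo",
    "http://creativecommons.org/licenses/by-sa/3.0/",
    "CC BY-SA 3.0",
))
```

Restricting output to a fixed set of properties like this is one way to keep the whitelisting of vocabularies simple.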
We can always add new input formats or switch the output format later if we have good reason, though. Especially if we keep input restricted to one or two vocabularies -- or three, which for microdata is all of them right now. :)