Re: [Wikitech-l] RDFa and Microdata in MediaWiki

18 Jan 2010

On Mon, Jan 18, 2010 at 7:47 AM, Henri Sivonen <hsivonen(a)iki.fi> wrote:
...
> It's true that both HTML+RDFa and Microdata have been published in
> Working Drafts at the W3C. However, Microdata has never been through a
> Working Group Decision to publish as a First Public Working Draft while
> HTML+RDFa has. Microdata was added to a Working Draft after FPWD and
> there has since been a Working Group decision to take Microdata out of
> that spec.
>
> It is reasonable to expect that soon HTML+RDFa and Microdata could be
> in the same stage Process-wise, but it's inaccurate to portray them as
> being at the same stage Process-wise right now.

I simplified a bit, yes.  The current Working Draft of HTML contains
microdata and the next one won't, but it seems almost certain that a
microdata FPWD will be published concurrently with the next HTML WD.
So they're about the same: both are currently at WD, and both are
almost certain to still be at WD at the next publication (microdata a
bit less certain, but not much, IMO).

On the other hand, XHTML+RDFa is a REC at the W3C, and we currently
output well-formed, valid XHTML 1.0, albeit with a text/html MIME type.
On the other other hand, microdata is at Last Call at the WHATWG, and
we have an assurance of stability from its editor, independent of its
status at the W3C.  On the other other other hand, RDFa 1.1 is under
development and looks like it will make major changes, so from that
perspective microdata is arguably more stable.

So, it's complicated.  :)  But from our perspective, I don't think
there's a big difference in terms of stability or standard-ness, so I
skipped over all this.

On Mon, Jan 18, 2010 at 8:52 AM, Neil Harris <neil(a)tonal.clara.co.uk> wrote:
...
> Since both RDFa and Microdata support the same underlying data model,
> and it's likely to take some time to resolve which will be the eventual
> winner, perhaps we should decouple the generation of the final HTML
> output from the markup of semantic text in articles.
>
> Since it makes no sense to implement yet another incompatible "semantic
> wikitext" format for internal use, we will probably end up using
> something that is pretty close to one or the other, buried inside
> templates, to perform the actual in-wiki markup. Given this, is it worth
> considering which is easier for template authors to write, and which is
> easier to convert to the other -- RDFa to microdata or vice-versa?

AFAIK, microdata is slightly less expressive than RDFa (it can't
express cycles or something like that; maybe someone else could
clarify?), so converting the microdata graph to RDFa might be easier
than the reverse.  I also think microdata is much easier to author for
people with an HTML (not RDF) background: template editors tend to
have a good working knowledge of HTML, but not of web-data technologies.
I'd be interested in what Manu (or other RDFa supporters) have to say
here.
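To illustrate why the microdata-to-RDF direction looks like the easier one: a microdata item is a tree of properties, and a tree flattens into triples mechanically. Below is a minimal Python sketch under simplifying assumptions: no itemref or itemid, blank-node subjects, and predicates formed by appending "#name" to the item type URL. That predicate rule is a hypothetical shortcut, not the mapping the microdata spec defines, and `Item` and `item_to_triples` are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count
from typing import Union


@dataclass
class Item:
    """A simplified microdata item: a type URL plus named property values."""
    itemtype: str
    # Each property may repeat; values are plain strings or nested items.
    properties: dict[str, list[Union[str, Item]]] = field(default_factory=dict)


def item_to_triples(item, ids=None):
    """Flatten an item tree into (subject, predicate, object) triples."""
    if ids is None:
        ids = count()
    subject = f"_:b{next(ids)}"  # blank node for this item
    triples = [(subject, "rdf:type", item.itemtype)]
    for name, values in item.properties.items():
        # Hypothetical predicate rule: type URL + "#" + property name.
        predicate = f"{item.itemtype}#{name}"
        for value in values:
            if isinstance(value, Item):
                child = item_to_triples(value, ids)
                # The child's first triple carries its blank-node subject.
                triples.append((subject, predicate, child[0][0]))
                triples.extend(child)
            else:
                triples.append((subject, predicate, value))
    return triples
```

A graph node with more than one incoming edge has no single obvious place to nest in a tree, so the reverse conversion needs extra policy that this direction does not.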

On Mon, Jan 18, 2010 at 9:46 AM, Daniel Kinzler <daniel(a)brightbyte.de> wrote:
...
> Perhaps the right approach for us would be to have "some" syntax for
> providing this info, and then generating HTML5 microdata and/or RDFa
> into the rendered HTML, write the triples into an SMW backend store, and
> provide RDF/XML/N3/whatever output via the API.
>
> There are three aspects here: specify, store, output. Perhaps we should
> look at them separately.

There are two separate things we want to do here, IMO:

1) Output a very few pieces of metadata that would be useful to HTML
consumers, like license metadata.  For these, we should use microdata
or RDFa, maybe with just one or two vocabularies whitelisted, and it
would be simplest to just let people type them into templates via
wikitext.  I'm pretty certain about this.

2) Output more generic metadata extracted from infoboxes and such.
For this, I think we should use a separate RDF stream.  I don't think
we need to do conversion here: we should be able to just publish
template parameter triples as RDF, and let consumers convert them to
something conventional via OWL or the like.  I'm less certain about
this, because I know less about web-data technology.  I'd want to look
more deeply into DBpedia or such before trying to solve this.  We
don't want to use RDFa or microdata here; of that I'm quite sure.  I
also don't think we need any kind of in-band semantics here: we should
be able to just use template parameters, so template authors don't
have to be bothered.
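Publishing raw template-parameter triples as a separate RDF stream (point 2) could look roughly like the Python sketch below, which writes N-Triples. The predicate scheme (one predicate per template/parameter pair under a base URL) and the `http://example.org/template/` base are hypothetical placeholders, not anything MediaWiki or DBpedia defines; mapping those predicates to conventional vocabularies is left to consumers.

```python
from urllib.parse import quote


def ntriples_literal(value: str) -> str:
    """Quote a string as an N-Triples literal.

    Escapes the four characters a quoted N-Triples string may not
    contain raw: backslash, double quote, line feed, carriage return.
    """
    escaped = (
        value.replace("\\", "\\\\")  # backslash first, so later escapes aren't doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def template_to_ntriples(page_uri: str, template: str, params: dict[str, str],
                         base: str = "http://example.org/template/") -> str:
    """Emit one triple per template parameter: <page> <base/Template/param> "value" ."""
    lines = []
    for name, value in params.items():
        # Hypothetical predicate scheme: percent-encode template and parameter names.
        predicate = f"<{base}{quote(template, safe='')}/{quote(name, safe='')}>"
        lines.append(f"<{page_uri}> {predicate} {ntriples_literal(value)} .")
    return "\n".join(lines)


print(template_to_ntriples("http://example.org/wiki/Berlin",
                           "Infobox city", {"population": "3400000"}))
# prints: <http://example.org/wiki/Berlin> <http://example.org/template/Infobox%20city/population> "3400000" .
```

Template authors keep writing ordinary parameters; the semantics are only attached downstream.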

So for license data specifically, I think our current best option is
to use microdata both for the wikitext input and for the HTML output.
On the input side, this is

* Simpler for template editors to author.
* More conveniently tailored to our precise use-case (at least when
comparing the microdata license vocabulary to the RDFa vocabularies
used so far; more convenient RDFa vocabularies might exist).

On the output side, microdata

* Uses fewer bytes.
* Validates better (I think -- there's an HTML5+microdata validator,
but I know of no HTML5+RDFa validator).
* Looks like it will have better support in browsers (e.g., Opera
might be interested in exposing license metadata through a GUI).
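Keeping one format-neutral license record and rendering it through interchangeable serializers keeps the output format a late decision. A minimal Python sketch follows; `LicenseInfo`, `to_microdata`, `to_rdfa` and `SERIALIZERS` are hypothetical names, and the markup assumes the WHATWG "work" vocabulary for microdata and RDFa 1.0 with `rel="license"` plus Dublin Core `dc:title`. The two outputs are only roughly equivalent: the RDFa version describes the current document, while the microdata version declares a separate item.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class LicenseInfo:
    """Format-neutral license record, as filled in from a template."""
    title: str
    license_url: str
    license_name: str


def to_microdata(info: LicenseInfo) -> str:
    # Assumes the WHATWG "work" vocabulary with title/license properties.
    return (
        '<div itemscope itemtype="http://n.whatwg.org/work">'
        f'<span itemprop="title">{escape(info.title)}</span> is licensed under '
        f'<a itemprop="license" href="{escape(info.license_url)}">'
        f'{escape(info.license_name)}</a></div>'
    )


def to_rdfa(info: LicenseInfo) -> str:
    # RDFa 1.0 style: rel="license" from the XHTML vocabulary, dc:title via a prefix.
    return (
        '<div xmlns:dc="http://purl.org/dc/elements/1.1/">'
        f'<span property="dc:title">{escape(info.title)}</span> is licensed under '
        f'<a rel="license" href="{escape(info.license_url)}">'
        f'{escape(info.license_name)}</a></div>'
    )


# Switching output formats later means changing one dictionary lookup.
SERIALIZERS = {"microdata": to_microdata, "rdfa": to_rdfa}
```

Because template input only fills `LicenseInfo`, the wikitext side would not need to change if the output format did.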

We can always add new input formats or switch the output format later
if we have good reason, though.  Especially if we keep input
restricted to one or two vocabularies -- or three, which for microdata
is all of them right now.  :)
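Restricting input to a few vocabularies could be enforced where template output is sanitized. A minimal Python sketch, assuming element attributes arrive as a plain dict; the helper and constant names are hypothetical (this is not MediaWiki's Sanitizer API), and the three itemtype URLs are our reading of the WHATWG draft's predefined vocabularies (vCard, vEvent, licensing works), to be checked against the spec.

```python
# Hypothetical whitelist of the predefined microdata vocabularies.
ALLOWED_ITEMTYPES = frozenset({
    "http://microformats.org/profile/hcard",
    "http://microformats.org/profile/hcalendar#vevent",
    "http://n.whatwg.org/work",
})

MICRODATA_ATTRS = ("itemscope", "itemtype", "itemprop", "itemid", "itemref")


def filter_microdata_attrs(attrs: dict[str, str]) -> dict[str, str]:
    """Return a copy of one element's attributes, dropping its microdata
    attributes if it declares an itemtype outside the whitelist.

    Only this element is checked; itemprop attributes on its descendants
    would still need handling by the caller.
    """
    itemtype = attrs.get("itemtype")
    if itemtype is not None and itemtype not in ALLOWED_ITEMTYPES:
        return {k: v for k, v in attrs.items() if k not in MICRODATA_ATTRS}
    return dict(attrs)
```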
