Re: [Wikitech-l] RDFa and Microdata in MediaWiki

20 Jan 2010


      On Mon, Jan 18, 2010 at 7:34 PM, Happy-melon happy-melon@live.com wrote:
...
I was saying that license templates are significantly easier to machine-read
than infoboxes, because their data is simpler.  The ultimate goal is, as you
say, to allow machine reading without bespoke parsing, but that's a long way
down the line.
No it's not.  Google already does it for RDFa and microformats.  Any
major user of microdata would encourage them to support that too
(especially since they invented it).  Multiple browsers have also
announced interest in supporting microdata.
...
At least we now *know* we're talking about different things :-D
Yep.  :P
...
I agree
there are gradations of what is 'worth' putting into the markup; although
"adding things on the basis of 'someone will surely find it useful'" is
**exactly** what we will get if we allow the busy bee template developers
access to a metadata markup, almost by definition.
I bet very few people would bother adding metadata without a concrete
use.  And they'd probably get into fights with other people annoyed at
them for making it harder to edit wikitext.  This would all be
irrelevant if we only supported a few whitelisted vocabularies,
though, as the current microdata implementation does.  We should
encourage bulky and not-so-useful stuff to go in a separate stream.
...
I would say it's
definitely 'worth' exposing license metadata on every use of an image; the
status of a page's images affects our whole terms of use, whether we can say
"yes you can use all this in this fashion" versus "you have to jump through
these hoops for these images because they're different".  Author, location,
capture date; yes these probably aren't 'worth' the cost of exposing on
pages.  But being able to search commons for all photos taken in Berlin
between 1989 and 1991 would be worth its weight in gold.
Sure -- but that can be exposed in a separate data stream, since
99.9% of page views won't need it.
...
Indeed, but that's data *output*, not input.  Currently our categories are
input via [[Category:Foo]] and output via some HTML at the bottom of the
page, but also via the API in a variety of formats; people use both methods
to extract the metadata.  Once MW knows what data an object has, how it
outputs that data back is totally open as you say.  So given that a
translation into a format that MW understands is desirable for its own sake,
and that from there it's trivial to translate back into whatever output
format(s) the current web demands, why would we choose an input format like
<span xmlns:dc="http://purl.org/dc/elements/1.1/"
href="http://purl.org/dc/dcmitype/StillImage" property="dc:title"
rel="dc:type">EmeryMolyneux-terrestrialglobe-1592-20061127.jpg</span>
by <span xmlns:cc="http://creativecommons.org/ns#" href="#mw-image"
property="cc:attributionName" rel="cc:attributionURL">Bob Smith</span>
is licensed under a <a rel="license"
href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative
Commons Attribution-Share Alike 3.0 United States License</a>
Rather than an input format like [[License::CC-BY-SA-3.0]]??
First, why are you asking me why we would choose RDFa when I don't
think we should?  At least quote microdata.
Second, this is apples to oranges.  Your RDFa sample a) says that the
work is a still image, b) gives its name, c) gives the author's name,
d) gives the URL of the license, e) contains user-visible prose.  Your
wikitext sample just gives the license name (not even a license URL!).
 No kidding the latter is shorter.  A more realistic comparison might
be
<p><span itemprop="title">EmeryMolyneux-terrestrialglobe-1592-20061127.jpg</span>
by <span itemprop="author">Bob Smith</span> is licensed under a <a
itemprop="license"
href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative
Commons Attribution-Share Alike 3.0 United States License</a>.</p>
vs.
<p>[[title::EmeryMolyneux-terrestrialglobe-1592-20061127.jpg|]]
by [[author::Bob Smith|]] is licensed under a
[[license::http://creativecommons.org/licenses/by-sa/3.0/us/%7C%5Bhttp://creativecommon...
Creative
Commons Attribution-Share Alike 3.0 United States License]]].</p>
or something, which is not such an easy call.  The wikitext is not
that much shorter or simpler -- particularly when you account for the
fact that you'd have to separately define mappings to concrete
microdata/RDFa/RDF vocabularies for output.  (Yes, I left out the
itemtype on the microdata, but again, that would have to be defined
somewhere for the wikisyntax too.)
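To illustrate the mapping point, here is a minimal Python sketch of what turning such annotations into microdata might involve. It is not an existing MediaWiki feature: the `[[prop::value|label]]` syntax is the one from the sample above, while `PROPERTY_MAP`, `ANNOTATION`, and `to_microdata` are hypothetical names invented for this example. The point is only that the property-to-vocabulary table has to live somewhere.

```python
import html
import re

# Hypothetical mapping from wiki property names to microdata itemprop names.
# In a real setup this table would have to be defined somewhere on the wiki.
PROPERTY_MAP = {"title": "title", "author": "author", "license": "license"}

# Matches [[prop::value]] and [[prop::value|label]] (label may be empty).
ANNOTATION = re.compile(r"\[\[(\w+)::([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def to_microdata(wikitext, property_map=PROPERTY_MAP):
    """Replace known [[prop::value|label]] annotations with microdata markup."""

    def render(match):
        prop, value, label = match.group(1), match.group(2), match.group(3)
        itemprop = property_map.get(prop)
        if itemprop is None:
            # Unmapped properties are left untouched.
            return match.group(0)
        text = html.escape(label or value)
        if value.startswith(("http://", "https://")):
            # URL values become links, as in the license example.
            href = html.escape(value, quote=True)
            return f'<a itemprop="{itemprop}" href="{href}">{text}</a>'
        return f'<span itemprop="{itemprop}">{text}</span>'

    return ANNOTATION.sub(render, wikitext)
```

Calling `to_microdata("by [[author::Bob Smith|]]")` yields `by <span itemprop="author">Bob Smith</span>`, which is about the same amount of markup as writing the microdata directly.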
On Mon, Jan 18, 2010 at 7:47 PM, Manu Sporny msporny@digitalbazaar.com wrote:
...
Looks like I've had my hand slapped twice during this discussion. I
thought this was the first warning, but David seems to think
differently. That means that either I've been too aggressive or David is
not familiar with the level of intensity surrounding the Microdata/RDFa
debates.
That veiled insults and questioning others' motives are par for the
course on public-html doesn't mean we're going to tolerate it here.
It shouldn't happen there either, of course, but we can't help that.
...
I strongly disagree with the idea of getting
Microdata integrated with Wikipedia at this stage, before REC
This is just not a reasonable position to take outside the ivory tower
of standards-making.  We are not going to deny our users useful
features just because some spec somewhere that happens to describe the
feature is not absolutely 100% fully finished.  We use zillions of
features that aren't in any spec at all, or are only in Working Draft,
as do all authors.  Do you really think we shouldn't be using CSS3
Selectors or CSS2.1 until they're REC?  Should we only use a Java
video player even when multiple browsers support a much better *and*
more standards-compliant experience via <video>, just because HTML5 is
still a WD?
This is just not tenable.  We use features when they're useful, not
when someone else thinks we should use them.  Our goal is to serve our
users, not spec writers.  Users above authors above implementers above
specification writers . . .
On Tue, Jan 19, 2010 at 2:40 AM, Dmitriy Sintsov questpc@rambler.ru wrote:
...
[[work::http://upload.wikimedia.org/...terrestrialglobe-1592-20061127.jpg]]
[[title::Emery Molyneux Terrestrial Globe]]
[[author::Bob Smith]]
[[license::http://creativecommons.org/licenses/by-sa/3.0/us/]]
We could use this, but I don't see a big advantage over raw microdata
if a) we'll be outputting as microdata at first anyway, and b) it's
only expected to be used for a very few things like licenses,
presumably hidden away behind templates.  If it is done, though, it
should be with curly braces for sanity's sake: {{#prop:author|Bob
Smith}} or whatnot.
This sort of thing might be good syntax for a separate RDF stream, but
I think we can keep that simpler.  Instead of having {{Infobox
foo|name=Bob Smith}} contain, somewhere, {{#prop:name|{{{name}}}}},
creating the triple (page name, 'name', 'Bob Smith') for the page, why
not just leave out the #prop and have *every* template parameter
create a triple?  So {{foo|bar=baz|quuz=quuuz}} would create the
triples (page name, 'foo|bar', 'baz'), (page name, 'foo|quuz',
'quuuz') with no extra markup needed.  The triples could then be
transformed into a more useful form by the consumer, using a language
like OWL.  This is something like how dbpedia.org works right now,
AFAICT.
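A rough Python sketch of the "every template parameter creates a triple" idea follows. It assumes flat, non-nested template calls with named parameters only; `template_triples` is a hypothetical helper written for this example, not part of MediaWiki or DBpedia, and a real implementation would hook into the parser rather than use a regex.

```python
import re

# Matches a flat template call such as {{foo|bar=baz|quuz=quuuz}}.
# Nested templates and {{{parameter}}} references are deliberately not handled.
TEMPLATE = re.compile(r"\{\{([^{}|]+)((?:\|[^{}|]*)*)\}\}")


def template_triples(page_name, wikitext):
    """Return (page name, 'template|param', value) for each named parameter."""
    triples = []
    for match in TEMPLATE.finditer(wikitext):
        template = match.group(1).strip()
        # group(2) looks like "|bar=baz|quuz=quuuz"; drop the empty first piece.
        for part in match.group(2).split("|")[1:]:
            if "=" not in part:
                continue  # positional parameters are skipped in this sketch
            key, value = part.split("=", 1)
            triples.append((page_name, f"{template}|{key.strip()}", value.strip()))
    return triples
```

For a page named "Some page" containing `{{foo|bar=baz|quuz=quuuz}}`, this returns the triples ("Some page", "foo|bar", "baz") and ("Some page", "foo|quuz", "quuuz"), leaving any mapping to a real vocabulary to the consumer.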
