On Mon, Jan 18, 2010 at 7:34 PM, Happy-melon happy-melon@live.com wrote:
I was saying that license templates are significantly easier to machine-read than infoboxes, because their data is simpler. The ultimate goal is, as you say, to allow machine reading without bespoke parsing, but that's a long way down the line.
No it's not. Google already does it for RDFa and microformats. Any major user of microdata would encourage them to support that too (especially since they invented it). Multiple browsers have also announced interest in supporting microdata.
At least we now *know* we're talking about different things :-D
Yep. :P
I agree there are gradations of what is 'worth' putting into the markup; although "adding things on the basis of 'someone will surely find it useful'" is *exactly* what we will get if we allow the busy bee template developers access to a metadata markup, almost by definition.
I bet very few people would bother adding metadata without a concrete use. And they'd probably get into fights with other people annoyed at them for making it harder to edit wikitext. This would all be irrelevant if we only supported a few whitelisted vocabularies, though, as the current microdata implementation does. We should encourage bulky and not-so-useful stuff to go in a separate stream.
I would say it's definitely 'worth' exposing license metadata on every use of an image; the status of a page's images affects our whole terms of use, whether we can say "yes you can use all this in this fashion" versus "you have to jump through these hoops for these images because they're different". Author, location, capture date; yes these probably aren't 'worth' the cost of exposing on pages. But being able to search Commons for all photos taken in Berlin between 1989 and 1991 would be worth its weight in gold.
Sure -- but that can be exposed in a separate data stream, since 99.9% of page views won't need it.
Indeed, but that's data *output*, not input. Currently our categories are input via [[Category:Foo]] and output via some HTML at the bottom of the page, but also via the API in a variety of formats; people use both methods to extract the metadata. Once MW knows what data an object has, how it outputs that data back is totally open as you say. So given that a translation into a format that MW understands is desirable for its own sake, and that from there it's trivial to translate back into whatever output format(s) the current web demands, why would we choose an input format like
<span xmlns:dc="http://purl.org/dc/elements/1.1/" href="http://purl.org/dc/dcmitype/StillImage" property="dc:title" rel="dc:type">EmeryMolyneux-terrestrialglobe-1592-20061127.jpg</span> by <span xmlns:cc="http://creativecommons.org/ns#" href="#mw-image" property="cc:attributionName" rel="cc:attributionURL">Bob Smith</span> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative Commons Attribution-Share Alike 3.0 United States License</a>
Rather than an input format like [[License::CC-BY-SA-3.0]]??
First, why are you asking me why we would choose RDFa when I don't think we should? At least quote microdata.
Second, this is apples to oranges. Your RDFa sample a) says that the work is a still image, b) gives its name, c) gives the author's name, d) gives the URL of the license, e) contains user-visible prose. Your wikitext sample just gives the license name (not even a license URL!). No kidding the latter is shorter. A more realistic comparison might be
<p><span itemprop="title">EmeryMolyneux-terrestrialglobe-1592-20061127.jpg</span> by <span itemprop="author">Bob Smith</span> is licensed under a <a itemprop="license" href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative Commons Attribution-Share Alike 3.0 United States License</a>.</p>
vs.
<p>[[title::EmeryMolyneux-terrestrialglobe-1592-20061127.jpg|]] by [[author::Bob Smith|]] is licensed under a [[license::http://creativecommons.org/licenses/by-sa/3.0/us/|[http://creativecommon... Creative Commons Attribution-Share Alike 3.0 United States License]]].</p>
or something, which is not such an easy call. The wikitext is not that much shorter or simpler -- particularly when you account for the fact that you'd have to separately define mappings to concrete microdata/RDFa/RDF vocabularies for output. (Yes, I left out the itemtype on the microdata, but again, that would have to be defined somewhere for the wikisyntax too.)
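To make the "separately define mappings" point concrete, here is a rough Python sketch of what the output step might look like. Everything in it is an assumption for illustration: the VOCAB table, the example.org itemtype URL, the to_microdata name, and the rule that an empty label after the pipe means "display the value". It is not how MediaWiki or any extension actually does it, and it does not handle the nested [url text] label used in the license example above.

```python
import re
from html import escape

# Hypothetical mapping from wiki property names to an output vocabulary.
# The itemtype URL is a placeholder, not a real vocabulary.
VOCAB = {
    "itemtype": "http://example.org/vocab/Work",
    "props": {"title": "title", "author": "author", "license": "license"},
}

# Matches [[prop::value]] and [[prop::value|label]] (no nested brackets).
ANNOTATION = re.compile(r"\[\[(\w+)::([^\]|]+)(?:\|([^\]]*))?\]\]")


def to_microdata(wikitext, vocab=VOCAB):
    """Rewrite [[prop::value|label]] annotations as microdata markup."""

    def render(match):
        prop, value, label = match.group(1), match.group(2), match.group(3)
        itemprop = vocab["props"].get(prop)
        if itemprop is None:
            return match.group(0)  # unmapped properties are left untouched
        # Assumption: a missing or empty label means "show the value itself".
        text = escape(label or value)
        if value.startswith(("http://", "https://")):
            return '<a itemprop="%s" href="%s">%s</a>' % (
                itemprop, escape(value), text)
        return '<span itemprop="%s">%s</span>' % (itemprop, text)

    body = ANNOTATION.sub(render, wikitext)
    return '<p itemscope itemtype="%s">%s</p>' % (
        escape(vocab["itemtype"]), body)


print(to_microdata("[[title::Globe.jpg|]] by [[author::Bob Smith|]]"))
```

Note that the mapping table is exactly the part the short wikitext hides: someone still has to write and maintain it for each vocabulary.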
On Mon, Jan 18, 2010 at 7:47 PM, Manu Sporny msporny@digitalbazaar.com wrote:
Looks like I've had my hand slapped twice during this discussion. I thought this was the first warning, but David seems to think differently. That means that either I've been too aggressive or David is not familiar with the level of intensity surrounding the Microdata/RDFa debates.
That veiled insults and questioning others' motives are par for the course on public-html doesn't mean we're going to tolerate them here. They shouldn't happen there either, of course, but we can't help that.
I strongly disagree with the idea of getting Microdata integrated with Wikipedia at this stage, before REC
This is just not a reasonable position to take outside the ivory tower of standards-making. We are not going to deny our users useful features just because some spec somewhere that happens to describe the feature is not absolutely 100% fully finished. We use zillions of features that aren't in any spec at all, or are only in Working Draft, as do all authors. Do you really think we shouldn't be using CSS3 Selectors or CSS2.1 until they're REC? Should we only use a Java video player even when multiple browsers support a much better *and* more standards-compliant experience via <video>, just because HTML5 is still a WD?
This is just not tenable. We use features when they're useful, not when someone else thinks we should use them. Our goal is to serve our users, not spec writers. Users above authors above implementers above specification writers . . .
On Tue, Jan 19, 2010 at 2:40 AM, Dmitriy Sintsov questpc@rambler.ru wrote:
[[work::http://upload.wikimedia.org/...terrestrialglobe-1592-20061127.jpg]] [[title::Emery Molyneux Terrestrial Globe]] [[author::Bob Smith]] [[license::http://creativecommons.org/licenses/by-sa/3.0/us/]]
We could use this, but I don't see a big advantage over raw microdata if a) we'll be outputting as microdata at first anyway, and b) it's only expected to be used for a very few things like licenses, presumably hidden away behind templates. If it is done, though, it should be with curly braces for sanity's sake: {{#prop:author|Bob Smith}} or whatnot.
This sort of thing might be good syntax for a separate RDF stream, but I think we can keep that simpler. Instead of having {{Infobox foo|name=Bob Smith}} contain, somewhere, {{#prop:name|{{{name}}}}}, creating the triple (page name, 'name', 'Bob Smith') for the page, why not just leave out the #prop and have *every* template parameter create a triple? So {{foo|bar=baz|quuz=quuuz}} would create the triples (page name, 'foo|bar', 'baz'), (page name, 'foo|quuz', 'quuuz') with no extra markup needed. The triples could then be transformed into a more useful form by the consumer, using a language like OWL. This is something like how dbpedia.org works right now, AFAICT.
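A minimal Python sketch of the "every template parameter creates a triple" idea, to show how little machinery it needs. The function name template_triples and the regex are my own illustration, not existing MediaWiki code, and the sketch deliberately ignores nested templates and positional parameters, which a real parser would have to deal with.

```python
import re

# Matches a template call with at least one parameter and no nested braces,
# e.g. {{foo|bar=baz|quuz=quuuz}}.
TEMPLATE = re.compile(r"\{\{([^{}|]+)\|([^{}]*)\}\}")


def template_triples(page_name, wikitext):
    """Return (page name, 'template|param', value) for each named parameter."""
    triples = []
    for match in TEMPLATE.finditer(wikitext):
        template = match.group(1).strip()
        for param in match.group(2).split("|"):
            if "=" not in param:
                continue  # positional parameters are skipped in this sketch
            key, value = param.split("=", 1)
            predicate = "%s|%s" % (template, key.strip())
            triples.append((page_name, predicate, value.strip()))
    return triples


for triple in template_triples("Some page", "{{foo|bar=baz|quuz=quuuz}}"):
    print(triple)
```

The output is the raw, template-shaped triples; turning 'foo|bar' into a property from a real vocabulary would be left to the consumer, as described above.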