"Aryeh Gregor" <Simetrical+wikilist(a)gmail.com> wrote in message
news:7c2a12e21001200638y759365c8oeecd8f06f761a583@mail.gmail.com...
On Mon, Jan 18, 2010 at 7:34 PM, Happy-melon
<happy-melon(a)live.com> wrote:
I bet very few people would bother adding metadata without a concrete
use. And they'd probably get into fights with other people annoyed at
them for making it harder to edit wikitext. This would all be
irrelevant if we only supported a few whitelisted vocabularies,
though, as the current microdata implementation does. We should
encourage bulky and not-so-useful stuff to go in a separate stream.
Yes, very few people would bother. Those few people would still introduce a
monstrous amount of extra markup by working deep in the template stack.
Doesn't take much to add kilobytes to large articles; I've added 5kb to
[[Barack Obama]] myself just by adding a span round reference brackets.
Just adding author metadata to citation templates would add seconds to load
times for large articles.
I would say it's definitely 'worth' exposing license metadata on every use
of an image; the status of a page's images affects our whole terms of use,
whether we can say "yes you can use all this in this fashion" versus "you
have to jump through these hoops for these images because they're
different". Author, location, capture date; yes these probably aren't
'worth' the cost of exposing on pages. But being able to search Commons for
all photos taken in Berlin between 1989 and 1991 would be worth its weight
in gold.
Sure -- but that can be exposed in a separate data stream, since
>99.9% of page views won't need it.
I'm not talking about exposing it in a data stream per se, I'm suggesting
that that's what our internal search would be able to achieve if the
metadata was accessible to MediaWiki.
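
To make the Berlin example concrete, here is a rough sketch of what "accessible to MediaWiki" could mean. It is not how MW or any extension actually does it: the [[property::value]] syntax is borrowed from the [[License::CC-BY-SA-3.0]] example in this thread, and the property names "location" and "date", the helper names and the sample pages are all made up for illustration.

```python
import re
from datetime import date

# Assumed annotation syntax: [[property::value]] or [[property::value|label]].
ANNOTATION = re.compile(
    r"\[\[\s*([^:\[\]|]+?)\s*::\s*([^\[\]|]+?)\s*(?:\|[^\[\]]*)?\]\]"
)


def extract_properties(wikitext):
    """Map each lower-cased property name to the list of values found."""
    props = {}
    for name, value in ANNOTATION.findall(wikitext):
        props.setdefault(name.lower(), []).append(value)
    return props


def taken_in(pages, place, start, end):
    """Titles whose 'location' is `place` and whose 'date' lies in [start, end]."""
    hits = []
    for title, wikitext in pages.items():
        props = extract_properties(wikitext)
        if place not in props.get("location", []):
            continue
        for raw in props.get("date", []):
            try:
                taken = date.fromisoformat(raw)
            except ValueError:
                continue  # skip dates that are not in YYYY-MM-DD form
            if start <= taken <= end:
                hits.append(title)
                break
    return sorted(hits)


# Hypothetical file description pages.
PAGES = {
    "File:Wall-1989.jpg": "[[location::Berlin]] [[date::1989-11-10]] [[License::CC-BY-SA-3.0]]",
    "File:Wall-2005.jpg": "[[location::Berlin]] [[date::2005-06-01]]",
    "File:Paris-1990.jpg": "[[location::Paris]] [[date::1990-03-15]]",
}

berlin = taken_in(PAGES, "Berlin", date(1989, 1, 1), date(1991, 12, 31))
```

The point is only that once the annotations are in a syntax MW parses itself, the query is a few lines over structured values rather than a scrape of rendered HTML.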
Indeed, but that's data *output*, not input. Currently our categories are
input via [[Category:Foo]] and output via some HTML at the bottom of the
page, but also via the API in a variety of formats; people use both methods
to extract the metadata. Once MW knows what data an object has, how it
outputs that data back is totally open as you say. So given that a
translation into a format that MW understands is desirable for its own sake,
and that from there it's trivial to translate back into whatever output
format(s) the current web demands, why would we choose an input format like
<span
xmlns:dc="http://purl.org/dc/elements/1.1/"
href="http://purl.org/dc/dcmitype/StillImage" property="dc:title"
rel="dc:type">EmeryMolyneux-terrestrialglobe-1592-20061127.jpg</span>
by <span
xmlns:cc="http://creativecommons.org/ns#"
href="#mw-image"
property="cc:attributionName" rel="cc:attributionURL">Bob
Smith</span>
is licensed under a <a rel="license"
href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creat…
Commons Attribution-Share Alike 3.0 United States License</a>
Rather than an input format like [[License::CC-BY-SA-3.0]]??
First, why are you asking me why we would choose RDFa when I don't
think we should? At least quote microdata.
Second, this is apples to oranges. Your RDFa sample a) says that the
work is a still image, b) gives its name, c) gives the author's name,
d) gives the URL of the license, e) contains user-visible prose. Your
wikitext sample just gives the license name (not even a license URL!).
No kidding the latter is shorter. A more realistic comparison might be
<p><span
itemprop="title">EmeryMolyneux-terrestrialglobe-1592-20061127.jpg</span>
by <span itemprop="author">Bob Smith</span> is licensed under a
<a
itemprop="license"
href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creat…
Commons Attribution-Share Alike 3.0 United States License</a>.</p>
vs.
<p>[[title::EmeryMolyneux-terrestrialglobe-1592-20061127.jpg|]]
by [[author::Bob Smith|]] is licensed under a
[[
license::http://creativecommons.org/licenses/by-sa/3.0/us/|[http://creative…
Creative
Commons Attribution-Share Alike 3.0 United States License]]].</p>
or something, which is not such an easy call. The wikitext is not
that much shorter or simpler -- particularly when you account for the
fact that you'd have to separately define mappings to concrete
microdata/RDFa/RDF vocabularies for output. (Yes, I left out the
itemtype on the microdata, but again, that would have to be defined
somewhere for the wikisyntax too.)
True, the markup Dmitry offered is more suitable. But Ryan is absolutely
right. You're only thinking about the *current* generation of formats,
and assuming (maybe legitimately, I don't know) that microdata is the best
format for us to use. What happens when the next generation of format(s)
comes out? With a format-neutral input format, MW sites can quickly adapt to
accommodate it. Plus this method of data-injection will take much more work
to allow MW to extract the data from the wikitext, which puts our
photos-in-Berlin search further out of reach.
You could say that we're talking about different things again; that you're
talking about marking up data for external use. But there's no reason why a
{{#prop:foo|bar}} magic word can't *also* output some appropriate metadata
format into the wikitext. Marking up in a format-neutral syntax allows us
to output metadata from wikitext *and* from MW generally, and to change
*both* formats at the drop of a hat. Marking up in a particular format,
whatever the format is, makes it damn near impossible (or at least
hopelessly hackish) to change wikitext output from one format to another,
and equally horrible for MW to collect data at all.
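
A minimal sketch of the "change at the drop of a hat" idea, assuming properties are stored under neutral names and mapped to a concrete vocabulary only at output time. The VOCAB table, the TEMPLATES table and the render_prop helper are hypothetical; the attribute names (itemprop, property) and the terms dc:title and cc:attributionName are taken from the samples quoted above.

```python
from html import escape

# Assumed mapping from neutral property names to each output vocabulary.
VOCAB = {
    "microdata": {"title": "title", "author": "author"},
    "rdfa": {"title": "dc:title", "author": "cc:attributionName"},
}

# One HTML template per output format; adding a format means adding a row
# here and in VOCAB, without touching any stored wikitext.
TEMPLATES = {
    "microdata": '<span itemprop="{term}">{text}</span>',
    "rdfa": '<span property="{term}">{text}</span>',
}


def render_prop(name, value, fmt):
    """Render one neutral property as markup in the requested format."""
    if fmt not in TEMPLATES:
        raise ValueError(f"unknown output format: {fmt}")
    term = VOCAB[fmt][name]
    return TEMPLATES[fmt].format(term=escape(term), text=escape(value))


as_microdata = render_prop("author", "Bob Smith", "microdata")
as_rdfa = render_prop("author", "Bob Smith", "rdfa")
```

The same stored value comes out as either format, so switching a whole site is a configuration change rather than an edit to every template.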
--HM