On Sat, Jan 16, 2010 at 9:07 PM, Jesse (Pathoschild)
<pathoschild(a)gmail.com> wrote:
Wikisource, especially, is in desperate need of metadata. We have
some 140,000 pages on the English wiki alone that represent poems,
chapters, tables of contents, and so forth. These are essentially
disorganized: we have human-usable templates and categories, but
there's really no good way to find works besides searching their
titles.
A few years ago we combined our metadata templates into two standard
templates, {{header}} (for works) and {{author}} (for authors). Every
single page already provides metadata to these templates, so
implementing a metadata format for machine use is trivial once it is
available on MediaWiki. We *really* want this; it would allow us to
index our jumbled pile of works and authors in all sorts of very
useful and interesting ways. Just a few examples are author search and
autocompletion (we currently list works manually), finding works by
genre and year and subject and so forth, searching work descriptions,
and distinguishing works from subpages.
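To illustrate how the metadata already passed to {{header}} could be
pulled out for indexing, here is a minimal Python sketch. It is not how
MediaWiki parses templates: the parse_template helper is hypothetical,
the parameter names (title, author, year) are assumed for the example,
and nested templates or pipes inside links are not handled.

```python
import re


def parse_template(wikitext, name):
    """Return the named parameters of the first {{name | ...}} call, or None.

    Simplified on purpose: nested templates and pipes inside links
    are not supported.
    """
    pattern = r"\{\{\s*" + re.escape(name) + r"\s*\|(.*?)\}\}"
    match = re.search(pattern, wikitext, re.DOTALL | re.IGNORECASE)
    if match is None:
        return None
    params = {}
    for part in match.group(1).split("|"):
        key, sep, value = part.partition("=")
        if sep:  # ignore positional (unnamed) parameters
            params[key.strip()] = value.strip()
    return params


# Hypothetical page text; the parameter names are assumptions.
page = """{{header
 | title  = The Raven
 | author = Edgar Allan Poe
 | year   = 1845
}}
Once upon a midnight dreary..."""

metadata = parse_template(page, "header")
print(metadata)
```

A dictionary like this per page is the kind of record an index of works
and authors could be built from.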
What we're talking about (microdata, RDFa, RDF, etc.) is categorically
useless for Wikimedia-internal use. The only use that any of this
metadata stuff has to us is exposing info to *non*-Wikimedia agents.
For internal use, we can make up our own custom formats and use plain
old database queries much more easily than resorting to any standard
format.
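As a rough sketch of what "plain old database queries" could look like
for the Wikisource case, the Python below uses an in-memory SQLite
table. The work table, its columns, and the works_by helper are all
invented for illustration; they are not MediaWiki's schema or any
existing extension.

```python
import sqlite3

# Hypothetical internal table of works; not MediaWiki's real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work (title TEXT, author TEXT, genre TEXT, year INTEGER);
    INSERT INTO work VALUES
        ('The Raven', 'Edgar Allan Poe', 'poetry', 1845),
        ('Annabel Lee', 'Edgar Allan Poe', 'poetry', 1849),
        ('The Time Machine', 'H. G. Wells', 'novel', 1895);
""")


def works_by(conn, genre, before_year):
    """Return titles in a genre published before a given year, oldest first."""
    rows = conn.execute(
        "SELECT title FROM work WHERE genre = ? AND year < ? ORDER BY year",
        (genre, before_year),
    )
    return [title for (title,) in rows]


print(works_by(conn, "poetry", 1850))
```

Nothing here depends on microdata, RDFa, or RDF; the query works
directly on whatever internal format we choose.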
For instance, we have lots of images on Commons under various
licenses. *We* know which license each is under, because we use
MediaWiki's category system. But *other* people (e.g., search
engines) also want to know what licenses our images are under. So for
this we want a standard format like microdata or RDFa, so they don't
have to keep track of our internal data formats.
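For the external case, a small Python sketch of the kind of markup an
image page could emit, here using RDFa's rel="license" with an about
attribute naming the image. The license_markup helper and the
example.org URL are hypothetical; this is one possible shape, not what
MediaWiki or Commons actually outputs.

```python
from html import escape


def license_markup(image_url, license_url, license_name):
    """Build an HTML fragment stating, in RDFa, the license of an image.

    The about attribute sets the subject (the image) and rel="license"
    links it to the license URL.
    """
    return (
        '<div about="{img}">'
        '<img src="{img}" alt="">'
        ' <a rel="license" href="{lic}">{name}</a>'
        "</div>"
    ).format(
        img=escape(image_url, quote=True),
        lic=escape(license_url, quote=True),
        name=escape(license_name),
    )


# Hypothetical image URL, for illustration only.
fragment = license_markup(
    "https://example.org/images/Example.jpg",
    "https://creativecommons.org/licenses/by-sa/3.0/",
    "CC BY-SA 3.0",
)
print(fragment)
```

A search engine reading this needs to know only the standard
vocabulary, not how our categories are organized.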
What Wikisource needs here is a MediaWiki extension. Standard
metadata languages are not going to help at all. If no one is willing
to write an extension for it now, no one will be willing with RDF
support -- since that won't make the job the slightest bit easier.
Both formats have their own advantages and disadvantages.
Microdata's simplicity is a significant advantage, but RDFa's built-in
validation is also nice.
Neither has more built-in validation than the other. Both allow
arbitrary validation. RDFa seems to allow validation to be encoded in
a more machine-readable format, but whether that's an advantage at all
is debatable. HTML5 does not provide a DTD, XML Schema, or any other
machine-readable language description, for good reason.