On Sat, Jan 16, 2010 at 9:37 PM, Aryeh Gregor wrote:
What we're talking about (microdata, RDFa, RDF, etc.) is categorically
useless for Wikimedia-internal use. The only use that any of this
metadata stuff has to us is exposing info to *non*-Wikimedia agents.
For internal use, we can make up our own custom formats and use plain
old database queries much more easily than resorting to any standard
For instance, we have lots of images on Commons under various
licenses. *We* know which license each is under, because we use
MediaWiki's category system.
Unfortunately, categories and database queries are inadequate for our
needs. Someone can indeed navigate to Categories::Works::Works by
and they'll find all 5 pages that someone thought to categorize to
this depth. But if someone hopes to find our 1872 American
biographies, they are going to be sorely disappointed.
Metadata, whether a standard or internal format, allows machines to
extract this data from template output and store it in a database for
human use. If you want 1872 American biographies mentioning a Willard,
just fill in the year, location, and description fields, and check off
the relevant genres. The database will return a list of actual works
that match the exact criteria given, not the subpages or mid-text
false matches that are the best we can get now.
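
To make the kind of query I mean concrete, here is a rough sketch in
Python. Everything in it is made up for illustration: the Work record,
its field names, the find_works helper, and the sample titles are
assumptions, not an existing MediaWiki or Wikisource interface. It only
shows that once the metadata has been extracted into structured
records, a multi-field search becomes a simple filter.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    """Hypothetical record built from metadata exposed in template output."""
    title: str
    year: int
    location: str
    description: str
    genres: frozenset = field(default_factory=frozenset)


def find_works(works, year=None, location=None, mentions=None, genres=()):
    """Return the works matching every criterion that was supplied."""
    wanted = set(genres)
    results = []
    for work in works:
        if year is not None and work.year != year:
            continue
        if location is not None and work.location != location:
            continue
        # Case-insensitive substring match on the description field only.
        if mentions is not None and mentions.lower() not in work.description.lower():
            continue
        # Every requested genre must be present on the work.
        if not wanted <= work.genres:
            continue
        results.append(work)
    return results


# Made-up sample data, standing in for records extracted from pages.
WORKS = [
    Work("Example Biography A", 1872, "United States",
         "Life of an educator named Willard", frozenset({"biography"})),
    Work("Example Novel B", 1872, "United States",
         "A novel in which a Willard appears", frozenset({"fiction"})),
    Work("Example Biography C", 1901, "United Kingdom",
         "Life of a statesman", frozenset({"biography"})),
]

matches = find_works(WORKS, year=1872, location="United States",
                     mentions="Willard", genres={"biography"})
```

Note that the search runs against one record per work, so subpages and
passing mentions in body text never enter the result set.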
If we simply extend MediaWiki to support metadata for works or
authors, the metadata is limited to these types and fields. Public
metadata can be extended and parsed in any way the local community or
our content users find useful. Users can add their own metadata
(translators? publishers? work licenses?) to templates, and add their
own tools and databases to the collection.
This is also not possible with database queries, since the metadata is
not provided to the software except as part of the wiki text. It's
conceivable to extract it directly from the wiki text of a wiki dump,
but this would be horrendously complex given the number of different
options and combinations. It's possible to use an internal Wikimedia
format, but this would be useless outside Wikimedia.
There is very little difference between internal and external use;
it's no easier for a Wikisource editor to find those 1872 American
biographies than for anyone else. Editors are also users. Categories
are inadequate beyond
the simplest one-dimensional criteria.
So, these metadata formats are definitely *not* useless for internal