On Sat, Jan 16, 2010 at 9:37 PM, Aryeh Gregor <Simetrical+wikilist@gmail.com> wrote:
> What we're talking about (microdata, RDFa, RDF, etc.) is categorically useless for Wikimedia-internal use. The only use that any of this metadata stuff has to us is exposing info to *non*-Wikimedia agents. For internal use, we can make up our own custom formats and use plain old database queries much more easily than resorting to any standard format. [...] For instance, we have lots of images on Commons under various licenses. *We* know which license each is under, because we use MediaWiki's category system.
Unfortunately, categories and database queries are inadequate for our needs. Someone can indeed navigate to Categories::Works::Works by genre::Non-fiction::Governmental::Biographies::Ancient biographies, and they'll find the five pages that someone thought to categorize to that depth. But if someone hopes to find our 1872 American biographies, they are going to be sorely disappointed, because no category combines year, country, and genre.
Metadata, whether in a standard or an internal format, lets machines extract this data from template output and store it in a database that people can query. If you want 1872 American biographies mentioning a Willard, just fill in the year, location, and description fields and check off the relevant genres. The database returns a list of actual works that match those exact criteria, not the subpages or mid-text false matches that are the best we can get now.
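To make that search concrete, here is a minimal Python sketch of what a tool could do once metadata has been extracted from template output: load it into SQLite and answer the multi-field query in a single statement. The records, titles, field names, and the functions `build_db` and `find_works` are all hypothetical, invented for illustration; this is not an existing Wikisource or MediaWiki tool.

```python
import sqlite3

# Hypothetical records, standing in for metadata a tool has already extracted
# from template output. Titles and field names are invented for illustration.
RECORDS = [
    {"title": "Example biography A", "year": 1872, "location": "United States",
     "genres": ["biography"], "description": "Letters of a Willard family member"},
    {"title": "Example biography B", "year": 1872, "location": "United Kingdom",
     "genres": ["biography"], "description": "Mentions a Willard"},
    {"title": "Example novel C", "year": 1872, "location": "United States",
     "genres": ["fiction"], "description": "A character named Willard"},
    {"title": "Example biography D", "year": 1880, "location": "United States",
     "genres": ["biography"], "description": "Willard genealogy"},
    {"title": "Example work E", "year": 1872, "location": "United States",
     "genres": ["biography", "history"], "description": "Town history; no relevant surname"},
]


def build_db(records):
    """Load records into an in-memory SQLite database (one row per work, one per genre)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE works (id INTEGER PRIMARY KEY, title TEXT,"
                 " year INTEGER, location TEXT, description TEXT)")
    conn.execute("CREATE TABLE work_genres (work_id INTEGER, genre TEXT)")
    for rec in records:
        cur = conn.execute(
            "INSERT INTO works (title, year, location, description) VALUES (?, ?, ?, ?)",
            (rec["title"], rec["year"], rec["location"], rec["description"]),
        )
        conn.executemany(
            "INSERT INTO work_genres (work_id, genre) VALUES (?, ?)",
            [(cur.lastrowid, genre) for genre in rec["genres"]],
        )
    return conn


def find_works(conn, year, location, genres, keyword):
    """Titles matching year and location exactly, having ALL the given genres,
    and whose description contains keyword (LIKE is case-insensitive for ASCII;
    % and _ in keyword are not escaped in this sketch)."""
    if not genres:
        raise ValueError("at least one genre is required")
    placeholders = ", ".join("?" for _ in genres)
    sql = f"""
        SELECT w.title FROM works AS w
        JOIN work_genres AS g ON g.work_id = w.id
        WHERE w.year = ? AND w.location = ?
          AND w.description LIKE '%' || ? || '%'
          AND g.genre IN ({placeholders})
        GROUP BY w.id
        HAVING COUNT(DISTINCT g.genre) = ?
        ORDER BY w.title
    """
    params = [year, location, keyword, *genres, len(set(genres))]
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = build_db(RECORDS)
    print(find_works(conn, 1872, "United States", ["biography"], "Willard"))
    # prints ['Example biography A']
```

Each criterion is an independent column or join, so another field (a publisher, say) means one more condition in the WHERE clause rather than one more level of category nesting.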
If we simply extend MediaWiki to support metadata for works or authors, the metadata will be limited to those types and fields. Public metadata, by contrast, can be extended and parsed in whatever way the local community or the people who use our content find useful. Users can add their own metadata fields (translators? publishers? work licenses?) to templates and contribute their own tools and databases.
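The extensibility point can be illustrated with a reader that does not need to know the fields in advance. The following is a minimal sketch, assuming template output is annotated with HTML microdata `itemprop` attributes and is well-formed; it collects every property it finds, so a community-added field such as a translator is picked up without any code change. The sample markup, property names, and `extract_itemprops` function are hypothetical, and the parser is simplified: it ignores `itemscope` nesting and treats each `itemprop` value as a single name.

```python
from html.parser import HTMLParser

# HTML void elements never get an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ItempropCollector(HTMLParser):
    """Collect every itemprop name/value pair, whatever the property is called.

    Simplified: assumes well-formed HTML and ignores itemscope nesting.
    """

    def __init__(self):
        super().__init__()
        self.props = {}   # property name -> list of values, in document order
        self._open = []   # stack of (itemprop name or None, text parts) per open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if tag in VOID_TAGS:
            # e.g. <meta itemprop="license" content="..."> carries its value in "content".
            if name and attrs.get("content") is not None:
                self.props.setdefault(name, []).append(attrs["content"])
            return
        self._open.append((name, []))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._open:
            return
        name, parts = self._open.pop()
        if name:
            # Collapse runs of whitespace in the element's text content.
            value = " ".join("".join(parts).split())
            self.props.setdefault(name, []).append(value)

    def handle_data(self, data):
        # Text belongs to every enclosing itemprop element, including nested markup.
        for name, parts in self._open:
            if name:
                parts.append(data)


def extract_itemprops(html):
    """Return {property: [values]} for all itemprop annotations in html."""
    collector = ItempropCollector()
    collector.feed(html)
    collector.close()
    return collector.props


# Hypothetical template output; "translator" stands in for a field the
# community added later, which the collector picks up with no code change.
SAMPLE = """
<div itemscope>
  <span itemprop="title">Example work</span>
  <span itemprop="year">1872</span>
  <span itemprop="translator">Hypothetical Translator</span>
  <p itemprop="description">Mentions <b>Willard</b> twice</p>
  <meta itemprop="license" content="public-domain">
</div>
"""

if __name__ == "__main__":
    for prop, values in extract_itemprops(SAMPLE).items():
        print(prop, values)
```

Because the collector keys on whatever names appear in the markup, the set of fields is decided by whoever edits the templates, not by whoever wrote the tool.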
Database queries can't do this either, since the software only sees the metadata as part of the wiki text. It's conceivable to extract it directly from the wiki text in a dump, but given the number of different template options and combinations, that would be horrendously complex. We could use an internal Wikimedia format, but it would be useless outside Wikimedia.
There is very little difference between internal and external use: it's no easier for a Wikisource editor than for an outside user to find those 1872 American biographies. Editors are users too. Categories are inadequate for anything beyond the simplest one-dimensional criteria.
So, these metadata formats are definitely *not* useless for internal community use.