On Sat, Jan 16, 2010 at 9:07 PM, Jesse (Pathoschild) <pathoschild@gmail.com> wrote:
Wikisource, especially, is in desperate need of metadata. We have some 140,000 pages on the English wiki alone that represent poems, chapters, tables of contents, and so forth. These are essentially disorganized: we have human-usable templates and categories, but there's really no good way to find works besides searching their titles.
A few years ago we combined our metadata templates into two standard templates, {{header}} (for works) and {{author}} (for authors). Every single page already provides metadata to these templates, so implementing a metadata format for machine use is trivial once it is available on MediaWiki. We *really* want this; it would allow us to index our jumbled pile of works and authors in all sorts of very useful and interesting ways. Just a few examples are author search and autocompletion (we currently list works manually), finding works by genre and year and subject and so forth, searching work descriptions, and distinguishing works from subpages.
What we're talking about (microdata, RDFa, RDF, etc.) is categorically useless for Wikimedia-internal use. The only use that any of this metadata stuff has to us is exposing info to *non*-Wikimedia agents. For internal use, we can make up our own custom formats and use plain old database queries much more easily than resorting to any standard format.
For instance, we have lots of images on Commons under various licenses. *We* know which license each is under, because we use MediaWiki's category system. But *other* people (e.g., search engines) also want to know what licenses our images are under. So for this we want a standard format like microdata or RDFa, so they don't have to keep track of our internal data formats.
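To make the Commons case concrete, here is a rough sketch of the kind of markup an outside agent could read without knowing anything about our category system. It only uses the generic microdata attributes (itemscope, itemtype, itemprop). The vocabulary URL, the property names, and the function name are all made up for illustration; none of this is an existing MediaWiki feature.

```python
from html import escape

# Hypothetical vocabulary URL, used only for illustration.
DEFAULT_ITEM_TYPE = "https://example.org/vocab/ImageWork"


def image_license_microdata(image_url, title, license_url,
                            item_type=DEFAULT_ITEM_TYPE):
    """Return an HTML fragment that states an image's license as microdata.

    The property names ("contentUrl", "name", "license") are assumptions,
    not taken from any particular published vocabulary.
    """
    return (
        f'<figure itemscope itemtype="{escape(item_type)}">\n'
        f'  <img itemprop="contentUrl" src="{escape(image_url)}" '
        f'alt="{escape(title)}">\n'
        f'  <figcaption itemprop="name">{escape(title)}</figcaption>\n'
        f'  <a itemprop="license" href="{escape(license_url)}">License</a>\n'
        f'</figure>'
    )


if __name__ == "__main__":
    print(image_license_microdata(
        "https://example.org/images/example.jpg",
        "Example image",
        "https://creativecommons.org/licenses/by-sa/3.0/",
    ))
```

A crawler that understands microdata only has to look for the item's license property; it never needs to learn which category corresponds to which license.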
What Wikisource needs here is a MediaWiki extension. Standard metadata languages are not going to help at all. If no one is willing to write an extension for it now, no one will be willing with RDF support -- since that won't make the job the slightest bit easier.
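As a minimal sketch of the "plain old database queries" approach, assume an extension copied the {{header}} parameters into a table of its own. The table name work_meta, its columns, and the helper functions below are hypothetical, and SQLite stands in for whatever database the wiki actually runs on.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory table holding one row of metadata per work page."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE work_meta ("
        " page_title TEXT PRIMARY KEY,"
        " author TEXT, genre TEXT, year INTEGER)"
    )
    conn.executemany("INSERT INTO work_meta VALUES (?, ?, ?, ?)", rows)
    return conn


def find_works(conn, author=None, genre=None, year=None):
    """Return page titles matching every filter that was given, sorted by title."""
    clauses, params = [], []
    # Column names come from this fixed tuple, never from user input.
    for column, value in (("author", author), ("genre", genre), ("year", year)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = "SELECT page_title FROM work_meta" + where + " ORDER BY page_title"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = build_index([
        ("The Raven", "Edgar Allan Poe", "poem", 1845),
        ("Annabel Lee", "Edgar Allan Poe", "poem", 1849),
        ("Ozymandias", "Percy Bysshe Shelley", "poem", 1818),
    ])
    print(find_works(conn, author="Edgar Allan Poe"))
    # prints ['Annabel Lee', 'The Raven']
```

Nothing here depends on RDF, RDFa, or microdata: once the template values are in a table, finding works by author, genre, or year is an ordinary indexed lookup.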
Both formats have their own advantages and disadvantages. Microdata's simplicity is a significant advantage, but RDFa's built-in validation is also nice.
Neither has more built-in validation than the other. Both allow arbitrary validation. RDFa seems to allow validation to be encoded in a more machine-readable format, but whether that's an advantage at all is debatable. HTML5 does not provide a DTD, XML Schema, or any other machine-readable language description, for good reason.