So, when I was weeding through the RFEs on SourceForge last week, I noted quite a few having to do with metadata, including:
* 579758 Consider keyword meta tags
* 629323 Simple article categorisation system
* 766213 Provide Standardized Subject Search
* 839394 Meta ICBM tags when appropriate
I think I must have closed about five other RFEs and referred them to #629323. There are lots of people who want category metadata.
Anyways, I wanted to float a design idea for including metadata in MediaWiki articles. None of this is particularly new, and it seems like most of it has been suggested at one point or another.
---8<---
WHAT IS METADATA?
-----------------
Metadata is data about data. For example, the filename and byte count of a file on a filesystem are metadata: not part of the data, but _about_ the data. Metadata can arise naturally from the data (byte count) or can be added from outside the data (filename).
Metadata helps us sort data, find data, and make decisions about data. For example, you can sort files in a directory alphabetically by filename to find a file you want. Or you can sort them by size to find the biggest or smallest. You can delete a file called "TEMPFILE.$$$" because you know it's a temporary file.
We use metadata so often, we sometimes forget that it's "meta". But it is: changing the name of a file, or moving it to another folder, doesn't change the contents of the file. It just changes how we find and access the file.
METADATA IN MEDIAWIKI
---------------------
We already use a lot of metadata for MediaWiki articles: page sizes, page histories, the New pages page, the Recent changes page, and so on. Some of the metadata we use is calculated from the data itself (sizes), and some of it is not (timestamps, who made changes, etc.). It would probably be fair to say that article titles and redirects are metadata, too.
One form of metadata that we now use is interlanguage links. Interlanguage links are metadata that say: "There is an article in the language XX Wikipedia (or other installation) that covers this same topic." That's pretty cool metadata to have.
What I propose is that MediaWiki expand this metadata format to cover other types of metadata, such as:
* categorization -- saying that particle physics is in the physics category, or that Lord of the Rings is in the fantasy books category
* relationships between articles -- break up a single page into multiple chapters or sections, and note that they're all part of the same article
* synopses -- providing a synopsis or description of an article
* geography -- marking up pages to specify that they cover a particular geographical location
* customizable per-installation metadata -- metadata that may make sense for different installations
I'm particularly interested in this last one, since it would help us with Wikitravel.
METADATA IN THE DATABASE
------------------------
I think it's pretty reasonable to just think of metadata as name-value pairs, where the name key is not necessarily unique. For example, we could have a table like this for the article on David Brinkley:
    article_id  metadata_name  metadata_value
    ------------------------------------------------
    1           category       journalists
    1           category       American people
    1           see_also       CBS
    1           see_also       CBS News
It should be relatively easy to slurp up the metadata on an article when rendering the article, and to pluck out the metadata from an article when saving it. We do this with links, broken_links, and image_links now.
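Here's a rough sketch of what that table and the save/render round trip might look like. It's only an illustration under some assumptions: SQLite stands in for the real database purely so the example runs on its own, and the `metadata` table, `save_metadata`, and `load_metadata` names are hypothetical. Saving simply throws away the old rows and writes the new ones, in the spirit of how the links tables get refreshed on save.

```python
import sqlite3

# Hypothetical schema: one row per (article, name, value). metadata_name is NOT
# unique, so a single article can carry several "category" rows.
SCHEMA = """
CREATE TABLE IF NOT EXISTS metadata (
    article_id     INTEGER NOT NULL,
    metadata_name  TEXT    NOT NULL,
    metadata_value TEXT    NOT NULL
)
"""


def save_metadata(conn, article_id, pairs):
    """On page save: replace all metadata rows for an article with `pairs`."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM metadata WHERE article_id = ?", (article_id,))
        conn.executemany(
            "INSERT INTO metadata (article_id, metadata_name, metadata_value) "
            "VALUES (?, ?, ?)",
            [(article_id, name, value) for name, value in pairs],
        )


def load_metadata(conn, article_id):
    """On page render: return {name: [value, ...]} in insertion order."""
    result = {}
    rows = conn.execute(
        "SELECT metadata_name, metadata_value FROM metadata "
        "WHERE article_id = ? ORDER BY rowid",
        (article_id,),
    )
    for name, value in rows:
        result.setdefault(name, []).append(value)
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    save_metadata(conn, 1, [
        ("category", "journalists"),
        ("category", "American people"),
        ("see_also", "CBS"),
        ("see_also", "CBS News"),
    ])
    print(load_metadata(conn, 1))
```

Grouping the values into lists on load is one choice among several; a renderer that just walks the raw rows would work as well.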
METADATA IN WIKITEXT
--------------------
One way to do metadata is to have a different entry format for metadata than for the text and markup of an article. I'm going to reject that, since we already have a winning mechanism for marking up some metadata -- interlanguage links -- within the Wiki markup. Whether this would make it _really_ data instead of metadata is left as an exercise to the reader.
There are a couple of ways we could do this in wikitext:
    <<name=value>>
    <name:value>
    [[meta:name=value]]
    [[meta:name:value]]
    [[name:value]]
The first couple are kinda radical, and don't really jibe with interlinks, anyways. The last one has the potential to clash with other namespaces.
I prefer [[meta:name=value]], just because it's kinda easy. I realize that the name "meta" clashes with "metawikipedia", so maybe another word would work.
It's not particularly important what format we use; what's important is that we have a way to enter arbitrary name-value pairs into the text of articles.
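To make the "pluck out the metadata when saving" step concrete, here's a minimal sketch that assumes the [[meta:name=value]] syntax from above; the `extract_metadata` function and the exact regular expression are hypothetical, and the `meta` prefix would change if another word is picked. It pulls out every name-value pair and strips the links from the text, leaving ordinary links (including interlanguage links) alone.

```python
import re

# Hypothetical syntax: [[meta:name=value]].
# name: word characters only; value: anything up to the closing "]]"
# (so, in this sketch, a value may not itself contain "]").
META_LINK = re.compile(r"\[\[meta:(\w+)=([^\]]*)\]\]")


def extract_metadata(wikitext):
    """Return (wikitext_without_meta_links, [(name, value), ...])."""
    pairs = [(m.group(1), m.group(2).strip()) for m in META_LINK.finditer(wikitext)]
    stripped = META_LINK.sub("", wikitext)
    return stripped, pairs


if __name__ == "__main__":
    text = (
        "David Brinkley was an American newscaster.\n"
        "[[meta:category=journalists]]\n"
        "[[meta:see_also=CBS News]]\n"
    )
    body, pairs = extract_metadata(text)
    print(pairs)
```

Because any name matches, the same parser handles predefined names and per-installation ones; deciding what each name means is left to the renderer.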
RENDERING METADATA
------------------
I think there are a couple of ways we can deal with metadata in an article when rendering it:
* Add it as a <meta> tag to the HTML <head>
* Add it as a <link> tag to the HTML <head>
* Add it as an out-of-page link, like Interlanguage links work now
* Render it in the text of the document (I can't see why this would be useful, but)
* Ignore it
I think we could have several pre-defined names, with predefined rendering, and other names with possible renderings configurable per-installation.
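One possible way to wire up "predefined names with predefined rendering, plus per-installation overrides" is a lookup table of renderer functions. Everything here is an assumption for illustration: the renderer names, the default mapping (`description` and `keywords` as <meta> tags, `see_also` as a <link> tag), and the `/wiki/Title_With_Underscores` URL scheme. Names with no renderer fall back to being ignored, which is one of the options listed above.

```python
from html import escape
from urllib.parse import quote


def render_meta_tag(name, value):
    # <meta name="..." content="..."> for the HTML <head>
    return f'<meta name="{escape(name)}" content="{escape(value)}">'


def render_link_tag(name, value):
    # <link rel="..." href="..."> for the HTML <head>;
    # the /wiki/ URL scheme is an assumption for this sketch.
    href = "/wiki/" + quote(value.replace(" ", "_"))
    return f'<link rel="{escape(name)}" href="{escape(href)}">'


def render_ignore(name, value):
    return None


# Hypothetical built-in defaults; an installation may override or extend them.
DEFAULT_RENDERERS = {
    "description": render_meta_tag,
    "keywords": render_meta_tag,
    "see_also": render_link_tag,
}


def render_head(pairs, site_renderers=None):
    """Turn (name, value) pairs into <head> markup, one tag per line."""
    renderers = {**DEFAULT_RENDERERS, **(site_renderers or {})}
    lines = []
    for name, value in pairs:
        renderer = renderers.get(name, render_ignore)  # unknown names: ignored
        html = renderer(name, value)
        if html is not None:
            lines.append(html)
    return "\n".join(lines)
```

An installation like Wikitravel could then pass its own mapping (say, for a hypothetical `region` name) without touching the defaults.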
OTHER USES OF METADATA
----------------------
There are some other uses of metadata, of course. One would be directories built automatically by category.
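An automatically built directory is basically an inversion of the name-value pairs: instead of "article -> categories", you want "category -> articles". Here's a minimal sketch, assuming the metadata has already been loaded as a mapping from article title to its (name, value) pairs; the `build_category_directory` name and the sample data are hypothetical.

```python
from collections import defaultdict


def build_category_directory(article_metadata):
    """{title: [(name, value), ...]} -> {category: sorted list of titles}."""
    directory = defaultdict(set)  # a set so a repeated category isn't listed twice
    for title, pairs in article_metadata.items():
        for name, value in pairs:
            if name == "category":
                directory[value].add(title)
    return {category: sorted(titles) for category, titles in sorted(directory.items())}


if __name__ == "__main__":
    sample = {
        "David Brinkley": [("category", "journalists"), ("see_also", "CBS")],
        "Particle physics": [("category", "physics")],
        "The Lord of the Rings": [("category", "fantasy books")],
    }
    for category, titles in build_category_directory(sample).items():
        print(category, titles)
```

In a real installation this would more likely be a query against the metadata table than an in-memory loop, but the shape of the result is the same.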
---8<---
So, my second big proposal of the day.
~ESP