New subject: A wiki of semantics for the wiki?

4 Dec 2003


So, when I was weeding through the RFEs on SourceForge last week, I
noted quite a few having to do with metadata, including:
      579758 Consider keyword meta tags
      629323 Simple article categorisation system
      766213 Provide Standardized Subject Search
      839394 Meta ICBM tags when appropriate
I think I must have closed five or so other RFEs and referred them to
#629323. There are lots of people who want category metadata.

Anyways, I wanted to float a design idea for inclusion of metadata in
MediaWiki articles. None of this is particularly new, and it seems
like most has been suggested at one point or another.
---8<---
WHAT IS METADATA?
-----------------
Metadata is data about data. For example, the filename and byte count
of a file on a filesystem are metadata: not part of the data, but
_about_ the data. Metadata can arise naturally from the data (byte
count) or can be added from outside the data (file name).
Metadata helps us sort data, find data, and make decisions about
data. For example, you can sort files in a directory alphabetically by
filename to find a file you want. Or you can sort them by size to find
the biggest or smallest. You can delete a file called "TEMPFILE.$$$"
because you know it's a temporary file.
We use metadata so often, we sometimes forget that it's "meta". But it
is: changing the name of a file, or moving it to another folder,
doesn't change the contents of the file. It just changes how we find
and access the file.
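
To make the file example concrete, here is a small sketch. The file
names and contents are made up for illustration; it just shows that
renaming a file changes its metadata (the name) while the data and the
derived byte count stay the same.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file name and contents, for illustration only.
    old_path = os.path.join(folder, "TEMPFILE.$$$")
    with open(old_path, "wb") as f:
        f.write(b"hello, wiki")

    # Byte count: metadata that arises from the data itself.
    size_before = os.stat(old_path).st_size

    # File name: metadata added from outside; renaming touches only this.
    new_path = os.path.join(folder, "notes.txt")
    os.rename(old_path, new_path)

    size_after = os.stat(new_path).st_size
    with open(new_path, "rb") as f:
        contents = f.read()
```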
METADATA IN MEDIAWIKI
---------------------
We use a lot of metadata for MediaWiki articles: page sizes, page
histories, new pages page, recent changes page, etc. etc. Some of the
metadata we use is calculated from the data itself -- sizes -- and
some of it is not -- timestamps, who made changes, etc. It would
probably be fair to say that article titles and redirects are
metadata.
One form of metadata that we now use is interlanguage
links. Interlanguage links are metadata that say: "There is an article
in the language XX Wikipedia (or other installation) that covers this
same topic." That's pretty cool metadata to have.
What I propose is that MediaWiki expand this metadata format to cover
other types of metadata, such as:

      * categorization -- saying that particle physics is in the
        physics category, or that Lord of the Rings is in the fantasy
        books category
      * relationships between articles -- break up a single page into
        multiple chapters or sections, and note that they're all part
        of the same article
      * synopses -- providing a synopsis or description of an article
      * geography -- marking up pages to specify that they cover
        a particular geographical location
      * customizable per-installation metadata -- metadata that only
        makes sense for a particular installation.

I'm particularly interested in this last one, since it would help us
with Wikitravel.
METADATA IN THE DATABASE
------------------------
I think it's pretty reasonable to just think of metadata as name-value
pairs, where the name key is not necessarily unique. For example, we
could have a table like this for the article on David Brinkley:
      article_id   metadata_name        metadata_value
      ------------------------------------------------
      1            category             journalists
      1            category             American people
      1            see_also             CBS
      1            see_also             CBS News
It should be relatively easy to slurp up the metadata on an article
when rendering the article, and to pluck out the metadata from an
article when saving it. We do this with links, broken_links, and
image_links now.
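
Here is a rough sketch of that table and of the "slurp" and "pluck"
steps. It assumes SQLite purely so the example is self-contained; the
table name, the index, and the function names save_metadata and
load_metadata are hypothetical and are not existing MediaWiki code.

```python
import sqlite3

# Hypothetical schema: one row per (article, name, value); names may repeat.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata ("
    " article_id INTEGER NOT NULL,"
    " metadata_name TEXT NOT NULL,"
    " metadata_value TEXT NOT NULL)"
)
conn.execute("CREATE INDEX metadata_article ON metadata (article_id)")


def save_metadata(conn, article_id, pairs):
    """Replace an article's stored metadata with the pairs found on save."""
    conn.execute("DELETE FROM metadata WHERE article_id = ?", (article_id,))
    conn.executemany(
        "INSERT INTO metadata VALUES (?, ?, ?)",
        [(article_id, name, value) for name, value in pairs],
    )


def load_metadata(conn, article_id):
    """Return {name: [values, ...]} for one article, for use when rendering."""
    result = {}
    rows = conn.execute(
        "SELECT metadata_name, metadata_value FROM metadata"
        " WHERE article_id = ? ORDER BY rowid",
        (article_id,),
    )
    for name, value in rows:
        result.setdefault(name, []).append(value)
    return result


# The David Brinkley rows from the table above.
save_metadata(conn, 1, [
    ("category", "journalists"),
    ("category", "American people"),
    ("see_also", "CBS"),
    ("see_also", "CBS News"),
])
david_brinkley = load_metadata(conn, 1)
```

Deleting and re-inserting on every save is the simplest thing that
could work; a real implementation might diff the old and new pairs
instead.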
METADATA IN WIKITEXT
--------------------
One way to do metadata is to have a different entry format for
metadata than for the text and markup of an article. I'm going to
reject that, since we already have a winning mechanism for marking up
some metadata -- interlanguage links -- within the Wiki
markup. Whether this would make it _really_ data instead of metadata
is left as an exercise for the reader.

There are a couple of ways we could do this in wikitext:

        <<name=value>>
        <name:value>
        [[meta:name=value]]
        [[meta:name:value]]
        [[name:value]]
The first couple are kinda radical, and don't really jibe with
interlinks, anyways. The last one has the potential to clash with
other namespaces.
I prefer [[meta:name=value]], just because it's kinda easy. I realize
that the name "meta" clashes with "metawikipedia", so maybe another
word would work.
It's not particularly important what format we use; what's important
is that we have a way to enter arbitrary name-value pairs into the
text of articles.
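
As a sketch of how plucking name-value pairs out of the text might
look, here is a regex for the [[meta:name=value]] form. The pattern,
the rule that names are word characters, and the function name
extract_metadata are all assumptions for illustration; nothing here is
settled syntax.

```python
import re

# Assumed syntax: [[meta:name=value]]; names are word characters and
# values run up to the closing brackets on the same line.
META_RE = re.compile(r"\[\[meta:(\w+)=([^\]\n]*)\]\]")


def extract_metadata(wikitext):
    """Return (list of (name, value) pairs, wikitext with meta tags removed)."""
    pairs = [(m.group(1), m.group(2).strip()) for m in META_RE.finditer(wikitext)]
    body = META_RE.sub("", wikitext)
    return pairs, body


text = (
    "David Brinkley was an American newscaster.\n"
    "[[meta:category=journalists]]\n"
    "[[meta:see_also=CBS News]]\n"
)
pairs, body = extract_metadata(text)
```

The pairs come back as a list rather than a dict, since the same name
can appear more than once.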
RENDERING METADATA
------------------
I think there are a couple of ways we can deal with metadata in an
article when rendering it:
        * Add it as a <meta> tag to the HTML <head>
        * Add it as a <link> tag to the HTML <head>
        * Add it as an out-of-page link, the way interlanguage links
          work now
        * Render it in the text of the document (I can't see why this
          would be useful, but)
        * Ignore it
I think we could have several predefined names, each with a predefined
rendering, and let each installation configure how other names are
rendered.
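
A minimal sketch of that per-name dispatch, covering the <meta>,
<link>, and ignore cases. The RENDERERS table, the page_url URL scheme,
and the use of the metadata name as the rel value are all hypothetical
choices made for the example.

```python
from html import escape

# Hypothetical per-installation configuration: metadata name -> rendering.
# Names that are not listed fall back to "ignore".
RENDERERS = {
    "category": "meta",
    "synopsis": "meta",
    "see_also": "link",
}


def page_url(title):
    """Hypothetical URL scheme for an article title."""
    return "/wiki/" + title.replace(" ", "_")


def render_head(pairs):
    """Build the HTML <head> lines for a list of (name, value) pairs."""
    lines = []
    for name, value in pairs:
        mode = RENDERERS.get(name, "ignore")
        if mode == "meta":
            lines.append('<meta name="%s" content="%s">'
                         % (escape(name), escape(value)))
        elif mode == "link":
            # Using the metadata name as rel is an illustrative assumption.
            lines.append('<link rel="%s" href="%s">'
                         % (escape(name), escape(page_url(value))))
        # mode == "ignore": emit nothing for this pair
    return lines


head = render_head([
    ("category", "journalists"),
    ("see_also", "CBS News"),
    ("icbm", "45.5,-73.6"),  # not configured here, so it is ignored
])
```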
OTHER USES OF METADATA
----------------------
There are some other uses of metadata, of course. One would be
automatically built directories, organized by category.
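
For instance, a category directory could be built by grouping the
stored pairs. This is only a sketch: the (title, name, value) row shape
and the function name build_directory are assumptions, and the sample
rows reuse the examples from earlier in this message.

```python
from collections import defaultdict


def build_directory(rows):
    """Group article titles by category from (title, name, value) rows."""
    directory = defaultdict(list)
    for title, name, value in rows:
        if name == "category":
            directory[value].append(title)
    # Sort categories and the titles within each one for stable output.
    return {cat: sorted(titles) for cat, titles in sorted(directory.items())}


rows = [
    ("Particle physics", "category", "physics"),
    ("Lord of the Rings", "category", "fantasy books"),
    ("David Brinkley", "category", "journalists"),
    ("David Brinkley", "see_also", "CBS News"),
]
directory = build_directory(rows)
```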
---8<---
So, my second big proposal of the day.
~ESP
-- 
Evan Prodromou evan@wikitravel.org
Wikitravel - http://www.wikitravel.org/
The free, complete, up-to-date and reliable world-wide travel guide

I never metadata I didn't like