[Commons-l] All about Wikimania, future projects, licenses, etc etc

Andrew Gray shimgray at gmail.com
Sun Aug 12 17:10:55 UTC 2007


On 12/08/07, Brianna Laugher <brianna.laugher at gmail.com> wrote:

> Tagging is flawed - some people put 'wiki', some put 'wikis', some put
> 'wikipedia', etc etc. And yet somehow it doesn't seem to matter. this
> is puzzling. I haven't really seen a site do
> intentionally-collaborative tagging, where the users actively try to
> have the same understanding for the same tag. no wonder we have so
> many problems with categories. ;)

The problem is that you need a controlled vocabulary of some form - a
way of saying "mark all images with cats as 'cats', not 'cat'", twelve
thousand times - and we so do not have this; we have a classic
folksonomy, people tagging with whatever they feel like.

[Okay, now I've shown I can remember the buzzwords they taught me at
library school...]

It's possible to turn a folksonomy into a controlled vocabulary, in two ways:

a) Manual patrolling - change all uses to conform with a controlled vocabulary
b) Tag equivalence - ensure everything *corresponds* to an entry in a
controlled vocabulary

a) is essentially what gets done with categories. People keep looking
at categories, merging them and renaming them and organising them; new
data gets subsumed into the existing structure. (Enwiki's category
intersections - "French mathematicians" - are great examples of this;
there are two or three ways to phrase each one, and dozens of people
doing nothing more than making sure they're all standardised.) Here,
you'd hunt out all instances of "cats" and change them to "cat". The
problem is the meta-standardisation of this... I'm not sure quite how
long-term workable it is without constant maintenance.
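
To make (a) concrete, here is a rough sketch of one patrolling pass in
Python. Everything in it is made up for illustration: the function name
`standardise`, the image titles, and the idea that tags live in a plain
dict. It is not how MediaWiki stores categories.

```python
def standardise(tags_by_image, preferred):
    """Return a copy of the tag data with every non-preferred tag rewritten.

    tags_by_image: dict of image title -> list of tags (hypothetical layout)
    preferred:     dict of variant tag -> preferred tag
    """
    cleaned = {}
    for image, tags in tags_by_image.items():
        # Rewrite each variant to its preferred form; the set drops duplicates
        # that appear once two spellings collapse into one.
        cleaned[image] = sorted({preferred.get(tag, tag) for tag in tags})
    return cleaned


# Hypothetical data, for illustration only.
images = {
    "Image:Tabby.jpg": ["cats", "pets"],
    "Image:Kitten.jpg": ["cat", "cats"],
}
fixed = standardise(images, {"cats": "cat"})
```

The catch is visible even here: the pass has to be rerun whenever someone
adds a new "cats", which is the constant maintenance I mean.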

b) is perhaps more interesting. In the bowels of whatever system you
use for tagging, set it up so that one tag, one identifier, can be
represented by many different tags. In effect, allow "cat" and "cats",
but ensure a search for one displays the other as well. LibraryThing
does this, and does it fairly well; their tag lists contain
misspellings and foreign terms as well as variant names, which is quite
useful. Configuring this and ensuring it doesn't get accidentally
snarled up - inadvertently merging two large groups can be confusing -
is tricky, but once it's up and running it should require less ongoing
maintenance.
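
A minimal sketch of (b) in Python, to show the shape of the idea. This is
my own toy, not LibraryThing's implementation; the class `TagIndex`, its
method names and the image titles are all assumptions. Each spelling
resolves to one canonical identifier, so the stored tags are never
rewritten and a search on any variant returns the whole group.

```python
from collections import defaultdict


class TagIndex:
    """Toy tag store where many spellings map to one canonical identifier."""

    def __init__(self):
        self.canonical = {}             # variant spelling -> canonical tag
        self.items = defaultdict(set)   # canonical tag -> tagged items

    def _resolve(self, tag):
        key = tag.strip().lower()
        return self.canonical.get(key, key)

    def add_alias(self, variant, canonical):
        """Declare that `variant` means the same thing as `canonical`."""
        target = self._resolve(canonical)
        source = self._resolve(variant)
        if source == target:
            return  # already in the same group
        # Repoint every spelling that used to resolve to the old group.
        for spelling, group in list(self.canonical.items()):
            if group == source:
                self.canonical[spelling] = target
        self.canonical[source] = target
        self.canonical[variant.strip().lower()] = target
        # Merge the items of the two groups under the surviving identifier.
        self.items[target] |= self.items.pop(source, set())

    def tag(self, item, tag):
        self.items[self._resolve(tag)].add(item)

    def search(self, tag):
        return set(self.items.get(self._resolve(tag), set()))


# Hypothetical usage: two people tag differently, one alias joins them.
index = TagIndex()
index.tag("Image:Tabby.jpg", "cats")
index.tag("Image:Kitten.jpg", "cat")
index.add_alias("cats", "cat")
```

Note that `add_alias` merges whole groups in one step, which is exactly
where an inadvertent merge of two large groups would happen; a real system
would want some way to undo it.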

A few representative collections, each standing for a single tag:

philosophy of science, Ciencia-Filosofía, Philosophy (Science),
philosophy_of_science, Science - Philosophy, science philosophy,
Science-Philosophy

theology, teologia, theolgy, theologie, Theololgy, Theoloogy

wwii, 2nd world war, second world war, SecondWorldWar,
second_world_war, segunda guerra, Segunda Guerra Mundial, w.w.ii, war
(WWII), war world ii, word war 2, World War (1939-1945), world war
1939-1945, world war 2, world war ii, World War II 1939-1945, world
war ll, world war two, world war. 1939-1945, World War2, world-war-2,
worldwarII, world_war_ii, ww 2, ww ii, ww11, ww2, WW_II, Zweiter
Weltkrieg

The main problem here is ambiguous terms; the classic one that
LibraryThing deals with is 'sf' - science fiction, or books about San
Francisco? There
are also long debates to be had about meaningful correspondences - are
'paranormal' and 'supernatural' the same thing? 'humor' and 'humour'?
The last won't really apply for photos, but is an interesting question
with regard to the written word - compare
http://www.librarything.com/tag/humor and
http://www.librarything.com/tag/humour - and demonstrates the
subtleties that can be found in folksonomies...

-- 
- Andrew Gray
  andrew.gray at dunelm.org.uk


