This isn't specifically on the topic of GLAM, but it is a follow-up to
one of the discussions that we had at last weekend's GLAM wiki
bootcamp, and something I'm very interested in, and maybe others will
be, too. I don't know a lot about this topic -- I'm hoping that
others can provide me with some pointers to the communities and
discussions on the mediawikis, where I might learn more.
Jarek, during his presentation, talked about the problems with
categories on Wikimedia Commons, using the example, "Painted portraits
of men of France". The main problem he described had to do with all
of the possible permutations of subcategories, and the difficulty of
consistently maintaining that subcategory tree.
A related problem also applies to a category we've been hearing a lot
about recently, "American women writers". In this case, there are (at
least) two problems that have been discussed: that this is an
asymmetric category (there is no "American men writers"); and that
people who are in this category are not also in "American writers",
although they should be.
Another problem with this method of categorization is that it doesn't
allow easy access to lists that might not correspond to a direct
category, but might be interesting, and could be derived from the
other categories. For example, all women writers who are *not*
American.
IMO, this is a fundamentally flawed way of adding semantic information
to articles. Jarek suggested that these should be tags, and I agree
that that would be simpler. So, for example, instead of having a
category "American women writers", there would be separate tags for
"American", "female", and "writer". Then, the original category would
be the intersection of these three. That would also solve the problem of the
asymmetry in the way male writers and female writers are categorized,
which is part of what led to the recent uproar.
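As a rough sketch of the tag idea (not how MediaWiki or Commons
actually stores anything), tags can be modeled as plain sets, and the
old categories become queries over them. The article list, the tag
names, and the helper with_tags() are all made up for illustration.

```python
# Hypothetical articles and tags, for illustration only.
tags = {
    "Emily Dickinson": {"American", "female", "writer"},
    "Mark Twain": {"American", "male", "writer"},
    "Emily Brontë": {"English", "female", "writer"},
    "Georgia O'Keeffe": {"American", "female", "painter"},
}


def with_tags(required, excluded=frozenset()):
    """Return titles carrying every tag in `required` and none in `excluded`."""
    required, excluded = set(required), set(excluded)
    return sorted(
        title
        for title, article_tags in tags.items()
        if required <= article_tags and not (excluded & article_tags)
    )


# The old category is just the articles that carry all three tags.
american_women_writers = with_tags({"American", "female", "writer"})

# Nobody drops out of the broader list, because it is derived the same way.
american_writers = with_tags({"American", "writer"})

# A list with no category of its own: women writers who are *not* American.
non_american_women_writers = with_tags({"female", "writer"}, excluded={"American"})
```

With this toy data, everyone in american_women_writers is automatically
in american_writers as well, and the "not American" list needs no
hand-maintained category.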
But even better, IMO, would be true semantic tagging, in the sense of
the Semantic Web. (I'm not an expert in this topic, but have read
just enough to be dangerous.) Semantic linking lets you express
simple truths about things, such as, "this is a person", "this is
female", "this person is the author of Wuthering Heights", etc.
Furthermore, *ontologies* allow people to express relationships
among relationships. For example, "anyone who is the author of x is a
writer", "all authors are also humans", etc.
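To make that a bit more concrete, here is a toy sketch in plain
Python, with facts as (subject, predicate, object) triples and two
hard-coded rules. The predicate names ("is_a", "gender", "author_of")
and the infer() function are invented for this example; real Semantic
Web tooling uses RDF and OWL vocabularies instead. The second rule
routes "all authors are also humans" through the "writer" class, which
is my own simplification.

```python
# Simple facts as (subject, predicate, object) triples.
# The vocabulary here is hypothetical, not a real ontology.
facts = {
    ("Emily Brontë", "is_a", "person"),
    ("Emily Brontë", "gender", "female"),
    ("Emily Brontë", "author_of", "Wuthering Heights"),
}


def infer(known):
    """Apply the two rules repeatedly until no new facts appear."""
    inferred = set(known)
    while True:
        new = set()
        for subject, predicate, obj in inferred:
            # Rule 1: anyone who is the author of something is a writer.
            if predicate == "author_of":
                new.add((subject, "is_a", "writer"))
            # Rule 2 (assumed form): every writer is also a human.
            if predicate == "is_a" and obj == "writer":
                new.add((subject, "is_a", "human"))
        if new <= inferred:
            return inferred
        inferred |= new


all_facts = infer(facts)
```

Nobody ever tagged the article as "writer" or "human" here; both facts
fall out of the single "author_of" statement plus the rules.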
Having heard a little bit about the Semantic MediaWiki and Wikidata, I
just dug a little bit to see if there is work afoot to try to
re-implement Wikipedia categories in some kind of scheme like this.
The Semantic MediaWiki is a MediaWiki extension that's designed to
allow this kind of tagging, but it was never integrated with any of
the Wikipedias. Instead, Wikidata is being developed and integrated.
There's some info on the relationships among Wikipedia, SMW, and
Wikidata on the Wikipedia entry for SMW [1] and on the SMW FAQ [2].
I know Wikidata has tackled inter-language links first, and next on
the agenda are infoboxes, but I'm not sure if anyone anywhere is
planning on reimplementing categories with this kind of scheme. Of
course it would be an enormous undertaking, and wouldn't happen
anytime soon. But it does seem to me there's the potential for a much
more robust system. If anyone has any pointers to where I might learn
more, or join in some of those discussions, I'd be very interested. I
just posted a question to the wikidata-l mailing list.
----
[1] http://en.wikipedia.org/wiki/Semantic_MediaWiki#Semantic_MediaWiki_and_Wiki…
[2] http://www.semantic-mediawiki.org/wiki/FAQ#What_is_the_relationship_between…
----
Cheers!
Chris