This isn't specifically about GLAM, but it follows up on one of the discussions we had at last weekend's GLAM wiki bootcamp. It's something I'm very interested in, and maybe others will be, too. I don't know much about this topic, so I'm hoping others can point me to the relevant communities and discussions on the Wikimedia wikis, where I might learn more.
During his presentation, Jarek talked about the problems with categories on Wikimedia Commons, using the example "Painted portraits of men of France". The main problem he described was the large number of possible permutations of subcategories, and the difficulty of maintaining that subcategory tree consistently.
A related problem affects a category we've been hearing a lot about recently, "American women writers". Here, (at least) two problems have been discussed: the category is asymmetric (there is no "American men writers"), and people in it are not also in "American writers", although they should be.
Another problem with this method of categorization is that it makes it hard to get at lists that don't correspond to any existing category but could be interesting and could be derived from other categories. For example, all women writers who are *not* American.
IMO, this is a fundamentally flawed way of adding semantic information to articles. Jarek suggested that these should be tags, and I agree that would be simpler. For example, instead of a category "American women writers", there would be separate tags for "American", "female", and "writer". The original category would then be the intersection of these three: the articles that carry all three tags. That would also solve the asymmetry in the way male and female writers are categorized, which is part of what led to the recent uproar.
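To make the tag idea concrete, here is a minimal sketch in Python, assuming each article simply carries a flat set of independent tags. The tag names and the handful of articles are illustrative only, not real Wikipedia data or any existing MediaWiki API. It shows that "American women writers" falls out as a set intersection, that the symmetric "American men writers" list needs no extra category, and that a list like "women writers who are *not* American" is just a set difference.

```python
# Illustrative sketch: each article carries a flat set of independent tags.
# Tag names and assignments are hypothetical examples, not real Wikipedia data.
ARTICLE_TAGS: dict[str, set[str]] = {
    "Toni Morrison": {"American", "female", "writer"},
    "Mark Twain": {"American", "male", "writer"},
    "Jane Austen": {"English", "female", "writer"},
    "Emily Brontë": {"English", "female", "writer"},
    "Marie Curie": {"female", "scientist"},
}


def with_tag(tag: str) -> set[str]:
    """Return the set of article titles that carry the given tag."""
    return {title for title, tags in ARTICLE_TAGS.items() if tag in tags}


# "American women writers" is the intersection of three independent tags.
american_women_writers = with_tag("American") & with_tag("female") & with_tag("writer")

# The symmetric list comes for free; nobody has to maintain a separate category.
american_men_writers = with_tag("American") & with_tag("male") & with_tag("writer")

# Anyone tagged American + writer is automatically an "American writer",
# so women writers can't be accidentally left out of this list.
american_writers = with_tag("American") & with_tag("writer")

# A derived list with no matching category: women writers who are *not* American.
non_american_women_writers = (with_tag("female") & with_tag("writer")) - with_tag("American")

print(sorted(american_women_writers))
print(sorted(non_american_women_writers))
```

The design point is that only the small set of base tags has to be maintained by hand; every combination is computed on demand instead of living in its own subcategory.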
But even better, IMO, would be true semantic tagging, in the sense of the Semantic Web. (I'm not an expert on this topic, but I've read just enough to be dangerous.) Semantic linking lets you express simple facts about things, such as "this is a person", "this person is female", "this person is the author of Wuthering Heights", etc. Furthermore, *ontologies* let people express relationships among relationships, for example "anyone who is the author of x is a writer" or "all authors are also humans".
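As a rough illustration of what that buys you, here is a small Python sketch, assuming facts are stored as (subject, predicate, object) triples, the basic shape of Semantic Web statements. The predicate and class names ("authorOf", "type", "Writer", "Human") are hypothetical placeholders, not an actual Wikidata or RDF vocabulary, and the two ontology rules are hard-coded. In RDF Schema, the first rule would roughly correspond to giving the authorship property a domain of Writer, and the second to declaring Writer a subclass of Human; real systems would use a proper reasoner rather than this loop.

```python
# Minimal sketch: facts as (subject, predicate, object) triples, plus two
# hand-written ontology rules applied repeatedly until nothing new is derived.
# All predicate and class names here are hypothetical placeholders.
Triple = tuple[str, str, str]

facts: set[Triple] = {
    ("Emily Brontë", "type", "Person"),
    ("Emily Brontë", "gender", "female"),
    ("Emily Brontë", "authorOf", "Wuthering Heights"),
}


def apply_rules(triples: set[Triple]) -> set[Triple]:
    """Return every triple the ontology rules derive from the given triples in one pass."""
    derived: set[Triple] = set()
    for subject, predicate, obj in triples:
        # Rule 1: anyone who is the author of x is a writer.
        if predicate == "authorOf":
            derived.add((subject, "type", "Writer"))
        # Rule 2: all authors (modeled here as Writers) are also humans.
        if predicate == "type" and obj == "Writer":
            derived.add((subject, "type", "Human"))
    return derived


def infer(triples: set[Triple]) -> set[Triple]:
    """Forward-chain: keep applying the rules until a fixed point is reached."""
    closure = set(triples)
    while True:
        new = apply_rules(closure) - closure
        if not new:
            return closure
        closure |= new


# Print only the facts that were inferred rather than stated directly.
for triple in sorted(infer(facts) - facts):
    print(triple)
```

Nobody ever typed "Emily Brontë is a writer" or "Emily Brontë is a human"; both follow from the single authorship statement plus the rules, which is exactly the kind of consistency the hand-maintained category tree struggles with.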
Having heard a little about Semantic MediaWiki and Wikidata, I did some digging to see whether there is work afoot to reimplement Wikipedia categories in a scheme like this. Semantic MediaWiki (SMW) is a MediaWiki extension designed to allow this kind of tagging, but it was never integrated into any of the Wikipedias. Instead, Wikidata is being developed and integrated. There's some information on the relationships among Wikipedia, SMW, and Wikidata in the Wikipedia article on SMW [1] and in the SMW FAQ [2].
I know Wikidata has tackled interlanguage links first, and infoboxes are next on the agenda, but I'm not sure whether anyone is planning to reimplement categories with this kind of scheme. Of course it would be an enormous undertaking and wouldn't happen anytime soon. But it does seem to me that there's potential for a much more robust system. If anyone has pointers to where I might learn more, or join in some of those discussions, I'd be very interested. I've also just posted a question to the wikidata-l mailing list.
----
[1] http://en.wikipedia.org/wiki/Semantic_MediaWiki#Semantic_MediaWiki_and_Wikid...
[2] http://www.semantic-mediawiki.org/wiki/FAQ#What_is_the_relationship_between_...
----
Cheers!
Chris