* Evan Prodromou evan@wikitravel.org
|
| I also added a proposal for applying this feature to categorization;
| see here:
|
| http://meta.wikipedia.org/wiki/Categorization_with_field-value_pairs
Hi. I'm new to the list but have been following the discussion regarding metadata with some interest. I welcome this discussion because I believe that Wikipedia could gain a lot from adding "metadata" to the articles. I am a complete newbie when it comes to Wikipedia but have been interested in the categorisation of content for a while. I have just had a look at the above document and wish to make a few comments. There has also just been a flurry of mails on the list and this post applies to them as well.
1. What is actually meant by category?
I believe that the relationship "category" is perhaps being used too liberally and that it could lead to problems down the track. Certainly I can see some problems in the examples already given. The main problem I see so far is that category has been used to cover different concepts: "type/instance", "type/subtype", "whole/part" and "related". Mixing these up will cause problems for those wanting to extract the metadata and do something with it. From a web surfer's or a display perspective it looks OK because all of these concepts can be seen as some kind of containment. Type pages can list instances, type pages can list subtypes, whole pages can list parts, and pages can list related pages.
I'll take a quick look at one of the examples given so far to show what I mean:
Example A:
w:William Carlos Williams [[category=American poets]] [[category=20th-century Americans]]
In this example category is being used as a type/instance:
- William Carlos Williams IsA American Poet
- William Carlos Williams IsA 20th-century American
Example B:
w:American poets [[category=Americans]] [[category=poets]]
In this example category is being used as a type/subtype:
- American poet IsASubType of American
- American poet IsASubType of Poet
Example C:
w:Canada [[category=History of Canada]]
In this example it is being used as a "related" link. Maybe this could be considered a "facet" of the article.
- History of Canada isAbout Canada
Example D:
w:Muffler [[category=Car]]
And here is a "part/whole" relationship thrown in for good measure.
- Muffler IsAPartOf Car
The point is that "category" is really disguising all of these "containment" relationships. I believe that it would be a big mistake for users to apply the term liberally across the whole site, as you would be hiding just as much as you are revealing. You will have achieved your aim of being able to provide lists, but I also believe that you would have missed a big opportunity to provide some structure to the textual content you already have.
I therefore believe that a reasonable case could be made for the following basic roles:
  [[category=x]]   type/instance
  [[superType=x]]  type/subtype
  [[related=x]]    related
  [[whole=x]]      part/whole
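As a rough sketch of what keeping the four roles apart buys someone extracting the metadata, the snippet below turns [[role=value]] tags into triples with a distinct predicate per role. The predicate names, the regular expression and the function name are assumptions made for illustration only; nothing here is part of the proposal or of the Wikipedia software.

```python
import re

# Hypothetical mapping from the four roles above to predicate names.
ROLE_TO_PREDICATE = {
    "category": "isA",
    "superType": "isSubTypeOf",
    "related": "isRelatedTo",
    "whole": "isPartOf",
}

# Assumed tag shape: [[role=value]]
TAG = re.compile(r"\[\[(\w+)=([^\]]+)\]\]")


def extract_triples(page_title, wikitext):
    """Return (subject, predicate, object) triples for the known roles on a page."""
    triples = []
    for role, value in TAG.findall(wikitext):
        predicate = ROLE_TO_PREDICATE.get(role)
        if predicate is not None:  # skip roles outside the four listed
            triples.append((page_title, predicate, value.strip()))
    return triples


print(extract_triples("American poets", "[[superType=Americans]] [[superType=poets]]"))
print(extract_triples("Muffler", "[[whole=Car]]"))
```

With separate predicates, a script can ask for instances only, or subtypes only, instead of getting every kind of "containment" mixed into one list.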
IMO the "category" or type/instance relationship will be the one most commonly used. This is where all of the low hanging fruit is - what with the list pages and all.
There are a few further shortcomings (but I guess that these were non-aims anyway):
- How do you express properties? eg. Elvis hasProperty diedAtAge 42
- How do you express other relationships? eg. Elvis played Rock
If you were going to deal with these two situations then things get a bit more complicated syntax wise. You would also need to loosen up the kinds of roles which were allowed.
2. Where can assertions be made?
The current formulation assumes that the current page is the subject; however, there will be circumstances where this won't be the case. This will happen in situations where it is easier to group assertions together. For example, on any "lists" page an author is listing out instances of a certain type (x isA y). In this case x is the subject of the assertion but not the subject of the page - the subject is "List of Ys". It therefore might be of benefit to be able to specify all three parts of a triple on these pages.
eg. List of Colours
  red [red: category=colour]
  blue [blue: category=colour]
[nb: I am not proposing a syntax here - just showing that all three parts of a triple should be able to be expressed.]
This means that type information can be managed in one place and that the author doesn't have to open up every Colour page to add the info. If this were possible then it would be very easy to mark up type information quickly. The same could probably be said for timelines. However, I do note that the rationale for the proposal was to avoid just this type of thing as it is tedious to keep lists of things. Personally, I do agree with this rationale, however, it might be more natural for the maintainers of lists to specify all of the relationships in one place. This would lead to duplication of information however.
A stronger case for being able to supply a full triple can be made by considering that some pages deal with complex relationships and it makes sense to be able to include the full triple on that page.
eg. Complementary Colours
  Blue and Green should never be seen. [blue:clashesWith=green]
  Blue and Orange are cool. [blue:complements=orange]
These last features probably fall outside the 80/20 for the categorisation system. I'm just flagging them as something that users may want to do in the future.
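To show that a full triple written inline is easy to pull out again, here is a minimal sketch. It assumes the purely illustrative "[subject: role=object]" notation used in the examples above (which, as noted, is not a syntax proposal); the regular expression and function name are likewise made up for the example.

```python
import re

# Assumed pattern for the illustrative "[subject: role=object]" notation.
# It deliberately does not match page-level tags such as [[category=x]].
FULL_TRIPLE = re.compile(
    r"\[\s*([^\[\]:=]+?)\s*:\s*(\w+)\s*=\s*([^\[\]]+?)\s*\]"
)


def extract_full_triples(wikitext):
    """Return every (subject, predicate, object) triple written inline."""
    return FULL_TRIPLE.findall(wikitext)


text = (
    "Blue and Green should never be seen. [blue:clashesWith=green] "
    "Blue and Orange are cool. [blue:complements=orange]"
)
for triple in extract_full_triples(text):
    print(triple)
```

The subject of each assertion comes from the tag itself, not from the title of the page the tag happens to sit on.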
[I have just seen that there has been some recent conversation about this on the list!]
3. What do the parts of the triple represent?
The point has been made by others that all parts of the triple can be represented by Wikipedia pages. IMO this is an amazingly strong point of the system. ie. Every page represents a subject and the subjects can be all parts of a triple: "objects" and "predicates" as well as "subjects". Don't forget the predicate!
The area where it falls down is in trying to represent discrete values which are generally represented by properties - the value doesn't really represent a subject.
eg. Elvis ageAtDeath 42
    Life hasMeaningOf 42
"42" could be a subject in Wikipedia by being given a page (eg. this has been done for representing years) but this doesn't mean that all property assignments should necessarily link to this page. In the Elvis example it probably shouldn't link. In the Life example it could.
So, I am arguing that a metadata system needs a way of specifying plain property values. I guess that maybe I should get with the Wiki program and realise that "42" could just be a blank page waiting for someone to add some content to it. Wikis are very good at growing in the areas they need to.
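One way to picture the distinction between a plain property value and a value that is itself a subject is sketched below. The class names (PageRef, Literal, Assertion) and the helper are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class PageRef:
    """A value that is itself a subject, i.e. has (or could have) a page."""
    title: str


@dataclass(frozen=True)
class Literal:
    """A plain property value that should not link anywhere."""
    value: str


@dataclass(frozen=True)
class Assertion:
    subject: PageRef
    predicate: PageRef  # predicates can be pages too
    obj: Union[PageRef, Literal]


def linked_pages(assertions):
    """Titles of the pages that the assertions link to as objects."""
    return [a.obj.title for a in assertions if isinstance(a.obj, PageRef)]


assertions = [
    Assertion(PageRef("Elvis"), PageRef("ageAtDeath"), Literal("42")),
    Assertion(PageRef("Life"), PageRef("hasMeaningOf"), PageRef("42")),
]
print(linked_pages(assertions))
```

Only the Life assertion produces a link to the "42" page; the Elvis assertion keeps 42 as a bare value.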
4. A look at some of the Disadvantages listed at http://meta.wikipedia.org/wiki/Categorization_with_field-value_pairs
* Huge amount of manual work to do to categorize English Wikipedia
There is a hell of a lot of info contained in the lists pages. Wikipedia is largely about proper nouns and instances of things. This is all contained in the lists pages. You can get up and running very quickly with this.
* Lists of articles and sub-categories may get unreasonable long (although this can be ameliorated with careful categorization schemes).
If you output the lists with a script then it must be able to support paging. Imagine a list of all of the "people" in Wikipedia. Lists of types high in the type hierarchy will always have many instances.
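A minimal sketch of the kind of paging meant here, assuming the instances of a type are already available as a list of titles. The function name, the page size of 50 and the made-up titles are assumptions for illustration.

```python
def paginate(titles, page, per_page=50):
    """Return one page of alphabetically sorted titles and the total page count."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be at least 1")
    ordered = sorted(titles)
    total_pages = max(1, -(-len(ordered) // per_page))  # ceiling division
    start = (page - 1) * per_page
    return ordered[start:start + per_page], total_pages


# Made-up data standing in for every page tagged [[category=people]].
people = [f"Person {i:03d}" for i in range(120)]
titles, total = paginate(people, page=3)
print(len(titles), "titles on page 3 of", total)
```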
* Unclear generalization of what the "type" field means. What other types would there be?*
I didn't understand what you were trying to achieve with [[type=category]]. Assuming that categories are for type/instance relationships then you could have:
w:American poets [[category=type]] - American poets IsA Type
This way you can have a "Types" page which would list American Poet and all other types in the system. I would just drop the [type=x] because that is exactly what [category=] is doing, assuming that category represents type/instance :)
* Unclear whether categories refer to the article or to the subject of the article. For example, would [[category=biography]] be appropriate for w:William Carlos Williams?*
Good question. In general I would say the "subject of the article". When I say "Blue isA Colour" I am talking about the subjects (Blue and Colour), not about the articles (the resources). This proposal is therefore about making assertions between subjects, rather than applying metadata to articles.
Having said that, the biography is clearly about the article and I would feel uneasy about saying [[category=biography]] as it would not be strictly correct. The subject is a person, not an article. If you were doing it properly you would have to create an "articleType" role, [[articleType=biography]]. If you were being anal about it and wanted to signify that the articleType role was not about the subject you could say of articleType: [[category=resourceRole]]. Also, "biography" could be an instance of "article type". Wikipedia readers would be happy enough to see "Article Type: Biography" and a link to "biography" which would list all of the biographies.
Erik has just stated that:
| 2) Category pages are not articles. Like talk pages and meta pages, they
| should be logically separated from articles, which has numerous benefits
| (easier searching/filtering, counting etc.)
I would disagree and say that all pages represent subjects. Each article is designed to represent a subject in a computer in an addressable way. The subjects can be anything we wish to talk about - instances and types. This is a huge advantage for Wikipedia - all subjects are treated the same. Why should they be logically separated? You already have a great system for managing the textual content for all subjects. This proposal is just a layer on top. Separation = complexity. The system is ultra flexible at the moment and drawing lines through it would be a difficult task.
eg. The "american poet" page has some text which explains what an american poet is. It then gets the added goodies that the categorisation system offers.
* May get abused for purposes other than categorization where more specific fields may be more appropriate. For example, a status field for articles ([[status=stub]]) may be useful, but could be preempted by [[category=stub]] before the "status" field is implemented.*
This is the same case as [[category=biography]]. The assertion is about the article not the subject. So, if it were possible to create new roles for things such as status you could do:
Article X [[status=stub]]
Status [[category=resourceRole]]
resourceRole [[category=type]]
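The three assertions above are enough for a script to tell statements about the article apart from statements about its subject. The sketch below assumes the category assertions have already been collected into a dictionary; the dictionary, the lowercase role keys and the function names are hypothetical.

```python
# Hypothetical store of category assertions: page title -> set of categories.
CATEGORIES = {
    "status": {"resourceRole"},       # Status [[category=resourceRole]]
    "articleType": {"resourceRole"},  # articleType [[category=resourceRole]]
    "resourceRole": {"type"},         # resourceRole [[category=type]]
}


def is_about_article(role):
    """A role describes the article itself if it is categorised as a resourceRole."""
    return "resourceRole" in CATEGORIES.get(role, set())


def split_assertions(tags):
    """Split (role, value) pairs into those about the article and those about the subject."""
    about_article, about_subject = [], []
    for role, value in tags:
        target = about_article if is_about_article(role) else about_subject
        target.append((role, value))
    return about_article, about_subject


tags = [("status", "stub"), ("category", "American poets")]
about_article, about_subject = split_assertions(tags)
print("about the article:", about_article)
print("about the subject:", about_subject)
```

Because the roles are themselves pages with categories, nothing outside the triples is needed to make the distinction.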
* Fuzzy distinction between part-whole relationships and category-member relationships.*
I agree that this is a disadvantage for the reasons outlined above. I think that you can get around this by specifying that "category=" can only be used for type/instance kinds of relationships. Don't forget type/subtype and related as well.
5. Back to work
Maybe the subject line of this email is a bit erroneous - I've broadened things out. Sorry about that. Just a few thoughts I had to get down. You might want to take a look at an Italian opera topic map which provides examples of Subject Indexes, Resource Indexes and Role Indexes, as well as the instances of these types. There are a few features in this topic map which we would never be able to do in Wikipedia but the basics are achievable with a few well selected relationship types.
http://www.ontopia.net/omnigator/models/topicmap_complete.jsp?tm=opera.xtm
cheers
Murray Woodman -- http://www.murraywoodman.com/ http://www.veryhappening.com/ http://www.topicmap.com/