Some of the issues around the structure of categories have been both contentious and promising for a long time, 2 years or better... (see http://en.wikipedia.org/wiki/Wikipedia:Category_math_feature, and many links to category intersections / category math on meta)
I *still* think categories as tags is one of the most promising applications of a wiki, as opposed to hierarchical categories like dmoz. We currently seem to be doing some of both (the administrative categories are essentially tags, while many other categories are hierarchical, e.g. "Norwegian composers" (http://en.wikipedia.org/wiki/Category:Norwegian_composers), which is several levels down from the high-level musicians category) and that's really pretty cool.
I think we should *embrace* doing both types of categories, because we can in the wiki structure; it's not a mutually exclusive way of doing things. We can make everybody (sort of) happy. The hurdle then becomes writing code efficient enough to sift through all the records to find what you want.
Now, as I mentioned before, I don't have experience with writing code to deal with zillions of hits, but I think a simple implementation of "find all pages belonging to category1 AND category2" is accomplished using SQL like this pseudocode: "select pages, count(pages) as count where (category='category1' or category='category2') group by pages having count='2'"
Again, I'm no efficiency expert, but that puts the heavy lifting on MySQL and is pretty straightforward. I've currently written an implementation that is a bit more full-featured, messy, and inefficient than that... but I'd be happy to pitch in on a new one (I'm still on MediaWiki 1.4x, sshh!)
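For illustration, here is a minimal sketch of the GROUP BY idea above, run against an in-memory SQLite database from Python rather than MySQL. The table name `categorylinks`, the columns `page` and `category`, and the sample rows are assumptions made for the example; they are not meant to match MediaWiki's real schema. The count trick only works if a page cannot be listed twice in the same category, which the primary key enforces here.

```python
import sqlite3

# Hypothetical simplified schema: one row per (page, category) pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE categorylinks ("
    "page TEXT, category TEXT, PRIMARY KEY (page, category))"
)
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [
        ("Page A", "category1"),
        ("Page A", "category2"),
        ("Page B", "category1"),
        ("Page C", "category2"),
        ("Page D", "category1"),
        ("Page D", "category2"),
        ("Page D", "category3"),
    ],
)


def pages_in_both(conn, cat1, cat2):
    """Return pages that belong to both categories, using GROUP BY / HAVING."""
    rows = conn.execute(
        """
        SELECT page
        FROM categorylinks
        WHERE category IN (?, ?)
        GROUP BY page
        HAVING COUNT(*) = 2
        ORDER BY page
        """,
        (cat1, cat2),
    )
    return [row[0] for row in rows]


both = pages_in_both(conn, "category1", "category2")
print(both)
```

The same pattern would extend to N categories by listing N values in the IN clause and comparing the count to N.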
Best Regards, Aerik
On 8/21/06, Aerik Sylvan aerik@thesylvans.com wrote:
> I *still* think categories as tags is one of the most promising applications of a wiki, as opposed to hierarchical categories like dmoz. We currently seem to be doing some of both (the administrative categories are essentially tags, while many other categories are hierarchical, e.g. "Norwegian composers" (http://en.wikipedia.org/wiki/Category:Norwegian_composers), which is several levels down from the high-level musicians category) and that's really pretty cool.
> I think we should *embrace* doing both types of categories, because we can in the wiki structure; it's not a mutually exclusive way of doing things. We can make everybody (sort of) happy. The hurdle then becomes writing code efficient enough to sift through all the records to find what you want.
We had a great discussion about all of this several months ago. Still not sure what to make of it all ;) However, it seems that there are several different fundamental types of categories (primarily, strict hierarchies, and thematic hierarchies), and some good rules on how to combine them (strict hierarchies can descend from thematic hierarchies, but never the other way). It would theoretically be possible and maybe desirable to make the granddaddy of all categories a basic metacategory like "administrative tag" or "taxonomic category" or whatever.
I had a dream that all pages could be categorised along at least one strict taxonomic category structure without resorting to copouts like "... terminology", but I found some tricky pages. Like the rock climbing term "enchainement" which refers to climbing several mountains without a break in the middle. I can't think of any plausible category except the artificial "rock climbing feats". And a couple of other things which really describe particular aspects of various fields that are so unique to the field I just can't think of a category.
I do need to look more at semanticwiki and category math and stuff, though. And to be optimistic that it can and will be integrated into Wikipedia when it's ready.
Steve
Moin,
On Monday 21 August 2006 18:49, Aerik Sylvan wrote:
> Some of the issues around the structure of categories have been both contentious and promising for a long time, 2 years or better... (see http://en.wikipedia.org/wiki/Wikipedia:Category_math_feature, and many links to category intersections / category math on meta)
> I *still* think categories as tags is one of the most promising applications of a wiki, as opposed to hierarchical categories like dmoz. We currently seem to be doing some of both (the administrative categories are essentially tags,
I wonder why for these purposes templates aren't used. You can then retrieve all pages that have the "tag" via WhatLinksHere.
Maybe I didn't get the memo, but on wikis I contributed to that was done this way instead of using categories :)
All the best,
Tels
On 8/21/06, Tels nospam-abuse@bloodgate.com wrote:
> I wonder why for these purposes templates aren't used. You can then retrieve all pages that have the "tag" via WhatLinksHere.
> Maybe I didn't get the memo, but on wikis I contributed to that was done this way instead of using categories :)
Templates aren't really designed for that the way categories are. The syntax is less clear (the function of "{{foo}}" is less obvious than that of [[Category:Foo]], because of the keyword "category"), the "what links here" page is sorted quite randomly to most people's eyes (by date added, not alphabetic order), there aren't category links on the bottom of the page, fewer entries are displayed per page by default, you can't easily browse to a supercategory (which might in some cases be handy even for simple "tags"), and you can't split up large categories into subcategories.
Why would you *want* to use templates instead of categories?
Moin,
On Monday 21 August 2006 20:03, Simetrical wrote:
> On 8/21/06, Tels nospam-abuse@bloodgate.com wrote:
>> I wonder why for these purposes templates aren't used. You can then retrieve all pages that have the "tag" via WhatLinksHere.
>> Maybe I didn't get the memo, but on wikis I contributed to that was done this way instead of using categories :)
> Templates aren't really designed for that the way categories are. The syntax is less clear (the function of "{{foo}}" is less obvious than that of [[Category:Foo]], because of the keyword "category"), the "what links here" page is sorted quite randomly to most people's eyes (by date added, not alphabetic order), there aren't category links on the bottom of the page, fewer entries are displayed per page by default, you can't easily browse to a supercategory (which might in some cases be handy even for simple "tags"), and you can't split up large categories into subcategories.
Sensible points.
> Why would you *want* to use templates instead of categories?
Because you can have a large, red block "THIS PAGE NEEDS CLEANUP!" screaming at the viewer :)
Mainly because I used templates before, and for a small wiki it doesn't matter much. For Wikipedia, well, a category might make more sense, as you pointed out. I was just asking.
Best wishes,
Tels
> Now, as I mentioned before, I don't have experience with writing code to deal with zillions of hits, but I think a simple implementation of "find all pages belonging to category1 AND category2" is accomplished using SQL like this pseudocode: "select pages, count(pages) as count where (category='category1' or category='category2') group by pages having count='2'"
Heh, I wonder how you came up with this SQL... it's definitely not the most straightforward way of doing it, and (I'm afraid) also not the most efficient. It feels like it was deliberately trying to be more clever than necessary ;)
Your method would:
* retrieve all pages from category1
* retrieve all pages from category2
* sort the whole list
* group them (i.e. replace runs with their counts)
* filter out those where the count is 2

The straightforward method would be:

    select a.page from categorylinks a
    where a.category='category1'
    AND EXISTS (
        select * from categorylinks b
        where a.page=b.page AND b.category='category2'
    )

This would:
* retrieve all the pages from category2
* find all the pages in category1 that are any of those

I don't have MySQL handy, but tested this on MSSQL and the second method was way faster. :)
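As a rough, runnable version of the EXISTS query above, here is a small sketch using SQLite from Python. The simplified `categorylinks` table with `page` and `category` columns follows the names used in this thread, not MediaWiki's actual schema; the index and the sample rows are assumptions added for the example. It only shows that the query returns the intersection. It says nothing about speed, which depends on the database engine, the indexes, and the data.

```python
import sqlite3

# Hypothetical simplified schema: one row per (page, category) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categorylinks (page TEXT, category TEXT)")
# Assumed index so the inner lookup by (category, page) need not scan the table.
conn.execute(
    "CREATE INDEX idx_category_page ON categorylinks (category, page)"
)
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [
        ("Page A", "category1"),
        ("Page A", "category2"),
        ("Page B", "category1"),
        ("Page C", "category2"),
        ("Page D", "category1"),
        ("Page D", "category2"),
    ],
)


def pages_in_both_exists(conn, cat1, cat2):
    """Return pages in cat1 that also have a row for cat2, using EXISTS."""
    rows = conn.execute(
        """
        SELECT a.page
        FROM categorylinks a
        WHERE a.category = ?
          AND EXISTS (
              SELECT 1
              FROM categorylinks b
              WHERE b.page = a.page
                AND b.category = ?
          )
        ORDER BY a.page
        """,
        (cat1, cat2),
    )
    return [row[0] for row in rows]


result = pages_in_both_exists(conn, "category1", "category2")
print(result)
```

Prefixing the SELECT with SQLite's `EXPLAIN QUERY PLAN` is one way to see how a particular engine chooses to execute it; MySQL and MSSQL have their own equivalents and may pick different plans.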
Timwi
wikitech-l@lists.wikimedia.org