On Wed, 03 Dec 2008 17:05:39 +0100, Roan Kattouw roan.kattouw@home.nl wrote:
Daniel Schwen schreef:
So how does this take care of deep indexing non-atomic categories?
Err.. what? Please explain what you mean by that.
I think he means finding stuff that's already buried in sub-sub categories when you query on a parent category. For example, querying for an intersection of [[Category:Deceased people]] and [[Category:Presidents of the United States]] won't find the guys listed in [[Category:Deceased Presidents of the United States]] without recategorizing those entries.
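To make the problem concrete, here is a minimal Python sketch on made-up data. The `members` and `subcats` dictionaries and the `deep_members` helper are hypothetical names used only for illustration; nothing here is taken from MediaWiki or from the extension. It compares a flat intersection of direct members with one that first walks down the subcategory tree.

```python
from collections import deque

# Hypothetical toy data: direct page members and direct subcategories.
members = {
    "Deceased people": {"Isaac Newton"},
    "Presidents of the United States": {"George W. Bush"},
    "Deceased Presidents of the United States": {"Abraham Lincoln"},
}
subcats = {
    "Deceased people": {"Deceased Presidents of the United States"},
    "Presidents of the United States": {"Deceased Presidents of the United States"},
}


def deep_members(category):
    """Collect pages in a category and in all of its descendants (breadth-first)."""
    seen = {category}          # guards against cycles in the category graph
    queue = deque([category])
    pages = set()
    while queue:
        current = queue.popleft()
        pages |= members.get(current, set())
        for child in subcats.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return pages


# Flat intersection: only direct members, so Lincoln is missed.
flat = members["Deceased people"] & members["Presidents of the United States"]

# Deep intersection: expand both parents first, then intersect.
deep = deep_members("Deceased people") & deep_members("Presidents of the United States")
```

On this data `flat` comes out empty while `deep` contains Abraham Lincoln, which is the gap being described. On a real wiki the expansion step is the expensive part, since a broad parent category can have a very large subtree.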
How will this extension be even remotely useful for, let's say, Commons?
Without addressing Commons in particular, having an efficient way to get pages in the intersection of multiple categories would allow wikis to delete a category such as [[Category:Deceased Presidents of the United States]] and replace it by, say, [[Intersection:Deceased Presidents of the United States]], which would list all articles in [[Category:Deceased people]] and [[Category:Presidents of the United States]]. My extension alone doesn't make that possible, but it makes implementing such a feature considerably easier.
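For illustration only, here is one generic way to ask for pages in all of several categories, sketched in Python against an in-memory SQLite table. This is not how the extension does it (the approaches discussed in this thread are fulltext and Lucene based). The toy table keeps only the `cl_from` (page id) and `cl_to` (category name) columns of `categorylinks`; the real table has more columns, and `pages_in_all` is a hypothetical helper.

```python
import sqlite3


def pages_in_all(conn, categories):
    """Return ids of pages that are direct members of every given category."""
    wanted = sorted(set(categories))
    if not wanted:
        return []
    placeholders = ",".join("?" for _ in wanted)
    sql = (
        "SELECT cl_from FROM categorylinks "
        f"WHERE cl_to IN ({placeholders}) "
        "GROUP BY cl_from "
        "HAVING COUNT(DISTINCT cl_to) = ? "   # page must match every category
        "ORDER BY cl_from"
    )
    rows = conn.execute(sql, [*wanted, len(wanted)])
    return [row[0] for row in rows]


# Toy stand-in for the categorylinks table (assumption: two columns only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?)",
    [
        (1, "Deceased_people"),
        (1, "Presidents_of_the_United_States"),
        (2, "Presidents_of_the_United_States"),
        (3, "Deceased_people"),
    ],
)

result = pages_in_all(conn, ["Deceased_people", "Presidents_of_the_United_States"])
```

Only page 1 is in both categories here, so it is the only id returned. A GROUP BY/HAVING query like this is easy to write but can be slow on large categories, which is presumably part of why the efficiency question came up in the first place.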
This discussion is far from over. The basic problems are _not_ solved.
Would you care to elaborate on what those unsolved problems are?
I thought we were 90% of the way there when you wrote this extension, having reasonably solved the efficiency (speed) issues with the fulltext and Lucene-based approaches, and the view of the atomic categories problem was that it would be solved by people, not tech. In other words, I thought we all assumed that once people were empowered with category intersections, they'd make categories that make use of them. If not, then that's a problem to solve, but not an obstacle to implementing category intersection. My input would be to implement intersections, see what happens, and look at other functionality for intersections v.2.
I'm sure this thread will die out soon. Half of the participants will again be soothed by the promise of some easy solution just barely beyond the horizon, while the half that realizes that said solution _cannot possibly work_ without a radical reform of the category system will again be too annoyed (I'm getting there already) to continue discussing.
It would be nice if you didn't judge people as naive right away.
Seconded.
But it sounds like maybe those of us who'd like to see this happen should discuss a UI (or several) for it. I was thinking the most intuitive interface was a sort of "browse" type function, where for any given group of categories (could be just one category), you have two result sets: related categories (other categories of pages in the starting group), and articles at the intersection of the group. The articles are what we generally think of, but the related categories give us an intuitive way to navigate through category intersections.
The articles in the group of categories are the problem we've already solved (mostly): they are the result from the fulltext or Lucene search. The related categories problem is harder, I think, as the most obvious way to get there is to get all the categories belonging to those articles, and then collapse them and rank them. For large result sets, this can get time-consuming again, and we would not want to (I think) build the related categories only with the first page of results. OTOH... if we took the first 100 results of a given category intersection, then queried the categorylinks table for all the categories belonging to those articles, and collapsed that... that would be a pretty good estimate of the related categories. It wouldn't give all of them, but it would be a nice set of sample data.
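Roughly, the sampling idea could look like the Python sketch below. It works on an in-memory dictionary rather than the real categorylinks table, and `related_categories`, `page_categories` and the sample data are all hypothetical names invented for this example. "Collapse and rank" is taken to mean: count how often each other category appears among the sampled pages and sort by that count.

```python
from collections import Counter


def related_categories(result_pages, page_categories, starting, sample_size=100):
    """Estimate related categories from the first `sample_size` results."""
    sample = result_pages[:sample_size]
    counts = Counter()
    for page in sample:
        for cat in page_categories.get(page, ()):
            if cat not in starting:      # skip the categories we started from
                counts[cat] += 1
    # Most frequent first: a list of (category, count) pairs.
    return counts.most_common()


# Made-up stand-in for a categorylinks lookup (assumption: page -> categories).
page_categories = {
    "Abraham Lincoln": [
        "Deceased people", "Presidents of the United States",
        "Lawyers", "Assassinated people",
    ],
    "John F. Kennedy": [
        "Deceased people", "Presidents of the United States",
        "Assassinated people",
    ],
}
starting = {"Deceased people", "Presidents of the United States"}
results = ["Abraham Lincoln", "John F. Kennedy"]

related = related_categories(results, page_categories, starting)
```

The cost is bounded by the sample size rather than by the size of the full intersection, at the price of missing categories that only show up further down the result list.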
What do you think?
Onto a soap box for a minute: the fact that this topic won't die, in 4 years, to me means that it's a really needed feature. Once implemented it will give people a great tool to more efficiently find information. Looking at things that are happening around the web with tags, Google adopting ideas from Wikia search, semantic web stuff, I'm thinking that we are really at the beginning of a movement to add structured metadata to information on the net. In concert with all the wonderful algorithms that try to guess what a given web page is about, we are doing things to explicitly state what a web page is about, providing users a much better chance at being able to find it. Developing category intersections for Wikipedia would be a milestone in that movement.
Aerik