New subject: The never-dying topic: category intersection

4 Dec 2008


On Wed, 03 Dec 2008 17:05:39 +0100, Roan Kattouw roan.kattouw@home.nl wrote:
...
Daniel Schwen wrote:
...
So how does this take care of deep indexing non-atomic categories?
Err... what? Please explain what you mean by that.
I think he means finding stuff that's already buried in sub-subcategories
when you query on a parent category. For example, querying for an intersection
of [[Category:Deceased people]] and [[Category:Presidents of the
United States]] won't find the guys listed in [[Category:Deceased
Presidents of the United States]] without re-categorizing those entries.
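To make the deep-indexing problem concrete: an intersection that only looks at direct members misses pages filed solely in a subcategory. A minimal sketch, assuming hypothetical in-memory maps `MEMBERS` (category to directly filed pages) and `SUBCATS` (category to child categories) with illustrative data, rather than MediaWiki's real tables, that expands each category through its subcategory tree before intersecting:

```python
from collections import deque

# Illustrative sample data (hypothetical, not real Wikipedia contents):
# category -> pages filed directly in it
MEMBERS = {
    "Deceased people": {"Albert Einstein"},
    "Presidents of the United States": set(),
    "Deceased Presidents of the United States": {"Abraham Lincoln", "George Washington"},
}
# category -> its direct subcategories
SUBCATS = {
    "Deceased people": ["Deceased Presidents of the United States"],
    "Presidents of the United States": ["Deceased Presidents of the United States"],
}


def pages_in_category_tree(root, members, subcats, max_depth=None):
    """Collect pages in `root` and in every subcategory reachable from it.

    Breadth-first walk; `max_depth` limits how many subcategory levels are
    followed (None = unlimited, 0 = direct members only).
    """
    seen = {root}
    pages = set()
    queue = deque([(root, 0)])
    while queue:
        cat, depth = queue.popleft()
        pages |= members.get(cat, set())
        if max_depth is not None and depth >= max_depth:
            continue  # don't descend any further from this category
        for sub in subcats.get(cat, []):
            if sub not in seen:  # guard against category loops
                seen.add(sub)
                queue.append((sub, depth + 1))
    return pages


def deep_intersection(cat_a, cat_b, members, subcats):
    """Pages that are in both categories, directly or via subcategories."""
    return (pages_in_category_tree(cat_a, members, subcats)
            & pages_in_category_tree(cat_b, members, subcats))


if __name__ == "__main__":
    shallow = MEMBERS["Deceased people"] & MEMBERS["Presidents of the United States"]
    print(sorted(shallow))  # []
    print(sorted(deep_intersection("Deceased people",
                                   "Presidents of the United States",
                                   MEMBERS, SUBCATS)))
    # ['Abraham Lincoln', 'George Washington']
```

The `max_depth` cap and the loop guard are choices of this sketch, not part of any existing extension; real category graphs can contain loops and very deep trees, which is part of why deep intersection is expensive.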
...
...
How will this extension be even remotely useful for, let's say, Commons?
Without addressing Commons in particular, having an efficient way to get
pages in the intersection of multiple categories would allow wikis to
delete a category such as [[Category:Deceased Presidents of the United
States]] and replace it with, say, [[Intersection:Deceased Presidents of
the United States]], which would list all articles in
[[Category:Deceased people]] and [[Category:Presidents of the United
States]]. My extension alone doesn't make that possible, but it makes
implementing such a feature considerably easier.
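What an [[Intersection:...]] page would list can be expressed as a query over the category membership table. A naive sketch, assuming a heavily simplified version of MediaWiki's `page` and `categorylinks` tables (only `page_id`/`page_title` and `cl_from`/`cl_to`) with illustrative rows. This GROUP BY/HAVING query is not the fulltext or Lucene approach discussed in this thread and is not meant to be efficient on large categories:

```python
import sqlite3


def make_demo_db():
    """In-memory DB with a heavily simplified page/categorylinks schema
    (illustrative rows only; the real MediaWiki tables have more columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
        CREATE TABLE categorylinks (
            cl_from INTEGER,  -- page_id of the categorized page
            cl_to   TEXT,     -- category name, without the namespace prefix
            PRIMARY KEY (cl_from, cl_to)
        );
        INSERT INTO page VALUES
            (1, 'Abraham_Lincoln'), (2, 'Albert_Einstein'), (3, 'George_Washington');
        INSERT INTO categorylinks VALUES
            (1, 'Deceased_people'), (1, 'Presidents_of_the_United_States'),
            (2, 'Deceased_people'),
            (3, 'Deceased_people'), (3, 'Presidents_of_the_United_States');
    """)
    return conn


def intersect_categories(conn, categories):
    """Titles of pages that are directly in *every* listed category."""
    wanted = sorted(set(categories))
    if not wanted:
        return []
    placeholders = ",".join("?" for _ in wanted)
    sql = f"""
        SELECT p.page_title
        FROM categorylinks AS cl
        JOIN page AS p ON p.page_id = cl.cl_from
        WHERE cl.cl_to IN ({placeholders})
        GROUP BY p.page_id, p.page_title
        HAVING COUNT(DISTINCT cl.cl_to) = ?  -- page matched all requested categories
        ORDER BY p.page_title
    """
    return [title for (title,) in conn.execute(sql, [*wanted, len(wanted)])]


if __name__ == "__main__":
    conn = make_demo_db()
    print(intersect_categories(conn, ["Deceased_people",
                                      "Presidents_of_the_United_States"]))
    # ['Abraham_Lincoln', 'George_Washington']
```

Like a plain category listing, this only sees direct membership, so it does not by itself address the sub-subcategory issue raised earlier in the thread.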
...
This discussion is far from over. The basic problems are _not_ solved.
Would you care to elaborate on what those unsolved problems are?
I thought we were 90% of the way there when you wrote this extension, having
reasonably solved the efficiency (speed) issues with the fulltext and
Lucene-based approaches, and the view of the atomic-categories problem was that it
would be solved by people, not tech.  In other words, I thought we all
assumed that once people were empowered with category intersections, they'd
make categories that make use of them. If not, then that's a problem to
solve, but not an obstacle to implementing category intersection.  My input
would be to implement intersections, see what happens, and look at other
functionality for intersections v.2.
...
...
I'm sure this thread will die out soon.
Half of the participants will again be soothed by the promise of some
easy solution just barely beyond the horizon, while the half that realizes
that said solution _cannot possibly work_ without a radical reform of the
category system will again be too annoyed (I'm getting there already) to
continue discussing.
It would be nice if you didn't judge people as naive right away.
Seconded.
But it sounds like maybe those of us who'd like to see this happen should
discuss a UI (or several) for it. I was thinking the most intuitive
interface would be a sort of "browse"-type function, where for any given group
of categories (which could be just one category), you have two result sets:
related categories (other categories of pages in the starting category),
and articles at the intersection of the group. The articles are what we
generally think of, but the related categories give us an intuitive way to
navigate through category intersections.
The articles in the group of categories are the problem we've already
(mostly) solved: they are the result of the fulltext or Lucene search. The
related-categories problem is harder, I think, as the most obvious way to
get there is to fetch all the categories belonging to those articles, and
then collapse and rank them. For large result sets, this can get
time-consuming again, and (I think) we would not want to build the related
categories from only the first page of results. OTOH... if we took the
first 100 results of a given category intersection, then queried the
categorylinks table for all the categories belonging to those articles, and
collapsed that... that would be a pretty good estimate of the related
categories. It wouldn't give all of them, but it would be a nice set of
sample data.
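The sampling idea above (take the first 100 results, look up their categories, collapse and rank) might look roughly like this. A sketch only: it assumes a simplified `categorylinks(cl_from, cl_to)` table with illustrative rows, and it ranks by how many sampled pages share each category, which is one plausible reading of "collapse and rank" rather than anything specified in the thread. `related_categories` and its parameters are hypothetical names:

```python
import sqlite3


def make_demo_db():
    """Simplified categorylinks(cl_from, cl_to) table with illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE categorylinks "
                 "(cl_from INTEGER, cl_to TEXT, PRIMARY KEY (cl_from, cl_to))")
    conn.executemany(
        "INSERT INTO categorylinks VALUES (?, ?)",
        [
            (1, "Deceased_people"), (1, "Presidents_of_the_United_States"),
            (1, "Assassinated_people"), (1, "People_on_United_States_banknotes"),
            (2, "Deceased_people"), (2, "Physicists"),
            (3, "Deceased_people"), (3, "Presidents_of_the_United_States"),
            (3, "Continental_Army_generals"), (3, "People_on_United_States_banknotes"),
        ],
    )
    return conn


def related_categories(conn, result_page_ids, query_categories,
                       sample_size=100, top_n=10):
    """Estimate related categories from the first `sample_size` results.

    Counts how many sampled pages carry each category (excluding the
    categories already in the query) and returns up to `top_n`
    (category, count) pairs, most common first, ties broken by name.
    """
    sample = list(result_page_ids)[:sample_size]
    if not sample:
        return []
    sql = ("SELECT cl_to, COUNT(*) AS n FROM categorylinks "
           f"WHERE cl_from IN ({','.join('?' for _ in sample)})")
    params = list(sample)
    if query_categories:
        # leave out the categories the user is already intersecting on
        sql += f" AND cl_to NOT IN ({','.join('?' for _ in query_categories)})"
        params += list(query_categories)
    sql += " GROUP BY cl_to ORDER BY n DESC, cl_to LIMIT ?"
    params.append(top_n)
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    # page ids 1 and 3 stand in for the first results of an intersection
    print(related_categories(conn, [1, 3],
                             ["Deceased_people", "Presidents_of_the_United_States"]))
```

Because only the sample is inspected, the counts are estimates for the full intersection; a category that is common only among later results can be missed, which matches the caveat that it wouldn't give all of them.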
What do you think?
Onto a soapbox for a minute: the fact that this topic hasn't died in 4
years means, to me, that it's a really needed feature. Once implemented, it
will give people a great tool to find information more efficiently. Looking
at what's happening around the web with tags, Google adopting ideas
from Wikia Search, semantic web stuff, I think we are really at
the beginning of a movement to add structured metadata to information on the
net. In concert with all the wonderful algorithms that try to guess what a
given web page is about, we are explicitly stating what a web
page is about, giving users a much better chance of being able to find
it. Developing category intersections for Wikipedia would be a milestone in
that movement.
Aerik
-- 
http://eventfeed.org - An Initiative Promoting Syndication of Events
http://www.wikidweb.com - the Wiki Directory of the Web
http://tagthis.info - Hosted Tagging for your website!
