On Wed, 03 Dec 2008 17:05:39 +0100, Roan Kattouw roan.kattouw@home.nl wrote:
Daniel Schwen schreef:
So how does this take care of deep indexing non-atomic categories?
Err.. what? Please explain what you mean by that.
I think he means finding stuff that's already buried in sub-sub-categories when you query on a parent category. For example, querying for an intersection of [[Category:Deceased people]] and [[Category:Presidents of the United States]] won't find the guys listed in [[Category:Deceased Presidents of the United States]] without recategorizing those entries.
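The problem described above can be sketched with plain sets. This is only an illustration under stated assumptions: the `members` and `subcats` dictionaries and the `deep_members` helper are hypothetical toy stand-ins, not MediaWiki data structures or the extension's code.

```python
# Hypothetical toy data: direct members of each category.
members = {
    "Deceased people": {"Isaac Newton"},
    "Presidents of the United States": {"Jimmy Carter"},
    "Deceased Presidents of the United States": {"Abraham Lincoln"},
}
# Hypothetical subcategory links (parent -> children).
subcats = {
    "Deceased people": ["Deceased Presidents of the United States"],
    "Presidents of the United States": ["Deceased Presidents of the United States"],
}


def deep_members(cat, seen=None):
    """Return pages in `cat` plus pages in all of its subcategories."""
    seen = set() if seen is None else seen
    if cat in seen:  # guard against category loops
        return set()
    seen.add(cat)
    pages = set(members.get(cat, ()))
    for sub in subcats.get(cat, ()):
        pages |= deep_members(sub, seen)
    return pages


# Intersecting only direct members misses the page filed in the subcategory.
flat = members["Deceased people"] & members["Presidents of the United States"]

# Expanding subcategories first recovers it.
deep = deep_members("Deceased people") & deep_members("Presidents of the United States")
```

Here `flat` is empty while `deep` contains the page filed only in the combined category, which is the gap the question is pointing at.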
How will this extension be even remotely useful for, let's say, Commons?
Without addressing Commons in particular, having an efficient way to get pages in the intersection of multiple categories would allow wikis to delete a category such as [[Category:Deceased Presidents of the United States]] and replace it by, say, [[Intersection:Deceased Presidents of the United States]], which would list all articles in [[Category:Deceased people]] and [[Category:Presidents of the United States]]. My extension alone doesn't make that possible, but it makes implementing such a feature considerably easier.
This discussion is far from over. The basic problems are _not_ solved.
Would you care to elaborate on what those unsolved problems are?
I thought we were 90% of the way there when you wrote this extension, having reasonably solved the efficiency (speed) issues with the fulltext and lucene based approaches, and the view of the atomic categories problem was that it would be solved by people, not tech. In other words, I thought we all assumed that once people were empowered with category intersections, they'd make categories that make use of them. If not, then that's a problem to solve, but not an obstacle to implementing category intersection. My input would be to implement intersections, see what happens, and look at other functionality for intersections v.2.
I'm sure this thread will die out soon. Half of the participants will again be soothed by the promise of some easy solution just barely beyond the horizon, while the half that realizes that said solution _cannot possibly work_ without a radical reform of the category system will again be too annoyed (I'm getting there already) to continue discussing.
It would be nice if you didn't judge people as naive right away.
Seconded.
But it sounds like maybe those of us who'd like to see this happen should discuss a UI (or several) for it. I was thinking the most intuitive interface was a sort of "browse" type function, where for any given group of categories (could just be one category), you have two result sets: related categories (other categories of pages in the starting category), and articles at the intersection of the group. The articles are what we generally think of, but the related categories gives us an intuitive way to navigate through category intersections.
The articles in the group of categories are the problem we've already solved (mostly): they are the result from the fulltext or Lucene search. The related categories problem is harder, I think, as the most obvious way to get to that is to get all the categories belonging to those articles, and then collapse them and rank them. For large result sets, this can get time-consuming again, and we would not want to (I think) build the related categories only from the first page of results. OTOH... if we took the first 100 results of a given category intersection, then queried the categorylinks table for all the categories belonging to those articles, and collapsed that... that would be a pretty good estimate of the related categories. It wouldn't give all of them, but it would be a nice set of sample data.
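A rough sketch of that sampling idea, assuming the categorylinks lookup has already been loaded into a plain dictionary. The function name `related_categories` and its parameters are hypothetical, chosen for illustration; a real implementation would query the database instead.

```python
from collections import Counter


def related_categories(result_pages, page_categories, query_cats, sample_size=100):
    """Estimate related categories from the first `sample_size` results.

    `page_categories` maps a page title to the categories it belongs to
    (a stand-in for a categorylinks lookup). Categories that were part of
    the query itself are skipped, since every result has them.
    """
    counts = Counter()
    for page in result_pages[:sample_size]:
        for cat in page_categories.get(page, ()):
            if cat not in query_cats:
                counts[cat] += 1
    # "Collapse and rank": most frequent categories first.
    return counts.most_common()
```

Because only a prefix of the results is inspected, rare categories can be missed, which matches the "sample data" caveat above.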
What do you think?
Onto a soap box for a minute: the fact that this topic won't die, in 4 years, to me means that it's a really needed feature. Once implemented it will give people a great tool to more efficiently find information. Looking at things that are happening around the web with tags, Google adopting ideas from Wikia search, semantic web stuff, I'm thinking that we are really at the beginning of a movement to add structured metadata to information on the net. In concert with all the wonderful algorithms that try to guess what a given web page is about, we are doing things to explicitly state what a web page is about, providing users a much better chance at being able to find it. Developing category intersections for Wikipedia would be a milestone in that movement.
Aerik
On Wed, Dec 3, 2008 at 12:37 PM, Aerik Sylvan aerik@thesylvans.com wrote: [snip]
But it sounds like maybe those of us who'd like to see this happen should discuss a UI (or several) for it. I was thinking the most intuitive interface was a sort of "browse" type function, where for any given group of categories (could just be one category), you have two result sets: related categories (other categories of pages in the starting category), and articles at the intersection of the group. The articles are what we generally think of, but the related categories gives us an intuitive way to navigate through category intersections.
The articles in the group of categories are the problem we've already solved (mostly): they are the result from the fulltext or lucene search. The related categories problem is harder,
[snip]
So an interface I had that was really pleasing was that I asked the database to find a random subset of the results, which it could do quickly, (or I used the whole results if the initial query contained them) and I found the set of categories which maximally bisected the result and presented the list with a set of +/- buttons.
I.e. you search for Animal and you'd get: Mammal[+/-] Reptile[+/-] Kittens[+/-] Taken with Canon Camera[+/-] Human[+/-]
based on how close to 50% of the results have the suggested category.
It's not exactly a 'related category', but I thought it was very useful.
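The ranking described above might look roughly like the following. This is a sketch, not the original tool's code: the function name, the in-memory `page_categories` mapping, and the default sample size are all assumptions made for illustration.

```python
import random
from collections import Counter


def bisecting_categories(result_pages, page_categories, sample_size=200, top=5, seed=None):
    """Rank categories by how close they come to splitting the results 50/50."""
    rng = random.Random(seed)
    pages = list(result_pages)
    # Use the whole result set when it is small, otherwise a random subset.
    sample = pages if len(pages) <= sample_size else rng.sample(pages, sample_size)
    counts = Counter(
        cat for page in sample for cat in set(page_categories.get(page, ()))
    )
    n = len(sample)
    # Smallest distance from 50% first; ties broken by name for stable output.
    ranked = sorted(counts, key=lambda c: (abs(counts[c] / n - 0.5), c))
    return ranked[:top]
```

A category held by every result (or by almost none) sorts last, because adding or excluding it barely narrows the result set.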
I also did a fuzzy text matching search on the category names using a trigram index, so it was always sure to suggest Category:Cats when you searched for Cat, or whatever. (I did this with an ajaxy search-while-you-type; it was handy.)
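For readers unfamiliar with trigram matching, here is a minimal in-memory sketch. It is an assumption-laden illustration rather than the index used in the tool: the padding scheme and the Jaccard-style score are choices made here, and a real deployment would rely on a database-side trigram index rather than scanning every name.

```python
def trigrams(text):
    """Return the set of 3-character windows of the padded, lowercased text."""
    padded = "  " + text.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def suggest(query, category_names, limit=5):
    """Return the category names most similar to the query, best first."""
    ranked = sorted(category_names, key=lambda name: similarity(query, name), reverse=True)
    return ranked[:limit]
```

With these definitions, `suggest("Cat", ["Dogs", "Cats", "Cattle"])` puts "Cats" first, since it shares three of its five trigrams with the query.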
Gregory Maxwell wrote:
So an interface I had that was really pleasing was that I asked the database to find a random subset of the results, which it could do quickly, (or I used the whole results if the initial query contained them) and I found the set of categories which maximally bisected the result and presented the list with a set of +/- buttons.
I.e. you search for Animal and you'd get: Mammal[+/-] Reptile[+/-] Kittens[+/-] Taken with Canon Camera[+/-] Human[+/-]
based on how close to 50% of the results have the suggested category.
It's not exactly a 'related category', but I thought it was very useful.
Wow! And this was at some point live, directly on the Commons category pages?!
Has the whole thing been scrapped since, or is there some way to still try it out, e.g. by installing some custom JavaScript?
Aerik Sylvan wrote:
But it sounds like maybe those of us who'd like to see this happen should discuss a UI (or several) for it. I was thinking the most intuitive interface was a sort of "browse" type function, where for any given group of categories (could just be one category), you have two result sets: related categories (other categories of pages in the starting category), and articles at the intersection of the group. The articles are what we generally think of, but the related categories gives us an intuitive way to navigate through category intersections.
Another useful feature, which would probably make the system much more likely to be adopted in practice, would be an easy interface to get from articles (or images, etc.) to various relevant intersections.
For example, if I'm looking at an image which is in the categories "Maple", "Leaves" and "Green", I should be able to easily get to pages where I can browse other pictures of either maple leaves or green leaves, not to mention other pictures of green maple leaves.
A _minimal_ solution would be simply to present a link to the intersection of _all_ the categories (which might well have only one page on it) and let the user broaden the intersection from there. Even better if this can be done in an AJAXish way directly on the image page itself, though obviously some fallback interface would still be needed for users without JavaScript.
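One way to picture the "start narrow, then broaden" navigation is to enumerate the candidate intersections for a page, narrowest first. The helper below is purely illustrative (its name and the `min_size` parameter are assumptions), and it says nothing about how the links would actually be rendered.

```python
from itertools import combinations


def intersections_to_offer(categories, min_size=2):
    """List category combinations for a page, from narrowest to broadest."""
    cats = sorted(categories)
    offers = []
    # Start with the intersection of all categories, then drop one at a time.
    for size in range(len(cats), min_size - 1, -1):
        offers.extend(combinations(cats, size))
    return offers
```

For the "Maple", "Leaves", "Green" image this yields the three-way intersection followed by the three pairs. The number of combinations grows exponentially with the number of categories, so a real interface would presumably cap the list or build it on demand.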
On Thu, Dec 4, 2008 at 7:39 AM, Ilmari Karonen nospam@vyznev.net wrote:
A _minimal_ solution would be simply to present a link to the intersection of _all_ the categories (which might well have only one page on it) and let the user broaden the intersection from there. Even better if this can be done in an AJAXish way directly on the image page itself, though obviously some fallback interface would still be needed for users without JavaScript.
As for the JavaScript, add importScript('User:Magnus_Manske/category_intersection.js'); to your monobook.js
Currently, this links to my tool on the toolserver. It could support other tools as well. If you like it, someone make a gadget from it ;-)
Cheers, Magnus
On Wed, Dec 3, 2008 at 12:37 PM, Aerik Sylvan aerik@thesylvans.com wrote:
But it sounds like maybe those of us who'd like to see this happen should discuss a UI (or several) for it.
No, someone should *write* a UI. It should be written and added to the software. If it's subpar, fine, it can be improved later. Better that a mediocre UI should be written and committed now than that yet another category intersection discussion should die away as they always do.
On Thu, Dec 4, 2008 at 11:43 AM, Aerik Sylvan aerik@thesylvans.com wrote:
Are you pinging a live database, or a copy made from a dump? (please excuse my ignorance if this is common knowledge)
It's a toolserver tool, so he's most likely using the toolserver database. This is a read-only copy of the real database, replicated in real time and used for toolserver tools only (so if someone runs a query that causes it to lag by two hours, it won't affect the real site).
Aryeh Gregor wrote:
On Wed, Dec 3, 2008 at 12:37 PM, Aerik Sylvan aerik@thesylvans.com wrote:
But it sounds like maybe those of us who'd like to see this happen should discuss a UI (or several) for it.
No, someone should *write* a UI. It should be written and added to the software. If it's subpar, fine, it can be improved later. Better that a mediocre UI should be written and committed now than that yet another category intersection discussion should die away as they always do.
I'm Brion Vibber, and I approve this message.
(Note that we can be open to alternative, more efficient backends such as the Postgres system Greg's experimented with, or a Lucene backend, or whatever, but to be something people can actively develop and test with we need to at least have _something_ that works on MySQL, in the core software, available by default.)
-- brion
wikitech-l@lists.wikimedia.org