On Sat, Feb 23, 2008 at 5:21 AM, Samuel Wantman <wantman(a)earthlink.net> wrote:
Since we've also been discussing
the problems of implementing "Category Intersection", an interim
solution could be repopulating parent categories and "hiding"
intersection categories. Fully populated parent categories are the norm
in some projects like German Wikipedia and they also appear sometimes in
English Wikipedia (e.g. Category:Operas). I have a proposal posted
currently about fully populating "Index" categories at en:Wikipedia
talk:Categorization, and it would be much improved if the intersection
categories could be hidden. The primary reason we have been deleting
intersection categories is that they clutter articles. If they
didn't clutter articles, they wouldn't be a problem.
A very reasonable idea that can be implemented Right Now with little
effort on the user level. Important or interesting intersections can
be manually populated, or populated in batches with bots. It's not as
good as a real solution, but it should work well enough.
Perhaps the non-hidden categories could be expanded with a [+] the same
way subcategories are expanded. For example, if someone is listed under
"Methodist", clicking on the plus might add the hidden categories
"American Methodist" or "Methodist presidents".
A very specific feature. I might even call it kind of weird in its
narrowness: it assumes a *very* particular use of hidden categories.
I think it's okay to have people just click on the general category,
and then navigate their way down to intersection categories if they're
so inclined.
If making the query were to create a hidden
category and automatically categorize all the articles that result from
the query, the next time the request is made it could just display the
results, just like any other category. There might be a timer that
resets (every week?) that would force another query to update the
category. This way each intersection query would happen fairly
infrequently -- as infrequently as need be to keep from overloading the
servers.
Well, this is just caching. We do that anyway (query cache, etc.).
It's still not really good enough, especially because your proposed
caching method would require not just a scan of a large chunk of an
index, but also the insertion of potentially thousands or tens of
thousands of rows.
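
To make the cost concrete, here is a minimal Python sketch of the
proposed scheme: the first query computes the intersection and stores
every resulting article under a hidden category, and a timer forces a
refresh later. IntersectionCache, rows_written and the in-memory
members dict are all made up for illustration; this is a toy model,
not MediaWiki's schema or code.

```python
import time

WEEK_SECONDS = 7 * 24 * 3600  # the "every week?" timer from the proposal


class IntersectionCache:
    """Toy model of materializing an intersection as a hidden category."""

    def __init__(self, members, ttl=WEEK_SECONDS, clock=time.time):
        self.members = members   # category name -> set of article titles
        self.ttl = ttl
        self.clock = clock       # injectable so the timer can be tested
        self.rows = {}           # (cat_a, cat_b) -> (expiry, titles)
        self.rows_written = 0    # stand-in for rows inserted in a link table

    def query(self, cat_a, cat_b):
        key = tuple(sorted((cat_a, cat_b)))
        now = self.clock()
        cached = self.rows.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]     # still fresh: display the stored result
        # Expired or never computed: scan both categories and rewrite
        # one row per matching article.
        result = frozenset(self.members[key[0]] & self.members[key[1]])
        self.rows[key] = (now + self.ttl, result)
        self.rows_written += len(result)
        return result
```

Every refresh adds the full size of the intersection to rows_written,
which is the write cost I'm objecting to: it recurs for every distinct
pair anyone asks for, every time the timer runs out.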
There would need to be a naming convention for the automatically
generated categories, perhaps using a double colon -- so the
intersection of Category:Mozart and Category:Operas would generate
Category:Mozart::Operas. . . .
By this point, the feature you describe would be more difficult to
implement than just implementing real and properly efficient category
intersection with Lucene or something. When you realize that to get
your hack working properly, you need to implement so many workarounds
that the real feature would be easier, it's time to discard the idea
of a hack.
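
For comparison, the core of a real intersection is cheap if each
category's members are kept as a sorted list of page IDs, which is
roughly the shape of data a search index keeps per term. A minimal
Python sketch under that assumption; intersect_sorted and the sample
IDs are hypothetical, and this is not Lucene's API:

```python
def intersect_sorted(a, b):
    """Intersect two ascending lists of page IDs with a linear merge."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])   # page is in both categories
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1             # advance whichever list is behind
        else:
            j += 1
    return out


# Hypothetical page IDs for two categories.
index = {"Mozart": [3, 8, 15, 42], "Operas": [8, 9, 42, 77]}
print(intersect_sorted(index["Mozart"], index["Operas"]))
```

Nothing is written back anywhere: the query reads two lists and
returns the overlap, so there are no generated categories to name,
hide, or expire.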