Hidden Categories and Category Intersection - Wikitech-l

23 Feb 2008

Hidden categories should appear in the category namespace, but not 
elsewhere.

As for being "Hidden" or "Admin", I can envision uses for both.  
Admin categories could have a separate collapsible listing, while hidden 
categories might have some other uses.  Since we've also been discussing 
the problems of implementing "Category Intersection", an interim 
solution could be repopulating parent categories and "hiding" 
intersection categories.  Fully populated parent categories are the norm 
in some projects, such as German Wikipedia, and they sometimes appear in 
English Wikipedia as well (e.g. Category:Operas).  I currently have a 
proposal posted at en:Wikipedia talk:Categorization about fully 
populating "Index" categories, and it would be much improved if the 
intersection categories could be hidden.  The primary reason we have 
been deleting intersection categories is that they clutter articles.  
If they didn't clutter articles, they wouldn't be a problem.

Perhaps the non-hidden categories could be expanded with a [+] the same 
way subcategories are expanded.  For example, if someone is listed under 
"Methodist", clicking on the plus might add the hidden categories 
"American methodist" or "Methodist presidents".  This would require 
searching to see whether any of the hidden categories are descendants of 
the clicked-on category.  This pseudo category intersection system 
would also be an incentive to get ready for a real implementation.  For 
Category Intersection to work, hundreds of categories will need to be 
repopulated.
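
The descendant search described above could be sketched roughly as 
follows.  This is only an illustration, not MediaWiki code: the 
PARENTS mapping, the function names, and the parent categories other 
than "Methodist" are all made up for the example.

```python
from collections import deque

# Hypothetical category graph for illustration: child -> list of parents.
PARENTS = {
    "American methodist": ["Methodist", "American Christians"],
    "Methodist presidents": ["Methodist", "Presidents"],
    "Presidents": ["Politicians"],
}


def is_descendant(category, ancestor, parents_of):
    """Return True if `ancestor` is reachable by walking up from `category`."""
    seen = {category}
    queue = deque([category])
    while queue:
        current = queue.popleft()
        for parent in parents_of.get(current, ()):
            if parent == ancestor:
                return True
            if parent not in seen:  # guard against cycles in the graph
                seen.add(parent)
                queue.append(parent)
    return False


def expand(clicked, hidden_categories, parents_of):
    """Hidden categories of an article that sit below the clicked category."""
    return [c for c in hidden_categories if is_descendant(c, clicked, parents_of)]


hidden_on_article = ["American methodist", "Methodist presidents", "Living people"]
print(expand("Methodist", hidden_on_article, PARENTS))
# -> ['American methodist', 'Methodist presidents']
```

Walking upward from each hidden category keeps the search bounded by 
the depth of the tree above the article's own categories, rather than 
by the size of everything below the clicked category.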

Along those lines, I'm wondering about yet another interim step toward 
full category intersection.  A while back, several of us editors on 
English Wikipedia worked on a design for an interface for implementing 
Category Intersection (it is at en:WP:CI).  We envisioned check boxes 
next to each category listing in an article, and a button that 
queries the intersection.  If making the query were to create a hidden 
category and automatically categorize all the articles that result from 
the query, the next time the request is made it could simply display the 
results, like any other category.  There might be a timer (resetting 
every week?) that would force another query to update the 
category.  This way each intersection query would happen fairly 
infrequently -- as infrequently as need be to keep from overloading the 
servers.
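
As a rough sketch of the "run the query once, then reuse the result 
until a timer expires" idea: the class below is hypothetical, keeps 
everything in memory, and stands in for what would really be a hidden 
category stored in the database.  The one-week default and all names 
are assumptions for illustration.

```python
import time

WEEK_SECONDS = 7 * 24 * 60 * 60  # assumed refresh interval ("every week?")


class IntersectionCache:
    """Stores each intersection result and reruns the query only when stale."""

    def __init__(self, members_of, ttl=WEEK_SECONDS, clock=time.time):
        self.members_of = members_of  # category name -> set of article titles
        self.ttl = ttl
        self.clock = clock
        self._cache = {}  # frozenset of category names -> (timestamp, result)
        self.queries_run = 0  # how many real (expensive) queries were made

    def intersect(self, *categories):
        key = frozenset(categories)  # order of the check boxes does not matter
        now = self.clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # still fresh: display the stored result
        self.queries_run += 1
        sets = [self.members_of.get(c, set()) for c in categories]
        result = sorted(set.intersection(*sets)) if sets else []
        self._cache[key] = (now, result)
        return result


members = {
    "Mozart": {"Don Giovanni", "Requiem"},
    "Operas": {"Don Giovanni", "Carmen"},
}
cache = IntersectionCache(members)
print(cache.intersect("Mozart", "Operas"))  # -> ['Don Giovanni']
print(cache.intersect("Operas", "Mozart"))  # served from the cache
print(cache.queries_run)                    # -> 1
```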

There would need to be a naming convention for the automatically 
generated categories, perhaps using a double colon -- so the 
intersection of Category:Mozart and Category:Operas would generate 
Category:Mozart::Operas.  I don't think we'd want these auto-generated 
categories to be orphan categories.  The category could be 
automatically put in a maintenance category or, better yet, a child 
category of each parent could be created to hold all automatically 
created categories.  If the category is called "Operas", this holding 
category could be called "Intersections with Operas" or "Operas 
and...".  If the query is worth keeping, it could be recategorized by an 
editor (e.g. Category:Operas by composer).  It would probably be useful 
to be able to see how often the query was requested.  If intersection 
categories get renamed, a category redirect should be able to get the 
user (and future queries) to the correct place.
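
To make the naming convention concrete, here is a small sketch.  The 
function names are hypothetical, and sorting the names alphabetically 
is my own assumption (so that the same pair of categories always maps 
to the same generated title, whichever box was ticked first); the 
proposal itself only specifies the double colon.

```python
PREFIX = "Category:"


def _bare(name):
    """Strip the 'Category:' namespace prefix if present."""
    return name[len(PREFIX):] if name.startswith(PREFIX) else name


def intersection_name(*categories):
    """Title of the auto-generated category, e.g. Category:Mozart::Operas."""
    names = sorted(_bare(c) for c in categories)  # assumed canonical order
    return PREFIX + "::".join(names)


def holding_categories(*categories):
    """One 'Intersections with X' holding category per parent."""
    return [PREFIX + "Intersections with " + n
            for n in sorted(_bare(c) for c in categories)]


print(intersection_name("Category:Operas", "Category:Mozart"))
# -> Category:Mozart::Operas
print(holding_categories("Category:Mozart", "Category:Operas"))
# -> ['Category:Intersections with Mozart', 'Category:Intersections with Operas']
```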

If any of these intersection queries cause problems, an administrator 
could protect the category page.  The next time the query is requested, 
the protected page would keep the query from being run.  The user would 
see the reason for the blocked query posted on the category page.  This 
would prevent two or more huge categories from being intersected (e.g. 
Category:Living People intersected with Category:Films).  If the CPU 
time of each query were measured automatically, the blocks might be 
able to happen automatically as well.
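
A sketch of how protection could gate the query, with the automatic 
variant included.  Everything here is an assumption for illustration: 
the class, the exception, the one-second limit, and the use of 
time.process_time as the CPU measure are not taken from MediaWiki.

```python
import time


class QueryBlocked(Exception):
    """Raised when the intersection category page is protected."""


class GuardedIntersections:
    def __init__(self, members_of, cpu_limit_seconds=1.0,
                 timer=time.process_time):
        self.members_of = members_of        # category name -> set of titles
        self.cpu_limit = cpu_limit_seconds  # assumed threshold
        self.timer = timer
        self.protected = {}                 # category title -> reason shown

    def protect(self, title, reason):
        """Manual protection by an administrator."""
        self.protected[title] = reason

    def run(self, title, categories):
        if title in self.protected:
            # The reason is what the user would see on the category page.
            raise QueryBlocked(self.protected[title])
        start = self.timer()
        result = sorted(set.intersection(*(self.members_of[c] for c in categories)))
        elapsed = self.timer() - start
        if elapsed > self.cpu_limit:
            # Automatic protection: later requests are refused.
            self.protected[title] = (
                "Auto-protected: query used %.2f s of CPU time" % elapsed
            )
        return result


guard = GuardedIntersections({"Mozart": {"Don Giovanni"}, "Operas": {"Don Giovanni"}})
guard.protect("Category:Living People::Films", "Both categories are too large")
try:
    guard.run("Category:Living People::Films", ["Living People", "Films"])
except QueryBlocked as reason:
    print("Blocked:", reason)  # -> Blocked: Both categories are too large
```

In this sketch the query that first exceeds the limit still returns its 
result; only subsequent requests are refused.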

-- Samuel Wantman
en:User:Sam