[Wikipedia-l] Categories: An implementation

Ray Saintonge saintonge at telus.net
Tue Jun 17 04:43:40 UTC 2003


Daniel Mayer wrote:

>>Suggestion: We could easily import the existing
>>lists into my implementation.
>>
>>Magnus
>>
>Only the major ones please. Many of the lists in WikiEn are really obscure and 
>special purpose. 
>
Agreed.  The obscure ones could be fit in at a later stage when we have 
a better grasp of how things are going.

>IMO there should be very few categories (at least at first); One for each 
>major category that is already on, for example, the en.wiki's main page 
>(science, mathematics, physics, Linguistics, etc - the whole lot) along with 
>type categories like "biography", "city/town/village", "country/nation", 
>"subnational entity", "stub" etc. 
>
That's consistent with my approach, but not necessarily with the 
categories mentioned.  Top level categories need to be comprehensive to 
ensure that '''every''' article can be placed in at least one of them.  
If these top level categories are given single letter codes, that means 
a maximum of 26.  Some of the more obvious 2nd level codes could be 
implemented fairly early on.
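
To make the coding idea concrete, here is a rough sketch in Python.  It 
assumes one uppercase letter per level, so a 2nd level code is the top 
level letter plus one more.  The particular letters, category names and 
function names are hypothetical; no such list has been agreed on.

```python
import string

# Illustrative assignments only (assumed, not an agreed scheme).
CATEGORIES = {
    "S": "Science",
    "SB": "Science / Biology",
    "SP": "Science / Physics",
    "H": "History",
}


def is_valid_code(code: str) -> bool:
    """A code is one or more uppercase ASCII letters, one per level.

    With a single letter per level there can be at most 26 top level
    categories, and at most 26 subcategories under each of them.
    """
    return bool(code) and all(ch in string.ascii_uppercase for ch in code)


def ancestors(code: str) -> list[str]:
    """Return the code followed by each broader code: 'SB' -> ['SB', 'S']."""
    return [code[:i] for i in range(len(code), 0, -1)]


def articles_in(category: str, assignments: dict[str, str]) -> list[str]:
    """List the articles filed under `category` or any of its subcategories.

    `assignments` maps an article title to its category code.  An article
    filed directly at the broader level (no subcategory yet) is included.
    """
    return sorted(t for t, c in assignments.items() if c.startswith(category))
```

With prefix codes like these, asking for "S" also returns everything 
filed under "SB" or "SP", so an article that has only a top level code 
is still found in the broader category.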

>Keep it simple please (at least at first). If /that/ works then we can slowly 
>expand the system in a measured and thoughtful way - voting for new 
>categories may be a good solution here in order to prevent a rapid expansion 
>of categories that would render the whole system useless. 
>
Good!  I have a lot of these to propose, but always following principles 
of comprehensiveness and inclusion.  This would give a place for all 
subcategories, and would ensure that even articles which do not yet have 
a subcategory can be found in the broader category.  I would likely 
propose categories in related batches.

>I'm very wary of allowing just anybody to create any category since from a 
>database management perspective it would be easy for people to create many 
>different variants for the same intended category ([[category:biology]], 
>[[category:life sciences]], [[category:life science]], [[category:living 
>things]], [[category:the study of living things]] etc). And having too many 
>categories will be very difficult for people to remember and very unwieldy 
>for people to choose from in a category search. Let the lists stay for the 
>obscure stuff.
>
These are real risks.  There may be a way of dealing with this within 
the distinction of integrated and non-integrated categories.  The latter 
would be far more flexible in what they would allow,  but they could be 
more easily culled as their uselessness and duplication became apparent.
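
One possible way to catch the variants Daniel lists is sketched below 
in Python.  Everything here is an assumption for illustration: the 
alias table, the decision that those variants all mean "biology", the 
set of integrated categories, and the function names.

```python
# Hypothetical alias table: variant label -> canonical category name.
ALIASES = {
    "life sciences": "biology",
    "life science": "biology",
    "living things": "biology",
    "the study of living things": "biology",
}

# Hypothetical set of integrated (centrally agreed) categories.
INTEGRATED = {"biology", "biography", "politics"}


def canonical(label: str) -> str:
    """Lowercase, collapse whitespace, then map known variants to one name."""
    key = " ".join(label.lower().split())
    return ALIASES.get(key, key)


def classify(label: str) -> tuple[str, str]:
    """Return (canonical name, 'integrated' or 'non-integrated')."""
    name = canonical(label)
    kind = "integrated" if name in INTEGRATED else "non-integrated"
    return name, kind
```

Labels that do not resolve to an integrated category stay usable but 
are marked non-integrated, which makes them easy to list and cull 
later if they turn out to be useless or duplicates.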

>We also need to devise guidelines for when to assign labels. Again from a 
>database design perspective: If this is not done consistently then the output 
>of any sort will be suspect and perhaps complete garbage. As KQ pointed out 
>we could assign the category 'crime' to [[Jesus Christ]] along with the 
>category of 'biography' so that JC shows up in a list of criminals. But is JC 
>most famous for being a crime figure? No! He is famous for being a religious 
>figure so the category 'religion' would be there but not 'crime.' Same for 
>[[Bill Clinton]] ([[category:biography]] and [[category:politics]] apply 
>there).
>
I don't think we'll ever be able to prevent people from assigning 
goofy categories to things, any more than we can completely prevent 
goofy edits and vandalism.  Doing this would require becoming a closed 
project, and that would defeat Wikipedia's prime directive.

>With categories in place then one day we could have our special pages use 
>these tags for things like Recent Changes. RC in the English Wikipedia often 
>has 10 or more edits a minute now! It would be nice to have the option to 
>only see articles that may interest the user (well for me that would be 
>everything but one of my biology professors may want a recent changes that 
>only displays articles relating to mathematics, the sciences and economics).  
>
Certainly, but this addresses how it might be used rather than how it 
might be set up.
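
For what it's worth, the usage side looks simple once articles carry 
category labels.  A minimal Python sketch, assuming each Recent Changes 
entry knows the categories of its article (the Change record and the 
filter_changes function are made up for illustration, not part of the 
software):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One Recent Changes entry (hypothetical record layout)."""
    title: str
    categories: frozenset


def filter_changes(changes, wanted):
    """Keep the changes whose article has at least one wanted category."""
    wanted = frozenset(wanted)
    return [c for c in changes if c.categories & wanted]


recent = [
    Change("Bill Clinton", frozenset({"biography", "politics"})),
    Change("Prime number", frozenset({"mathematics"})),
    Change("Supply and demand", frozenset({"economics"})),
]

# A reader who only follows mathematics and economics:
for change in filter_changes(recent, {"mathematics", "economics"}):
    print(change.title)
```

The filter keeps the original order of the entries, so the output 
still reads as a chronological list.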

>This will become absolutely necessary if we continue our quasi-exponential 
>growth pattern for WikiEn; one day in not too many years (fewer than most 
>people probably think) there will be an average of hundreds of edits a 
>minute! We need to sort that out somehow or no human will bother looking at 
>RC anymore and will only revert vandalism to articles on their watchlists 
>(even with sorting, bots would probably have to help with automatic 'probable 
>vandalism/copyvio detection' - outputting results on a separate RC for that 
>type of stuff - maybe even by assigning those categories to articles...).
>
Only 867,440 articles to Mega-Wiki! :-)   Watching everything is already 
impossible for a single person.

> Alas it looks like Wikipedia is growing up. This time next year WikiEn 
>Wikipedians may start to specialize their edits based on which categories 
>they choose RC to output and several other language Wikis may follow the year 
>after. Our small town is beginning to show signs of emerging cityhood (or has 
>WikiEn at least already become a small city?). 
>
>We are going where no wiki has gone before. There is both danger and 
>opportunity in that. 
>
Exactly





