[Commons-l] "Did you know?" ... The family tree of [[Category:Copyright statuses]] and our broken category system.

Gregory Maxwell gmaxwell at gmail.com
Mon Jan 29 21:19:49 UTC 2007


Did you know that "Bundt pans" is a copyright status?

[[:category:Copyright_statuses]]->[[:category:Free_licenses]]->[[:category:GNU_licenses]]->[[:category:GFDL]]->[[:category:Periodic_table]]->[[:category:Chemical_elements_by_periodic_table_group]]->[[:category:Periodic_table_group_16]]->[[:category:Oxygen]]->[[:category:Oxygen_compounds]]->[[:category:Organo-oxygen_compounds]]->[[:category:Carbohydrates]]->[[:category:Sugars]]->[[:category:Sweet_food]]->[[:category:Cakes_and_cookies]]->[[:category:Cakes]]->[[:category:Bundt_cake]]->[[:category:Bundt_pans]]

It was news to me.

I guess I always knew that being a Harry Potter character would impact
something's copyright status, but without commons I never would have
known that this connection involved the humble hydrogen atom:

[[:category:Copyright_statuses]]->[[:category:Free_licenses]]->[[:category:GNU_licenses]]->[[:category:GFDL]]->[[:category:Periodic_table]]->[[:category:Chemical_elements_by_periodic_table_row]]->[[:category:Periodic_table_row_1]]->[[:category:Hydrogen]]->[[:category:Hydrogen_compounds]]->[[:category:Water]]->[[:category:Bodies_of_water]]->[[:category:Islands]]->[[:category:Islands_of_Europe]]->[[:category:Ireland_(island)]]->[[:category:Ireland]]->[[:category:Culture_of_Ireland]]->[[:category:Languages_of_Ireland]]->[[:category:English_language]]->[[:category:Literature_of_England]]->[[:category:Writers_from_England]]->[[:category:J._K._Rowling]]->[[:category:Harry_Potter]]->[[:category:Harry_Potter_Characters]]

(never mind the fact that anything in the child cat is almost certainly
an unlicensed derivative work...)

I don't think most people realize it... But our category system is
terribly broken today.  I have provided just a few examples which are
fairly easy to fix, but everywhere you look in the category system you
can find problems like this.

Sometime in the not too distant future we will gain a search system
which allows us to perform category intersections. People will be able
to search for images which are in combinations of categories. It could
be very powerful...

But it will not be very powerful, because categories have been broken
into zillions of tiny sub-categories.

Instead of Category:Men or Category:Human_males we use
Category:Human_male_who_lived_in_the_1960s_and_liked_to_wear_funny_hats.

You might think that there would be a Category:Men which would be a
parent of this category, and you would be right. But it is not useful
because even if we ignore the large computational burden of finding
all the children of a category, we're still left with the sad fact
that due to semantic drift, the supercategory would contain many
things we do not want, just like my examples at the top. Many
higher-level categories are ancestors of a substantial subset of all
the categories on commons. (Copyright statuses is a 'parent' of about
13% of all the commons cats).
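
To make the drift concrete, here is a minimal Python sketch of what
"finding all the children of a category" means. It is only an
illustration: the graph is a toy built from the Bundt pans chain at
the top of this mail, and the names CHAIN, SUBCATS and descendants are
made up for the example, not anything MediaWiki provides.

```python
from collections import deque

# Toy data (assumption): the single chain quoted in the Bundt pans example.
CHAIN = [
    "Copyright_statuses", "Free_licenses", "GNU_licenses", "GFDL",
    "Periodic_table", "Chemical_elements_by_periodic_table_group",
    "Periodic_table_group_16", "Oxygen", "Oxygen_compounds",
    "Organo-oxygen_compounds", "Carbohydrates", "Sugars", "Sweet_food",
    "Cakes_and_cookies", "Cakes", "Bundt_cake", "Bundt_pans",
]

# Map each category to its direct subcategories.
SUBCATS = {parent: [child] for parent, child in zip(CHAIN, CHAIN[1:])}


def descendants(root, subcats):
    """Return every category reachable from root via subcategory links."""
    seen = {root}            # also protects against cycles in the graph
    queue = deque([root])
    while queue:
        cat = queue.popleft()
        for child in subcats.get(cat, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen - {root}


if __name__ == "__main__":
    below = descendants("Copyright_statuses", SUBCATS)
    print("Bundt_pans" in below, len(below))
```

Nothing in the traversal knows that the meaning changed somewhere
between GFDL and Periodic_table, so a search that expands a category
to all of its descendants inherits every such jump.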

Direct navigation is nice, but it doesn't scale to millions of images.
For people to be able to find images on commons they will increasingly
depend on search.

We need to radically change how we use categories if we are going to
make them 'machine readable' in a manner which enables search.

To facilitate this change, we need to stop breaking categories into
tiny subcategories. Instead, we should use broad conceptual categories
which will work well when intersected with other categories. We should
also include all categories that apply. For example, an antique car
might be placed in [[Category:Transportation devices]],
[[Category:Cars]], [[Category:Ford motor products]],
[[Category:Manmade]], and [[Category:Antiques]] rather than in
[[Category:Antique ford motor products]].
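
As a rough sketch of why broad categories intersect well, assume each
image simply carries a flat set of category names. The file names, the
IMAGE_CATS table and the search function below are hypothetical and
say nothing about how the real search system will be implemented.

```python
# Hypothetical data: image name -> set of broad categories it is tagged with.
IMAGE_CATS = {
    "Model_T_1915.jpg": {
        "Transportation devices", "Cars", "Ford motor products",
        "Manmade", "Antiques",
    },
    "Ford_Focus_2005.jpg": {
        "Transportation devices", "Cars", "Ford motor products", "Manmade",
    },
    "Victorian_clock.jpg": {"Manmade", "Antiques"},
}


def search(image_cats, *wanted):
    """Return images that are in every one of the wanted categories."""
    wanted = set(wanted)
    # An image matches when the wanted set is a subset of its categories.
    return sorted(name for name, cats in image_cats.items() if wanted <= cats)


if __name__ == "__main__":
    print(search(IMAGE_CATS, "Cars", "Antiques"))
    print(search(IMAGE_CATS, "Ford motor products"))
```

With the narrow scheme, the same query only works if someone happened
to create the exact combined category ahead of time.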

This shift will make categories less useful as a direct navigational
tool. However, many categories are already poor devices for direct
navigation due to an inability to place their content in an order
which is sane to humans, and an inability to include explanatory text
inline.

For human navigation we have gallery pages, which are more powerful
for that application.

Categories would still keep their parent-child relationship, but we
would acknowledge the fact that such categorization is useful for
humans navigating to find categories... and that it's not a useful
tool to have the computers traverse.

Unless I find huge opposition here, I'm going to begin changing the
commons instruction pages to reflect this use of categories rather
than our historic use.


