Did you know that "Bundt pans" is a copyright status?
[[:category:Copyright_statuses]]->[[:category:Free_licenses]]->[[:category:GNU_licenses]]->[[:category:GFDL]]->[[:category:Periodic_table]]->[[:category:Chemical_elements_by_periodic_table_group]]->[[:category:Periodic_table_group_16]]->[[:category:Oxygen]]->[[:category:Oxygen_compounds]]->[[:category:Organo-oxygen_compounds]]->[[:category:Carbohydrates]]->[[:category:Sugars]]->[[:category:Sweet_food]]->[[:category:Cakes_and_cookies]]->[[:category:Cakes]]->[[:category:Bundt_cake]]->[[:category:Bundt_pans]]
It was news to me.
I guess I always knew that being a Harry Potter character would impact something's copyright status, but without commons I never would have known that this connection involved the humble hydrogen atom:
[[:category:Copyright_statuses]]->[[:category:Free_licenses]]->[[:category:GNU_licenses]]->[[:category:GFDL]]->[[:category:Periodic_table]]->[[:category:Chemical_elements_by_periodic_table_row]]->[[:category:Periodic_table_row_1]]->[[:category:Hydrogen]]->[[:category:Hydrogen_compounds]]->[[:category:Water]]->[[:category:Bodies_of_water]]->[[:category:Islands]]->[[:category:Islands_of_Europe]]->[[:category:Ireland_(island)]]->[[:category:Ireland]]->[[:category:Culture_of_Ireland]]->[[:category:Languages_of_Ireland]]->[[:category:English_language]]->[[:category:Literature_of_England]]->[[:category:Writers_from_England]]->[[:category:J._K._Rowling]]->[[:category:Harry_Potter]]->[[:category:Harry_Potter_Characters]]
(nevermind the fact that anything in the child cat is almost certainly an unlicensed derivative work...)
I don't think most people realize it... But our category system is terribly broken today. I have provided just a few examples which are fairly easy to fix, but everywhere you look in the category system you can find problems like this.
Sometime in the not too distant future we will gain a search system which allows us to perform category intersections. People will be able to search for images which are in combinations of categories. It could be very powerful...
But it will not be very powerful, because categories have been broken into zillions of tiny sub-categories.
Instead of Category:Men or Category:Human_males we use Category:Human_male_who_lived_in_the_1960s_and_liked_to_wear_funny_hats.
You might think that there would be a Category:Men which would be a parent of this category, and you would be right. But it is not useful, because even if we ignore the large computational burden of finding all the children of a category, we're still left with the sad fact that, due to semantic drift, the supercategory would contain many things we do not want, just like my examples at the top. Many higher-level categories contain a substantial subset of all the categories on commons. (Copyright statuses is a 'parent' of about 13% of all the commons cats.)
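To make the traversal problem concrete, here is a minimal Python sketch. It is not code from Commons or MediaWiki: the `SUBCATS` mapping is a hypothetical, heavily abbreviated version of the chains at the top of this message (several intermediate categories are skipped), and `find_chain` and `descendants` are illustrative names.

```python
from collections import deque

# Hypothetical, abbreviated parent -> subcategory mapping. The real chains
# pass through more intermediate categories than are listed here.
SUBCATS = {
    "Copyright_statuses": ["Free_licenses"],
    "Free_licenses": ["GNU_licenses"],
    "GNU_licenses": ["GFDL"],
    "GFDL": ["Periodic_table"],
    "Periodic_table": ["Oxygen", "Hydrogen"],
    "Oxygen": ["Sugars"],
    "Sugars": ["Cakes"],
    "Cakes": ["Bundt_pans"],
    "Hydrogen": ["Water"],
}


def find_chain(subcats, root, target):
    """Breadth-first search for the shortest parent-to-child chain, or None."""
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for child in subcats.get(path[-1], []):
            if child not in seen:  # guard against cycles in the category graph
                seen.add(child)
                queue.append(path + [child])
    return None


def descendants(subcats, root):
    """Every category reachable from root by following subcategory links."""
    seen = set()
    queue = deque([root])
    while queue:
        for child in subcats.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


print(" -> ".join(find_chain(SUBCATS, "Copyright_statuses", "Bundt_pans")))
print(len(descendants(SUBCATS, "Copyright_statuses")))
```

Even in this toy graph, everything below Periodic_table ends up as a "copyright status", which is the semantic drift described above.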
Direct navigation is nice, but it doesn't scale to millions of images. For people to be able to find images on commons they will increasingly depend on search.
We need to radically change how we use categories if we are going to make them 'machine readable' in a manner which enables search.
To facilitate this change, we need to stop breaking categories into tiny subcategories. Instead, we should use broad conceptual categories which will work well when intersected with other categories. We should also include all categories that apply. For example, an antique car might be placed in [[Category:Transportation devices]], [[Category:Cars]], [[Category:Ford motor products]], [[Category:Manmade]], and [[Category:Antiques]] rather than in [[Category:Antique ford motor products]].
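As a rough illustration of how broad categories behave under intersection, here is a small Python sketch. The file names and the `search` helper are made up for the example; only the category names come from the paragraph above.

```python
# Hypothetical files, each tagged with every broad category that applies.
FILE_CATEGORIES = {
    "Ford_Model_T_1915.jpg": {
        "Transportation devices", "Cars", "Ford motor products",
        "Manmade", "Antiques",
    },
    "Ford_Focus_2005.jpg": {
        "Transportation devices", "Cars", "Ford motor products", "Manmade",
    },
    "Victorian_chair.jpg": {"Furniture", "Manmade", "Antiques"},
}


def search(file_categories, *wanted):
    """Return the files that are in ALL of the wanted categories."""
    wanted = set(wanted)
    return sorted(
        name for name, cats in file_categories.items() if wanted <= cats
    )


print(search(FILE_CATEGORIES, "Cars", "Antiques"))
print(search(FILE_CATEGORIES, "Manmade", "Antiques"))
```

The point of the sketch is that "antique Ford cars" becomes a query rather than a category someone has to create and maintain by hand.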
This shift will make categories less useful as a direct navigational tool. However, many categories are already poor devices for direct navigation due to an inability to place their content in an order which is sane to humans, and an inability to include explanatory text inline.
For human navigation we have gallery pages, which are more powerful for that application.
Categories would still keep their parent-child relationship, but we would acknowledge the fact that such categorization is useful for humans navigating to find categories... and that it's not a useful tool to have the computers traverse.
Unless I find huge opposition here, I'm going to begin changing the commons instruction pages to reflect this use of categories rather than our historic use.
Gregory Maxwell wrote:
Unless I find huge opposition here, I'm going to begin changing the commons instruction pages to reflect this use of categories rather than our historic use.
Hee hee, I'm thinking I should take a couple weeks of wikicommonsbreak right about now! But seriously, what you're talking about will stir up a large percentage of the commons people who obsess over categories. I'm not even sure having category intersections in the software will satisfy some of these folks - they go on about server loads and the like as well. I'd suggest just bringing up the issue and watching the fur fly, before deciding to invest time in rewriting anything...
Stan
On 1/29/07, Stan Shebs stanshebs@earthlink.net wrote:
Hee hee, I'm thinking I should take a couple weeks of wikicommonsbreak right about now! But seriously, what you're talking about will stir up a large percentage of the commons people who obsess over categories. I'm not even sure having category intersections in the software will satisfy some of these folks - they go on about server loads and the like as well. I'd suggest just bringing up the issue and watching the fur fly, before deciding to invest time in rewriting anything...
:) But the fun is just beginning! :)
As far as load goes, it's only a question of implementation. A few months back I posted some example performance data for intersections using inverted indexing using the actual enwiki category data: http://lists.wikimedia.org/mailman/htdig/wikitech-l/2006-September/026715.ht...
Even the most evil cases (intersecting two huge categories) ran very quickly. (For example, the intersection of GFDL images and living_persons, each of which has over 100,000 members, took 25ms.)
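For anyone unfamiliar with the term, the general idea of an inverted index can be sketched in a few lines of Python. This is only an illustration of the technique, not the code behind the numbers in the linked post; the `PAGES` data and all function names are assumptions.

```python
# Hypothetical page-id -> categories data standing in for the real tables.
PAGES = {
    1: ["GFDL", "Living_persons"],
    2: ["GFDL"],
    3: ["Living_persons"],
    4: ["GFDL", "Living_persons"],
}


def build_index(page_categories):
    """Invert page -> categories into category -> sorted list of page ids."""
    index = {}
    for page_id in sorted(page_categories):  # sorted input keeps lists sorted
        for cat in page_categories[page_id]:
            index.setdefault(cat, []).append(page_id)
    return index


def intersect_sorted(a, b):
    """Merge-style intersection of two sorted lists in O(len(a) + len(b))."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def intersect(index, *cats):
    """Intersect any number of categories, starting from the shortest list."""
    lists = sorted((index.get(c, []) for c in cats), key=len)
    if not lists:
        return []
    result = lists[0]
    for other in lists[1:]:
        result = intersect_sorted(result, other)
    return result


index = build_index(PAGES)
print(intersect(index, "GFDL", "Living_persons"))
```

Each intersection is a single linear pass over two sorted lists, which is why even two very large categories need not be expensive.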
The technical obstacle of getting the software implemented is far smaller than the data quality issues we have.
Gregory Maxwell wrote:
As far as load goes, it's only a question of implementation. A few months back I posted some example performance data for intersections using inverted indexing using the actual enwiki category data: http://lists.wikimedia.org/mailman/htdig/wikitech-l/2006-September/026715.ht...
Even the most evil cases (intersecting two huge categories) ran very quickly. (For example, the intersection of GFDL images and living_persons, each of which has over 100,000 members, took 25ms.)
I'm with you, just reporting past flamage on the subject. I wouldn't even count on empirical data having much effect... 1/2 :-)
Stan
On 29/01/07, Stan Shebs stanshebs@earthlink.net wrote:
Gregory Maxwell wrote:
Unless I find huge opposition here, I'm going to begin changing the commons instruction pages to reflect this use of categories rather than our historic use.
some of these folks - they go on about server loads and the like as well. I'd suggest just bringing up the issue and watching the fur fly, before deciding to invest time in rewriting anything...
They're going to tell a MediaWiki dev what his ideas will do to server load. Perhaps this could be the time to invoke the "Brion says STFU about server load unless it's your job to care" policy^Wprocess^Wguideline^Wlaw of physics.
- d.
On 1/29/07, Gregory Maxwell gmaxwell@gmail.com wrote:
Did you know that "Bundt pans" is a copyright status?
[[:category:Copyright_statuses]]->[[:category:Free_licenses]]->[[:category:GNU_licenses]]->[[:category:GFDL]]->[[:category:Periodic_table]]->[[:category:Chemical_elements_by_periodic_table_group]]->[[:category:Periodic_table_group_16]]->[[:category:Oxygen]]->[[:category:Oxygen_compounds]]->[[:category:Organo-oxygen_compounds]]->[[:category:Carbohydrates]]->[[:category:Sugars]]->[[:category:Sweet_food]]->[[:category:Cakes_and_cookies]]->[[:category:Cakes]]->[[:category:Bundt_cake]]->[[:category:Bundt_pans]]
It was news to me.
I guess I always knew that being a Harry Potter character would impact something's copyright status, but without commons I never would have known that this connection involved the humble hydrogen atom:
[[:category:Copyright_statuses]]->[[:category:Free_licenses]]->[[:category:GNU_licenses]]->[[:category:GFDL]]->[[:category:Periodic_table]]->[[:category:Chemical_elements_by_periodic_table_row]]->[[:category:Periodic_table_row_1]]->[[:category:Hydrogen]]->[[:category:Hydrogen_compounds]]->[[:category:Water]]->[[:category:Bodies_of_water]]->[[:category:Islands]]->[[:category:Islands_of_Europe]]->[[:category:Ireland_(island)]]->[[:category:Ireland]]->[[:category:Culture_of_Ireland]]->[[:category:Languages_of_Ireland]]->[[:category:English_language]]->[[:category:Literature_of_England]]->[[:category:Writers_from_England]]->[[:category:J._K._Rowling]]->[[:category:Harry_Potter]]->[[:category:Harry_Potter_Characters]]
(nevermind the fact that anything in the child cat is almost certainly an unlicensed derivative work...)
[snip]
Unless I find huge opposition here, I'm going to begin changing the commons instruction pages to reflect this use of categories rather than our historic use.
Go go go!
Delphine (who has never understood categories on any Wikimedia project anyway)
On 30/01/07, Gregory Maxwell gmaxwell@gmail.com wrote:
Unless I find huge opposition here, I'm going to begin changing the commons instruction pages to reflect this use of categories rather than our historic use.
Hold up just a tiny bit. You are not paying attention to the reason that narrow categories developed in the first place. It's because the support for working with categories is incredibly poor. Categories with over 200 items find half their subcats go missing (from the first page) because subcat paging is not separate to article(/media) paging (bug 1211).
Narrow categories developed because there was no means to perform category intersections. They are manual category intersections. I wouldn't suggest changing that at least until this category intersection thing is implemented and we can see its utility.
Secondly, is this category intersection thing really not going to include an option to automatically search subcats as well? (a la http://tools.wikimedia.de/~daniel/WikiSense/CategoryIntersect.php?&wikil... see "depth=3") That seems like a pretty vital basic functionality that will be needed, even if we switch to using broad cats.
Thirdly, your proposal is going to massively increase the average number of categories for each file. (Although they will typically have shorter names.) How far "up" the tree do you propose it will be reasonable to go? For a portrait of a woman, we should put [[category:women]] [[category:homo]] [[category:Hominidae]] [[category:primates]] [[category:Mammalia]] [[Category:Vertebrata]] [[Category:Chordata]] [[Category:Animalia]] [[Category:Eukaryota]] ? They're all true, are they not?
Instead of doing that, I think it would be more sensible to continue the tradition of only putting the most specific cat that applies, and adjusting the software to have an option to display subcategory items into the current category (like "flatten" I think) when desired. (bug 2725)
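Roughly, a "flatten" option with a depth limit (like the "depth=3" parameter mentioned above) could behave like the following Python sketch. It is not how CatScan or MediaWiki implement it; the intermediate category "Jugend windows", the file names, and the `flatten` function are all hypothetical.

```python
# Hypothetical subcategory tree and per-category file lists.
SUBCATS = {
    "Windows": ["Jugend windows"],
    "Jugend windows": ["Jugend windows in Finland"],
}
MEMBERS = {
    "Windows": ["plain.jpg"],
    "Jugend windows": ["riga.jpg"],
    "Jugend windows in Finland": ["helsinki.jpg"],
}


def flatten(subcats, members, root, depth):
    """Files in root plus files in subcategories up to `depth` levels down."""
    files = set(members.get(root, ()))
    frontier = {root}
    seen = {root}
    for _ in range(depth):
        next_frontier = set()
        for cat in frontier:
            for child in subcats.get(cat, ()):
                if child not in seen:  # avoid loops in the category graph
                    seen.add(child)
                    next_frontier.add(child)
                    files |= set(members.get(child, ()))
        frontier = next_frontier
    return files


print(sorted(flatten(SUBCATS, MEMBERS, "Windows", 3)))
```

With this, files keep only their most specific category, and the depth limit bounds how far unrelated branches can leak into the result.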
regards Brianna user:pfctdayelise
[[:category:Copyright_statuses]] ->[[:category:Free_licenses]] ->[[:category:GNU_licenses]] ->[[:category:GFDL]] ->[[:category:Periodic_table]] ->[[:category:Chemical_elements_by_periodic_table_row]] ->[[:category:Periodic_table_row_1]] ->[[:category:Hydrogen]] ->[[:category:Hydrogen_compounds]] ->[[:category:Water]] ->[[:category:Bodies_of_water]] ->[[:category:Islands]] ->[[:category:Islands_of_Europe]] ->[[:category:Ireland_(island)]] ->[[:category:Ireland]] ->[[:category:Culture_of_Ireland]] ->[[:category:Languages_of_Ireland]] ->[[:category:English_language]] ->[[:category:Literature_of_England]] ->[[:category:Writers_from_England]] ->[[:category:J._K._Rowling]] ->[[:category:Harry_Potter]] ->[[:category:Harry_Potter_Characters]]
I don't think most people realize it... But our category system is terribly broken today. I have provided just a few examples which are fairly easy to fix, but everywhere you look in the category system you can find problems like this.
Direct navigation is nice, but it doesn't scale to millions of images. For people to be able to find images on commons they will increasingly depend on search.
What is the problem? If people look for images of Harry Potter characters, they don't start from [[:category:Free_licenses]]. They go straight to [[:category:Harry_Potter]]. 2 levels deep!
Really, our system is far more useful than what Gregory suggests. I know this because every week I transfer tens of images from fi.wikipedia to Commons and I will have to find categories for things whose name I don't know in English (or even in Finnish).
In fact, finding a category is often hardest when the categories aren't specific enough. And when it comes to finding an image, for example, for an article about art nouveau/jugend windows, [[:Category:Windows]] is just useless. We do need categories like [[:Category:Jugend windows in Finland]].
samuli@samulilintula.net wrote:
In fact, finding a category is often hardest when the categories aren't specific enough. And when it comes to finding an image, for example, for an article about art nouveau/jugend windows, [[:Category:Windows]] is just useless. We do need categories like [[:Category:Jugend windows in Finland]].
What about the category intersection tool mentioned above?
Regards,
Flo
On 30/01/07, Florian Straub Flominator@gmx.net wrote:
What about the category intersection tool mentioned above?
Yes. The point being to make "categories" function more like "tags", for added searchability. (And added Web 2.0 'l33tn3ss.)
Ya know what Commons needs? A dev with access to large image libraries like Getty, etc. If we want to get the best press in the world, make an interface so simple even journalists can use it.
- d.
"David Gerard" dgerard@gmail.com wrote:
Yes. The point being to make "categories" function more like "tags", for added searchability. (And added Web 2.0 'l33tn3ss.)
One idea for a consensus: Categories are no longer allowed to contain whitespaces or underscores :)
Regards,
Flo
On Tue, 30 Jan 2007, Florian Straub wrote:
One idea for a consensus: Categories are no longer allowed to contain whitespaces or underscores :)
That simply won't work - most of the categories I deal with are related to railways, where many of them need multiple words because they are proper names - e.g. [[category:First Great Western]], [[category:London Underground]], [[category:British Rail Class 153]]
But what about when there's an article about 'Jugend windows in Finland' in some wiki? Should the gallery page link to commons then also be made using the category intersection tool?
-raul-
2007/1/30, Florian Straub Flominator@gmx.net:
samuli@samulilintula.net wrote:
In fact, finding a category is often hardest when the categories aren't specific enough. And when it comes to finding an image, for example, for an article about art nouveau/jugend windows, [[:Category:Windows]] is just useless. We do need categories like [[:Category:Jugend windows in Finland]].
What about the category intersection tool mentioned above?
Regards,
Flo
"Raul Kern" raunator@gmail.com wrote:
2007/1/30, Florian Straub Flominator@gmx.net:
What about the category intersection tool mentioned above?
But what about when there's an article about 'Jugend windows in Finland' in some wiki? Should the gallery page link to commons then also be made using the category intersection tool?
It should link to an article that is located in the categories Jugend, windows and Finland.
Regards,
Flo
samuli@samulilintula.net wrote:
In fact, finding a category is often hardest when the categories aren't specific enough. And when it comes to finding an image, for example, for an article about art nouveau/jugend windows, [[:Category:Windows]] is just useless. We do need categories like [[:Category:Jugend windows in Finland]].
What about the category intersection tool mentioned above?
Does it exist? Or is it just a theoretical concept?
Yes, it would work fine if all our users were perfect. Let's continue with the same example and imagine what Category:Finland would become like. 10.000 images, maybe? Then we have this dufus (see http://commons.wikimedia.org/wiki/Commons:Welcome_log) who uploads an image with only [[Category:Finland]]. That image will be lost forever. No one will ever find it and no one will ever be able to use it.
Thanks to what we have now, that wasn't the fate of these images: http://commons.wikimedia.org/w/index.php?title=Special:Contributions&off...
samuli@samulilintula.net wrote:
In fact, finding a category is often hardest when the categories aren't specific enough. And when it comes to finding an image, for example, for an article about art nouveau/jugend windows, [[:Category:Windows]] is just useless. We do need categories like [[:Category:Jugend windows in Finland]].
What about the category intersection tool mentioned above?
Does it exist? Or is it just a theoretical concept?
http://meta.wikimedia.org/wiki/CatScan
Yes, it would work fine if all our users were perfect. Let's continue with the same example and imagine what Category:Finland would become like. 10.000 images, maybe? Then we have this dufus (see http://commons.wikimedia.org/wiki/Commons:Welcome_log) who uploads an image with only [[Category:Finland]]. That image will be lost forever. No one will ever find it and no one will ever be able to use it.
One could create a query or a special page that lists images with few categories. Maybe we could even sort them by age ...
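Such a report could be as simple as the following Python sketch. The `UPLOADS` records, their field names, and the `undercategorized` function are invented for illustration and do not reflect the actual Commons database schema.

```python
from datetime import date

# Hypothetical upload records: file name, upload date, assigned categories.
UPLOADS = [
    {"name": "lake.jpg", "uploaded": date(2007, 1, 12),
     "categories": ["Finland"]},
    {"name": "window.jpg", "uploaded": date(2006, 11, 3),
     "categories": ["Finland", "Windows", "Jugend"]},
    {"name": "forest.jpg", "uploaded": date(2006, 8, 20),
     "categories": ["Finland"]},
    {"name": "unknown.jpg", "uploaded": date(2006, 12, 1),
     "categories": []},
]


def undercategorized(uploads, max_categories=1):
    """Uploads with at most max_categories categories, oldest first."""
    few = [u for u in uploads if len(u["categories"]) <= max_categories]
    return sorted(few, key=lambda u: u["uploaded"])


for upload in undercategorized(UPLOADS):
    print(upload["uploaded"], upload["name"], upload["categories"])
```

Sorting oldest first puts the images that have been waiting longest for more categories at the top of the list.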
Regards,
Flo --- User:Flominator
Yes, it would work fine if all our users were perfect. Let's continue with the same example and imagine what Category:Finland would become like. 10.000 images, maybe? Then we have this dufus (see http://commons.wikimedia.org/wiki/Commons:Welcome_log) who uploads an image with only [[Category:Finland]]. That image will be lost forever. No one will ever find it and no one will ever be able to use it.
One could create a query or a special page that lists images with few categories. Maybe we could even sort them by age ...
We could. The problem is that it would make a common user more dependent on the work that admins do. And admins don't need any more jobs - we can't handle what we have now. But that is not my main concern. Maybe, eventually, we will cope better with the workload.
My main concern now is whether we have a problem that needs fixing, or whether we are inventing one because we have a technical way of solving it. (With an external tool whose interface was so difficult it took me ten minutes to figure it out.)
Small correction -
On 30/01/07, Florian Straub Flominator@gmx.net wrote:
samuli@samulilintula.net wrote:
What about the category intersection tool mentioned above?
Does it exist? Or is it just a theoretical concept?
I'm pretty sure Greg has been referring to a new category intersection tool which has been under discussion in wikitech-l recently, which I'm pretty sure is planned to be a native part of MediaWiki (like a Special page) and... I'm not sure if it will even have all the functionality of Duesentrieb's existing CatScan, so I'll let Greg finish talking it up. :)
cheers Brianna