Fwd: "Did you know?" ... The family tree of [[Category:Copyright statuses]] and our broken category system.


Brianna Laugher

30 Jan 2007

6:05 a.m.

On 30/01/07, Gregory Maxwell gmaxwell@gmail.com wrote:

...

On 1/29/07, Brianna Laugher brianna.laugher@gmail.com wrote:

...
Instead of doing that, I think it would be more sensible to continue the tradition of only putting the most specific cat that applies, and adjusting the software to have an option to display subcategory items into the current category (like "flatten" I think) when desired. (bug 2725)

I thought I provided some pretty clear examples of why flattening is no good. Is there a reason you ignored that point? :)

It works poorly if you expand all the way, and the higher up in the tree you start, the worse it works. In my experience using Duesentrieb's tool, it works quite well when you specify a low depth (depth=1,2,3). Often 1 is appropriate.
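A minimal sketch of what depth-limited flattening amounts to, assuming a toy category graph. The `SUBCATS` and `FILES` dictionaries and the `flatten` function are hypothetical illustrations, not Duesentrieb's tool or any MediaWiki API.

```python
from collections import deque

# Hypothetical toy data: category -> direct subcategories, category -> files.
SUBCATS = {
    "Primates": ["Hominidae"],
    "Hominidae": ["Homo"],
    "Homo": ["Women"],
    "Women": [],
}
FILES = {
    "Primates": ["Lemur.jpg"],
    "Hominidae": ["Gorilla.jpg"],
    "Homo": ["Skull.jpg"],
    "Women": ["Portrait.jpg"],
}


def flatten(root, depth):
    """Collect files in `root` plus subcategories at most `depth` levels below it."""
    seen = {root}                 # guards against cycles in the category graph
    queue = deque([(root, 0)])    # breadth-first: (category, level below root)
    found = []
    while queue:
        cat, level = queue.popleft()
        found.extend(FILES.get(cat, []))
        if level == depth:
            continue              # do not descend past the requested depth
        for sub in SUBCATS.get(cat, []):
            if sub not in seen:
                seen.add(sub)
                queue.append((sub, level + 1))
    return found
```

With this toy data, `flatten("Primates", 1)` returns the files in Primates and Hominidae only, while `flatten("Primates", 3)` reaches all four categories.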

As is obvious to anyone who works with categories on a regular basis, several types of relationships are encoded in category relations (3 examples: is type of, is component of, is related to).

Simply using broad categories instead of narrow ones, as you suggest, will not stop unexpected results because not every category link is a "is type of" which is what is needed for it to work.

Some examples:

[[category:Hominidae]] is type of [[category:primates]] (I think all TOL stuff would be like this)
[[category:wheels]] is component of [[category:automobiles]]
[[category:Culture, People, Geography, States, etc of Country X]] is related to [[Category:Country X]].

I don't think you are suggesting we should stop including links like these are you?

insert blah blah Semantic MediaWiki blah blah... until there is something extra available to us to distinguish between istypeof and isrelatedto category links, and the rest of them, the tree will ALWAYS be "broken". That doesn't mean it's not useful in its current state though. Ways to improve it are always welcome. But I am not certain this will be one of them.
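To illustrate why the link type matters, here is a small sketch that assumes category links carried a relation label. They do not in MediaWiki today; the `EDGES` list, the relation names and the `descendants` function are all hypothetical.

```python
# (child, relation, parent). The relation labels are hypothetical annotations;
# real category links carry no such type information.
EDGES = [
    ("Hominidae", "is_type_of", "Primates"),
    ("Homo", "is_type_of", "Hominidae"),
    ("Wheels", "is_component_of", "Automobiles"),
    ("Culture of Country X", "is_related_to", "Country X"),
]


def descendants(root, allowed=("is_type_of",)):
    """Return categories below `root`, following only the allowed relation types."""
    result = []
    seen = {root}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child, relation, edge_parent in EDGES:
            if edge_parent == parent and relation in allowed and child not in seen:
                seen.add(child)
                result.append(child)
                stack.append(child)
    return result
```

Following only "is type of" links, `descendants("Primates")` finds Hominidae and Homo, while `descendants("Automobiles")` finds nothing, so photos of wheels would not be reported as automobiles. Without the labels, a flattening tool cannot tell these cases apart.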

cheers Brianna


Gregory Maxwell

30 Jan 2007

7:33 a.m.

On 1/30/07, Brianna Laugher brianna.laugher@gmail.com wrote:

...

It works poorly if you expand all the way, and the higher up in the tree you start, the worse it works. In my experience using Duesentrieb's tool, it works quite well when you specify a low depth (depth=1,2,3). Often 1 is appropriate.

*sigh*.

In my example, cutting at depth 3 would prevent you from finding many of the copyright tags, for example: [[:category:Copyright_statuses]]->[[:category:Public_domain]]->[[:category:PD_US_Government]]->[[:category:PD_US_Military]]->[[:category:PD_US_Navy]]->[[:category:PD_US_Navy_Historical_Center]]

The policy of only placing the most specific categories combined with a constant pressure to shrink categories ensures that objects are placed at maximal depth.
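As a worked illustration of the depth cutoff, using only the category chain quoted above (the `reachable` helper is a hypothetical name, and the chain is treated as a simple list rather than a full graph):

```python
# The chain from the message, root first.
CHAIN = [
    "Copyright_statuses",
    "Public_domain",
    "PD_US_Government",
    "PD_US_Military",
    "PD_US_Navy",
    "PD_US_Navy_Historical_Center",
]


def reachable(chain, depth):
    """Categories reached from chain[0] when descending at most `depth` levels."""
    return chain[: depth + 1]


# The deepest category sits five levels below the root.
levels_needed = CHAIN.index("PD_US_Navy_Historical_Center")
```

Here `levels_needed` is 5, so a cutoff of depth 3 stops at PD_US_Military and never reaches PD_US_Navy or PD_US_Navy_Historical_Center.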

If I fix bug 1211 on commons will you withdraw your objection to large categories? Changing the code to make two queries probably wouldn't be too stab worthy.

In any case, even marked up relationships don't solve the drift problem.. because you can still get it with pure subset operations. Not to mention that multiple edge types makes writing queries even more mind bending.

Frankly, I think it's really offensive that we'll waste our time talking about arm-waving dreams of Semantic MediaWiki with its academic appeal, when we can't even manage to provide the basic service our users require.

Go over to Getty images (http://creative.gettyimages.com/source/home/home.aspx) and try out their search and see just how much we suck. Do a 'search all creative' and type in 'black child eating icecream'.

Their system is quite simple but very powerful. They have many tags, from very broad to very specific, and images are marked with all that apply (sometimes many dozens). A simple tag suggestion system makes it easy to find the tags that are in use, and clear up ambiguities (do you want black the color or black the race?). You then query them with a simple and quick intersection tool. You can drill down or adjust your search string, but it's all very simple quick and easy. There are no complex query languages, no snazzy semantic markup, no funky idea hierarchies. It just WORKS. And it works for many tens of thousands of people every day.
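A rough sketch of the kind of flat tagging and intersection query described here, assuming an in-memory index. The image names, tags and function names are invented for illustration and say nothing about how Getty's system is actually built.

```python
from collections import defaultdict

# Hypothetical images, each marked with every tag that applies.
IMAGES = {
    "img1.jpg": {"child", "eating", "ice cream", "outdoors"},
    "img2.jpg": {"child", "playing", "outdoors"},
    "img3.jpg": {"adult", "eating", "ice cream"},
}


def build_index(images):
    """Invert image -> tags into tag -> set of images."""
    index = defaultdict(set)
    for name, tags in images.items():
        for tag in tags:
            index[tag].add(name)
    return index


def search(index, *tags):
    """Return images carrying all of the given tags (set intersection)."""
    sets = [index.get(tag, set()) for tag in tags]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


def suggest(index, prefix):
    """Very simple tag suggestion: tags in use that start with `prefix`."""
    return sorted(tag for tag in index if tag.startswith(prefix))


INDEX = build_index(IMAGES)
```

For example, `search(INDEX, "child", "eating")` returns only `img1.jpg`, and `suggest(INDEX, "e")` offers `eating`. Because every image carries all applicable tags, broad and narrow, no tree traversal is needed at query time.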

With the man power we can put behind marking up our content there is no reason we couldn't be just as good in this regard as the commercial stock photo houses. But we're not. Commons stinks in comparison and if we continue to put off simple and straight-forward measures which will provide the basic features that people need in favor of using commons as a science project and forever waiting for some ill-defined great academic pie in the sky that may never come true... we will just continue to suck.

Oh well, at least we're Free. :)

Brianna Laugher

8:03 a.m.

On 30/01/07, Gregory Maxwell gmaxwell@gmail.com wrote:

...

On 1/30/07, Brianna Laugher brianna.laugher@gmail.com wrote:

...
It works poorly if you expand all the way, and the higher up in the tree you start, the worse it works. In my experience using Duesentrieb's tool, it works quite well when you specify a low depth (depth=1,2,3). Often 1 is appropriate.

*sigh*.

In my example, cutting at depth 3 would prevent you from finding many of the copyright tags for example,

There is no universally correct depth, you have to play around and find what's appropriate for your topic.

...

If I fix bug 1211 on commons will you withdraw your objection to large categories? Changing the code to make two queries probably wouldn't be too stab worthy.

Not at least until we see the category intersection tool in action. Seriously, if broad categories make this tool more useful, there will be a natural community push to adapt the category system. What is the need for a pre-emptive push?

Two other improvements I would like to see on categories are (a) user-specified size to show (instead of 200, offer choices in special:preferences), or possibly just a 'show all' option on all cats >200; and (b) different category sort options, such as date uploaded (chron/reverse), date added to category (chron/reverse).

Hm, maybe I need to open bug reports for those ideas.

...

Frankly, I think it's really offensive that we'll waste our time talking about arm-waving dreams of Semantic MediaWiki with its academic appeal, when we can't even manage to provide the basic service our users require.

Sorry, I didn't mean to imply that I think Semantic MediaWiki will be implemented any time during the next 5 years, because I don't. :) I agree, it's not worth wasting time discussing it at the moment.

And I also agree that we are doing an extraordinarily bad job at providing basic services such as search. ( http://bugzilla.wikimedia.org/show_bug.cgi?id=8738 ) I can tell how much we suck just from using Flickr.

MediaWiki is probably not the best tool for managing a media database, at least in its current form. ( http://bugzilla.wikimedia.org/show_bug.cgi?id=3712 )

...

With the man power we can put behind marking up our content there is no reason we couldn't be just as good in this regard as the commercial stock photo houses.

I think you are vastly overstating the potential of a cat intersection tool if you think simply switching to broad cats instead of narrow will produce the kind of dazzling results that Getty images does. :P Getty obviously pays devs to spend a lot of time on this kind of thing. Ditto Flickr. AFAIK we have no devs who are particularly interested in improving media search. Of course we are the poor cousin. It is very frustrating but it seems to me that that is the lot of the open content volunteer.

It is one thing to mark up our content specially, but we need the other side of that - a way to query it properly. There is no reason this only applies to categories. There are all sorts of metadata that it would be useful to query specifically, but we have no way to do so.

But we're not. Commons stinks in comparison and if we continue to put off simple and straight-forward measures which will provide the basic features that people need in favor of using commons as a science project

?? How about our everyday users?

Confronting them with vast categories of thousands of items doesn't sound like such a great idea to me. Of course offering them ridiculously narrow cats is also frustrating. We try to strike a balance.

Anyway, you didn't answer my other questions. Are you proposing that for a portrait of a woman, we should put [[category:women]] [[category:homo]] [[category:Hominidae]] [[category:primates]] [[category:Mammalia]] [[Category:Vertebrata]] [[Category:Chordata]] [[Category:Animalia]] [[Category:Eukaryota]] and all the other cats up the tree? This would be a lot more intensive for the categoriser, who has to find all these categories instead of simply finding the most specific one that applies, as they currently do.

And secondly, is the cat intersection tool going to have a "flatten to a specified depth" option, or not?
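For a sense of scale, here is a sketch of what expanding the most specific category into the full list would involve for the portrait example above. It assumes a hypothetical `PARENT` map with a single parent per category, which is a simplification since real categories can have several parents. Neither poster proposes this code; it only illustrates the list a categoriser would otherwise assemble by hand.

```python
# Hypothetical single-parent map built from the chain named in the message.
PARENT = {
    "women": "homo",
    "homo": "Hominidae",
    "Hominidae": "primates",
    "primates": "Mammalia",
    "Mammalia": "Vertebrata",
    "Vertebrata": "Chordata",
    "Chordata": "Animalia",
    "Animalia": "Eukaryota",
}


def with_ancestors(cat):
    """Return `cat` followed by every category above it, most specific first."""
    chain = [cat]
    seen = {cat}
    while chain[-1] in PARENT:
        parent = PARENT[chain[-1]]
        if parent in seen:      # stop if the category graph loops back on itself
            break
        seen.add(parent)
        chain.append(parent)
    return chain
```

`with_ancestors("women")` yields all nine categories from the example, ending at Eukaryota.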

cheers, Brianna

Brianna Laugher

8:08 a.m.

On 30/01/07, Gregory Maxwell gmaxwell@gmail.com wrote:

...

Go over to Getty images (http://creative.gettyimages.com/source/home/home.aspx) and try out their search and see just how much we suck. Do a 'search all creative' and type in 'black child eating icecream'.

Their system is quite simple but very powerful. They have many tags, from very broad to very specific, and images are marked with all that apply (sometimes many dozens). A simple tag suggestion system makes it easy to find the tags that are in use, and clear up ambiguities (do you want black the color or black the race?). You then query them with a simple and quick intersection tool. You can drill down or adjust your search string, but it's all very simple quick and easy. There are no complex query languages, no snazzy semantic markup, no funky idea hierarchies. It just WORKS. And it works for many tens of thousands of people every day.

Credit where credit's due: I think Commons is stronger when it comes to species identification. Stock photo archives like this tend to offer lots of generally pretty pictures of plants and animals, but Commons has a much stronger structure and identification info on species generally I think.

At Getty I searched for 'banksia serrata' and it returned this http://creative.gettyimages.com/source/classes/FrameSet.aspx?&UQR=ojqlag... which is identified as a Banksia ericifolia, not to mention this http://creative.gettyimages.com/source/classes/FrameSet.aspx?&UQR=ojqlag... which is simply 'a Banksia flower'. Commons, however, led me straight to http://commons.wikimedia.org/wiki/Banksia_serrata . :P

cheers Brianna

commons-l@lists.wikimedia.org
