On 8/12/07, Gregory Maxwell <gmaxwell@gmail.com> wrote:
What is flawed is the model of "user contributed content", which lacks strong facilities for collaboration. We understand collaboration, and we have strong facilities for it. This is why "Joe uses 'dog', John uses 'dogs'" isn't and shouldn't be a hard problem for us. With collaboration we can just go fix all of it.
Or just pull the redirects from en and use them as data for a basic disambiguation system.
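To make that concrete, here is a rough sketch of what I mean, in Python. It assumes the redirects have already been extracted into (source title, target title) pairs; the sample pairs and the function names are made up for illustration and are not taken from a real dump or from any MediaWiki API.

```python
def build_redirect_index(redirects):
    """Map each lowercased redirect title to its target title."""
    index = {}
    for source, target in redirects:
        index[source.strip().lower()] = target
        # Let the target title resolve to itself unless something
        # already claimed that key.
        index.setdefault(target.strip().lower(), target)
    return index


def canonical_keyword(keyword, index):
    """Return the redirect target for a keyword, or the keyword unchanged."""
    return index.get(keyword.strip().lower(), keyword)


# Hypothetical sample pairs, standing in for real redirect data.
sample_redirects = [
    ("Dogs", "Dog"),
    ("Doggie", "Dog"),
    ("Kangaroos", "Kangaroo"),
]
index = build_redirect_index(sample_redirects)

print(canonical_keyword("dogs", index))
print(canonical_keyword("wombat", index))
```

This only folds variant spellings onto one term. Real disambiguation (one keyword, several meanings) would need the disambiguation pages as well, which this sketch ignores.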
Different audience: they cater to commercial stock photography, advertisements and such, while our primary customer is an encyclopedia. But that has nothing to do with annotation.
Their annotation is fantastic. For example, searching for "kangaroo fighting" ... gives you only pictures of kangaroos fighting. Searching for "kangaroo costume" asks you to clarify whether you want "traditional clothing" or "Costume (Dressing Up)". Depending on your choice, you get either a painting of people in what appears to be tribal dress cooking a kangaroo, or pictures of people in cheesy kangaroo costumes.
This blows away anything that we currently offer. It's fantastically useful.
Depends. You can get something close to that if you use Wikipedia as your Commons search engine (of course, in the case of en you will also hit a load of images that are not on Commons, but give it time).
What stinks, in my view, is that we're not that far from being able to have that kind of search ourselves. Their keyword data looks a lot like our categories. The keywords themselves are classified into groups (to help people find the right keywords, but not to classify the images), and there is keyword disambiguation data.
The biggest difference, as far as I can tell, is that we're utterly paranoid about "over categorization". While they apply all the keywords that are appropriate, people on Commons are constantly trying to reduce images to a few categories, or even one. It's nuts and it clearly doesn't work.
A typical image in Getty's web collection will have somewhere between 20 and 40 'keywords' assigned to it. We have an average of 2.9 (including all the license cats).
To an extent you could get around that by looking at the Wikipedia articles the images appear in.
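Roughly, the idea could look like the Python sketch below. It assumes we already have two lookups: which images each article uses, and which categories each article is in. Both dictionaries, the file names and the function name are invented for the example; how you would actually pull that data is left open.

```python
from collections import defaultdict


def keywords_from_usage(article_images, article_categories):
    """Give each image the titles and categories of the articles using it."""
    keywords = defaultdict(set)
    for article, images in article_images.items():
        # The article title itself counts as a keyword, plus its categories.
        terms = {article} | set(article_categories.get(article, ()))
        for image in images:
            keywords[image] |= terms
    return {image: sorted(terms) for image, terms in keywords.items()}


# Hypothetical sample data, not real Wikipedia or Commons content.
article_images = {
    "Kangaroo": ["Kangaroo_fight.jpg", "Red_kangaroo.jpg"],
    "Boxing": ["Kangaroo_fight.jpg"],
}
article_categories = {
    "Kangaroo": ["Macropods", "Fauna of Australia"],
    "Boxing": ["Combat sports"],
}

derived = keywords_from_usage(article_images, article_categories)
for image, terms in sorted(derived.items()):
    print(image, len(terms), terms)
```

An image used in several articles picks up keywords from all of them, which is how you would get above a handful per image. The obvious catch is that images not used in any article get nothing.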