On 8/12/07, Gregory Maxwell <gmaxwell(a)gmail.com> wrote:
What is flawed is the model of "user contributed content", which
lacks strong facilities for collaboration. We understand
collaboration; we have strong facilities for it. This is why "Joe
uses 'dog', John uses 'dogs'" isn't and shouldn't be a hard problem
for us. With collaboration we can just go fix all of it.
Or just pull the redirects from en and use them as data for a basic
disambiguation system.
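Something along these lines might do as a starting point. It is only a minimal sketch: it assumes the redirects have already been extracted into (redirect title, target title) pairs, and the sample pairs and function names are made up for illustration, not taken from any real dump or MediaWiki API.

```python
from collections import defaultdict

def build_redirect_index(redirects):
    """Map lowercased titles to their canonical target title.

    `redirects` is an iterable of (source, target) pairs (assumed format).
    """
    index = {}
    for source, target in redirects:
        index[source.strip().lower()] = target
        # Let the target title resolve to itself as well.
        index.setdefault(target.strip().lower(), target)
    return index

def normalize_keyword(keyword, index):
    """Return the canonical title for a keyword, or the keyword unchanged."""
    cleaned = keyword.strip()
    return index.get(cleaned.lower(), cleaned)

def group_by_target(keywords, index):
    """Group user-supplied keywords under the title they resolve to."""
    groups = defaultdict(list)
    for keyword in keywords:
        groups[normalize_keyword(keyword, index)].append(keyword)
    return dict(groups)

# Hypothetical sample data, for illustration only.
SAMPLE_REDIRECTS = [("Dogs", "Dog"), ("Doggy", "Dog"), ("Kangaroos", "Kangaroo")]
index = build_redirect_index(SAMPLE_REDIRECTS)
grouped = group_by_target(["dog", "dogs", "Kangaroos"], index)
```

With that, Joe's 'dog' and John's 'dogs' both end up under the same title. A redirect whose target is ambiguous would need a disambiguation prompt on top, which this sketch does not attempt.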
Different audience: they cater to commercial stock photography,
advertisements and such, while our primary customer is an
encyclopedia. But that has nothing to do with annotation.
Their annotation is fantastic. For example, searching for "kangaroo
fighting" ... gives you only pictures of kangaroos fighting.
Searching for "kangaroo costume" asks you to clarify if you want
"traditional clothing" or "Costume (Dressing Up)"; depending on your
choice, you either get a painting of people in what appears to be
tribal dress cooking a kangaroo, or you get pictures of people in
cheesy kangaroo costumes.
This blows away anything that we currently offer. It's fantastically
useful.
Depends. You can get something close to that if you use Wikipedia as
your Commons search engine (of course, in the case of en you will also
hit a load of images that are not on Commons, but give it time).
What stinks, in my view, is we're not that far from being able to have
that kind of search ourselves. Their keyword data looks a lot like
our categories. The keywords themselves are classified into groups
(to help people find the right keywords, but not to classify the
images), and there is keyword disambiguation data.
The biggest difference, as far as I can tell, is that we're utterly
paranoid about "over categorization". While they apply all that are
appropriate, people on Commons are constantly trying to reduce images
to a few categories, or even one. It's nuts and it clearly doesn't
work.
A typical image in Getty's web collection will have something between
20 and 40 'keywords' assigned to it. We have an average of 2.9
categories per image (including all the license cats).
To an extent you could get around that by looking at the Wikipedia
articles the images appear in.
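Roughly, that means inverting the article-to-image usage data so each image picks up the titles of the articles that use it as extra keywords. The sketch below assumes the usage data is already available as a plain mapping of article title to image filenames; the sample entries and the function name are invented for illustration.

```python
from collections import defaultdict

def keywords_from_usage(article_images):
    """Invert {article title: [image names]} into {image name: [article titles]}."""
    keywords = defaultdict(set)
    for article, images in article_images.items():
        for image in images:
            keywords[image].add(article)
    # Sort the titles so the output is stable.
    return {image: sorted(titles) for image, titles in keywords.items()}

# Hypothetical usage data, for illustration only.
SAMPLE_USAGE = {
    "Kangaroo": ["Kangaroo_fighting.jpg", "Red_kangaroo.jpg"],
    "Boxing": ["Kangaroo_fighting.jpg"],
}
derived = keywords_from_usage(SAMPLE_USAGE)
```

Here the fighting-kangaroo image would pick up both "Boxing" and "Kangaroo" without anyone adding a category by hand. It only helps for images that are actually used in articles, of course.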
--
geni