On 17.10.2011 12:47, Andreas Kolbe wrote:
Note: This foundation-l post is cross-posted to commons-l, since this
discussion may be of interest there as well.
> From: Tobias Oelgarte <tobias.oelgarte(a)googlemail.com>
> It is an in-house made problem, as I explained at brainstorming.
To put it short: it is a self-made problem, based on the fact that these
images got more attention than others. Thanks to failed deletion
requests, they had many people caring about them. This results in more
exact descriptions and file naming than in average images. That's what
search engines prefer, and now we have them at a top spot. Thanks for
caring so much about these images and not treating them like anything
else.
I don't think that is the case, actually. Brandon described how the
search function works here:
To take an example, the file
(a prominent search result in searches for "shower") has never had its
name or description changed since it was uploaded from Flickr. My
impression is that refinement of file names and descriptions following
discussions has little to do with sexual or pornography-related media
appearing prominently in search listings. The material is simply
there, and the search function finds it, as it is designed to do.
That is again cherry-picking an example. But what do you expect to find?
Say that someone actually searches for an image of this practice. Should
he find it at the last spot? A good search algorithm treats everything
equally and delivers the closest matches. A more intelligent search
would deliver images of showers first if you search for "shower", since
it knows the difference between the terms "golden shower" and "shower".
That's how it should work. It's definitely not an error of the search
engine itself, but it could be improved to deliver better-matching
results, without any marking. Extending it to exclude marked content
leads back to the basic questions, which should be answered first.
Picking examples like this represents exactly the kind of argumentation
that leads everywhere but not to a solution. I already described in the
post "Controversial Content vs Only-Image-Filter" that single examples
don't represent the overall picture. A single example also isn't a
contribution to the discussion as an argument; it would only be an
argument if we knew the effects that occur. We have to settle the
question:
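The "golden shower" vs. "shower" distinction mentioned above could, in principle, be approximated by down-ranking results where the query term only occurs inside a longer known phrase. A toy sketch of that idea follows; the phrase list and the scoring function are invented for illustration and this is not how MediaWiki's search actually works:

```python
# Toy sketch of phrase-aware ranking: prefer results where the query word
# stands on its own over results where it only appears inside a longer,
# more specific phrase. The phrase list and scoring are invented for
# illustration; this is not how MediaWiki's search actually works.

KNOWN_PHRASES = {"golden shower", "baby shower"}  # hypothetical phrase list

def phrase_aware_score(query, text):
    """Return 0 for a plain match, 1 for a phrase-only match, None for no match."""
    query, text = query.lower(), text.lower()
    if query not in text:
        return None
    for phrase in KNOWN_PHRASES:
        # The query term only matched as part of a longer phrase.
        if query != phrase and query in phrase and phrase in text:
            return 1
    return 0
```

Sorting results by this score would put plain showers first for the bare query "shower", without excluding or marking anything.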
It is hard to say how
else to provide evidence of a problem, other
than by giving multiple (not single) examples of it.
You could also search for blond, blonde, red hair, strawberry, or
What is striking is the crass sexism of some of the filenames and
image descriptions: "blonde bombshell", "Blonde teenie sucking",
so sexy", "These two had a blast showing off" etc.
One of the images shows a young woman in the bathroom, urinating:
Her face is fully shown, and the image, displayed in the Czech
Wikipedia, carries no personality rights warning, nor is there
evidence that she has consented to or is even aware of the upload.
And I am surprised how often images of porn actresses are found in
search results, even for searches like "Barbie". Commons has 917 files
in Category:Unidentified porn actresses alone. There is no
corresponding Category:Unidentified porn actors (although there is of
course a wealth of categories and media for gay porn actors).
Evidence would be a statistic showing how many people are actually happy
with the results, with "happy" meaning: "I will use it again and was not
so offended that I won't use it."
If the naming of those images is a problem, then we can just rename them
to something more useful. We have templates and bots for that. Marking
the images would not help in this case. But doing what we can already
do would be a simple and working solution: rename them.
The case of this image and others is already addressed in COM:PEOPLE. I
also see no direct relation between this topic (keeping/deleting) and
the search function and its results.
Everyone should know that "Barbie" is an often-used term or part of a
pseudonym. That the search reacts to both is quite right. The word
itself does not distinguish between multiple meanings. But that's again
not the problem.
I must remind you not to construct special cases. Better to spend the
time searching for good solutions, which don't need to discriminate
against content to give the best results possible.
* Is it a problem that the search function displays sexual content? (A
search should find anything related, by definition.)
I think the search function works as designed, looking for matches in
file names and descriptions.
That means it does its job as intended.
* Is sexual content overrepresented by the search function?
I don't think so. The search function simply shows what is there.
However, the sexual content that comes up for innocuous searches
sometimes violates the principle of least astonishment, and thus may
turn some users off using, contributing to, or recommending Commons as
an educational resource.
That needs a big quotation mark, and it has been an unproven statement
since the beginning of the discussion. Commons and Wikipedia are meant
to represent the whole variety of knowledge. A search for a word will
eventually deliver anything that is called that way, ambiguous or not.
That means you will find anything related, since the projects don't aim
at a special audience. Take "kids", for example.
* If that is the case, why is it that way?
* Can we do something about it without drastic changes, like
One thing that might help would be for the search function to
privilege files that are shown in top-level categories containing the
search term: e.g. for "cucumber", first display all files that are in
category "cucumber", rather than those contained in subcategories,
like "sexual penetrative use of cucumbers", regardless of the file
name (which may not have the English word "cucumber" in it).
Such a search should definitely be an option. After reading Brandon's
comment I must also wonder why it doesn't consider categories. Those are
the places where content is already pre-sorted by ourselves. It would
definitely be worth the effort, since it would do two things at once:
1. It would most likely give better results, even if the description or
filename is not translated.
2. A search function which finds content more effectively would also
minimize the effect we are talking about.
A second step would be to make sure that sexual content is not housed
in the top categories, but in appropriately named subcategories. This
is generally already established practice. Doing both would reduce the
problem somewhat, at least in cases where there is a category that
matches the search term.
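The category-privileging rule described above could be sketched roughly like this; the data model (files as dicts with plain-string category names) is purely hypothetical and only illustrates the ranking idea, not how Commons' real search backend works:

```python
# Hypothetical sketch of "privilege files in the matching top-level category".
# The data model is invented for illustration; Commons' actual search
# backend indexes and ranks pages quite differently.

def rank_results(query, files):
    """Return matching files; files sitting directly in a category named
    like the query come before files that only match via a subcategory
    or via their file name/description."""
    q = query.lower()

    def matches(f):
        return (q in f["name"].lower()
                or q in f["description"].lower()
                or any(q in c.lower() for c in f["categories"]))

    def score(f):
        cats = [c.lower() for c in f["categories"]]
        if q in cats:
            return 0      # directly in a category named like the query
        if any(q in c for c in cats):
            return 1      # only in a subcategory mentioning the term
        return 2          # matched only via file name or description

    return sorted((f for f in files if matches(f)), key=score)

files = [
    {"name": "Weird_use.jpg", "description": "",
     "categories": ["Sexual penetrative use of cucumbers"]},
    {"name": "Cucumis_sativus.jpg", "description": "A cucumber on a table",
     "categories": ["Cucumber"]},
]
```

With this rule, a search for "cucumber" would surface the file categorized directly under "Cucumber" before anything from a subcategory, regardless of file names.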
I'm a little against categories that are purely introduced to divide
content into sexual (offensive) and non-sexual (not offensive) content.
If the practice/depiction has its own specialized term, then it is
acceptable. But introducing pseudo-categories just blows up the category
tree and effectively hides content. If we implement the first idea and
introduce special categories, then we are effectively back at filtering
and non
PS: I was wondering which mail client you use. Usually the structure is
destroyed and the order of mails (re:) is not kept, which makes it hard
to follow conversations.
On 17.10.2011 02:56, Andreas Kolbe wrote:
Personality conflicts aside, we're noting that non-sexual search
terms in Commons can prominently return sexual images of varying
explicitness, from mild nudity to hardcore, and that this is
different from entering a sexual search term and finding that
Google fails to filter some results.
I posted some more Commons search terms where this happens on
Black, Caucasian, Asian;
Male, Female, Teenage, Woman, Man;
Drawing, Drawing style;
Drinking, Custard, Tan;
Hand, Forefinger, Backhand, Hair;
Bell tolling, Shower, Furniture, Crate, Scaffold;
Galipette – French for "somersault"; this leads to a collection of
1920s pornographic films which are undoubtedly of significant
historical interest, but are also pretty much as explicit as any
modern representative of the genre.
Commons-l mailing list