Message: 5
Date: Tue, 11 Oct 2011 16:22:37 +0100 (BST)
From: Andreas Kolbe jayen466@yahoo.com
Subject: [Commons-l] Commons search function vs. Google
To: Wikimedia Commons Discussion List commons-l@lists.wikimedia.org
Message-ID: 1318346557.48784.YahooMailNeo@web29620.mail.ird.yahoo.com
Content-Type: text/plain; charset="iso-8859-1"
We are wondering on Meta[1] what criteria the Commons search function uses to establish the order in which search results are displayed.
To give some examples, searching for "pearl necklace" in Commons shows a woman with sperm on her neck as the first image result:
http://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=p...
The same image is way down in a Google search (with safe search off) for pearl necklace on Commons:
http://www.google.co.uk/search?q=cucumber+site:commons.wikimedia.org&um=... .,cf.osb&biw=1111&bih=774&uss=1#um=1&hl=en&safe=off&tbm=isch&sa=1&q=pearl+necklace+site: commons.wikimedia.org&oq=pearl+necklace+site:commons.wikimedia.org &aq=f&aqi=&aql=&gs_sm=e&gs_upl=113279l114967l0l115854l14l11l0l0l0l8l261l2003l0.8.3l11l0&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=49f703222a617ec&biw=1111&bih=774
Searching for "electric toothbrushes" in Commons shows a woman masturbating with a toothbrush as the second image result:
http://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=e...
The same image turns up in Google as well (with safe search switched off), though not as one of the first results:
http://www.google.co.uk/search?q=cucumber+site:commons.wikimedia.org&um=... .,cf.osb&biw=1111&bih=774&uss=1#um=1&hl=en&safe=off&tbm=isch&sa=1&q=electric+toothbrushes+site: commons.wikimedia.org&pbx=1&oq=electric+toothbrushes+site: commons.wikimedia.org &aq=f&aqi=&aql=&gs_sm=e&gs_upl=341351l344565l0l345961l21l19l0l0l0l13l255l3528l0.11.8l19l0&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=49f703222a617ec&biw=1111&bih=774
Searching for "cucumber" in Commons shows a woman with a cucumber up her vagina on the first page of search results:
http://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=c...
Doing a Google search for cucumber on Commons (with safe search off) does not bring this image up among the first hundred or so results:
http://www.google.co.uk/search?q=cucumber+site:commons.wikimedia.org&um=... .,cf.osb&biw=1111&bih=774&uss=1
Why is our listing so different from the one in Google, and why are sexual images so much higher up in our listing of search results?
Andreas
[1] http://meta.wikimedia.org/wiki/Controversial_content/Brainstorming
I don't know how Google does it, but I'd bet that our search prioritises by word order in the description. So a description that starts with "Pearl Necklace" comes before "A white pearl necklace". If you amend the description then I suspect the search results will change.
WereSpielChequers
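The guess above can be made concrete with a toy ranking function. This is only a sketch of the hypothesis, not how Commons search is known to work: `position_score` and its formula are invented for illustration, and simply reward descriptions in which the phrase appears earlier.

```python
def position_score(query: str, description: str) -> float:
    """Hypothetical score: 1.0 when the description starts with the query,
    smaller the later the phrase appears, 0.0 when it is absent."""
    index = description.lower().find(query.lower())
    if index < 0:
        return 0.0
    return 1.0 / (1 + index)


descriptions = ["A white pearl necklace", "Pearl necklace"]

# Sort so that the description where the phrase occurs earliest comes first.
ranked = sorted(
    descriptions,
    key=lambda d: position_score("pearl necklace", d),
    reverse=True,
)
print(ranked)
```

Under this assumed scoring, a description that opens with the phrase outranks one where the phrase comes later, which is the behaviour the guess predicts.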
On 11 October 2011 16:53, WereSpielChequers werespielchequers@gmail.com wrote:
> I don't know how Google does it, but I'd bet that our search prioritises by word order in the description. So a description that starts with "Pearl Necklace" comes before "A white pearl necklace". If you amend the description then I suspect the search results will change.
There are some notes on the internals of Lucene-search here:
http://www.mediawiki.org/wiki/User:Rainman/search_internals
"Article content" presumably is the same as the image description in our context. I don't know quite what the "rank" metric would mean in the Commons context - presumably, only links from local pages on Commons count?
It may be that more controversial images provoke more meta-discussion, with more links to them as a result (from talkpages, deletion discussions, etc) and so are more likely to appear "popular" to the search system, but that's just a guess.
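To illustrate how a link-based "rank" could produce that effect, here is a minimal sketch. The formula in `boosted_score`, the file titles, and all the numbers are made up for the example; the actual weighting used by Lucene-search is not stated in this thread.

```python
import math


def boosted_score(text_score: float, inbound_links: int) -> float:
    """Hypothetical formula: text relevance scaled by a logarithmic
    boost that grows with the number of pages linking to the file."""
    return text_score * (1.0 + math.log1p(inbound_links))


# (text relevance, inbound links) per file; illustrative values only.
pages = {
    "File:Quiet image.jpg": (0.9, 1),
    "File:Much discussed image.jpg": (0.7, 40),
}

ranked = sorted(pages, key=lambda title: boosted_score(*pages[title]), reverse=True)
print(ranked)
```

With these assumed numbers, the file with the weaker text match but many links from discussion pages ends up first, which is the mechanism being guessed at.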
Andrew Gray, 11/10/2011 20:11:
> It may be that more controversial images provoke more meta-discussion, with more links to them as a result (from talkpages, deletion discussions, etc) and so are more likely to appear "popular" to the search system, but that's just a guess.
Hm, Lucene Streisand effect.
Béria Lima, 11/10/2011 20:31:
> I guess that has something to do with the name of the image. The sexual image is named File:Sexuality *pearl necklace* small.png
> http://commons.wikimedia.org/wiki/File:Sexuality_pearl_necklace_small.png
> so it would obviously be one of the first results if you are looking for *pearl necklace*.
Looks like there are 248 exact file matches:
https://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=%22pearl+necklace%22&fulltext=Search&ns6=1&profile=advanced
I see that the first image doesn't use the information template; perhaps descriptions within templates are treated differently? That could be a wrong assumption based on how infoboxes work on Wikipedia. (Just more imaginative speculation...)
Nemo
Federico Leva (Nemo), 11/10/2011 21:26:
> Looks like there are 248 exact file matches:
> https://commons.wikimedia.org/w/index.php?title=Special%3ASearch&search=%22pearl+necklace%22&fulltext=Search&ns6=1&profile=advanced
> I see that the first image doesn't use the information template; perhaps descriptions within templates are treated differently? That could be a wrong assumption based on how infoboxes work on Wikipedia. (Just more imaginative speculation...)
I've added the template and it's now 5th. Links seem to be still the same, so looks like that was the problem?
Nemo
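A before-and-after check like the one above could be repeated with a small helper. This is a sketch under assumptions: it builds a query URL for the MediaWiki Action API (`action=query`, `list=search`, restricted to the File namespace, number 6) and finds a title's position in a list of results you have already fetched. The names `search_url` and `rank_of` are hypothetical, no request is sent here, and whatever ordering the API returns today need not match what was seen in this thread.

```python
from typing import List, Optional
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def search_url(phrase: str, limit: int = 50) -> str:
    """Build an exact-phrase search URL limited to the File namespace."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f'"{phrase}"',  # quotes request an exact phrase match
        "srnamespace": 6,           # 6 is the File namespace
        "srlimit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def rank_of(title: str, titles: List[str]) -> Optional[int]:
    """Return the 1-based position of a title in the results, or None."""
    try:
        return titles.index(title) + 1
    except ValueError:
        return None


print(search_url("pearl necklace"))
```

Recording `rank_of` for the same file before and after editing its description page would show whether the edit moved it, bearing in mind that the search index may take some time to pick up a change.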