<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Unfortunately we currently have zero developers working on search
(as far as I know). There are several more significant search bugs
that are also not going to be fixed any time soon. Another issue is
that our search engine is Java while the rest of MediaWiki is PHP.
This makes sense for performance reasons, but makes the pool of
potential developers who are able and willing to work on it much
smaller. In other words, this might get fixed in a few years, but I
wouldn't hold my breath. In the meantime, it would be good to
follow Sarah's lead and proactively curate the content we have so
that there is less potential for astonishment in our search results.<br>
<br>
Ryan Kaldari<br>
<br>
On 10/13/11 5:37 PM, Andreas Kolbe wrote:
<blockquote
cite="mid:1318552659.18505.YahooMailNeo@web29611.mail.ird.yahoo.com"
type="cite">
<div style="color: rgb(0, 0, 0); background-color: rgb(255, 255,
255); font-family: times new roman,new york,times,serif;
font-size: 12pt;"><span class="Apple-style-span" style="color:
rgb(69, 69, 69); font-family: Arial,Helvetica,sans-serif;
font-size: 12px;">
<div class="msg-body inner undoreset" style="margin: 25px 24px
22px 29px; overflow-x: auto; overflow-y: hidden;">
<div id="yiv467794009">
<div>
<div style="color: rgb(0, 0, 0); background-color:
rgb(255, 255, 255); font-family: 'times new
roman','new york',times,serif; font-size: 12pt;">
<div>John,</div>
<div><br>
<blockquote style="margin-left: 5px; font-family:
times,serif; font-size: 12pt; border-left: 2px
solid rgb(16, 16, 255); padding-left: 5px;">
<div
class="yiv467794009yui_3_2_0_15_131855095902857"
style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;">
<div
class="yiv467794009yui_3_2_0_15_131855095902859"
style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;"><font
face="Arial" size="2"><b><span
style="font-weight: bold;">From:</span></b> John
Vandenberg <a class="moz-txt-link-rfc2396E" href="mailto:jayvdb@gmail.com">&lt;jayvdb@gmail.com&gt;</a><br>
</font>> (Searching for "levee" in Commons
brings up an image of a<br>
> naked Suicide Girl called Levee in third
place.)<br>
<br>
It's a thumbnail for !@#$ sake, and anyone who
finds that image<br>
offensive should turn off their internet
connection.<br>
</div>
</div>
</blockquote>
<span class="yiv467794009Apple-style-span"
style="font-size: 16px;">
<div><span class="yiv467794009Apple-style-span"
style="font-size: 16px;"><br>
</span></div>
</span><span class="yiv467794009Apple-style-span"
style="font-size: 16px;">It's a perfectly nice
image, but does it answer the user's need? In most
cases probably not. If I google levee, I see
levees, not nude girls:</span></div>
<div><span class="yiv467794009Apple-style-span"
style="font-size: 16px;"><br>
</span></div>
<div><span class="yiv467794009Apple-style-span"
style="font-size: 16px;"><a moz-do-not-send="true"
rel="nofollow" target="_blank"
href="http://www.google.co.uk/search?gcx=c&q=levee&um=1&ie=UTF-8&hl=en&tbm=isch&source=og&sa=N&tab=wi&biw=1041&bih=638"
style="color: rgb(35, 71, 134); outline-style:
none;">http://www.google.co.uk/search?gcx=c&q=levee&um=1&ie=UTF-8&hl=en&tbm=isch&source=og&sa=N&tab=wi&biw=1041&bih=638</a><br>
</span></div>
<div><br>
</div>
<div>If I want to google for pictures of Levee, I
google for "Levee Suicide Girls", and there she is:</div>
<div><br>
</div>
<div><a moz-do-not-send="true" rel="nofollow"
target="_blank"
href="http://www.google.co.uk/search?gcx=c&q=levee&um=1&ie=UTF-8&hl=en&tbm=isch&source=og&sa=N&tab=wi&biw=1041&bih=638#um=1&hl=en&tbm=isch&sa=1&q=levee+suicide+girl&pbx=1&oq=levee+suicide+girl&aq=f&aqi=&aql=&gs_sm=e&gs_upl=127182l129981l0l130379l15l15l0l11l0l0l291l930l0.1.3l4l0&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=120e52a58330422e&biw=1041&bih=638"
style="color: rgb(35, 71, 134); outline-style:
none;">http://www.google.co.uk/search?gcx=c&q=levee&um=1&ie=UTF-8&hl=en&tbm=isch&source=og&sa=N&tab=wi&biw=1041&bih=638#um=1&hl=en&tbm=isch&sa=1&q=levee+suicide+girl&pbx=1&oq=levee+suicide+girl&aq=f&aqi=&aql=&gs_sm=e&gs_upl=127182l129981l0l130379l15l15l0l11l0l0l291l930l0.1.3l4l0&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=120e52a58330422e&biw=1041&bih=638</a></div>
<div><br>
</div>
<div>I guess Commons should give more weight to
categories, and less weight to file names. So when I
search Commons for cucumber, it should show me images in the
cucumber category first of all, and not images that
happen to have cucumber in the title.</div>
<div><br>
</div>
<div>Brandon, is there something developers could do
in this regard?</div>
<div><br>
</div>
<div><br>
</div>
<div>
<blockquote style="margin-left: 5px; font-family:
times,serif; font-size: 12pt; border-left: 2px
solid rgb(16, 16, 255); padding-left: 5px;">
<div
class="yiv467794009yui_3_2_0_15_131855095902857"
style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;">
<div
class="yiv467794009yui_3_2_0_15_131855095902859"
style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;">I am sure
you'll be appalled that libraries include nude
pictures in<br>
their search results, often when searching for
something else.<br>
<br>
<a moz-do-not-send="true" rel="nofollow"
target="_blank"
href="http://trove.nla.gov.au/picture/result?q=contemporary+north+america+20th+century"
style="color: rgb(35, 71, 134);
outline-style: none;">http://trove.nla.gov.au/picture/result?q=contemporary+north+america+20th+century</a><br>
<br>
fix the metadata.<br>
<br>
create a gallery page.<br>
<br>
create a category and populate it.<br>
<br>
etc<br>
<br>
p.s. abstract art offends me. Can we please
remove media related to<br>
John Levee from the Commons search results
for the term 'Levee'. ;-)<br>
<br>
> We should be under no illusion that we
can find all search terms whose<br>
> results violate the principle of least
surprise, presenting adult images for<br>
> everyday search terms.<br>
><br>
> New such situations arise on a daily
basis, each time someone uploads an<br>
> explicit file that has a plausible search
term in its name and<br>
> description (try searching Commons for
"eating", and then search for<br>
> "drinking"; or try finding images of
Prince Albert).<br>
<br>
The ordering of the search results isn't
ideal. Have you raised a bug?<br>
</div>
</div>
</blockquote>
<div style="font-family: times,serif; font-size:
12pt;"><br>
</div>
<div style="font-family: times,serif; font-size:
12pt;"><br>
</div>
<font class="yiv467794009Apple-style-span" size="3">The
thing is, John, it's not a bug. How is it a bug?
The image is called "Drinking urine" or whatever,
and so it's a valid search result for "drinking".
No doubt, a bunch of people would argue that it
would be non-neutral to exclude it from the search
results for drinking, because Wikipedia is not
censored, and we don't care if people are unhappy
with our service, because that would be
non-neutral. ;)</font></div>
<div><br>
</div>
<div style="font-family: times,serif; font-size:
12pt;">&lt;Imagine rant here.&gt;</div>
<div style="font-family: times,serif; font-size:
12pt;"><br>
</div>
<div style="font-family: times,serif; font-size:
12pt;"><br>
</div>
<div>
<blockquote style="margin-left: 5px; font-family:
times,serif; font-size: 12pt; border-left: 2px
solid rgb(16, 16, 255); padding-left: 5px;">
<div
class="yiv467794009yui_3_2_0_15_131855095902857"
style="font-family: 'times new roman','new
york',times,serif; font-size: 12pt;">
<div style="font-size: 12pt; font-family:
times,serif;">
<div>It puts too much weight on the filename,
which isn't good because<br>
</div>
we recommend against renaming, so the current
search results are gameable by<br>
the uploader.<br>
<br>
> We should simply offer safe search, like
Google does.<br>
<br>
Google provides safe search. They need to
convert 'the internet' into<br>
a search results page that their customer
wants to see, and the<br>
Internet has a whole lot of stuff that 99% of
the world never wants to<br>
see.<br>
<br>
Wikipedia provides encyclopedic information.<br>
<br>
Commons provides a depository of media, and if
you search for keywords<br>
in the metadata you'll see thumbnails of the
matching media.</div>
</div>
</blockquote>
<div><br>
</div>
<div><br>
</div>
I find Google safe search seriously useful, because
it gives me a choice, and enables me to tailor my
search to my requirements. If I want to see porn, I
can see porn. If I'm looking for something else, I
can prevent my search being flooded with porn. </div>
<div><br>
</div>
<div>If I am a researcher looking for images of Prince
Albert on Commons, I would appreciate not being
forced to wade through dozens of images of penises
with rings in them to find the image I'm looking
for.</div>
<div><br>
</div>
<div><a moz-do-not-send="true" rel="nofollow"
target="_blank"
href="http://commons.wikimedia.org/w/index.php?title=Special:Search&redirs=1&ns0=1&ns6=1&ns9=1&ns12=1&ns14=1&ns100=1&ns106=1&search=Prince+albert&limit=500&offset=0"
style="color: rgb(35, 71, 134); outline-style:
none;">http://commons.wikimedia.org/w/index.php?title=Special:Search&redirs=1&ns0=1&ns6=1&ns9=1&ns12=1&ns14=1&ns100=1&ns106=1&search=Prince+albert&limit=500&offset=0</a><br>
</div>
<div><br>
</div>
<div>We will not attract a more mature audience until
we get our act together.</div>
<div><br>
</div>
<div>Andreas<br>
</div>
<div><br>
</div>
<div> </div>
<div><br>
</div>
</div>
</div>
</div>
</div>
</span></div>
<pre wrap="">
_______________________________________________
Gendergap mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Gendergap@lists.wikimedia.org">Gendergap@lists.wikimedia.org</a>
<a class="moz-txt-link-freetext" href="https://lists.wikimedia.org/mailman/listinfo/gendergap">https://lists.wikimedia.org/mailman/listinfo/gendergap</a>
</pre>
</blockquote>
</body>
</html>