Hey Jimmy,
Thanks for the report. This problem is one that we've been aware of in
Discovery for quite some time. It actually serves as a good example of a
typical problem that we face in improving search: we know there's an issue
with a small subset of searches, and could fix this problem easily with a
hack, but that hack would make as many searches worse as it makes better.
Meanwhile, better solutions take much more time.
But, I have good news! This quarter one of Discovery's goals [1] is to work
on a proper solution to this very problem. We previously studied the
problem in detail [2]. Now, following on from our upgrade of Elasticsearch
last quarter, we're hoping that switching us over to BM25 [3] will fix many
of these relevance issues, and we're investigating that more right now.
Stay tuned!
Thanks,
Dan
[1]:
https://www.mediawiki.org/wiki/Wikimedia_Engineering/2016-17_Q1_Goals#Disco…
[2]:
https://phabricator.wikimedia.org/T125083
[3]:
https://en.wikipedia.org/wiki/Okapi_BM25
On 28 Jul 2016 5:09 a.m., "Jimmy Wales" <jimmywales(a)wikia-inc.com>
wrote:
First, some context:
I was in Philadelphia for the Democratic National Convention earlier
this week, where I had been invited to speak (in a small side event)
about connectivity and global development. I spoke about our work in
the languages of the developing world, and made a point to say that bad
laws in the developed world which might hurt our work can be damaging
for the development of the Internet in the rest of the world and urged
lawmakers to not just think of various Internet legal questions as being
"Silicon Valley versus Hollywood" but to understand that they impact
our volunteer community and many other ordinary people online.
Second, the story:
The main conference was held in the [[Wells Fargo Center
(Philadelphia)]], an indoor arena where basketball and hockey teams play
normally.
A journalist friend said to me that he "finally found something that
Wikipedia doesn't have" and he was surprised. What was that, I said?
"The history of Wells Fargo". What?!! Really?!! That seemed impossible
to me. He said we have an article about Wells Fargo that seems to be
mostly about the contemporary bank, and when you search for Wells Fargo
history there's also an article about the Wells Fargo History Museum.
I popped on my phone and used my own personal preferred method of
finding things in Wikipedia: Google. I typed in "Wells Fargo history"
and sure enough, the first two links are history pages from their
official websites and the third link is Wikipedia - a normal state of
affairs. He started to apologize for raising a false alarm.
I asked him for more details on exactly how he searched, and explained
that I regard it to be very sad if some volunteers spend hundreds of
hours working on an article, painstakingly going over tons of details in
an effort to get it right, and then someone couldn't find it.
Here's what he did - and I replicated the steps and all was clear.
Go to
http://www.wikipedia.org/
Make sure the dropdown in the search box is set to 'EN' - which it would
have been for him.
Start typing 'Wells Fargo history' and watch as the dropdown selections
narrow. You'll have the experience that he had - you'll see the bank
article prominently featured and then various buildings (they have a
habit of sponsoring sports arenas in various US cities) and finally as
you start typing history it focuses in on the History Museum.
If you don't choose any of those, then hit enter, you'll get to the
search results page. This is the one with a huge box of options at the
top (which will be confusing and frightening to people who aren't
already wikipedians) and then by my count the desired article is 13th on
the page: [[History of Wells Fargo]].
Now, I strongly suspect this could be fixed by making a redirect from
[[Wells Fargo history]] to [[History of Wells Fargo]].
Or a more serious fix could be had if the search engine understood that
very very often in English [[X of Y]] can be written [[Y X]]. ([[List
of French monarchs]] becomes [[French monarchs list]], see:
https://en.wikipedia.org/wiki/Special:Search?search=french+monarchs+list
where the desired article is in 10th place.)
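As a rough illustration of the [[Y X]] to [[X of Y]] idea, here is a minimal
Python sketch. The function name title_variants and the connectors parameter
are hypothetical and are not part of any existing search code; the sketch only
generates candidate titles and does not look them up or rank them.

```python
def title_variants(query, connectors=("of",)):
    """Return the query plus candidate "X of Y" rewrites of a "Y X" query.

    Hypothetical helper for illustration only: it does not check whether
    any of the candidate titles actually exist.
    """
    words = query.split()
    variants = [query]
    # Try every split point: the tail becomes X and the head becomes Y.
    for i in range(1, len(words)):
        head = " ".join(words[:i])
        tail = " ".join(words[i:])
        for connector in connectors:
            candidate = f"{tail} {connector} {head}"
            # Capitalize only the first letter, as article titles usually do.
            variants.append(candidate[0].upper() + candidate[1:])
    return variants


if __name__ == "__main__":
    for variant in title_variants("Wells Fargo history"):
        print(variant)
```

Most of the generated candidates will be nonsense ("Fargo history of Wells"),
so in practice they would only be useful as extra lookups or as a small boost
for titles that really exist, not as a replacement for the original query.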
But my point is not to argue for any specific fix. My point is to
illustrate that there is a real problem with search, that it is
impacting users, and that we should invest in fixing it.
--Jimbo
_______________________________________________
Wikimedia-l mailing list, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
New messages to: Wikimedia-l(a)lists.wikimedia.org
Unsubscribe:
https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
<mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>