Hey Jimmy,
Thanks for the report. This is a problem we in Discovery have been aware of for quite some time. It's actually a good example of a typical problem we face in improving search: we know there's an issue with a small subset of searches, and we could fix it easily with a hack, but that hack would make as many searches worse as it made better. Meanwhile, better solutions take much more time.
But I have good news! One of Discovery's goals this quarter [1] is to work on a proper solution to this very problem. We previously studied the problem in detail [2]. Now, following on from last quarter's Elasticsearch upgrade, we're hoping that switching our relevance scoring over to BM25 [3] will fix many of these issues, and we're investigating that right now. Stay tuned!
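For readers who haven't met BM25 [3] before, here is a minimal Python sketch of the standard Okapi BM25 scoring function. It assumes queries and documents are already split into lowercase word tokens. k1 = 1.2 and b = 0.75 are the commonly used defaults (Lucene's among them), and the IDF term uses Lucene's always-positive variant. The function name `bm25_scores` is hypothetical. This is a textbook illustration, not Discovery's actual Elasticsearch configuration.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n_docs  # average document length
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)  # term frequency within this document
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Lucene-style IDF, which never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates repeated terms; b scales the penalty for long documents.
            length_norm = 1 - b + b * len(doc) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores
```

The k1 parameter makes term frequency saturate, so the tenth occurrence of a word adds far less than the first. The b parameter controls how strongly scores are normalized by document length relative to the average.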
Thanks, Dan
[1]: https://www.mediawiki.org/wiki/Wikimedia_Engineering/2016-17_Q1_Goals#Discov...
[2]: https://phabricator.wikimedia.org/T125083
[3]: https://en.wikipedia.org/wiki/Okapi_BM25
On 28 Jul 2016 5:09 a.m., "Jimmy Wales" jimmywales@wikia-inc.com wrote:
First, some context:
I was in Philadelphia for the Democratic National Convention earlier this week, where I had been invited to speak (at a small side event) about connectivity and global development. I spoke about our work in the languages of the developing world. I made a point of saying that bad laws in the developed world that might hurt our work can damage the development of the Internet in the rest of the world, and I urged lawmakers not to think of Internet legal questions merely as "Silicon Valley versus Hollywood" but to understand that they affect our volunteer community and many other ordinary people online.
Second, the story:
The main conference was held in the [[Wells Fargo Center (Philadelphia)]], an indoor arena where basketball and hockey teams normally play.
A journalist friend told me he had "finally found something that Wikipedia doesn't have", and he was surprised. What was that, I asked. "The history of Wells Fargo." What?!! Really?!! That seemed impossible to me. He said we have an article about Wells Fargo that seems to be mostly about the contemporary bank, and that when you search for Wells Fargo history, there's also an article about the Wells Fargo History Museum.
I pulled out my phone and used my own preferred method of finding things in Wikipedia: Google. I typed in "Wells Fargo history" and, sure enough, the first two links were history pages from the company's official websites and the third link was Wikipedia, a normal state of affairs. He started to apologize for raising a false alarm.
I asked him for more details on exactly how he had searched, and explained that I find it very sad when volunteers spend hundreds of hours working on an article, painstakingly going over tons of details in an effort to get it right, and then someone can't find it.
Here's what he did; I replicated the steps and all became clear.
Go to http://www.wikipedia.org/
Make sure the language dropdown in the search box is set to 'EN', which it would have been for him.
Start typing 'Wells Fargo history' and watch as the dropdown suggestions narrow. You'll have the same experience he had: you'll see the bank article prominently featured, then various buildings (the bank has a habit of sponsoring sports arenas in various US cities), and finally, as you start typing 'history', it focuses on the History Museum.
If you don't choose any of those and instead hit Enter, you'll land on the search results page. This is the one with a huge box of options at the top (which will be confusing and frightening to people who aren't already Wikipedians), and by my count the desired article, [[History of Wells Fargo]], is 13th on the page.
Now, I strongly suspect this could be fixed by making a redirect from [[Wells Fargo history]] to [[History of Wells Fargo]].
Or a more serious fix could be had if the search engine understood that, very often in English, [[X of Y]] can be written as [[Y X]]. For example, [[List of French monarchs]] becomes [[French monarchs list]]; see https://en.wikipedia.org/wiki/Special:Search?search=french+monarchs+list, where the desired article is in 10th place.
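To make the "X of Y" to "Y X" idea concrete, here is a minimal Python sketch. The function names (`of_inversion`, `normalize`, `titles_matching_query`) are hypothetical, the split happens only at the first " of ", and matching is a naive exact comparison after case-folding. It is an illustration of the idea, not how Wikipedia's search engine actually works.

```python
import re
from typing import Optional

# Hypothetical rule: split a title at its first " of " so that
# "History of Wells Fargo" can also be found as "Wells Fargo History".
_OF_PATTERN = re.compile(r"^(?P<x>.+?)\s+of\s+(?P<y>.+)$", re.IGNORECASE)


def of_inversion(title: str) -> Optional[str]:
    """Return the 'Y X' form of a title shaped like 'X of Y', else None."""
    match = _OF_PATTERN.match(title.strip())
    if match is None:
        return None
    return f"{match.group('y')} {match.group('x')}"


def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so comparisons ignore formatting."""
    return " ".join(text.casefold().split())


def titles_matching_query(query: str, titles: list[str]) -> list[str]:
    """Return titles whose own text or 'Y X' inversion equals the query."""
    q = normalize(query)
    hits = []
    for title in titles:
        inverted = of_inversion(title)
        if normalize(title) == q or (inverted is not None and normalize(inverted) == q):
            hits.append(title)
    return hits
```

For example, `titles_matching_query("Wells Fargo history", ["Wells Fargo", "Wells Fargo History Museum", "History of Wells Fargo"])` returns only `["History of Wells Fargo"]`. The same naive rule also turns "Bank of America" into "America Bank", which is the kind of side effect that makes such rules risky to apply blindly.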
But my point is not to argue for any specific fix. My point is to illustrate that there is a real problem with search, that it is impacting users, and that we should invest in fixing it.
--Jimbo
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe