tl;dr: Search continues to expand functionality by displaying more information on the search results page
Ever started searching for something on Wikipedia and wondered—*really*, is that all that there is? Does it feel like you’re somehow playing hide and seek with all the knowledge that’s out there? And...wouldn’t it be great to see articles or categories that are similar to your search query and maybe some related images or links to other languages in which to read that article? Or, maybe you just want to read and contribute to projects other than Wikipedia but need a jump start with a few short summaries from sister projects. The Discovery Search team has been testing out some really cool new features that will enable some fun and fascinating clicking—down the rabbit hole of Wikipedia.[1] But first, let’s recap what we’ve been doing recently.
We've been doing a lot of work creating, updating, and refining the search back end to improve how search queries are handled. That work has included adding ASCII-folding and stemming, detecting when a visitor might be typing in a language different from that of the Wikipedia they are on, switching from tf-idf to BM25, dropping trailing question marks, and updating to Elasticsearch version 5.[2][3][4][5][6][7] Whew!
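For readers curious what a few of those back-end changes mean in practice, here is a rough, self-contained Python sketch. It is not the production search code: the function names are made up for illustration, the folding step is a simplified stand-in for a real ASCII-folding filter, and the BM25 parameters (k1=1.2, b=0.75) are commonly used defaults assumed here rather than the values used on the Wikipedias.

```python
import math
import re
import unicodedata


def ascii_fold(text: str) -> str:
    """Strip diacritics so 'café' matches 'cafe' (simplified ASCII-folding)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def drop_trailing_question_marks(query: str) -> str:
    """Remove question marks (and surrounding whitespace) from the end of a query."""
    return re.sub(r"[\s?]+$", "", query)


def tf_idf(tf: float, df: int, n_docs: int) -> float:
    """Classic tf-idf term score: grows linearly with term frequency."""
    return tf * math.log(n_docs / df)


def bm25(tf: float, df: int, n_docs: int, doc_len: float, avg_len: float,
         k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 term score: term frequency saturates and document length is normalized."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return idf * norm


if __name__ == "__main__":
    print(ascii_fold("café Zürich"))
    print(drop_trailing_question_marks("what is a wiki??"))
    # Doubling the term frequency doubles tf-idf, but adds much less under BM25.
    print(tf_idf(10, 10, 1000), tf_idf(20, 10, 1000))
    print(bm25(10, 10, 1000, 100, 100), bm25(20, 10, 1000, 100, 100))
```

The point of the last two lines is the practical difference between the two ranking functions: under tf-idf, repeating a word twice as often doubles its score, while BM25 flattens out, so very long or repetitive pages gain less from sheer repetition.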
We have much more planned in the coming months—machine learning with ‘learning to rank’, investigating and deploying new language analyzers, and, after exhaustive analysis, removing quotes within queries by default.[8][9][10][11] We’ll also be working closely with the new Structured Data team in their brand new work on Commons.[12][13]
We also want to improve the part that our readers and editors interface with: the search results page! We started brainstorming during the late summer of 2016 on what we could do to make search results better—to easily find interesting, relevant content and to create a more intuitive viewing experience.[14] We designed and refined numerous ideas on how to improve the search results page and received lots of good feedback from the community.[15]
Empowered by the feedback, we began testing, starting with a display of results from the Wikimedia sister projects next to the regular search results.[16] The idea for this test was to enable discovery into other projects (projects that our visitors might not have known about) by displaying interesting results in small snippets. The sidebar display of the sister projects borrows from a similar feature in use on the Italian, Catalan and French Wikipedias. We've run two A/B tests on the sister project search results with detailed analysis and, after a few final touches to the code, we will release the new functionality into production on all Wikipedias near the end of April 2017.
Our next A/B test will add additional information and related results for each search query. This will take the form of an ‘explore similar’ link: when someone interacts with the link, an expanded display will appear with related pages, categories and links to the article in other languages, all of which might lead to further knowledge discovery.[17] We know that not every search query will return exactly what folks were looking for, but we feel that adding links to similar or related information would be helpful and, possibly, super interesting!
We also plan on doing a few more A/B tests in the coming year:
* Test a new display that will show the pronunciation of a word with its definition and part of speech, all from existing data in Wiktionary. Initially this will be in English only.
* Test placing a small image (from the article) next to each search result that is displayed on the page.
* Test an additional feature using a new auto-completion metadata display in the search box that is located on the top right of most pages in Wikipedia, similar to what happens on the Wikipedia.org portal.[18]
For the more technically minded, there is a way to test out these new features in your own browser. Displaying the sister project search results requires a bit of URL manipulation; for the explore similar and Wiktionary widgets, you can modify your common.js file to test an early version of the features. Detailed information is available on MediaWiki.org.[19]
Once the testing, analysis and feedback cycle is done for each new feature, we’d like to slowly implement them into production on all Wikipedias throughout the rest of the year. We’re really hoping that these enhancements to how search works will further the usefulness of search and make our readers and editors more productive.
Cheers from the Discovery Search team!
[1] https://xkcd.com/214/
[2] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Re-Ordering_Stemming_and_Ascii-Folding_on_English_Wikipedia
[3] https://blog.wikimedia.org/2016/07/27/wikipedia-language-search/
[4] https://en.wikipedia.org/wiki/Tf%E2%80%93idf
[5] https://en.wikipedia.org/wiki/Okapi_BM25
[6] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Dropping_Final_Question_Marks_in_the_Top_10_Wikipedias
[7] https://phabricator.wikimedia.org/T154501
[8] https://en.wikipedia.org/wiki/Learning_to_rank
[9] https://phabricator.wikimedia.org/T154511
[10] https://commons.wikimedia.org/wiki/File:From_Zero_to_Hero_-_Anticipating_Zero_Results_From_Query_Features,_Ignoring_Content.pdf
[11] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Quotes_and_Questions
[12] https://commons.wikimedia.org/wiki/Commons:Structured_data
[13] https://blog.wikimedia.org/2017/01/09/sloan-foundation-structured-data/
[14] https://www.mediawiki.org/wiki/Cross-wiki_Search_Result_Improvements
[15] https://www.mediawiki.org/wiki/Talk:Cross-wiki_Search_Result_Improvements
[16] https://www.mediawiki.org/wiki/Cross-wiki_Search_Result_Improvements/Testing#A.2FB_test:_Add_cross-wiki_search_results_in_a_right_hand_sidebar
[17] https://www.mediawiki.org/wiki/Cross-wiki_Search_Result_Improvements/Testing#A.2FB_test:_Add_.27explore_similar.27_pages_and_categories_for_search_results
[18] https://www.wikipedia.org/
[19] https://www.mediawiki.org/wiki/Cross-wiki_Search_Result_Improvements/self-guided_testing
--
deb tankersley
irc: debt
Product Manager, Discovery
Wikimedia Foundation