Discovery July 2016

discovery@lists.wikimedia.org

15 participants
25 discussions

Re: [discovery] [WikimediaMobile] Fwd: geo search is now live
by Justin Ormont 15 Jul '16

Looks great to me.

On Fri, Jul 15, 2016 at 1:18 AM, Sam Smith <samsmith(a)wikimedia.org> wrote:
> Nice work Erik (and any other folks that worked on this)!
>
> -Sam
>
> On Thu, Jul 14, 2016 at 11:07 PM, Corey Floyd <cfloyd(a)wikimedia.org> wrote:
>> Super excited to start working on the UI for this. Thanks again for getting this implemented! 🎉
>>
>> On Thu, Jul 14, 2016 at 5:51 PM, Erik Bernhardson <ebernhardson(a)wikimedia.org> wrote:
>>> This week we shipped a new feature at the request of the mobile apps team, geo integration to full text search.
>>>
>>> (bare bones) Documentation:
>>> https://www.mediawiki.org/wiki/Help:CirrusSearch#Geo_Search
>>>
>>> Quick example usage for bounded geo search:
>>>
>>> https://people.wikimedia.org/~ebernhardson/mapsearch.html#Pittsburgh|intitl…
>>> https://people.wikimedia.org/~ebernhardson/mapsearch.html#San_Francisco|has…
>>> :"National_Register_of_Historic_Places_in_California"
>>> https://people.wikimedia.org/~ebernhardson/mapsearch.html#200km,Moscow|inca…
>>>
>>> Direct search example:
>>>
>>> https://en.wikipedia.org/w/index.php?fulltext=1&search=incategory%3AKremlin…
>>>
>>> Also implemented a "boost" version which increases the search score of pages within a particular area, rather than the default which limits to a geographic area. Compare:
>>>
>>> https://en.wikipedia.org/w/index.php?fulltext=1&search=boost-neartitle%3A%2…
>>>
>>> vs.
>>>
>>> https://en.wikipedia.org/w/index.php?fulltext=1&search=museum
>>
>> --
>> Corey Floyd
>> Software Engineer
>> Reading / iOS
>> Wikimedia Foundation
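
To make the filter-versus-boost distinction in the quoted announcement concrete, here is a toy Python sketch. It is not CirrusSearch code: the thread does not say how the scoring is done inside Elasticsearch, and the `Page` record, the hard radius cut-off, and the boost factor of 2.0 are all illustrative assumptions. The default (bounded) mode drops pages outside the area; the boost mode keeps every page and only reorders them.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Page:
    # Hypothetical search hit: title, text relevance score, and coordinates.
    title: str
    score: float
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_near(page: Page, center: Tuple[float, float], radius_km: float) -> bool:
    return haversine_km(page.lat, page.lon, center[0], center[1]) <= radius_km


def geo_filter(pages: List[Page], center: Tuple[float, float], radius_km: float) -> List[Page]:
    """Bounded mode: discard pages outside the area, keep the text ranking."""
    inside = [p for p in pages if is_near(p, center, radius_km)]
    return sorted(inside, key=lambda p: p.score, reverse=True)


def geo_boost(pages: List[Page], center: Tuple[float, float], radius_km: float,
              factor: float = 2.0) -> List[Page]:
    """Boost mode: keep every page, but multiply the score of pages inside the area."""
    def boosted(p: Page) -> float:
        return p.score * factor if is_near(p, center, radius_km) else p.score
    return sorted(pages, key=boosted, reverse=True)
```

For example, with a San Francisco page scored 1.0 and a New York page scored 1.5, searching within 50 km of San Francisco, `geo_filter` returns only the San Francisco page, while `geo_boost` returns both, San Francisco first (2.0 vs 1.5).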

Fwd: geo search is now live
by Erik Bernhardson 14 Jul '16

This week we shipped a new feature at the request of the mobile apps team, geo integration to full text search.

(bare bones) Documentation: https://www.mediawiki.org/wiki/Help:CirrusSearch#Geo_Search

Quick example usage for bounded geo search:
https://people.wikimedia.org/~ebernhardson/mapsearch.html#Pittsburgh|intitl…
https://people.wikimedia.org/~ebernhardson/mapsearch.html#San_Francisco|has… :"National_Register_of_Historic_Places_in_California"
https://people.wikimedia.org/~ebernhardson/mapsearch.html#200km,Moscow|inca…

Direct search example:
https://en.wikipedia.org/w/index.php?fulltext=1&search=incategory%3AKremlin…

Also implemented a "boost" version which increases the search score of pages within a particular area, rather than the default which limits to a geographic area. Compare:
https://en.wikipedia.org/w/index.php?fulltext=1&search=boost-neartitle%3A%2…
vs.
https://en.wikipedia.org/w/index.php?fulltext=1&search=museum
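
For anyone scripting direct-search links like the ones above, the following Python sketch composes a geo query and encodes it into the same `index.php?fulltext=1&search=...` form the examples use. The `boost-neartitle:` keyword appears (URL-encoded) in the boost example; the plain `neartitle:` spelling and the `radius,title` form (mirroring the `#200km,Moscow` demo fragment) are assumptions to check against the linked Help:CirrusSearch page. The helper names `geo_query` and `build_search_url` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://en.wikipedia.org/w/index.php"


def geo_query(terms: str, title: str, radius: Optional[str] = None, boost: bool = False) -> str:
    """Append a neartitle-style keyword to free-text terms (keyword syntax is assumed)."""
    keyword = "boost-neartitle" if boost else "neartitle"
    target = f"{radius},{title}" if radius else title
    # Quote the target so a multi-word title stays a single keyword argument.
    return f'{terms} {keyword}:"{target}"'.strip()


def build_search_url(query: str) -> str:
    """Encode a query like the direct-search links do (space -> '+', ':' -> %3A)."""
    return f"{SEARCH_ENDPOINT}?{urlencode({'fulltext': '1', 'search': query})}"
```

`build_search_url("museum")` reproduces the plain comparison link exactly, and `build_search_url(geo_query("museum", "San Francisco", boost=True))` ends in `search=museum+boost-neartitle%3A%22San+Francisco%22`. The example query is illustrative; the full query behind the truncated boost link is not shown in the thread.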

Research and conclusions for Wikipedia.org survey
by Deborah Tankersley 14 Jul '16

Hi,

We recently wrapped up survey participant sessions centered on the Wikipedia.org portal: how participants got there, what they did when they arrived, whether they knew that the footer (with sister project descriptive text) exists, the proposed new dropdown of languages by article count, and search satisfaction.

We were able to chat with these participants because they took one of the surveys we ran on the portal page (in May <https://commons.wikimedia.org/wiki/File:Wikipedia_Portal_Survey_-_May_2016.…> and June <https://phabricator.wikimedia.org/T136874#2418095> 2016), which were run to try to determine why and how users arrived at Wikipedia.org.

Daisy Chen conducted the sessions and sifted and compiled the data; the write-up is here <https://commons.wikimedia.org/wiki/File:Discovery_-_Wikipedia.org_Portal_St…>. Daisy will also present the findings and conclusions at July's Research Showcase <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase> next week.

This research is also instrumental to the ongoing discussion about the new page layout here <https://www.mediawiki.org/wiki/Wikipedia.org_updated_page_layout>.

If you have any questions about the research, the Wikipedia portal, or the search work, please feel free to ping Daisy or me.

Cheers,
Deb

--
Deb Tankersley
Product Manager, Discovery
IRC: debt
Wikimedia Foundation

Categorizing phab tickets
by David Causse 14 Jul '16

Hi,

Maybe this is not the best place to talk about this, but... I'd like to categorize some phab tasks so that I can access them quickly in the future. At first I thought that tags would be a perfect fit, by creating my own custom tags. But as far as I understand, tags are projects and I'm not allowed to create them. I suppose that if this feature is protected behind permissions, it is because phab admins do not want people to pollute the system with random tags.

My use case: sometimes users report queries that do not perform very well. Usually, by reading the query, I can identify and classify the cause. The cause can be something like:
- bad weighting of words in the title
- a text analysis issue
- index/db discrepancies
- ...

This list is quite vague...

While it's not worth fixing a particular issue that mentions a specific query, it's sometimes helpful to retrieve such tickets (where I sometimes added a comment) while I'm working on that class of problems:
- just to have more examples to test
- maybe I was wrong about the initial classification and the problem is elsewhere

Retrieving such tickets is painful today because I have to rely on search (not to blame the phab developers; search is hard, as we all know :) ).

Today I used parent/child relationships, e.g. https://phabricator.wikimedia.org/T128073, but I don't think it's the proper approach, because when I classify tickets I don't necessarily have a parent task ready.

Thanks for your suggestions.

Terminology for search engine sidebars of Wikipedia content
by Pine W 13 Jul '16

Hi Discovery,

Is there a particular term for search engine sidebars of Wikipedia content? For example, do we call them "search engine previews" or "Wikipedia sidebars on search pages"? I imagine that Google and Microsoft have certain terminology, and I'd like to be consistent when I'm referring to them in the LearnWiki videos, provided that the term is something that the average user would understand.

Thanks,
Pine

TextCat and Confidence
by Trey Jones 12 Jul '16

Hey everyone,

Mikhail has written up and should soon release his report on our recent TextCat A/B tests; the results look good, and language identification and cross-wiki searching definitely improve the results (in terms of results shown and results clicked) for otherwise poorly performing queries (those that get fewer than 3 results).

Mikhail's report also suggests looking at some measure of confidence for the language identification to see if that has any effect on the quality (in terms of number of results, but more importantly clicks) of the crosswiki (also "interwiki") results. This sounds like a good idea, but TextCat doesn't make it super easy to do. I have some ideas, though, and I would love suggestions from anyone else who has ideas.

The details are kind of technical, so if that kind of thing makes your eyes glaze over, you should avert your gaze now. Otherwise, check out my write-up on TextCat and confidence <https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/TextCat_and_Confiden…> and share your ideas here, or on the talk page.

Thanks!
—Trey

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
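
As a rough illustration of the problem, here is a much-simplified Python take on TextCat-style identification (ranked character n-gram profiles compared with the "out-of-place" distance, where lower is better), plus one possible confidence heuristic: the relative gap between the best and second-best scores. This is not Wikimedia's PHP TextCat library or its defaults (`max_n=3` and `top_k=300` are arbitrary here), and the gap heuristic is only one option, not necessarily what the write-up proposes. It needs at least two language models.

```python
from __future__ import annotations

from collections import Counter


def ngram_profile(text: str, max_n: int = 3, top_k: int = 300) -> list[str]:
    """Rank character n-grams (1..max_n) of each word, most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # underscores mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Highest count first; ties broken alphabetically so the ranking is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc: list[str], model: list[str]) -> int:
    """Sum of rank differences; n-grams missing from the model get the maximum penalty."""
    model_rank = {gram: rank for rank, gram in enumerate(model)}
    penalty = len(model)
    return sum(
        abs(rank - model_rank[gram]) if gram in model_rank else penalty
        for rank, gram in enumerate(doc)
    )


def classify_with_confidence(text: str, models: dict[str, list[str]]) -> tuple[str, float]:
    """Return the best-scoring language and a 0..1 confidence from the top-two gap."""
    doc = ngram_profile(text)
    scores = sorted((out_of_place(doc, profile), lang) for lang, profile in models.items())
    (best, lang), (runner_up, _) = scores[0], scores[1]
    # 0.0 means the top two candidates tie; 1.0 means a perfect match beat a non-perfect one.
    confidence = (runner_up - best) / runner_up if runner_up else 0.0
    return lang, confidence
```

A low confidence value could then be used to skip the cross-wiki search, or to require more evidence before showing interwiki results; whether that helps is exactly the open question in the A/B analysis.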

Wikipedia.org Portal: new page design
by Deborah Tankersley 12 Jul '16

Hello,

The Wikimedia Foundation Discovery Portal team [1] recently completed tests to see whether a new Wikipedia.org [3] page design, which adds a dropdown containing the full list of languages by article count, would be easier to navigate. The test results [4] found that, with the new design, visitors were more likely to click through to a link and there was less 'non-action' on the page. We also tested the page with many Wikipedia users, who commented that the new page design was aesthetically pleasing and less cluttered.

Based on this information, we would like to promote this design to production on the Wikipedia.org portal, but first we want to ask the community for feedback and/or suggestions [5].

Cheers from the Discovery Portal team!

[1] https://www.mediawiki.org/wiki/Wikipedia.org_Portal
[2] https://phabricator.wikimedia.org/T131526
[3] https://www.wikipedia.org
[4] https://commons.wikimedia.org/wiki/File:Wikipedia_Portal_AB_Test_of_Collaps…
[5] https://www.mediawiki.org/wiki/Wikipedia.org_updated_page_layout

--
Deb Tankersley
Product Manager, Discovery
IRC: debt
Wikimedia Foundation

Weekly elasticsearch slowdown
by Guillaume Lederrey 12 Jul '16

Hello!

While looking at the elasticsearch dashboard on Grafana [1], I see that we have weekly spikes in response times from codfw. My guess is that this is related to the weekly update of page rank.

More details:

We see fairly large spikes in the overall 95th percentile for codfw (from a usual ~300 ms to ~1-1.5 s). Those spikes are more visible on codfw than on eqiad because codfw gets less traffic overall than eqiad, which makes indexing more visible relative to reads. So far, no problem: the graphs look bad, but this can be explained and does not show user impact.

We also see weekly spikes in the 75th percentile of more-like queries (from a usual ~200-300 ms to 300-400 ms). More-like queries are the only queries sent to codfw. This is not yet worrisome, but it is probably something we should keep an eye on and improve before it becomes an issue.

I have almost no idea how those page rank updates work. Would it be possible to throttle the index updates from those jobs? Or to increase the frequency of those updates, so that each one has less impact? Ideas welcome...

Guillaume

[1] https://grafana-admin.wikimedia.org/dashboard/db/elasticsearch-percentiles

--
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
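
On the throttling question: one generic way to cap the indexing load a bulk job puts on a cluster is a token bucket in front of its batch submissions. The Python sketch below is hypothetical: it does not reflect how the page rank update jobs are actually implemented, and the rate, capacity, and batch size are placeholders. The `clock` and `sleep` parameters are injectable so the behaviour can be checked without real waiting.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


class TokenBucket:
    """Permit roughly `rate` acquisitions per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so the first batch goes out immediately
        self.clock = clock
        self.last = clock()

    def try_acquire(self, n: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


def throttled_batches(docs: Iterable[T], batch_size: int, bucket: TokenBucket,
                      sleep: Callable[[float], None] = time.sleep) -> Iterator[List[T]]:
    """Group docs into batches and release each batch only when a token is available."""
    def wait_for_token() -> None:
        while not bucket.try_acquire():
            sleep(1.0 / bucket.rate)

    batch: List[T] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            wait_for_token()
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        wait_for_token()
        yield batch
```

A job would then loop over `throttled_batches(updates, 500, TokenBucket(rate=2.0, capacity=2.0))` and send each batch to the cluster (the sending call is left out because it depends on the job). Spreading the same number of updates over a longer window trades a longer job for a lower peak, which is the same trade-off as running smaller updates more often.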

Discovery Weekly Update for the week starting 2016-07-04
by Chris Koerner 08 Jul '16

Howdy,

Here is this week's update from the Discovery department.

* A new quarter has begun. Discovery's goals for this new quarter have been posted here: Discovery 2016-17 Q1 Goals.[1] The results of our goals for the quarter just ended can be seen here: Discovery 2015-16 Q4 Goals.[2]
* Search: sent out a request for comment on the handling of question marks at the end of a query via email and village pumps [3]
* Portal: sent out a request for comment on the new page layout design via email (and soon to be released to village pumps) [4]
* Geo bounded search keyword will roll forward to production next week. No map integration yet.

[1] https://www.mediawiki.org/wiki/Wikimedia_Engineering/2016-17_Q1_Goals#Disco…
[2] https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q4_Goals#Disco…
[3] https://meta.wikimedia.org/wiki/Discovery/Handling_question_marks_in_search…
[4] https://www.mediawiki.org/wiki/Wikipedia.org_updated_page_layout

----
The full update, and archive of past updates, can be found on MediaWiki.org: https://www.mediawiki.org/wiki/Discovery/Status_updates

--
Yours,
Chris Koerner
Community Liaison - Discovery
Wikimedia Foundation

Search: How should question marks be handled?
by Deborah Tankersley 07 Jul '16

Hello,

The Wikimedia Foundation Discovery Search team <https://www.mediawiki.org/wiki/Wikimedia_Discovery/Search> recently discovered <https://commons.wikimedia.org/wiki/File:From_Zero_to_Hero_-_Anticipating_Ze…> that search queries ending with a question mark (e.g. "*how old is Tom Cruise?*") can sometimes return zero (or unusable) results. The zero results rate is one of the primary ways the Search team gauges how satisfied our users are with their query results <http://discovery.wmflabs.org/metrics/#kpi_zero_results>.

In order to improve the results for queries containing a question mark, we'd like to change the behavior of search on the backend. However, we would love to have feedback from the community to make sure that this is a sensible change.

If you are interested in how search works, or see this change as a possible disruption to your work, please learn more about this potential change <https://meta.wikimedia.org/wiki/Discovery/Handling_question_marks_in_search…> and let us know your thoughts.

Cheers from the Discovery Search team!

--
Deb Tankersley
Product Manager, Discovery
IRC: debt
Wikimedia Foundation
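
For readers wondering what a backend change like this might look like: one possible approach, assuming the problem is that the search backend treats `?` as a single-character wildcard (so `Cruise?` would only match a word with exactly one more character after "Cruise"), is to drop unescaped question marks at the end of the query before it is searched. This Python sketch is illustrative only; the actual proposal is described on the linked Meta page and may differ. The function name is hypothetical.

```python
import re

# An unescaped run of question marks, optionally followed by whitespace, at the end.
TRAILING_QUESTION_MARKS = re.compile(r"(?<!\\)\?+\s*$")


def strip_trailing_question_marks(query: str) -> str:
    """Remove trailing '?' characters that are not escaped with a backslash."""
    return TRAILING_QUESTION_MARKS.sub("", query).rstrip()
```

With this, `how old is Tom Cruise?` becomes `how old is Tom Cruise`, while a `?` in the middle of a query (`who?s there`) or an escaped trailing `\?` is left for the backend to interpret as before.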
