This thread started off list, but I'm hoping all of you watching along can
help us brainstorm and improve search satisfaction. Note that these aren't
all my thoughts; they are a conglomeration of thoughts (many copy/pasted
from off-list emails) from Trey, David, Mikhail, and me. That's also why
this might not all read like one person wrote it.
A few weeks ago I attended ElasticON, and there was a good presentation
about search satisfaction by Paul Nelson. One of the things he thought was
incredibly important, and that we had already been thinking about but hadn't
moved forward enough on, was generating an Engine Score. This week Paul
held an online webinar, which Trey attended, where he gave the same
presentation without such strict time constraints. You can find my summary
of the presentation in last week's email to this list, 'ElasticON notes'.
Some things of note:
- He doesn't like the idea of golden corpora, but his idea of one is
different from Trey's. He imagines a hand-selected set of "important"
queries that find "important" documents. I don't like that either (at least
not by itself); I always imagine a random selection of queries for a golden
corpus.
- He lumps BM25 in with TF/IDF and calls them ancient, unmotivated relics
of the 80s and 90s. David's convinced us that BM25 is a good thing to
pursue. Of course, part of Search Technologies' purpose is to drum up
business, so they can't say, "hey, just use this in Elasticsearch" or
they'd be out of business.
- He explains the mysterious K factor that got all this started in the
first place. It controls how much weight is given to changes far down the
results list. It sounds like he might tune K based on the number of results
for every query, but my question about that wasn't answered. In the demo,
he's only pulling 25 results, which Erik's click-through data shows is
probably enough.
- He mentions that 25,000 "clicks" is a large enough set for measuring a
score (and having random noise come out in the wash). It's not clear
whether he meant 25K clicks or 25K user sessions, since it came up in the
Q&A.
David and Trey talked about this some, and Trey thinks the idea of Paul's
metric (Σ FACTOR^position * isRelevant[user,
searchResult[Q, position].DocID]) has a lot of appeal. It's based on clicks
and user sessions, so we'd have to be able to capture all the relevant
information and make it available somewhere to replay in Relevance Forge
for assessment. We currently have a reasonable amount of clickthrough data,
collected from 0.5% of desktop search sessions, that we can use for this
task. There are some complications, though, because this is PII data and so
has to be treated carefully.
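To make the shape of that metric concrete, here is a minimal sketch (in
Python) of how an engine score could be computed from replayed clickthrough
data. The data shapes, the normalization to a 0-1 range, and the default
factor of 0.85 are assumptions for illustration, not anything from Paul's
talk or from our actual schema.

def engine_score(sessions, results_for_query, factor=0.85, max_rank=25):
    # sessions: iterable of (query, clicked_doc_ids) pairs taken from
    # replayed clickthrough logs.
    # results_for_query: callable returning the current engine's ranked
    # doc ids for a query.
    # factor is Nelson's K-like constant controlling how quickly weight
    # decays down the results list.
    total, best_possible = 0.0, 0.0
    for query, clicked_doc_ids in sessions:
        ranked = results_for_query(query)[:max_rank]
        for position, doc_id in enumerate(ranked):
            weight = factor ** position
            best_possible += weight          # upper bound: every slot relevant
            if doc_id in clicked_doc_ids:    # a click stands in for isRelevant
                total += weight
    return total / best_possible if best_possible else 0.0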
Mikhail's goal for our user satisfaction metric is to have a function that
maps features, including dwell time, to a probability of user satisfaction
(e.g., 10s = 20% likely to be satisfied, 10m = 94% likely to be satisfied,
etc.). The predictive model is going to include a variety of features of
varying predictive power, such as dwell time, clickthrough rate, engagement
(scrolling), etc. One problem with the user satisfaction metric is that it
isn't replayable: we can't re-run the queries in vitro and get data on what
users think of the new results. However, it does play into Nelson's idea,
discussed in the paper and maybe in the video, of gradable relevance.
Assigning a user satisfaction score to a given result would allow us to
weight various clicks in his metric rather than treating them all as equal
(though that works, too, if it's all you have).
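As a rough illustration of the kind of model that could produce those
percentages, here is a hedged sketch using logistic regression over a few
plausible features. The feature list and the tiny labelled dataset are
invented for illustration; the real features and labels (e.g. from surveys
or hand grading) would come from the search satisfaction schema.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [dwell_time_seconds, clicked_a_result (0/1), scroll_depth_fraction]
X = np.array([
    [10,  1, 0.1],
    [600, 1, 0.9],
    [5,   0, 0.0],
    [240, 1, 0.6],
])
# 1 = session judged satisfied, 0 = not satisfied (made-up labels).
y = np.array([0, 1, 0, 1])

model = LogisticRegression().fit(X, y)

# Probability of satisfaction for a 10-second dwell that included a click:
print(model.predict_proba([[10, 1, 0.1]])[0][1])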
We need to build a system that we are able to tune in an effective way.
As Trey pointed out, Cirrus does not allow us to tune the core similarity
function's params. David tends to think that we need to replace our core
similarity function with a new one that is suited for optimization; BM25
allows this, there are certainly others, and we could build our own. But
the problem will be:
How do we tune these parameters in an effective way? With BM25 we will have
7 fields with 2 analyzers each, i.e. 14 internal Lucene fields, and BM25
lets us tune 3 params per field: weight, k1, and b.
- weight is likely to range between 0 and 1, maybe in steps of 2-digit
precision
- k1 from 1 to 2
- b from 0 to 1
And I'm not talking about the query-independent factors like popularity,
PageRank & co that we may want to add. It's clear that we will have to
tackle hard search performance problems...
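As a back-of-the-envelope sketch of why naive tuning won't scale, here is
the rough size of that search space under some assumed step sizes (the 0.1
steps for k1 and b are assumptions; only the 2-digit precision for weight
is mentioned above).

# Rough size of the BM25 parameter space sketched above.
weights = [round(w * 0.01, 2) for w in range(101)]    # 0.00 .. 1.00
k1s = [round(1 + k * 0.1, 1) for k in range(11)]      # 1.0 .. 2.0
bs = [round(b * 0.1, 1) for b in range(11)]           # 0.0 .. 1.0

per_field = len(weights) * len(k1s) * len(bs)  # 12221 combinations per field
num_fields = 14                                # 7 fields x 2 analyzers
print(per_field)
print(per_field ** num_fields)                 # exhaustive grid over all fields

Even for a single field the grid is large; across 14 fields an exhaustive
grid is out of the question, which is what motivates the optimization
approach below.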
David tends to think that we need to apply an optimization algorithm that
will search for the optimal combination according to an objective. David
doesn't think we can run such an optimization plan with A/B testing, which
is why we need a way to replay a set of queries and compute various search
engine scores.
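Here is a minimal sketch of what such an offline optimization loop might
look like, with simple random search standing in for whatever algorithm we
actually choose. run_queries_with_params and engine_score are assumed hooks
into Relevance Forge, and the field list is only illustrative; none of this
is existing code.

import random

FIELDS = ["title", "redirect", "heading", "opening_text",
          "text", "auxiliary_text", "category"]  # illustrative, not final

def random_params():
    # One (weight, k1, b) triple per field, drawn from the ranges above.
    return {f: {"weight": round(random.uniform(0, 1), 2),
                "k1": round(random.uniform(1, 2), 2),
                "b": round(random.uniform(0, 1), 2)} for f in FIELDS}

def optimize(queries, run_queries_with_params, engine_score, iterations=200):
    # Replay the same fixed query set for every candidate; no A/B test needed.
    best_params, best_score = None, float("-inf")
    for _ in range(iterations):
        params = random_params()
        results = run_queries_with_params(queries, params)
        score = engine_score(results)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score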
We don't know what the best approach is here:
- extract the metrics from the search satisfaction schema that do not
require user intervention (click and result position).
- build our own set of queries with the tool Erik is building (temporary
location: http://portal.wmflabs.org/search/index.php)
-- Erik thinks we should do both, as they will give us completely different
sets of information. The metrics about what our users are actually doing
are a great source of information and provide a good signal. The tool Erik
is building comes at the problem from a different direction, sourcing
search results from wiki/google/bing/ddg and getting humans to rate which
results are relevant/not relevant on a scale of 1 to 4. This can be used
with other algorithms to generate an independent score (see the sketch
after this list). Essentially, I think the best outcome is for Relevance
Forge to output a multi-dimensional engine score and not just a single
number.
-- We should keep records of how this engine score changes over days,
months, and longer, so we can see a rate of improvement (or lack thereof,
but hopefully improvement :)
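To show how the 1-4 human ratings could feed an independent, graded-relevance
score, here is a small sketch of DCG/nDCG, a standard metric for graded
judgments. The input shape is an assumption for illustration.

import math

def dcg(ratings):
    # ratings: human grades (1-4) in the order the engine returned the results.
    return sum(r / math.log2(pos + 2) for pos, r in enumerate(ratings))

def ndcg(ratings):
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal else 0.0

# e.g. an engine that puts a grade-1 result first and the grade-4 result third:
print(ndcg([1, 3, 4, 2]))  # below 1.0; a perfect ordering would score exactly 1.0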
And in the end, will this (BM25 and/or searching with weights per field)
work?
- Not sure; maybe the text features we have today are not relevant and we
need to spend more time on extracting relevant text features from the
MediaWiki content model (https://phabricator.wikimedia.org/T128076), but we
should at least be able to say: this field has no impact, or only a bad
one.
The big picture would be:
- Refactor Cirrus so that everything is suited for optimization
- Search engine score: the objective (Erik added it as a goal)
- An optimization algorithm to search/tune the system params. Trey has
prior experience working within optimization frameworks. Mikhail also has
relevant machine learning experience.
- A/B testing with advanced metrics to confirm that the optimization found
a good combination
With a framework like that we could spend more time on big-impact text
features (wikitext, synonyms, spelling correction, ...).
But yes, it's a huge engineering task with a lot of challenges :/ It's also
Hello!
After discussion with Stas, we want to have a regular deployment
window for Wikidata Query Service. This should help give better
visibility on when new versions arrive and help track issues with
those new versions. I will take care of the deployments (with Stas'
support, of course).
support, of course).
The deployment window is: every Monday, from 7pm CET (10am PST - 5pm
UTC) starting from Monday April 11th.
Let me know if you have any questions or if you know of another place
where I should publicize this deployment window.
Take care,
Guillaume
--
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
Thanks for the list! It is a long list... I'm sure there are a few
interesting tasks for our purpose in there. It will take some time
reading them...
On Tue, Mar 22, 2016 at 4:21 PM, Filippo Giunchedi
<fgiunchedi(a)wikimedia.org> wrote:
> Hi Guillaume,
> I think it is a nice idea! For #operations specifically I don't think we
> have an explicit tag for this kind of tasks.
>
> Though a list of Low/Lowest #operations tasks sorted by date updated could
> help/guide the selection,
> https://phabricator.wikimedia.org/maniphest/query/MEKdyfITov4i/#R
>
> HTH,
> filippo
>
>
> On Tue, Mar 22, 2016 at 1:21 PM, Guillaume Lederrey
> <glederrey(a)wikimedia.org> wrote:
>>
>> Hello!
>>
>> I can't make it to the Wikimedia Hackathon [1]. I like the concept and
>> I have a few friends and former coworkers who would be interested in
>> sending some contributions to Wikipedia. So we decided to spend the
>> day together on Saturday April 9th and see if we can find tasks that
>> need some love. I have a small list, but if you have any additional
>> ideas for something easy to work on, send them my way. The goal is as
>> much to expose my friends to what we do as to contribute meaningful
>> improvements.
>>
>> I'm looking for tasks that are mainly Ops oriented (that's the
>> background for most of us, even if some have experience in PHP,
>> Python, and a few other areas). Tasks that are simple enough to be done
>> in a day. Tasks that do not require understanding the complete WMF
>> architecture to get started.
>>
>> Let me know...
>>
>> Guillaume
>>
>>
>> [1] https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2016
>>
>>
>> --
>> Guillaume Lederrey
>> Operations Engineer, Discovery
>> Wikimedia Foundation
>>
>> _______________________________________________
>> Ops mailing list
>> Ops(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/ops
>
>
--
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
Hello!
I can't make it to the Wikimedia Hackathon [1]. I like the concept and
I have a few friends and former coworkers who would be interested in
sending some contributions to Wikipedia. So we decided to spend the
day together on Saturday April 9th and see if we can find tasks that
need some love. I have a small list, but if you have any additional
ideas for something easy to work on, send them my way. The goal is as
much to expose my friends to what we do as to contribute meaningful
improvements.
I'm looking for tasks that are mainly Ops oriented (that's the
background for most of us, even if some have experience in PHP,
Python, and a few other areas). Tasks that are simple enough to be done
in a day. Tasks that do not require understanding the complete WMF
architecture to get started.
Let me know...
Guillaume
[1] https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2016
--
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
Just wondering: why do you want a regular deployment schedule? I'm all
for it, and I can see a few reasons to support this decision (my main
one being that if you start delegating deployments, a regular schedule
makes things easier).
In general, I'd prefer to have this deployment window early enough in
my day so that I have time to fix things if I screw up. Ideally, 10-11am
UTC on Monday, Tuesday, or Thursday. It might make sense for you to be
available (at least for the first few deployments), so it might be
better to have it later (3-4pm UTC?). For the afternoon, any day could
work...
On Fri, Mar 18, 2016 at 7:05 PM, Stas Malyshev <smalyshev(a)wikimedia.org> wrote:
> Hi!
>
> I thought a bit more about WDQS deployments and I think it's time to
> have a regular deployment schedule, at least while we're still in
> development. So I thought about having a weekly deployment point where most
> of the non-urgent things will be deployed. We could of course deploy
> urgent stuff anytime, but I think having a scheduled point would also be
> good.
>
> Since I'd like you to do this in most cases, I wonder what day/time
> would work best for you?
>
> Thanks,
> --
> Stas Malyshev
> smalyshev(a)wikimedia.org
--
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
Hello,
Here is the Discovery department's weekly status update.
* The completion suggester left beta and is now the default
search-as-you-type for all wikis (except Wikidata).
**
http://blog.wikimedia.org/2016/03/17/completion-suggester-find-what-you-nee…
* Last week we enabled the Kartographer extension for Wikivoyage sites,
allowing users to add maps to wiki pages without any additional WMF Labs
or JavaScript tricks.
** A demo of Kartographer and VisualEditor integration can be found here:
http://vem3.wmflabs.org/wiki/Main_Page
This is our second week summarizing our work in this way and our first week
sharing it with wikitech-l. Feedback and suggestions are welcome.
Read the full update at the following link.
https://www.mediawiki.org/wiki/Discovery/Status_updates/2016_03_18
--
Yours,
Chris Koerner
Community Liaison - Discovery
Wikimedia Foundation
Hello everybody,
in late 2015 I was asking for tech info to inform a project about knowledge
discovery I was implementing.
I am now happy to have published it, and I would be happy to hear comments
on how to improve its utility.
http://nifty.works is a side project I built to help with brainstorming and
research of factual knowledge.
It uses a data model I co-authored during my first entrepreneurial
experience in Italy, and that I also experimented with in a mobile app.
Here, I wanted to take things further and make something publicly
available over the web!
This project is about experimental interaction for traversing knowledge graphs.
I implemented these functionalities:
- query the context
<http://nifty.works/about/PAyD8rYKqxEM47Vk/machine-learning#topic-article>
of a topic
- a test for suggesting logical paths <http://nifty.works/path/> between
topics (pathway discovery)
- a gamified <http://nifty.works/play/> version of knowledge networks, as a
test to engage people in learning (it's still a nerdy prototype, I know)
- edit maps (selectively add or remove nodes; available if signed up)
- share maps
- pair Wikipedia articles with entities in the knowledge graph
(hyperlinks in an article will trigger the corresponding topics)
For full-text search, I am currently using very basic indexing in
Elasticsearch; you can compare it with Wikipedia's indexing by adding the
test=ab_fts parameter (I don't know which is better).
The nice thing is that I could localise the site for Wikipedias not in
English, and uncover cultural knowledge strongly related to the language
they are written in.
I think this may be interesting because there are languages spoken by
millions of people but poorly addressed by recommender systems (to my
knowledge at least). Think of Swedish, Thai, Chinese, Dutch, Spanish... to
name a few!
I would like to hear your comments to improve it and help make it a useful
service - comments on UX/UI, tech improvements, communication, and
functionality, as well as opportunities to sustain it (crowd-funding?), are
most welcome!
Thank you for your time and for your help in answering a lot of questions
on the forum!
I hope to give back a bit here.
Cheers,
Luigi
Hello!
I've added T130353 <https://phabricator.wikimedia.org/T130353> to the
search sprint. It's not quite an "Unbreak now!" priority task, but it's
close. It seems that our fix for cross-namespace redirects has created a
bit of a regression, which David noticed: the redirects are scored too
highly. This is a big problem for those queries, but the set of queries it
actually affects is very, very small. We should get this fixed ASAP, so I
marked it as high priority and added it to the top of the sprint.
Thanks!
Dan
--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation