Discovery September 2015

discovery@lists.wikimedia.org

14 participants
26 discussions

[Wikimedia-search] Update on what's next for tackling the zero results rate goal
by Dan Garry 15 Sep '15

15 Sep '15

We've had a lot of ideas floating around over the past week or two about what to do in the final weeks of the quarter towards tackling the zero results rate problem. This morning the engineering team had a 25 minute meeting to coalesce these ideas into a plan and sync up. We took notes in this etherpad: https://etherpad.wikimedia.org/p/nextupforsearch

The short summary of the meeting is that we will run a test which relaxes the AND operator for common terms in queries. This should improve natural language queries by reducing how important words like "the", "a", etc. are to the query, thus focussing in on the essence of the query. This also means that pages that don't contain these common terms, but only contain the core terms, could now be returned in results.

This work is tracked in the following series of tasks, the structure of which should now be very familiar to you all:

- T112178 <https://phabricator.wikimedia.org/T112178>: Relax 'AND' operator with the common term query
- T112581 <https://phabricator.wikimedia.org/T112581>: Run A/B test on relaxing AND operator for search (test starting on 2015-09-22)
- T112582 <https://phabricator.wikimedia.org/T112582>: Validate data for AND operator A/B test (on or after 2015-09-23)
- T112583 <https://phabricator.wikimedia.org/T112583>: Analyse results of AND operator A/B test (on or after 2015-09-29)

What this does mean is that we've probably got a bunch of tests lined up to start at the same time. In principle this isn't a problem, but if the tests overlap it can cause difficulties. This will be discussed in tomorrow's analysis meeting.

As always, if there are any questions, let me know!

Thanks,
Dan

--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation
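For readers unfamiliar with the query type named in T112178, here is a rough sketch of what a request body for Elasticsearch's common terms query can look like. It is an illustration only, not the patch that was tested: the field name `text`, the cutoff value, and the helper name `common_terms_query` are assumptions made for the example.

```python
import json


def common_terms_query(field, text, cutoff_frequency=0.001):
    """Build a request body for Elasticsearch's common terms query.

    Terms more frequent than the cutoff are treated as "common" and become
    optional; the remaining rarer terms must all match.
    The field name and cutoff passed in are hypothetical example values.
    """
    return {
        "query": {
            "common": {
                field: {
                    "query": text,
                    "cutoff_frequency": cutoff_frequency,
                    "low_freq_operator": "and",  # rare terms: all required
                    "high_freq_operator": "or",  # common terms: optional
                }
            }
        }
    }


body = common_terms_query("text", "what is the capital of france")
print(json.dumps(body, indent=2))
```

With a body like this, words such as "the" and "of" would no longer have to appear in a page for it to match, while "capital" and "france" would still be required, assuming they fall below the cutoff in the index being searched.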

1 0

[Wikimedia-search] Maps and KPI's
by Kevin Smith 15 Sep '15

15 Sep '15

Notes from this afternoon's Maps and KPI's meeting have been posted: https://www.mediawiki.org/wiki/Discovery/Maps_and_KPIs_2015-09-14 Those who attended can feel free to correct anything I got wrong. Kevin Smith Agile Coach, Wikimedia Foundation

1 0

[Wikimedia-search] Some Results of Cross-Language Wiki Searching
by Trey Jones 11 Sep '15

11 Sep '15

Hi Everyone,

I've done further analysis on the ~1400 zero-results non-DOI query corpus, looking at the effects of perfect (or at least human-level) language detection, and the effects of running all queries against many wikis.

In summary:

> More than 85% of failed queries to enwiki are in English, or are not in a
> particular language. Only about 35% of non-English queries in some language
> (<4.5% of zero-results queries), if funneled to the right language wiki,
> get any results.

> The types of queries most likely to get results from the non-enwikis are
> names and queries in English. There are lots of English words in
> non-English wikis (enough that they can do decent spelling correction!),
> and the idiosyncrasies of language processing on other wikis allow certain
> classes of typos in names and English words to match, or the typos happen
> to exist uncorrected in the non-enwiki.

> Perhaps a better approach to handling non-English queries is user-specified
> alternate languages.

More details: https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Cross_Language_Wiki_…

- Trey

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation

1 0

[Wikimedia-search] Congratulations WDQS team
by Mikhail Popov 11 Sep '15

11 Sep '15

Yinz are popular now! Cheers~ -- *Mikhail Popov* // Data Analyst, Discovery <https://www.mediawiki.org/wiki/Wikimedia_Discovery> https://wikimediafoundation.org/ *Imagine a world in which every single human being can freely share in the **sum of all knowledge. That's our commitment.* Donate <https://donate.wikimedia.org/>.

4 3

[Wikimedia-search] Smoothing in dashboard(s)
by Mikhail Popov 10 Sep '15

10 Sep '15

Hi all, You can now apply various smoothers to the data on the Search Metrics dashboard (e.g. http://searchdata.wmflabs.org/metrics/#failure_rate). Smoothing can be applied globally or on a per-plot basis. Hopefully yinz will find this new feature useful! Cheers~ -- *Mikhail Popov* // Data Analyst, Discovery <https://www.mediawiki.org/wiki/Wikimedia_Discovery> https://wikimediafoundation.org/ *Imagine a world in which every single human being can freely share in the **sum of all knowledge. That's our commitment.* Donate <https://donate.wikimedia.org/>.
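As a rough illustration of what a smoother does to a daily metric, here is a minimal trailing moving average. This is a sketch only: the message does not say which smoothers the dashboard offers or how they are implemented, and the function name and sample numbers below are made up for the example.

```python
def moving_average(values, window=7):
    """Trailing moving average.

    Each output point is the mean of the current value and up to
    `window - 1` preceding values; early points average over what exists.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just slid out of the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Made-up daily percentages, not real dashboard data.
daily_rate = [30.0, 32.0, 28.0, 40.0, 31.0, 29.0, 33.0]
print(moving_average(daily_rate, window=3))
```

A wider window flattens day-to-day noise more aggressively, at the cost of reacting more slowly to a real change in the underlying rate.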

3 5

[Wikimedia-search] Temporary dashboard data outage
by Oliver Keyes 10 Sep '15

10 Sep '15

Hey all,

We currently have a data outage on our dashboards - they display, but we're missing the last few days.

The good news is that we know exactly what happened here; as part of our work to (amusingly enough) make the data pipeline here more robust and standardised, we switched all of our data retrieval scripts over to a new project and repository (previously they'd lived in the repo for the dashboard they referred to, which doesn't scale). A bug in the shell script that tied them all together meant none of them ran - and of course we switched everything over immediately before a long weekend. Doh ;p.

The original bug has a patchset awaiting review, and as soon as it's +2'd we're going to begin backfilling the datasets. You can follow our progress on that at https://phabricator.wikimedia.org/T111749

Thanks,

--
Oliver Keyes
Count Logula
Wikimedia Foundation

1 1

[Wikimedia-search] Analysis of ElasticSearch language detection plugin against enwiki zero-results queries
by Trey Jones 08 Sep '15

08 Sep '15

I've written up my analysis of the ElasticSearch language detection plugin that Erik recently enabled: https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Language_Detection_E… The short version is that it really likes Romanian (and Italian, and has a bit of a thing for French), and precision on English is great, but recall is poor (probably because of all the typos and other crap that go to enwiki that is still technically "English"). Chinese and Arabic are good. I think we could do better, and we should evaluate (a) other language detectors and (b) the effect of a good language detector on zero results rate (i.e., simulate sending queries to the right place and see how much of a difference it makes). Moderately pretty pictures included. —Trey Trey Jones Software Engineer, Discovery Wikimedia Foundation
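For anyone who wants the precision and recall terms above made concrete, here is a small sketch of how they can be computed per language from hand-labelled queries. The sample pairs are invented for illustration and are not taken from the analysis; the function name is likewise hypothetical.

```python
from collections import Counter


def precision_recall(pairs, language):
    """Compute (precision, recall) for one language.

    `pairs` is an iterable of (true_language, detected_language) tuples.
    Returns None for a value whose denominator would be zero.
    """
    counts = Counter()
    for truth, detected in pairs:
        if detected == language and truth == language:
            counts["tp"] += 1  # correctly detected
        elif detected == language:
            counts["fp"] += 1  # detected as this language, but is not
        elif truth == language:
            counts["fn"] += 1  # is this language, but was missed
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


# Invented labels: (true language, detected language).
sample = [
    ("en", "en"), ("en", "ro"), ("en", "it"),
    ("en", "en"), ("fr", "fr"), ("ro", "ro"),
]
print(precision_recall(sample, "en"))
print(precision_recall(sample, "ro"))
```

In this toy sample everything labelled English really is English (high precision), but half of the English queries are assigned to other languages (low recall), which is the shape of result described above.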

4 7

[Wikimedia-search] Fwd: Announcing the release of the Wikidata Query Service
by Dan Garry 08 Sep '15

08 Sep '15

Cross-posting from wikidata-l.

---------- Forwarded message ----------
From: Dan Garry <dgarry(a)wikimedia.org>
Date: 7 September 2015 at 15:29
Subject: Announcing the release of the Wikidata Query Service
To: wikidata-l(a)lists.wikimedia.org

The Discovery Department at the Wikimedia Foundation is pleased to announce the release of the Wikidata Query Service <https://www.mediawiki.org/wiki/Wikidata_query_service>! You can find the interface for the service at https://query.wikidata.org.

The Wikidata Query Service is designed to let users run queries on the data contained in Wikidata. The service uses SPARQL <https://en.wikipedia.org/wiki/SPARQL> as the query language. You can see some example queries in the user manual <https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual>.

Right now, the service is still in beta. This means that our goal <https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q2_Goals#Wikid…> is to monitor usage of the service and collect feedback about what people think should be next. To do that, we've created the Wikidata Query Service dashboard <https://searchdata.wmflabs.org/wdqs/> to track usage of the service, and we're in the process <https://phabricator.wikimedia.org/T111403> of setting up a feedback mechanism for users of the service. Once we've monitored the usage of the service for a while and got user feedback, we'll decide on what's next for development of the service.

If you have any feedback, suggestions, or comments, please do send an email to the Discovery Department's public mailing list, wikimedia-search(a)lists.wikimedia.org.

Thanks,
Dan

--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation

1 0

[Wikimedia-search] WDQS basic usage dashboard is live
by Mikhail Popov 05 Sep '15

05 Sep '15

Hi all, If you've been to http://searchdata.wmflabs.org/ recently, you would have noticed that we have a new dashboard (and a work-in-progress facelift). Introducing… The Wikidata Query Service dashboard: http://searchdata.wmflabs.org/wdqs/ ! Yay! Hopefully this will help the WDQS team as they continue their work on that awesome project. As with the Search Metrics dashboard <http://searchdata.wmflabs.org/metrics/>, we welcome constructive criticism and feature suggestions with an open mind. One suggestion that I'm going to look into is finding out how many people who visited the homepage ended up submitting a query. We also have failure stats, so those will be showing up in the near future. Thank you, Mikhail // Junior Swifty -- *Mikhail Popov* // Data Analyst, The Swifties, Discovery <https://www.mediawiki.org/wiki/Wikimedia_Discovery> https://wikimediafoundation.org/ *Imagine a world in which every single human being can freely share in the **sum of all knowledge. That's our commitment.* Donate <https://donate.wikimedia.org/>.

2 1

[Wikimedia-search] Discovery plans for Gerrit cleanup day
by Kevin Smith 05 Sep '15

05 Sep '15

A few of us met this morning, to ensure that we have a plan for everyone in the department to be productive on Gerrit Cleanup Day (Wednesday 2015-09-23). We think most folks are accounted for, and came up with ideas for others.

I added Gerrit Cleanup Day as an upcoming event on our wiki page[1], and created a page with the proposed plan[2] that came out of this morning's meeting.

Action items prior to the day (mostly listing them here for my own convenience):

- Erik will coordinate with the developers to help them be productive
- Kevin will ask Quim to try to get David paired up with someone in his timezone (maybe Trey also)
- Kevin will talk to Oliver, who can guide Mikhail
- Kevin will get a gerrit account, to be able to +1/-1
- Kevin will organize some kind of kickoff meeting the morning of the big day
- Kevin will check with Moiz
- Kevin will check with Wes to see what he is planning

[1] https://www.mediawiki.org/wiki/Wikimedia_Discovery#Upcoming_events
[2] https://www.mediawiki.org/wiki/Discovery_plans_for_gerrit_cleanup_day_2015

Kevin Smith
Agile Coach, Wikimedia Foundation

1 0
