Wikimedia-search June 2015

wikimedia-search@lists.wikimedia.org

20 participants
17 discussions

Predicting the load of EventLogging on Wikipedia Portal
by James Douglas 03 Jun '15


Does anyone know where I can find info about the current load on www.wikipedia.org? I'd like to set some reasonable sampling rates for the EventLogging instrumentation that's going to be in place soon. If the Portal gets a zillion hits per second, then 1/1000 of that is probably too high. On the other hand, if it gets five hits per day, then 1/1000 of that is rather too low.
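As a rough illustration of the sampling arithmetic above, here is a minimal sketch. The traffic figures used with it are placeholders, not measurements of www.wikipedia.org, and the function names and the idea of a daily event budget are assumptions made for the example, not part of EventLogging itself.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sampled_events_per_day(hits_per_second: float, sampling_rate: float) -> float:
    """Expected number of logged events per day at a given sampling rate."""
    if not 0 <= sampling_rate <= 1:
        raise ValueError("sampling_rate must be between 0 and 1")
    return hits_per_second * SECONDS_PER_DAY * sampling_rate


def rate_for_target(hits_per_second: float, target_events_per_day: float) -> float:
    """Sampling rate that yields roughly the target daily event volume.

    The result is capped at 1.0 (log everything) for low-traffic pages.
    """
    daily_hits = hits_per_second * SECONDS_PER_DAY
    if daily_hits <= 0:
        raise ValueError("hits_per_second must be positive")
    return min(1.0, target_events_per_day / daily_hits)


if __name__ == "__main__":
    # Hypothetical traffic levels, chosen only to show both extremes.
    for hits in (5 / SECONDS_PER_DAY, 10.0, 1000.0):
        rate = rate_for_target(hits, target_events_per_day=50_000)
        print(f"{hits:.5f} hits/s: sample at {rate:.6f}")
```

Working backwards from a daily event budget avoids guessing a fixed rate such as 1/1000 before the real traffic is known.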

2 2

New proposal for recurring meetings
by Kevin Smith 03 Jun '15


Based on input from the developers, Wes, Dan, and others, here is a new proposal for a new slate of Search & Discovery vertical recurring meetings[1].

These changes really can't take full effect until the first week of June, due to the Hackathon and related travel. But hopefully we can agree on them now, and start to get them scheduled. After trying the new scheme out for a couple of weeks, we'll have a retrospective so we can adjust as needed.

The days of the week proposed here are not set in stone, but they seemed logical. They came out of an attempt to stack meetings (as requested by Nik and others), combined with a sense that Monday and Friday are not ideal days for certain types of meetings, plus a recognition that finding huge blocks of time on our schedules may be challenging.

Basically, if you are a developer, you'll have twice-weekly 15-minute sub-team standups, plus the weekly 25-minute full-vertical checkin. Every 2-4 weeks, you might be involved in a retrospective and/or showcase. That's a total of about 1 hour of meetings per week.

Dan and I crave meetings[2], so we'll enjoy about 6.5 hours per week. The "leads" among you will have meeting loads somewhere in between those extremes.

Any concerns? Violent outbursts? Purrs of contentment?

[1] https://www.mediawiki.org/wiki/Search_and_Discovery/Process#Recurring_Meeti…
[2] Not really, but we'll pretend we do

Kevin Smith
Agile Coach
Wikimedia Foundation
*Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment. Help us make it a reality.*

5 4

wdqs beta updater performance data
by Stas Malyshev 02 Jun '15


Hi!

Just in case anybody finds it interesting/useful, I've made a quick performance analysis of the update service we're running on wdqs-beta, in order to catch up with the Wikidata updates: https://www.mediawiki.org/wiki/Wikibase/Indexing/Updater_performance_analys…

It may be useful for evaluating which hardware we may need if we want updates to coexist with production loads.

--
Stas Malyshev
smalyshev(a)wikimedia.org

2 3

finding related pages [was Re: Reading tech sessions at hackathon]
by S Page 02 Jun '15


Summary:
* CirrusSearch has "morelike:*PageName*", who knew?
* I sense a developer article brewing, "Finding related content"

AIUI, the Wikipedia mobile apps' "Read more" section just performs a full-text search (API [1]) for the current page title (Android source [2]).

Joaquin's nifty demo http://chimeces.com/webkipedia/ 's "Related pages" section calls the GettingStarted extension's gettingstartedgetpages API module [3] with gsgptaskname=morelike . This is implemented by GettingStarted/MoreLikePageSuggester.php... and it seems this just makes a search query for srsearch=morelike:Australia .

Who knew Cirrus search had a "morelike:" keyword? It's not in the enwiki search help, but it is in the Cirrus search help [4].

I'm not sure if there's any reason to interpose gettingstartedgetpages instead of querying search directly for morelike:*pagetitle*; it might cache stuff in Redis. The mobile apps might get better "Read more" suggestions using one of these.

There's also a srwhat=suggestion; I don't know if that helps in getting related pages.

I'll be updating https://www.mediawiki.org/wiki/API:Search_and_discovery with this, and it seems article-worthy.

Cheers, hope this helps someone.

[1] https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bsearch
[2] https://github.com/wikimedia/apps-android-wikipedia/blob/de0b8b579f5030f684…
[3] https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bgettingstart…
[4] https://www.mediawiki.org/wiki/Help:CirrusSearch#Special_prefixes

On Fri, May 29, 2015 at 2:18 AM, Joaquin Oltra Hernandez <jhernandez(a)wikimedia.org> wrote:

> Sorry forgot to link it: https://github.com/joakin/webkipedia
>
> Matt Flaschen told me about the gettingstarted 'morelike' mode for other
> purposes, but it fitted perfectly my purposes for this reading app. They
> developed it on the Growth team about a year ago, but the experiment wasn't
> successful so the API has been there dormant and unused for a lot of time
> (works pretty well!).
>
> About the content I'm fetching for the articles, I'm using the extracts
> with the exintro option, and embedding the html (
> https://github.com/joakin/webkipedia/blob/master/lib/api/article.js). The
> idea would be to have a 'Read more' that would show the full article I
> guess.
>
> The president's chest is pretty good too :D
> http://chimeces.com/webkipedia/#/wiki/Barack_Obama
>
> On Fri, May 29, 2015 at 4:35 AM, S Page <spage(a)wikimedia.org> wrote:
>
>> (Cc'ing James Douglas, who's also developing API playground code.)
>>
>> On Thu, May 28, 2015 at 2:52 AM, Joaquin Oltra Hernandez <
>> jhernandez(a)wikimedia.org> wrote:
>>
>>> S, for getting started quickly, I set up a JS web app completely
>>> standalone with some basic infrastructure (libraries for calling the api,
>>> rendering pipeline of JS views, url routing) so that interested people
>>> could just get quickly to render a view within the app and do interesting
>>> stuff querying the API. We were also open to just doing a plain html file
>>> with some JS and CSS, or a codepen/jsbin style would have worked too.
>>>
>>> Here's the demo of the lite wikipedia webapp I worked on:
>>> http://chimeces.com/webkipedia/
>>>
>>
>> That's lovely! It's what API developers develop when they develop.
>> * Where's the source?
>> * I had no idea the gettingstartedgetpages would give you related pages,
>> so obscure!
>> * I guess RESTBase has no mode that strips the citations and such, or
>> gives you just the opening section (prop=extracts & exintro=)
>> * Nice closeup of the great man's chest :-),
>> http://chimeces.com/webkipedia/#/wiki/Albert_Einstein
>>
>> --
>> =S Page WMF Tech writer
>>
>
>

--
=S Page WMF Tech writer
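The two queries discussed in this thread (a direct search for morelike:*pagetitle*, and an intro-only extract via prop=extracts with exintro) can be sketched as plain URL builders. This is only an illustration: the helper names and the default result limit are made up for the example, the code sends no request, and the exact set of parameters a real client would want is an assumption.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def morelike_search_url(page_title: str, limit: int = 5) -> str:
    """Build a list=search query that uses the CirrusSearch "morelike:" keyword."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"morelike:{page_title}",
        "srlimit": limit,  # illustrative default, not taken from the thread
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def intro_extract_url(page_title: str) -> str:
    """Build a prop=extracts query restricted to the intro section (exintro)."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": "",  # flag parameter, sent with an empty value
        "titles": page_title,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(morelike_search_url("Australia"))
    print(intro_extract_url("Barack Obama"))
```

Querying search directly like this skips the gettingstartedgetpages module, so any caching that module may do would not apply.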

8 10

Can we drop support for running Cirrus without plugins
by Nikolas Everett 01 Jun '15


In production we run CirrusSearch against Elasticsearch with a bunch of plugins installed: wikimedia-extra and the experimental highlighter. But Cirrus is capable of running without them. It'd be simpler for us if we simply mandated that they be installed. I created https://phabricator.wikimedia.org/T101029 to talk about it, but my current feeling is that it probably wouldn't be a huge hardship to make those plugins required.

Opinions?

Nik

2 5

Event logging
by Kevin Smith 01 Jun '15


At two recent meetings, several topics around event logging came up. Specifically, how/where our different systems log (or should log) their events. The topics I'm aware of are:

1. Should WDQS use our standard event logging system? I think the answer was yes. Do we have or need a phab ticket to represent that?
2. Should our new maps work use our standard event logging system?
3. Should Cirrus/Elastic use our standard event logging system?
4. Should Cirrus/Elastic log to Kibana? What are the privacy implications there?

It's probably best to reply to only one topic at a time, so this thread gets forked. Ideally, please change the subject line to reflect the specific topic you are replying to.

Discuss!

Kevin Smith
Agile Coach
Wikimedia Foundation
*Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment. Help us make it a reality.*

6 5

Event logging caution
by Kevin Smith 01 Jun '15


As was mentioned in scrum-of-scrums on May 20, the analytics team is running some Visual Editor A/B tests from now through June 11. As a result, analytics has requested that everyone "reach out to us if you're planning any changes in your Event Logging use" during that time.

Hopefully this won't mess up what we're trying to do for the next couple weeks.

Kevin Smith
Agile Coach
Wikimedia Foundation
*Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment. Help us make it a reality.*

3 2
