Hello team!
I've been with you for 2 weeks. As Tomasz suggested, it might be the
right moment to list what I have discovered so far. I'm trying to
point out the differences I see from what I'm used to, not to judge
(yet); I don't have enough context for that... Still, I am biased by
my previous experience and that will show below...
I need to unlearn a few things. It seems that last week, each and
every decision I took was challenged by someone (usually for good
reasons). Things that I take for granted as "the right way (tm)" are
actually done in a fairly different way here. Examples:
== Distributed team ==
** my belief **
It is hard to be part of a distributed team.
** what happens at WMF **
You've all been extremely welcoming! Having the chance to see you in
person in SF in January definitely helps (it is a bit harder for me to
find my way around the Ops team, probably in part because I have not
had the chance to meet them in person). The fact that we also have
informal conversations (IRC, unmeetings, ...) helps me feel that I
belong. It seems to me that you've made all the necessary efforts to
include me.
** comments **
I still have to make an effort to get in touch when I need to. I'm so
used to coffee breaks, where you take time to discuss whatever needs
discussing. IRC feels more intrusive, as it is a continuous flow
rather than a set time where you go have coffee and do the needful...
I also need to learn to ignore IRC interruptions. It still feels like
I might miss something important, and that there is just too much
information sometimes...
== Versioning ==
** my belief **
anything deployed must have a version number
** what happens at WMF **
* deployments on labs are pretty much free-form: cherry-pick whatever
you want on the puppetmaster
* deployments on prod seem to have version numbers, at least for
MediaWiki code; puppet code is deployed directly from the production
branch
** comments **
Having clear version numbers implies a conscious decision to create a
version, potentially with appropriate checks of that version's
content and additional testing. It allows a clear separation between
creating a version and promoting it to production.
Not having versions everywhere allows for more flexibility and puts
the responsibility for making the right choices on the people rather
than on the process. That's probably a good thing if you have smart
enough people (and WMF seems to have a pretty smart crowd).
Having a shared git repository on deployment-puppetmaster scares the
hell out of me! I'm so used to preparing everything I want to push
locally and then just applying a specific tag / version...
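For what it's worth, the tag-then-promote workflow I'm used to can be
sketched in a throwaway repo like this (all names here, v1.0.0 and
config.pp included, are invented for the example; this is not WMF
tooling):

```shell
# Toy repo illustrating "cutting a version is an explicit decision,
# and deploying means applying exactly that version".
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.org
git config user.name demo

echo "stable content" > config.pp
git add config.pp
git commit -qm "reviewed change"

# Creating the version: a deliberate step, with its own checks/tests.
git tag -a v1.0.0 -m "release 1.0.0: reviewed and tested"

# The branch keeps moving...
echo "work in progress" > config.pp
git commit -qam "wip"

# ...but promotion to production checks out the tag, not the branch
# tip, so later work on the branch cannot leak into the deployment.
git checkout -q v1.0.0
cat config.pp
```

The point is the separation: whatever lands on the branch after the
tag is created has no effect on what `git checkout v1.0.0` deploys.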
== Cherry-picking ==
** my belief **
cherry-picking is bad and should be used only as a last resort solution
** what happens at WMF **
* cherry-picking is the norm
* this seems to be influenced by Gerrit, which promotes cherry-picking
and single-commit patches
** comments **
I'm all for rewriting git history to make it more readable, to help
tell the story of what is happening to the code. I think that branches
and merges are a good tool for that. Cherry-picking fixes from one
branch to the next leaves a lot of opportunities to forget one.
Merging helps tell the story of "those are all the fixes done on
branch X; I've applied them on branch X+1". Also, I'm not a huge fan
of Gerrit's idea of changes being single commits. Having a coherent
change split into multiple phases makes sense to me (for example: 1)
preliminary refactoring, 2) my actual work, 3) some cleanup I did
along the way). I need to dig deeper into topic branches and how they
integrate with Gerrit (yes, I am brand new to Gerrit).
It also seems that all this cherry-picking creates much more
flexibility (I can take any commit and apply it anywhere). Again,
giving control back to the human and not to the tool.
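To make the merge-vs-cherry-pick point concrete, here is a
throwaway-repo sketch (branch and file names are invented for the
example) of how cherry-picking copies exactly one commit, while a
merge carries every fix from the source branch:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.org
git config user.name demo
dev=$(git symbolic-ref --short HEAD)  # "master" or "main", depending on git version

touch base.txt
git add base.txt
git commit -qm "base"
git branch release                    # pretend this is branch X+1

echo one > fix1.txt && git add fix1.txt && git commit -qm "fix 1"
echo two > fix2.txt && git add fix2.txt && git commit -qm "fix 2"

# Cherry-picking the branch tip copies only "fix 2";
# "fix 1" is silently left behind on the dev branch.
git checkout -q release
git cherry-pick "$dev"
ls                                    # base.txt fix2.txt

# Merging instead tells the story "all fixes from dev are now here":
# fix1.txt arrives too, nothing is forgotten.
git merge -q "$dev" -m "apply all fixes from dev branch"
ls                                    # base.txt fix1.txt fix2.txt
```

This is what I mean by merges being harder to get wrong: the merge
has no "which commits did I remember to pick?" failure mode.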
== Stupid code is good code ==
I need to write a blog post about this one, but as Kernighan said,
"Everyone knows that debugging is twice as hard as writing a program
in the first place. So if you're as clever as you can be when you
write it, how will you ever debug it?" [1]. Looking at our code
(mainly puppet at the moment), I think there are quite a few places
where we do the smart thing when stupid would be sufficient. That's
probably the cost of hiring smart people...
== A few random points ==
* We have an incredible amount of documentation. It is easy to read
(I've been drawn into it and lost much time). It is also outdated in
some places (documentation always is).
* so many different ways to deploy (puppet, trebuchet, salt, manual stuff, ...)
* I still have not found a global architecture schema (something like
a high-level component or deployment diagram). But then, I have never
seen any company that has one...
[1]: https://en.wikiquote.org/wiki/Brian_Kernighan
Luca: I love what you have done with this place! Yes, I need to do my
part of that documentation work and I have not done it yet... I'll see
what I can do...
On Fri, Feb 19, 2016 at 9:50 AM, Luca Toscano <ltoscano(a)wikimedia.org> wrote:
> Hello!
>
> On Thu, Feb 18, 2016 at 10:37 AM, Giuseppe Lavagetto
> <glavagetto(a)wikimedia.org> wrote:
>>
>> [X-posting to ops as this discussion is relevant there too]
>>
>> On Wed, Feb 17, 2016 at 5:53 PM, Erik Bernhardson
>> <ebernhardson(a)wikimedia.org> wrote:
>> > On Feb 17, 2016 1:50 AM, "Guillaume Lederrey" <glederrey(a)wikimedia.org>
>> > wrote:
>>
>> >> * I still have not found a global architecture schema (something like
>> >> a high level component or deplyoment diagram). But I have never seen
>> >> any company having those...
>> >
>> > Pretty sure one doesn't exist :(
>>
>> Luca (the new analytics opsen) has started to work on
>> https://wikitech.wikimedia.org/wiki/File:Infrastructure_overview.png
>>
>> I asked him to share the sources for it so that everyone can improve it.
>>
>> Also, if you need some oral history, just ask opsens and we'll be
>> happy to give you an overview of how things work :)
>
>
>
> I will try to update the schema with more up to date information and I'll
> also share the source for draw.io with it (probably next week). There is a
> lot of useful docs related to architecture, each one focusing on different
> aspects (and points in time!):
>
> - https://wikitech.wikimedia.org/wiki/Clusters, General overview
> - https://wikitech.wikimedia.org/wiki/LVS_and_Varnish (welcome to
> Wonderland)
> - https://wikitech.wikimedia.org/wiki/Network_design (welcome to Wonderland
> part two)
> -
> https://wikitech.wikimedia.org/wiki/LVS_and_Varnish#/media/File:Wikipedia_w…
> - puppet site.pp!
> - ...
>
> In my opinion new opsens should fill the gaps and/or update the outdated
> pages, it is a good way to meet new people :)
>
> Luca
A short question about how we organize our work...
I'm mainly using
https://phabricator.wikimedia.org/tag/discovery-search-sprint/ and
https://phabricator.wikimedia.org/project/board/1227/ to track what
I'm doing. I have a few things I do not know where to put:
T109101 is actually done, but is waiting for the elasticsearch
upgrade so both can be pushed to prod at the same time (to do only
one restart). This is still WIP, so I don't want to close it, but it
is not really in need of review either.
Any idea?
MrG
I signed up to receive updates regarding the process of launching the
new Search on Wikimedia, but find that the listserv content is not
what I assumed it would be. So please unsubscribe me. I assumed I had
made that change via a prior alteration of my subscription at the
site, but today and yesterday I am still receiving further mailings.
Thank you.
Gergo, good to know, thanks. The Graph extension itself does not know
how long the data is valid; it simply gets a URL from which to fetch
the pageviews (or any other) data. At this point, only the person who
writes the graph template knows how long it's valid for.
We could add an extra attribute to the graph, e.g. <graph refresh="60">
(number of minutes), to let the Graph extension update the cache
expiry.
On Thu, Feb 18, 2016 at 11:04 PM, Gergo Tisza <gtisza(a)wikimedia.org> wrote:
> On Thu, Feb 18, 2016 at 9:02 AM, Yuri Astrakhan <yastrakhan(a)wikimedia.org>
> wrote:
>
> > It will be updated whenever the page containing the template is
> > re-generated (e.g. the page is changed, or someone does a null-save). I
> > heard that every page is forcefully regenerated if its older than 30
> days,
> >
>
> Yes, and extension tags embedded in the page can reduce that, so if the
> graph has a way of knowing how long the data will be valid, it can tell
> that to the parser via ParserOutput::updateCacheExpiry.
> As a hacky manual workaround, you can put <div
> style="display:none">{{CURRENTHOUR}}</div> into the page to force hourly
> refresh.
Hello all,
The Discovery team recently updated the Wikipedia.org
<http://wikipedia.org/> portal page by moving all inline JavaScript
into a separate file, in order to measure the amount of incoming
traffic that uses JavaScript-friendly browsers. This information is
very important to our team as we endeavor to make the portal page
more interesting and user-friendly for all our visitors.
The results were very encouraging. Here's the executive summary from
the analysis document
<https://commons.wikimedia.org/wiki/File:Analysis_of_Wikipedia_Portal_Traffi…>,
which can also be accessed from the Wikimedia Discovery page
<https://www.mediawiki.org/wiki/Wikimedia_Discovery#Wikipedia.org_Portal_Page>:
*On 5 February 2016 we deployed a patch to the Wikipedia Portal moving the
inline JavaScript into a separate file, which enabled us to finally measure
the proportion of traffic with JS support separately from the overall
traffic to the Portal. This report covers logs of HTTP requests from 5 Feb
to 10 Feb, 2016.*
*Overall, 93% of the requests made to the Wikipedia Portal have JS support.
However, a large component (45%) of this overall percentage is accounted
for by traffic from the United States, which has an overall proportion of
96%. The remaining 55% of the traffic, from 234 other countries, shows a
lot of variation in JS support, with 86.5% on average.*
*We also performed an analysis of browser usage and learned that approx.
75% of the traffic comes from users with relatively modern browsers, with a
few exceptions such as Internet Explorer 8 (3.2% of total traffic). Of
those 17 browsers, 14 had populations with more than 93% JS support. That
is, less than 7% of those browsers’ users had turned off JavaScript for
privacy/bandwidth/other reasons. Interestingly, only 80% of Opera Mini 7
traffic and 60% of Android 4 / Chrome Mobile 30 traffic had JS support.*
Please let us know if there are any concerns or questions!
Cheers,
Deb
--
Deb Tankersley
Product Manager, Discovery
Wikimedia Foundation
Heyo!
In an effort to be more transparent about what I, at least, am working
on[0], I've resolved to send the sort of notes I'd usually send to the
standup (on what I am working on, have worked on, and will be working
on) to the public list as well. That way they can be referred back to,
and the community can ask questions.
So!
I've been out for a week and a bit, so the "worked on" is a bit
sparse. But this morning I:
* Completed code review for the data collection code to get the zero
results rate by project (https://phabricator.wikimedia.org/T126244),
dashboarding for the same (https://phabricator.wikimedia.org/T110590),
and displaying browser usage on the portal dashboards
(https://phabricator.wikimedia.org/T124827). These will all hopefully
be deployed today (Mikhail will send out a distinct email when that
happens)
* Poked Legal again as part of our ongoing efforts to make public our
guidelines around data collection and storage
(https://phabricator.wikimedia.org/T123673)
I will be:
* Working on collecting data on the number of pageviews we get on the
portal, and visualising the same
(https://phabricator.wikimedia.org/T125737)
* Getting back into the loop and finding out what's happening with our
new A/B tests.
Thanks!
-O
[0] Phabricator exists but is fragmented
--
Oliver Keyes
Count Logula
Wikimedia Foundation