[OW] fixing performance regressions before they happen - Wiki-research-l - lists.wikimedia.org

List overview All Threads
Download

[OW] fixing performance regressions before they happen

Call for Participants -...

CICM 2022, 15th Conference on...

Samuel Klein

29 Jan 2022 29 Jan '22

6:15 p.m.

An interesting post from the Netflix team, workflow and analysis may be of general interest. I'm not sure what research of this sort we have on W* performance benchmarks over time ~~ https://netflixtechblog.com/fixing-performance-regressions-before-they-happ… 🌍🌏🌎🌑

Reply

Show replies by date

peter＠wikimedia.org

31 Jan 31 Jan

1:29 p.m.

Hi Samuel, my name is Peter and I work in the performance team. I also read the post and I also found it interesting. Our performance metrics are viewable in Grafana, a good start point is the performance summary dashboard: https://grafana.wikimedia.org/d/cZgMg49Wz/performance-summary. We have many dashboards but we lack some documentations, so please ask so I can guide you. We collect and keep track of performance metrics directly from our users, we run synthetic browser tests every X hour where we record a video of the browser screen, collect visual metrics and we also run some tests on commits. The largest research we've done in this is the study Gilles did about correlation between what the user perceive vs browser metrics https://techblog.wikimedia.org/2019/06/17/performance-perception-correlatio… and the paper https://nonsns.github.io/paper/rossi19www.pdf. For regressions, I've gone through the same path as the people at Netflix by trying different amount of runs, taking median/fastest/slowest runs etc to find more "stable" metrics. We don't proxy performance by memory usage, we focus more on visual metrics for the users and for us we need to do more than three runs. We do 5-11 runs depending on what we test. I haven't blogged about that work but it should be in some Phabricator tasks, I can look it up if you are interested. What is also interesting is what kind of practical regression you could find. In our most trimmed systems I think we can find performance regressions that are slighlty over 2%. But there's parts where the regression needs to be 10-20% for us to get alerts. I wrote a blog post a couple of years ago about one regression https://techblog.wikimedia.org/2018/10/03/best-friends-forever/ I like the use of anomaly detection, we discussed that in the teams some time ago but we haven't tried it out. Today we mostly use static thresholds in some way. I think a tool for anomaly detection would be something many teams could use. I really like that they have statistics about false alerts etc. We don't have that today but we should. I started to keep track of them manually, but hmm I failed :) Best Peter Hedenskog

Reply

Isaac Johnson

1 Feb 1 Feb

8:14 p.m.

Thanks for sharing the article SJ and additional details Peter! Just wanted to mention that, tangentially related, there is a place in Wikimedia where anomaly detection is used for monitoring "performance" and that's around detecting instances of Wikipedia outages (often censorship). More details in this blogpost: https://techblog.wikimedia.org/2021/01/15/censorship-outages-and-internet-s… Best, Isaac On Mon, Jan 31, 2022 at 8:30 AM <peter(a)wikimedia.org> wrote:

Hi Samuel, my name is Peter and I work in the performance team. I also read the post and I also found it interesting. Our performance metrics are viewable in Grafana, a good start point is the performance summary dashboard: https://grafana.wikimedia.org/d/cZgMg49Wz/performance-summary. We have many dashboards but we lack some documentations, so please ask so I can guide you. We collect and keep track of performance metrics directly from our users, we run synthetic browser tests every X hour where we record a video of the browser screen, collect visual metrics and we also run some tests on commits. The largest research we've done in this is the study Gilles did about correlation between what the user perceive vs browser metrics https://techblog.wikimedia.org/2019/06/17/performance-perception-correlatio… and the paper https://nonsns.github.io/paper/rossi19www.pdf. For regressions, I've gone through the same path as the people at Netflix by trying different amount of runs, taking median/fastest/slowest runs etc to find more "stable" metrics. We don't proxy performance by memory usage, we focus more on visual metrics for the users and for us we need to do more than three runs. We do 5-11 runs depending on what we test. I haven't blogged about that work but it should be in some Phabricator tasks, I can look it up if you are interested. What is also interesting is what kind of practical regression you could find. In our most trimmed systems I think we can find performance regressions that are slighlty over 2%. But there's parts where the regression needs to be 10-20% for us to get alerts. I wrote a blog post a couple of years ago about one regression https://techblog.wikimedia.org/2018/10/03/best-friends-forever/ I like the use of anomaly detection, we discussed that in the teams some time ago but we haven't tried it out. Today we mostly use static thresholds in some way. I think a tool for anomaly detection would be something many teams could use. I really like that they have statistics about false alerts etc. We don't have that today but we should. I started to keep track of them manually, but hmm I failed :) Best Peter Hedenskog _______________________________________________ Wiki-research-l mailing list -- wiki-research-l(a)lists.wikimedia.org To unsubscribe send an email to wiki-research-l-leave(a)lists.wikimedia.org

-- Isaac Johnson (he/him/his) -- Research Scientist -- Wikimedia Foundation

Reply

828

days inactive

831

days old

wiki-research-l@lists.wikimedia.org

Manage subscription

2 comments

3 participants

tags (0)

participants (3)

Isaac Johnson
peter＠wikimedia.org
Samuel Klein