Hi Samuel,
My name is Peter and I work on the performance team. I also read the post and found
it interesting. Our performance metrics are viewable in Grafana; a good starting point is the
performance summary dashboard:
https://grafana.wikimedia.org/d/cZgMg49Wz/performance-summary. We have many dashboards but
we lack some documentation, so please ask and I can guide you.
We collect and keep track of performance metrics directly from our users. We also run synthetic
browser tests every X hours, where we record a video of the browser screen and collect visual
metrics, and we run some tests on commits.
The largest piece of research we've done in this area is the study Gilles did on the correlation
between what users perceive and browser metrics:
https://techblog.wikimedia.org/2019/06/17/performance-perception-correlatio…
and the paper
https://nonsns.github.io/paper/rossi19www.pdf.
For regressions, I've gone down the same path as the people at Netflix: trying
different numbers of runs and taking the median/fastest/slowest run etc. to find more
"stable" metrics. We don't use memory usage as a proxy for performance; we focus more
on visual metrics for the users, and for us three runs is not enough. We do 5-11
runs depending on what we test. I haven't blogged about that work but it should be in
some Phabricator tasks, and I can look it up if you are interested. What is also interesting
is how small a regression you can find in practice. In our best-tuned systems I think
we can find performance regressions that are slightly over 2%. But there are parts where
the regression needs to be 10-20% for us to get alerts.
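To illustrate the idea of several runs plus a percentage threshold, here is a minimal
Python sketch. It is not our actual tooling: the function names, the choice of the median
and the sample timings are all assumptions made up for this example.

```python
from statistics import median


def summarize_runs(samples_ms):
    """Collapse repeated runs of one test into a single, more stable value.

    The median is used here as an assumption; fastest or slowest run are
    other candidates mentioned above.
    """
    if not samples_ms:
        raise ValueError("need at least one run")
    return median(samples_ms)


def relative_change(baseline_ms, current_ms):
    """Fractional change of current versus baseline (0.02 means 2% slower)."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return (current_ms - baseline_ms) / baseline_ms


def is_regression(baseline_runs, current_runs, threshold=0.02):
    """True if the summarized metric got slower by more than the threshold."""
    change = relative_change(
        summarize_runs(baseline_runs), summarize_runs(current_runs)
    )
    return change > threshold


# Hypothetical visual-metric timings in milliseconds, five runs each.
baseline = [1000, 1010, 990, 1005, 995]
current = [1030, 1045, 1020, 1035, 1025]

print(is_regression(baseline, current, threshold=0.02))  # sensitive test
print(is_regression(baseline, current, threshold=0.10))  # noisy test
```

With these made-up numbers the medians differ by 3%, so the 2% threshold flags the change
and the 10% threshold does not.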
I wrote a blog post a couple of years ago about one such regression:
https://techblog.wikimedia.org/2018/10/03/best-friends-forever/
I like the use of anomaly detection. We discussed it in the teams some time ago but we
haven't tried it out. Today we mostly use static thresholds in some way. I think a
tool for anomaly detection would be something many teams could use.
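To make the contrast concrete, here is a rough Python sketch of a static threshold next to
one very simple anomaly check. It is not our setup and not the method from the post: the
median/MAD approach, the function names, the cutoff k and the numbers are assumptions
chosen only for illustration.

```python
from statistics import median


def static_alert(value_ms, limit_ms):
    """Static threshold: alert only when the metric crosses a fixed limit."""
    return value_ms > limit_ms


def mad_anomaly(history_ms, value_ms, k=3.0):
    """Flag a value that sits more than k scaled MADs above the history median.

    MAD is the median absolute deviation; 1.4826 scales it so that it is
    comparable to a standard deviation for normally distributed data.
    """
    if len(history_ms) < 3:
        raise ValueError("need at least three historical values")
    center = median(history_ms)
    mad = median([abs(x - center) for x in history_ms])
    if mad == 0:
        # History is flat, so any increase counts as unusual.
        return value_ms > center
    return (value_ms - center) / (1.4826 * mad) > k


# Hypothetical history of one metric in milliseconds.
history = [1000, 1010, 990, 1005, 995, 1002, 998]

print(static_alert(1030, limit_ms=1100))  # fixed limit is not crossed
print(mad_anomaly(history, 1030))         # unusual compared to the history
print(mad_anomaly(history, 1010))         # within normal variation
```

The point of the sketch is that a check based on recent history adapts to how noisy each
metric is, while a static limit has to be tuned by hand per test.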
I really like that they have statistics about false alerts etc. We don't have that
today but we should. I started to keep track of them manually, but hmm I failed :)
Best
Peter Hedenskog