[Wikitech-l] Re: 🤿🚂 Diving Into Wikimedia Deployment Data

18 Feb 2022


Hi Risker!
On Wed, Feb 16, 2022 at 5:52 PM Risker risker.wp@gmail.com wrote:
> Thank you very much for sharing this data, Tyler (and to the team that researched and analysed it, as well). I think it shows that the train has been pretty successful in mitigating the issues it was intended to improve.
I think so, too :)
> I note the data points that show there has been a significant and clear trend toward fewer comments per patch. This would be worth investigating further. Is the total number of reviews pretty consistent, or is it increasing or decreasing? Is it possible that developers have become more proficient at writing patches to standard, and thus fewer comments are required? Or could it be that, because more time is invested in writing patches (assuming that more patches = more time writing them), there is less time for review?
I'll preface my comments with the caveat: I am (definitely) not a data
scientist.
I think we need to investigate more to say anything definitive. And I
love that this data enables us to have a conversation about what to
investigate next.
The comments per patch trend comes from the number of comments per
patch averaged over a whole train. Outliers could be affecting the
average (for instance, there is one patch[0] from 2015 with 354
comments).
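To illustrate how a single outlier can move a per-train average, here is a minimal sketch. The comment counts are made up for illustration; only the 354-comment figure comes from the patch mentioned above, and the helper name `summarize` is hypothetical.

```python
from statistics import mean, median

# Hypothetical comment counts for the patches in one train. Only the
# 354-comment outlier is taken from this thread; the rest are invented.
comments_per_patch = [0, 1, 1, 2, 2, 3, 4, 354]


def summarize(counts):
    """Return (mean, median) of comments per patch."""
    return mean(counts), median(counts)


with_outlier = summarize(comments_per_patch)
without_outlier = summarize(comments_per_patch[:-1])

print("with outlier:   ", with_outlier)
print("without outlier:", without_outlier)
```

In this toy data the mean changes a lot when the outlier is dropped while the median stays the same, which is one reason a median per train might be worth plotting next to the average.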
Another possible explanation is: as we've added more bots over time,
my simple tools to filter out bot noise are proving insufficient.
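A minimal sketch of why a fixed bot list can fall behind, assuming comments are available as (author, text) pairs. The account names and the `human_comment_count` helper are hypothetical and are not the actual filtering used for this data.

```python
# Hypothetical bot account names; the real list and the real filtering
# tools are not shown in this thread.
KNOWN_BOTS = {"ci-bot"}


def human_comment_count(comments, known_bots=KNOWN_BOTS):
    """Count comments whose author is not in the known-bot list."""
    return sum(1 for author, _text in comments if author not in known_bots)


comments = [
    ("ci-bot", "Build succeeded"),
    ("reviewer-a", "Please add a test"),
    ("lint-bot", "Lint passed"),  # a bot added later, missing from the list
]

# The newer bot is counted as a human until the list is updated.
print(human_comment_count(comments))
print(human_comment_count(comments, known_bots={"ci-bot", "lint-bot"}))
```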
I've only begun to explore this trend[1]. I'll keep folks posted and I
invite others to explore along with me!
Thanks!
– Tyler
[0]: https://data.releng.team/train?sql=select+*+from+patch+order+by+comments+desc
[1]: https://gitlab.wikimedia.org/thcipriani/train-stats#a-look-at-comments-per-patch
