*tl;dr:* We have open data on Wikimedia production deployments. Read Diving into Deployment Data https://phabricator.wikimedia.org/phame/post/view/272/diving_into_our_deployment_data/ to learn more (or read on, I guess).
_____
If you’ve ever experienced the pride of seeing your name on MediaWiki's contributor list, you've been involved in our deployment process.
This realization inspires questions – *we have* 📈 *data to answer those questions!*
- We wrote a blog (Research folks did the hard parts): ⚙️ Phabricator: Diving Into Our Deployment Data https://phabricator.wikimedia.org/phame/post/view/272/diving_into_our_deployment_data/
- The data is open: 🦊 GitLab train-stats https://gitlab.wikimedia.org/thcipriani/train-stats
- Play with the live data (if you'd rather dive into SQL): 💾 data.releng.team https://data.releng.team/train (see the example query below)
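If you'd rather not start from a blank query box, here's a minimal sketch of the kind of SQL the live data accepts. It uses the patch table and its comments column; treat anything beyond those two names as something to verify against the site first:

    -- The ten most-commented patches across all trains
    select *
      from patch
     order by comments desc
     limit 10;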
Thanks!
Tyler Cipriani (he/him/his)
Engineering Manager, Release Engineering
Wikimedia Foundation
Thank you very much for sharing this data, Tyler (and to the team that researched and analysed it, as well). I think it shows that the train has been pretty successful in mitigating the issues it was intended to improve.
I note the data points that show there has been a significant and clear trend toward fewer comments per patch. This would be worth investigating further. Is the total number of reviews pretty consistent, or is it increasing or decreasing? Is it possible that developers have become more proficient at writing patches to standard, and thus fewer comments are required? Or could it be that, because more time is invested in writing patches (assuming that more patches = more time writing them), there is less time for review?
I've always found the train to be very interesting, and in fact mentioned it when being interviewed for a recently published article (in a positive way). I'm pleased and perhaps a bit relieved to see that the research has borne out my impression of how it has made such a big difference in the deployment process.
Risker/Anne
Hi Risker!
On Wed, Feb 16, 2022 at 5:52 PM Risker risker.wp@gmail.com wrote:
> Thank you very much for sharing this data, Tyler (and to the team that researched and analysed it, as well). I think it shows that the train has been pretty successful in mitigating the issues it was intended to improve.
I think so, too :)
> I note the data points that show there has been a significant and clear trend toward fewer comments per patch. This would be worth investigating further. Is the total number of reviews pretty consistent, or is it increasing or decreasing? Is it possible that developers have become more proficient at writing patches to standard, and thus fewer comments are required? Or could it be that, because more time is invested in writing patches (assuming that more patches = more time writing them), there is less time for review?
I'll preface my comments with the caveat: I am (definitely) not a data scientist.
I think we need to investigate more to say anything definitive. And I love that this data enables us to have a conversation about what to investigate next.
The comments per patch trend comes from the number of comments per patch averaged over a whole train. Outliers could be affecting the average (for instance, there is one patch[0] from 2015 with 354 comments).
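One quick way to check whether outliers are driving the trend: compare the raw mean against a trimmed mean that drops extreme patches. A sketch, assuming the patch table has a column tying each patch to its train (I'm calling it version here; the real column name may differ):

    -- Does the downward trend survive once outliers are trimmed?
    -- avg() skips NULLs, so the CASE quietly drops patches over the cutoff.
    select version,
           avg(comments) as mean_comments,
           avg(case when comments <= 50 then comments end) as trimmed_mean_comments
      from patch
     group by version
     order by version;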
Another possible explanation: as we've added more bots over time, my simple tools for filtering out bot noise are proving insufficient.
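If the published data exposed comment authors row by row (it may not; the comment table and author column below are hypothetical), a sharper bot filter might look like:

    -- Hypothetical schema: one row per comment with an author field.
    -- The denylist entries are examples (jenkins-bot is Gerrit's CI account);
    -- extend the list as new bots show up.
    select count(*) as human_comments
      from comment
     where author not in ('jenkins-bot', 'SonarQube Bot');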
I've only begun to explore this trend[1]. I'll keep folks posted and I invite others to explore along with me!
Thanks! – Tyler
[0]: https://data.releng.team/train?sql=select+*+from+patch+order+by+comments+desc
[1]: https://gitlab.wikimedia.org/thcipriani/train-stats#a-look-at-comments-per-patch
On Thu, Feb 17, 2022 at 1:51 AM Risker risker.wp@gmail.com wrote:
> Thank you very much for sharing this data, Tyler (and to the team that researched and analysed it, as well). I think it shows that the train has been pretty successful in mitigating the issues it was intended to improve.
> I note the data points that show there has been a significant and clear trend toward fewer comments per patch. This would be worth investigating further. Is the total number of reviews pretty consistent, or is it increasing or decreasing? Is it possible that developers have become more proficient at writing patches to standard, and thus fewer comments are required? Or could it be that, because more time is invested in writing patches (assuming that more patches = more time writing them), there is less time for review?
There can be other reasons as well. Three reasons I have seen anecdotally (and in no way scientifically) are:
- Shrinking average patch size. Smaller patches are good software engineering practice and are slowly being adopted by developers, and smaller patches lead to more straightforward reviews.
- The rise of simple refactoring patches. With lots of refactoring work being done (especially breaking down monster classes: https://monmon.toolforge.org), we have a lot of patches fixing deprecations (in core and extensions) and doing other cleanup work that is useful but doesn't require much discussion.
- Better test coverage. I remember when test coverage was much lower and most review comments were something along the lines of "I tested it locally and it broke twenty million things". With the better test coverage we have now (especially browser tests, API tests, and the rest of the e2e tests), these problems are caught by Jenkins before a human reviewer finds them (or worse, before they show up in production), which is good :)
The reality is probably a combination of all three plus other reasons.
Best