<quote name="Risker" date="2015-05-28" time="09:53:31 -0400">
This is strictly a question from an uninvolved observer. Does this schedule provide for sufficient time and real-time/hands-on testing before changes hit the big projects?
Yes. We still have the Beta Cluster (a production-like environment), which runs all code merged into master within 10 minutes of the merge.
An IRC discussion I was following last evening suggested to me that the first deploy (to test wikis and mw.org) probably did not get enough hands-on testing/utilization to surface many issues that would be significant on production wikis. That leaves only 24 hours on the smaller non-Wikipedia wikis, with the hope that any problems will pop up before the version is applied to dewiki, frwiki and enwiki.
Honestly, that's the wrong perspective to take on yesterday's incident[0]. The issue is one that is hard to identify at low traffic levels (one that only really manifests at Wikipedia-scale, with Wikipedia-scale caching). There will always be issues like this, unfortunately. The better way to mitigate them is to change how we bucket requests to the new or old version of the software in production.
Currently we bucket by domain name/project site. This doesn't give us much flexibility in testing new versions at scales that can surface issues but are not "everyone". We would need to be able to deploy new versions based on a percentage of overall requests (i.e., 5% of all users to the new version, then 10%, then everyone).
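To make the idea concrete, here is a minimal sketch of percentage-based bucketing, not how our infrastructure is actually configured: it assumes each request carries some stable identifier (a session or user ID; the names below are hypothetical), hashes it, and routes requests that fall below the rollout threshold to the new version.

    import hashlib

    def bucket_for(request_id: str, rollout_percent: float) -> str:
        """Deterministically route a request to the new or old version.

        Hashing a stable identifier keeps each user in the same bucket,
        so users already on the new version stay there as the rollout
        grows from 5% to 10% to everyone, rather than flipping back
        and forth on every request.
        """
        digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
        # Map the hash onto [0, 100) and compare against the threshold.
        slot = int(digest, 16) % 10000 / 100.0
        return "new" if slot < rollout_percent else "old"

    # Example: roll the new version out to 5% of users.
    print(bucket_for("session-abc123", 5.0))   # "new" or "old", stable per ID
    print(bucket_for("session-abc123", 10.0))  # same ID keeps its assignment

Because the 5% slice is a subset of the 10% slice, raising the percentage only adds users to the new version; nobody gets bounced back to the old one mid-rollout.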
Best,
Greg
[0] https://wikitech.wikimedia.org/wiki/Incident_documentation/20150527-Cookie