Hi everyone,
I’d like to announce an organizational change at the Wikimedia Foundation in the Platform Engineering group. For those who aren't terribly interested in how WMF's org chart looks, you can skip the rest of this email. :-)
Yesterday, we formalized “Release Engineering” as a team, and promoted Greg Grossmeier to “Release Team Manager” with everyone on the team reporting to him.
In addition to Greg, the new team comprises:
* Antoine Musso
* Chris McMahon
* Dan Duvall
* Mukunda Modell
* Rummana Yasmeen
* Sam Reed
* Zeljko Filipin
They are broadly responsible for the lifecycle of code from the point that a developer is ready to check it in through its deployment on our site, maintaining the processes and tools that reduce negative user impact of site software changes while simultaneously making software change deployment efficient and joyful.
On a more detailed level, here are just a few things the group is responsible for:
* Code and bug report hosting - currently Gerrit and Bugzilla, but in the glorious future, Phabricator
* Test infrastructure - the team maintains the Beta Cluster, with help from TechOps
* Test automation - building the Cucumber/RSpec-based infrastructure for automating browser tests
* Manual testing - actually looking at the product and making sure it does what all the robots tell us it should be doing
* Test tools - tools that developers can use to test their own code such as Vagrant
* Deployment tooling - the infrastructure we use to push code out to production, like scap
More information about the team can be found here: https://www.mediawiki.org/wiki/Wikimedia_Release_and_QA_Team
You may notice that that page has been around a while (since August 2013). Greg and Chris McMahon have been leading this as a “virtual team” for the past year, with shared goal-setting and day-to-day organization. This has demonstrated that there is a strong case for creating a formalized team.
Please join me in congratulating Greg and wishing the newly formalized team continued success!
Rob
<quote name="Rob Lanphier" date="2014-07-29" time="09:52:47 -0700">
They are broadly responsible for the lifecycle of code from the point that a developer is ready to check it in through its deployment on our site, maintaining the processes and tools that reduce negative user impact of site software changes while simultaneously making software change deployment efficient and joyful.
Chris McMahon shared the below quote on the internal thread for this announcement, and I thought it was useful to share here as well:
<quote name="Chris McMahon" date="2014-07-29" time="08:58:11 -0700">
I think it's worth pointing out that RelEng is not only concerned with releasing software early and often, but also concerned with releasing software *safely*. You don't hear much about it, but stuff we also do:
- Put in place and run all the linters, unit tests, qunit tests in Jenkins
- Deploy the master branch of all core and all extensions to beta labs every three minutes
- Run automated browser tests in beta labs at least twice per day, and analyze the results
- Do exploratory testing in beta labs
- Maintain the deploy tools like scap
- And manage the process within which all of these things are productive
In Jenkins we find and fix code problems, for example with syntax and structure.
In beta labs we find and fix a number of sorts of problems:
- configuration mistakes, like for caching or database.
- integration problems, for example when a change to VisualEditor makes it stop working for MobileFrontend, or a change to Core breaks VE.
- regression problems, where a change in one part of the code unexpectedly makes some other features stop working correctly.
People sometimes ask me why the browser test builds are red so much. The answer is that they are showing where changes and problems are. Red tests give us information.
So today we spend very little time in production "putting out fires", as Andrew put it. Of course, we can't find and fix every problem, but I have no doubt that our current practices and processes are saving Ops and Core and Features engineers many frustrating hours every week.
And speaking of practices and processes, having a Team Practices group in place will be great. We have many interests in common.
And if you're interested, I'm giving a short talk on the subject at Wikimania: https://wikimania2014.wikimedia.org/wiki/Submissions/Finding_and_fixing_software_bugs_for_the_Wikipedias
To clarify, is the QA team now under Release Engineering as Chris' comment seems to imply, and how does this org change affect security engineering?
Thanks,
Pine
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
To my understanding this is simply a formalisation of a change that, in almost every regard, already happened months ago.
Dan
Basically:
[[mw:Wikimedia Release and QA Team]] -> [[mw:Wikimedia Release Engineering Team]]
From the org perspective, now all of the team members report to me instead of Rob. That's basically the substance of the change. "QA" and "Release Engineering" were already the same team (effectively) since August of last year.
"Security engineering" isn't a "thing" at this time. Chris Steipp is a part of "MediaWiki Core Team" (reporting to Rob) and is the main security engineer.
Hope that helps,
Greg
<quote name="Dan Garry" date="2014-07-29" time="12:14:55 -0700">
To my understanding this is simply a formalisation of a change that, in almost every regard, already happened months ago.
Dan
On Tue, Jul 29, 2014 at 11:58 AM, Pine W wiki.pine@gmail.com wrote:
To clarify, is the QA team now under Release Engineering as Chris' comment seems to imply, and how does this org change affect security engineering?
For now, I (the only security engineer) am staying in core, although much of my role spans both groups. I'll continue working with Chris, Greg, and other engineers across the WMF and developer community to build security features, find and respond to vulnerabilities, release security updates, and improve the secure development process in general.
On Tue, Jul 29, 2014 at 12:25 PM, Chris Steipp csteipp@wikimedia.org wrote:
On Tue, Jul 29, 2014 at 11:58 AM, Pine W wiki.pine@gmail.com wrote:
To clarify, is the QA team now under Release Engineering as Chris' comment seems to imply, and how does this org change affect security engineering?
For now, I (the only security engineer) am staying in core, although much of my role spans both groups. I'll continue working with Chris, Greg, and other engineers across the WMF and developer community...
I think it is not accurate to say that "the QA team" is "under Release Engineering", or that Release Engineering is somehow separate from Core, and security, and the feature development groups.
Our QA practice reaches into many aspects of software development at WMF, and RelEng serves everyone who needs to get software to Wikipedia. We have a minimum of formal gates and handoffs and such; instead we try to put in place general processes (build, test, deploy) between your local development environment and production in order to get new features to users as quickly and, again, as *safely* as possible.
-Chris
Hi Chris M.,
I understand the difference between functional and reporting relationships. As I understand it, QA is under RelEng in terms of reporting but functionally works in a matrix environment. That seems consistent with the OP and the WMF Wiki's pseudo-org chart. Is this your understanding as well?
The everyday difference that this change makes may be trivial, but it makes sense to me to think of QA (and Security Engineering) as being part of RelEng.
By the way, Wikimedians are a vocal group when there are problems, and I take the general quiet of Wikimedia content editors about security and core stability to mean that security and core QA are in good hands.
Pine
On Tue, Jul 29, 2014 at 2:06 PM, Pine W wiki.pine@gmail.com wrote:
Hi Chris M.,
By the way, Wikimedians are a vocal group when there are problems, and I take the general quiet of Wikimedia content editors about security and core stability to mean that security and core QA are in good hands.
Thank you, that is good to hear!
-Chris
On Tue, Jul 29, 2014 at 2:06 PM, Pine W wiki.pine@gmail.com wrote:
The everyday difference that this change makes may be trivial, but it makes sense to me to think of QA (and Security Engineering) as being part of RelEng.
I doubt we disagree too much, but I'll put on my security evangelist hat and get on my soapbox, since you phrased it that way.
It's not uncommon to see security placed (organizationally) as part of the release process. But while security reviews and security regression testing are important, I really hope that for MediaWiki, security isn't just a hurdle to deployment. I believe that security has to be a part of the entire development process to be effective. If the features aren't designed for security, security is always going to lose versus the need to deploy things that we've spent resources to develop. I think MediaWiki benefited a lot from having Tim be both the security evangelist and technical lead for so many years.
So I try to spend a significant portion of my time working early in the development lifecycle, training developers and working towards more secure architecture, rather than focusing on the release process to fix all the bugs before we push something out. Sometimes that happens, and other times (like this week) I spend most of my time fixing issues after they are already in production. Core has been a good place to do that work from so far.
Chris S.,
I agree that in many cases an ounce of prevention is worth a pound of cure. I will also say that I feel that you're a self-motivated, capable person and you'd do good work anywhere in the org chart.
In my experience generally, Wikimedia is a more security-conscious and privacy-conscious environment than a lot of other orgs, and I think this is a net positive. The only critical security problems that I know about in my time with this project that created a lot of public concern were the OpenSSL vulnerability and some possible access to hashed passwords, neither of which resulted in compromised accounts so far as I know. I get the sense that most devs including volunteer devs are serious about writing code that is secure, reliable, and provides a good balance of privacy and openness.
Speaking of pushing work to early in the development lifecycle, I am proposing to do the same for other elements of development via the proposed Technology Committee. We are thinking in similar ways.
Pine
wikitech-l@lists.wikimedia.org