Hello team!
I've been with you for 2 weeks. As Tomasz suggested, it might be the right moment to list what I have discovered so far. I'm trying to point out the differences I see from what I'm used to, not to judge (yet); I don't have enough context for that... Still, I am biased by my previous experience and that will show up below...
I need to unlearn a few things. It seems that last week, each and every decision I made was challenged by someone (usually for good reasons). Things that I take for granted as "the right way (tm)" are actually done in a fairly different way here. Examples:
== Distributed team ==
** my belief ** It is hard to be part of a distributed team.
** what happens at WMF ** You've all been extremely welcoming! Having the chance to see you in person in SF in January definitely helps (it is a bit harder for me to find my way around the Ops team, probably in part because I have not had the chance to meet them in person). The fact that we also have informal conversations of a kind (IRC, unmeeting, ...) helps me feel that I belong. It seems to me that you've made every effort needed to include me.
** comments ** I still have to make an effort to get in touch when I need to. I'm so used to coffee breaks where you take time to discuss whatever needs discussing. IRC feels more intrusive as it is a continuous flow, not a set time where you go have coffee and do the needful... I also need to learn to ignore IRC interruptions. It still feels like I might miss something important, and sometimes there is just too much information...
== Versioning ==
** my belief ** anything deployed must have a version number
** what happens at WMF **
- deployments on labs are pretty much free-form: cherry-pick whatever you want on puppetmaster
- deployments on prod seem to have version numbers, at least for mediawiki code; puppet code is deployed directly from the production branch
** comments ** Having clear version numbers implies a conscious decision to create a version, potentially with appropriate checks of its content and additional testing. It allows a clear separation between creating a version and promoting it to production. Not having versions everywhere allows for more flexibility and puts the responsibility for making the right choices more on the people than on the process. Probably a good thing if you have smart enough people (and WMF seems to have a pretty smart crowd).
Having a shared git repository on deployment-puppetmaster scares the hell out of me! I'm so used to preparing anything I want to push locally and then just applying a specific tag / version...
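To make the distinction concrete, here is a minimal Python sketch of what I mean by separating "creating a version" from "promoting it". Everything in it (the `ReleaseRegistry` class, the environment names, the commit ids) is hypothetical and only models the idea; it does not describe any WMF tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    """An immutable, named snapshot of a set of commits."""
    number: str
    commits: tuple


@dataclass
class ReleaseRegistry:
    """Hypothetical registry: versions are created once, then promoted."""
    versions: dict = field(default_factory=dict)
    deployed: dict = field(default_factory=dict)  # environment -> version number

    def create(self, number, commits):
        # Creating a version is a conscious, one-time decision.
        if number in self.versions:
            raise ValueError(f"version {number} already exists")
        self.versions[number] = Version(number, tuple(commits))
        return self.versions[number]

    def promote(self, number, environment):
        # Promotion only points an environment at an existing version;
        # it never changes the content of that version.
        if number not in self.versions:
            raise KeyError(f"unknown version {number}")
        self.deployed[environment] = number


registry = ReleaseRegistry()
registry.create("1.0.0", ["a1b2c3", "d4e5f6"])  # made-up commit ids
registry.promote("1.0.0", "staging")
registry.promote("1.0.0", "production")
print(registry.deployed)
```

The point of the sketch is that `promote` cannot introduce new content: whatever reaches production is exactly what was checked when the version was created.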
== Cherry-picking ==
** my belief ** cherry-picking is bad and should be used only as a last resort solution
** what happens at WMF **
- cherry-picking is the norm
- this seems to be influenced by Gerrit, which promotes cherry-picking and single-commit patches
** comment ** I'm all for rewriting git history to make it more readable, to help tell the story of what is happening to the code. I think that branches and merges are a good tool for that. Cherry-picking fixes from one branch to the next leaves a lot of opportunities to forget one. Merging helps tell the story of "those are all the fixes done on branch X, I've applied them on branch X+1". Also, I'm not a huge fan of Gerrit's idea of a change being a single commit. Having a coherent change split into multiple phases makes sense to me (for example: 1) preliminary refactoring, 2) my actual work, 3) some cleanup I did along the way). I need to dig deeper into topic branches and how they integrate with Gerrit (yes, I am brand new to Gerrit).
It also seems that all this cherry picking creates much more flexibility (I can take any commit and apply it anywhere). Again, giving control back to the human and not to the tool.
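The "forgotten cherry-pick" risk can be sketched in a few lines of Python. This is only an illustration under assumptions: commits are modelled as plain tuples, fixes are matched through a hypothetical `change_id` field that stays the same when a commit is cherry-picked, and the branch contents and the `missing_fixes` helper are made up.

```python
from typing import NamedTuple


class Commit(NamedTuple):
    sha: str        # differs on every branch after a cherry-pick
    change_id: str  # hypothetical identifier that survives a cherry-pick
    subject: str


def missing_fixes(source, target):
    """Return commits on `source` whose change_id is absent from `target`."""
    applied = {commit.change_id for commit in target}
    return [commit for commit in source if commit.change_id not in applied]


branch_x = [
    Commit("1111aaa", "I01", "Fix null check"),
    Commit("2222bbb", "I02", "Fix timeout"),
    Commit("3333ccc", "I03", "Fix encoding"),
]
# Two of the three fixes were cherry-picked to the next branch; one was forgotten.
branch_x_plus_1 = [
    Commit("9999ddd", "I01", "Fix null check"),
    Commit("8888eee", "I03", "Fix encoding"),
]

for commit in missing_fixes(branch_x, branch_x_plus_1):
    print(f"not yet on X+1: {commit.change_id} {commit.subject}")
```

Here only the `I02` fix is reported. With a merge this bookkeeping is not needed, since everything on branch X arrives on X+1 at once; with cherry-picks somebody (or some script) has to do the comparison.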
== Stupid code is good code ==
I need to write a blog post about this one, but like Kernighan said "Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?" [1]. Looking at our code (mainly puppet at the moment), I think there are quite a few places where we do the smart thing, when stupid would be sufficient. That's probably the cost of hiring smart people ...
== A few random points ==
- We have an incredible amount of documentation. It is easy to read (I've been drawn into it and lost much time). It is also outdated in some places (documentation always is).
- so many different ways to deploy (puppet, trebuchet, salt, manual stuff, ...)
- I still have not found a global architecture schema (something like a high-level component or deployment diagram). But I have never seen any company having those...
Hi Guillaume,
Since I have a good case of insomnia and am reading emails between attempts to sleep:
As a "team practice" for remotes, I would suggest scheduling coffee breaks just like people do in person, but chat on Hangouts.
Don't even try to assimilate 100% of IRC traffic. And if IRC distracts you from Getting Things Done, minimize it and only look if pinged.
For some cultural context: Gerrit has its critics. Awhile back I believe that Isaara suggested burning it "in a huge fire".
Please write the blog post that you mentioned!
Glad to have you with us,
Pine
discovery mailing list discovery@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/discovery
Thanks for the feedback!
On Wed, Feb 17, 2016 at 11:39 AM, Pine W wiki.pine@gmail.com wrote:
Hi Guillaume,
Since I have a good case of insomnia and am reading emails between attempts to sleep:
As a "team practice" for remotes, I would suggest scheduling coffee breaks just like people do in person, but chat on Hangouts.
That sounds like a good idea, I might try that.
Don't even try to assimilate 100% of IRC traffic. And if IRC distracts you from Getting Things Done, minimize it and only look if pinged.
I'm not trying to assimilate 100% (I do not think anybody can assimilate that much!). But just understanding which 1% I should assimilate is already a challenge at the moment. This sounds like an NP-complete problem (though I do not have the formal proof yet).
For some cultural context: Gerrit has its critics. Awhile back I believe that Isaara suggested burning it "in a huge fire".
That's an issue of me not being used to this way of working more than anything else (at least I think). It's always fun to see new ways of doing things. Makes me challenge what I've been doing for so long...
On Feb 17, 2016 1:50 AM, "Guillaume Lederrey" glederrey@wikimedia.org wrote:
== Versioning ==
** my belief ** anything deployed must have a version number
** what happens at WMF **
- deployments on labs are pretty much free-form: cherry-pick whatever you want on puppetmaster
- deployments on prod seem to have version numbers, at least for mediawiki code; puppet code is deployed directly from the production branch
** comments ** Having clear version numbers implies a conscious decision to create a version, potentially with appropriate checks of its content and additional testing. It allows a clear separation between creating a version and promoting it to production. Not having versions everywhere allows for more flexibility and puts the responsibility for making the right choices more on the people than on the process. Probably a good thing if you have smart enough people (and WMF seems to have a pretty smart crowd).
Having a shared git repository on deployment-puppetmaster scares the hell out of me! I'm so used to preparing anything I want to push locally and then just applying a specific tag / version...
Puppet being unversioned certainly makes it different from the rest of our deployments. I think ops gets away with this by having relatively few people committing code. It also has to do with the careful nature of puppet deployments: puppet is typically deployed one patch at a time. I think this helps with understanding what just broke everything, rather than having a big release with many disparate changes.
When it comes to deployment-puppetmaster in labs, it is certainly an interesting, and scary, situation. I know RelEng has some plans to build separate staging and beta clusters, but I'm not sure how that plays into ideas about more isolated puppet test deployments.
== Cherry-picking ==
** my belief ** cherry-picking is bad and should be used only as a last resort solution
** what happens at WMF **
- cherry-picking is the norm
- this seems to be influenced by Gerrit, which promotes cherry-picking and single-commit patches
** comment ** I'm all for rewriting git history to make it more readable, to help tell the story of what is happening to the code. I think that branches and merges are a good tool for that. Cherry-picking fixes from one branch to the next leaves a lot of opportunities to forget one. Merging helps tell the story of "those are all the fixes done on branch X, I've applied them on branch X+1". Also, I'm not a huge fan of Gerrit's idea of a change being a single commit. Having a coherent change split into multiple phases makes sense to me (for example: 1) preliminary refactoring, 2) my actual work, 3) some cleanup I did along the way). I need to dig deeper into topic branches and how they integrate with Gerrit (yes, I am brand new to Gerrit).
It also seems that all this cherry picking creates much more flexibility (I can take any commit and apply it anywhere). Again, giving control back to the human and not to the tool.
You can still split a change into multiple patches: my recent updates to search metrics collection are split into 5 patches. The first two remove unused features. The third refactors the collection to prepare for additions, the fourth finally adds the new metrics, and the fifth adds a new feature that occurred to me while working on the other patches. Gerrit can make this annoying to update, though, as later patches need rebasing if earlier patches are changed due to code review. This doesn't need topic branches (although we can use those; git review <topic> will send to another branch): just make multiple commits and send them up for review, and Gerrit will track the dependency.
This is certainly different from how many places use git and takes some time to get used to, but I've found I like it more and more as time goes on.
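As a rough illustration of the dependency tracking and the rebasing annoyance, here is a small Python model of a patch series. It is not how Gerrit is implemented; `Patch`, `build_series`, `amend`, `needs_rebase`, and the patch names are all invented for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    name: str
    revision: int = 1           # bumped every time the patch is amended
    based_on_revision: int = 0  # parent revision this patch was built on


def build_series(names):
    """Create a chain where each patch depends on the one before it."""
    series = [Patch(name) for name in names]
    for child in series[1:]:
        child.based_on_revision = 1  # built on revision 1 of its parent
    return series


def amend(series, index):
    """Amend one patch after code review, producing a new revision of it."""
    series[index].revision += 1


def needs_rebase(series):
    """List patches whose parent (or any earlier ancestor) has changed."""
    stale = []
    ancestor_changed = False
    for parent, child in zip(series, series[1:]):
        if ancestor_changed or child.based_on_revision != parent.revision:
            ancestor_changed = True
            stale.append(child.name)
    return stale


series = build_series([
    "remove-unused-a",
    "remove-unused-b",
    "refactor-collection",
    "add-new-metrics",
    "add-extra-feature",
])
amend(series, 2)  # review feedback on the refactoring patch
print(needs_rebase(series))
```

Amending the third patch leaves the first two untouched but marks the fourth and fifth as needing a rebase, which is the annoying part described above.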
== Stupid code is good code ==
I need to write a blog post about this one, but like Kernighan said "Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?" [1]. Looking at our code (mainly puppet at the moment), I think there are quite a few places where we do the smart thing, when stupid would be sufficient. That's probably the cost of hiring smart people ...
We certainly suffer from this. The funny thing is that one reason we don't use many external puppet modules is that many of them try to be too smart about things. Feel free to open phabricator tasks about things that are too complex and loop in a few of the previous committers.
== A few random points ==
- We have an incredible amount of documentation. It is easy to read (I've been drawn into it and lost much time). It is also outdated in some places (documentation always is).
- so many different ways to deploy (puppet, trebuchet, salt, manual stuff, ...)
Certainly, we've suffered a bit from trying new things then learning they only fit one niche or another. Also you missed the other major deployment tool, scap :)
- I still have not found a global architecture schema (something like a high-level component or deployment diagram). But I have never seen any company having those...
Pretty sure one doesn't exist :(
[X-posting to ops as this discussion is relevant there too]
On Wed, Feb 17, 2016 at 5:53 PM, Erik Bernhardson ebernhardson@wikimedia.org wrote:
On Feb 17, 2016 1:50 AM, "Guillaume Lederrey" glederrey@wikimedia.org wrote:
Hello team!
== Versioning ==
** my belief ** anything deployed must have a version number
** what happens at WMF **
- deployments on labs are pretty much free-form: cherry-pick whatever you want on puppetmaster
- deployments on prod seem to have version numbers, at least for mediawiki code; puppet code is deployed directly from the production branch
** comments ** Having clear version numbers implies a conscious decision to create a version, potentially with appropriate checks of its content and additional testing. It allows a clear separation between creating a version and promoting it to production. Not having versions everywhere allows for more flexibility and puts the responsibility for making the right choices more on the people than on the process. Probably a good thing if you have smart enough people (and WMF seems to have a pretty smart crowd).
Having a shared git repository on deployment-puppetmaster scares the hell out of me! I'm so used to preparing anything I want to push locally and then just applying a specific tag / version...
Puppet being unversioned certainly makes it different from the rest of our deployments. I think ops gets away with this by having relatively few people committing code. It also has to do with the careful nature of puppet deployments: puppet is typically deployed one patch at a time. I think this helps with understanding what just broke everything, rather than having a big release with many disparate changes.
Puppet is _always_ deployed one patch at a time, except in very, very special cases. I do think that's a very good thing for operations, for a few reasons:
1) Minimize change risk/surface: given we're a very high traffic website with a mildly complex architecture, you can't realistically think you can validate a large set of changes without throwing live traffic at them. I've seen ops teams working with stricter change management strategies, and the risk of *big troubles* has always been higher.
2) Speed of deployment: we're a very small team for the amount of things we're doing in parallel. We can't seriously expect to keep up the pace with stricter change management (as in, deploying a new version of our puppet code N times a week after rigorous testing and picking the changes that make the cut).
3) Keeping changes independent: since the puppet repo is large and includes all of production, having changes to independent systems tied together is a recipe for disaster: rolling back one change would mean rolling back all of them, frustrating a lot of people and probably requiring coordination with other teams. You could just revert the affected change and make a new point release, but then I completely fail to see how having releases does us any good.
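Point 3 can be illustrated with a toy Python model. The change names and the two helper functions are invented, and this only shows the bookkeeping, not our actual tooling.

```python
def rollback_release(deployed_changes, release, bad_change):
    """Roll back a whole release: every change bundled with the bad one is lost."""
    if bad_change not in release:
        raise ValueError(f"{bad_change} is not part of this release")
    return [change for change in deployed_changes if change not in release]


def revert_single_change(deployed_changes, bad_change):
    """Revert one independently deployed change: the others stay in place."""
    return [change for change in deployed_changes if change != bad_change]


# Hypothetical, unrelated changes that happen to be live at the same time.
deployed = ["varnish-tuning", "elasticsearch-heap", "new-ntp-server"]
# With releases, all three would have shipped together in one bundle.
release = {"varnish-tuning", "elasticsearch-heap", "new-ntp-server"}

print(rollback_release(deployed, release, "elasticsearch-heap"))
print(revert_single_change(deployed, "elasticsearch-heap"))
```

Rolling back the bundle removes the two healthy changes along with the broken one, while the one-patch-at-a-time revert leaves them deployed.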
About cherry-picks in beta: the problem is not cherry-picking (I think it's a reasonable way to test things) but persistent cherry-picking to monkey patch problems is. I think if we follow the flow of:
- writing a patch
- testing it on beta with a cherry-pick
- getting it merged on ops/puppet and production
and all of this happens within a week, it would be a decent compromise.
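To show what "within a week" could mean in practice, here is a minimal Python sketch that flags cherry-picks that have lingered too long. The patch names, the dates, and the `stale_cherry_picks` helper are assumptions for illustration; no such check exists as far as this thread says.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=7)  # the "within a week" compromise


def stale_cherry_picks(cherry_picks, today):
    """Return names of cherry-picks applied more than MAX_AGE before `today`."""
    return [
        name
        for name, applied_on in cherry_picks.items()
        if today - applied_on > MAX_AGE
    ]


# Hypothetical state of a beta puppetmaster: patch name -> date it was cherry-picked.
cherry_picks = {
    "fix-logstash-filter": date(2016, 2, 16),
    "temporary-nginx-hack": date(2016, 1, 4),
}

print(stale_cherry_picks(cherry_picks, today=date(2016, 2, 18)))
```

Only the January cherry-pick is reported here: it has become a persistent monkey patch, while the two-day-old one is still a reasonable test.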
- I still have not found a global architecture schema (something like a high-level component or deployment diagram). But I have never seen any company having those...
Pretty sure one doesn't exist :(
Luca (the new analytics opsen) has started to work on https://wikitech.wikimedia.org/wiki/File:Infrastructure_overview.png
I asked him to share the sources for it so that everyone can improve it.
Also, if you need some oral history, just ask opsens and we'll be happy to give you an overview of how things work :)
Cheers,
Giuseppe
And here is the post I wanted to write about not being smart: https://slashdevslashrandom.wordpress.com/2016/02/19/on-the-importance-of-no.... Or at least a post that is vaguely related to what I had in mind (funny how words always sound better and much more clear when they are still inside my own head...)
On Thu, Feb 18, 2016 at 10:37 AM, Giuseppe Lavagetto glavagetto@wikimedia.org wrote: