Let's also take this into a new thread. There are a lot of different conversations now going on....
On Fri, Mar 7, 2014 at 9:21 AM, Brad Jorsch (Anomie) bjorsch@wikimedia.org wrote:
On Fri, Mar 7, 2014 at 12:08 PM, C. Scott Ananian cananian@wikimedia.orgwrote:
I agree. I think a better technical solution would be to halt Jenkins' auto-merge for the 24-hour period, so that +2'd changes are not automatically merged until after the branch is cut.
I don't see how that's any better. Things still aren't getting merged.
If anything, the "cut using master@{24 hours ago}" idea is much better.[1] Although it might be worth checking whether Wednesday tends to be a relatively active bug-fixing day, as the community on non-Wikipedia sites finds issues in the version that was deployed to them on Tuesday; if so, keeping those fixes out of the new cut on Thursday (and so requiring more backports, or waiting an extra week for fixes) might not be so great.
[1]: And yes, 'master@{24 hours ago}' is valid git syntax.
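For illustration only, a minimal Python sketch of what a cut using that ref might look like; the branch name is hypothetical, and note that the @{<date>} form is resolved against the local reflog of master, so the machine doing the cut needs an up-to-date history:

    import subprocess

    def cut_branch(branch_name, age="24 hours ago"):
        """Create a release branch from master as it was `age` ago."""
        ref = "master@{%s}" % age
        # Resolve the dated ref first so a missing reflog entry fails loudly.
        sha = subprocess.check_output(
            ["git", "rev-parse", "--verify", ref]).decode().strip()
        subprocess.check_call(["git", "branch", branch_name, sha])
        return sha

    # Hypothetical weekly cut:
    # cut_branch("wmf/1.23wmf18")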
-- Brad Jorsch (Anomie) Software Engineer Wikimedia Foundation
On Fri, Mar 7, 2014 at 10:30 AM, Jon Robson jdlrobson@gmail.com wrote:
Let's also take this into a new thread. There are a lot of different conversations now going on....
In another project in another place with another team... I solved a very similar problem by creating release branches from tags that the testing system placed on revisions that had passed the integration test suite, rather than from whatever HEAD happened to be at the time the branch was cut. We had a very heavy (~2hr wall clock time; ~24hr CPU time) integration test suite that ran once a day. When the master job that kicked off these tests found that no child tests had failed, it would tag the revisions that had been tested across all of the involved repositories with something like 'integration-YYYYMMDD'. Our release branch was forked from these tags. In the MW situation it might be nicer to run such test processes more often than once a day, but the fundamental idea could/should work.
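A rough sketch of that tagging step, assuming a hypothetical results store mapping each repository to its tested revision and outcome (the names are illustrative, not from the actual setup described above):

    import subprocess
    from datetime import date

    def tag_if_green(results):
        """Tag the tested revision in each repo once every child test job passes.

        `results` maps repo path -> (tested_sha, passed); in the real setup this
        would be collected by the master job that kicks off the test suite.
        """
        if not all(passed for _, passed in results.values()):
            return None  # at least one child job failed; no tag today
        tag = "integration-" + date.today().strftime("%Y%m%d")
        for repo, (sha, _) in results.items():
            subprocess.check_call(["git", "-C", repo, "tag", tag, sha])
            subprocess.check_call(["git", "-C", repo, "push", "origin", tag])
        return tag

    # The release branch is then forked from the newest integration-* tag
    # rather than from whatever HEAD happens to be at cut time.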
This type of gated promotion process won't stop all problems from getting to the next stage, but it might make the inclusion of code into the weekly branch slightly safer. It also could be seen as stepping stone to further automation that could decrease the time between production minor version deploys. Someday we might be able to deploy "continuously" to some set of wikis where continuously doesn't mean every individual commit but every time the integration test suite says that things are stable.
Bryan
<quote name="Jon Robson" date="2014-03-07" time="09:30:09 -0800">
Let's also take this into a new thread. There are a lot of different conversations now going on....
My opinion is that fixing this with policy is going to be hard.
Either everyone who commits needs to be mindful of what day/time it is and whether or not another human has cut the new branch yet (the timing isn't set in stone; it varies by a couple of hours depending on a lot of factors), OR we modify the branch cut based on some arbitrary offset (24 hours ago), or some human looks at the merges and picks a point.
None of those are ideal/scalable.
What we should do, however, is have a true "deployment pipeline". Briefly defined: A deployment pipeline is a sequence of events that increase your confidence in the quality of any particular build/commit point.
A typical example is: commit -> unit tests -> integration tests -> manual tests -> release
Each step has the ability to fail a build, which means "You shall not pass!" for that commit point. The earlier you get a "You shall not pass!" the better, because it means developers spend less time waiting to find out whether what they committed is OK.
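As a toy illustration of that fail-fast idea (the stage names follow the example above; nothing here corresponds to an existing WMF tool):

    def run_pipeline(commit, stages):
        """Run a commit through ordered stages, stopping at the first failure."""
        for name, check in stages:
            if not check(commit):
                return "You shall not pass! (%s failed %s)" % (commit, name)
        return "%s is releasable" % commit

    # Hypothetical usage:
    # stages = [("unit tests", run_unit_tests),
    #           ("integration tests", run_integration_tests),
    #           ("manual tests", manual_sign_off)]
    # print(run_pipeline("abc123", stages))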
What this means for us: The Mobile team is actually a good example. They are doing The Right Thing and have a lot of tests written, including browser tests. They run into problems when, e.g., they write a new feature and its associated test and commit them together.
Beta Cluster gets that code (feature and test) within 5 minutes.
But, test.wikipedia and en.wikipedia get that feature much later, days later.
However, the test code is run by Jenkins across all environments (Beta Cluster, test.wikipedia, en.wikipedia, etc.) all the time. So the mobile team gets a ton of false positives when their new test runs against, e.g., production, where the feature is deliberately not enabled yet.
The QA team is working on this problem now (loosely termed the "versioned test problem").
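One possible shape of a fix, sketched with made-up names: each browser test declares the branch it needs, and the runner skips it on environments that have not caught up yet. This is not the QA team's actual design, just an illustration of the idea.

    import os
    import unittest

    # Hypothetical map of which branch each target environment is running.
    DEPLOYED = {
        "beta": "1.23wmf18",
        "test.wikipedia": "1.23wmf17",
        "en.wikipedia": "1.23wmf16",
    }

    def needs_version(min_version):
        """Skip a test on environments still running an older branch."""
        env = os.environ.get("TARGET_ENV", "beta")
        # Naive string comparison; fine for a sketch, not for real version sorting.
        if DEPLOYED.get(env, "") < min_version:
            return unittest.skip("%s is not yet on %s" % (env, min_version))
        return lambda test: test

    class NewMobileFeatureTest(unittest.TestCase):
        @needs_version("1.23wmf18")
        def test_feature_is_visible(self):
            pass  # browser-level assertions would go here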
How a pipeline would help:
Really, a pipeline isn't a thing like your indoor plumbing but more of a mindset/way of designing your test infrastructure. But it means that you keep things self-contained (contrary to the mobile example above) and that things progress through the pipeline at a predictable pace.
It also means that each code commit spends the exact same amount of time in the various stages as other code commits. Right now some code sits on Beta Cluster for 7 days before hitting production, whereas other code spends 0-5 minutes. That's not good.
Wanna help us on this problem? We're hiring: https://hire.jobvite.com/Jobvite/jobvite.aspx?b=nHZ0zmw6 (2 job openings)
Greg
I feel like I should probably post here about the current Wikibase / Wikidata deployment pipeline too, which probably differs slightly from other products'.
On a per-commit basis: a commit is made and the unit tests run on Jenkins; the commit is reviewed, amended, and merged, with Jenkins running the unit tests again as the gate-submit check; Travis also runs the unit tests post-merge against PHP 5.3, 5.4, and 5.5 with both SQLite and MySQL.
Daily at 10AM UTC: our build process is triggered and creates a build for Wikidata. Both the WMF Jenkins and our WMDE Jenkins run the unit tests; if both pass, we +2 CR and the build is merged. It is then deployed straight to beta, where our Jenkins runs all of our Selenium tests and reports their outcome back to the build's Gerrit commit.
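Purely as an illustration of that dual-Jenkins gate (the hosts and job name below are invented placeholders, and the surrounding tooling is only hinted at in comments):

    import json
    from urllib.request import urlopen

    def last_build_passed(jenkins_url, job):
        """Check the result of a job's last build via Jenkins' JSON API."""
        with urlopen("%s/job/%s/lastBuild/api/json" % (jenkins_url, job)) as resp:
            return json.load(resp).get("result") == "SUCCESS"

    # Hypothetical driver: only when both instances report success would the
    # build commit get +2'd in Gerrit and pushed to beta for the Selenium runs.
    def daily_build_ready():
        return (last_build_passed("https://wmf-jenkins.example.org", "wikidata-build")
                and last_build_passed("https://wmde-jenkins.example.org", "wikidata-build"))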
Branch day (every 2 weeks): Tuesday is generally branch day. We branch the Wikidata repo, all of the unit tests and Selenium tests are re-run, and we deploy to test.wikidata and manually test on Thursday, then deploy to wikidata.org the following Tuesday.
We have very good test coverage which generally makes everything much easier! I probably missed something of interest above but generally everything is covered.
Addshore
One thing I didn't mention is that we recently marked a few of our Selenium tests as smoke tests, which I just spotted was suggested for core in the other email thread!
Great idea!
Addshore
On 03/07/2014 10:08 AM, Greg Grossmeier wrote:
What we should do, however, is have a true "deployment pipeline". Briefly defined: A deployment pipeline is a sequence of events that increase your confidence in the quality of any particular build/commit point.
A typical example is: commit -> unit tests -> integration tests -> manual tests -> release
This is pretty much the way this currently works in Parsoid. We deploy twice per week, with integration tests currently being our mass round-trip testing setup on 160k pages. Those tests take a few hours to run, so we only deploy revisions for which round-trip testing has finished. Anything uncovered there is fed back into improvements in parser tests, so over time it has become less common to catch regressions only in round-trip tests. With improved integration tests, manual testing should also mostly be eliminated over time.
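A sketch of that "only deploy revisions whose round-trip run has finished" rule, with an invented shape for the result records:

    def pick_deploy_candidate(revisions, rt_results):
        """Return the newest revision whose round-trip run finished without
        new regressions. `revisions` is ordered oldest to newest."""
        for rev in reversed(revisions):
            result = rt_results.get(rev, {})
            if result.get("finished") and result.get("regressions", 1) == 0:
                return rev
        return None

    # pick_deploy_candidate(
    #     ["a1b2", "c3d4", "e5f6"],
    #     {"a1b2": {"finished": True, "regressions": 0},
    #      "c3d4": {"finished": True, "regressions": 3},
    #      "e5f6": {"finished": False}})
    # -> "a1b2"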
Really, a pipeline isn't a thing like your indoor plumbing but more of a mindset/way of designing your test infrastructure. But it means that you keep things self-contained (contrary to the mobile example above) and that things progress through the pipeline at a predictable pace.
This is one of the big arguments for narrow interfaces and services.
In Parsoid we have small mock implementations of the MediaWiki API endpoints we use, which allow us to run parser tests without a wiki in the background. Network services tend to be at a medium granularity (coarser than modules, finer than the entire system) with necessarily narrow interfaces. Doing much of the testing at this level often seems to strike a good balance between effort, run time (still suitable for CI), and capturing the interface behavior essential to users of the service.
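Parsoid's real mocks are JavaScript and live alongside its test suite; as a language-neutral illustration of the idea, a tiny fake of one MediaWiki API query could look like this Python stand-in (the canned wikitext and the single handled query are invented):

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import parse_qs, urlparse

    # Canned wikitext keyed by title; a real mock would cover every API module
    # the parser actually calls (siteinfo, revisions, templates, ...).
    PAGES = {"Main_Page": "Hello, ''world''!"}

    class FakeMediaWikiAPI(BaseHTTPRequestHandler):
        def do_GET(self):
            params = parse_qs(urlparse(self.path).query)
            title = params.get("titles", ["Main_Page"])[0]
            body = json.dumps({"query": {"pages": {"1": {
                "title": title,
                "revisions": [{"*": PAGES.get(title, "")}],
            }}}}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        # Parser tests can then point at http://localhost:8080/api.php
        HTTPServer(("localhost", 8080), FakeMediaWikiAPI).serve_forever()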
Gabriel