On Tue, Oct 27, 2015 at 8:02 AM, Risker <risker.wp(a)gmail.com> wrote:
> The incident report does not go far enough back into the history of the
> incident. It does not explain how this code managed to get into the
> deployment chain with a fatal error in it.
Actually, it does. Erik writes "This occurred because the patch for the
CirrusSearch repository that removed the schema should have been deployed
before the change that adds it to the WikimediaEvents repository."
In other words, there was nothing wrong with the code itself. The problem
was that the multiple pieces of the change needed to be deployed in a
particular order during the manual backporting process, but they were not
deployed in that order.
If this had waited for the train deployment, both pieces would have been
done simultaneously and it wouldn't have been an issue, just as it wasn't
an issue when these changes were done in master and automatically deployed
to Beta Labs.
--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation