This happened as scheduled. The only notable problem was that we ran over time because we started later than expected: both restarting the servers and stopping the event writing needed more preparation than anticipated.

* The eventlogging event insertion process was stopped between 13:57 and 14:32. Events then started flowing in again with no notable issues:
https://phab.wmfusercontent.org/file/data/dhmuljzblggo2pp6r6o5/PHID-FILE-w6ko74r3cngqfd6cfhfy/Screenshot_from_2016-12-09_17%3A05%3A52.png

master [log]> SELECT now(), max(timestamp) FROM NavigationTiming_15485142;
+---------------------+----------------+
| now()               | max(timestamp) |
+---------------------+----------------+
| 2016-12-09 16:10:23 | 20161209160757 |
+---------------------+----------------+

replica [log]> SELECT now(), max(timestamp) FROM NavigationTiming_15485142;
+---------------------+----------------+
| now()               | max(timestamp) |
+---------------------+----------------+
| 2016-12-09 16:10:33 | 20161209160614 |
+---------------------+----------------+
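As a rough illustration of how to read the two results above, the gap between the newest event on the master and on the replica can be computed from the max(timestamp) values. This is only a sketch: the helper name seconds_behind is made up for this example, and since the two queries were run about ten seconds apart, the result is approximate.

```python
from datetime import datetime

# Layout of the max(timestamp) values shown above (YYYYMMDDHHMMSS).
MW_FORMAT = "%Y%m%d%H%M%S"


def seconds_behind(master_ts: str, replica_ts: str) -> int:
    """Return how many seconds the replica's newest event trails the master's.

    Hypothetical helper for illustration; not part of any existing tooling.
    """
    master = datetime.strptime(master_ts, MW_FORMAT)
    replica = datetime.strptime(replica_ts, MW_FORMAT)
    return int((master - replica).total_seconds())


# Values copied from the two query results above.
lag = seconds_behind("20161209160757", "20161209160614")
print(lag)  # 103 seconds, i.e. 1 minute 43 seconds
```

The same comparison can be repeated later to check whether the replica is catching up.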

All dates given are UTC.

* db1047 "analytics replica" was unavailable between 14:41:45 and 14:55:53
* dbstore1002 "analytics store" was unavailable between 14:49:54 and 15:05:28
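
For reference, the length of each unavailability window listed above can be worked out with a few lines of Python. This is only a sketch: the function name outage_seconds and the windows dictionary are invented for this example, and it assumes both times fall on the same UTC day.

```python
from datetime import datetime


def outage_seconds(start: str, end: str) -> int:
    """Return the length in seconds of a same-day HH:MM:SS window."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Start and end times (UTC) copied from the list above.
windows = {
    "db1047": ("14:41:45", "14:55:53"),
    "dbstore1002": ("14:49:54", "15:05:28"),
}

durations = {host: outage_seconds(*times) for host, times in windows.items()}
for host, seconds in durations.items():
    minutes, rest = divmod(seconds, 60)
    print(f"{host}: {minutes}m{rest:02d}s")
```

This gives roughly 14 minutes for db1047 and 15 and a half minutes for dbstore1002.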

Some read-only periods extended beyond the times mentioned, to avoid sending writes to the wrong server. All servers are back in read-write mode. It may take a few minutes for all data to be synced back to the slaves, as MySQL tends to be a bit slower after a restart until its buffers are warmed up; I have created some extra sync processes to speed up the catch-up.

These short periods of unavailability were caused by restarts needed to apply security updates and certificate renewals.

Please contact me if you have any further questions or concerns.


On Fri, Dec 9, 2016 at 1:39 PM, Jaime Crespo <jcrespo@wikimedia.org> wrote:
This will happen today (Friday, December 9, 2016) in about one hour, at 13:30 UTC, and it will take about another hour to restart all servers, during which some queries may be temporarily unavailable. No data will be lost for eventlogging, because insertion will be temporarily stopped with the kind help of Joseph.

I will report back when the work has finished and everything is back to normal.

--
Jaime Crespo
<http://wikimedia.org>


