Not sure if this factors in, but Analytics will be having an offsite in NYC
the week of Sept 16 - 22. Luca and I will be less available during that
week.
On Mon, Jul 30, 2018 at 3:23 PM Alexandros Kosiaris <akosiaris(a)wikimedia.org>
wrote:
> Hello everyone,
>
> I hope I have included all the relevant teams; please add anyone I
> might have forgotten.
>
> As you probably know, Ops has decided to perform a datacenter
> switchover this quarter. It's already close to 1.5 years[1] since the
> last one, and we've been wanting to do roughly one per year. Our
> tracking task is https://phabricator.wikimedia.org/T199073 and work
> has already started in various sub areas. One thing that we need to
> decide on is the actual dates. Having looked at the various
> possibilities and the work that needs to be done up to that point, we
> have come to the conclusion that the candidate weeks are
>
> * 17-21 Sept 2018
> * 24-28 Sept 2018
>
> for the switchover and
>
> * 08-12 Oct 2018
> * 15-19 Oct 2018
>
> for the switch back.
>
> Keep in mind that this time around we want to do at least 3 weeks, 1
> week more than the previous switchover, so if we do choose the week of
> 24-28 Sept for the switchover, we have to do the week of 15-19 Oct for
> the switchback.
>
> We also need to decide on the actual time of day, of course, so this
> is your invitation to a scheduling meeting to discuss that.
>
> Remembering the previous switchover's scheduling meeting, I think it
> would help to have a proposal that each team can discuss individually
> before the large meeting.
>
> So here's a proposal that just copies the previous
> switchover/switchback schedule [1]
>
> Switchover:
>
> Services: Tuesday, September 18th 2018 14:30 UTC
> Media storage/Swift: Tuesday, September 18th 2018 15:00 UTC
> Traffic: Tuesday, September 18th 2018 19:00 UTC
> MediaWiki: Wednesday, September 19th 2018 14:00 UTC
>
> Switchback:
>
> Traffic: Tuesday, October 9th 2018 19:00 UTC (and maybe some prep work
> on Monday)
> MediaWiki: Wednesday, October 10th 2018 14:00 UTC
> Services: Thursday, October 11th 2018 14:30 UTC
> Media storage/Swift: Thursday, October 11th 2018 15:00 UTC
>
> Now, as for the meeting date: Google suggested August 1st, 14:00 UTC,
> as the time this week with the fewest conflicts, so here it is.
>
> [1] https://wikitech.wikimedia.org/wiki/Switch_Datacenter
>
> Regards,
>
> --
> Alexandros Kosiaris <akosiaris(a)wikimedia.org>
>
> _______________________________________________
> Ops-private mailing list
> Ops-private(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/ops-private
>
[Adding some other mailing lists in Cc]
Hi everybody,
as many of you probably noticed yesterday on the operations@ mailing list,
we had an outage of the Kafka Main eqiad cluster that forced us to switch
the Eventbus and Eventstreams services to codfw.
All the precise timings will be listed in
https://wikitech.wikimedia.org/wiki/Incident_documentation/20180711-kafka-e…,
but for a quick glimpse:
2018-07-11 17:00 UTC - Eventbus service switched to codfw
2018-07-11 18:44 UTC - Eventstreams service switched to codfw
We are going to switch those services back to eqiad during the next couple
of hours. Consumers of the Eventstreams service may see some failures or
data drops; apologies in advance for the trouble.
Cheers,
Luca
On Thu, Jul 12, 2018 at 12:00 AM Luca Toscano <ltoscano(a)wikimedia.org>
wrote:
> Hi everybody,
>
> as you might have seen in the operations channel on IRC, the Kafka Main
> Eqiad cluster (kafka100[1-3].eqiad.wmnet) suffered a long outage due to new
> topics pushed out with names that were too long (causing fs operation
> issues, etc.). I'll update this email thread tomorrow EU time with more
> details, tasks, the precise root cause, etc., but the important bit to know
> is that Eventbus and Eventstreams have been failed over to the Kafka Main
> Codfw cluster. This should be transparent to everybody, but please let us
> know if it is not.
>
> Thanks for the patience!
>
> (a very sleepy :) Luca
>
>