Hi all,
tl;dr
On Monday August 6 we are making EventStreams multi-DC, and this should be
transparent to users.
Due to a recent outage
<https://wikitech.wikimedia.org/wiki/Incident_documentation/20180711-kafka-e…>
of our main eqiad Kafka cluster, we want to make the EventStreams
service support multiple datacenters for higher availability. To do
so, we need to hide the Kafka cluster message offsets from the
SSE/EventSource clients. On Monday August 6th, we will deploy a change to
EventStreams that will make it use message timestamps instead of message
offsets in the SSE/EventSource id field that is returned for every received
message. This will allow EventStreams to be backed by any Kafka cluster,
with automatic resumption on reconnect based on timestamps instead of
cluster-specific logical offsets.
This deployment should be transparent to clients. SSE/EventSource clients
will reconnect automatically and begin to use timestamps instead of offsets
in the Last-Event-ID.
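To illustrate why timestamps make the resume position portable across clusters, here is a minimal sketch. It is not the EventStreams implementation: the two in-memory logs, the tuple layout, and the function names are all hypothetical, and it assumes the same events carry the same timestamps in both clusters while their offsets differ. The "strictly newer than" resume rule is also an assumption of this sketch.

```python
from bisect import bisect_right

# Hypothetical logs for two clusters holding the same events.
# Each entry is (offset, timestamp_ms, payload). Offsets are local to a
# cluster, so they differ; timestamps are assumed to match.
EQIAD = [(100, 1000, "a"), (101, 2000, "b"), (102, 3000, "c")]
CODFW = [(7, 1000, "a"), (8, 2000, "b"), (9, 3000, "c")]


def resume_by_timestamp(log, last_timestamp_ms):
    """Return messages strictly newer than last_timestamp_ms.

    Assumes the log is sorted by timestamp.
    """
    timestamps = [ts for _, ts, _ in log]
    start = bisect_right(timestamps, last_timestamp_ms)
    return log[start:]


def resume_by_offset(log, last_offset):
    """Return messages whose cluster-local offset is greater than last_offset."""
    return [msg for msg in log if msg[0] > last_offset]


if __name__ == "__main__":
    # A client last saw message "b" while connected to the first cluster.
    print(resume_by_timestamp(EQIAD, 2000))
    print(resume_by_timestamp(CODFW, 2000))
    # Reusing the first cluster's offset against the second cluster
    # does not point at the same position in the stream.
    print(resume_by_offset(CODFW, 101))
```

In this sketch, resuming by timestamp yields the same next message from either log, while an offset taken from one log is meaningless against the other.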
You can read more about this work here:
https://phabricator.wikimedia.org/T199433
- Andrew Otto, Systems Engineer, WMF
Not sure if this factors in, but Analytics will be having an offsite in NYC
the week of Sept 16 - 22. Luca and I will be less available during that
week.
On Mon, Jul 30, 2018 at 3:23 PM Alexandros Kosiaris <akosiaris(a)wikimedia.org>
wrote:
> Hello everyone,
>
> I hope I have included all the relevant teams; please add anyone I
> might have forgotten.
>
> As you probably know, Ops has decided to perform a datacenter
> switchover this quarter. It's already close to 1.5 years[1] since the
> last one and we've been wanting to do closer to 1 per year. Our
> tracking task is https://phabricator.wikimedia.org/T199073 and work
> has already started in various sub areas. One thing that we need to
> decide on is the actual dates. Having looked at the various
> possibilities and the work that needs to be done up to that point, we
> have come to the conclusion that the weeks of
>
> * 17-21 Sept 2018
> * 24-28 Sept 2018
>
> for the switchover and
>
> * 08-12 Oct 2018
> * 15-19 Oct 2018
>
> for the switch back.
>
> Keep in mind that this time around we want to stay switched over for at
> least 3 weeks, 1 week more than the previous switchover, so if we do
> choose the week of 24-28 Sept for the switchover, we have to do the week
> of 15-19 Oct for the switchback.
>
> We also need to decide on the actual time of day, of course. So
> this is your invitation to a scheduling meeting to discuss that.
>
> Remembering the previous switchover's scheduling meeting, I think that
> coming up with a proposal that can be discussed individually/per team
> before the large meeting would be beneficial.
>
> So here's a proposal that just copies the previous
> switchover/switchback schedule [1]
>
> Switchover:
>
> Services: Tuesday, September 18th 2018 14:30 UTC
> Media storage/Swift: Tuesday, September 18th 2018 15:00 UTC
> Traffic: Tuesday, September 18th 2018 19:00 UTC
> Mediawiki: Wednesday, September 19th 2018 14:00 UTC
>
> Switchback:
>
> Traffic: Tuesday, October 9th 2018 19:00 UTC (and maybe some prep work
> on Monday)
> Mediawiki: Wednesday, October 10th 2018 14:00 UTC
> Services: Thursday, October 11th 2018 14:30 UTC
> Media storage/Swift: Thursday, October 11th 2018 15:00 UTC
>
> Now as for the meeting date, Google suggested August 1st at 14:00 UTC
> as the time this week with the fewest conflicts, so here it
> is.
>
> [1] https://wikitech.wikimedia.org/wiki/Switch_Datacenter
>
> Regards,
>
> --
> Alexandros Kosiaris <akosiaris(a)wikimedia.org>
>
> _______________________________________________
> Ops-private mailing list
> Ops-private(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/ops-private
>