Hello,
There will be a couple of brief interruptions to some of the Data Platform
services this Wednesday and Thursday, as we are supporting SRE
Infrastructure Foundations with some of their work to upgrade the
network switches in T348977 <https://phabricator.wikimedia.org/T348977>.
Specifically, we need to perform a role swap of our two Analytics_Meta
<https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/Analytics_Meta>
database servers, which serve Hive, Druid, DataHub and Superset. The
roles will be swapped on Wednesday at around 10:00 UTC and swapped back
on Thursday at around 10:00 UTC. On each occasion, there will be a brief
period where the databases are made read-only, while the replication
roles are swapped and the application configuration is updated. This may
result in errors if you are actively using any of the applications at
the time.
In order to minimize the chance of data processing errors, I will also
be pausing ingestion to the data lake around 1 hour before each role
swap, so that data pipelines do not try to write to Hive or ingest to
Druid. Therefore, you may also notice a delay for data to arrive in HDFS,
Hive, and Druid, but this shouldn't be more than an hour or so.
If you have any queries or concerns, please don't hesitate to let us
know by replying to this email, or on #data-engineering-collab on Slack, or
#wikimedia-analytics on IRC.
Kind regards,
Ben
--
*Ben Tullis* (he/him)
Senior Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>