Hi Everyone,
The Data Engineering team is upgrading to Spark 3 and will no longer be
supporting Spark 2 jobs on the Hadoop cluster after March 31st, 2023. If
your team owns Spark 2 jobs in production, please plan for the time needed
to upgrade your jobs. For all future work use Spark 3.
You can find more information about the upgrade at:
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Spark/Migrati….
Please add any missing jobs to the migration list on that page. If you need
help from the Data Engineering team, you can reach out to Jackeline Argüello
<jarguello-ctr(a)wikimedia.org> or join us for the Data Engineering office
hours.
--
*Olja Dimitrijevic* (she/her)
Director of Data Engineering
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello,
I need to schedule a maintenance window to reboot our three dbstore
servers in order to pick up a new kernel version. These servers are
dbstore1003, dbstore1005, and dbstore1007.
Together these servers host the analytics-mysql service:
https://wikitech.wikimedia.org/wiki/Analytics/Systems/MariaDB
Ideally, I would like to reboot all three of these tomorrow, *July the 7th
between 09:00 UTC and 10:00 UTC*.
During this one-hour maintenance window, access to the various sections
and shards will be intermittent, as the three servers hosting them are
rebooted.
https://wikitech.wikimedia.org/wiki/MariaDB#Sections_and_shards
Note that this maintenance does not affect the Wikireplica databases,
available to Toolforge and Cloud Services:
https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database
Please let me know if this maintenance window is too soon and would
cause you inconvenience.
If this is the case, I will look to push back the date of
the reboots to accommodate your needs.
Likewise, if you have any questions, please don't hesitate to let me know.
Kind regards,
Ben
--
*Ben Tullis* (he/him)
Senior Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello!
We will be upgrading Presto to version 0.273.3 on Wednesday July 6th. This
will require a restart of the Presto cluster. Any running queries may be
interrupted, but we don't expect any issues aside from that.
If you notice any new problems with Superset dashboards that use Presto
after we do this upgrade, please let us know. Rolling back is relatively
easy.
We will post here again once the upgrade is complete.
You can follow along at https://phabricator.wikimedia.org/T311525
-Andrew Otto & WMF Data Engineering
Good morning,
I have to find a convenient time to reboot a key server (an-coord1001)
which supports the analytics services.
Unfortunately, although we have a standby server, the process for
switching the two servers' roles is somewhat arduous, so the pragmatic
option is to schedule a brief period of downtime for the affected
services while the primary server is rebooted. These services are:
Hive, Superset, Oozie, Presto, and DataHub.
The outage should last for less than 10 minutes and I propose to carry
out this maintenance at 09:00 UTC tomorrow, May 5th.
Please do let me know if this is going to impact you negatively and I
will try to find another maintenance window. If you have any other
queries or concerns, please don't hesitate to get in touch.
Thanks and best wishes,
Ben
--
*Ben Tullis* (he/him)
Senior Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi all,
In order to upgrade the kernels of various analytics hosts, we have to
reboot the machines, which will make several analytics clients temporarily
unavailable.
The maintenance will take place on Friday at 17:00-19:00 UTC (10am-12pm Pacific).
While this is happening, the following hosts and services will be
temporarily unavailable for a few minutes at a time:
- stat machines (stat1004.eqiad.wmnet etc.)
- Superset
- Turnilo
- Hadoop UI, Druid UI
If you are planning a long-running query that will overlap with that time,
let me know as soon as possible and we'll find a resolution. Respond to
this email or come visit us on IRC at #wikimedia-analytics on libera.chat.
Regards,
Razzi
Hi analysts and friends,
The following database hosts will be offline starting at 15:00 UTC on
Tuesday April 5, for approximately 3 hours, for an operating system upgrade.
- dbstore1003
- dbstore1005
- dbstore1007
These are the private full replica analytics MariaDB hosts (
https://wikitech.wikimedia.org/wiki/Analytics/Systems/MariaDB) that can be
accessed from the stat boxes via analytics-mysql and Jupyter.
Read about the upgrade here: https://phabricator.wikimedia.org/T299481
Respond to this email, comment on the ticket, or come say hi at
#wikimedia-analytics if you have any concerns.
Regards,
Razzi & Data Engineering
Hello,
I need to find a time to reboot two of our analytics explorer (aka stat)
servers in order to pick up a new kernel version.
These servers are the two that have the AMD GPUs in them, namely
stat1005 and stat1008.
Ideally I would like to reboot both of these this *Friday March the
24th, between 10:00 UTC and 10:30 UTC*.
Please let me know if this maintenance window is too soon and would
cause you inconvenience.
If this is the case, I will look to push back the date of
the reboots to accommodate your needs.
Kind regards,
Ben
--
*Ben Tullis* (he/him)
Senior Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>