- Analytics-announce - lists.wikimedia.org

Major upgrade to Wmfdata-Python
by Neil Shah-Quinn 22 Nov '22

Hello y'all!

Tomorrow or next week, we will release *version 2.0 of the Wmfdata-Python library* for accessing data in the internal Wikimedia analytics cluster. If you import Wmfdata, you will see a message asking you to update.

Wmfdata 2.0 has lots of improvements, but as a new major version, it also has some *breaking changes*, which means your existing code may need some changes to run properly. These changes are pretty simple, but still, don't update if you are working to an urgent deadline!

For more information on what is changing, see the change log <https://github.com/wikimedia/wmfdata-python/blob/main/CHANGELOG.md>. The quickstart notebook <https://github.com/wikimedia/wmfdata-python/blob/main/docs/quickstart.ipynb> has also been massively improved so it gives a comprehensive introduction to Wmfdata's features.

As always, let me know if you have any questions!

-----
Neil Shah-Quinn
senior data scientist, Product Analytics <https://www.mediawiki.org/wiki/Product_Analytics>
Wikimedia Foundation <https://wikimediafoundation.org/>
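One way to tell whether an update crosses a major version (and so may include breaking changes) is to compare the leading version component. The following is a minimal sketch, not part of Wmfdata itself: the helper names `major_version`, `is_breaking_upgrade`, and `installed_wmfdata_version` are hypothetical, and it assumes the package is installed under the distribution name `wmfdata` and uses dotted version strings such as `2.0.0`.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.split(".")[0])


def is_breaking_upgrade(installed: str, target: str) -> bool:
    """True when moving from `installed` to `target` crosses a major version."""
    return major_version(target) > major_version(installed)


def installed_wmfdata_version():
    """Look up the installed version, or None if the package is absent.

    Assumption: the distribution is registered under the name "wmfdata".
    """
    try:
        return metadata.version("wmfdata")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_wmfdata_version()
    if current is None:
        print("wmfdata is not installed in this environment")
    elif is_breaking_upgrade(current, "2.0.0"):
        print(f"wmfdata {current} -> 2.0.0 is a major upgrade; read the change log first")
    else:
        print(f"wmfdata {current} is already on the 2.x series or later")
```

If the check reports a major upgrade, the change log linked above is the place to look before updating.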

Scheduled maintenance for stat1005 and stat1008 - Proposed Tuesday 15th Nov 2022 at 06:00 UTC
by Steve Munene 14 Nov '22

Hello,

I am looking for a time to reboot two of our analytics explorer (stat) servers for a kernel upgrade: stat1005.eqiad.wmnet and stat1008.eqiad.wmnet.

I would like to handle the reboots on *Tuesday 15th November 2022 between 06:00 UTC and 06:30 UTC*.

Kindly let me know if a maintenance window within these times would cause an inconvenience, and I can push back the reboots to accommodate your needs.

--
Best,
Steve Munene

Migration to Spark 3
by Olja Dimitrijevic 11 Nov '22

Hi Everyone,

The Data Engineering team is upgrading to Spark 3 and will no longer be supporting Spark 2 jobs on the Hadoop cluster after March 31st, 2023. If your team owns Spark 2 jobs in production, please plan for the time needed to upgrade your jobs. For all future work, use Spark 3.

You can find more information about the upgrade at https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Spark/Migrati…. Please add any missing jobs to the migration list on that page.

If you need help from the Data Engineering team, you can reach out to Jackeline Argüello <jarguello-ctr(a)wikimedia.org> or join us for the data engineering office hours.

--
*Olja Dimitrijevic* (she/her)
Director of Data Engineering
Wikimedia Foundation <https://wikimediafoundation.org/>

chanvannak1211@outlook.com
by Thon Chanvannak 03 Nov '22

-- chanvannakthon.tcv16(a)gmail.com

Would like access
by Russell Herndon 22 Jul '22

Scheduled maintenance for dbstore servers - tomorrow at 09:00 UTC
by Ben Tullis 06 Jul '22

Hello,

I need to schedule a maintenance window to reboot our three dbstore servers in order to pick up a new kernel version. These servers are dbstore100[3,5,7]. Together these servers host the analytics-mysql service: https://wikitech.wikimedia.org/wiki/Analytics/Systems/MariaDB

Ideally, I would like to reboot all three of these tomorrow, *July the 7th between 09:00 UTC and 10:00 UTC*.

During this one-hour maintenance window, access to the various sections and shards will be intermittent, as the three servers hosting them are rebooted. https://wikitech.wikimedia.org/wiki/MariaDB#Sections_and_shards

Note that this maintenance does not affect the Wikireplica databases, available to Toolforge and Cloud Services: https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database

Please let me know if this maintenance window is too soon and would cause you inconvenience. If this is the case, I will look to push back the date of the reboots to accommodate your needs. Likewise, if you have any questions, please don't hesitate to let me know.

Kind regards,
Ben

--
*Ben Tullis* (he/him)
Senior Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>

Presto Upgrade to 0.273.3
by Andrew Otto 05 Jul '22

Hello!

We will be upgrading Presto to version 0.273.3 on Wednesday July 6th. This will require a restart of the Presto cluster. Any running queries may be interrupted, but we don't expect any issues aside from that.

If you notice any new problems with Superset dashboards that use Presto after we do this upgrade, please let us know. Rolling back is relatively easy.

We will announce again here once the upgrade is complete. You can follow along at https://phabricator.wikimedia.org/T311525

-Andrew Otto & WMF Data Engineering

Iists.wikimedia
by �� ѡ�� Ѩ��ҹԪ�� 20 Jun '22

Get Outlook for Android <https://aka.ms/AAb9ysg> vivo1723

Scheduled maintenance for Hive, Superset, Airflow, Oozie - proposed tomorrow (2022-05-05) at 09:00 UTC
by Ben Tullis 04 May '22

Good morning,

I have to find a convenient time to reboot a key server (an-coord1001) which supports the analytics services. Unfortunately, although we have a standby server, the process for switching the two servers' roles is somewhat arduous, so the pragmatic option is to schedule a brief period of downtime for the affected services while the primary server is rebooted.

These services are: Hive, Superset, Oozie, Presto, and DataHub.

The outage should last for less than 10 minutes and I propose to carry out this maintenance at 09:00 UTC tomorrow - May 5th.

Please do let me know if this is going to impact you negatively and I will try to find another maintenance window. If you have any other queries or concerns, please don't hesitate to get in touch.

Thanks and best wishes,
Ben

--
*Ben Tullis* (he/him)
Senior Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>

Analytics clients scheduled downtime this Friday
by Razzi Abuissa 02 May '22

Hi all,

In order to upgrade the kernels of various analytics hosts, we have to reboot the machines, which will make several analytics clients temporarily unavailable. The maintenance will be Friday at 17:00-19:00 UTC (10am-12pm Pacific).

While this is happening, the following hosts and services will be temporarily unavailable for a few minutes at a time:

- stat machines (stat1004.eqiad.wmnet etc)
- superset
- turnilo
- hadoop UI, druid UI

If you are planning a long-running query that will overlap with that time, let me know as soon as possible and we'll find a resolution. Respond to this email or come visit us on IRC at #wikimedia-analytics on libera.chat.

Regards,
Razzi
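To judge whether a planned long-running query would collide with a window like this one, a standard interval-overlap test is enough. The sketch below is illustrative only: `overlaps_window` is a hypothetical helper, and the date is an assumption, taking "Friday" to mean 6 May 2022, the first Friday after the announcement was posted.

```python
from datetime import datetime, timezone

# Assumption: "Friday" is read as 6 May 2022 (the announcement is dated 02 May '22).
WINDOW_START = datetime(2022, 5, 6, 17, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2022, 5, 6, 19, 0, tzinfo=timezone.utc)


def overlaps_window(job_start, job_end,
                    window_start=WINDOW_START, window_end=WINDOW_END):
    """Return True if [job_start, job_end) intersects [window_start, window_end).

    All arguments must be timezone-aware datetimes; comparing aware and
    naive datetimes raises TypeError.
    """
    return job_start < window_end and window_start < job_end


# A job running 16:00-18:00 UTC overlaps the window; one starting at 19:00 does not.
early_job = (datetime(2022, 5, 6, 16, 0, tzinfo=timezone.utc),
             datetime(2022, 5, 6, 18, 0, tzinfo=timezone.utc))
late_job = (datetime(2022, 5, 6, 19, 0, tzinfo=timezone.utc),
            datetime(2022, 5, 6, 20, 0, tzinfo=timezone.utc))
print(overlaps_window(*early_job))
print(overlaps_window(*late_job))
```

Keeping every timestamp in UTC avoids daylight-saving mistakes when comparing against the announced times.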
