February 2021 - Analytics-announce

by Andrew Otto

FYI, SRE has re-imaged bast1002.wikimedia.org. If you use it as your SSH bastion, you will get a warning that the SSH host key has changed. If the key being offered to you matches the one in https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/bast1002.wikimedi…, you can accept the new key.

---------- Forwarded message ---------
From: Moritz Mühlenhoff <mmuhlenhoff(a)wikimedia.org>
Date: Wed, Feb 24, 2021 at 4:44 AM
Subject: Re: [Ops] Reimage of bast1002 tomorrow
To: Operations Engineers <ops(a)lists.wikimedia.org>

On Tue, Feb 23, 2021 at 11:25 AM Moritz Mühlenhoff <mmuhlenhoff(a)wikimedia.org> wrote:
> I'm going to reimage bast1002 for an OS update to Buster tomorrow
> during the early European morning. Please use a different bastion
> during that time.

This is complete; you can use bast1002.wikimedia.org again. You can fetch the updated fingerprint by running the wmf-update-known-hosts-production script, or, as a fallback, the updated fingerprints are also at https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/bast1002.wikimedi… .

Cheers,
Moritz

_______________________________________________
Ops mailing list
Ops(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/ops
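When comparing the key your SSH client is offered against the fingerprint published on wikitech, it helps to know how an OpenSSH "SHA256:" fingerprint is derived: it is the SHA-256 digest of the decoded public key blob, printed as base64 with the trailing "=" padding removed. The sketch below assumes you already have the host's public key as a single ".pub"-style or known_hosts-style line. The function name is hypothetical and not part of any WMF tooling; in practice `ssh-keygen -lf <file>` prints the same value.

```python
import base64
import hashlib

# Key type prefixes we recognize in a public key / known_hosts line.
KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def openssh_sha256_fingerprint(pubkey_line: str) -> str:
    """Return the OpenSSH-style SHA256 fingerprint of a public key line.

    Hypothetical helper. Accepts lines such as "ssh-ed25519 AAAA... comment"
    or known_hosts entries such as "host ssh-ed25519 AAAA...".
    """
    fields = pubkey_line.split()
    # The key blob is the base64 field that directly follows the key type.
    for i, field in enumerate(fields[:-1]):
        if field.startswith(KEY_TYPE_PREFIXES):
            blob_b64 = fields[i + 1]
            break
    else:
        raise ValueError("no recognizable key type in line")
    blob = base64.b64decode(blob_b64)
    digest = hashlib.sha256(blob).digest()
    # OpenSSH shows the digest as base64 without '=' padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

You can run it on the key line your client reports, or on the output of `ssh-keyscan`, and compare the result by eye with the value on the wikitech page.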

Release of wmfdata-python 1.1

by Neil Shah-Quinn

Hey all!

wmfdata-python <https://github.com/wikimedia/wmfdata-python> (a package that streamlines access to private analytics data) has been updated to version 1.1. Here's what's new:

- The new presto module supports querying the Data Lake using Presto <https://wikitech.wikimedia.org/wiki/Analytics/Systems/Presto>.
- The spark module has been refactored to support local and custom sessions.
- A new utils.get_dblist function provides easy access to wiki database lists, which is particularly useful with mariadb.run.
- The hive.run_cli function now creates its temp files in a standard location, so they no longer show up as distracting new entries in the current working directory.

Many thanks to:

- Andrew Otto and Adam Roses Wight for writing significant new code
- Mikhail Popov, Andrew Otto, and Luca Toscano for careful code review

As always, if you have questions or feedback about wmfdata-python, please email Product Analytics at product-analytics(a)wikimedia.org.

--
Neil Shah-Quinn
senior data scientist, Product Analytics <https://www.mediawiki.org/wiki/Product_Analytics>
Wikimedia Foundation <https://wikimediafoundation.org/>

Reboot of stat1005/stat1008 on Monday Feb 22nd 9AM CET

by Luca Toscano

Hi everybody,

I need to reboot stat1005 and stat1008 for kernel upgrades. The scheduled maintenance window is Monday Feb 22nd at 9AM CET (early EU morning). It has also been added to https://wikitech.wikimedia.org/wiki/Analytics/Systems/Maintenance_Schedule

As always, let me know if this is a problem for your work and we'll schedule a different time window :)

Luca

Reboot of stat1004 / stat1006 / stat1007 for Linux Kernel upgrades - Feb 17th 9AM CET

by Luca Toscano

Hi everybody,

I'm back with reboots, so please be patient with me :) I am going to reboot stat1004 / stat1006 / stat1007 (only these three for the moment) on Wednesday Feb 17 at 9AM CET for Linux kernel upgrades. Please let me know if this impacts your work, and we'll find another maintenance window :) The scheduled maintenance is also outlined in https://wikitech.wikimedia.org/wiki/Analytics/Systems/Maintenance_Schedule

Luca (on behalf of the Data Engineering / Analytics team)

Hadoop maintenance scheduled for February 9th - Downtime for some hours during the EU morning

by Luca Toscano

Hi everybody,

The upgrade day has been scheduled: we are going to migrate Hadoop to the Apache Bigtop distribution on February 9th, during the EU morning. This will require 2 to 4 hours of Hadoop downtime, since the upgrade is very delicate and complex.

I created https://phabricator.wikimedia.org/T273711 to track timings and updates more precisely. Please use it to ask questions and to tell us if this impacts your work or important deadlines for your team (in which case we'll try to find a different time window).

Since we are upgrading software that was released years ago, some tools/workflows/etc. may not work as expected right after the upgrade. We have tested a wide variety of use cases in our testing environment, but some corner cases might have been missed. If you notice something weird right after the upgrade, please explain in the task how to reproduce it; we'll follow up and hopefully fix it promptly.

Thanks a lot for the support!

Luca

EventStreams internal in production

by Andrew Otto

Hi all!

We just finished <https://phabricator.wikimedia.org/T269160> setting up an internal instance of EventStreams called eventstreams-internal. This instance is not public, but it does expose all streams declared in stream config*. I've added documentation about how to access it here: https://wikitech.wikimedia.org/wiki/Event_Platform/Instrumentation_How_To#I…

This instance isn't particularly useful for building services (in production you should just consume from Kafka), but it may be very useful for debugging and troubleshooting events in production. EventStreams has a GUI that lets you watch events in Kafka as they flow in. In production, this means you can see events right after they are emitted, without waiting a few hours for them to be ingested into Hive. You can use it to make sure that events you trigger in production make it through EventGate into Kafka as you expect.

Big thanks to Marcel and Luca for their work on this! :)

- Andrew Otto

* i.e. those that use Event Platform, not legacy EventLogging events.
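EventStreams delivers events over HTTP using the Server-Sent Events (SSE) format, so the events shown in the GUI can also be read from a script while debugging. The sketch below covers only the parsing side. It assumes you already have the response body as text lines and that each message's data payload is a JSON event. It follows the core SSE rules: `event:`, `data:` and `id:` fields, comment lines starting with ":", and a blank line to dispatch a message. The function names are hypothetical. How to connect to eventstreams-internal (host, port, stream URL) is deliberately not assumed here; see the wikitech documentation linked in the announcement.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE message: {"event": ..., "id": ..., "data": ...}.

    Fields accumulate until a blank line dispatches the message. Comment
    lines (starting with ':', e.g. keep-alives) are ignored.
    """
    event, last_id, data = "message", None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current message; skip it if it had no data.
            if data:
                yield {"event": event, "id": last_id, "data": "\n".join(data)}
            event, data = "message", []  # the last event id persists
            continue
        if line.startswith(":"):
            continue  # comment line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips a single leading space
        if field == "data":
            data.append(value)
        elif field == "event":
            event = value
        elif field == "id":
            last_id = value
        # Other fields (such as "retry") are not needed for this sketch.


def parse_json_events(lines: Iterable[str]) -> Iterator[dict]:
    """Decode the data payload of each SSE message as JSON."""
    for msg in parse_sse(lines):
        yield json.loads(msg["data"])
```

As an example, with the `requests` library the lines could come from `response.iter_lines(decode_unicode=True)` on a streaming GET. That method strips line endings and yields empty strings for blank lines, and the parser handles both.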
