Hello (especially to Superset users),
As you may know, the Data Platform SRE team is currently working on
migrating the Analytics Superset instances to Kubernetes (under ticket
T347710 <https://phabricator.wikimedia.org/T347710>) and, happily, I can
report that we are making good progress.
This is just a courtesy email to let you know that we plan to switch our
staging instance (superset-next.wikimedia.org
<https://superset-next.wikimedia.org>) over to Kubernetes in the
next day or two. This is unlikely to affect anyone's work at the moment,
given that both the staging and production instances of Superset have
been on version 3.1.0 for a while.
However, given that this staging instance is available for you to use at
any time, we thought it best to let you know that we are currently
working on it and that it may be in a state of flux for a while.
Once it is stable on Kubernetes, we may well contact you again and ask
you kindly to test superset-next for us and report your findings. At the
moment though, we're just working on the transition itself so there
won't be much for you to test.
As ever, if you have any queries or concerns, please don't hesitate to
let us know.
Kind regards,
Ben
--
*Ben Tullis* (he/him)
Senior Site Reliability Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello,
If you don't use the GPUs on the stat servers, you can skip the rest of
this message.
If you do use stat1005 and its GPU, please be aware that we are planning
to move this GPU to a new stat server, stat1010, as soon as it's feasible
to do so, hopefully within a week or two.
Please could you let us know if this would be inconvenient for you and
we will try to accommodate your needs. We'll let you know a precise date
for the GPU move once we have assessed the current usage and have
planned the work with the DC Ops team in eqiad.
If you still need to use a GPU on buster, you can continue to use
stat1008 for now.
As ever, if you have any queries or concerns about these operations,
please do let us know.
Kind regards,
Ben
Hello,
We will be carrying out a short maintenance operation on our Presto
cluster on Monday morning at approximately 11:00 UTC. Presto may be
unavailable for a few minutes, which may affect Superset dashboards that
use Presto. We hope to keep this period of instability to around 5-10
minutes.
Specifically, the work involves moving the Presto coordinator role as
part of a server refresh (T336045
<https://phabricator.wikimedia.org/T336045>).
We have attempted to make sure that this is a non-breaking change,
especially for any users of wmfdata-python
<https://github.com/wikimedia/wmfdata-python>.
If this maintenance window is inconvenient for you, please do let us
know and we can look to defer the work. Similarly, if you notice
anything unusual afterwards, please let us know.
Kind regards,
Ben
Hello,
Just a quick message to let you know that we have two new analytics
clients <https://wikitech.wikimedia.org/wiki/Analytics/Systems/Clients>
(aka stats servers) ready for use.
These are:
* stat1010 <https://wikitech.wikimedia.org/wiki/Stat1010> (replacement
for stat1005)
* stat1011 <https://wikitech.wikimedia.org/wiki/Stat1011> (replacement
for stat1007)
Both of these servers run Debian Bullseye and you can see the specs on
the linked Wikitech pages.
These are the second and third Bullseye stats servers, so there
shouldn't be any surprises when moving your work to these hosts.
If you could start to migrate away from the old servers, that would be
very helpful, as we will soon begin preparing to decommission them.
You may have read in another email that we plan to migrate the GPU from
stat1005 to stat1010 as part of this refresh. Please let us know if this
is likely to impact your work and we will try to take it into account.
As ever, if you have any queries or concerns, please let us know at any
time.
Kind regards,
Ben
Hi all,
The next Research Showcase will be live-streamed on Wednesday, February 21,
at 8:30 AM PST / 16:30 UTC. Find your local time here
<https://zonestamp.toolforge.org/1708533000>. The theme for this showcase is
*Platform Governance and Policies*.
You are welcome to watch via the YouTube stream:
https://www.youtube.com/watch?v=Q1xYwRw1rHU. As usual, you can join the
conversation in the YouTube chat as soon as the showcase goes live.
This month's presentation:
Sociotechnical Designs for Democratic and Pluralistic Governance of Social
Media and AI
By *Amy X. Zhang, University of Washington*
Decisions about policies when using widely deployed technologies,
including social media and, more recently, generative AI, are often made
in a centralized and top-down fashion. Yet these systems are used by
millions of people, with a diverse set of preferences and norms. Who gets
to decide what the rules are, and what should the procedures be for
deciding them? And must we all abide by the same ones? In this talk, I draw on theories and lessons
from offline governance to reimagine how sociotechnical systems could be
designed to provide greater agency and voice to everyday users and
communities. This includes the design and development of: 1) personal
moderation and curation controls that are usable and understandable to
laypeople, 2) tools for authoring and carrying out governance to suit a
community's needs and values, and 3) decision-making workflows for
large-scale democratic alignment that are legitimate and consistent.
Best,
Kinneret
--
Kinneret Gordon
Lead Research Community Officer
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello,
This is just a quick message to let you know that we made some changes
today to the monitoring configuration of many of the Data Platform
Engineering servers. This may affect you if you participate in Ops Week
<https://wikitech.wikimedia.org/wiki/Data_Engineering/Ops_week> for Data
Engineering and friends.
By default, all notification alerts from Icinga and Prometheus will now
go to data-platform-alerts(a)wikimedia.org
<https://groups.google.com/a/wikimedia.org/g/data-platform-alerts>
instead of data-engineering-alerts(a)lists.wikimedia.org
<https://lists.wikimedia.org/hyperkitty/list/data-engineering-alerts@lists.w…>
We are working to route alert emails (and IRC pings) to the most
appropriate team, principally so that we don't overload whoever is on
Ops Week with messages that would be more appropriately routed to the
Data Platform SREs.
Alerts from scheduled tasks related to data pipelines and services that
are critical for data processing will still be sent to the
data-engineering-alerts(a)lists.wikimedia.org
<https://lists.wikimedia.org/hyperkitty/list/data-engineering-alerts@lists.w…>
list. That covers Airflow jobs, Refine tasks, Gobblin, Sqoop,
Varnishkafka, Eventlogging, etc.
We haven't made any changes to the monitoring/notification settings of
the Search and Query Services servers (Elasticsearch/WDQS/WCQS, etc.),
nor have we made any changes to the Dumps servers. This change mainly
affects the analytics systems
<https://wikitech.wikimedia.org/wiki/Analytics/Systems> and the rest of
the Data Engineering team's infrastructure.
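To make the new routing rules easier to picture, here is a minimal Python sketch of the decision as described above. It is purely illustrative: the function `route_alert`, the lower-case source names, and the sets below are hypothetical and do not reflect how the real Icinga or Prometheus alerting configuration is written.

```python
from typing import Optional

# Hypothetical illustration only: these names and sets are not taken from the
# real Icinga/Prometheus configuration; they just restate the rules in this email.
PIPELINE_SOURCES = {"airflow", "refine", "gobblin", "sqoop", "varnishkafka", "eventlogging"}
UNCHANGED_SOURCES = {"elasticsearch", "wdqs", "wcqs", "dumps"}

PLATFORM_LIST = "data-platform-alerts"
ENGINEERING_LIST = "data-engineering-alerts"


def route_alert(source: str) -> Optional[str]:
    """Return the list that alerts from `source` go to under the new defaults.

    Returns None for Search/Query Services and Dumps, whose routing was not
    changed by this work.
    """
    key = source.strip().lower()
    if key in UNCHANGED_SOURCES:
        return None
    if key in PIPELINE_SOURCES:
        # Data pipeline and processing tasks still go to the Data Engineering list.
        return ENGINEERING_LIST
    # Everything else on the Data Platform Engineering servers uses the new default.
    return PLATFORM_LIST


if __name__ == "__main__":
    for source in ("Airflow", "disk-space", "dumps"):
        print(source, "->", route_alert(source))
```

The order of the checks mirrors the email: services whose settings were left alone come first, then data pipeline tasks, and anything else falls through to the new default list.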
Please do let us know if you have any queries or concerns about this
change, or if anything doesn't look right to you.
You can reach us on Slack in #data-engineering-collab or
#data-platform-sre, on IRC in #wikimedia-analytics or
#wikimedia-data-platform, or by email at
data-platform-engineering(a)wikimedia.org.
Kind regards,
Ben
Hello,
Please update to *wmfdata version 2.3.0* and/or *update your
conda-analytics* environments.
I have just pushed a new version of our conda-analytics
<https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Conda>
environment to production and I encourage you to start using it as soon
as possible. The only change from the previous version is an important
bump of the wmfdata-python
<https://github.com/wikimedia/wmfdata-python> library to version 2.3.0,
which allows wmfdata to connect to Presto using a DNS alias instead of a
hard-coded hostname.
If you create a new clone of conda-analytics
<https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Conda#Creating…>
you will automatically get this new version of wmfdata, but if that's
inconvenient you can always update the version within your existing
environments. The instructions for doing that are here
<https://github.com/wikimedia/wmfdata-python?tab=readme-ov-file#installation…>.
Once everyone has had enough time to update their environments, we will
be able to make a change to the Presto configuration that will break
Presto support for older versions of wmfdata.
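If you are not sure which version of wmfdata a given environment has, a quick check along these lines may help. This is a minimal sketch using only the Python standard library; it assumes wmfdata is installed under the distribution name `wmfdata`, and the helpers `parse_release`, `meets_minimum`, and `check_wmfdata` are hypothetical, not part of wmfdata itself.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum wmfdata release named in this announcement.
MINIMUM_WMFDATA = "2.3.0"


def parse_release(release: str) -> tuple[int, ...]:
    """Turn a plain 'X.Y.Z' release string into a tuple of ints.

    This sketch assumes purely numeric releases; a pre-release tag such as
    '2.3.0rc1' would raise ValueError here.
    """
    return tuple(int(part) for part in release.split("."))


def meets_minimum(installed: str, minimum: str = MINIMUM_WMFDATA) -> bool:
    """Return True if `installed` is at least `minimum`."""
    have, need = parse_release(installed), parse_release(minimum)
    # Pad with zeros so that e.g. '2.3' compares equal to '2.3.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_wmfdata() -> str:
    """Report whether the wmfdata in the current environment is new enough."""
    try:
        installed = version("wmfdata")  # assumed distribution name
    except PackageNotFoundError:
        return "wmfdata is not installed in this environment"
    if meets_minimum(installed):
        return f"wmfdata {installed} is new enough"
    return f"wmfdata {installed} is older than {MINIMUM_WMFDATA}; please update"


if __name__ == "__main__":
    print(check_wmfdata())
```

Comparing integer tuples rather than raw strings avoids the trap where "2.10.0" would sort before "2.3.0" as text.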
If you have any questions or concerns about this change, or if you
notice anything peculiar with conda-analytics, please don't hesitate to
let us know and we will look into it.
Kind regards,
Ben
Hello,
We need to carry out some scheduled maintenance on the web server behind
the following services:
* analytics.wikimedia.org/published
<https://analytics.wikimedia.org/published/>
* stats.wikimedia.org <https://stats.wikimedia.org>
This means that we need to schedule a period of downtime of up to around
30 minutes for these services.
I'd like to schedule this for next Tuesday morning, the 6th of February,
starting at 10:30 UTC.
Please do let me know if this will be inconvenient for you at all and I
will postpone the upgrade.
Kind regards,
Ben
*TL;DR* - Please test https://superset-next.wikimedia.org and let us
know of any problems. Thanks.
Hello,
This message is specifically addressed to any users of
superset.wikimedia.org <https://superset.wikimedia.org>.
We in the Data Platform SRE team would be grateful for your assistance
in the acceptance testing phase of an upgrade to Superset. Our
production Superset instance is currently running version *1.5.3*, but
superset-next.wikimedia.org <https://superset-next.wikimedia.org> has
now been upgraded to version *3.1.0* and is ready for testing. Its
database was copied from the production instance yesterday, so it is
relatively fresh.
If you could spend a little time reviewing whether your dashboards,
charts, datasets, SQL queries, etc. work properly, that would be
really helpful. There are lots of changes between the 1.5 and 3.1
releases, so please feel free to read through the following release
notes, which list the highlights.
* release-notes-2-0
<https://github.com/apache/superset/blob/master/RELEASING/release-notes-2-0/…>
* release-notes-3-1
<https://github.com/apache/superset/blob/master/RELEASING/release-notes-3-1/…>
One point of particular note in the latest release is that a viz
migrations
<https://github.com/apache/superset/blob/master/RELEASING/release-notes-3-1/…>
CLI tool has been added, which can help migrate legacy (Area, Bubble,
Line, and Sunburst) chart types to the newer ECharts-based versions.
Please let us know if this tool would be of interest to you and we can
look at running it on your behalf.
Once we assess feedback from users of superset-next, we will be able to
schedule a date for the upgrade of the production instance. All things
being well, we would hope to do this upgrade within *a week or two*.
Feel free to share any feedback or queries about this upgrade in the
#data-engineering-collab Slack channel, or the #wikimedia-analytics IRC
channel, or any of the mailing lists where you read this, or simply by
reply if you prefer.
Kind regards,
Ben