We will be upgrading the Cloud VPS OpenStack installation later today,
beginning at 14:00 UTC (7:00 AM Pacific time).
The total upgrade should take 60-90 minutes. During the upgrade period,
Horizon will be disabled. There may also be brief network interruptions
as we restart router services.
-Andrew + the WMCS Team
_______________________________________________
Wikimedia Cloud Services announce mailing list
Cloud-announce@lists.wikimedia.org (formerly labs-announce@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce
Hi!
As I mentioned previously, I've been cataloging input from developers
and analyzing data from the random query logger that Brooke wrote for
the old cluster, to get a better idea of the kinds of cross-database
queries being performed.
Wiki Replicas Cross-DB Query Data
<https://wikitech.wikimedia.org/wiki/News/Wikireplicas_2020_Redesign/Wiki_Re…>
The input and the random query logger analysis show which tools are
involved and which wiki DBs are being used together, but they are not
exhaustive, so if you know of high-impact tools that will suffer from the
breaking changes, please let me know so that I can include them.
The most prominent cases that this data shows are:
- Querying Commons and another DB (as highlighted in many Phabricator
tasks and conversations; no surprise)
- Querying Wikidata and other DBs (Commons and the Wikipedias, for example)
- Querying English Wikipedia and other DBs (other Wikipedias)
Particularly noticeable is the appearance of arwiki in the analysis. It is
hard to know whether there is some bias in the random sampling or whether
tools for this project really do use the replicas heavily for their
features. Something to look into.
Detailed analysis of the tables and fields joined or accessed will need
to be performed on a case-by-case basis, since the SQL queries can be very
complex and accomplish things in different ways, so automated detection can
be faulty. For example, on manual inspection I've seen queries that use an
`IN` subquery against a different DB instead of a join.
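To make that concrete, here is a rough sketch (the table choices and exact
hostnames are illustrative, not taken from the logged queries) of how such an
`IN` subquery spanning two DBs can be rewritten as two single-database
queries combined in application code, which is the general pattern that
keeps working when each database lives on a different host:

    import os
    import pymysql

    # Illustrative sketch: the hostnames follow the new per-database naming
    # scheme described on the Wiki Replicas redesign page; credentials come
    # from the usual replica.my.cnf file.
    commons = pymysql.connect(
        host="commonswiki.analytics.db.svc.wikimedia.cloud",
        database="commonswiki_p",
        read_default_file=os.path.expanduser("~/replica.my.cnf"),
    )
    enwiki = pymysql.connect(
        host="enwiki.analytics.db.svc.wikimedia.cloud",
        database="enwiki_p",
        read_default_file=os.path.expanduser("~/replica.my.cnf"),
    )

    # Step 1: run what used to be the subquery on its own host.
    with commons.cursor() as cur:
        cur.execute(
            "SELECT img_name FROM image WHERE img_timestamp >= %s LIMIT 1000",
            ("20210401000000",),
        )
        names = [row[0] for row in cur.fetchall()]

    # Step 2: feed that result into the outer query on the other host.
    pages = []
    if names:
        placeholders = ",".join(["%s"] * len(names))
        with enwiki.cursor() as cur:
            cur.execute(
                f"SELECT il_from FROM imagelinks WHERE il_to IN ({placeholders})",
                names,
            )
            pages = cur.fetchall()

Whether pulling the intermediate result into the tool like this is practical
depends on its size, which is exactly the kind of per-tool judgment the
detailed analysis will need.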
That variety in query structure is also why the report looks at cross-DB
queries rather than only at cross-DB JOINs: it captures all cross-DB
queries, even though that can include some false positives that can already
be worked around in the new architecture (the tools.guc queries, for
example, I believe).
I have created T280152 "Mitigate breaking changes from the new Wiki Replicas
architecture" <https://phabricator.wikimedia.org/T280152>, which lists the
relevant links and can be used to hang subtasks for mitigation work if
needed; it sits between the OLAP task and the new Wiki Replicas task.
My hope is that this information can be used by both developers and WMF
teams to figure out ways to mitigate the breaking changes for the tools that
editors and other users rely on. If you think the published CSV is not
approachable enough and certain information would be useful to have on the
wiki page, please let me know and I'll process the data and post the results
on the page for easier reading.
For example, I was wondering if I should publish the "unique
normalized/stripped" queries per tool, so the SQL can be inspected on the
wiki page itself. Thoughts?
--
Joaquin Oltra Hernandez
Developer Advocate - Wikimedia Foundation
Hi all,
I have a MediaWiki instance running on WMCloud:
https://annotation.wmcloud.org/
Is there some recipe or instruction available somewhere on how to manage it?
More specifically, about a week ago spammers discovered it. I would like
to use WSOAuth and PluggableAuth or something similar in order to allow
logins only by users with a Wikimedia account, and to allow edits only by
logged-in users.
On shorter notice, as a stopgap, I would like to disallow account
creation and edits by non-logged-in users, so I can at least stop new
spam from being created and clean up what is already there.
I am very confused by Puppet, have a rough idea of what Vagrant is, and
think I have a solid understanding of MediaWiki maintenance. Any help or
pointers would be much appreciated.
Thank you!
Denny
I'm exploring various ways of working with the XML data dumps on /public/dumps/public/enwiki. I've got a process which runs through all of the enwiki-20210301-pages-articles[123456789]*.xml* files in about 6 hours. If I've done the math right, that's just about 18 GB of data, or 3 GB/h, or a bit under 1 MB/s that I'm slurping off NFS.
If I were to spin up 8 VPS nodes and run 8 jobs in parallel, in theory I could process around 7 MB/s (roughly 55 Mb/s). Is that realistic? Or am I just going to beat the hell out of the poor NFS server, or peg some backbone network link, or hit some other rate-limiting bottleneck long before I run out of CPU? Hitting a bottleneck doesn't bother me so much as not wanting to trash a shared resource by doing something stupid to it.
Putting it another way, would trying this be a bad idea?
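For reference, each of those jobs would essentially be doing something like
this (a simplified sketch; my real processing is more involved, and the
exact path and file pattern may differ):

    import bz2
    import sys
    import xml.etree.ElementTree as ET

    def count_pages(path):
        """Stream one compressed dump part off NFS and count <page> elements."""
        pages = 0
        with bz2.open(path, "rb") as f:
            for _event, elem in ET.iterparse(f, events=("end",)):
                if elem.tag.endswith("}page"):
                    pages += 1
                    elem.clear()  # keep memory flat while streaming
        return pages

    if __name__ == "__main__":
        # Each VPS instance would get its own slice of the part files, e.g.:
        #   python3 count_pages.py /public/dumps/public/enwiki/20210301/enwiki-20210301-pages-articles1*.xml*.bz2
        for path in sys.argv[1:]:
            print(path, count_pages(path))

The reads are strictly sequential within each file, which I assume is the
friendliest case for NFS, but that's part of what I'm asking about.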
TL;DR:
* We messed up when replacing the mail server in Toolforge
* We didn't notice that we had messed up for nearly 3 weeks
* Toolforge servers should be able to send outbound email again now
We have been working to replace some of the Cloud VPS instances in the
Toolforge project with new instances running Debian Buster
(<https://phabricator.wikimedia.org/T275864>). One step in this
process was to replace the mail server instance that handles all
outbound mail.
We set up a new mail server on 2021-03-31, but missed an important
configuration step: telling the rest of the instances in the
Toolforge project to use the new server when sending outgoing mail. A
Toolforge user reported on IRC at 2021-04-20T21:11Z that they had not
received expected emails from their tool recently. Investigation
revealed the broken configuration and work started to correct the
problem. Around 2021-04-20T21:52Z we deployed the correct mail relay
host configuration. Over the next 30 minutes or so this configuration
update rolled out across the Toolforge instances, re-enabling outbound
mail sending. Around 2021-04-20T22:20Z we ran commands to instruct all
Toolforge instances to "unfreeze" emails which were queued for sending
but marked as "frozen" due to the prior invalid configuration.
Emails are now being sent out as expected. We apologize for the
interruption in service. We will also be looking into an active
monitoring system for outbound email delivery to catch similar
problems more quickly in the future.
Bryan, on behalf of the Toolforge admin team
--
Bryan Davis Technical Engagement Wikimedia Foundation
Principal Software Engineer Boise, ID USA
[[m:User:BDavis_(WMF)]] irc: bd808
Hi all,
I opened https://phabricator.wikimedia.org/T280726 moments ago. It was
untagged from Cloud-Services, but I think it is critical for our cloud
gurus to see it and opine on it.
This is just an FYI email. I recommend that we keep the discussion on the
Phab ticket as much as possible.
Thanks,
Huji
I've computed some data on a VPS node that I want to show to people. Is there some quick and dirty way to publish a file so it's visible to the outside? I'm thinking something along the lines of a public_html directory.
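For example, if a web proxy (or something equivalent) can be pointed at the
instance, even something as small as this would do for my purposes (a
throwaway sketch; the directory name and port are made up):

    # Throwaway static file server using only the standard library.
    # Assumes public traffic is forwarded to port 8080 on this instance
    # (e.g. via a Cloud VPS web proxy); the directory name is made up.
    import os
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    os.chdir(os.path.expanduser("~/published"))
    HTTPServer(("0.0.0.0", 8080), SimpleHTTPRequestHandler).serve_forever()

But if there's an existing public_html-style convention I should be using
instead, I'd rather do that.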
Hi Everyone,
We’re happy to announce the April 2021 edition of the Technical Community
Newsletter
<https://www.mediawiki.org/wiki/Technical_Community_Newsletter/2021/April>
is now available. The newsletter is compiled by the Wikimedia Developer
Advocacy Team. It aims to share highlights, news, and information of
interest from and about the Wikimedia technical community.
Check it out to learn what technical contributors have been up to this
past quarter, about upcoming conferences and calls for papers, and how to
get involved.
The Wikimedia Technical Community is large and diverse, and we know we
can't capture everything perfectly. We welcome your ideas for future
newsletters. Let us know what you would like to see or highlights you would
like us to include.
Subscribe to the Technical Community Newsletter
<https://www.mediawiki.org/wiki/Newsletter:Technical_Community_Newsletter>
if you'd like to keep up with essential updates and information.
Kindly,
Sarah R. Rodlund
Senior Technical Writer, Developer Advocacy
<https://www.mediawiki.org/wiki/Developer_Advocacy>
srodlund@wikimedia.org
This is a reminder of the timeline dates shared on March 12:
In two weeks (April 15) the old cluster will migrate to use new
replication hosts, at which point replication may stop.
In about four weeks (April 28) the old hostnames will be redirected to the
new cluster.
If you can, please test that your code works with the new cluster before the
redirects are in place (there is a small connection sketch after the links
below).
- New host names
<https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign#New_ho…>
- Help:Toolforge/Database
<https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database>
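As a quick way to test, you can point an existing query at the new cluster
explicitly (a minimal sketch; substitute your own database and query, and
check the New host names page above for the exact hostname pattern):

    import os
    import pymysql

    # Connect to the new cluster directly (analytics section shown here);
    # the hostname pattern is documented on the New host names page above.
    conn = pymysql.connect(
        host="enwiki.analytics.db.svc.wikimedia.cloud",
        database="enwiki_p",
        read_default_file=os.path.expanduser("~/replica.my.cnf"),
    )
    with conn.cursor() as cur:
        cur.execute("SELECT page_title FROM page WHERE page_namespace = 0 LIMIT 1")
        print(cur.fetchone())

If your tool's queries run cleanly against the new hosts, the hostname
redirect on April 28 should be uneventful for you.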
For questions and support, reach out to #wikimedia-cloud on freenode IRC or
email cloud@lists.wikimedia.org.
--
Joaquin Oltra Hernandez
Developer Advocate - Wikimedia Foundation