April 2021 - Cloud - lists.wikimedia.org

[Cloud-announce] Wiki Replicas: Redirecting the old host names to the new cluster
by Joaquin Oltra Hernandez 29 Apr '21

29 Apr '21

Hi, Like we mentioned in the timeline [0], we are starting the work to redirect the old host names to the new ones [1]. We will share more details in the next few days when we actually deploy the DNS change. This is a heads up before the change actually happens in case you haven't tested your code with the new cluster, last warning. [0]: https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign#Timeli… [1]: https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign#New_ho… -- Joaquin Oltra Hernandez Developer Advocate - Wikimedia Foundation _______________________________________________ Wikimedia Cloud Services announce mailing list Cloud-announce(a)lists.wikimedia.org (formerly labs-announce(a)lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud-announce

1 0

[Cloud-announce] cloud-vps maintenance at 14:00 UTC
by Andrew Bogott 28 Apr '21

28 Apr '21

We will be upgrading the Cloud-VPS OpenStack install later today beginning at 14:00 UTC (7:00 AM Pacific time). The total upgrade should take 60-90 minutes. During the upgrade period Horizon will be disabled. There may also be brief network interruptions as we restart router services. -Andrew + the WMCS Team _______________________________________________ Wikimedia Cloud Services announce mailing list Cloud-announce(a)lists.wikimedia.org (formerly labs-announce(a)lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud-announce

5 6

Wiki replicas: Compilation of input and logged cross-DB queries
by Joaquin Oltra Hernandez 27 Apr '21

27 Apr '21

Hi! Like I mentioned in the past, I've been cataloging input from developers and doing some analysis from the random query logger that Brooke coded in the old cluster to get a better idea of the kinds of cross DB queries being performed. Wiki Replicas Cross-DB Query Data <https://wikitech.wikimedia.org/wiki/News/Wikireplicas_2020_Redesign/Wiki_Re…> The input and random query logger analysis showcase the tools and which wiki DBs are being used together, but are not perfect, so please if you know of high impact tools that are going to suffer the breaking changes let me know so that I can include them. The most prominent cases that this data shows are - Querying Commons and another DB (as highlighted in many phab tasks and conversations, no surprise) - Querying Wikidata and other DBs (commons, and wikipedias, for example) - Querying En Wikipedia and other DBs (wikipedias) Particularly noticeable is the appearance of arwiki in the analysis. It is hard to know if there is some bias in the random sampling or tools for this project do use the replicas a lot for their features. Something to look into. Detailed analysis of the tables and fields joined / accessed on will need to be performed on an individual basis, since the SQL queries can be very complex and do things in different ways, so automated detection can be faulty. For example on manual inspection I've seen queries using subqueries on a different DB with `in`, instead of joins. This is the reason why the report looks at the cross DB queries and not cross JOIN queries, to accurately look at all cross DB queries, even if there can be some false positives which can be worked around in the new architecture (I think the tools.guc queries for example). I have created T280152 Mitigate breaking changes from the new Wiki Replicas architecture <https://phabricator.wikimedia.org/T280152> which lists the links and can be used to hang tasks for mitigation work if needed, and sits between the OLAP and the new Wiki Replicas task. My hope is this information can be used both by developers and WMF teams to figure out ways to mitigate the breaking changes for the tools that editors and other users rely on. If you think the published CSV is not approachable enough and you think certain information on the wiki page would be useful to have, please let me know and I'll process the data and post the results in the page for ease of reading. For example, I was wondering if I should publish the "unique normalized/stripped" queries per tool to be able to look at the SQL on that wiki page itself. Thoughts? -- Joaquin Oltra Hernandez Developer Advocate - Wikimedia Foundation

2 1

Help with a MediaWiki instance on WMCloud - spammers
by Denny Vrandečić 26 Apr '21

26 Apr '21

Hi all, I have a MediaWiki instance running on WMCloud: https://annotation.wmcloud.org/ Is there some recipe or instruction available somewhere on how to manage it? More specifically, about a week ago, spammers discovered it. I would like to use WSOAuth and PluggableAuth or something similar in order to allow only logins by users with a Wikimedia account, and only allow edits by logged in users. On a shorter notice, as a stop gap, I would like to disallow account creation by non-logged-in users and edits by non-logged in user, so I can at least stop new spam creation and clean up the existing one. I am very confused by Puppet, have a rough idea what Vagrant is, and think I have a stable understanding of MediaWiki maintenance. Any help or pointers would be much appreciated. Thank you! Denny

4 15

NFS bandwidth to VPS nodes?
by Roy Smith 26 Apr '21

26 Apr '21

I'm exploring various ways of working with the XML data dumps on /publib/dumps/public/enwiki. I've got a process which runs through all of the enwiki-20210301-pages-articles[123456789]*.xml* files in about 6 hours. If I've done the math right, that's just about 18 GB of data, or 3 GB/h, or 8 MB/s that I'm slurping off NFS. If I were to spin up 8 VPS nodes and run 8 jobs in parallel, in theory I could process 64 MB/s (512 Mb/s). Is that realistic? Or am I just going to beat the hell out of the poor NFS server, or peg some backbone network link, or hit some other rate limiting bottleneck long before I run out of CPU? Hitting a bottleneck doesn't bother me so much as not wanting to trash a shared resource by doing something stupid to it. Putting it another way, would trying this be a bad idea?

2 1

[Cloud-announce] [Toolforge] Outbound emails were stuck from 2021-03-31T14:56Z to 2021-04-20T21:52Z
by Bryan Davis 20 Apr '21

20 Apr '21

TL;DR: * We messed up when replacing the mail server in Toolforge * We didn't notice that we had messed up for nearly 3 weeks * Toolforge servers should be able to send outbound email again now We have been working to replace some of the Cloud VPS instances in the Toolforge project with new instances running Debian Buster (<https://phabricator.wikimedia.org/T275864>). One step in this process was to replace the mail server instance that handles all outbound mail. We setup a new mail server on 2021-03-31, but missed an important configuration step of telling the rest of the instances in the Toolforge project to use the new server when sending outgoing mail. A Toolforge user reported on irc at 2021-04-20T21:11Z that they had not received expected emails from their tool recently. Investigation revealed the broken configuration and work started to correct the problem. Around 2021-04-20T21:52Z we deployed the correct mail relay host configuration. Over the next 30 minutes or so this configuration update rolled out across the Toolforge instances, re-enabling outbound mail sending. Around 2021-04-20T22:20Z we ran commands to instruct all Toolforge instances to "unfreeze" emails which were queued for sending but marked as "frozen" due to the prior invalid configuration. Emails are now being sent out as expected. We apologize for the interruption in service. We will also be looking into some active monitoring system for outbound email delivery to catch problems similar to this more quickly in the future. Bryan, on behalf of the Toolforge admin team -- Bryan Davis Technical Engagement Wikimedia Foundation Principal Software Engineer Boise, ID USA [[m:User:BDavis_(WMF)]] irc: bd808 _______________________________________________ Wikimedia Cloud Services announce mailing list Cloud-announce(a)lists.wikimedia.org (formerly labs-announce(a)lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud-announce

1 0

IP WHOIS tool on Cloud
by Huji Lee 20 Apr '21

20 Apr '21

Hi all, I opened https://phabricator.wikimedia.org/T280726 moments ago. It was un-assigned from the Cloud-Services tag but I think it is critical for our Clouds gurus to see it and opine on it. This is just an FYI email. I recommend that we keep the discussion on the Phab ticket as much as possible. Thanks, Huji

1 0

Publishing out of a VPS node?
by Roy Smith 19 Apr '21

19 Apr '21

I've computed some data on a VPS node that I want to show to people. Is there some quick and dirty way to publish a file so it's visible to the outside? I'm thinking something along the lines of a public_html directory.

3 5

April 2021 Technical Community Newsletter
by Sarah R 14 Apr '21

14 Apr '21

Hi Everyone, We’re happy to announce the April 2021 edition of the Technical Community Newsletter <https://www.mediawiki.org/wiki/Technical_Community_Newsletter/2021/April> is now available. The newsletter is compiled by the Wikimedia Developer Advocacy Team. It aims to share highlights, news, and information of interest from and about the Wikimedia technical community. Check it out, and learn about what technical contributors have been up to this past quarter, upcoming conferences & calls for papers, and how to get involved. The Wikimedia Technical Community is large and diverse, and we know we can't capture everything perfectly. We welcome your ideas for future newsletters. Let us know what you would like to see or highlights you would like us to include. Subscribe to the Technical Community Newsletter <https://www.mediawiki.org/wiki/Newsletter:Technical_Community_Newsletter>, if you'd like to keep up with essential updates and information Kindly, Sarah R. Rodlund Senior Technical Writer, Developer Advocacy <https://www.mediawiki.org/wiki/Developer_Advocacy> srodlund(a)wikimedia.org

1 0

[Cloud-announce] Wikireplicas: old cluster migrations start in 2 weeks. Please test your code with the new cluster
by Joaquin Oltra Hernandez 06 Apr '21

06 Apr '21

This is a reminder of the timeline dates shared on March 12, In two weeks -April 15- the old cluster will migrate to utilize new replication hosts, at which point replication may stop. In ~four weeks -April 28- the old hostnames will be redirected to the new cluster. If you can, please test your code works with the new cluster before the redirects are in place. - New host names <https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign#New_ho…> - Help:Toolforge/Database <https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database> For questions and support reach out to #wikimedia-cloud on freenode IRC or email cloud(a)lists.wikimedia.org -- Joaquin Oltra Hernandez Developer Advocate - Wikimedia Foundation _______________________________________________ Wikimedia Cloud Services announce mailing list Cloud-announce(a)lists.wikimedia.org (formerly labs-announce(a)lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud-announce

4 5

2024

2023

2022

2021

2020

2019

2018

2017

Cloud April 2021