Good morning!
As a side effect of our response to the current Gerrit vandalism
epidemic, the 2FA integration between Horizon and Wikitech has been
disabled. That means that existing Horizon sessions are still valid but
fresh logins will fail.
This problem is being actively worked on. In the meantime, don't panic
if you get an error while trying to log in.
-Andrew
Hi,
Following some vandalism attempts, both Horizon and Toolsadmin are affected by a
general OAuth issue in Wikitech that prevents proper user authentication.
Affected URLs are:
* https://horizon.wikimedia.org/
* https://toolsadmin.wikimedia.org/auth/login
Horizon is the web UI used to create and manage Cloud VPS.
Toolsadmin (also known as striker) is the web UI used to create and maintain
Toolforge accounts.
We have no estimate right now of when a fix will be available, but several
people are actively working to get things back to normal.
regards
--
Arturo Borrero Gonzalez
Operations Engineer / Wikimedia Cloud Services
Wikimedia Foundation
As the MCR refactor of the MediaWiki database schema has progressed (https://phabricator.wikimedia.org/T166733 <https://phabricator.wikimedia.org/T166732> and many other tickets), one of the last steps is dropping the old columns from the wiki replica schema.
The column drops are tracked and explained well at https://phabricator.wikimedia.org/T212972, and the change is now ready to be applied. It has already been applied to two wikis, eswiki and huwiki, because a column problem there needed fixing. Tables with names of the form <tablename>_compat will retain a structure similar to the old one, in case that is needed while refactoring.
From the ticket, this is a summary of what is changing, organized by table name:
archive: Remove ar_comment
archive_userindex: Remove ar_comment
filearchive:
  Remove fa_deleted_reason
  Remove fa_description
filearchive_userindex:
  Remove fa_deleted_reason
  Remove fa_description
image: Remove img_description
ipblocks: Remove ipb_reason
ipblocks_ipindex: Remove ipb_reason
logging: Remove log_comment
logging_logindex: Remove log_comment
logging_userindex: Remove log_comment
oldimage: Remove oi_description
oldimage_userindex: Remove oi_description
recentchanges: Remove rc_comment
recentchanges_userindex: Remove rc_comment
revision: Remove rev_comment
revision_userindex: Remove rev_comment
The changes to the _compat tables should not affect anything.
We will deploy the change early next week (Tuesday, 2019-03-12). In most cases, if a table no longer works for your tool or app because of the change, you can switch to the table named $tablename_compat, which will appear to have the same schema. Where possible, however, we recommend that comment references join to the new comment table on its “comment_id” field instead.
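The recommended join might look like the following sketch. It uses an in-memory SQLite database as a stand-in for a wiki replica so that it runs anywhere. The column names log_comment_id and comment_text are assumptions based on the MediaWiki comment schema and are not stated in this message, so please check them against the replica views before relying on them.

```python
import sqlite3

# Stand-in for a wiki replica: a tiny in-memory database with toy data.
# Column names other than comment_id are assumptions, not taken from this email.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE comment (comment_id INTEGER PRIMARY KEY, comment_text TEXT);
    CREATE TABLE logging (log_id INTEGER PRIMARY KEY, log_type TEXT,
                          log_comment_id INTEGER);
    INSERT INTO comment VALUES (1, 'spam cleanup'), (2, 'per request');
    INSERT INTO logging VALUES (10, 'delete', 1), (11, 'block', 2);
""")

# Old style (stops working once log_comment is dropped):
#   SELECT log_id, log_comment FROM logging
# New style: resolve the comment text through the comment table.
QUERY = """
    SELECT l.log_id, l.log_type, c.comment_text
    FROM logging AS l
    JOIN comment AS c ON c.comment_id = l.log_comment_id
    ORDER BY l.log_id
"""

rows = db.execute(QUERY).fetchall()
for log_id, log_type, comment_text in rows:
    print(log_id, log_type, comment_text)
```

The same pattern should carry over to the other tables listed above, with the table's own comment-reference column in place of the assumed log_comment_id.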
Brooke Storm
Operations Engineer
Wikimedia Cloud Services
bstorm(a)wikimedia.org
IRC: bstorm_
tl;dr: We're about to disable self-service creation of Debian Jessie
VMs. To request an exception, open a Phabricator ticket specifying your
need and reasons.
--
We're close to polishing off the last few Ubuntu Trusty VMs in the
cloud, which means it's time to start thinking about the upcoming
deprecation of Debian Jessie.
WMCS (and the WMF in general) will continue to support use of Jessie
well into 2020, so no immediate action is needed on the part of current
Jessie users. On the other hand, any /new/ work should definitely
happen on Stretch in order to postpone the inevitable OS-motivated
rebuilds as long as possible. In order to encourage that, we're going
to disable creation of new Jessie VMs in the next few days.
If you believe that you are a special case and need a Jessie VM anyway,
please open a Phabricator ticket explaining your reasons and specifying
name and flavor for the VM to be created, and WMCS staff will make it
for you.
For reference, the Phabricator ticket about this change is:
https://phabricator.wikimedia.org/T218119
-Andrew + the WMCS team
Due to repeated outages in the past 30 days, and a long history of earlier outages, caused by log files filling up NFS for Toolforge, I’ve deployed a change that restricts the maximum size of any file created from Toolforge systems to 50 GB.
When a process hits that limit, it will fail to continue writing to the file with the message that the “maximum file size” has been reached. There are files over this size in the environment now that are not going to be affected, but moving them within the same filesystem is likely to require help from someone with root access.
If the limit becomes a problem, it can be revisited. Please let us know on the cloud discussion list or on #wikimedia-cloud if problems arise from the change.
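For tools whose log files tend to grow without bound, one possible way to stay far below the limit is to rotate logs by size. The sketch below uses Python's standard logging.handlers.RotatingFileHandler; the logger name, file name, and sizes are made up for the demo and are not a recommendation from the Toolforge admins.

```python
import logging
import logging.handlers
import os
import tempfile

def make_bounded_logger(name, path, max_bytes, backups):
    """Return a logger whose file is rotated before it reaches max_bytes."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep demo output out of the root logger
    return logger

# Demo with tiny numbers; a real tool would pick a much larger max_bytes.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "tool.log")
log = make_bounded_logger("demo-tool", log_path, max_bytes=2000, backups=3)
for i in range(500):
    log.info("processed item %d", i)

# At most backups + 1 files exist, and none has grown past max_bytes.
sizes = {name: os.path.getsize(os.path.join(log_dir, name))
         for name in sorted(os.listdir(log_dir))}
print(sizes)
```

With this setup the total disk use is bounded by roughly max_bytes times (backups + 1), and older entries are discarded as the files rotate.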
Thanks!
Brooke Storm
This is just a heads-up that, due to some residual issues from NFS problems, we are rebooting the cron server for the newer Stretch gridengine on Toolforge. This may affect a small number of job submissions, but only those that happen during the reboot itself.
Brooke Storm
As announced previously on this list [0], we are in the process of
replacing the old Ubuntu Trusty instances in Toolforge with fancy new
Debian Stretch instances.
== Remaining timeline ==
* Week of 2019-03-04: Switch login.tools.wmflabs.org to point to Stretch bastion
* Week of 2019-03-25: Shut down Trusty grid
The DNS entry for "login.tools.wmflabs.org" will be updated to point
to a Debian Stretch bastion rather than the old Ubuntu Trusty bastion
soon (like right after I send this email). This change will cause many
ssh clients to alert about a change in the ssh host fingerprint.
Updated fingerprints will be posted on wikitech [1][2] once the switch
has been made.
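To compare a host key with a published fingerprint by hand, the SHA256 form that OpenSSH prints can be recomputed from the base64 key blob, as in this minimal sketch. The key line below is synthetic demo data, not the real host key of any bastion, and the helper name is made up for illustration.

```python
import base64
import hashlib

def sha256_fingerprint(key_blob_b64):
    """Compute the OpenSSH-style SHA256 fingerprint of a base64 key blob."""
    key_blob = base64.b64decode(key_blob_b64)
    digest = hashlib.sha256(key_blob).digest()
    # OpenSSH prints the digest as base64 with the trailing '=' padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")

# Synthetic known_hosts-style line: "<host> <key type> <base64 blob>".
# The blob is all-zero demo data and is NOT a real host key.
demo_blob = base64.b64encode(
    b"\x00\x00\x00\x0bssh-ed25519" + b"\x00\x00\x00\x20" + bytes(32)
).decode("ascii")
line = "login.tools.wmflabs.org ssh-ed25519 " + demo_blob

host, key_type, blob = line.split()
fingerprint = sha256_fingerprint(blob)
print(host, key_type, fingerprint)
```

The printed value can then be compared character by character with the fingerprint published on wikitech for the same key type.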
The legacy Ubuntu Trusty bastion will still be reachable as
"login-trusty.tools.wmflabs.org" until that instance is deleted during
the week of 2019-03-25.
In just over 2 weeks we will be shutting down the Trusty grid for
good. Any tools that have not migrated to either the Stretch grid or
the Kubernetes cluster at that point will be forcibly shut down.
Nothing will be deleted in the tools' $HOME directories, but any
Trusty grid jobs will be stopped. Any crontab file remaining on the
old grid's cron server will be archived as
"$HOME/crontab.trusty.save". Maintainers who somehow missed all of the
announcements will be able to log in and restart their tools on the
Stretch grid or Kubernetes.
See <https://wikitech.wikimedia.org/wiki/News/Toolforge_Trusty_deprecation>
for additional information and tips on common problems that have been
found thus far.
[0]: https://lists.wikimedia.org/pipermail/cloud-announce/2019-January/000122.ht…
[1]: https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/login.tools.wmfla…
[2]: https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/tools-dev.wmflabs…
Bryan, on behalf of the Toolforge admin team
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Manager, Technical Engagement Boise, ID USA
irc: bd808 v:415.839.6885 x6855
OpenSSH 7.0, released 2015-08-11, deprecated the use of DSA (ssh-dss)
keys and RSA keys smaller than 1024 bits [0]. We have been applying
some backwards compatibility configuration changes to ssh bastion
servers in both Cloud VPS and Toolforge for some time to continue to
support old keys using these deprecated algorithms. I was supposed to
announce this to the community about 1.5 years ago, but apparently I
did not [1].
We have noticed with the introduction of Debian Stretch ssh bastion
servers running OpenSSH 7.4 that users with DSA keys (and possibly
short RSA keys) are being denied access by the newer software. The
easiest fix for this is for users to generate new keys and upload
their new public key using the form at
<https://toolsadmin.wikimedia.org/profile/settings/ssh-keys> or
<https://wikitech.wikimedia.org/wiki/Special:Preferences#mw-prefsection-open…>.
We currently recommend using either ed25519 or 4096-bit RSA keys. See
<https://wikitech.wikimedia.org/wiki/Production_shell_access#Generating_your…>
for more information.
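To check whether an existing public key is one of the affected kinds, the key line can be decoded and inspected. The following is a rough sketch, assuming the usual "type base64 [comment]" layout of a .pub file; the function names and verdict strings are invented for illustration, and the demo keys are synthetic rather than usable keys.

```python
import base64
import struct

def read_fields(blob):
    """Split an SSH public key blob into its length-prefixed fields."""
    fields, offset = [], 0
    while offset < len(blob):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        fields.append(blob[offset:offset + length])
        offset += length
    return fields

def describe_key(public_key_line):
    """Return (algorithm, bits or None, verdict) for a 'type base64 [comment]' line."""
    blob = base64.b64decode(public_key_line.split()[1])
    fields = read_fields(blob)
    algorithm = fields[0].decode("ascii")
    if algorithm == "ssh-dss":
        return algorithm, None, "replace: DSA keys are deprecated"
    if algorithm == "ssh-rsa":
        # RSA blobs hold the exponent, then the modulus; its size is the key size.
        bits = int.from_bytes(fields[2], "big").bit_length()
        if bits < 1024:
            return algorithm, bits, "replace: RSA key shorter than 1024 bits"
        if bits < 4096:
            return algorithm, bits, "below the recommended 4096 bits"
        return algorithm, bits, "ok"
    if algorithm == "ssh-ed25519":
        return algorithm, 256, "ok"
    return algorithm, None, "not checked by this sketch"

def demo_line(*parts):
    """Build a synthetic key line for the demo (not a usable key)."""
    blob = b"".join(struct.pack(">I", len(p)) + p for p in parts)
    return parts[0].decode("ascii") + " " + base64.b64encode(blob).decode("ascii") + " demo"

short_rsa = demo_line(b"ssh-rsa", b"\x01\x00\x01", (1 << 767).to_bytes(97, "big"))
old_dsa = demo_line(b"ssh-dss")
print(describe_key(short_rsa))
print(describe_key(old_dsa))
```

Running describe_key on the contents of your own .pub file gives a quick indication of whether a new key is needed.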
[0]: https://www.openssh.com/txt/release-7.0
[1]: https://phabricator.wikimedia.org/T168433
Bryan, on behalf of the Wikimedia Cloud Services team
During the flurry of activity we had recently in diagnosing and fixing
problems with the shared ToolsDB MariaDB service [0], we made a
configuration change to place a hard limit on the maximum number of
simultaneous connections permitted for each user account [1][2].
The current limit is set at 20 concurrent connections. This should not
cause any problems for a typical webservice or single script using
ToolsDB, but tools making heavy use of ToolsDB may need to make some
adjustments.
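One adjustment a heavily threaded tool could make is to cap how many connections it opens at once. The sketch below does this with a semaphore from the Python standard library; ConnectionLimiter and FakeConnection are hypothetical names, and the fake connection stands in for whatever database driver the tool really uses.

```python
import threading
import time
from contextlib import contextmanager

class ConnectionLimiter:
    """Allow at most `limit` connections from this process to be open at once."""

    def __init__(self, connect, limit):
        self._connect = connect  # any zero-argument connection factory
        self._slots = threading.BoundedSemaphore(limit)

    @contextmanager
    def connection(self):
        # Block until a slot is free instead of opening one more connection.
        with self._slots:
            conn = self._connect()
            try:
                yield conn
            finally:
                conn.close()

class FakeConnection:
    """Demo stand-in that records how many connections are open at once."""
    lock = threading.Lock()
    open_now = 0
    peak = 0

    def __init__(self):
        with FakeConnection.lock:
            FakeConnection.open_now += 1
            FakeConnection.peak = max(FakeConnection.peak, FakeConnection.open_now)

    def close(self):
        with FakeConnection.lock:
            FakeConnection.open_now -= 1

limiter = ConnectionLimiter(FakeConnection, limit=15)  # headroom below 20

def worker():
    with limiter.connection():
        time.sleep(0.01)  # pretend to run a query

threads = [threading.Thread(target=worker) for _ in range(60)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print("peak concurrent connections:", FakeConnection.peak)
```

Because the limit applies per user account, a tool that runs several processes would need to divide the budget between them; this sketch only bounds a single process.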
As always, tool maintainers can seek advice on dealing with this limit
or other issues in Toolforge from the Toolforge administration team
and others in the community via our Freenode IRC channel
(#wikimedia-cloud), Phabricator tasks, and the
cloud(a)lists.wikimedia.org mailing list.
[0]: https://phabricator.wikimedia.org/T216208
[1]: https://phabricator.wikimedia.org/T216170
[2]: https://mariadb.com/kb/en/library/server-system-variables/#max_user_connect…
Bryan, on behalf of the Toolforge administration team
This is an update on the ongoing problems with the ToolsDB service. We are preparing to move to a new server, which is now a functioning replica of the ToolsDB server. The first step is to restart the service in read-only mode; then we will move the DNS. Expect writes to stop working and connections to drop. Once DNS points to the new server, services that use this database will need to be restarted.
This will happen within the next hour unless issues or caution slow it down.
Brooke Storm