Because of some issues mounting NFS on the PAWS master, it is being rebooted. Traffic to the front page has already been routed through another node, but server creation won’t work until after the reboot is complete.
Brooke Storm
Operations Engineer
Wikimedia Cloud Services
bstorm(a)wikimedia.org <mailto:bstorm@wikimedia.org>
IRC: bstorm_
The legacy Ubuntu Trusty grid engine job grid has been shut down!
Thanks to everyone who was involved in migrating existing tools from
the old grid to the Kubernetes cluster or the new Debian Stretch job
grid.
There were still 385 tools that may have been running jobs or
webservices on the Trusty grid at the time of shutdown. A static list
of these tools is preserved at
<https://tools.wmflabs.org/trusty-tools/>.
Instructions are still available at
<https://wikitech.wikimedia.org/wiki/News/Toolforge_Trusty_deprecation>
for migrating tools that are currently down.
Bryan, on behalf of the Toolforge admin team
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Manager, Technical Engagement Boise, ID USA
irc: bd808 v:415.839.6885 x6855
As previously announced on this list [0][1] we are in the process of
replacing the old Ubuntu Trusty instances in Toolforge with fancy new
Debian Stretch instances.
This process is reaching its next major milestone on Monday
2019-03-25. During the general US workday on that date (14:00-00:00
UTC) the Toolforge admin team will be dismantling the legacy Ubuntu
Trusty job grid. Any tools that have not migrated to either the
Stretch grid or the Kubernetes cluster at that point will be forcibly
shut down. Nothing will be deleted in the tools' $HOME directories, but
any Trusty grid jobs will be stopped. Any crontab file remaining on
the old grid's cron server will be archived as
"$HOME/crontab.trusty.save". Maintainers who somehow missed all of the
announcements will be able to log in and restart their tools on the
Stretch grid or Kubernetes.
See <https://wikitech.wikimedia.org/wiki/News/Toolforge_Trusty_deprecation>
for additional information and tips on common problems that have been
found thus far.
[0]: https://lists.wikimedia.org/pipermail/cloud-announce/2019-January/000122.ht…
[1]: https://lists.wikimedia.org/pipermail/cloud-announce/2019-March/000142.html
Bryan, on behalf of the Toolforge admin team
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Manager, Technical Engagement Boise, ID USA
irc: bd808 v:415.839.6885 x6855
Good morning!
As a side effect of our response to the current Gerrit vandalism
epidemic, the 2FA integration between Horizon and Wikitech has been
disabled. That means that existing Horizon sessions are still valid but
fresh logins will fail.
This problem is being actively worked on. In the meantime, don't panic
if you get an error while trying to log in.
-Andrew
Hi,
following some vandalism attempts, both Horizon and Toolsadmin are affected by a
general OAuth issue in Wikitech that prevents proper user authentication.
Affected URLs are:
* https://horizon.wikimedia.org/
* https://toolsadmin.wikimedia.org/auth/login
Horizon is the web UI used to create and manage Cloud VPS.
Toolsadmin (also known as striker) is the web UI used to create and maintain
Toolforge accounts.
We have no estimate right now of when a fix will be available, but several
people are actively involved in trying to get things back to normal.
regards
--
Arturo Borrero Gonzalez
Operations Engineer / Wikimedia Cloud Services
Wikimedia Foundation
As the MCR refactor of the MediaWiki database schema progresses (https://phabricator.wikimedia.org/T166733 <https://phabricator.wikimedia.org/T166732> and many other tickets), one of the last steps is dropping the affected columns from the wiki replica schema.
The column drops are tracked and explained in detail at https://phabricator.wikimedia.org/T212972, and the change is now ready to be applied. It has already been applied to two wikis (eswiki and huwiki) because of a column problem there that needed fixing. Tables with names of the form <tablename>_compat will retain a similar structure if that is needed for refactoring.
From the ticket, this is a summary of what is changing, organized by table name:
archive: Remove ar_comment
archive_userindex: Remove ar_comment
filearchive: Remove fa_deleted_reason and fa_description
filearchive_userindex: Remove fa_deleted_reason and fa_description
image: Remove img_description
ipblocks: Remove ipb_reason
ipblocks_ipindex: Remove ipb_reason
logging: Remove log_comment
logging_logindex: Remove log_comment
logging_userindex: Remove log_comment
oldimage: Remove oi_description
oldimage_userindex: Remove oi_description
recentchanges: Remove rc_comment
recentchanges_userindex: Remove rc_comment
revision: Remove rev_comment
revision_userindex: Remove rev_comment
The changes to the _compat tables should not affect anything.
We will deploy the change early next week (Tuesday, 2019-03-12). In most cases, if a table no longer works for your tool or app because of the change, you can switch to the table named <tablename>_compat, which will appear to have the same schema. Where possible, though, it is recommended that comment references instead use a join to the new comment table on a “comment_id” field.
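
As a rough illustration of the recommended join, here is a minimal sketch that runs against a toy in-memory SQLite database rather than the real replicas. The column names log_comment_id and comment_text are assumptions based on the upstream MediaWiki schema and are not stated in this announcement, so check them against the actual replica views before relying on them. The helper names are hypothetical.

```python
import sqlite3

# Toy stand-ins for the replica views. Only the columns needed for the join
# are modelled; log_comment_id and comment_text are assumed column names.
SCHEMA = """
CREATE TABLE comment (comment_id INTEGER PRIMARY KEY, comment_text TEXT);
CREATE TABLE logging (
    log_id INTEGER PRIMARY KEY,
    log_type TEXT,
    log_comment_id INTEGER
);
"""

# Instead of reading logging.log_comment directly, join to the comment table.
QUERY = """
SELECT l.log_id, l.log_type, c.comment_text
FROM logging AS l
JOIN comment AS c ON c.comment_id = l.log_comment_id
ORDER BY l.log_id
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO comment VALUES (?, ?)",
        [(1, "spam"), (2, "per request")],
    )
    conn.executemany(
        "INSERT INTO logging VALUES (?, ?, ?)",
        [(10, "delete", 1), (11, "block", 2), (12, "delete", 1)],
    )
    conn.commit()
    return conn


def log_entries_with_comments(conn):
    """Return (log_id, log_type, comment_text) rows via the comment join."""
    return conn.execute(QUERY).fetchall()
```

The same pattern should carry over to the other affected tables by swapping in that table's own comment reference column.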
Brooke Storm
Operations Engineer
Wikimedia Cloud Services
bstorm(a)wikimedia.org <mailto:bstorm@wikimedia.org>
IRC: bstorm_
tl;dr: We're about to disable self-service creation of Debian Jessie
VMs. To request an exception, open a Phabricator ticket specifying your
need and reasons.
--
We're close to polishing off the last few Ubuntu Trusty VMs in the
cloud, which means it's time to start thinking about the upcoming
deprecation of Debian Jessie.
WMCS (and the WMF in general) will continue to support use of Jessie
well into 2020, so no immediate action is needed on the part of current
Jessie users. On the other hand, any /new/ work should definitely
happen on Stretch in order to postpone the inevitable OS-motivated
rebuilds as long as possible. In order to encourage that, we're going
to disable creation of new Jessie VMs in the next few days.
If you believe that you are a special case and need a Jessie VM anyway,
please open a Phabricator ticket explaining your reasons and specifying
name and flavor for the VM to be created, and WMCS staff will make it
for you.
For reference, the Phabricator ticket about this change is:
https://phabricator.wikimedia.org/T218119
-Andrew + the WMCS team
Because of repeated outages in the past 30 days, and a long history of earlier outages caused by log files filling up NFS for Toolforge, I’ve deployed a change that restricts the maximum size of any file created from the Toolforge system to 50 GB.
When a process hits that limit, it will fail to continue writing to the file with the message that the “maximum file size” has been reached. There are files over this size in the environment now that are not going to be affected, but moving them within the same filesystem is likely to require help from someone with root access.
If the limit becomes a problem, it can be revisited. Please let us know on the cloud discussion list or on #wikimedia-cloud if problems arise from the change.
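
For maintainers who want to check whether any of their files are near the limit, a minimal sketch along these lines may help. It is not an official tool. The function name files_over is hypothetical, and reading "50 GB" as 50 GiB is an assumption, since the announcement does not say which unit is meant.

```python
import os
import stat

# Assumption: "50 GB" is read here as 50 GiB; the announcement does not specify.
LIMIT_BYTES = 50 * 1024**3


def files_over(root, threshold_bytes):
    """Return (size, path) pairs for regular files of at least
    threshold_bytes under root, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # the file vanished or cannot be inspected
            # Skip symlinks and special files; only regular files count.
            if stat.S_ISREG(info.st_mode) and info.st_size >= threshold_bytes:
                found.append((info.st_size, path))
    return sorted(found, reverse=True)
```

Calling files_over(os.path.expanduser("~"), LIMIT_BYTES // 2), for example, would list files that are already at least halfway to the limit, which is usually a sign that a log needs rotating.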
Thanks!
Brooke Storm
Operations Engineer
Wikimedia Cloud Services
bstorm(a)wikimedia.org <mailto:bstorm@wikimedia.org>
IRC: bstorm_
This is just a heads up that due to some residual issues from NFS problems, we are rebooting the cron server for the newer Stretch grid engine on Toolforge. This may affect a small number of job submissions, but it should only affect those that happen during the reboot itself.
Brooke Storm
Operations Engineer
Wikimedia Cloud Services
bstorm(a)wikimedia.org <mailto:bstorm@wikimedia.org>
IRC: bstorm_
As announced previously on this list [0] we are in the process of
replacing the old Ubuntu Trusty instances in Toolforge with fancy new
Debian Stretch instances.
== Remaining timeline ==
* Week of 2019-03-04: Switch login.tools.wmflabs.org to point to Stretch bastion
* Week of 2019-03-25: Shut down Trusty grid
The DNS entry for "login.tools.wmflabs.org" will be updated to point
to a Debian Stretch bastion rather than the old Ubuntu Trusty bastion
soon (like right after I send this email). This change will cause many
ssh clients to alert about a change in the ssh host fingerprint.
Updated fingerprints will be posted on wikitech [1][2] once the switch
has been made.
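
When the ssh client warns about the changed host key, one way to compare it with the posted value is to compute the fingerprint yourself. The sketch below assumes the posted fingerprints use the OpenSSH SHA256 format, which is the unpadded base64 of the SHA-256 hash of the raw key blob. The function name is hypothetical and the key used in the example is a made-up placeholder, not a real Toolforge host key.

```python
import base64
import hashlib


def sha256_fingerprint(public_key_line):
    """Compute an OpenSSH-style SHA256 fingerprint.

    Expects a public key line of the form "<key-type> <base64-blob> [comment]",
    as found in a .pub file.
    """
    fields = public_key_line.split()
    blob = base64.b64decode(fields[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the base64 digest without trailing "=" padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


# Placeholder key material for illustration only.
example_line = "ssh-ed25519 " + base64.b64encode(b"not a real key").decode("ascii")
example_fingerprint = sha256_fingerprint(example_line)
```

The result has the same form as the output of ssh-keygen -l -f on a public key file, so either can be checked against the fingerprint published on wikitech.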
The legacy Ubuntu Trusty bastion will still be reachable as
"login-trusty.tools.wmflabs.org" until that instance is deleted during
the week of 2019-03-25.
In just over 2 weeks we will be shutting down the Trusty grid for
good. Any tools that have not migrated to either the Stretch grid or
the Kubernetes cluster at that point will be forcibly shut down.
Nothing will be deleted in the tools' $HOME directories, but any
Trusty grid jobs will be stopped. Any crontab file remaining on the
old grid's cron server will be archived as
"$HOME/crontab.trusty.save". Maintainers who somehow missed all of the
announcements will be able to log in and restart their tools on the
Stretch grid or Kubernetes.
See <https://wikitech.wikimedia.org/wiki/News/Toolforge_Trusty_deprecation>
for additional information and tips on common problems that have been
found thus far.
[0]: https://lists.wikimedia.org/pipermail/cloud-announce/2019-January/000122.ht…
[1]: https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/login.tools.wmfla…
[2]: https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints/tools-dev.wmflabs…
Bryan, on behalf of the Toolforge admin team
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Manager, Technical Engagement Boise, ID USA
irc: bd808 v:415.839.6885 x6855