Komla has started to disable the grid for tools that seem abandoned.
The workboard for this is at:
https://phabricator.wikimedia.org/project/view/6135/
I believe that tools are moved from 'Unreached Tool' to 'Disabled' as
they are disabled.
== How to disable (or re-enable) a tool? ==
There are two scripts, each run in a different place. BOTH scripts
should be run for any tool. It should be safe to run any of these
commands multiple times without additional effect.
To disable the grid for a tool:
On tools-sgegrid-master.tools.eqiad1.wikimedia.cloud
$ sudo /srv/disable-tool/disable_grid_for_tool.py <toolname>
On tools-sgecron-2.tools.eqiad1.wikimedia.cloud
$ sudo /srv/disable-tool/stop_grid_for_tool.py <toolname>
To re-enable the grid for a tool:
On tools-sgegrid-master.tools.eqiad1.wikimedia.cloud
$ sudo /srv/disable-tool/disable_grid_for_tool.py --enable <toolname>
On tools-sgecron-2.tools.eqiad1.wikimedia.cloud
$ sudo /srv/disable-tool/stop_grid_for_tool.py --enable <toolname>
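Because both scripts must be run, and on different hosts, it is easy to forget one of them. The following is a minimal sketch of a helper that prints the host/command pairs for one tool. The host names and script paths are copied from the commands above; the function name `grid_commands` and the tool name `example-tool` are hypothetical and exist only for illustration. The sketch only builds the command strings and does not log in anywhere or run anything.

```python
import shlex

# Hosts and script paths copied from the commands above.
GRID_MASTER = "tools-sgegrid-master.tools.eqiad1.wikimedia.cloud"
CRON_HOST = "tools-sgecron-2.tools.eqiad1.wikimedia.cloud"

# Each step is (host to run on, script to run there).
STEPS = [
    (GRID_MASTER, "/srv/disable-tool/disable_grid_for_tool.py"),
    (CRON_HOST, "/srv/disable-tool/stop_grid_for_tool.py"),
]


def grid_commands(toolname, enable=False):
    """Return (host, command) pairs for one tool (hypothetical helper)."""
    if not toolname or toolname.startswith("-"):
        raise ValueError("toolname must be a non-empty name, not an option")
    commands = []
    for host, script in STEPS:
        argv = ["sudo", script]
        if enable:
            argv.append("--enable")
        argv.append(toolname)
        # shlex.join quotes arguments so the line is safe to paste in a shell.
        commands.append((host, shlex.join(argv)))
    return commands


if __name__ == "__main__":
    # "example-tool" is a placeholder tool name.
    for host, command in grid_commands("example-tool"):
        print(f"On {host}\n$ {command}")
```

Passing `enable=True` produces the re-enable commands instead. Always emitting both steps together is the point of the helper: a tool with only one of the two scripts applied is in a half-disabled state.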
== Who can re-enable a tool, and when? ==
This shut-down phase has two goals:
1) Stop grid jobs that no one cares about
2) Provide a 'warning shot' to get attention from users or admins of a
tool who are relying on the tool but not responding to Komla's
correspondence.
Anyone with the necessary logins is encouraged to re-enable tools as
needed. Specifically:
- If you are contacted by a tool admin requesting restoration, feel free
to restore the tool according to the steps above. First, though, please
make sure the concerned admin is aware that the grid is going away, and
make sure you (or better yet the admin) update the workboard task
associated with the tool explaining how they plan to deal with the
coming shut-down and how they can be contacted in the future.
- If you are contacted by users of a tool requesting restoration, please
encourage them to reach out to the admin and have the admin request
restoration directly. If it's clear that a tool is needed but has no
reachable admin, add notes to the phab task accordingly, then move the
task into the 'Help wanted' column and add 'Abandoned:' to the task
title.
== What is disabling/enabling? ==
The disable scripts do the following:
- set a grid quota that prevents future jobs from being scheduled
- move grid-specific service.manifest files to 'service.disabledmanifest'
- add a 'TOOL_DISABLED' file to the tool home
- archive crontab
- qdel all existing grid jobs
The enable scripts do the following:
- remove restrictive grid quota, permitting jobs to be scheduled
- move 'service.disabledmanifest' back to service.manifest if no
service.manifest is currently present
- remove 'TOOL_DISABLED' file
- restore crontab
Note that the enable scripts do not actively start anything. So
non-webservice tools will likely require a manual start after enabling.
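The file-level steps in the two lists above can be sketched as follows. This is not the code of the actual scripts, only an illustration of why repeated runs are harmless. It takes the tool home directory as a parameter, covers only the manifest and 'TOOL_DISABLED' steps (not the quota, crontab or qdel steps), and, as a simplification, does not check whether a manifest is grid-specific. The function names `disable_files` and `enable_files` are hypothetical.

```python
from pathlib import Path

# File names taken from the lists above.
MANIFEST = "service.manifest"
DISABLED_MANIFEST = "service.disabledmanifest"
MARKER = "TOOL_DISABLED"


def disable_files(tool_home):
    """Park the manifest and add the marker file; safe to repeat."""
    home = Path(tool_home)
    manifest = home / MANIFEST
    # Simplification: the real scripts only move grid-specific manifests.
    if manifest.exists():
        manifest.replace(home / DISABLED_MANIFEST)
    (home / MARKER).touch()


def enable_files(tool_home):
    """Restore the manifest and remove the marker file; safe to repeat."""
    home = Path(tool_home)
    parked = home / DISABLED_MANIFEST
    manifest = home / MANIFEST
    # Only restore if no service.manifest is currently present, so a
    # manifest created while the tool was disabled is never overwritten.
    if parked.exists() and not manifest.exists():
        parked.replace(manifest)
    (home / MARKER).unlink(missing_ok=True)
```

Every step first checks the current state, so running either function a second time changes nothing.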