Komla has started to disable the grid for tools that seem abandoned.
The workboard for this is at https://phabricator.wikimedia.org/project/view/6135/
I believe that tools are moved from 'Unreached Tool' to 'Disabled' as they are disabled.
== How to disable (or re-enable) a tool? ==
There are two scripts, each run on a different host. Both scripts should be run for every tool. It should be safe to run either command more than once; repeated runs should have no additional effect.
To disable the grid for a tool:
On tools-sgegrid-master.tools.eqiad1.wikimedia.cloud
$ sudo /srv/disable-tool/disable_grid_for_tool.py <toolname>
On tools-sgecron-2.tools.eqiad1.wikimedia.cloud
$ sudo /srv/disable-tool/stop_grid_for_tool.py <toolname>
To re-enable the grid for a tool:
On tools-sgegrid-master.tools.eqiad1.wikimedia.cloud
$ sudo /srv/disable-tool/disable_grid_for_tool.py --enable <toolname>
On tools-sgecron-2.tools.eqiad1.wikimedia.cloud
$ sudo /srv/disable-tool/stop_grid_for_tool.py --enable <toolname>
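Since both scripts must be run for every tool, it can help to print the matching pair of commands before logging in to each host. The following is a minimal sketch only: the function name `grid_tool_commands` is hypothetical and not part of the disable-tool scripts, and it only prints the commands listed above without connecting to any host or executing anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of /srv/disable-tool): prints the two
# commands documented above for one tool. It never runs them.
grid_tool_commands() {
    action="${1:-}"   # "disable" or "enable"
    tool="${2:-}"     # tool name

    case "$action" in
        disable) flag="" ;;
        enable)  flag="--enable " ;;
        *)
            echo "usage: grid_tool_commands disable|enable <toolname>" >&2
            return 1
            ;;
    esac

    if [ -z "$tool" ]; then
        echo "usage: grid_tool_commands disable|enable <toolname>" >&2
        return 1
    fi

    echo "On tools-sgegrid-master.tools.eqiad1.wikimedia.cloud:"
    echo "  sudo /srv/disable-tool/disable_grid_for_tool.py ${flag}${tool}"
    echo "On tools-sgecron-2.tools.eqiad1.wikimedia.cloud:"
    echo "  sudo /srv/disable-tool/stop_grid_for_tool.py ${flag}${tool}"
}
```

For example, `grid_tool_commands enable mytool` prints the two re-enable commands for a tool named `mytool`, which can then be pasted on the respective hosts.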
== Who can re-enable a tool, and when? ==
This shut-down phase has two goals:
1) Stop grid jobs that no one cares about
2) Provide a 'warning shot' to get attention from users or admins of a tool who are relying on the tool but not responding to Komla's correspondence.
Anyone with the necessary logins is encouraged to re-enable tools as needed. Specifically:
- If you are contacted by a tool admin requesting restoration, feel free to restore the tool according to the steps above. First, though, please make sure the concerned admin is aware that the grid is going away, and make sure you (or better yet the admin) update the workboard task associated with the tool explaining how they plan to deal with the coming shut-down and how they can be contacted in the future.
- If you are contacted by users of a tool requesting restoration, please encourage them to reach out to the admin and have the admin request restoration directly. If it's clear that a tool is needed but has no reachable admin, add notes to the phab task accordingly, then move the task into the 'Help wanted' column and add 'Abandoned:' to the task title.
== What is disabling/enabling? ==
The disable scripts do the following:
- set a grid quota that prevents future jobs from being scheduled
- move grid-specific service.manifest files to 'service.disabledmanifest'
- add a 'TOOL_DISABLED' file to the tool's home directory
- archive crontab
- qdel all existing grid jobs
Enable scripts do this:
- remove restrictive grid quota, permitting jobs to be scheduled
- move 'service.disabledmanifest' back to service.manifest if no service.manifest is currently present
- remove 'TOOL_DISABLED' file
- restore crontab
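The manifest rule in the enable list (restore only when no service.manifest is present) can be illustrated with a short sketch. This is not the code of the real enable script; the function name `restore_manifest` and the directory argument are assumptions made for illustration.

```shell
#!/bin/sh
# Illustration only: mimics the documented rule, not the real script.
# restore_manifest <directory containing the manifest files>
restore_manifest() {
    dir="${1:-}"
    # Move the disabled manifest back only if it exists and no
    # service.manifest has been created in the meantime.
    if [ -e "$dir/service.disabledmanifest" ] && [ ! -e "$dir/service.manifest" ]; then
        mv "$dir/service.disabledmanifest" "$dir/service.manifest"
    fi
}
```

Under this rule, a service.manifest that was created while the tool was disabled is left untouched, and the 'service.disabledmanifest' file stays in place next to it.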
Note that the enable scripts do not actively start anything, so non-webservice tools will likely require a manual start after enabling.
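Because the disable scripts leave a 'TOOL_DISABLED' file and the enable scripts remove it, the presence of that file gives a quick way to see which state a tool is in. A minimal sketch, assuming the marker sits at the top level of the tool's home directory; the function name `tool_grid_state` is hypothetical, and the home directory path is passed in by the caller rather than guessed.

```shell
#!/bin/sh
# Hypothetical check, assuming the marker file lives directly in the
# tool's home directory and is named exactly TOOL_DISABLED.
# tool_grid_state <tool home directory>
tool_grid_state() {
    home="${1:-}"
    if [ -z "$home" ] || [ ! -d "$home" ]; then
        echo "usage: tool_grid_state <tool home directory>" >&2
        return 1
    fi
    if [ -e "$home/TOOL_DISABLED" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}
```

This only reports the marker file; it does not check the grid quota, the crontab, or whether any jobs are running.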
cloud-admin@lists.wikimedia.org