<Tron>Greetings, Programs!</Tron>
So, not many updates for a while, as things have been progressing at a fair clip on the "oh my god, boring gruntwork" front.
The biggest news is the addition of Petr Bena to the tools project sysadmin team as its first volunteer. Petr has been very involved in the setup and administration of Tool Labs' predecessor projects, and will continue to steer the bots project, where the rules are a little more relaxed to facilitate more experimental development.
He's also joining me on the tools project proper, to help provide support to maintainers over a wider range of times and to increase the availability of sysadmins. You can find him hanging around #wikimedia-labs, often at times when I am not available.
There is some documentation-in-progress that gives a lot of information on how to set up your tools on the Labs architecture at:
https://www.mediawiki.org/wiki/Wikimedia_Labs/Tool_Labs/Help
Please don't hesitate to comment if you see missing information, or if parts of it are less clear than ideal.
On other fronts, the wikitech management interface is now in place for self-service tool account creation by Labs users; this requires moving already-existing tool accounts to the new scheme, and a brief outage for that purpose later this week (see note below).
Experiments with a bulletproof replacement for gluster are well under way, with NFS from a highly redundant server as the currently favored option. With a bit of luck, I'll use the opportunity given by the outage for the tool account switchover to move the shared tools filesystem to NFS as a trial run.
The database replication is also well on its way; you can find the current roadmap at:
https://wikitech.wikimedia.org/wiki/Tool_Labs/Database_plan
=== Planned outage ===
In order to move the extant tool accounts to the new, final scheme, and (progress permitting) move the shared filesystems to a new storage server, there will be a brief outage of the Tool Labs infrastructure this Thursday, April 11, starting at 16:00 UTC. The outage is expected to last 20 minutes, during which service will be intermittently unavailable.
Announcements will be sent by email, on IRC and on the servers 30 minutes before the start of maintenance, at its start, and upon completion.
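For anyone who wants to convert the window to their own timezone or set a reminder, here is a minimal sketch of the timeline described above. The year (2013) is an assumption, since only "Thursday April 11" is given, and the completion time is only the expected one, not a guarantee.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the year is 2013; the announcement only says "Thursday April 11".
start = datetime(2013, 4, 11, 16, 0, tzinfo=timezone.utc)
expected_duration = timedelta(minutes=20)  # "expected to last 20 minutes"
notice_lead = timedelta(minutes=30)        # first announcement before the start

schedule = {
    "advance notice": start - notice_lead,
    "maintenance starts": start,
    "expected completion": start + expected_duration,
}

for label, when in schedule.items():
    print(f"{label}: {when:%Y-%m-%d %H:%M} UTC")
```

Calling `when.astimezone()` on any of these values converts it to the local timezone of the machine running the script.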
Impact:
* Jobs running on the grid engine will be stopped then restarted automatically at the end of the maintenance window. If you are running a job that cannot or should not be restarted automatically without intervention from its maintainers, please make certain that it has been stopped before the start of the maintenance window;
* The login server will be restarted during the window, ending active sessions;
* The web service will be intermittently unavailable; and
* Running processes not scheduled through the grid engine will be killed.
Recovery plan:
In case of unplanned failure during the maintenance window, configuration will be rolled back to the current version and a new window will be planned after a postmortem. Disruption of services will take place as noted, and an announcement will be sent.
-- Marc