I opened https://phabricator.wikimedia.org/T192422 and depooled
labvirt1015 for now. I don't know that this is actually cause for
alarm, but 97 VMs seems like a lot of eggs to have in one basket.
-A
-------- Forwarded Message --------
Subject: ** PROBLEM alert - labvirt1015/ensure kvm processes are
running is CRITICAL **
Date: Wed, 18 Apr 2018 01:17:17 +0000
From: icinga@einsteinium.wikimedia.org
To: abogott@wikimedia.org
Notification Type: PROBLEM
Service: ensure kvm processes are running
Host: labvirt1015
Address: 10.64.20.31
State: CRITICAL
Date/Time: Wed Apr 18 01:17:17 UTC 2018
Notes URLs:
Additional Info:
PROCS CRITICAL: 97 processes with regex args /usr/bin/kvm
I'm around now but I'm trying to handle our 2-year-old so my wife can get some
sleep. Our 6-year-old was up all night with a stomach that couldn't hold
anything down. I will spare everyone the details but it's pretty brutal.
--
Chase Pettet
chasemp on phabricator <https://phabricator.wikimedia.org/p/chasemp/> and
IRC
2018-04-11 20:00:02,533 INFO force is enabled
2018-04-11 20:00:02,572 INFO removing misc-project-backup
2018-04-11 20:00:02,654 INFO removing misc-project-backup
2018-04-11 20:00:03,144 INFO creating misc-project-backup at 2T
2018-04-11 20:00:04,043 INFO force is enabled
2018-04-11 20:00:04,107 INFO removing misc-snap
2018-04-11 20:00:04,155 INFO removing misc-snap
2018-04-11 20:00:04,428 INFO creating misc-snap at 1T
* Bryan to ping Eliza about usage of PagerDuty by OIT to see if there
is a way we could trial it
* Rotating lead for weekly meeting: come up with a plan and do it
* Chase & Brooke to work on Puppet state of the union doc and next
steps ideas to bring back to group
* Sarah to talk with Arturo and Brooke about onboarding issues for doc
improvements
* James to give this page a section on the main page
<https://www.mediawiki.org/wiki/Wikimedia_Cloud_Services_team/Our_audiences>
Things we didn't have time to talk about:
* Do we need to make more distinction between site-maintaining people
and others? ++
* Planning :) How the heck are we going to do all the things? Gotta
get ruthless in casting things off we can't do. -- yes but not now bc
tired :) +
* Question I have from time to time: am I working enough, performance-wise? +
* Excellent effort at making the team feel like a team of equals
despite realities of contractor status+
Any/all of these could be topics for future team meetings. We could do
a meeting or two where we talk about topics such as these instead of
project updates, and just read the update notes offline.
Bryan
--
Bryan Davis Wikimedia Foundation <bd808@wikimedia.org>
[[m:User:BDavis_(WMF)]] Manager, Cloud Services Boise, ID USA
irc: bd808 v:415.839.6885 x6855
2018-04-10 20:00:03,216 INFO force is enabled
2018-04-10 20:00:03,244 INFO removing tools-project-backup
2018-04-10 20:00:03,341 INFO removing tools-project-backup
2018-04-10 20:00:03,850 INFO creating tools-project-backup at 2T
2018-04-10 20:00:04,611 INFO force is enabled
2018-04-10 20:00:04,641 INFO removing tools-snap
2018-04-10 20:00:04,689 INFO removing tools-snap
2018-04-10 20:00:05,849 INFO creating tools-snap at 1T
Tue:
* Quiet day on irc
* Meeting notes to wiki
* Updates to tech mgrs meeting
* Updates to SoS etherpad
Wed:
* Security bug reported by eddiegp
** Handed off to Andrew after some discussion
** https://phabricator.wikimedia.org/T191433
* labs-graphite I/O spike made nagf unresponsive
* declined wikimisc project for lack of community support
<https://phabricator.wikimedia.org/T191155>
* declined sau226test project as a laptop in the cloud
<https://phabricator.wikimedia.org/T190852>
* worked on some maintain-views requests/bugs
** https://phabricator.wikimedia.org/T191455
** https://phabricator.wikimedia.org/T191387
** https://phabricator.wikimedia.org/T191380
* Pinged on https://phabricator.wikimedia.org/T181679 to see if
cleanup can start
Thu:
* helped Jon Robson rescue a VM with a full disk
** This led to a Puppet patch for mediawiki-vagrant sudoers rules
Fri:
* Cleaned up reading-web-staging-3.reading-web-staging.eqiad.wmflabs
Puppet state. Follow-up from Thursday's work.
* Ran maintain-views to purge old mediawikiwiki tables
<https://phabricator.wikimedia.org/T191387>
* Found and fixed another Vagrant sudoers rule bug
Sat & Sun:
* (stayed offline)
Mon:
* Tried to run `sudo maintain-views --clean --all-databases
--replace-all` on labsdb1009. Failed due to lock wait timeout in ...
some database.
* SRE meeting (see below for callouts)
* long session trying to help Moriel with a MediaWiki-Vagrant issue
SRE:
* ICU 57 rollout in progress (PHP7 blocker)
* All prod maintenance scripts to use "php" (HHVM on Trusty) starting today
* MW servers moving to Stretch during Q4
* DBAs getting quotes on additional sanitarium servers
* Jaime wants a meeting about <https://phabricator.wikimedia.org/T189542> (m5)
** Bryan and Andrew will meet with him this week to figure things out
Bryan
--
Bryan Davis Wikimedia Foundation <bd808@wikimedia.org>
[[m:User:BDavis_(WMF)]] Manager, Cloud Services Boise, ID USA
irc: bd808 v:415.839.6885 x6855
2018-04-04 20:00:02,806 INFO force is enabled
2018-04-04 20:00:02,864 INFO removing misc-project-backup
2018-04-04 20:00:02,982 INFO removing misc-project-backup
2018-04-04 20:00:03,856 INFO creating misc-project-backup at 2T
2018-04-04 20:00:04,784 INFO force is enabled
2018-04-04 20:00:04,828 INFO removing misc-snap
2018-04-04 20:00:04,888 INFO removing misc-snap
2018-04-04 20:00:05,244 INFO creating misc-snap at 1T
Sometime soon we need to upgrade our OpenStack deployment to the
next release, 'Mitaka'. I've done a test upgrade and the process was
fairly smooth, but there is at least one step that will cause
unavoidable downtime for new instance creation. Ideally this will only
take around 20 minutes, but given the number of surprises I ran into
just now it wouldn't shock me if it winds up taking several hours instead.
I propose to do this upgrade starting at the beginning of my day on
next Friday, April 13th. An unlucky number, but since it's a Friday there
are no active MediaWiki deployments, so the lack of CI should be less
disruptive than usual. The next day is largely unscheduled for me, and
the following Monday is a WMF holiday so that gives us an entire
four-day block to back out any possible disasters before we're really
stepping on release engineering's toes.
The upgrade should not interfere with existing VMs. If there are
no objections, I'll send a public announcement about this tomorrow.
-Andrew
2018-04-03 20:00:02,895 INFO force is enabled
2018-04-03 20:00:02,943 INFO removing tools-project-backup
2018-04-03 20:00:03,001 INFO removing tools-project-backup
2018-04-03 20:00:03,447 INFO creating tools-project-backup at 2T
2018-04-03 20:00:04,310 INFO force is enabled
2018-04-03 20:00:04,347 INFO removing tools-snap
2018-04-03 20:00:04,396 INFO removing tools-snap
2018-04-03 20:00:06,032 INFO creating tools-snap at 1T
Hi folks!
I'm trying to set up labmon1002 as a cold standby for labmon1001.
We need to sync the whisper files from one server to the other, so that
in case we lose labmon1001 we don't lose all the metrics.
Regarding hiera, in my mind it was as simple as having 2 hiera keys
(names aren't set in stone):
* wmcs::monitoring::server labmon1001.eqiad.wmnet
* wmcs::monitoring::server_standby labmon1002.eqiad.wmnet
And then:
* have all clients send data to 'wmcs::monitoring::server'
* In case of an outage, simply flip the keys (see the sketch just below)
* the rsync cronjob runs on the 'wmcs::monitoring::server_standby' host
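To make that concrete, a minimal hieradata sketch (the file location and
the exact key names are just an assumption on my part, nothing final):

  # e.g. hieradata/common.yaml (or wherever we decide these keys live)
  wmcs::monitoring::server: 'labmon1001.eqiad.wmnet'
  wmcs::monitoring::server_standby: 'labmon1002.eqiad.wmnet'

Failing over would then mostly mean swapping those two values in one place.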
If you grep the ops/puppet.git repo, you will find *a lot* of references
to 'labmon1001.eqiad.wmnet'. Examples:
* hieradata/common/profile/openstack/labtest.yaml
profile::openstack::labtest::statsd_host: 'labmon1001.eqiad.wmnet'
* hieradata/common/profile/openstack/main.yaml
profile::openstack::main::statsd_host: 'labmon1001.eqiad.wmnet'
* hieradata/labs/deployment-prep/common.yaml
service::configuration::statsd_host: labmon1001.eqiad.wmnet
* hieradata/labs/deployment-prep/common.yaml
graphite_host: labmon1001.eqiad.wmnet
To improve maintainability a bit, I thought of using a single hiera key,
the toplevel 'wmcs::monitoring::server', so that in case of an outage we
don't have to update a lot of lines to point to the standby server.
This is, in a way, a kind of code factorization.
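To illustrate the idea (an untested sketch; I'm assuming our hiera setup
supports the alias()/hiera() interpolation functions, and the file
placement is only a guess), the existing per-profile keys would simply
point at the single toplevel key:

  # hieradata/common.yaml -- single source of truth
  wmcs::monitoring::server: 'labmon1001.eqiad.wmnet'

  # hieradata/common/profile/openstack/main.yaml
  profile::openstack::main::statsd_host: "%{alias('wmcs::monitoring::server')}"

  # hieradata/labs/deployment-prep/common.yaml
  graphite_host: "%{alias('wmcs::monitoring::server')}"

With something like this, an outage would only require changing the single
toplevel value to point at labmon1002.eqiad.wmnet.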
Hiera is a new thing to me, and I've been doing some testing, test
compilations and playing with utils/hiera_lookup [0].
In the end, this doesn't seem to work because:
* my new hiera keys are not found (why is hieradata/labs.yaml never read?)
* some other weirdness unknown to me
* isn't there a way to introduce a global hiera key for our whole environment?
So, would you please share some hints? What do you think about this
whole picture? Do you have any suggestions for the hiera key layout?
Thanks in advance for your time! :-)
Relevant phabricator tasks:
* labmon1002 as cold standby for labmon1001
** https://phabricator.wikimedia.org/T189871
* labmon: synchronize whisper files between labmon1001 and labmon1002
** https://phabricator.wikimedia.org/T190512
[0] The command lines used are things like:
% utils/hiera_lookup --fqdn=labmon1002.eqiad.wmnet
--roles=labs::monitoring profile::labs::monitoring::master -v
% utils/hiera_lookup --fqdn=labmon1002.eqiad.wmnet
profile::labs::monitoring::master -v