Hey all,
Clush allows targeting hosts listed in a file, via the --hostfile option.
Thanks to Riccardo's awesome work, we now have this feature available as a
Cumin backend on the Cloud VPS cumin masters (including labs-puppetmaster).
This is something we need when we have to target a list of hosts that are
not dynamically queryable, say hosts where puppet runs are failing, or
hosts that have NFS enabled. Sometimes these lists are too long to be fun
to pass as a comma-separated list of hostnames through Cumin's Direct
backend. Do use it with care, and make sure to check that your hostfile is
current and has the right hosts you are looking to target. If in doubt,
--dry-run :)
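A quick way to sanity-check a hostfile before targeting it (the file name
below is just an example) is to strip comments and blank lines,
de-duplicate, and eyeball the resulting host count; a minimal sketch:

    from pathlib import Path

    # read the hostfile, drop comments and blank lines, de-duplicate
    raw = Path('puppet_failures.txt').read_text().splitlines()
    hosts = sorted({line.strip() for line in raw
                    if line.strip() and not line.strip().startswith('#')})
    print('\n'.join(hosts))
    print('{} hosts to target'.format(len(hosts)))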
Docs on how to use are here -
https://wikitech.wikimedia.org/wiki/Cumin#HostFile_backend_(Enabled_only_in…
Best,
--
Madhumitha Viswanathan
Operations Engineer, Cloud Services
Traceback (most recent call last):
File "/usr/local/sbin/block_sync", line 125, in <module>
lock_file = open('/var/lock/{}_{}_backup.lock'.format(args.r_vg, args.r_lv, 'w+'))
FileNotFoundError: [Errno 2] No such file or directory: '/var/lock/misc_misc-project_backup.lock'
Traceback (most recent call last):
File "/usr/local/sbin/block_sync", line 125, in <module>
lock_file = open('/var/lock/{}_{}_backup.lock'.format(args.r_vg, args.r_lv, 'w+'))
FileNotFoundError: [Errno 2] No such file or directory: '/var/lock/tools_tools-project_backup.lock'
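For what it's worth, the tracebacks suggest the 'w+' mode string ended up
as a third argument to .format() instead of being passed to open(), so the
file is opened read-only and FileNotFoundError is raised whenever the lock
file does not already exist. A sketch of what was presumably intended
(args.r_vg and args.r_lv come from the script's argument parser):

    # open the per-volume lock file, creating it if it does not exist yet
    lock_file = open('/var/lock/{}_{}_backup.lock'.format(args.r_vg, args.r_lv), 'w+')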
Note that I will be traveling tomorrow and will miss our weekly meeting
(unless today's snowstorm continues and my flight is delayed).
Note also that Francisco has been quite on top of things in
#wikimedia-cloud and I consequently didn't end up doing much support there. \o/
2018-01-16:
* Lots of alerts, all of them caused by me rebooting labvirts
* A fair bit of chatter on #wikimedia-cloud, also a result of the reboots
2018-01-17:
* Rush job: Created some Trusty VMs for the Discovery team to run a demo
next week: https://phabricator.wikimedia.org/T185131
* Followed up on 'NCCIC Incident number INC000010157486' and created
https://phabricator.wikimedia.org/T185383
* More alerts, also self-inflicted due to meltdown reboots
2018-01-18:
* Granted a few tools membership requests
2018-01-19:
* Cleared out some logfiles on labtestnet2001 in response to an icinga
alert about disk space
2018-01-20:
* Responded to a bitninja email and they actually responded and
whitelisted us! (Or at least said they did)
2018-01-21:
* Granted a few more tools membership requests
2018-01-22:
* Responded to complaints about ssh access failing. This turned out to
be my fault, the result of a bad ldap patch.
* Deleted the 'gitblit' project which has been unused for some time
* Ops meeting
** Full etherpad at
https://office.wikimedia.org/wiki/Operations/Operations_Meeting_Notes/SRE-2…
** Meeting was sparsely-attended and brief
** Received a tip of the hat for finally upgrading everything to puppet 4
** Keith mentioned the puppet-apply vs. Trusty issue but everyone
sounded satisfied with the additional-backport fix
** From etherpad but not discussed:
*** [CLOUD/releng? heads up] Moving some wikis to s5? T184805. Cloud/releng
to provide some feedback.
*** Is tools-static going to be hosed up pretty soon? Not sure what the
solution here is.
---------- Forwarded message ----------
From: shinken <shinken@shinken-01.shinken.eqiad.wmflabs>
Date: Fri, Jan 19, 2018 at 10:26 AM
Subject: ** PROBLEM alert - tools-static-10/Free space - all mounts is
WARNING **
To: cpettet@wikimedia.org
Notification Type: PROBLEM
Service: Free space - all mounts
Host: tools-static-10
Address: 10.68.22.238
State: WARNING
Date/Time: Fri 19 Jan 16:26:22 UTC 2018
Notes URLs:
Additional Info:
WARNING: tools.tools-static-10.diskspace._srv.byte_percentfree (<100.00%)
--
Chase Pettet
chasemp on phabricator <https://phabricator.wikimedia.org/p/chasemp/> and
IRC
Traceback (most recent call last):
File "/usr/local/sbin/block_sync", line 125, in <module>
lock_file = open('/var/lock/{}_{}_backup.lock'.format(args.r_vg, args.r_lv, 'w+'))
FileNotFoundError: [Errno 2] No such file or directory: '/var/lock/misc_misc-project_backup.lock'
2018-01-16 20:00:02,777 INFO force is enabled
2018-01-16 20:00:02,806 INFO removing tools-project-backup
2018-01-16 20:00:02,847 INFO removing tools-project-backup
2018-01-16 20:00:03,503 INFO creating tools-project-backup at 2T
2018-01-16 20:00:04,253 INFO force is enabled
2018-01-16 20:00:04,270 INFO removing tools-snap
2018-01-16 20:00:04,321 INFO removing tools-snap
2018-01-16 20:00:05,859 INFO creating tools-snap at 1T
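For context, these log lines correspond to roughly the following sequence
(remove any stale logical volume, then recreate it at the configured size);
the volume group name and the exact steps are illustrative, not taken from
the actual script:

    import subprocess

    def recreate_lv(vg, lv, size):
        # remove the LV if it already exists (-f skips the confirmation
        # prompt); ignore the return code in case it is absent
        subprocess.call(['lvremove', '-f', '{}/{}'.format(vg, lv)])
        # recreate it at the requested size, e.g. '2T'
        subprocess.check_call(['lvcreate', '-L', size, '-n', lv, vg])

    recreate_lv('backup', 'tools-project-backup', '2T')
    recreate_lv('backup', 'tools-snap', '1T')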
Hello Cloud Admins!
As part of https://phabricator.wikimedia.org/T174569 we have to alter some
big tables.
One of them is the logging table, which, for instance, on wikidata takes
around 8h to alter; that is the shard I am currently working on.
Because of the nature of the change (some columns being added) and the
ROW-based replication we use on the sanitariums, this change needs to be
done through replication (from the sanitarium, or its masters, to the labs
servers). This will obviously generate lag; if it were not done that way,
it would break replication until the column is added on the labs hosts,
which is less desirable than replication lag.
I am planning to run the alter probably tomorrow or Monday (I will notify
you when I start it) on the sanitarium host in s5. That means there will be
lag on the labs servers, for a few hours, on the s5 instance. This will
also affect s1 and s3, because we use the same replication thread for those
shards too - a FIXME we have pending.
s2, s4, s6 and s7 will remain unaffected as they have their own replication
thread.
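If you want to keep an eye on the lag while the alter runs, something
along these lines works against a MariaDB replica, assuming you have an
account with the REPLICATION CLIENT privilege (the hostname and credentials
below are placeholders):

    import pymysql

    conn = pymysql.connect(host='replica.example.org', user='watcher',
                           password='secret',
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            # on a multi-source MariaDB replica this returns one row per
            # replication thread (s1/s3/s5 share a single thread here)
            cur.execute('SHOW ALL SLAVES STATUS')
            for row in cur.fetchall():
                print(row['Connection_name'], row['Seconds_Behind_Master'])
    finally:
        conn.close()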
Should you have any questions, let me know!
Thanks
Manuel.