2018-06-05 20:00:02,277 INFO force is enabled
2018-06-05 20:00:02,327 INFO removing tools-project-backup
2018-06-05 20:00:02,427 INFO removing tools-project-backup
2018-06-05 20:00:03,017 INFO creating tools-project-backup at 2T
2018-06-05 20:00:03,789 INFO force is enabled
2018-06-05 20:00:03,859 INFO removing tools-snap
2018-06-05 20:00:03,917 INFO removing tools-snap
2018-06-05 20:00:05,364 INFO creating tools-snap at 1T
* Reminder: June 8th (this Friday) is the deadline to complete
self-reviews, peer reviews, and reviews of your manager.
* Last month of fiscal year and quarter. If you have outstanding
expense reports or wellness reimbursements, send them in.
* Icinga not processing downtimes -
https://phabricator.wikimedia.org/T196336 - happened again this AM
* tin has been replaced by deploy1001 as the primary deploy server
(scap & scap3)
* labsdb1009 and labsdb1010 were switched over to the new
sanitariums last Thursday (labsdb1011 pending on Wednesday)
* Server side of Debmonitor deployed, both for the external endpoint (
https://debmonitor.wikimedia.org/ [LDAP login]) and for the internal
one to be used by all the servers to send updates. At the moment it is
"empty"; there is only data from a single host as a test. Client side
will be deployed across the fleet soon.
* Traffic working towards merge of cache_misc into cache_text
<https://phabricator.wikimedia.org/T164609> (Wikitech, Horizon,
Striker are on misc)
* Updated network diagrams! <https://wikitech.wikimedia.org/wiki/Network_design>
* eqiad row C server move went smoothly overall (lvs & cloud
remaining) - https://phabricator.wikimedia.org/T187962
* Last DSA key removed from prod
** Gentle poke from Moritz that we need to get rid of DSA keys for cloud users
*** https://phabricator.wikimedia.org/T168433
Bryan
--
Bryan Davis Wikimedia Foundation <bd808@wikimedia.org>
[[m:User:BDavis_(WMF)]] Manager, Cloud Services Boise, ID USA
irc: bd808 v:415.839.6885 x6855
I just rebooted labservices1001 (our primary DNS server, among other
things) from the console after it stopped responding to ssh. As far as
I know, ordinary cloud services were unaffected by this, since DNS
resolution should have fallen back to labservices1002 gracefully.
Puppet was upset because it explicitly names the resolver as
labservices1001.
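(For context, the fallback I'm counting on is plain stock resolver
behavior, assuming instances list both servers in /etc/resolv.conf;
the sketch below is illustrative, not copied from an instance:

# /etc/resolv.conf on a cloud instance (addresses elided)
nameserver <labservices1001 IP>
nameserver <labservices1002 IP>
# optional tuning to make the failover quick:
options timeout:1 attempts:2

glibc tries nameservers in the listed order, so while 1001 was wedged,
lookups should have been slower but not broken.)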
I suspect this is a bad fan or heat sink installation -- the failure
is tracked in https://phabricator.wikimedia.org/T196252.
As best I can tell, the issue is resolved for now. You can expect a
flood of recovery emails from shinken over the next 30 minutes or so.
My real concern, though, is that we didn't get paged when this box
locked up. I remain baffled by what pages and what doesn't; does anyone
out there know how I can turn on paging for a host and/or subscribe to
alerts?
-A
Hi,
I've been working on some sanity checks for our labtestn
(mitaka/neutron) deployment [0].
I found a weird issue when trying to create an instance in the
'tooling' network (the vxlan-based network):
<<
Network requires port_security_enabled and subnet associated in order to
apply security groups.
>>
This error is seen in the /var/log/nova/nova-conductor.log file when I
create an instance with this cmdline:
% openstack server create --flavor 2 \
--image 66e544e8-fe4f-41f7-9809-6723e53b5a99 aborrero-test1 \
--nic port-id=9389a984-d58e-4776-8b7a-30ff93073917 \
--property subnet=3ec06de7-3b9e-4de3-86c6-67ba1895b253
However:
% neutron port-show 9389a984-d58e-4776-8b7a-30ff93073917 \
| grep port_security_enabled
| port_security_enabled | True
(this port was created manually by me)
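One guess on my side (an assumption, I haven't checked the conductor
code): the error may mean the port needs a fixed IP allocated from a
subnet on that network. Reusing the IDs from above, something like
this should show it, and recreating the port with --fixed-ip should
fix it:

% neutron port-show 9389a984-d58e-4776-8b7a-30ff93073917 \
  | grep fixed_ips
(if fixed_ips is empty, recreate the port with an address from the
'tooling' subnet)
% neutron port-create \
  --fixed-ip subnet_id=3ec06de7-3b9e-4de3-86c6-67ba1895b253 \
  60aa9467-253c-4fdf-9fa0-eba42dafc975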
Not sure if the 'server create' command is lacking some additional
option. I generated it following what I saw in our bootstrap docs plus
the labtestcontrol2003 history.
I also tried with this command:
% openstack server create --flavor 2 \
--image 66e544e8-fe4f-41f7-9809-6723e53b5a99 aborrero-test1 \
--nic net-id=60aa9467-253c-4fdf-9fa0-eba42dafc975
with the same result (i.e., passing a net instead of a port)
I'm probably misunderstanding some openstack concepts: nics, ports,
subnets, nets, etc.
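In case it helps, my current mental model as commands (a generic
OpenStack sketch with made-up names; this is recent
python-openstackclient syntax, so the mitaka-era neutron CLI
equivalents may differ):

% openstack network create demo-net
% openstack subnet create --network demo-net \
  --subnet-range 192.0.2.0/24 demo-subnet
% openstack port create --network demo-net demo-port
% openstack server create --flavor 2 --image <image-id> \
  --nic port-id=<demo-port id> demo-vm

That is: a net is the L2 network, a subnet is an IP range on it, a
port is a vNIC attached to the net with a fixed IP from one of its
subnets, and --nic plugs in either an existing port (port-id) or, with
net-id, a port that nova asks neutron to create on the fly.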
[0]
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Deployment_sanit…