<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 10/7/14 6:50 PM, John wrote:<br>
</div>
<blockquote
cite="mid:CAP-JHpnMbYRFF8P=Zfm=bWLH_tSXbAfu6e4d4Lt2U6r5hCiZFQ@mail.gmail.com"
type="cite">Any details on what parts of toolslab went down? I.e.,
which services were running on that virt?<br>
</blockquote>
I can tell you which tools instances were on virt1005:<br>
<br>
| 120cc401-ed7a-44c5-b905-2d0eae23b6af | tools-exec-03<br>
| 30b98f1d-1c5a-49c1-b800-f4c535addc12 | tools-exec-07<br>
| 5cd684db-d0a6-4241-a11f-daf4c1b2f717 | tools-exec-09<br>
| 523df61c-07f0-41ba-924d-e2b8e474b4d7 | tools-exec-cyberbot<br>
| 96c37c36-970b-4cc7-a7ba-d1ee90a225b5 | tools-submit<br>
| cdce426b-ef6f-47e7-96e4-bcb3647f4709 | tools-webgrid-04<br>
| 79aeb31c-a1c1-41af-9e00-df2c7e248924 | tools-webgrid-tomcat<br>
| 8d92c507-d253-425d-b7f4-2af3678a39ae | tools-webproxy<br>
| 22d32e6e-608c-48a8-8423-2a1ff69fad4d | toolsbeta-exec-01<br>
| 31e8206d-fa5c-4e62-a805-8cfb7def1f46 | toolsbeta-puppetmaster3<br>
| 4f223286-49e0-4526-8a4e-8b64c132422a | toolsbeta-webnode-01<br>
<br>
As for which jobs died -- that's a question for someone with better
grid skills than me :)<br>
<br>
-A<br>
<br>
<br>
<br>
<blockquote
cite="mid:CAP-JHpnMbYRFF8P=Zfm=bWLH_tSXbAfu6e4d4Lt2U6r5hCiZFQ@mail.gmail.com"
type="cite"><br>
On Tuesday, October 7, 2014, Andrew Bogott &lt;<a
moz-do-not-send="true" href="mailto:abogott@wikimedia.org">abogott@wikimedia.org</a>&gt;
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">On 10/7/14
5:54 PM, Andrew Bogott wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
One of the labs servers (virt1005) has just died. Marc and I
are investigating, but for the moment roughly 10% of labs
instances are in a SHUTOFF state. Please do not
restart these instances until I send an 'all clear' message to
the list.<br>
</blockquote>
Virt1005 is back up and seems to be OK. I'm now booting all
instances on that box -- they should be up and running in a few
minutes, but will show signs of an unceremonious reboot, so
you'll want to make sure your services are all still running
properly.<br>
<br>
This crash may be related to overprovisioning on virt1005...
we're in the process of purchasing new hardware to expand
capacity and avoid such issues in the future.<br>
<br>
Thank you again for your patience!<br>
<br>
-Andrew<br>
<br>
<br>
_______________________________________________<br>
Labs-l mailing list<br>
<a moz-do-not-send="true">Labs-l@lists.wikimedia.org</a><br>
<a moz-do-not-send="true"
href="https://lists.wikimedia.org/mailman/listinfo/labs-l"
target="_blank">https://lists.wikimedia.org/mailman/listinfo/labs-l</a><br>
</blockquote>
</blockquote>
<br>
</body>
</html>