[Labs-l] Resolved: Labs and toollabs outage in progress

Andrew Bogott abogott at wikimedia.org
Tue Oct 7 23:53:43 UTC 2014


On 10/7/14 6:50 PM, John wrote:
> Any details on which parts of toollabs went down? I.e., which services
> were running on that virt?
I can tell you which tools instances were on virt1005:

| 120cc401-ed7a-44c5-b905-2d0eae23b6af | tools-exec-03
| 30b98f1d-1c5a-49c1-b800-f4c535addc12 | tools-exec-07
| 5cd684db-d0a6-4241-a11f-daf4c1b2f717 | tools-exec-09
| 523df61c-07f0-41ba-924d-e2b8e474b4d7 | tools-exec-cyberbot
| 96c37c36-970b-4cc7-a7ba-d1ee90a225b5 | tools-submit
| cdce426b-ef6f-47e7-96e4-bcb3647f4709 | tools-webgrid-04
| 79aeb31c-a1c1-41af-9e00-df2c7e248924 | tools-webgrid-tomcat
| 8d92c507-d253-425d-b7f4-2af3678a39ae | tools-webproxy
| 22d32e6e-608c-48a8-8423-2a1ff69fad4d | toolsbeta-exec-01
| 31e8206d-fa5c-4e62-a805-8cfb7def1f46 | toolsbeta-puppetmaster3
| 4f223286-49e0-4526-8a4e-8b64c132422a | toolsbeta-webnode-01

As for which jobs died -- that's a question for someone with better grid 
skills than me :)

-A



>
> On Tuesday, October 7, 2014, Andrew Bogott <abogott at wikimedia.org> wrote:
>
>     On 10/7/14 5:54 PM, Andrew Bogott wrote:
>
>         One of the labs servers (virt1005) has just died.  Marc and I
>         are investigating, but for the moment roughly 10% of labs
>         instances are in a SHUTOFF state.  Please do not
>         restart these instances until I send an 'all clear' message to
>         the list.
>
>     Virt1005 is back up and seems to be OK.  I'm now booting all
>     instances on that box -- they should be up and running in a few
>     minutes, but will show signs of an unceremonious reboot so you'll
>     want to make sure your services are all still running properly.
>
>     This crash may be related to overprovisioning on virt1005... we're
>     in the process of purchasing new hardware to expand capacity and
>     avoid such issues in the future.
>
>     Thank you again for your patience!
>
>     -Andrew
