[Labs-l] [Labs-announce] Partial labs downtime Wednesday, 2015-08-12, 15:00 UTC: Reboot of labvirt1001

Merlijn van Deen valhallasw at arctus.nl
Mon Aug 10 21:33:08 UTC 2015


For Tool Labs, the plan is as follows:
  - tomorrow, we will disable the queue so that no new tasks are distributed
    to the affected hosts
  - we will send an e-mail listing the tasks that are still running an hour
    later

Unfortunately, there is currently no host that can run jobs that take
longer than a few days, because other virt* hosts will also be rebooted
this week.

For reference, the current long-running jobs on these hosts are listed
below, grouped by user name. Please take a look and consider whether
the jobs are still doing something useful -- and if not, please kill them
(qdel <job id>).

Merlijn



Columns:

job id       name       start date/time

aka
---------------
1317747 start Sat Aug  1 19:17:12 2015

tools.checkwiki
---------------
145845 eswiki-munch Thu Jun 25 05:00:13 2015
818559 arwiki-munch Sat Jul 18 05:00:16 2015

tools.dexbot
---------------
1236997 del Thu Jul 30 13:36:09 2015
1341699 kian_new2 Sun Aug  2 11:03:18 2015

tools.gpy
---------------
527733 gpy Thu Jul  9 01:14:28 2015

tools.luke081515bot
---------------
1346744 queue Sun Aug  2 14:24:31 2015

tools.mjbmrbot
---------------
209254 lgdcp2_1 Sat Jun 27 15:35:04 2015
273994 lgdcp2_2 Tue Jun 30 02:00:07 2015
345013 lgdcp2_3 Thu Jul  2 15:00:05 2015
807548 lsdcp2_3 Fri Jul 17 21:00:12 2015
1092477 lgdcp1_4 Sun Jul 26 14:00:07 2015
1093960 lsdcp1_4 Sun Jul 26 15:00:10 2015

tools.shuaib-bot
---------------
1622344 translator Mon Aug 10 02:10:09 2015

tools.wikidata-exports
---------------
694469 create_dumps Tue Jul 14 08:40:22 2015
735030 create_dumps Wed Jul 15 14:31:25 2015
768842 create_dumps Thu Jul 16 16:12:52 2015
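
If you want to check how long your own jobs have been running, the
listing above can be parsed with a few lines of Python. This is only a
minimal sketch, not part of the announcement: the function names
(parse_job_line, jobs_older_than) and the 7-day threshold are made up
for illustration, the sample rows are copied from the listing, and it
assumes that job names contain no spaces, that the timestamps are in UTC,
and that an English/C locale is in use. It prints a qdel hint only; it
does not kill anything.

```python
from datetime import datetime

# Sample rows copied from the listing above: job id, name, start date/time.
LISTING = """\
145845 eswiki-munch Thu Jun 25 05:00:13 2015
1341699 kian_new2 Sun Aug  2 11:03:18 2015
1622344 translator Mon Aug 10 02:10:09 2015
"""


def parse_job_line(line):
    """Split one row into (job_id, name, start). Assumes the name has no spaces."""
    # split() with no argument collapses the double space in dates like "Aug  2".
    job_id, name, *date_parts = line.split()
    start = datetime.strptime(" ".join(date_parts), "%a %b %d %H:%M:%S %Y")
    return int(job_id), name, start


def jobs_older_than(listing, now, min_days):
    """Return the parsed rows that started at least min_days before now."""
    rows = [parse_job_line(l) for l in listing.splitlines() if l.strip()]
    return [row for row in rows if (now - row[2]).days >= min_days]


# Reference time: the planned reboot, 2015-08-12 15:00 UTC (from the subject).
reboot = datetime(2015, 8, 12, 15, 0)

for job_id, name, start in jobs_older_than(LISTING, reboot, 7):
    age_days = (reboot - start).days
    print(f"{job_id} {name}: running {age_days} days; to stop it: qdel {job_id}")
```

Whether a job should actually be killed is still a judgment call for its
owner; the age is just a hint about which jobs to look at first.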



On 10 August 2015 at 21:20, Andrew Bogott <abogott at wikimedia.org> wrote:

> On Wednesday I'll be rebooting labvirt1001.  This will cause downtime for
> about 10% of labs instances, and this downtime may last as long as 60
> minutes (although the average downtime will be much less.)
>
> We will do our best to juggle and reschedule ToolLabs jobs, but persistent
> jobs that cannot gracefully restart may be interrupted and require your
> personal attention.
>
> Here is the list of instances that will be affected by this reboot:
>
> | citoidtest                    | ACTIVE  | -          | Running     | public=10.68.16.182                 |
> | conf                          | ACTIVE  | -          | Running     | public=10.68.18.87, 208.80.155.233  |
> | deployment-bastion            | ACTIVE  | -          | Running     | public=10.68.16.58, 208.80.155.191  |
> | deployment-cache-text02       | ACTIVE  | -          | Running     | public=10.68.16.16                  |
> | deployment-elastic08          | ACTIVE  | -          | Running     | public=10.68.17.188                 |
> | deployment-memc03             | ACTIVE  | -          | Running     | public=10.68.16.15                  |
> | deployment-parsoid05          | ACTIVE  | -          | Running     | public=10.68.16.120                 |
> | deployment-pdf01              | ACTIVE  | -          | Running     | public=10.68.16.73                  |
> | deployment-restbase01         | ACTIVE  | -          | Running     | public=10.68.17.227                 |
> | deployment-salt               | ACTIVE  | -          | Running     | public=10.68.16.99                  |
> | deployment-urldownloader      | ACTIVE  | -          | Running     | public=10.68.16.135                 |
> | diffengine                    | ACTIVE  | -          | Running     | public=10.68.17.127                 |
> | educationdashboard-i18n       | SHUTOFF | -          | Shutdown    | public=10.68.16.235                 |
> | ee-flow-extra                 | ACTIVE  | -          | Running     | public=10.68.16.102                 |
> | etcd01                        | ACTIVE  | -          | Running     | public=10.68.16.130                 |
> | etcd03                        | ACTIVE  | -          | Running     | public=10.68.16.132                 |
> | firstinstance                 | SHUTOFF | -          | NOSTATE     | public=10.68.16.212                 |
> | graphite-trusty               | ACTIVE  | -          | Running     | public=10.68.17.181                 |
> | huggle-d2                     | ACTIVE  | -          | Running     | public=10.68.17.194                 |
> | icinga                        | ACTIVE  | -          | Running     | public=10.68.16.195                 |
> | integration-raita             | ACTIVE  | -          | Running     | public=10.68.16.53                  |
> | integration-slave-trusty-1013 | ACTIVE  | -          | Running     | public=10.68.18.28                  |
> | integration-slave-trusty-1015 | ACTIVE  | -          | Running     | public=10.68.18.30                  |
> | k8s-worker-02                 | ACTIVE  | -          | Running     | public=10.68.18.91                  |
> | kartotherian1                 | ACTIVE  | -          | Running     | public=10.68.16.117                 |
> | language-replag-slave         | SHUTOFF | -          | Shutdown    | public=10.68.16.248                 |
> | maps-tiles2                   | ACTIVE  | -          | Running     | public=10.68.17.110                 |
> | mobile-browser-tests          | ACTIVE  | -          | Running     | public=10.68.16.149                 |
> | mwreview-proxy-test           | ACTIVE  | -          | Running     | public=10.68.16.83                  |
> | osmit-cruncher1               | ACTIVE  | -          | Running     | public=10.68.17.92                  |
> | puppet-jmm-debdeploy-precise  | ACTIVE  | -          | Running     | public=10.68.18.106                 |
> | puppet-mailman                | ACTIVE  | -          | Running     | public=10.68.17.177                 |
> | sentry-builder                | ACTIVE  | -          | Running     | public=10.68.18.82                  |
> | staging-eventlogging          | ACTIVE  | -          | Running     | public=10.68.16.199                 |
> | staging-ms-be03               | ACTIVE  | -          | Running     | public=10.68.17.249                 |
> | staging-rdb01                 | ACTIVE  | -          | Running     | public=10.68.17.193                 |
> | staging-tin                   | ACTIVE  | -          | Running     | public=10.68.16.110                 |
> | stashbot-logstash             | ACTIVE  | -          | Running     | public=10.68.18.101                 |
> | tools-bastion-02              | ACTIVE  | -          | Running     | public=10.68.16.44, 208.80.155.132  |
> | tools-exec-1201               | ACTIVE  | -          | Running     | public=10.68.17.49, 208.80.155.203  |
> | tools-exec-1202               | ACTIVE  | -          | Running     | public=10.68.16.57, 208.80.155.211  |
> | tools-exec-1204               | ACTIVE  | -          | Running     | public=10.68.17.88, 208.80.155.213  |
> | tools-exec-1206               | ACTIVE  | -          | Running     | public=10.68.17.105, 208.80.155.215 |
> | tools-exec-1209               | ACTIVE  | -          | Running     | public=10.68.17.129, 208.80.155.218 |
> | tools-exec-1213               | ACTIVE  | -          | Running     | public=10.68.17.252, 208.80.155.222 |
> | tools-exec-1217               | ACTIVE  | -          | Running     | public=10.68.18.20, 208.80.155.226  |
> | tools-exec-1218               | ACTIVE  | -          | Running     | public=10.68.18.19, 208.80.155.227  |
> | tools-exec-1408               | ACTIVE  | -          | Running     | public=10.68.18.14, 208.80.155.152  |
> | tools-exec-cyberbot           | ACTIVE  | -          | Running     | public=10.68.16.39                  |
> | tools-webgrid-generic-1404    | ACTIVE  | -          | Running     | public=10.68.18.53                  |
> | tools-webgrid-lighttpd-1409   | ACTIVE  | -          | Running     | public=10.68.18.43                  |
> | tools-webgrid-lighttpd-1410   | ACTIVE  | -          | Running     | public=10.68.18.44                  |
> | toolsbeta-exec-101            | ACTIVE  | -          | Running     | public=10.68.16.7                   |
> | toolsbeta-exec-201            | ACTIVE  | -          | Running     | public=10.68.16.250                 |
> | wikidata-mobile               | ACTIVE  | -          | Running     | public=10.68.18.41                  |
> | wikispy                       | ACTIVE  | -          | Running     | public=10.68.17.119                 |
> | wlmjurytool2014               | ACTIVE  | -          | Running     | public=10.68.17.134                 |
> | wmt-exec                      | ACTIVE  | -          | Running     | public=10.68.17.236                 |