[Labs-l] [Labs-announce] Partial labs downtime Wednesday, 2015-08-12, 15:00 UTC: Reboot of labvirt1001
Merlijn van Deen
valhallasw at arctus.nl
Mon Aug 10 21:33:08 UTC 2015
For Tool Labs, the plan is as follows:
- Tomorrow, we will disable the queue so that no new tasks are distributed
to the affected hosts.
- An hour later, we will send an e-mail listing the tasks that are still
running.
Unfortunately, there is currently no host that can run jobs that take
longer than a few days, because other virt* hosts will also be rebooted
this week.
For reference, the current long-running jobs on these hosts are listed
below, grouped by user name. Please take a look and consider whether
the jobs are still doing something useful -- and if not, please kill them
(qdel <job id>).
Merlijn
Columns:
job id   name          start date/time

aka
---------------
1317747  start         Sat Aug 1 19:17:12 2015

tools.checkwiki
---------------
145845   eswiki-munch  Thu Jun 25 05:00:13 2015
818559   arwiki-munch  Sat Jul 18 05:00:16 2015

tools.dexbot
---------------
1236997  del           Thu Jul 30 13:36:09 2015
1341699  kian_new2     Sun Aug 2 11:03:18 2015

tools.gpy
---------------
527733   gpy           Thu Jul 9 01:14:28 2015

tools.luke081515bot
---------------
1346744  queue         Sun Aug 2 14:24:31 2015

tools.mjbmrbot
---------------
209254   lgdcp2_1      Sat Jun 27 15:35:04 2015
273994   lgdcp2_2      Tue Jun 30 02:00:07 2015
345013   lgdcp2_3      Thu Jul 2 15:00:05 2015
807548   lsdcp2_3      Fri Jul 17 21:00:12 2015
1092477  lgdcp1_4      Sun Jul 26 14:00:07 2015
1093960  lsdcp1_4      Sun Jul 26 15:00:10 2015

tools.shuaib-bot
---------------
1622344  translator    Mon Aug 10 02:10:09 2015

tools.wikidata-exports
---------------
694469   create_dumps  Tue Jul 14 08:40:22 2015
735030   create_dumps  Wed Jul 15 14:31:25 2015
768842   create_dumps  Thu Jul 16 16:12:52 2015

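
As a rough aid for reading the list above, the sketch below parses one line
of the listing and reports how many full days the job will have been running
when the reboot starts. It is only an illustration: the helper names are made
up here, the timestamps are assumed to be UTC, and English day/month names
(C locale) are assumed.

```python
from datetime import datetime

# Reboot time taken from the subject line: Wednesday, 2015-08-12, 15:00 UTC.
REBOOT = datetime(2015, 8, 12, 15, 0)


def parse_job_line(line):
    """Split a 'job id / name / start date' line into its three fields.

    Hypothetical helper, not part of any Tool Labs tooling.
    """
    job_id, name, start = line.split(None, 2)
    # Normalise whitespace so the line parses however the columns are padded.
    start = " ".join(start.split())
    started = datetime.strptime(start, "%a %b %d %H:%M:%S %Y")
    return int(job_id), name, started


def days_running_at_reboot(line):
    """Whole days between the job's start and the reboot (assumes UTC)."""
    _, _, started = parse_job_line(line)
    return (REBOOT - started).days


if __name__ == "__main__":
    for row in (
        "1317747 start Sat Aug 1 19:17:12 2015",
        "145845 eswiki-munch Thu Jun 25 05:00:13 2015",
    ):
        job_id, name, _ = parse_job_line(row)
        print(job_id, name, days_running_at_reboot(row), "days")
```

A job that shows up here with a large number of days is a good candidate for
a closer look before the reboot.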
On 10 August 2015 at 21:20, Andrew Bogott <abogott at wikimedia.org> wrote:
> On Wednesday I'll be rebooting labvirt1001. This will cause downtime for
> about 10% of labs instances, and this downtime may last as long as 60
> minutes (although the average downtime will be much less.)
>
> We will do our best to juggle and reschedule ToolLabs jobs, but persistent
> jobs that cannot gracefully restart may be interrupted and require your
> personal attention.
>
> Here is the list of instances that will be affected by this reboot:
>
> | citoidtest | ACTIVE | - | Running | public=10.68.16.182 |
> | conf | ACTIVE | - | Running | public=10.68.18.87, 208.80.155.233 |
> | deployment-bastion | ACTIVE | - | Running | public=10.68.16.58, 208.80.155.191 |
> | deployment-cache-text02 | ACTIVE | - | Running | public=10.68.16.16 |
> | deployment-elastic08 | ACTIVE | - | Running | public=10.68.17.188 |
> | deployment-memc03 | ACTIVE | - | Running | public=10.68.16.15 |
> | deployment-parsoid05 | ACTIVE | - | Running | public=10.68.16.120 |
> | deployment-pdf01 | ACTIVE | - | Running | public=10.68.16.73 |
> | deployment-restbase01 | ACTIVE | - | Running | public=10.68.17.227 |
> | deployment-salt | ACTIVE | - | Running | public=10.68.16.99 |
> | deployment-urldownloader | ACTIVE | - | Running | public=10.68.16.135 |
> | diffengine | ACTIVE | - | Running | public=10.68.17.127 |
> | educationdashboard-i18n | SHUTOFF | - | Shutdown | public=10.68.16.235 |
> | ee-flow-extra | ACTIVE | - | Running | public=10.68.16.102 |
> | etcd01 | ACTIVE | - | Running | public=10.68.16.130 |
> | etcd03 | ACTIVE | - | Running | public=10.68.16.132 |
> | firstinstance | SHUTOFF | - | NOSTATE | public=10.68.16.212 |
> | graphite-trusty | ACTIVE | - | Running | public=10.68.17.181 |
> | huggle-d2 | ACTIVE | - | Running | public=10.68.17.194 |
> | icinga | ACTIVE | - | Running | public=10.68.16.195 |
> | integration-raita | ACTIVE | - | Running | public=10.68.16.53 |
> | integration-slave-trusty-1013 | ACTIVE | - | Running | public=10.68.18.28 |
> | integration-slave-trusty-1015 | ACTIVE | - | Running | public=10.68.18.30 |
> | k8s-worker-02 | ACTIVE | - | Running | public=10.68.18.91 |
> | kartotherian1 | ACTIVE | - | Running | public=10.68.16.117 |
> | language-replag-slave | SHUTOFF | - | Shutdown | public=10.68.16.248 |
> | maps-tiles2 | ACTIVE | - | Running | public=10.68.17.110 |
> | mobile-browser-tests | ACTIVE | - | Running | public=10.68.16.149 |
> | mwreview-proxy-test | ACTIVE | - | Running | public=10.68.16.83 |
> | osmit-cruncher1 | ACTIVE | - | Running | public=10.68.17.92 |
> | puppet-jmm-debdeploy-precise | ACTIVE | - | Running | public=10.68.18.106 |
> | puppet-mailman | ACTIVE | - | Running | public=10.68.17.177 |
> | sentry-builder | ACTIVE | - | Running | public=10.68.18.82 |
> | staging-eventlogging | ACTIVE | - | Running | public=10.68.16.199 |
> | staging-ms-be03 | ACTIVE | - | Running | public=10.68.17.249 |
> | staging-rdb01 | ACTIVE | - | Running | public=10.68.17.193 |
> | staging-tin | ACTIVE | - | Running | public=10.68.16.110 |
> | stashbot-logstash | ACTIVE | - | Running | public=10.68.18.101 |
> | tools-bastion-02 | ACTIVE | - | Running | public=10.68.16.44, 208.80.155.132 |
> | tools-exec-1201 | ACTIVE | - | Running | public=10.68.17.49, 208.80.155.203 |
> | tools-exec-1202 | ACTIVE | - | Running | public=10.68.16.57, 208.80.155.211 |
> | tools-exec-1204 | ACTIVE | - | Running | public=10.68.17.88, 208.80.155.213 |
> | tools-exec-1206 | ACTIVE | - | Running | public=10.68.17.105, 208.80.155.215 |
> | tools-exec-1209 | ACTIVE | - | Running | public=10.68.17.129, 208.80.155.218 |
> | tools-exec-1213 | ACTIVE | - | Running | public=10.68.17.252, 208.80.155.222 |
> | tools-exec-1217 | ACTIVE | - | Running | public=10.68.18.20, 208.80.155.226 |
> | tools-exec-1218 | ACTIVE | - | Running | public=10.68.18.19, 208.80.155.227 |
> | tools-exec-1408 | ACTIVE | - | Running | public=10.68.18.14, 208.80.155.152 |
> | tools-exec-cyberbot | ACTIVE | - | Running | public=10.68.16.39 |
> | tools-webgrid-generic-1404 | ACTIVE | - | Running | public=10.68.18.53 |
> | tools-webgrid-lighttpd-1409 | ACTIVE | - | Running | public=10.68.18.43 |
> | tools-webgrid-lighttpd-1410 | ACTIVE | - | Running | public=10.68.18.44 |
> | toolsbeta-exec-101 | ACTIVE | - | Running | public=10.68.16.7 |
> | toolsbeta-exec-201 | ACTIVE | - | Running | public=10.68.16.250 |
> | wikidata-mobile | ACTIVE | - | Running | public=10.68.18.41 |
> | wikispy | ACTIVE | - | Running | public=10.68.17.119 |
> | wlmjurytool2014 | ACTIVE | - | Running | public=10.68.17.134 |
> | wmt-exec | ACTIVE | - | Running | public=10.68.17.236 |