October 2019 - Cloud - lists.wikimedia.org

[Cloud-announce] Cloud VPS users, please claim your projects
by Andrew Bogott 25 Nov '19

Every year or so the Cloud Services team tries to identify and clean up unused projects and VMs. We do this via an opt-in process: anyone can mark a project as 'in use,' and that project will be preserved for another year.

I've created a wiki page that lists all existing projects, here:

https://wikitech.wikimedia.org/wiki/News/Cloud_VPS_2019_Purge

If you are a VPS user, please visit that page and mark any projects that you use as {{Used}}. Note that it's not necessary for you to be a project admin to mark something -- if you know that you're currently using a resource and want to keep using it, go ahead and mark it accordingly.

If you /are/ a project admin, please take a moment to mark which VMs are or aren't used in your projects.

When December arrives, I will shut down unused projects and begin the process of reclaiming their resources.

If you think you use a VPS project but aren't sure which, I encourage you to poke around on https://tools.wmflabs.org/openstack-browser/ to see what looks familiar. Worst case, just email cloud(a)lists.wikimedia.org with a description of your use case and we'll sort it out there.

Users who only use Toolforge are free to ignore this task.

Thank you!

-Andrew and WMCS team
_______________________________________________
Wikimedia Cloud Services announce mailing list
Cloud-announce(a)lists.wikimedia.org (formerly labs-announce(a)lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce

[Toolforge] Proxy maintenance operation next Monday 2019-10-28 @ 14:30 UTC
by Arturo Borrero Gonzalez 28 Oct '19

Hi there!

Next Monday 2019-10-28 @ 14:30 UTC we will do a maintenance operation on Toolforge which consists of rebuilding the main front proxy [0] used to serve webservices. We expect this to be done within a 30-minute window.

The operation consists of replacing the old virtual machines supporting the proxy (currently running Debian Jessie) with more modern instances running Debian Buster. Both the Grid and Kubernetes backends are affected by this change.

We don't expect a lot of service downtime, but one key step in the operation, migrating the data stored in Redis, can be tricky. The o

Examples of things affected by this change:

* Browsing Toolforge webservices
* Browsing to https://tools.wmflabs.org/<toolname>
* Browsing to https://tools.wmflabs.org/admin/ (Toolforge landing page)
* Browsing PAWS (to some extent, since it shares part of the Toolforge proxy)

Examples of things not affected by this change:

* webservices backend operations
* SSH bastions
* grid queues, grid jobs
* wiki-replicas, toolsdb
* other CloudVPS projects

regards.

[0] https://phabricator.wikimedia.org/T235627

--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation

[Cloud-announce] Brief ToolsDB Outage - Thursday 10/24 @11am UTC
by Brooke Storm 24 Oct '19

With a redundant power supply upgrade going on this week in the datacenter that could affect the VM that ToolsDB runs on, we anticipate a brief outage of the MySQL service on Thursday 10/24 @ 11am UTC to protect data in case anything goes wrong. Tools may need to be restarted to reconnect to the database.

We do not anticipate any worse disruption, but if there is any disruption beyond what is planned, a failover may be necessary. A failover will not include the non-replicated tables mentioned here:

https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#ToolsDB_Backups…

The maintenance requiring this notice and action is detailed here:

https://phabricator.wikimedia.org/T227540

The VM resides on the cloudvirt1019 hypervisor, which is why it is in scope.

We sincerely apologize for the short notice.

Brooke Storm
Senior SRE
Wikimedia Cloud Services
bstorm(a)wikimedia.org
IRC: bstorm_

[Cloud-announce] [Toolforge] Lighttpd access.log is no longer enabled by default
by Hieu Pham 23 Oct '19

Effective immediately, Toolforge webservices started or restarted by the webservice command will no longer produce a $HOME/access.log file by default.

This feature can easily be re-enabled if required for your tool. To do so, please follow the instructions posted at https://w.wiki/9go

Since not everyone requires the access.log feature, we have decided that it makes more sense to have it disabled by default. We believe that this change will improve the overall Toolforge experience: it frees up disk space and also saves the CPU cycles that the web servers spend producing the access.log files.

If you see odd behaviour when starting or restarting a webservice that looks like it could be related to this change, please let me or one of the Toolforge admins know, either by filing a Phabricator bug report or, for a faster response, by joining the #wikimedia-cloud IRC channel on Freenode and sending a "!help ...." message to the channel explaining your issue.

Hieu Pham - on behalf of the Toolforge admin team

CloudVPS maintenance on Wednesday 2019-10-23 (round 3 of cloudvirt reboots)
by Arturo Borrero Gonzalez 23 Oct '19

Hello!

Next Wednesday 2019-10-23 at 09:00 UTC we will be doing another maintenance operation on some of our cloudvirt servers (the hypervisor servers) that involves rebooting both the physical servers and the virtual machines running on them. The reason is that we need to update the Linux kernel version they are running.

In this window we will reboot 3 hypervisors:

* cloudvirt1014
* cloudvirt1025
* cloudvirt1026

The procedure will be to reboot a server, wait for it to come back online (could take up to 5 minutes) and wait for all the VMs to come back online. Then move to the next server.

Toolforge users may see their tools and webservices briefly disrupted due to several components of the Toolforge infrastructure being rebooted in this operation.

If nothing changes (reallocated or new virtual machine, etc) this is the list of affected VM instances in each hypervisor:

* cloudvirt1014:
VM: globalcu-app-01 PROJECT: globalcu
VM: xtools-dev05 PROJECT: xtools
VM: ores-web-05 PROJECT: ores
VM: roebling PROJECT: wikispore
VM: ores-web-04 PROJECT: ores
VM: canary1014-01 PROJECT: testlabs
VM: cloudinfra-db02 PROJECT: cloudinfra
VM: Krypton PROJECT: codereview
VM: dashiki-staging-01 PROJECT: dashiki
VM: discovery-production-02 PROJECT: shiny-r
VM: logparse01 PROJECT: security-tools
VM: tools-sgegrid-shadow PROJECT: tools
VM: maps-tiles1 PROJECT: maps
VM: wikimetrics-01 PROJECT: wikimetrics
VM: tools-sgecron-01 PROJECT: tools
VM: wikitextexp-base-1002 PROJECT: wikitextexp
VM: kask-client PROJECT: services
VM: clm-web-01 PROJECT: community-labs-monitoring
VM: accounts-appserver4 PROJECT: account-creation-assistance
VM: taxonbota PROJECT: dwl
VM: tofawiki02 PROJECT: fa-wp
VM: federated-commons PROJECT: wikidata-federation
VM: puppetmaster PROJECT: thumbor
VM: lizenzhinweisgenerator PROJECT: lizenzhinweisgenerator
VM: hound-puppet-02 PROJECT: hound
VM: packagist-mirror1 PROJECT: packagist-mirror
VM: deployment-elastic06 PROJECT: deployment-prep
VM: deployment-changeprop PROJECT: deployment-prep
VM: deployment-restbase02 PROJECT: deployment-prep
VM: deployment-imagescaler01 PROJECT: deployment-prep
VM: deployment-zookeeper02 PROJECT: deployment-prep
VM: deployment-kafka-jumbo-1 PROJECT: deployment-prep
VM: deployment-memc07 PROJECT: deployment-prep
VM: deployment-eventlog05 PROJECT: deployment-prep
VM: deployment-cpjobqueue PROJECT: deployment-prep
VM: deployment-mediawiki-07 PROJECT: deployment-prep
VM: deployment-chromium01 PROJECT: deployment-prep
VM: deployment-puppetdb02 PROJECT: deployment-prep
VM: deployment-puppetmaster03 PROJECT: deployment-prep
VM: deployment-cache-text05 PROJECT: deployment-prep
VM: gfg01 PROJECT: video
VM: encoding01 PROJECT: video
VM: whgi PROJECT: wikidumpparse
VM: wikilabels PROJECT: wmf-research-tools
VM: gitservices PROJECT: getstarted
VM: design-research-methods PROJECT: design
VM: wikilabels-backups PROJECT: wikilabels
VM: wikilabels-02 PROJECT: wikilabels
VM: dumpgrepper PROJECT: visualeditor
VM: compiler1001 PROJECT: puppet-diffs
VM: af-puppetdb02 PROJECT: automation-framework
VM: clickmodel PROJECT: search
VM: tool PROJECT: recommendation-api
VM: missing-sections PROJECT: recommendation-api
VM: ores-lb-03 PROJECT: ores
VM: puppet-ema-2 PROJECT: puppet
VM: matrix-synapse-01 PROJECT: matrix
VM: wikibrain-embeddings-01 PROJECT: wikibrain
VM: qube-node2 PROJECT: k8splay
VM: captcha-tf-43 PROJECT: privpol-captcha
VM: k4-2 PROJECT: analytics

* cloudvirt1025:
VM: integration-agent-docker-1006 PROJECT: integration
VM: striker-deploy04 PROJECT: striker
VM: rec-wiki-2 PROJECT: recommendation-api
VM: deployment-ms-fe03 PROJECT: deployment-prep
VM: deployment-poolcounter05 PROJECT: deployment-prep
VM: deployment-ms-be05 PROJECT: deployment-prep
VM: readers-web-stephen PROJECT: reading-web-staging
VM: traffic-upload-stretch PROJECT: traffic
VM: traffic-recdns-anycast PROJECT: traffic
VM: deployment-maps05 PROJECT: deployment-prep
VM: gerrit-sizzle PROJECT: security-tools
VM: tools-sgewebgrid-generic-0901 PROJECT: tools
VM: shinken-puppetmaster-01 PROJECT: shinken
VM: osmit-due PROJECT: osmit
VM: deployment-acme-chief03 PROJECT: deployment-prep
VM: meza-cindy PROJECT: pluggableauth
VM: accounts-db4 PROJECT: account-creation-assistance
VM: krenair-clientpackages-py3-jessie PROJECT: testlabs
VM: deployment-sessionstore01 PROJECT: deployment-prep
VM: paws-worker-04 PROJECT: paws
VM: paws-ext-lb-02 PROJECT: paws
VM: paws-int-lb-01 PROJECT: paws
VM: paws-master-03 PROJECT: paws
VM: paws-master-01 PROJECT: paws
VM: language-readership PROJECT: language
VM: wmde-wikidiff2-patched-stretch PROJECT: wikidiff2-wmde-dev
VM: tools-sgebastion-08 PROJECT: tools
VM: compiler1002 PROJECT: puppet-diffs
VM: phragile-db PROJECT: phragile
VM: cloud-puppetmaster-01 PROJECT: cloudinfra
VM: chicotest-cappy01 PROJECT: chicotestproject
VM: visualeditor-prototype2 PROJECT: visualeditor
VM: programs-and-events-dashboard PROJECT: globaleducation
VM: osmit-uno PROJECT: osmit
VM: tools-sgewebgrid-lighttpd-0904 PROJECT: tools
VM: canary1025-01 PROJECT: testlabs
VM: mathosphere PROJECT: math
VM: social-tools3 PROJECT: social-tools
VM: togetherjs PROJECT: visualeditor
VM: language-mleb-legacy PROJECT: language
VM: women-in-red PROJECT: globaleducation
VM: ntp-01 PROJECT: cloudinfra
VM: mc-clusterA-1 PROJECT: test-twemproxy
VM: wikifarm PROJECT: pluggableauth
VM: login-test PROJECT: catgraph
VM: puppenmeister PROJECT: planet

* cloudvirt1026:
VM: integration-agent-docker-1016 PROJECT: integration
VM: wikidata-new-wbterm PROJECT: wikidata-dev
VM: incubator-test PROJECT: incubator
VM: cloudinfra-internal-puppetmaster01 PROJECT: cloudinfra
VM: cloudinfra-db01 PROJECT: cloudinfra
VM: tools-checker-03 PROJECT: tools
VM: tools-static-13 PROJECT: tools
VM: wp1 PROJECT: mwoffliner
VM: pk8s PROJECT: planet
VM: arturo-k8s-test-4-1 PROJECT: openstack
VM: banner PROJECT: wikidumpparse
VM: packager01 PROJECT: packaging
VM: tools-package-builder-02 PROJECT: tools
VM: canary1026-02 PROJECT: testlabs
VM: security-checker1 PROJECT: packagist-mirror
VM: logstack02 PROJECT: security-tools
VM: logstack01 PROJECT: security-tools
VM: mediawiki2latex-large PROJECT: collection-alt-renderer
VM: tools-sge-services-03 PROJECT: tools
VM: tools-sgewebgrid-lighttpd-0928 PROJECT: tools
VM: tools-sgewebgrid-lighttpd-0921 PROJECT: tools
VM: tools-sgewebgrid-generic-0903 PROJECT: tools
VM: tools-sgeexec-0938 PROJECT: tools
VM: tools-sgeexec-0936 PROJECT: tools
VM: tools-sgeexec-0935 PROJECT: tools
VM: tools-sgeexec-0919 PROJECT: tools
VM: tools-sgeexec-0917 PROJECT: tools
VM: tools-sgeexec-0916 PROJECT: tools
VM: tools-sgeexec-0915 PROJECT: tools
VM: tools-sgeexec-0914 PROJECT: tools
VM: tools-paws-worker-1010 PROJECT: tools
VM: tools-paws-worker-1019 PROJECT: tools
VM: openstack-puppetmaster-01 PROJECT: openstack
VM: web1 PROJECT: graphql
VM: etytree-b PROJECT: etytree
VM: canary1026-01 PROJECT: testlabs
VM: db-instance PROJECT: videowiki
VM: tools-sgeexec-0906 PROJECT: tools
VM: mwoffliner5 PROJECT: mwoffliner

regards

--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
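
The one-server-at-a-time procedure described in this announcement could be sketched roughly as follows. This is only an illustration, not the tooling the WMCS team actually uses: `reboot`, `host_is_up` and `vms_are_up` are hypothetical callables supplied by the caller, the 300 second host timeout mirrors the "up to 5 minutes" mentioned above, and the VM timeout and polling interval are assumptions.

```python
import time
from typing import Callable, Iterable, List


def wait_until(check: Callable[[], bool], timeout_s: float, poll_s: float,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)


def rolling_reboot(hypervisors: Iterable[str],
                   reboot: Callable[[str], None],
                   host_is_up: Callable[[str], bool],
                   vms_are_up: Callable[[str], bool],
                   host_timeout_s: float = 300.0,   # "up to 5 minutes"
                   vm_timeout_s: float = 600.0,     # assumed value
                   poll_s: float = 10.0,            # assumed value
                   sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Reboot hypervisors one at a time, moving on only when a host is healthy.

    Returns the hosts that were rebooted successfully; raises RuntimeError
    and stops as soon as a host or its VMs fail to come back in time.
    """
    done: List[str] = []
    for host in hypervisors:
        reboot(host)
        if not wait_until(lambda: host_is_up(host), host_timeout_s, poll_s, sleep):
            raise RuntimeError(f"{host} did not come back online")
        if not wait_until(lambda: vms_are_up(host), vm_timeout_s, poll_s, sleep):
            raise RuntimeError(f"VMs on {host} did not come back online")
        done.append(host)
    return done
```

Stopping at the first failure keeps the remaining hypervisors untouched, so a problem on one server cannot spread to the rest of the window.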

[Toolforge] annoying bug affecting webservices running on grid engine fixed
by Bryan Davis 18 Oct '19

TL;DR: All webservices running on the grid engine backend in Toolforge were restarted around 2019-10-18 21:29 UTC. Following the restart, these jobs should retain the ability to write to their original TMPDIR.

Earlier this week Musikanimal commented on a stale ticket [0] about a mysteriously intermittent "(chunk.c.553) opening temp-file failed: No such file or directory" error in a particular webservice. A related bug [1] (now merged into the first as a duplicate) had been looked at in depth previously by Zhuyifei1999 with no clear conclusion. I started looking into the problem with little expectation of finding an answer, but a hope that I could at least rule some things out as the "root cause". I got lucky this time and did figure out a root cause for the problem.

It turns out that Grid Engine creates a unique directory under /tmp for each job that is started. This directory is named /tmp/{job number}.{task number}.{queue name}. The job's main process is started with the TMPDIR environment variable pointing to this unique directory. Separately, we have a daily cron task which runs on each Grid Engine exec node marked as a part of the webgrid-generic or webgrid-lighttpd job queues to remove files and empty directories under /tmp which have not been accessed in more than 24 hours. This cleanup task was deleting the empty TMPDIR of jobs which had not written to or read from their TMPDIR in more than 24 hours.

Once I made this connection, the fix was as simple as configuring the cleanup task to ignore empty directories that look like the TMPDIR pattern used by Grid Engine. After the configuration change was deployed, I set up a temporary webservice to monitor its own TMPDIR to verify that it was indeed fixed. Earlier today that tool crossed the 48-hour runtime worst case I had calculated with no recurrence of the error.
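
The cleanup rule described above can be sketched as follows. This is a minimal illustration and not the actual Toolforge cron task: the function name `stale_tmp_entries` is hypothetical, the regular expression is an assumed reading of the {job number}.{task number}.{queue name} pattern, and the sketch only inspects the top level of the directory and reports what it would remove instead of deleting anything.

```python
import os
import re
import stat
import time
from typing import List, Optional

# Assumed shape of a Grid Engine TMPDIR name: {job number}.{task number}.{queue name}
GRID_TMPDIR_RE = re.compile(r"\d+\.\d+\..+")


def stale_tmp_entries(root: str, max_age_s: float = 24 * 3600,
                      now: Optional[float] = None) -> List[str]:
    """List files and empty directories in root not accessed within max_age_s.

    Empty directories whose name looks like a Grid Engine TMPDIR are skipped,
    so a long-idle job keeps the directory its TMPDIR points to.
    """
    now = time.time() if now is None else now
    doomed: List[str] = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        info = os.lstat(path)
        if now - info.st_atime <= max_age_s:
            continue  # accessed recently enough, keep it
        if stat.S_ISDIR(info.st_mode):
            if os.listdir(path):
                continue  # only empty directories are candidates
            if GRID_TMPDIR_RE.fullmatch(name):
                continue  # looks like a job's TMPDIR, leave it alone
        doomed.append(path)
    return doomed
```

Returning a list rather than deleting directly makes the rule easy to dry-run against a real /tmp before trusting it.
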
With that confirmation of the fix, I decided to restart all of the webservice jobs running on the grid engine in Toolforge to ensure that they have a TMPDIR created. This seemed like a better solution than just emailing the cloud-announce list to tell folks to restart their webservices if they were likely to be affected.

The process I went through in debugging is well documented on the task [2]. The notes there do not include all the web searches I did for various error messages and documentation of FOSS software involved in the webservice, but they do pretty clearly show that I started out looking in one place and ended up figuring out the root cause was something completely different. The final analysis also shows how fixing one problem [3] can unintentionally lead to new problems.

[0]: https://phabricator.wikimedia.org/T217815
[1]: https://phabricator.wikimedia.org/T225966
[2]: https://phabricator.wikimedia.org/T217815#5577987
[3]: https://phabricator.wikimedia.org/T190185

Bryan
--
Bryan Davis              Technical Engagement      Wikimedia Foundation
Principal Software Engineer                               Boise, ID USA
[[m:User:BDavis_(WMF)]]                                      irc: bd808

CloudVPS maintenance on Wednesday 2019-10-16 (round 2 of cloudvirt reboots)
by Arturo Borrero Gonzalez 16 Oct '19

Hello!

Next Wednesday 2019-10-16 at 09:00 UTC we will be doing another maintenance operation on some of our cloudvirt servers (the hypervisor servers) that involves rebooting both the physical servers and the virtual machines running on them. The reason is that we need to update the Linux kernel version they are running.

In this window we will reboot 3 hypervisors:

* cloudvirt1028
* cloudvirt1029
* cloudvirt1030

The procedure will be to reboot a server, wait for it to come back online (could take up to 5 minutes) and wait for all the VMs to come back online. Then move to the next server.

Toolforge users may see their tools and webservices briefly disrupted due to several components of the Toolforge infrastructure being rebooted in this operation.

If nothing changes (reallocated or new virtual machine, etc) this is the list of affected VM instances in each hypervisor:

* cloudvirt1028:
VM: ores-worker-05 PROJECT: ores
VM: phamhi-puppetclient PROJECT: testlabs
VM: integration-agent-jessie-docker-1001 PROJECT: integration
VM: language-cx2 PROJECT: language
VM: apertium PROJECT: language
VM: janitor01 PROJECT: puppet
VM: integration-agent-docker-1008 PROJECT: integration
VM: integration-agent-docker-1005 PROJECT: integration
VM: xtools-prod06 PROJECT: xtools
VM: eventmetrics-prod02 PROJECT: eventmetrics
VM: cloud-cumin-02 PROJECT: cloudinfra
VM: appslabs PROJECT: mobile
VM: meza-2 PROJECT: meza
VM: meza-1 PROJECT: meza
VM: labs-t224000-alex-osdev PROJECT: openstack
VM: clouddb-wikilabels-01 PROJECT: clouddb-services
VM: stretch PROJECT: thumbor
VM: tools-puppetmaster-01 PROJECT: tools
VM: commtech-nsfw PROJECT: commtech
VM: builder-envoy PROJECT: packaging
VM: cloudstore-dev-01 PROJECT: cloudstore
VM: deployment-db05 PROJECT: deployment-prep
VM: cloud-puppetmaster-02 PROJECT: cloudinfra
VM: osmit-umap PROJECT: osmit
VM: util-abogott-stretch PROJECT: testlabs
VM: wdhqs-1 PROJECT: wikidata-history-query-service
VM: tools-docker-registry-04 PROJECT: tools
VM: adhoc-utils01 PROJECT: security-tools
VM: tools-proxy-04 PROJECT: tools
VM: tools-docker-builder-06 PROJECT: tools
VM: tools-sgeexec-0921 PROJECT: tools
VM: proxy-02 PROJECT: project-proxy
VM: canary1028-01 PROJECT: testlabs
VM: vconverter-instance PROJECT: videowiki
VM: snuggle-enwiki-01 PROJECT: snuggle
VM: saucelabs-02 PROJECT: integration
VM: clm-test-01 PROJECT: community-labs-monitoring
VM: deployment-memc05 PROJECT: deployment-prep
VM: deployment-sca01 PROJECT: deployment-prep
VM: af-puppetdb01 PROJECT: automation-framework
VM: api PROJECT: openocr
VM: a11y PROJECT: reading-web-staging
VM: cyberbot-exec-iabot-01 PROJECT: cyberbot
VM: fridolin PROJECT: catgraph

* cloudvirt1029:
VM: clouddb-wikilabels-02 PROJECT: clouddb-services
VM: buster PROJECT: thumbor
VM: gerrit-test5 PROJECT: git
VM: toolsbeta-paws-worker-1001 PROJECT: toolsbeta
VM: toolsbeta-flannel-etcd-01 PROJECT: toolsbeta
VM: toolsbeta-services-01 PROJECT: toolsbeta
VM: tools-worker-1021 PROJECT: tools
VM: tools-worker-1015 PROJECT: tools
VM: traffic-rpki PROJECT: traffic
VM: paws-worker-03 PROJECT: paws
VM: paws-worker-02 PROJECT: paws
VM: cloud-bootstrapvz-buster-bootstrap PROJECT: openstack
VM: structurednavigation PROJECT: structurednavigation
VM: tools-worker-1027 PROJECT: tools
VM: tools-sgebastion-07 PROJECT: tools
VM: wmde-dashboards PROJECT: wmde-dashboards
VM: wikidata-shex PROJECT: wikidata-dev
VM: tools-proxy-03 PROJECT: tools
VM: tools-sge-services-04 PROJECT: tools
VM: abogott-proxy-canary PROJECT: testlabs
VM: tools-sgewebgrid-lighttpd-0919 PROJECT: tools
VM: tools-sgewebgrid-lighttpd-0918 PROJECT: tools
VM: tools-sgewebgrid-lighttpd-0916 PROJECT: tools
VM: tools-sgewebgrid-lighttpd-0914 PROJECT: tools
VM: tools-sgewebgrid-lighttpd-0913 PROJECT: tools
VM: tools-sgewebgrid-lighttpd-0912 PROJECT: tools
VM: tools-sgewebgrid-lighttpd-0911 PROJECT: tools
VM: tools-sgewebgrid-lighttpd-0905 PROJECT: tools
VM: tools-sgeexec-0934 PROJECT: tools
VM: tools-sgeexec-0932 PROJECT: tools
VM: tools-sgeexec-0931 PROJECT: tools
VM: tools-sgeexec-0930 PROJECT: tools
VM: tools-sgeexec-0928 PROJECT: tools
VM: tools-sgeexec-0913 PROJECT: tools
VM: tools-sgeexec-0912 PROJECT: tools
VM: tools-sgeexec-0907 PROJECT: tools
VM: tools-paws-worker-1007 PROJECT: tools
VM: canary1029-01 PROJECT: testlabs
VM: maps-puppetmaster PROJECT: maps
VM: deployment-imagescaler03 PROJECT: deployment-prep
VM: integration-slave-jessie-1004 PROJECT: integration

* cloudvirt1030:
VM: taxonbota-b PROJECT: dwl
VM: irc-buster PROJECT: dwl
VM: janus1-1 PROJECT: analytics
VM: integration-agent-docker-1012 PROJECT: integration
VM: integration-agent-docker-1011 PROJECT: integration
VM: tools-sgeexec-0929 PROJECT: tools
VM: canary1030-01 PROJECT: testlabs
VM: t166878 PROJECT: otrs
VM: phragile-pro PROJECT: phragile
VM: signwriting-swserver PROJECT: signwriting
VM: signwriting-swis PROJECT: signwriting
VM: commonsarchive-prod PROJECT: commonsarchive
VM: math-docker PROJECT: math
VM: ldfclient-new PROJECT: wikidata-query
VM: wikidata-misc PROJECT: wikidata-dev
VM: packaging PROJECT: thumbor
VM: neon PROJECT: rcm
VM: oxygen PROJECT: rcm
VM: hafnium PROJECT: rcm
VM: hound-app-01 PROJECT: hound
VM: mediawiki2latex PROJECT: collection-alt-renderer
VM: deployment-sca02 PROJECT: deployment-prep
VM: deployment-memc04 PROJECT: deployment-prep
VM: deployment-fluorine02 PROJECT: deployment-prep
VM: deployment-mcs01 PROJECT: deployment-prep
VM: deployment-parsoid09 PROJECT: deployment-prep
VM: deployment-sca04 PROJECT: deployment-prep
VM: deployment-kafka-jumbo-2 PROJECT: deployment-prep
VM: deployment-kafka-main-1 PROJECT: deployment-prep
VM: deployment-mediawiki-09 PROJECT: deployment-prep
VM: deployment-webperf12 PROJECT: deployment-prep
VM: deployment-deploy02 PROJECT: deployment-prep
VM: deployment-deploy01 PROJECT: deployment-prep
VM: deployment-maps04 PROJECT: deployment-prep
VM: twlight-tracker PROJECT: twl
VM: encoding02 PROJECT: video
VM: encoding03 PROJECT: video
VM: wikispeech-tts-dev PROJECT: wikispeech
VM: pub2 PROJECT: wikiapiary
VM: integration-slave-jessie-1001 PROJECT: integration
VM: ores-staging-01 PROJECT: ores-staging
VM: ve-font PROJECT: design
VM: visualeditor-test2 PROJECT: visualeditor
VM: ores-redis-02 PROJECT: ores
VM: quarry-worker-01 PROJECT: quarry
VM: fastcci-new-master PROJECT: fastcci
VM: cvn-app8 PROJECT: cvn

regards.

--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
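
For readers who want to check whether their own project appears in a listing like the one in this announcement, a small parser along these lines may help. It is a sketch under assumptions: the function name `vms_by_project` is hypothetical, and the regular expression only relies on the "* cloudvirtNNNN:" and "VM: ... PROJECT: ..." shapes visible in the message text.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Matches either a "* cloudvirtNNNN:" section header or one "VM: x PROJECT: y" entry.
TOKEN_RE = re.compile(
    r"\*\s*(?P<host>cloudvirt\d+):"
    r"|VM:\s*(?P<vm>\S+)\s+PROJECT:\s*(?P<project>\S+)"
)


def vms_by_project(listing: str) -> Dict[str, List[Tuple[str, str]]]:
    """Group (hypervisor, vm) pairs by project name.

    Works on both the flattened one-line form and a one-entry-per-line form,
    because whitespace (including newlines) between fields is ignored.
    """
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    current_host = ""
    for match in TOKEN_RE.finditer(listing):
        if match.group("host"):
            current_host = match.group("host")
        else:
            grouped[match.group("project")].append((current_host, match.group("vm")))
    return dict(grouped)


sample = (
    "* cloudvirt1028: VM: ores-worker-05 PROJECT: ores "
    "VM: apertium PROJECT: language "
    "* cloudvirt1030: VM: ores-redis-02 PROJECT: ores"
)
print(vms_by_project(sample).get("ores", []))
```

Looking up a project name in the returned dictionary then shows which of its VMs sit on which hypervisor in a given maintenance window.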

[Cloud-announce] CloudVPS maintenance on Wednesday 2019-10-09 (round of cloudvirt reboots)
by Arturo Borrero Gonzalez 09 Oct '19

Hi there,

Next Wednesday 2019-10-09 at 09:00 UTC we will be doing a maintenance operation on some of our cloudvirt servers (the hypervisor servers) that involves rebooting both the physical servers and the virtual machines running on them. The reason is that we need to update the Linux kernel version they are running.

In this window we will reboot 4 hypervisors:

* cloudvirt1008
* cloudvirt1009
* cloudvirt1012
* cloudvirt1013

The procedure will be to reboot a server, wait for it to come back online (could take up to 5 minutes) and wait for all the VMs to come back online. Then move to the next server.

Toolforge users may see their tools and webservices briefly disrupted due to several components of the Toolforge infrastructure being rebooted in this operation.

If nothing changes (reallocated or new virtual machine, etc) this is the list of affected VM instances in each hypervisor:

* cloudvirt1008:
VM: tools-sgebastion-09 PROJECT: tools
VM: tools-k8s-master-01 PROJECT: tools
VM: deployment-cache-upload05 PROJECT: deployment-prep
VM: toolsbeta-paws-worker-1002 PROJECT: toolsbeta
VM: toolsbeta-puppetmaster-02 PROJECT: toolsbeta
VM: tools-mail-02 PROJECT: tools
VM: tools-prometheus-02 PROJECT: tools
VM: tools-elastic-01 PROJECT: tools
VM: tracker1 PROJECT: lta-tracker
VM: tools-clushmaster-02 PROJECT: tools
VM: tools-worker-1020 PROJECT: tools
VM: tools-k8s-etcd-01 PROJECT: tools
VM: tools-worker-1010 PROJECT: tools
VM: tools-worker-1008 PROJECT: tools
VM: tools-worker-1007 PROJECT: tools
VM: tools-worker-1003 PROJECT: tools
VM: tools-sgeexec-0937 PROJECT: tools

* cloudvirt1009:
VM: toolsbeta-paws-master-01 PROJECT: toolsbeta
VM: tools-elastic-02 PROJECT: tools
VM: tools-paws-worker-1005 PROJECT: tools
VM: tools-prometheus-01 PROJECT: tools
VM: tools-paws-worker-1002 PROJECT: tools
VM: puppet-lta PROJECT: lta-tracker
VM: tools-flannel-etcd-03 PROJECT: tools
VM: tools-worker-1017 PROJECT: tools
VM: tools-k8s-etcd-02 PROJECT: tools
VM: tools-worker-1013 PROJECT: tools
VM: tools-worker-1012 PROJECT: tools
VM: tools-worker-1009 PROJECT: tools
VM: tools-worker-1006 PROJECT: tools
VM: tools-worker-1004 PROJECT: tools

* cloudvirt1012:
VM: tools-paws-master-01 PROJECT: tools
VM: deployment-ms-be06 PROJECT: deployment-prep
VM: toolsbeta-worker-1001 PROJECT: toolsbeta
VM: deployment-cumin02 PROJECT: deployment-prep
VM: toolsbeta-k8s-master-01 PROJECT: toolsbeta
VM: toolsbeta-k8s-etcd-01 PROJECT: toolsbeta
VM: toolsbeta-puppetdb-01 PROJECT: toolsbeta
VM: tools-redis-1002 PROJECT: tools
VM: tools-paws-worker-1003 PROJECT: tools
VM: tools-paws-worker-1001 PROJECT: tools
VM: tools-elastic-03 PROJECT: tools
VM: tools-worker-1025 PROJECT: tools
VM: tools-worker-1026 PROJECT: tools
VM: tools-worker-1022 PROJECT: tools
VM: tools-worker-1019 PROJECT: tools
VM: tools-worker-1018 PROJECT: tools
VM: tools-k8s-etcd-03 PROJECT: tools
VM: tools-worker-1016 PROJECT: tools
VM: tools-flannel-etcd-01 PROJECT: tools
VM: tools-worker-1014 PROJECT: tools
VM: phlogiston-5 PROJECT: phlogiston
VM: dumps-3 PROJECT: dumps
VM: codesearch4 PROJECT: codesearch
VM: wikispeech-wiki-stretch PROJECT: wikispeech
VM: ores-worker-01 PROJECT: ores
VM: puppet-jmm-kernel-stretch2 PROJECT: puppet
VM: mcr-base PROJECT: mcr-dev
VM: rel2 PROJECT: search
VM: mc-clusterA-2 PROJECT: test-twemproxy
VM: wikibrain-embeddings-02 PROJECT: wikibrain
VM: qube-node1 PROJECT: k8splay
VM: cindy PROJECT: pluggableauth
VM: cvn-apache9 PROJECT: cvn
VM: zk1-2 PROJECT: analytics

* cloudvirt1013:
VM: tools-flannel-etcd-02 PROJECT: tools
VM: paws-ext-lb-01 PROJECT: paws
VM: abogott-puppetclient PROJECT: testlabs
VM: tools-worker-1028 PROJECT: tools
VM: tools-worker-1005 PROJECT: tools
VM: cloudstore-dev-02 PROJECT: cloudstore
VM: cloudstore-puppetmaster-01 PROJECT: cloudstore
VM: deployment-aqs03 PROJECT: deployment-prep
VM: osmit-test PROJECT: osmit
VM: tools-sgewebgrid-lighttpd-0927 PROJECT: tools
VM: tools-sgewebgrid-lighttpd-0926 PROJECT: tools
VM: tools-sgewebgrid-lighttpd-0925 PROJECT: tools
VM: tools-sgewebgrid-lighttpd-0924 PROJECT: tools
VM: tools-sgewebgrid-lighttpd-0923 PROJECT: tools
VM: tools-sgewebgrid-lighttpd-0922 PROJECT: tools
VM: tools-sgewebgrid-lighttpd-0920 PROJECT: tools
VM: tools-sgewebgrid-lighttpd-0917 PROJECT: tools
VM: tools-sgewebgrid-lighttpd-0909 PROJECT: tools
VM: tools-sgeexec-0925 PROJECT: tools
VM: tools-sgeexec-0923 PROJECT: tools
VM: tools-sgeexec-0910 PROJECT: tools
VM: cyberbot-db-01 PROJECT: cyberbot

regards.

--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation

[Cloud-announce] [Wiki replicas] query killer time limit reduced to 1 hour for <wikidb>.analytics.db.svc.eqiad.wmflabs
by Bryan Davis 08 Oct '19

The <wikidb>.analytics.db.svc.eqiad.wmflabs database servers have been experiencing some stability issues in the last two to three weeks that we have reason to believe are related to query volume. The DBA team at the Wikimedia Foundation is looking into various changes that may help with these problems including software upgrades for our MariaDB deployments.

Today we took an initial step of reducing the maximum time allowed for a query to complete on the <wikidb>.analytics.db.svc.eqiad.wmflabs hosts to 1 hour. We were using an upper limit of 4 hours previously. Our hope is that this change will relieve some stress on the shared servers and allow us more time to look into other changes to restore stability. Ideally we will be able to increase the limit again after making other changes to these systems.

Bryan, on behalf of the Cloud Services team
--
Bryan Davis              Technical Engagement      Wikimedia Foundation
Principal Software Engineer                               Boise, ID USA
[[m:User:BDavis_(WMF)]]                                      irc: bd808
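
To make the effect of the new limit concrete, the sketch below shows the kind of decision a query killer makes. It is purely illustrative and does not describe how the Wiki Replicas query killer is implemented: the `RunningQuery` type and the `over_limit` function are hypothetical, and only the 1 hour and 4 hour limits come from the announcement above.

```python
from dataclasses import dataclass
from typing import Iterable, List

OLD_LIMIT_S = 4 * 60 * 60  # previous upper limit: 4 hours
NEW_LIMIT_S = 1 * 60 * 60  # new upper limit: 1 hour


@dataclass(frozen=True)
class RunningQuery:
    """Hypothetical record of one running query and how long it has run."""
    query_id: int
    elapsed_s: int


def over_limit(queries: Iterable[RunningQuery],
               limit_s: int = NEW_LIMIT_S) -> List[int]:
    """Return the ids of queries that have run longer than limit_s seconds."""
    return [q.query_id for q in queries if q.elapsed_s > limit_s]


running = [
    RunningQuery(query_id=1, elapsed_s=20 * 60),       # 20 minutes
    RunningQuery(query_id=2, elapsed_s=2 * 60 * 60),   # 2 hours
    RunningQuery(query_id=3, elapsed_s=5 * 60 * 60),   # 5 hours
]
print(over_limit(running, OLD_LIMIT_S))
print(over_limit(running, NEW_LIMIT_S))
```

A query that runs for two hours was within the old limit but exceeds the new one, so long-running analytics queries may need to be split into smaller pieces.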

[Cloud-announce] cloud-vps maintenance Monday, 2019-10-07
by Andrew Bogott 07 Oct '19

We'll be upgrading the cloud services OpenStack install on Monday, beginning at 14:00 UTC. The entire upgrade process may take a couple of hours.

Early on in the process, Horizon (and associated OpenStack APIs) will be disabled (probably for 20 to 30 minutes). There may also be brief network interruptions during the upgrade, although if all goes well these will not be noticeable by users.

Toolforge and existing VMs should be largely unaffected apart from possible network hiccups.

- Andrew + the WMCS team
