Hello,
Pretty much everyone who has dealt with creating views for new wikis on the
labs hosts has run into "Access denied" errors at some point.
This was usually due to a missing MariaDB grant. We tried to work around
this by adding the grant step to the maintain-views script.
Unfortunately, doing so led to some very strange problems; here is one
example: https://phabricator.wikimedia.org/T193187#4273281
After lots of back and forth we filed a bug with MariaDB (
https://jira.mariadb.org/browse/MDEV-16466), which was confirmed by MariaDB
yesterday and linked to a similar issue (
https://jira.mariadb.org/browse/MDEV-14732).
The fix is expected in 10.4 (we are on 10.1), so it is still quite a long
way off.
So, for now, the workaround before adding new views is to manually add the
GRANT on the database and then run the script:
GRANT SELECT, SHOW VIEW ON `newiki\_p`.* TO 'labsdbuser';
(Note the backslash: `_` is a wildcard in GRANT database names, so it is
escaped to match the literal underscore.)
Hopefully with this email everyone is on the same page now.
Thanks everyone (especially Brooke for helping me out with the
troubleshooting!)
Manuel.
Hi!
On 2019-06-03 14:00 UTC+2 (next Monday) we will be rebuilding the
cloudservices1003 server, which holds the designate service that serves DNS
requests for CloudVPS and Toolforge.
We have a backup server (cloudservices1004), so we don't expect much
downtime. However, DNS queries are frequent, and some of them may fail while
we stabilize the DNS service.
Please reach out to the WMCS team if you need more details or have any doubts.
regards.
--
Arturo Borrero Gonzalez
Operations Engineer / Wikimedia Cloud Services
Wikimedia Foundation
Hi,
There was an outage today, 2019-05-29, in CloudVPS/Toolforge involving keystone
and NFS. All CloudVPS projects (including Toolforge) had trouble using the
NFS-based storage due to an upgrade operation we were performing on
cloudcontrol1003.wikimedia.org.
You can read more about the incident here:
https://wikitech.wikimedia.org/wiki/Incident_documentation/20190529-NFS-key…
The incident postmortem is not complete yet, but you can already read the main
sections:
* what happened
* timeline
* things to improve in the future
regards.
--
Arturo Borrero Gonzalez
Operations Engineer / Wikimedia Cloud Services
Wikimedia Foundation
This was my first trip to PyCon, and I can definitely say it is a strange bird as conferences go (surprisingly emotional). On the more standard conference side of it, besides a lot of hacking and people trying to sell things or hire people, these elements stood out:
Python 2:
Everyone is sort of dancing on Python 2’s grave in the Python community. There are stickers of its grave that were so popular I could only get ones that also have a company name on them. The transition is firmly established as a Good Thing, and it is now seen as a problem to have Python 2 in an environment (*glares at Debian*). This is probably well-known here, but it bears mentioning. Also: https://pythonclock.org/
Black:
The auto formatter black is catching on a lot. Django is considering a move to it. CircuitPython has an open issue to move everything in basically all libraries that isn’t C to it. It is becoming a fairly well-regarded way of generating low-diff code by way of simply not making formatting decisions (which is why it has no configuration except the CLI arg to change line length). I’ve been using it on things I touch where it makes sense.
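For a sense of the style, here is my own illustration (not taken from black's docs) of the kind of rewrite black performs: double quotes, normalized spacing, one canonical layout.

```python
# Before formatting (shown as a comment, since black would rewrite it):
#   def make_user( name,email,   role = 'user' ):
#       return { 'name':name,'email': email,'role':role }

# After running black, the same function comes out roughly like this:
# double quotes everywhere, single spaces, and no spaces around an
# unannotated keyword default.
def make_user(name, email, role="user"):
    return {"name": name, "email": email, "role": role}

print(make_user("Ada", "ada@example.org"))
```

Because there is one canonical output, diffs only ever show real changes, which is the "low-diff" property mentioned above.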
PEP554:
The effort to allow sub-interpreters and make threading so much more disastrously fun is moving right along. If you hate the GIL, you’ll either like this or hate it even more. It is expected to actually land somewhere around Python 3.8-3.9, which means it probably won’t show up in Buster. However, with the power of pyenv and similar things, actual concurrency in Python may be coming to a Toolforge or someone’s VPS project near you one day. Until then, it’s kind of cool to know it might be coming.
https://www.python.org/dev/peps/pep-0554/
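To see why people care, here is a minimal sketch (my own, not from the PEP) of the GIL behavior that motivates all this: CPU-bound threads in today's CPython don't actually run in parallel.

```python
import threading
import time

def count_down(n):
    # Pure-Python CPU-bound work: the GIL lets only one thread run
    # Python bytecode at a time, so extra threads can't speed this up.
    while n:
        n -= 1

N = 2_000_000

start = time.perf_counter()
count_down(N)
count_down(N)
serial = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# Under the GIL, the threaded run is typically no faster than the serial
# one; isolated sub-interpreters are the door PEP 554 starts to open.
print(f"serial={serial:.3f}s threaded={threaded:.3f}s")
```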
pipenv:
Pipenv is still not “standard”, but it is picking up steam. If other efforts to package up OpenStack in debs, containers, etc. fail, the specificity of Pipfile.lock files may make pipenv a very good deployment alternative. They don’t usually seem to want to say it, but I will: it makes deploying Python as well-developed as deploying nodejs or rails ;-)
https://github.com/pypa/pipenv
There was a lot of other cool stuff going on, but much of it was not especially pertinent to WMCS. We won’t get the f-strings that everyone’s excited about until Buster (and can’t use them reliably until that’s the old-stable), and the walrus operator won’t actually end up in Debian until…the Future (https://www.python.org/dev/peps/pep-0572/). Apparently Python is also the primary language choice of dystopia (see also TensorFlow); nothing new, but really in-your-face at PyCon. It is also very clear that people would like to know how to deploy Python on Toolforge and our setup there. I was asked repeatedly to give a talk/demo, run an open space, or host a development sprint on our stuff (dev sprints are hard when you are on Gerrit, though; the tutorials for other folks are 100% GitHub/GitLab). I am interested in trying to do one or more of those next year if everything aligns.
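For reference, the two language features mentioned above look like this (f-strings need Python 3.6+; the walrus operator needs 3.8+):

```python
# f-strings (PEP 498, Python 3.6+): expressions interpolated directly
# inside string literals.
name, count = "Toolforge", 3
msg = f"{name} has {count} new tools ({count * 2} including forks)"

# The walrus operator (PEP 572, Python 3.8+): assign a name as part of
# an expression, e.g. inside a condition.
data = [1, 2, 3, 4, 5]
if (n := len(data)) > 3:
    summary = f"{n} items, first {data[0]}"

print(msg)      # Toolforge has 3 new tools (6 including forks)
print(summary)  # 5 items, first 1
```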
Brooke Storm
Operations Engineer
Wikimedia Cloud Services
bstorm@wikimedia.org
IRC: bstorm_
Hi!
On 2019-05-16 13:00 UTC there will be a maintenance operation in one of the
Wikimedia Foundation datacenter racks that affects 2 of our servers running
virtual machines [0]. There is a risk that this maintenance operation results
in power loss on those servers, affecting the virtual machines running on them.
However, there is no way to know for sure whether there will be any outage at all.
If you are an admin of any of the VMs in the list and you want your VM to be
reallocated to other servers prior to the operation, please get in touch
with us as soon as possible. Remember that, right now, reallocating a VM to
another server means briefly shutting it down.
Here is a list of affected virtual machines:
cloudvirt1028.eqiad.wmnet:
af-puppetdb01.automation-framework.eqiad.wmflabs
bastion-eqiad1-02.bastion.eqiad.wmflabs
fridolin.catgraph.eqiad.wmflabs
cloud-puppetmaster-02.cloudinfra.eqiad.wmflabs
cloudstore-dev-01.cloudstore.eqiad.wmflabs
commtech-nsfw.commtech.eqiad.wmflabs
clm-test-01.community-labs-monitoring.eqiad.wmflabs
cyberbot-exec-iabot-01.cyberbot.eqiad.wmflabs
deployment-db05.deployment-prep.eqiad.wmflabs
deployment-memc05.deployment-prep.eqiad.wmflabs
deployment-sca01.deployment-prep.eqiad.wmflabs
deployment-pdfrender02.deployment-prep.eqiad.wmflabs
ign.ign2commons.eqiad.wmflabs
integration-slave-docker-1050.integration.eqiad.wmflabs
integration-castor03.integration.eqiad.wmflabs
api.openocr.eqiad.wmflabs
osmit-umap.osmit.eqiad.wmflabs
builder-envoy.packaging.eqiad.wmflabs
jmm-buster.puppet.eqiad.wmflabs
a11y.reading-web-staging.eqiad.wmflabs
adhoc-utils01.security-tools.eqiad.wmflabs
util-abogott-stretch.testlabs.eqiad.wmflabs
canary1028-01.testlabs.eqiad.wmflabs
stretch.thumbor.eqiad.wmflabs
tools-worker-1023.tools.eqiad.wmflabs
tools-proxy-04.tools.eqiad.wmflabs
tools-docker-builder-06.tools.eqiad.wmflabs
tools-sgewebgrid-generic-0904.tools.eqiad.wmflabs
tools-sgeexec-0942.tools.eqiad.wmflabs
tools-sgeexec-0941.tools.eqiad.wmflabs
tools-sgeexec-0940.tools.eqiad.wmflabs
tools-sgeexec-0939.tools.eqiad.wmflabs
tools-sgeexec-0937.tools.eqiad.wmflabs
tools-sgeexec-0929.tools.eqiad.wmflabs
tools-sgeexec-0921.tools.eqiad.wmflabs
tools-sgeexec-0920.tools.eqiad.wmflabs
tools-sgeexec-0911.tools.eqiad.wmflabs
tools-sgeexec-0909.tools.eqiad.wmflabs
toolsbeta-proxy-01.toolsbeta.eqiad.wmflabs
vconverter-instance.videowiki.eqiad.wmflabs
perfbot.webperf.eqiad.wmflabs
wdhqs-1.wikidata-history-query-service.eqiad.wmflabs
cloudvirt1014.eqiad.wmnet:
commonsarchive-prod.commonsarchive.eqiad.wmflabs
deployment-imagescaler03.deployment-prep.eqiad.wmflabs
dumps-5.dumps.eqiad.wmflabs
dumps-4.dumps.eqiad.wmflabs
incubator-mw.incubator.eqiad.wmflabs
webperformance.integration.eqiad.wmflabs
saucelabs-01.integration.eqiad.wmflabs
integration-puppetmaster01.integration.eqiad.wmflabs
maps-puppetmaster.maps.eqiad.wmflabs
maps-wma.maps.eqiad.wmflabs
mwoffliner3.mwoffliner.eqiad.wmflabs
mwoffliner1.mwoffliner.eqiad.wmflabs
phlogiston-5.phlogiston.eqiad.wmflabs
discovery-testing-01.shiny-r.eqiad.wmflabs
snuggle-enwiki-01.snuggle.eqiad.wmflabs
canary-1014-01.testlabs.eqiad.wmflabs
tools-sgeexec-0901.tools.eqiad.wmflabs
wdqs-test.wikidata-query.eqiad.wmflabs
Toolforge won't be affected by this operation.
You can read more details about the datacenter operation itself in phabricator [1].
Sorry for the short notice,
regards.
[0] Cloud Services: reallocate workload from rack B5-eqiad
https://phabricator.wikimedia.org/T223148
[1] Install new PDUs into b5-eqiad https://phabricator.wikimedia.org/T223126
--
Arturo Borrero Gonzalez
Operations Engineer / Wikimedia Cloud Services
Wikimedia Foundation
As always, 70% of this conference is about building fresh, new clouds
rather than existing use-cases. That made for a very slow start on the
first day, but there were some interesting bits later on. Mark
Shuttleworth gave a brief talk where he re-affirmed Ubuntu's commitment
to supporting OpenStack and K8s in the long-term, and then scolded
attendees for getting distracted by (unspecified) shiny new things
rather than focusing on the fundamentals. I'm not really sure what that
was about but it was nice to hear someone assert that they still think
OpenStack is fundamental to the future of cloud tech.
The following is largely notes for my future self, but Brooke might be
interested in reading up about Rook.
Ceph/Rook:
Everyone is using ceph! Everyone also talks a lot about how hard it is
to deploy. There's a fair amount of buzz around 'Rook', which is a ceph
deployment/management system that we might want to consider. As I
understand it, you set up a k8s cluster with host networking on all of
your OSD nodes, and then Rook dumps a pod on each node which implements
the ceph services. Plenty of people are claiming that it works great,
and I think it supports rolling upgrades so that might be something to
consider instead of a bare puppet-and-debian-package deployment.
Deployment/package management:
There are lots of ways to deploy! Openstack on k8s, openstack on
openstack, openstack in containers pushed out by ansible, etc. etc.
Almost all of these assume that 1) you're starting from scratch and 2)
you want/have ironic control of bare metal. I spent a while thinking
that we should set up a k8s cluster and deploy openstack services
there... 'airship' might support that model (and it would line up with
using Rook to manage the ceph cluster) but I'm not sure that I'm not
just looking for a problem to solve when we don't really have one.
The one thing that might be useful for us is grabbing the kolla project
packages and deploying on simple standalone docker instances... that
would get us out of our current packaging hell. Assuming we don't ever
want to patch the projects, this might be a decent alternative to
deploying from source.
Designate:
The (two) designate developers are still alive and working on the
project. Development is very slow-paced right now, which is mostly good
for us because it means fewer headaches during upgrades :) Mugsie (the
PTL) switched jobs but says he still has someone paying him to work on
the project part-time, so there's no immediate danger of the project
dying off.
The Designate folks think that we should keep using designate-sink until
we're running version O. Then we can switch to the proper REST-based
neutron integration code for creating/deleting records on VM creation
and deletion. We'll want to write our own custom Neutron plugin to
replace the default one in order to replace the custom code that's
currently running in Sink.
The bad news is that the one feature I really want (the ability to share
.wmflabs.org between multiple tenants) is on the back-burner for the
moment. If money and staff dropped into our lap it might be nice for us
to get some contractor dollars devoted to someone working on that
(partly because I feel like we're a free-rider on the project and it
seems starved for resources).
Keystone:
The keystone upstream is finally implementing system-wide scope for
roles, which means that eventually we'll be able to give the 'observer'
users a system-wide scope rather than having to add it to every single
project. They're also in the process of standardizing on a true
project-admin policy which would let us get rid of some of our hacks
that allow project admins to add members to their own projects but not
others.
Of course, none of that is really useful until other projects have also
adopted these concepts, so we won't see any real gains until T or U.