Following our latest team agreements regarding DNS, we now have two wikitech pages:
* https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/DNS
This includes information about the DNS domains we handle, and their setup.
* https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Designate
This includes specific information about how to operate the OpenStack Designate
component. I refactored the content to make this separation and to include the
information from the enhancement proposals.
Edits welcome.
regards.
--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
Hi there,
TL;DR: I brain-dumped a wiki page here:
https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/Enhanceme…
I hope I managed to write my ideas clearly enough.
Points for debate:
* ownership of designate domains (cloudinfra vs wmflabsdotorg vs admin vs ..)
* service name of designate (ns0.openstack.eqiad1.wikimediacloud.org??)
* delegations, per-project subdomains etc
* all the details about the wmcloud.org subdomain
Hey Arturo, why this now??
I've been doing several operations to be able to set up a bastion and
puppetmaster in codfw1dev like we do in eqiad1. While at it, instead of setting
this up with the legacy domains, I went a step further and have been playing
with the new domains. All this kung-fu allowed me to review the setup and
identify several points where we could introduce a bit more consistency and
robustness.
The changes I've been doing in codfw1dev will eventually land in eqiad1, so
double win!
Some new stuff to try related to this follows.
* add this to your .ssh/config file:
=== 8< ===
Match user root host *.codfw1dev.wikimedia.cloud
User root
IdentityFile ~/.ssh/root_key
IdentitiesOnly yes
ForwardAgent no
ProxyCommand ssh -i ~/.ssh/root_key -a -W %h:%p root@bastion-codfw1dev-01.codfw1dev.wmcloud.org
=== 8< ===
* try SSH!
user@laptop:~$ ssh root@puppetmaster-01.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud
* this means we have the following 2 domains working:
- puppetmaster-01.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud
- bastion-codfw1dev-01.codfw1dev.wmcloud.org
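As a quick sanity check of which hosts the Match block above applies to: ssh_config host wildcards behave like Python's fnmatch patterns, so the rule can be sketched like this (the two hostnames are the ones from this message; the helper name is mine):

```python
from fnmatch import fnmatch

# The .ssh/config rule above triggers for root logins to hosts matching
# this pattern; ssh_config "*" wildcards behave like fnmatch patterns here.
PATTERN = "*.codfw1dev.wikimedia.cloud"

def match_rule_applies(host: str) -> bool:
    """Return True if the ssh_config Match rule would apply to this host."""
    return fnmatch(host, PATTERN)

# The new-style instance FQDN matches the rule...
print(match_rule_applies(
    "puppetmaster-01.cloudinfra-codfw1dev.codfw1dev.wikimedia.cloud"))  # True
# ...while the bastion's wmcloud.org name does not (it is only used as
# the ProxyCommand jump host, not as a Match target).
print(match_rule_applies("bastion-codfw1dev-01.codfw1dev.wmcloud.org"))  # False
```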
Comments welcome.
regards!
--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
On 1/7/20 6:12 AM, Andrew Bogott wrote:
> We'll be upgrading the cloud services OpenStack install next Tuesday, beginning
> at 12:00 noon UTC
>
> The entire upgrade process may take an hour or two. Early on in the process,
> Horizon (and associated OpenStack APIs) will be disabled (probably for 20 to 30
> minutes.) There may also be brief network interruptions during the upgrade.
>
> Toolforge and existing VMs should be largely unaffected apart from possible
> network hiccups.
>
Reminder,
this will be happening in about 30 minutes!
regards.
--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
I was trying to figure out how to keep better track of hardware things
and realized that one thing we had not tried yet is a dedicated
workboard. I asked on irc and Brooke and Jason didn't immediately
think it was a horrible idea, so I moved ahead with an experiment:
https://phabricator.wikimedia.org/project/view/4482/
You can add things with the #wmcs-hardware tag. Y'all can also mess
with columns and ordering, but maybe check in with Jason before taking
any really radical actions as I expect he will be one of the main
consumers of this board as he interfaces with the DCOps folks.
One possibly surprising thing when using the new tag: I made
#wmcs-hardware a "milestone" of #cloud-services-team. This means that
when you add #wmcs-hardware Phabricator will remove
#cloud-services-team and all of its other milestones. #wmcs-kanban is
also a milestone, so a hardware task can be in one board or the other,
but not both!
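To make that milestone behavior concrete, here is a small Python sketch (my own model of the tagging rule described above, not actual Phabricator code):

```python
PARENT = "cloud-services-team"
MILESTONES = {"wmcs-hardware", "wmcs-kanban"}  # the two milestones mentioned above

def add_tag(tags: set, new_tag: str) -> set:
    """Model of Phabricator tagging: adding a milestone tag removes the
    parent project and any of its other milestones from the task."""
    tags = set(tags)
    if new_tag in MILESTONES:
        tags -= {PARENT} | MILESTONES
    tags.add(new_tag)
    return tags

# A task on the kanban board moves entirely to the hardware board:
print(add_tag({"wmcs-kanban"}, "wmcs-hardware"))  # {'wmcs-hardware'}
```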
Let's revisit this in our Q3 process retro and decide if we will keep
this new board or archive it and try something else.
Bryan
--
Bryan Davis Technical Engagement Wikimedia Foundation
Principal Software Engineer Boise, ID USA
[[m:User:BDavis_(WMF)]] irc: bd808
I just deployed the OpenStack Train version of Horizon to
horizon.wikimedia.org. I've been testing it a fair bit in codfw1dev
but it may yet exhibit bugs and/or new weird features that we don't
actually want to support. Please let me know if you see any weirdness
there.
So far there aren't any big advantages to the new code (I was hoping it
would be faster but, if it is, it's not by much), but at least one piece of
our stack is pretty much caught up with upstream :)
-A
Hello Chase! I hope all is well with you and yours. I have a couple of
questions about networking which you may or may not have opinions or
thoughts about :)
We've been butting our heads against the inability of VMs in the -dev
cloud to talk to outside networks. When Jason and Arturo looked at ways
to open that up, they ran into several code comments from you expressing
unspecific worries about security concerns with allowing that traffic.
Do you remember what those concerns were? If it was just a matter of
'we don't need this anyway' then we might go ahead and allow that
traffic, but I want to make sure we aren't overlooking some grave danger.
Related to that -- it's clear that in the past there was an apt proxy
running someplace to allow labtest VMs to connect to apt repos. Do you
remember how that proxy was set up? I think it must not have been
puppetized because I can't find any traces of it in the git history for
the box it was surely running on. (Obviously this is moot if we open up
the routing.)
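In case it helps jog memories: on the client side, such a proxy usually only needs a one-line apt configuration like the following (the host and port here are placeholders I made up, not the actual labtest setup):

```
# /etc/apt/apt.conf.d/01proxy -- hypothetical client-side config;
# "proxy-host" and port 8000 are placeholders, not the real labtest values.
Acquire::http::Proxy "http://proxy-host:8000/";
```

The server side (e.g. apt-cacher-ng or a plain HTTP proxy) is what I couldn't find traces of in the git history.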
Thanks!
-A
Hi,
just FYI, I saw a news email fly by from some Debian Developers communicating
that the intention is to not support Python2 *at all* in Debian 11 Bullseye
(which is Buster+1).
I know we still have Jessie, Stretch and Buster, but worth noting!
More info: https://wiki.debian.org/Python/2Removal
regards.
--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
Hi all,
I propose we consolidate Toolforge dynamicproxy documentation into this new
wikitech page:
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Dynamicproxy
The source for the information in this new page comes mostly from 3 sources:
* https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin#WebProxy
* https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Webservice
* new content added
I've added a 'consolidation proposal' warning to the affected pages.
This is part of my effort to better document the Toolforge networking, especially
the k8s part. In the new docs [0], when I was trying to add a link to
dynamicproxy I couldn't find a single page with all the content, and that's why
I'm proposing the consolidation.
Of course you can modify the index and content organization in the new proposed
page. There is a lot of stuff going on with dynamicproxy and it deserves its
own wikipage I think!
If nobody raises strong concerns about this, I plan to do this doc consolidation
soon. CC @bstorm @bd808.
regards.
[0]
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Networking_and_i…
--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
Hi there!
Next Monday 2019-10-28 @ 14:30 UTC we will do a maintenance operation on
Toolforge which consists of rebuilding the main front proxy [0] used to serve
webservices. We expect this to be done within a 30-minute window.
The operation consists of replacing the old virtual machines supporting the
proxy (currently running Debian Jessie) with more modern instances running
Debian Buster. Both Grid and Kubernetes backends are affected by this change. We
don't expect much service downtime, but there is a key point in the operation,
migrating data stored in Redis, which can be tricky.
Examples of things affected by this change:
* Browsing Toolforge webservices
* Browsing to https://tools.wmflabs.org/<toolname>
* Browsing to https://tools.wmflabs.org/admin/ (Toolforge landing page)
* Browsing PAWS (to some extent, since it shares part of the Toolforge proxy)
Example of things not affected by this change:
* webservices backend operations
* SSH bastions
* grid queues, grid jobs
* wiki-replicas, toolsdb
* other CloudVPS projects
regards.
[0] https://phabricator.wikimedia.org/T235627
--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
On 10/21/19 9:49 PM, Brooke Storm wrote:
> With a redundant power supply upgrade going on this week in the datacenter that
> could affect the VM that Toolsdb runs on, we anticipate a brief outage Thursday
> 10/24 @11am UTC of the mysql service to protect data in case anything goes
> wrong. This may require a restart of a tool to reconnect to the database. We do
> not anticipate any worse disruptions, but if there is any disruption beyond what
> is planned, a failover may be necessary, which will not include the
> non-replicated tables mentioned
> here https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#ToolsDB_Backups…
>
> The maintenance requiring this notice and action is detailed
> here https://phabricator.wikimedia.org/T227540. The VM resides on the
> cloudvirt1019 hypervisor, which is why it is in scope.
>
> We sincerely apologize for the short notice.
>
Reminder, this is happening in a few minutes!
--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation