Hello Cloud Admins!
As part of https://phabricator.wikimedia.org/T174569 we have to alter some
big tables.
One of them is logging. On wikidata, for instance, the alter takes around 8h,
and that is the shard I am currently working on.
Because of the nature of the change (some columns being added) and because
we use ROW-based replication on the sanitariums, this change needs to flow
through replication (from the sanitarium hosts, or their masters, to the labs
servers). This will obviously generate lag. If it is not done that way,
replication will break until the columns are added on the labs hosts, which
is less desirable than replication lag.
I am planning to run the alter probably tomorrow or Monday (I will notify you
when I start it) on the sanitarium host in s5. That means there will be lag
on the labs servers for a few hours on the s5 instance. It will also affect
s1 and s3, because we are using the same replication thread for those shards
too (a FIXME we have pending).
s2, s4, s6 and s7 will remain unaffected as they have their own replication
thread.
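To make the list of affected shards concrete, here is a minimal Python sketch
of the thread-to-shard relationship described above. The REPLICATION_THREADS
mapping and its thread names are hypothetical labels made up for illustration,
not real configuration. Only the grouping comes from this message: s1, s3 and
s5 share one thread, while s2, s4, s6 and s7 each have their own. It also
assumes each thread applies changes serially.

```python
# Hypothetical model of the replication threads feeding the labs servers.
# Thread names are invented; the shard grouping follows this email.
REPLICATION_THREADS = {
    "shared": {"s1", "s3", "s5"},
    "s2": {"s2"},
    "s4": {"s4"},
    "s6": {"s6"},
    "s7": {"s7"},
}


def lagging_shards(altered_shard, threads=REPLICATION_THREADS):
    """Return every shard expected to lag while an ALTER replicates."""
    affected = set()
    for shards in threads.values():
        # Assumption: a thread applies changes serially, so a long ALTER
        # delays every shard that replicates through the same thread.
        if altered_shard in shards:
            affected |= shards
    return sorted(affected)


print(lagging_shards("s5"))  # ['s1', 's3', 's5']
print(lagging_shards("s2"))  # ['s2']
```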
Should you have any questions, let me know!
Thanks
Manuel.
I was shuffling a meeting for next week around today and noticed that
only Arturo and I have working hours set in our Google Calendar
profiles. I think it would be useful for everyone to configure their
calendar to show their comfortable working hours to make it easier for
me and others to set meetings without doing a lot of mental math about
whether it is too early or too late for someone to join. You do this by
clicking the "cog" icon in the gcal UI, selecting 'Settings' and then
'General > Scheduling' in the left hand navigation menu.
Bryan
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Manager, Cloud Services Boise, ID USA
irc: bd808 v:415.839.6885 x6855
2017-12-27 20:00:02,830 INFO force is enabled
2017-12-27 20:00:02,867 INFO removing misc-project-backup
2017-12-27 20:00:02,934 INFO removing misc-project-backup
2017-12-27 20:00:03,497 INFO creating misc-project-backup at 2T
2017-12-27 20:00:04,344 INFO force is enabled
2017-12-27 20:00:04,385 INFO removing misc-snap
2017-12-27 20:00:04,420 INFO removing misc-snap
2017-12-27 20:00:04,697 INFO creating misc-snap at 1T
2017-12-26 20:00:02,890 INFO force is enabled
2017-12-26 20:00:02,941 INFO removing tools-project-backup
2017-12-26 20:00:03,041 INFO removing tools-project-backup
2017-12-26 20:00:03,705 INFO creating tools-project-backup at 2T
2017-12-26 20:00:04,456 INFO force is enabled
2017-12-26 20:00:04,521 INFO removing tools-snap
2017-12-26 20:00:04,554 INFO removing tools-snap
2017-12-26 20:00:06,902 INFO creating tools-snap at 1T
2017-12-20 20:00:02,452 INFO force is enabled
2017-12-20 20:00:02,485 INFO removing misc-project-backup
2017-12-20 20:00:02,551 INFO removing misc-project-backup
2017-12-20 20:00:03,156 INFO creating misc-project-backup at 2T
2017-12-20 20:00:03,908 INFO force is enabled
2017-12-20 20:00:03,940 INFO removing misc-snap
2017-12-20 20:00:04,006 INFO removing misc-snap
2017-12-20 20:00:04,513 INFO creating misc-snap at 1T
2017-12-19 20:00:02,577 INFO force is enabled
2017-12-19 20:00:02,598 INFO removing tools-project-backup
2017-12-19 20:00:02,642 INFO removing tools-project-backup
2017-12-19 20:00:03,323 INFO creating tools-project-backup at 2T
2017-12-19 20:00:04,134 INFO force is enabled
2017-12-19 20:00:04,166 INFO removing tools-snap
2017-12-19 20:00:04,216 INFO removing tools-snap
2017-12-19 20:00:06,378 INFO creating tools-snap at 1T
I don't think I ever sent these out to the group. :/
== 2017-11-28 - 2017-12-04 ==
Tues:
* Superyetkin irc about error.log output
* Superyetkin irc about user dbs
* Created massmessage project
Wed:
* Paladox spotted a Puppet problem caused by refactoring; Andrew fixed it
* Talked to matanya about his collaboration project request
* Created collaboration project
* Talked to matanya about ssh issues connecting to tools-login (looks
like routing)
Thur:
* Fixed a bug in stashbot that anomie noticed
* Superyetkin and https://phabricator.wikimedia.org/P6409
* Krinkle and https://phabricator.wikimedia.org/T181742
* Quota reduction -- https://phabricator.wikimedia.org/T177299
* Pinged for a 2FA removal -- https://phabricator.wikimedia.org/T181475
* Verified and closed https://phabricator.wikimedia.org/T176043
* Made a maintenance script to attach LDAP accounts to wikitech for
https://phabricator.wikimedia.org/T180813
* Attached Flominator and set email to match SUL account
* Triaged some random bugs in our backlog
* Made some traffic report dumps:
** https://phabricator.wikimedia.org/P6413
** https://phabricator.wikimedia.org/P6414
Fri:
* Phab triage
* Worked on https://phabricator.wikimedia.org/T171417
Sun:
* Crontab encoding -- https://phabricator.wikimedia.org/T181948
== 2017-12-05 - 2017-12-11 ==
Tues-Sun:
* offsite + kubecon; didn't keep notes or do much
Mon:
* looking at tools-static cdnjs disk space --
https://phabricator.wikimedia.org/T182604
TechOps:
* Code stewardship reviews (previously called "Sunsetting") is now
live: https://www.mediawiki.org/wiki/Code_stewardship_reviews
** Faidon has a few things he is thinking about proposing for review
(e.g. irc feed which would be a thing that Toolforge users care about)
* Singapore DC racking done (except for serial console which was DOA hardware)
** Planning for full turn up in Q3
* k8s 1.7.10 upgrade in progress by Alex
* s8 is "done" but rolling out in Q3 (2018-01-09)
* All prod puppetmasters on Puppet4; Keith working on clients (small
bug working on now); PuppetDB will not be upgraded this quarter
** Puppet4 syntax should be ok to use in ops/puppet now
* Ganeti servers are having some issues
<https://phabricator.wikimedia.org/T181121>
* netbox should have all the racktables data; test it if you'd like
* Q3 draft goals -- https://etherpad.wikimedia.org/p/TechOps-goals-FQ3-FY1718
** Asia DC
** Varnish 5 for all clusters
** Continue DB backups program
** Multi-DC: deploy Etcd/conftool for MediaWiki
** esams cleanup (would span into Q4)
** Q3 procurement/refresh/expansion
*** Try to get all racked hardware into service
** Puppet4 completion (PuppetDB) (and look at Puppet5? needs work to flesh out)
** Streamlined Services: 1 prod service on k8s
** infrastructure monitoring (needs details)
Bryan
*Original:* https://etherpad.wikimedia.org/p/kubecon-2017-offsite-agenda
*Archived on office wiki:*
https://office.wikimedia.org/wiki/Wikimedia_Cloud_Services/Offsite_Notes/ku…
*Decided:*
- Stick with Trusty through the Neutron migration for now, as we think we are
making enough progress on this to ensure the Trusty sunset by April 2019.
Xenial seems to have Mitaka, so if we have to we can potentially match
Mitaka there with Trusty for a migration of OpenStack across distro
releases, but that's work we don't want to do and we need to settle on a
distro (see: figure out deployment methodology)
- https://phabricator.wikimedia.org/T166845 to be done via cumin for now
(long term prometheus?)
- draw.io is a canonical tool
- Dumps work is a carry over goal
- Neutron will be a carry over goal but hopefully not a literal one
*Open Near Term:*
- Neutron Plan: talk about the naming of deployments
- Need to do hard capacity thinking on storage scaling and budgeting
- icon templates for draw.io
*Open Long(er) Term:*
- Need to figure out openstack components deploy methodology (containers,
source, distro packaging...)
- Is SLURM viable?
- kubeadm for Kubernetes deploy?
- Tools Bastion and resource issues
- Is there an upstream alternative that is viable for Quarry?
- How much do we fold into cloud-init?
- Do we use puppet standalone and virtualize the main cloud masters?
- Hiera for instances is a mess and needs to be rethought.
- Trial of paging duty cycles (while still taking advantage of our time
spread)
- How much of labtest is ready for parity testing?
- Document undoing standalone puppetmaster setup
*Missed because of time:*
- puppet horizon interface future (https://phabricator.wikimedia.org/T181551 and co)
- FUTURE IDEAS: NOT EXISTING
  - new TLDs for public and internal addresses: when and how to deploy
  - new ingress for HTTPS in VPS and Toolforge?
  - monitoring sanely
  - thinking about ceph
  - metrics for end-users - who uses my tools and how? (https://phabricator.wikimedia.org/T178834 and co)
I would like to talk about the missed items if we can find a few minutes at
all hands (over dinner?)
--
Chase Pettet
chasemp on phabricator <https://phabricator.wikimedia.org/p/chasemp/> and
IRC