2018-01-16 20:00:02,777 INFO force is enabled
2018-01-16 20:00:02,806 INFO removing tools-project-backup
2018-01-16 20:00:02,847 INFO removing tools-project-backup
2018-01-16 20:00:03,503 INFO creating tools-project-backup at 2T
2018-01-16 20:00:04,253 INFO force is enabled
2018-01-16 20:00:04,270 INFO removing tools-snap
2018-01-16 20:00:04,321 INFO removing tools-snap
2018-01-16 20:00:05,859 INFO creating tools-snap at 1T
Hello Cloud Admins!
As part of https://phabricator.wikimedia.org/T174569 we have to alter some
big tables.
One of them is logging, which on wikidata, for instance, takes around 8h to
alter. Wikidata is on the shard I am currently working on.
Because of the nature of the change (some columns being added) and ROW-based
replication (which we use on the sanitariums), this change needs to go
through replication (from the sanitariums, or their masters, to the labs
servers).
This will obviously generate lag, but if it is not done that way, replication
will break until the columns are added on the labs hosts, which is less
desirable than replication lag.
I am planning to run the alter probably tomorrow or on Monday (I will notify
when I start it) on the sanitarium host for s5. That means there will be lag
on the labs servers, for a few hours, on the s5 instance. It will also affect
s1 and s3, because we are using the same replication thread for those shards
too (which is a FIXME we have pending).
s2, s4, s6 and s7 will remain unaffected, as they have their own replication
thread.
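To illustrate the lag scope described above, here is a minimal Python sketch
of which shards would lag together. The grouping of s1, s3 and s5 on one
thread comes from this message; the thread labels, the function name, and the
reading that s2, s4, s6 and s7 each have a separate thread are assumptions
made for illustration, not the actual replication configuration.

```python
# Replication threads feeding the labs servers, as described in this message.
# The dictionary keys are hypothetical labels, not real thread names.
# Assumption: s2, s4, s6 and s7 each have a separate thread of their own.
REPLICATION_THREADS = {
    "shared": {"s1", "s3", "s5"},
    "s2": {"s2"},
    "s4": {"s4"},
    "s6": {"s6"},
    "s7": {"s7"},
}


def lagging_shards(altered_shard):
    """Return the sorted shards that lag while altered_shard is being altered."""
    for shards in REPLICATION_THREADS.values():
        if altered_shard in shards:
            # Every shard on the same thread waits behind the long-running alter.
            return sorted(shards)
    raise ValueError(f"unknown shard: {altered_shard}")


print(lagging_shards("s5"))  # ['s1', 's3', 's5']
```

Under these assumptions, an alter on s5 delays s1 and s3 as well, while an
alter on s2 would only delay s2.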
Should you have any questions, let me know!
Thanks
Manuel.
#
# Please NOTE: I didn't properly check for icinga/shinken alerts.
# See 16/Jan/2018
#
09/Jan/2018
* No icinga/shinken alerts
* Checked grafana boards. Nothing special.
10/Jan/2018
* No icinga/shinken alerts
* Checked grafana boards.
** Low space warning for tools-worker-1020 -> T184604
* Approved toolforge access request 210
* Added docs to wikitech about this:
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin#Users_and_commun…
and updated on-call docs as well.
* Triaged tasks: recently opened tasks, like T184500 and T184566, are
already in development
11/Jan/2018
* no icinga/shinken alerts
* no toolforge access requests
* checked grafana boards. Nothing special. Closed T184604
* #wikimedia-cloud: Niharika missing mediawiki_vagrant puppet class.
Forced a puppet run.
* lots of rootspam from failed GE jobs, related to reboots
12/Jan/2018
* no icinga/shinken alerts
* no toolforge access requests
* checked grafana boards. Nothing special
* #wikimedia-cloud: Lokal_Profil about OAuth
* created next week etherpad
* Bryan accepted toolforge access request: 211
* #wikimedia-cloud: anna____ about port redir for a Toolforge server
13/Jan/2018
* Mostly quiet IRC channels
14/Jan/2018
* Mostly quiet IRC channels
15/Jan/2018
* no icinga/shinken alerts
* no toolforge access requests
* checked grafana boards. Nothing special
* Accepted 2 toolforge access requests: 212 & 213
* No TechOps meeting due to bank holiday in USA
* Processed my own quota increase request for 'aborrero-test'
16/Jan/2018
* We discovered that I don't see any icinga/shinken alerts because:
- something is wrong with the shinken IRC bot
- I'm not configured to receive icinga emails
2018-01-10 20:00:02,908 INFO force is enabled
2018-01-10 20:00:02,958 INFO removing misc-project-backup
2018-01-10 20:00:03,058 INFO removing misc-project-backup
2018-01-10 20:00:03,639 INFO creating misc-project-backup at 2T
2018-01-10 20:00:04,534 INFO force is enabled
2018-01-10 20:00:04,581 INFO removing misc-snap
2018-01-10 20:00:04,616 INFO removing misc-snap
2018-01-10 20:00:05,057 INFO creating misc-snap at 1T
2018-01-09 20:00:02,750 INFO force is enabled
2018-01-09 20:00:02,789 INFO removing tools-project-backup
2018-01-09 20:00:02,855 INFO removing tools-project-backup
2018-01-09 20:00:03,509 INFO creating tools-project-backup at 2T
2018-01-09 20:00:04,297 INFO force is enabled
2018-01-09 20:00:04,313 INFO removing tools-snap
2018-01-09 20:00:04,369 INFO removing tools-snap
2018-01-09 20:00:06,417 INFO creating tools-snap at 1T
Mon:
* Private email convo with Marc Miquel about using Toolforge
Tue:
* Private email convo with Marc Miquel about using Toolforge
Wed:
* made some projects that we approved in the team meeting
* deployed striker update
Thur:
* worked on things for the s8 migration
* worked on things for dropping old wikis from the replicas
Fri:
* Helped Zhuyifei get a stack trace of a stuck python job
* toolforge membership requests
* T181925 -- drop some wiki replica database views
Mon:
* TechOps meeting
** team change public announce to mailing lists today(?)
** Meltdown Trusty kernel may get delayed. Having seg faults on bare
metal. Moritz looking at a 4.4 LTS patch
*** no QEMU permanent fix with upstreams yet
*** Perf impact discussion... does OpenStack expose PCID to guest VMs?
** Icinga bot was busted for several days (concurrency bug)
** s8 failover coming 2018-01-09 (DONE now)
** Puppet4 agent for Trusty rolled out in prod
** Hardware refresh Q3 goal has a couple of WMCS interlocks
** Wikitech to S3 on DBA list of things post-s8
** Prod "solved" the FERM + k8s problem by moving to Calico for the SDN
* meta_p needed to be rebuilt one more time --
https://phabricator.wikimedia.org/T184433
Tue:
* ran wikireplica_dns to add s8 stuff
* ran sudo /usr/local/sbin/maintain-meta_p --all-databases for s8
Bryan
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Manager, Cloud Services Boise, ID USA
irc: bd808 v:415.839.6885 x6855
Hello everyone
Happy new year!
This is a reminder.
Next Tuesday, 9th January, at 6:00AM UTC we will have a read-only window on
s5 for 30 minutes (https://phabricator.wikimedia.org/T181645) in order to
split wikidata onto its own hardware and make the new s8 shard live
(https://phabricator.wikimedia.org/T177208).
Amir and Katie have kindly agreed to help us with testing and any possible
troubleshooting from the code side.
Communication and coordination will happen on #wikimedia-operations. If
anyone else is around and willing to provide another pair of eyes, that
would, of course, be much appreciated!
Thanks
Jaime, Manuel
2018-01-03 20:00:02,999 INFO force is enabled
2018-01-03 20:00:03,040 INFO removing misc-project-backup
2018-01-03 20:00:03,140 INFO removing misc-project-backup
2018-01-03 20:00:03,520 INFO creating misc-project-backup at 2T
2018-01-03 20:00:04,329 INFO force is enabled
2018-01-03 20:00:04,346 INFO removing misc-snap
2018-01-03 20:00:04,410 INFO removing misc-snap
2018-01-03 20:00:04,684 INFO creating misc-snap at 1T
2018-01-02 20:00:02,518 INFO force is enabled
2018-01-02 20:00:02,565 INFO removing tools-project-backup
2018-01-02 20:00:02,666 INFO removing tools-project-backup
2018-01-02 20:00:03,380 INFO creating tools-project-backup at 2T
2018-01-02 20:00:04,207 INFO force is enabled
2018-01-02 20:00:04,236 INFO removing tools-snap
2018-01-02 20:00:04,280 INFO removing tools-snap
2018-01-02 20:00:06,008 INFO creating tools-snap at 1T