Hello,
Pretty much everyone who's dealt with creating views for new wikis on the
labs hosts has run into "Access denied" errors at some point.
This was usually due to the MariaDB grant for the role being missing. We
tried to work around this by including the grant addition in the
maintain-views script.
Unfortunately, we ran into some very strange problems when doing so; here
is an example: https://phabricator.wikimedia.org/T193187#4273281
After lots of back and forth we decided to file a bug with MariaDB (
https://jira.mariadb.org/browse/MDEV-16466), which was confirmed by MariaDB
yesterday and linked to a similar issue (
https://jira.mariadb.org/browse/MDEV-14732).
The expected fix will come in 10.4 (we are on 10.1), so it is still quite
a long way ahead of us.
So, for now, the workaround before adding new views is to manually add the
GRANT on the DB and then run the script:
GRANT SELECT, SHOW VIEW ON `newiki\_p`.* TO 'labsdbuser';
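In practice the sequence looks roughly like this (a sketch only: the
mysql invocation and the maintain-views flags here are illustrative and
may differ on the actual labsdb hosts):

  # Step 1: add the grant by hand on the replica host:
  $ sudo mysql -e "GRANT SELECT, SHOW VIEW ON \`newiki\_p\`.* TO 'labsdbuser';"

  # Step 2: then run the views script for the new wiki as usual:
  $ sudo maintain-views --databases newiki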
Hopefully with this email everyone is on the same page now.
Thanks everyone (especially Brooke for helping me out with the
troubleshooting!)
Manuel.
Hi,
I'm opening this email thread to try to get an overview of IPv6 in CloudVPS.
Several times I've heard that OpenStack Mitaka isn't the most
appropriate version to start doing IPv6 with.
However, I've read the config docs [0] and I didn't detect any major
issues at first glance.
@Chase, do you remember the issues you found?
Sooner or later, we will have to handle IPv6 in CloudVPS (and in
Toolforge), so a plan could be:
* design the ideal IPv6 model [1], with agreement from the main SRE team
* start doing tests in the labtestn deployment in codfw
* evaluate how this could be added to eqiad1 incrementally
Even if we must forget about IPv6 in Mitaka, we could start thinking
about the ideal model now.
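For the labtestn tests, the Mitaka networking guide [0] points at
SLAAC-based subnets, so a first experiment could look roughly like this
(a sketch only; the network name and prefix below are made up):

  # Add an IPv6 subnet to an existing tenant network, with router
  # advertisements and addressing both handled via SLAAC:
  $ neutron subnet-create --name test-ipv6 --ip-version 6 \
      --ipv6-ra-mode slaac --ipv6-address-mode slaac \
      lab-test-net 2001:db8:1::/64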
[0] https://docs.openstack.org/mitaka/networking-guide/config-ipv6.html
[1]
https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Neutron_ideal_mo…
I can see this is still riding high, and at least some of it is likely
because of file operations I have going on there. I had hoped they would
be done today, but they're not as far along as I would like. It's
difficult to estimate, as there is a tar+scp happening and I'm not sure
how much benefit the compression is providing, but it's certainly
slowing things down. Let me know if this is an issue; I'm keeping an
eye on it and hoping it's done today.
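For context, "tar+scp" here means the usual two-step pattern below; the
paths and destination host are made up, but the -z flag is the
compression knob in question:

  # Compressing while archiving: fewer bytes to copy, lots of CPU:
  $ tar -czf /srv/backup.tar.gz /srv/bigdir
  $ scp /srv/backup.tar.gz dest.example.org:/srv/

  # Dropping -z trades more bytes on the wire for much less CPU:
  $ tar -cf /srv/backup.tar /srv/bigdir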
On Wed, Dec 12, 2018 at 6:30 AM <nagios@icinga1001.wikimedia.org> wrote:
> Notification Type: PROBLEM
>
> Service: High load average
> Host: labstore1007
> Address: 208.80.155.106
> State: CRITICAL
>
> Date/Time: Wed Dec 12 12:30:38 UTC 2018
>
> Notes URLs: https://grafana.wikimedia.org/dashboard/db/labs-monitoring
>
> Additional Info:
>
> CRITICAL: 100.00% of data above the critical threshold [24.0]
>
--
Chase Pettet
chasemp on phabricator <https://phabricator.wikimedia.org/p/chasemp/> and
IRC
Hello,
Like any large project, we have a lot of old tasks/tickets/issues open,
and it's hard to know which are still relevant, considering that a lot
has changed, nobody has been available to work on them, etc.
To keep things under control, some open source projects have adopted a
time limit on how long these can stay open. I would like to propose we
do the same.
A soft threshold would trigger a warning being added to the task,
saying it's going stale. After a hard threshold is reached, the task
would be closed.
This doesn't mean someone can't re-open the task, and we could also
have a special tag preventing the stale mechanism from activating on it.
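To make the idea concrete, finding the candidate tasks could be done
through Phabricator's Conduit API, roughly like the sketch below (the
one-year threshold is just a placeholder, the exemption tag would need
an extra constraint, and warning/closing would then go through
maniphest.edit with "comment" or "status" transactions):

  # List open tasks not modified in the last year:
  $ curl -s https://phabricator.wikimedia.org/api/maniphest.search \
      -d api.token="$CONDUIT_TOKEN" \
      -d 'constraints[statuses][0]=open' \
      -d "constraints[modifiedEnd]=$(date -d '1 year ago' +%s)"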
Any thoughts?
Thank you
--
Giovanni Tirloni
Operations Engineer
Wikimedia Cloud Services
Hi,
I just finished converting the remaining tools to use cron and removed
BigBrother from Toolforge. The docs have been updated to reflect this.
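For reference, the cron entries follow the usual Toolforge pattern,
something like this illustrative example (the tool path, job name and
schedule are made up; -once keeps only a single copy of the job
running):

  # Resubmit the grid job every 10 minutes if it isn't running:
  */10 * * * * /usr/bin/jsub -once -quiet -N mytool-job /data/project/mytool/bin/run.sh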
If you notice any issues, please let me know.
Regards,
--
Giovanni Tirloni
Operations Engineer
Wikimedia Cloud Services
On 12/6/18 9:16 PM, Andrew Bogott wrote:
> I recently noticed that some of our standard kvm/nova monitoring never
> got copied over from the labvirt puppet code to the cloudvirt puppet
> code. Tomorrow I will merge
> https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/478113/ to fix that.
>
> Once that patch is merged, icinga will be a bit touchier on the
> cloudvirts. In particular, it will alert for any cloudvirt that has 0
> VMs running on it. (This turns out to be a useful thing to watch for
> because we've had cases where every single kvm process died at once.)
>
> So, all 'idle' cloudvirts should nonetheless have a canary instance. For
> example, on the new analytics cloudvirts I created canaries like this:
>
> $ OS_PROJECT_ID=testlabs openstack server create --image
> 7c6371d1-8411-48c7-bf73-2ef6d6ff2a15 --flavor m1.small --nic
> net-id=7425e328-560c-4f00-8e99-706f3fb90bb4 --availability-zone
> host:cloudvirtan1004 canary-an1004-01
>
> Once a virt host is in full service we can leave the canaries there or
> delete them -- there hasn't been any real consistent policy about that.
Thanks for the heads up and the example command.
I think it makes sense to have a canary per cloudvirt. It does mean more
OS instances that need to be updated and perhaps excluded from metrics
collection, but the annoyance should be minimal. It would be good to
have a barebones OS image for them, but I'd consider that a very low
priority.
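As an aside, the "0 VMs" condition should be cheap to probe on the host
itself. Conceptually it's something like the sketch below, though the
actual check in Andrew's patch may well work differently (e.g. asking
the nova API rather than counting processes):

  # Nagios-style plugin sketch: CRITICAL if no qemu/kvm processes
  # are running on this cloudvirt, OK otherwise.
  count=$(pgrep -c -f qemu) || count=0
  if [ "$count" -eq 0 ]; then
      echo "CRITICAL: no VMs running"
      exit 2
  fi
  echo "OK: $count VMs running"
  exit 0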
--
Giovanni Tirloni
Operations Engineer
Wikimedia Cloud Services