Hello,
labsdb1011 kept lagging behind over the weekend. I have depooled it,
and once labsdb1011 is back in sync I will reshuffle weights again so that
labsdb1010 helps more with analytics rather than web service.
Manuel.
On Fri, Oct 2, 2020 at 3:16 PM Manuel Arostegui <marostegui(a)wikimedia.org>
wrote:
Hello,
I have pushed this
https://gerrit.wikimedia.org/r/c/operations/puppet/+/631768 as labsdb1011
is starting to lag again on s4. There were some heavy queries there... let's
see how it goes over the weekend.
Manuel.
On Fri, Oct 2, 2020 at 8:00 AM Manuel Arostegui <marostegui(a)wikimedia.org>
wrote:
> Hello,
>
> Both hosts are back in sync
>
> Manuel.
>
> On Thu, Oct 1, 2020 at 7:19 AM Manuel Arostegui <marostegui(a)wikimedia.org>
> wrote:
>
>> Hello,
>>
>> Labsdb1011 has recovered, I have repooled it.
>> Labsdb1010 is lagging a bit behind, but I am going to repool it with its
>> normal weight and keep the query killer at 1800 seconds until it fully
>> recovers from helping labsdb1011.
>>
>> Manuel.
>>
>> On Wed, Sep 30, 2020 at 7:27 AM Manuel Arostegui <
>> marostegui(a)wikimedia.org> wrote:
>>
>>> Hello,
>>>
>>> This is a heads up about the current situation with s4 (commons) and
>>> labsdb.
>>>
>>> There's been more activity lately on s4, and that has made labsdb1011
>>> (analytics role) start lagging behind.
>>>
>>>
>>> https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=6&orgId=1&…
>>>
>>> A couple of days ago I tried reducing its weight to help it
>>> recover:
>>>
>>> https://gerrit.wikimedia.org/r/c/operations/puppet/+/630392
>>>
>>> https://gerrit.wikimedia.org/r/c/operations/puppet/+/630531
>>>
>>> https://gerrit.wikimedia.org/r/c/operations/puppet/+/630770
>>>
>>> The last change has (somewhat as expected) made labsdb1010 lag:
>>>
>>>
>>> https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=6&orgId=1&…
>>>
>>> I am going to decrease the pt-kill query time from 3600 to 1800 seconds
>>> to see if that helps labsdb1010 hold the fort a bit.
>>>
>>> There's not much else we can do at the moment; just keep these issues
>>> in mind if people complain about lag on s4 (commons) on the analytics
>>> role.
>>> The web role is doing fine (labsdb1009 isn't lagging).
>>>
>>> Manuel.
>>>
>>