Hello,
I have repooled labsdb1011, but I am sure it will get lagged again, so I am
leaving the weights this way:
web:
labsdb1009: 2
labsdb1010: 1
analytics:
labsdb1011: 1
labsdb1010: 1
So labsdb1010 will serve less of the web service traffic and will help equally on analytics.
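As a rough illustration of what these weights mean, here is a minimal sketch that converts them into traffic shares. It assumes the proxy hands out connections in proportion to weight, which is an assumption about the setup and not something stated in this thread; the helper name `traffic_share` is hypothetical.

```python
from fractions import Fraction


def traffic_share(weights):
    """Return each host's share of connections.

    Assumption: share is simply weight / sum of weights for that role.
    """
    total = sum(weights.values())
    return {host: Fraction(weight, total) for host, weight in weights.items()}


# Weights as listed above.
web = {"labsdb1009": 2, "labsdb1010": 1}
analytics = {"labsdb1011": 1, "labsdb1010": 1}

for role, weights in (("web", web), ("analytics", analytics)):
    for host, share in traffic_share(weights).items():
        print(f"{role}: {host} gets {share} of connections")
```

Under that assumption, labsdb1010 would take about a third of web connections and half of analytics connections.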
WMCS, I am not sure if you are receiving any of these emails (as they are
sent to your admin list, maybe I am being moderated?), but any thoughts on
all this?
Thanks,
Manuel.
On Mon, Oct 5, 2020 at 7:51 AM Manuel Arostegui <marostegui(a)wikimedia.org>
wrote:
Hello,
labsdb1011 has kept lagging behind during the weekend. I have depooled it,
and I will reshuffle weights again to get labsdb1010 to help more on
analytics rather than web service once labsdb1011 is back in sync.
Manuel.
On Fri, Oct 2, 2020 at 3:16 PM Manuel Arostegui <marostegui(a)wikimedia.org>
wrote:
> Hello,
>
> I have pushed this
> https://gerrit.wikimedia.org/r/c/operations/puppet/+/631768 as
> labsdb1011 is starting to lag again on s4. There were some heavy queries
> there...let's see how it goes during the weekend.
> Manuel.
>
> On Fri, Oct 2, 2020 at 8:00 AM Manuel Arostegui <marostegui(a)wikimedia.org>
> wrote:
>
>> Hello,
>>
>> Both hosts are back in sync.
>>
>> Manuel.
>>
>> On Thu, Oct 1, 2020 at 7:19 AM Manuel Arostegui <
>> marostegui(a)wikimedia.org> wrote:
>>
>>> Hello,
>>>
>>> Labsdb1011 has recovered, I have repooled it.
>>> Labsdb1010 is lagging a bit behind, but I am going to repool it with
>>> its normal weight and keep the query killer at 1800 seconds until it
>>> fully recovers from helping labsdb1011.
>>>
>>> Manuel.
>>>
>>> On Wed, Sep 30, 2020 at 7:27 AM Manuel Arostegui <
>>> marostegui(a)wikimedia.org> wrote:
>>>
>>>> Hello,
>>>>
>>>> This is a heads up about the current situation with s4 (commons) and
>>>> labsdb.
>>>>
>>>> There's been more activity lately on s4, and that has made labsdb1011
>>>> (analytics role) start lagging behind.
>>>>
>>>> https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=6&orgId=1&…
>>>>
>>>> I tried to reduce its weight a couple of days ago to help it
>>>> recover:
>>>> https://gerrit.wikimedia.org/r/c/operations/puppet/+/630392
>>>> https://gerrit.wikimedia.org/r/c/operations/puppet/+/630531
>>>> https://gerrit.wikimedia.org/r/c/operations/puppet/+/630770
>>>>
>>>> The last change has (as sort of expected) made labsdb1010 lag:
>>>>
>>>> https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=6&orgId=1&…
>>>>
>>>> I am going to decrease the pt-kill query time from 3600 to 1800 seconds
>>>> to see if that helps labsdb1010 hold the fort a bit.
>>>>
>>>> There's not much else we can do at the moment, but just keep all these
>>>> issues in mind if people complain about lag on s4 (commons) on the
>>>> analytics role.
>>>> The web role is doing fine (labsdb1009 isn't lagging).
>>>>
>>>> Manuel.
>>>>
>>>