[QA] Two big improvements for flaky tests

Jon Robson jrobson at wikimedia.org
Fri Jul 25 18:00:14 UTC 2014


Great news Chris! This is becoming a really useful and important
platform and I'm glad we're doing all that is possible to make it as
stable as possible :).

On Fri, Jul 25, 2014 at 10:49 AM, Arthur Richards
<arichards at wikimedia.org> wrote:
> Wonderful news, thank you Chris :)
>
> Is https://bugzilla.wikimedia.org/show_bug.cgi?id=68465 related to the db
> issues that are now resolved, or is that a different issue?
>
>
> On Fri, Jul 25, 2014 at 9:29 AM, Chris McMahon <cmcmahon at wikimedia.org>
> wrote:
>>
>>
>> This month we've made two big improvements that should greatly reduce the
>> number of flaky test failures in all of the builds.
>>
>> For one thing, the MobileFrontend test that used to walk through a dozen
>> steps to protect a page in the UI and then log out now uses the API to
>> protect the test page. Not logging out prevents flaky failures when
>> another test that expects to be logged in unexpectedly gets logged out.  We
>> have seen this before in VisualEditor tests in particular.
>>
>> Another thing is that we should no longer see the beta labs db become
>> read-only.  Some time ago we had requests from the Language team to be able
>> to test in situations where there is replication lag between master and
>> slave dbs.  At the same time, the Flow team wanted to be able to test
>> deploying in a situation with master and slave dbs. So we created a slave db
>> for beta labs.
>>
>> What we only discovered recently is that there is a process that monitors
>> the replication lag time between master and slave dbs, and when that time
>> goes over 5 seconds, the db is set to read-only. This is fine in production,
>> but it was causing a lot of problems in beta labs.  We have now set the
>> maximum allowed replication lag to 5 minutes instead of 5 seconds, and this
>> has stopped several kinds of failures:
>>
>> * the explicit "database is read-only" failure from the VisualEditor tests
>> * the generic "Save failed" failure message in MobileFrontend tests
>> * the unexplained 90-second timeouts in Flow tests
>>
>> Sorry for the inconvenience, but things should be much improved from now
>> on.
>> -Chris
>>
>> _______________________________________________
>> QA mailing list
>> QA at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/qa
>>
>
>
>
> --
> Arthur Richards
> Team Practices Manager
> [[User:Awjrichards]]
> IRC: awjr
> +1-415-839-6885 x6687
>
>
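
The replication-lag change described in Chris's quoted message can be illustrated with a small sketch. This is not the actual monitoring process used in beta labs: the function name `is_read_only` and the constant names are hypothetical, and only the 5-second and 5-minute values come from the message.

```python
# Hypothetical sketch: names are illustrative, not the beta labs monitor itself.
PRODUCTION_MAX_LAG_SECONDS = 5       # threshold mentioned in the message
BETA_LABS_MAX_LAG_SECONDS = 5 * 60   # new beta labs value (5 minutes)


def is_read_only(lag_seconds: float, max_lag_seconds: float) -> bool:
    """Return True when replication lag goes over the allowed maximum."""
    return lag_seconds > max_lag_seconds


if __name__ == "__main__":
    # Compare how the same lag is treated under each threshold.
    for lag in (2, 30, 400):
        print(
            lag,
            is_read_only(lag, PRODUCTION_MAX_LAG_SECONDS),
            is_read_only(lag, BETA_LABS_MAX_LAG_SECONDS),
        )
```

Under these assumptions, a 30-second lag would flip the db to read-only with the old threshold but not with the new one, which matches the kind of intermittent failure the message describes.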

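
For the first change in the quoted message (protecting the test page through the API instead of the UI), a rough sketch of the request parameters might look like the following. The helper name `build_protect_params`, the default protection level, and the reason text are assumptions for illustration; the real test suite may use a different client, and login and token retrieval are left out here.

```python
from typing import Dict


def build_protect_params(title: str, token: str, level: str = "sysop") -> Dict[str, str]:
    """Build POST parameters for a MediaWiki action=protect request.

    Hypothetical helper: the caller is assumed to be logged in already
    and to have fetched a valid token for the request.
    """
    return {
        "action": "protect",
        "format": "json",
        "title": title,
        # Restrict both editing and moving to the given user group.
        "protections": f"edit={level}|move={level}",
        "reason": "Protected for automated test",
        "token": token,
    }


if __name__ == "__main__":
    print(build_protect_params("Test page", "example-token"))
```

A single API call like this replaces the multi-step UI walk-through, and because the test never logs out, it does not disturb other tests that share the session.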

