[QA] Two big improvements for flaky tests

Chris McMahon cmcmahon at wikimedia.org
Fri Jul 25 16:29:34 UTC 2014


This month we've made two big improvements that should greatly reduce the
number of flaky test failures in all of the builds.

For one thing, the MobileFrontend test that used to walk through a dozen
steps to protect a page in the UI and then log out now uses the API to
protect the test page instead. Because it no longer logs out, it no longer
causes flaky failures in other tests that expect to be logged in and find
themselves unexpectedly logged out.  We have seen this before, particularly
in the VisualEditor tests.

Second, we should no longer see the beta labs db become read-only.  Some
time ago the Language team asked to be able to test in situations where
there is replication lag between master and slave dbs.  At the same time,
the Flow team wanted to be able to test deploying to an environment with
master and slave dbs. So we created a slave db for beta labs.

What we discovered only recently is that a process monitors the
replication lag between the master and slave dbs, and when that lag
exceeds 5 seconds, the db is set to read-only. This is fine in production,
but it was causing a lot of problems in beta labs.  We have now raised the
replication lag threshold from 5 seconds to 5 minutes, and this has stopped
several kinds of failures:

* the explicit "database is read-only" failure from the VisualEditor tests
* the generic "Save failed" failure message in MobileFrontend tests
* the unexplained 90-second timeouts in Flow tests
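To illustrate why the threshold change matters, here is a minimal Python
sketch of a lag-threshold check. The real monitoring process is not shown
in this post, so the function name `should_go_read_only` and the 12-second
lag spike are hypothetical; only the 5-second and 5-minute values come from
the text above.

```python
# Hypothetical sketch: the actual beta labs lag monitor is not described in detail.
OLD_THRESHOLD_S = 5        # previous setting: 5 seconds
NEW_THRESHOLD_S = 5 * 60   # new setting: 5 minutes

def should_go_read_only(lag_seconds: float, threshold_seconds: float) -> bool:
    """Return True when replication lag goes over the threshold."""
    return lag_seconds > threshold_seconds

# Hypothetical brief lag spike on the beta labs slave db
spike = 12.0
print(should_go_read_only(spike, OLD_THRESHOLD_S))  # True: db would go read-only
print(should_go_read_only(spike, NEW_THRESHOLD_S))  # False: writes keep working
```

Under the old setting, a short lag spike like this would be enough to make
the db read-only in the middle of a test run; under the new setting it is
tolerated.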

Sorry for the inconvenience, but things should be much improved from now
on.
-Chris


More information about the QA mailing list