[QA] [reading-wmf] Recently failing browser tests

Thu Jun 4 17:42:39 UTC 2015

Cc'ing QA so they can see my latest reply.

On Thu, Jun 4, 2015 at 10:42 AM, Jon Robson <jrobson at wikimedia.org> wrote:

> Gather tests take around 7 min 21s to run [1]
> Smoke tests 4 mins [2]
> These would be the initial target.
>
> Yesterday 3 patches that had been sitting in Gerrit for approx. 3 days
> got merged and broke the tests in both Gather and MobileFrontend.
> Had we run the smoke tests on these patches at some point during those 3
> days we would have been able to fix them before merging.
>
> Personally I am fine with the delay and running the test jobs one at a
> time. I see this as a low-risk, useful grassroots experiment by a
> smaller team that can be fed up to the rest of the Foundation and benefit
> all teams.  I don't see this as duplicating efforts, but rather as
> investigating something we know we all want, on a smaller scale.
>
> I could easily imagine having an unofficial convention that something can
> only be merged once the browser tests have been verified to pass by the bot.
>
> [1]
> https://integration.wikimedia.org/ci/view/Mobile/job/browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/144/testReport/
> [2]
> https://integration.wikimedia.org/ci/view/Mobile/job/browsertests-MobileFrontend-SmokeTests-linux-chrome-sauce/140/testReport/(root)/
>
> On Thu, Jun 4, 2015 at 10:34 AM, Dan Duvall <dduvall at wikimedia.org> wrote:
>
>> On Wed, Jun 3, 2015 at 7:58 PM, Bryan Davis <bd808 at wikimedia.org> wrote:
>>
>>> On Wed, Jun 3, 2015 at 8:09 PM, Jon Robson <jrobson at wikimedia.org>
>>> wrote:
>>> > The bits I need help on:
>>> > * I need a way to watch Gerrit for new patches, and then checkout those
>>> > patches and trigger my script. Anyone want to write it?
>>>
>>> If this is going to run in labs, the trivially easy thing to do would
>>> be to set up a VM as a Jenkins slave and let Zuul notify Jenkins to
>>> fire a custom job pinned to our slave. Alternatively there is a redis
>>> feed of the commits that can be used to watch for things. That is what
>>> powers grrrit-wm and some other bots. I've never used it but Kunal
>>> could probably give us some quick pointers.
>>>
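
A minimal sketch of the "watch Gerrit for new patches" piece, assuming the events arrive as one JSON object per line in the shape emitted by Gerrit's `stream-events` SSH command (with `type`, `change.project`, `change.number` and `patchSet.ref` fields). This swaps in `stream-events` for illustration; the redis feed mentioned above may use a different shape. The function name, the sample events and the change numbers are hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple


def new_patchsets(lines: Iterable[str], project: str) -> Iterator[Tuple[str, str]]:
    """Yield (change number, git ref) for each new patch set on `project`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not valid JSON
        if not isinstance(event, dict):
            continue
        if event.get("type") != "patchset-created":
            continue  # comments, merges, etc. are not new patches
        change = event.get("change", {})
        if change.get("project") != project:
            continue
        ref = event.get("patchSet", {}).get("ref", "")
        yield str(change.get("number")), ref


# Hypothetical sample events, for illustration only.
sample_lines = [
    '{"type": "comment-added", "change": {"project": "mediawiki/extensions/Gather", "number": "101"}}',
    '{"type": "patchset-created", "change": {"project": "mediawiki/extensions/MobileFrontend", "number": "123"}, "patchSet": {"ref": "refs/changes/23/123/1"}}',
    'not json at all',
    '{"type": "patchset-created", "change": {"project": "mediawiki/core", "number": "456"}, "patchSet": {"ref": "refs/changes/56/456/2"}}',
]

matches = list(new_patchsets(sample_lines, "mediawiki/extensions/MobileFrontend"))
```

In practice `lines` could be the standard output of `ssh -p 29418 <host> gerrit stream-events`, and each yielded ref could be fetched and checked out before triggering the test script.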
>>
>> Setup isn't the problem. The real challenges center around performance
>> and isolation: How do you plan to run a test suite that takes ~ 45 minutes
>> to complete for every commit to your repo on a single labs instance, and
>> ensure the level of isolation between concurrent runs that the suite will
>> require?
>>
>> In other words, there were 12 non-merge commits made yesterday to
>> MobileFrontend alone; that's around 9 hours of run time for the tests to
>> complete which, on a single instance, will have to be run in serial to
>> ensure the tests don't interfere with one another; that essentially
>> nullifies the more expedient feedback that you're hoping to gain.
>>
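
The arithmetic above can be sketched as follows, assuming every commit triggers one full run and each instance runs one suite at a time. The function name and the `instances` parameter are hypothetical; the 12 commits, the ~45 minute full suite and the 4 minute smoke tests are the figures quoted in this thread.

```python
from math import ceil


def serial_backlog_hours(commits: int, suite_minutes: float, instances: int = 1) -> float:
    """Hours needed to test every commit when each instance runs suites serially."""
    runs_per_instance = ceil(commits / instances)  # the busiest instance sets the total
    return runs_per_instance * suite_minutes / 60


full_suite = serial_backlog_hours(12, 45)               # 12 * 45 min on one instance
smoke_only = serial_backlog_hours(12, 4)                # 12 * 4 min on one instance
full_on_three = serial_backlog_hours(12, 45, instances=3)

print(f"full suite: {full_suite:.1f} h, smoke only: {smoke_only:.1f} h, "
      f"full suite on 3 instances: {full_on_three:.1f} h")
```

On these assumptions the full suite needs about 9 hours per day of commits on a single instance, while the smoke tests alone need under an hour, which is why the suite length matters more than the trigger mechanism.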
>> You could scale up the number of instances in the pool, but at that point
>> the ad hoc setup will be duplicating a large portion of our shared CI
>> infrastructure, which brings me to the next unanswered question: Who is
>> going to maintain this setup? I don't mean to be a naysayer here, but I
>> worry about the effects of implementing such a complex, and seemingly
>> volatile, setup without a clear understanding within Infrastructure or
>> Reading about its real value or long-term maintenance burden. Perhaps this
>> is misplaced anxiety, but this screams tech debt incarnate.
>>
>> It seems clear to everyone that the current setup is not ideal, which is
>> why we've been working hard to improve it.[1][2] In the meantime, wouldn't
>> it be wise not to duplicate our efforts but to dedicate time to squashing
>> the elephant in the room? Fix, refactor, or delete the tests. Regardless of
>> when you hear that they are broken, whether 45 minutes after your commit is
>> merged or 24 hours later, fix them. Grab someone from RelEng if the
>> backtrace is obscure or utterly unintelligible, but fix them.
>>
>> [1]
>> http://www.mediawiki.org/wiki/Continuous_integration/Architecture/Isolation
>> [2] https://phabricator.wikimedia.org/T47499
>>
>> --
>> Dan Duvall
>> Automation Engineer
>> Wikimedia Foundation <http://wikimediafoundation.org>
>>
>> _______________________________________________
>> reading-wmf mailing list
>> reading-wmf at lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/reading-wmf
>>
>>
>