Today I worked a bit on fixing failing browser tests. The good news is that some tests detected a regression in core that broke full-text search on mobile. The bad news is that many of the failures seem to be caused by problems with Saucelabs and/or beta labs. Examples:
https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi... The editor doesn't seem to load. Possible causes: a beta labs API error, or a problem with the connection between Saucelabs and beta labs.
https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi... getaddrinfo: Name or service not known (SocketError). This looks like a network problem on the Saucelabs side.
https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi... The Saucelabs recording shows a "no data received" error in Chrome; either a beta labs problem or a Saucelabs network problem.
https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi... Same as above.
Those are just a few examples from recent failures, but they make tracking regressions really tedious and time-consuming. I know we are planning to move away from Saucelabs and use our own servers to run the tests. When will this happen? Is there a deadline?
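One way to cut down the tedious triage described above is to sort failures by known infrastructure error signatures before anyone reads them. A minimal Python sketch, assuming the failure text for each test (console log, or a note taken from the Saucelabs recording) is available as a string; the signature list and the `classify_failure`/`triage` names are hypothetical, seeded only from the errors listed above:

```python
# Hypothetical signatures of infrastructure (not product) failures, seeded from
# the errors quoted in this thread; a real list would grow from actual logs.
INFRASTRUCTURE_SIGNATURES = (
    "getaddrinfo: name or service not known",  # DNS failure on the Saucelabs side
    "no data received",                        # Chrome error seen in a Saucelabs recording
)


def classify_failure(failure_text: str) -> str:
    """Return "infrastructure" if the text matches a known signature,
    otherwise "needs-review" (a possible real regression)."""
    text = failure_text.lower()
    if any(signature in text for signature in INFRASTRUCTURE_SIGNATURES):
        return "infrastructure"
    return "needs-review"


def triage(failures: dict[str, str]) -> dict[str, list[str]]:
    """Group failing test names by classification, preserving input order."""
    buckets: dict[str, list[str]] = {"infrastructure": [], "needs-review": []}
    for test_name, failure_text in failures.items():
        buckets[classify_failure(failure_text)].append(test_name)
    return buckets
```

A failure like "the editor doesn't seem to load" has no clear signature, so it deliberately lands in `needs-review`; the sketch only filters out known noise, it does not diagnose anything.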
Thanks,
Juliusz
Indeed. The tests have been failing for a month now, and they were passing (green) before the move to integration.wikimedia.org. It would be really good to make them useful again.
I'm not sure how our interaction with Saucelabs changed during that move, but is there anything that can be done in the short term to get things back to how they were when we were on Cloudbees?
Thanks Juliusz for the good summary of the problems!
-- Jon Robson
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l
ChrisMC,
Are these failures unique to mobile? They seem to be at the infrastructure level, so I'm guessing they would affect others too.
What other information do you need from us to be able to remedy these?
--tomasz
Heads up: Chris McMahon is on vacation all this week. That said, it would be great to hear from anyone in QA about this - it has been a long-standing issue.
RobLa, is this something that we should be pulling in Greg for?
--tomasz
On Thu, Jul 10, 2014 at 11:08 AM, Arthur Richards arichards@wikimedia.org wrote:
Heads up: Chris McMahon is on vacation all this week. That said, it would be great to hear from anyone in QA about this - it has been a long-standing issue.
-- Arthur Richards Team Practices Lead [[User:Awjrichards]] IRC: awjr +1-415-839-6885 x6687
Hi all,
Antoine and Zeljko are the right people to talk about this while Chris is out, and it's late in the day for them. I'm sure they'll get back to you tomorrow. Greg may be able to say more about this, but honestly, the nature of this thread is a little bit like little kids in the backseat saying "are we there yet? are we there yet?" repeatedly :-)
Antoine's response seems to answer the substance of what y'all are asking about. We moved from Cloudbees to using Saucelabs directly so that we could debug these issues firsthand. Now that we're on Saucelabs, we have that information (see Antoine's mail). As he said, we have no plans to set up our own version of Saucelabs.
It may be that the very first thing we need to do is add some sort of environment health check before executing the actual tests, to avoid these false failures. Given that the team just completed the migration off of Cloudbees, give them a little time to figure things out.
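The environment health check suggested above could look roughly like the following Python sketch. It assumes the CI job passes in the URLs the tests depend on (for example the beta labs API); the function names are hypothetical, and a script running on the Jenkins side can only check its own view of the network, not connectivity from Saucelabs itself:

```python
import http.client
import socket
import urllib.request
from typing import Callable, Iterable
from urllib.parse import urlparse


def resolves(host: str) -> bool:
    """True if DNS resolution succeeds (the step that failed with
    'getaddrinfo: Name or service not known')."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def responds(url: str, timeout: float = 10.0) -> bool:
    """True if an HTTP GET returns a non-error status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    # URLError, HTTPError and timeouts subclass OSError; ValueError covers
    # unsupported URLs; HTTPException covers malformed responses.
    except (OSError, ValueError, http.client.HTTPException):
        return False


def find_environment_problems(
    urls: Iterable[str],
    dns: Callable[[str], bool] = resolves,
    http: Callable[[str], bool] = responds,
) -> list[str]:
    """Return human-readable problems; an empty list means 'healthy'."""
    problems = []
    for url in urls:
        host = urlparse(url).hostname
        if not host:
            problems.append(f"no host name in {url!r}")
        elif not dns(host):
            problems.append(f"DNS lookup failed for {host}")
        elif not http(url):
            problems.append(f"HTTP check failed for {url}")
    return problems


def main(argv: list[str]) -> int:
    """Exit code for a CI step: 0 = run the tests, 1 = skip them and
    report an infrastructure problem instead of test failures."""
    problems = find_environment_problems(argv)
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

A job could run this before the browser tests and skip them when `main()` returns 1, so a DNS or beta labs outage shows up as one infrastructure alert rather than a batch of false test failures. The `dns`/`http` parameters exist so the logic can be checked without network access.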
Thanks Rob
On Thu, Jul 10, 2014 at 11:09 AM, Tomasz Finc tfinc@wikimedia.org wrote:
RobLa, is this something that we should be pulling in Greg for?
--tomasz
1) Yeah, feel free to pull me in on these conversations. I'm here to help :)
2) What RobLa said, plus: please feel free (actually, please feel encouraged) to file bugs for these specific infrastructure-failure tests, assign them to Chris or Zeljko, and bring them up in the SoS as appropriate. Any infrastructure migration will be bumpy, with some odd black holes we need to debug. They'll debug and process them as fast as they can.
Greg
(PS: I trimmed cc's to just the lists, assuming all were on one of the two)
Thanks for the replies, everyone. However, I haven't seen Antoine's response (beyond the bit quoted in Chris Steipp's reply). Was it not sent with reply-all, or otherwise not sent to mobile-l? Further response inline.
On Thu, Jul 10, 2014 at 12:06 PM, Rob Lanphier robla@wikimedia.org wrote:
Antoine and Zeljko are the right people to talk about this while Chris is out, and it's late in the day for them. I'm sure they'll get back to you tomorrow. Greg may be able to say more about this, but honestly, the nature of this thread is a little bit like little kids in the backseat saying "are we there yet? are we there yet?" repeatedly :-)
Rob, the last time we inquired about this (to my knowledge) was one month ago (see the email with subject 'migrating MobileFrontend browser tests to WMF Jenkins'). We saw no substantive follow-up, and our browser test builds have been broken by infrastructural issues ever since the migration over one month ago. This has not only rendered the automated browser tests essentially useless for us, but has also become a significant drain on time and focus. This isn't a case of us asking 'are we there yet'; rather, we are trying to understand when we'll be able to rely on our browser tests again, and whether there's any way we can help improve the situation.
I think Arthur found the right words to describe what the problem is for us. If "we're not there yet", then we should disable browser test notifications altogether, because there's no point in getting several emails about failing tests on mobile-tech if we know for sure they will fail.
On Thu, Jul 10, 2014 at 1:38 PM, Arthur Richards arichards@wikimedia.org wrote:
Thanks for the replies, everyone. However, I haven't seen Antoine's response (beyond the bit quoted in Chris Steipp's reply). Was it not sent with reply-all, or otherwise not sent to mobile-l? Further response inline.
On Thu, Jul 10, 2014 at 12:06 PM, Rob Lanphier robla@wikimedia.org wrote:
Antoine and Zeljko are the right people to talk about this while Chris is out, and it's late in the day for them. I'm sure they'll get back to you tomorrow. Greg may be able to say more about this, but honestly, the nature of this thread is a little bit like little kids in the backseat saying "are we there yet? are we there yet?" repeatedly :-)
Rob, the last time we inquired about this (to my knowledge) was one month ago (see the email with subject 'migrating MobileFrontend browser tests to WMF Jenkins'). We saw no substantive follow-up, and our browser test builds have been broken by infrastructural issues ever since the migration over one month ago. This has not only rendered the automated browser tests essentially useless for us, but has also become a significant drain on time and focus. This isn't a case of us asking 'are we there yet'; rather, we are trying to understand when we'll be able to rely on our browser tests again, and whether there's any way we can help improve the situation.
The completion of the migration to Saucelabs was announced July 3. If you aren't getting replies, it's probably because they are stuck in the mobile-l moderation queue. Antoine did send his email to the mobile-l list, but is probably not a member. In general, there has been a fair amount of conversation on the topic on the qa list, so I'd encourage you to check out the activity there if you haven't already seen it.
Sorry for being glib earlier. I understand the current situation must be frustrating for you all, and I hope Zeljko and Antoine are able to provide a satisfactory answer on this. Chris will also be back on Monday, though unfortunately Antoine will be gone at that point.
Rob
On Thu, Jul 10, 2014 at 6:37 PM, Rob Lanphier robla@wikimedia.org wrote:
The completion of the migration to Saucelabs was announced July 3. If you aren't getting replies, it's probably because they are stuck in the mobile-l moderation queue. Antoine did send his email to the mobile-l list, but is probably not a member. In general, there has been a fair amount of conversation on the topic on the qa list, so I'd encourage you to check out the activity there if you haven't already seen it.
There are zero messages being held in the moderation queue.
--tomasz
On Fri, Jul 11, 2014 at 7:41 PM, Tomasz Finc tfinc@wikimedia.org wrote:
There are zero messages being held in the moderation queue.
This might be the reason:
http://lists.wikimedia.org/pipermail/qa/2014-July/001706.html
Željko
While the migration from Cloudbees was completed earlier this month, the MobileFrontend jobs were migrated off of Cloudbees over one month ago.
Chris McMahon sent an email announcing this on June 6 [0]. In addition, Chris said 'The tests for MF on beta labs running in headless Firefox under xvfb are reliably green as of today and we'll be working to keep them that way.'
However, those tests have been consistently failing since early June. After digging through the MobileFrontend test failures (work done in particular by Jon and Juliusz), *it's clear that the consistent failures are related to architectural/infrastructural issues*. Jon brought this up on June 9 [1], and Zeljko responded on June 28 [2], saying that the issue was known but that they hadn't had time to debug the problem. Two weeks later, Juliusz raised the problem again in this thread, since we hadn't heard anything further about resolving the issue and were dealing with a lot of noise from build failures.
We really appreciate all of the hard work that release/qa/platform/etc. have put into this, and we understand that resolving issues takes time. When we had more reliable builds, we found the automated browser tests incredibly valuable. We want to regain that value so that we can once again reliably catch issues before they reach production.
*Is anyone currently owning, or willing to own, digging into and resolving these issues? Can we get any kind of timeline for resolving this*, even if it's just for when the issue can be investigated? In the meantime, let's remove the mobile web team from the failure notifications until the builds are reliable and we can depend on a better signal-to-noise ratio.
As an aside: I didn't know the migration off of Cloudbees was complete until it was mentioned in this thread. It looks like the only announcement about it was made on the qa list, which I and many of the folks affected by this change are not subscribed to. Please make announcements about changes as significant as this on broad-reaching lists like wikitech-l.
[0] http://lists.wikimedia.org/pipermail/qa/2014-June/001515.html
[1] http://lists.wikimedia.org/pipermail/qa/2014-June/001535.html
[2] http://lists.wikimedia.org/pipermail/qa/2014-June/001615.html
On Fri, Jul 11, 2014 at 9:35 PM, Arthur Richards arichards@wikimedia.org wrote:
Chris McMahon sent an email announcing this on June 6 [0]. In addition, Chris said 'The tests for MF on beta labs running in headless Firefox under xvfb are reliably green as of today and we'll be working to keep them that way.'
Running tests using xvfb proved to be less stable than Sauce Labs. We have moved to Sauce until we have some time to investigate the failures.
*Is anyone currently owning, or willing to own, digging into and resolving these issues? Can we get any kind of timeline for resolving this*, even if it's just for when the issue can be investigated?
Rob, Chris, as far as I know, I have no big projects at the moment. Should I focus on this?
In the meantime, let's remove the mobile web team from the failure notifications until the builds are reliable and we can depend on a better signal-to-noise ratio.
Done [1]. I have added everybody from this thread as reviewers. (I could not find Tomasz in Gerrit.)
Željko
--
1: https://gerrit.wikimedia.org/r/#/c/146056/
On Thu, Jul 10, 2014 at 10:38 PM, Arthur Richards arichards@wikimedia.org wrote:
We saw no substantive follow-up, and our browser test builds have been broken by infrastructural issues ever since the migration over one month ago. This has not only rendered the automated browser tests essentially useless for us, but has also become a significant drain on time and focus. This isn't a case of us asking 'are we there yet'; rather, we are trying to understand when we'll be able to rely on our browser tests again, and whether there's any way we can help improve the situation.
I took a quick look at the MobileFrontend Jenkins jobs [1-3]. Two of the three jobs have failures with age 1 (meaning the test failed just once, probably an intermittent problem), but also tests that have failed in each of the last 3-53 runs (so the problem is persistent).
I will start debugging the problems that happen every time. I can't say yet when the tests will be green (no failures) and sunny (no failures in the last 5 test runs) again.
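The triage rule above (age 1 = probably intermittent, a long streak of failures = a persistent problem to debug first) and the "green"/"sunny" definitions can be written down as a small Python sketch. It assumes per-test pass/fail history and per-run failure counts have already been pulled out of Jenkins; it does not call any Jenkins API, and the threshold of 3 is an illustrative assumption taken from the 3-53 range mentioned:

```python
from typing import Sequence


def failure_age(history: Sequence[bool]) -> int:
    """Number of consecutive most-recent runs in which a test failed.
    `history` is ordered oldest -> newest; True means the test passed."""
    age = 0
    for passed in reversed(history):
        if passed:
            break
        age += 1
    return age


def classify_test(history: Sequence[bool], persistent_from: int = 3) -> str:
    """Age 1 suggests an intermittent problem; a long run of failures
    suggests a persistent one. The threshold is an assumption."""
    age = failure_age(history)
    if age == 0:
        return "passing"
    if age == 1:
        return "probably intermittent"
    if age >= persistent_from:
        return "persistent"
    return "unclear"


def is_green(failures_per_run: Sequence[int]) -> bool:
    """Green: the latest run had no failing tests."""
    return len(failures_per_run) > 0 and failures_per_run[-1] == 0


def is_sunny(failures_per_run: Sequence[int], window: int = 5) -> bool:
    """Sunny: none of the last `window` runs had a failing test."""
    recent = failures_per_run[-window:]
    return len(recent) == window and all(count == 0 for count in recent)
```

Sorting tests by `classify_test` puts the "persistent" ones first, which matches the plan of debugging the problems that happen every time before chasing one-off failures.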
Željko
--
1: https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi...
2: https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi...
3: https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi...
On Thu, Jul 10, 2014 at 10:38 PM, Arthur Richards arichards@wikimedia.org wrote:
This isn't a case of us asking 'are we there yet'; rather, we are trying to understand when we'll be able to rely on our browser tests again, and whether there's any way we can help improve the situation.
I have noticed that pairing is a great way to share knowledge and get things done. As far as I know, most of the mobile team is in San Francisco, so pairing with me is not the easiest thing to arrange.
I am free almost every Monday, Tuesday and Wednesday, 8-9am San Francisco time (5-6pm my time). Just saying. ;)
Željko
On Thu, Jul 10, 2014 at 7:51 PM, Tomasz Finc tfinc@wikimedia.org wrote:
Are these failures unique to mobile? They seem to be at the infrastructure level, so I'm guessing they would affect others.
We have noticed similar problems across all repositories.
What other information do you need from us to be able to remedy these?
All we need is time. We finished the migration from Cloudbees to Wikimedia Jenkins a few days ago; the next step is making the jobs as green as possible.
Željko
On Thu, Jul 10, 2014 at 2:54 AM, Jon Robson jdlrobson@gmail.com wrote:
I'm not sure how our interaction with saucelabs changed during that move, but is there anything that can be done on the short term to get it back to how they were before when we were on cloudbees?
We have a new account now: instead of running 2-3 tests in parallel, we can now run 10-15, but that should not cause any problems.
Nothing comes to mind that could be done in the short term.
Željko
Hi Juliusz,
comments are inline.
On Thu, Jul 10, 2014 at 12:57 AM, Juliusz Gonera jgonera@wikimedia.org wrote:
Today I worked a bit on fixing failing browser tests. The good news is that some tests detected a regression in core that caused full text search on mobile to not work. The bad news is that many of the failures seem to be caused by problems with Saucelabs and/or beta labs, examples:
https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi... Editor doesn't seem to load, possible causes: beta labs API error, or problem with connection between saucelabs and beta labs
https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi... getaddrinfo: Name or service not known (SocketError) - seems like a problem with network on saucelabs
https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi... Saucelabs recording shows "no data received" error in Chrome, either beta labs problem or saucelabs network problem
https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi... same as above
All links now lead to passing tests. When copy/pasting links from Jenkins, please make sure to use URLs with build number instead of URLs with "lastBuild".
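A Jenkins "lastBuild" link points at whatever ran most recently, which is why the links above no longer show the failures. A small sketch of pinning such a link to a specific build, assuming you have already read the build number off the job page; `permalink` is a hypothetical helper and the job name in the example is made up.

```ruby
# Matches a "/lastBuild" path segment followed by "/" or the end of the URL.
# Other Jenkins aliases such as "/lastFailedBuild" are deliberately left alone.
LAST_BUILD = %r{/lastBuild(?=/|\z)}

# Hypothetical helper: replace "lastBuild" with a concrete build number.
def permalink(url, build_number)
  unless url.match?(LAST_BUILD)
    raise ArgumentError, "no lastBuild segment in #{url}"
  end

  url.sub(LAST_BUILD, "/#{Integer(build_number)}")
end

url = "https://integration.wikimedia.org/ci/job/browsertests-Example/lastBuild/console"
puts permalink(url, 142)
# prints https://integration.wikimedia.org/ci/job/browsertests-Example/142/console
```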
Those are just a few examples from recent failures, but they make tracking regressions really tedious and time consuming. I know we are planning to move away from Saucelabs and use our own servers to run the tests.
We have tried moving away from third-party services (Cloudbees, Sauce Labs), and we have succeeded in moving all Jenkins jobs from Cloudbees to Wikimedia Jenkins.
We have tried running tests in local browsers (instead of at Sauce Labs), but the tests were also sometimes failing for unclear reasons, so for the moment we are again using Sauce Labs. Better the devil you know than the devil you don't...[1]
I will continue testing and debugging tests with both local and Sauce Labs browsers and I will let you know the results.
When will this happen? Is there any deadline?
As far as I know, there is no deadline.
Željko -- 1: http://www.usingenglish.com/reference/idioms/better+the+devil+you+know.html
We have tried running tests in local browsers (instead of at Sauce Labs), but the tests were also sometimes failing for unclear reasons, so for the moment we are again using Sauce Labs. Better the devil you know than the devil you don't...[1]
I have a lot of learning to do before I can be of much help with the Sauce Labs side of things, but if improving the state of MobileFrontend browser tests in mw-vagrant would be the best first step here, I can certainly help with that.
Juliusz (or anyone else on Mobile that has the time), let me know if you're available for pairing next week. I'm available during SF hours in the office Monday, Wednesday, Friday, and available for hangouts Tuesday and Thursday. Fair warning: I'm still getting up-to-speed with all things MediaWiki. That said, I feel I now understand enough about browser tests and the mw-vagrant environment to be helpful. For things still mysterious, I can always take notes and lean on Zeljko for answers. :)
Dan
On Thu, Jul 10, 2014 at 12:57 AM, Juliusz Gonera jgonera@wikimedia.org wrote:
https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi... Editor doesn't seem to load, possible causes: beta labs API error, or problem with connection between saucelabs and beta labs
My guess is that you are talking about this failure[1]. Looking at the Sauce Labs screencast, it looks to me like labs was just slow to respond, and the test failed after 5 seconds with a fairly descriptive error message[2]. It is hard for me to say why labs is slow. As Antoine has suggested, looking at logs for that date/time could help.
If that happens a lot, a short-term workaround would be to make the test wait 10 seconds instead of 5.
Željko -- 1: https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi... 2: timed out after 5 seconds, waiting for {:css=>".wikitext-editor", :tag_name=>"textarea"} to become present (Watir::Wait::TimeoutError)
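To see why a 10-second wait would help with a slow labs response, here is a minimal sketch of a polling wait. `wait_until`, `WaitTimeout` and `FakeClock` are hypothetical names, not the Watir API; the fake clock only makes the example run instantly. It assumes the editor takes 6 seconds to appear, an arbitrary figure chosen to land between the two timeouts.

```ruby
# Hypothetical error raised when the condition never becomes true in time.
class WaitTimeout < StandardError; end

# Fake clock so the example runs instantly: "sleeping" just advances time.
class FakeClock
  attr_reader :now

  def initialize
    @now = 0.0
  end

  def sleep(seconds)
    @now += seconds
  end
end

# Poll the block every `interval` seconds until it returns something truthy
# or `timeout` seconds have passed on `clock`.
def wait_until(timeout:, clock:, interval: 0.5)
  deadline = clock.now + timeout
  loop do
    result = yield
    return result if result
    raise WaitTimeout, "timed out after #{timeout} seconds" if clock.now >= deadline

    clock.sleep(interval)
  end
end

# Simulate an editor textarea that appears after `seconds` seconds.
def editor_appears_after(seconds, timeout)
  clock = FakeClock.new
  wait_until(timeout: timeout, clock: clock) { clock.now >= seconds }
  :present
rescue WaitTimeout => e
  e.message
end

p editor_appears_after(6, 5)  # => "timed out after 5 seconds"
p editor_appears_after(6, 10) # => :present
```

Because the wait returns as soon as the condition holds, the longer timeout costs nothing when the element shows up at 6 seconds; it only lengthens waits that never succeed.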
On Fri, Jul 11, 2014 at 10:12 AM, Željko Filipin zfilipin@wikimedia.org wrote:
On Thu, Jul 10, 2014 at 12:57 AM, Juliusz Gonera jgonera@wikimedia.org wrote:
https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi... Editor doesn't seem to load, possible causes: beta labs API error, or problem with connection between saucelabs and beta labs
My guess is that you are talking about this failure[1]. Looking at the Sauce Labs screencast, it looks to me like labs was just slow to respond, and the test failed after 5 seconds with a fairly descriptive error message[2]. It is hard for me to say why labs is slow. As Antoine has suggested, looking at logs for that date/time could help.
If that happens a lot, a short-term workaround would be to make the test wait 10 seconds instead of 5.
Thanks for the response, Zeljko. I'm going to make the dumb manager response and ask the question (not just to Zeljko, but to everyone): as a permanent fix, can we change the wait to 60 seconds and call it good? How was 5 seconds arrived at as the time for an automated test to fail? Labs was never intended for performance testing, and it's not suited for it, so if the rationale is because we're testing end-user performance, we should stop. SauceLabs also isn't designed for it, so any effort to use it will also end in sadness. 60 seconds may be a bit extreme, but really, let's set a number here (and everywhere we have timeouts) that's high enough to stop getting false positives, and leave it there. In cases where we want to automate a responsiveness test, let's make sure we're doing it against test2 or some production cluster machine, and that we're doing it from a client that isn't also likely to introduce random delays.
Rob
1: https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi... 2: timed out after 5 seconds, waiting for {:css=>".wikitext-editor", :tag_name=>"textarea"} to become present (Watir::Wait::TimeoutError)
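Rob's suggestion of one number set everywhere we have timeouts could look roughly like this: a single setting that step definitions read instead of hard-coding 5. The `BrowserWait` module and the `BROWSER_WAIT_TIMEOUT` environment variable are hypothetical, and the default of 10 seconds simply follows the workaround proposed earlier in this thread.

```ruby
# Hypothetical single source of truth for element-wait timeouts.
module BrowserWait
  DEFAULT_SECONDS = 10 # workaround value from this thread, not a standard

  # Read the timeout from the environment, falling back to the default.
  # Anything that is not a positive integer is rejected.
  def self.timeout(env = ENV)
    raw = env.fetch("BROWSER_WAIT_TIMEOUT", DEFAULT_SECONDS.to_s)
    seconds = Integer(raw, exception: false)
    unless seconds && seconds.positive?
      raise ArgumentError, "invalid BROWSER_WAIT_TIMEOUT: #{raw.inspect}"
    end

    seconds
  end
end

p BrowserWait.timeout({})                                 # => 10
p BrowserWait.timeout({ "BROWSER_WAIT_TIMEOUT" => "60" }) # => 60
```

Keeping the number in one place means changing it later is a one-line edit rather than a search through every step definition.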
Greg, Rob, Tomasz and I just had an IRL conversation about this. Given some of the ambiguity of the test failures we've been discussing as related to 'infrastructure/architecture issues', we should be filing specific bug reports in Bugzilla regarding the issues we see. I'll follow up with the mobile web team directly to start digging into this. Further, it was clarified that Greg G has ownership of getting the issues resolved. We also agreed that for the time being, mobile-tech will be removed from the list of recipients of the failure emails until the issues are resolved. However, no one in the room was sure how to actually do this - Zeljko, Chris, Dan, is this something one of you can help out with?
Finally, I'd like to mention that none of the conversation on this thread was intended to question the integrity or validity of the hard work that the QA team has put into making improvements to the testing infrastructure. We're all on the same (figurative) team, and we understand that it takes time to iron out inevitable issues particularly when it pertains to complex systems, migrations, etc. At the end of the day, we're very eager to be able to fully leverage an automated test system to help us ship better quality stuff, and the heart of this conversation is about resolving the things currently standing in the way of that goal.
On Fri, Jul 11, 2014 at 12:39 PM, Rob Lanphier robla@wikimedia.org wrote:
On Fri, Jul 11, 2014 at 10:12 AM, Željko Filipin zfilipin@wikimedia.org wrote:
On Thu, Jul 10, 2014 at 12:57 AM, Juliusz Gonera jgonera@wikimedia.org wrote:
https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi... Editor doesn't seem to load, possible causes: beta labs API error, or problem with connection between saucelabs and beta labs
My guess is that you are talking about this failure[1]. Looking at the Sauce Labs screencast, it looks to me like labs was just slow to respond, and the test failed after 5 seconds with a fairly descriptive error message[2]. It is hard for me to say why labs is slow. As Antoine has suggested, looking at logs for that date/time could help.
If that happens a lot, a short-term workaround would be to make the test wait 10 seconds instead of 5.
Thanks for the response, Zeljko. I'm going to make the dumb manager response and ask the question (not just to Zeljko, but to everyone): as a permanent fix, can we change the wait to 60 seconds and call it good? How was 5 seconds arrived at as the time for an automated test to fail? Labs was never intended for performance testing, and it's not suited for it, so if the rationale is because we're testing end-user performance, we should stop. SauceLabs also isn't designed for it, so any effort to use it will also end in sadness. 60 seconds may be a bit extreme, but really, let's set a number here (and everywhere we have timeouts) that's high enough to stop getting false positives, and leave it there. In cases where we want to automate a responsiveness test, let's make sure we're doing it against test2 or some production cluster machine, and that we're doing it from a client that isn't also likely to introduce random delays.
Rob
1: https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi... 2: timed out after 5 seconds, waiting for {:css=>".wikitext-editor", :tag_name=>"textarea"} to become present (Watir::Wait::TimeoutError)
On 11 July 2014 15:19, Arthur Richards arichards@wikimedia.org wrote:
Greg, Rob, Tomasz and I just had an IRL conversation about this. Given some of the ambiguity of the test failures we've been discussing as related to 'infrastructure/architecture issues', we should be filing specific bug reports in Bugzilla regarding the issues we see. I'll follow up with the mobile web team directly to start digging into this. Further, it was clarified that Greg G has ownership of getting the issues resolved. We also agreed that for the time being, mobile-tech will be removed from the list of recipients of the failure emails until the issues are resolved. However, no one in the room was sure how to actually do this - Zeljko, Chris, Dan, is this something one of you can help out with?
Steps (I had to do this last week; sharing the learning rather than just replicating the issue):
1. Go to https://integration.wikimedia.org/ci/view/BrowserTests/
2. Be logged in as someone with admin permissions (I *think* that's automatic for ldap/wmf).
3. Go to the browser test you want to modify (e.g. the MobileFrontend Chrome enwiki BetaLabs one https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/ ).
4. Click "configure" in the upper-left of the project page.
5. Scroll down to "Project Recipient List".
6. Add/remove as needed.
7. Press "Save" at the bottom of the page.
Have done this for MobileFrontend's Chrome and Firefox enwiki BetaLabs, and the Firefox test2 Prod projects. They now only send to qa-alerts and Chris McMahon.
HTH!
J.
I believe many of the jobs listed there are defined in integration/jenkins-job-builder-config (see jobs.yaml in the cloudbees branch). Whether changes made through the web interface will be clobbered by the next import, I'm not sure. Antoine will likely know more.
On Fri, Jul 11, 2014 at 3:50 PM, James Forrester jforrester@wikimedia.org wrote:
On 11 July 2014 15:19, Arthur Richards arichards@wikimedia.org wrote:
Greg, Rob, Tomasz and I just had an IRL conversation about this. Given some of the ambiguity of the test failures we've been discussing as related to 'infrastructure/architecture issues', we should be filing specific bug reports in Bugzilla regarding the issues we see. I'll follow up with the mobile web team directly to start digging into this. Further, it was clarified that Greg G has ownership of getting the issues resolved. We also agreed that for the time being, mobile-tech will be removed from the list of recipients of the failure emails until the issues are resolved. However, no one in the room was sure how to actually do this - Zeljko, Chris, Dan, is this something one of you can help out with?
Steps (I had to do this last week; sharing the learning rather than just replicating the issue):
1. Go to https://integration.wikimedia.org/ci/view/BrowserTests/
2. Be logged in as someone with admin permissions (I *think* that's automatic for ldap/wmf).
3. Go to the browser test you want to modify (e.g. the MobileFrontend Chrome enwiki BetaLabs one https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/ ).
4. Click "configure" in the upper-left of the project page.
5. Scroll down to "Project Recipient List".
6. Add/remove as needed.
7. Press "Save" at the bottom of the page.
Have done this for MobileFrontend's Chrome and Firefox enwiki BetaLabs, and the Firefox test2 Prod projects. They now only send to qa-alerts and Chris McMahon.
HTH!
J.
James D. Forrester Product Manager, Editing Wikimedia Foundation, Inc.
jforrester@wikimedia.org | @jdforrester
On Sat, Jul 12, 2014 at 1:04 AM, Dan Duvall dduvall@wikimedia.org wrote:
I believe many of the jobs listed there are defined in integration/jenkins-job-builder-config (see jobs.yaml in the cloudbees branch).
_All_ jobs are managed via JJB.
Whether changes made through the web interface will be clobbered by the next import, I'm not sure.
Yes, the changes made via the web interface will be overwritten.
Željko
Nice! Thanks James :)
On Fri, Jul 11, 2014 at 3:50 PM, James Forrester jforrester@wikimedia.org wrote:
On 11 July 2014 15:19, Arthur Richards arichards@wikimedia.org wrote:
Greg, Rob, Tomasz and I just had an IRL conversation about this. Given some of the ambiguity of the test failures we've been discussing as related to 'infrastructure/architecture issues', we should be filing specific bug reports in Bugzilla regarding the issues we see. I'll follow up with the mobile web team directly to start digging into this. Further, it was clarified that Greg G has ownership of getting the issues resolved. We also agreed that for the time being, mobile-tech will be removed from the list of recipients of the failure emails until the issues are resolved. However, no one in the room was sure how to actually do this - Zeljko, Chris, Dan, is this something one of you can help out with?
Steps (I had to do this last week; sharing the learning rather than just replicating the issue):
1. Go to https://integration.wikimedia.org/ci/view/BrowserTests/
2. Be logged in as someone with admin permissions (I *think* that's automatic for ldap/wmf).
3. Go to the browser test you want to modify (e.g. the MobileFrontend Chrome enwiki BetaLabs one https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/ ).
4. Click "configure" in the upper-left of the project page.
5. Scroll down to "Project Recipient List".
6. Add/remove as needed.
7. Press "Save" at the bottom of the page.
Have done this for MobileFrontend's Chrome and Firefox enwiki BetaLabs, and the Firefox test2 Prod projects. They now only send to qa-alerts and Chris McMahon.
HTH!
J.
James D. Forrester Product Manager, Editing Wikimedia Foundation, Inc.
jforrester@wikimedia.org | @jdforrester
On Sat, Jul 12, 2014 at 12:50 AM, James Forrester jforrester@wikimedia.org wrote:
Steps (I had to do this last week; sharing the learning rather than just replicating the issue):
1. Go to https://integration.wikimedia.org/ci/view/BrowserTests/
2. Be logged in as someone with admin permissions (I *think* that's automatic for ldap/wmf).
3. Go to the browser test you want to modify (e.g. the MobileFrontend Chrome enwiki BetaLabs one https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/ ).
4. Click "configure" in the upper-left of the project page.
5. Scroll down to "Project Recipient List".
6. Add/remove as needed.
7. Press "Save" at the bottom of the page.
Oh noes! That is _not_ the way to do it! :)
We use JJB[1] for job configuration. This[2] is how to do it.
You could _temporarily_ change a Jenkins job via the web interface (useful for debugging a job), but the next time somebody pushes a change via JJB, your change will be overwritten.
James, the changes you made have been overwritten, since we updated the jobs via JJB last week. Let me know if you need help making changes to jobs.
Željko -- 1: http://ci.openstack.org/jenkins-job-builder/ 2: https://gerrit.wikimedia.org/r/#/c/146056/
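For illustration, this is roughly what the JJB route involves: in a Jenkins Job Builder YAML job definition, failure mail is configured through the `email` publisher, whose `recipients` field is a space-separated list of addresses. The Ruby sketch below only builds such an entry and prints it as YAML. The job name and addresses are placeholders, and the actual Wikimedia job definitions (Gerrit change 146056 is the real change) may be structured differently, for example through job templates.

```ruby
require "yaml"

# Build a minimal Jenkins Job Builder job entry whose failure emails go only
# to `recipients`. Job name and addresses are placeholders.
def jjb_job(name, recipients)
  [
    {
      "job" => {
        "name" => name,
        "publishers" => [
          # JJB's email publisher takes a space-separated recipients string.
          { "email" => { "recipients" => recipients.join(" ") } }
        ]
      }
    }
  ]
end

puts jjb_job(
  "browsertests-Example-linux-chrome-sauce",
  ["qa-alerts@example.org", "qa-person@example.org"]
).to_yaml
```

Because JJB regenerates jobs from YAML like this, a recipient change made here survives the next update, unlike an edit made through the web interface.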
On 14 July 2014 06:44, Željko Filipin zfilipin@wikimedia.org wrote:
On Sat, Jul 12, 2014 at 12:50 AM, James Forrester jforrester@wikimedia.org wrote:
Steps (I had to do this last week; sharing the learning rather than just replicating the issue):
1. Go to https://integration.wikimedia.org/ci/view/BrowserTests/
2. Be logged in as someone with admin permissions (I *think* that's automatic for ldap/wmf).
3. Go to the browser test you want to modify (e.g. the MobileFrontend Chrome enwiki BetaLabs one https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/ ).
4. Click "configure" in the upper-left of the project page.
5. Scroll down to "Project Recipient List".
6. Add/remove as needed.
7. Press "Save" at the bottom of the page.
Oh noes! That is _not_ the way to do it! :)
We use JJB[1] for job configuration. This[2] is how to do it.
You could _temporarily_ change a Jenkins job via the web interface (useful for debugging a job), but the next time somebody pushes a change via JJB, your change will be overwritten.
James, the changes you made have been overwritten, since we updated the jobs via JJB last week. Let me know if you need help making changes to jobs.
Ah, interesting. All the changes I've made to the VisualEditor jobs have stuck for weeks – presumably they've not been updated since? Apologies all for the misinformation.
J.
On Mon, Jul 14, 2014 at 5:28 PM, James Forrester jforrester@wikimedia.org wrote:
All the changes I've made to the VisualEditor jobs have stuck for weeks – presumably they've not been updated since?
Probably. Let me know if you need help changing VisualEditor jobs.
Željko
On Fri, Jul 11, 2014 at 3:19 PM, Arthur Richards arichards@wikimedia.org wrote:
Given some of the ambiguity of the test failures we've been discussing as related to 'infrastructure/architecture issues', we should be filing specific bug reports in Bugzilla regarding the issues we see.
Juliusz is going to coordinate this - should bugs get filed under Wikimedia -> Quality Assurance, or is there a better place for them?
<quote name="Arthur Richards" date="2014-07-11" time="16:47:40 -0700">
On Fri, Jul 11, 2014 at 3:19 PM, Arthur Richards arichards@wikimedia.org wrote:
Given some of the ambiguity of the test failures we've been discussing as related to 'infrastructure/architecture issues', we should be filing specific bug reports in Bugzilla regarding the issues we see.
Juliusz is going to coordinate this - should bugs get filed under Wikimedia -> Quality Assurance, or is there a better place for them?
If the issue is in a browser test or is as yet unclear, yeah. If it turns out to really be a bug somewhere else, we can move it.
Greg
(PS: I trimmed the cc's again, assuming robla was on qa and/or mobile, arthur was on either, and maryana was on mobile-l; let me know if that's wrong.)
On Sat, Jul 12, 2014 at 1:47 AM, Arthur Richards arichards@wikimedia.org wrote:
Juliusz is going to coordinate this - should bugs get filed under Wikimedia -> Quality Assurance, or is there a better place for them?
That is a good place, especially for anything related to Ruby, Selenium, Cucumber, page object pattern and friends.
Wikimedia > Continuous integration is a good place for anything related to Jenkins.
Željko
On Fri, Jul 11, 2014 at 3:19 PM, Arthur Richards arichards@wikimedia.org wrote:
Greg, Rob, Tomasz and I just had an IRL conversation about this. Given some of the ambiguity of the test failures we've been discussing as related to 'infrastructure/architecture issues', we should be filing specific bug reports in Bugzilla regarding the issues we see. I'll follow up with the mobile web team directly to start digging into this. Further, it was clarified that Greg G has ownership of getting the issues resolved. We also agreed that for the time being, mobile-tech will be removed from the list of recipients of the failure emails until the issues are resolved. However, no one in the room was sure how to actually do this - Zeljko, Chris, Dan, is this something one of you can help out with?
Finally, I'd like to mention that none of the conversation on this thread was intended to question the integrity or validity of the hard work that the QA team has put into making improvements to the testing infrastructure. We're all on the same (figurative) team, and we understand that it takes time to iron out inevitable issues particularly when it pertains to complex systems, migrations, etc. At the end of the day, we're very eager to be able to fully leverage an automated test system to help us ship better quality stuff, and the heart of this conversation is about resolving the things currently standing in the way of that goal.
Arthur, thanks for the excellent recap and for the conversation earlier. I'm really happy we were able to hash things out, and my apologies again for being glib about the test failures.
Rob
On Sat, Jul 12, 2014 at 12:19 AM, Arthur Richards arichards@wikimedia.org wrote:
We also agreed that for the time being, mobile-tech will be removed from the list of recipients of the failure emails until the issues are resolved. However, no one in the room was sure how to actually do this - Zeljko, Chris, Dan, is this something one of you can help out with?
Done[1].
Željko -- 1: https://gerrit.wikimedia.org/r/#/c/146056/
On Fri, Jul 11, 2014 at 9:39 PM, Rob Lanphier robla@wikimedia.org wrote:
as a permanent fix, can we change the wait to 60 seconds and call it good?
That would be a workaround, not really a fix, and certainly not a permanent one. It is doable[1-2] and it might help, but it also means that every time a test genuinely fails because an element is not present, it will fail after waiting 60 seconds instead of 5. That might make the test runs longer.
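Rough arithmetic for that trade-off, as a sketch under one assumption: the wait polls and returns as soon as the element appears, so steps that eventually succeed cost the same under either timeout, and only waits for elements that never appear pay the full timeout. `extra_seconds` is a hypothetical helper and the failure count is made up.

```ruby
# Extra wall-clock time per test run from raising the element-wait timeout,
# assuming each genuinely failing scenario stops at its first missing element
# and therefore waits the full timeout exactly once.
def extra_seconds(failing_scenarios:, old_timeout:, new_timeout:)
  failing_scenarios * (new_timeout - old_timeout)
end

# Hypothetical run with 3 genuinely failing scenarios.
p extra_seconds(failing_scenarios: 3, old_timeout: 5, new_timeout: 60) # => 165
p extra_seconds(failing_scenarios: 3, old_timeout: 5, new_timeout: 10) # => 15
```

Under this model the extra time grows with the number of real failures, not with the number of steps, so a fully passing run is no slower.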
I will create a few test jobs and see if it helps.
A permanent fix would be to run the Jenkins job, the browser, and the MediaWiki instance on the same machine (or as close together as possible), instead of reaching over the internet to Sauce Labs, WMF labs or production.
How was 5 seconds arrived at as the time for an automated test to fail?
We can use any of three (yes, 3) APIs to drive the browser, but I think all three of them default to waiting 5 seconds for an element and then giving up.
Željko -- 1: https://code.google.com/p/selenium/wiki/RubyBindings#Implicit_waits 2: http://rdoc.info/gems/selenium-webdriver/Selenium/WebDriver/Timeouts#implici...
On Wed, Jul 9, 2014 at 3:57 PM, Juliusz Gonera jgonera@wikimedia.org wrote:
https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Mobi... getaddrinfo: Name or service not known (SocketError) - seems like a problem with network on saucelabs
Three Flow Chrome browsertests on beta labs run at Sauce Labs failed today with "getaddrinfo: Name or service not known (SocketError)" on Jul 16, 2014 6:26:46 PM (UTC?, I think 11:26 AM SF time). See https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Flow...
15 minutes earlier a Firefox test also failed with the getaddrinfo error, see https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Flow...
So I filed *Bug 68125* https://bugzilla.wikimedia.org/show_bug.cgi?id=68125 - browser tests failing with "getaddrinfo: Name or service not known (SocketError)"
[Sage manager] suggested
can we change the wait to 60 seconds and call it good? How was 5 seconds arrived at as the time for an automated test to fail?
The other Firefox test failure in that run happened because adding a topic took 6 seconds, thus triggering
timed out after 5 seconds, Element still visible after 5 seconds (Watir::Wait::TimeoutError)
Flow tests often fail with these timeouts, yet the expected result appears in the screencast or ends up on the test page. So yes, increasing the wait timeout to 10 seconds would cut down our false failures.
QA folk, is there a way to "grep" all browser tests for getaddrinfo and "timed out after 5 seconds" to see if there's a pattern to when and how often they occur?
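One way to look for such a pattern, sketched under assumptions: the console logs have already been saved (fetching them from Jenkins is left out), each paired with its build's start time. The hypothetical `error_counts_by_hour` counts, per UTC hour of day, how many builds hit each of the two errors. The sample entries are illustrative only, and they treat the build times as UTC, which is itself uncertain above.

```ruby
require "time"

# Error signatures mentioned in this thread.
PATTERNS = {
  getaddrinfo: /getaddrinfo: Name or service not known/,
  wait_timeout: /timed out after 5 seconds/
}.freeze

# builds: array of [start_time_string, console_log_text] pairs.
# Returns { pattern_name => { utc_hour => number_of_matching_builds } }.
def error_counts_by_hour(builds)
  counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  builds.each do |time_string, console|
    hour = Time.parse(time_string).utc.hour
    PATTERNS.each do |name, regex|
      counts[name][hour] += 1 if console.match?(regex)
    end
  end
  counts
end

sample = [
  ["2014-07-16 18:11:00 UTC", "getaddrinfo: Name or service not known (SocketError)"],
  ["2014-07-16 18:26:46 UTC", "getaddrinfo: Name or service not known (SocketError)"],
  ["2014-07-17 09:02:10 UTC", "timed out after 5 seconds, Element still visible after 5 seconds"]
]
p error_counts_by_hour(sample)
```

Counting builds rather than individual matches keeps one noisy log from dominating an hour's total.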
Thanks indeed,