2019-04-04 Service Incident (2)
Incident Report for Sauce Labs US West Data Center
Postmortem

Dates:

April 4th, 2019 21:21 - April 5th, 2019 00:03 PDT

What happened:

Customers of the Real Device Cloud in the US may have experienced failures when starting Appium tests and Live Testing sessions.

Why it happened:

A bug fix pushed into production increased the overall number of threads, which starved a key service. This issue was not apparent in pre-production environments and only appeared under full production load and high cloud capacity. We did not detect the issue earlier because our application monitoring solution was not well tailored to observing some key metrics and alerting on anomalies.

How we fixed it:

We rolled back the change as soon as we identified the root cause of the issue.

What we are doing to prevent it from happening again:

We are reworking the bug fix to avoid the thread issue. Application monitoring and alerting are being enhanced so that such issues can be detected, localized, and resolved more efficiently. We are also looking into ways to enhance our load tests so that such issues are detected before changes are rolled out to production.
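
For illustration only, the kind of anomaly alerting described above can be sketched as a check that compares the latest thread count against a recent baseline. This is a minimal sketch under stated assumptions, not the monitoring actually deployed: the function name, the three-sigma threshold, the minimum sample count, and the sample values are all hypothetical.

```python
import statistics
import threading


def is_thread_count_anomalous(history, current, sigma=3.0, min_samples=5):
    """Return True if `current` is more than `sigma` standard deviations
    above the mean of `history` (hypothetical helper, illustrative only)."""
    if len(history) < min_samples:
        # Not enough samples to form a meaningful baseline.
        return False
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    # Use a floor of 1.0 so a perfectly flat baseline does not
    # flag every single-thread fluctuation.
    threshold = mean + sigma * max(stdev, 1.0)
    return current > threshold


# Hypothetical baseline of recent thread-count samples for one service.
baseline = [40, 42, 41, 39, 43, 40]

print(is_thread_count_anomalous(baseline, 44))   # small fluctuation
print(is_thread_count_anomalous(baseline, 120))  # sharp increase

# In a Python process, the live thread count could be sampled like this.
print(threading.active_count())
```

A check of this shape only helps if it is fed production-level data, since the incident above did not reproduce in pre-production environments.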

Posted Apr 11, 2019 - 16:44 PDT

Resolved
The problem has been resolved. All services are fully operational. Please contact support@saucelabs.com if you continue to experience issues.
Posted Apr 04, 2019 - 23:58 PDT
Update
RDC Automated and Manual Tests are still offline. Our engineers have eliminated several potential causes and continue to investigate.
Posted Apr 04, 2019 - 23:08 PDT
Investigating
Automated and Manual Real Device tests are offline. The Real Device REST API may be unresponsive. All virtual device tests are functioning normally. Our engineers are investigating.
Posted Apr 04, 2019 - 22:04 PDT
This incident affected: Manual Testing (Manual RDC Testing), REST API (REST API RDC), and Automated RDC Testing (Automated RDC Testing).