Wait times are intermittently spiking to 100 or more seconds on all clouds. These spikes only last about three minutes and occur about once every two hours. We are investigating the cause.
2019-August-08 Service Incident
Incident Report for Sauce Labs US West Data Center
Postmortem

Dates:

August 8th, 2019, 07:35 - 13:22 PDT

What happened:

Customers were intermittently unable to start tests, and wait times exceeded normal durations on the Windows and Linux (PC/Linux) cloud.

Why it happened:

A routine code deploy caused a short disruption to our service. Short disruptions like this are normal for any cloud service, and ours is designed to recover within seconds with no impact on customer testing. However, a combination of recent changes made the normal recovery process fail on the PC/Linux cloud: VM boot times had become slightly slower, and the cloud had recently grown in size. The code deploy also coincided with a high volume of tests, which made recovery even more difficult. Consequently, the PC/Linux cloud was unable to boot enough VMs of the most requested images to keep up with demand.
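
To illustrate why a slightly slower boot time can tip a pool from keeping up to falling behind, here is a minimal sketch of the arithmetic. It is not the actual scheduler: the function name, the idea of a fixed number of concurrent boot slots, and every number below are hypothetical and chosen only to show the effect.

```python
def simulate_backlog(arrivals_per_min, boot_slots, boot_minutes, minutes):
    """Return the number of queued test requests at the end of each minute.

    Hypothetical model: each of `boot_slots` concurrent boots finishes one VM
    every `boot_minutes`, so steady-state supply is boot_slots / boot_minutes.
    """
    boots_per_min = boot_slots / boot_minutes
    backlog = 0.0
    history = []
    for _ in range(minutes):
        # Demand that cannot be served this minute carries over to the next.
        backlog = max(0.0, backlog + arrivals_per_min - boots_per_min)
        history.append(backlog)
    return history


# Illustrative numbers only, not measurements from the incident.
healthy = simulate_backlog(arrivals_per_min=100, boot_slots=220,
                           boot_minutes=2.0, minutes=30)
degraded = simulate_backlog(arrivals_per_min=100, boot_slots=220,
                            boot_minutes=2.5, minutes=30)

print("backlog after 30 min, 2.0 min boots:", healthy[-1])
print("backlog after 30 min, 2.5 min boots:", degraded[-1])
```

In this toy model, supply drops from 110 to 88 VMs per minute when boots take 2.5 minutes instead of 2.0, so the same demand of 100 requests per minute goes from fully served to a backlog that grows every minute until demand falls or capacity is added.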

How we fixed it:

We corrected this issue by throttling traffic and manually adjusting our scheduling logic until the system cleared the backlog of customer demand.
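
The report does not say how traffic was throttled. As one common way to do it, the sketch below uses a token bucket, which admits requests at a bounded rate so that a backlog can drain. The class name, parameters, and injectable clock are assumptions made for illustration and do not describe the actual tooling.

```python
import time


class TokenBucket:
    """Hypothetical rate limiter: admits at most `rate_per_sec` requests per
    second on average, with bursts of up to `capacity` requests."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock          # injectable so the bucket can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if a new request may start now, False if it must wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=2, capacity=2, clock=lambda: fake_now[0])
first_three = [bucket.allow(), bucket.allow(), bucket.allow()]
fake_now[0] = 0.5                   # half a second later: one token refilled
after_wait = [bucket.allow(), bucket.allow()]
```

Lowering `rate_per_sec` below the rate at which VMs can be booted lets queued demand clear, and the limit can be raised again once the backlog is gone.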

What we are doing to prevent it from happening again:

We’ve improved the performance and logic of the key cloud state management service, addressed slow boot times on the affected hypervisors, and added better alerting. In addition, we developed a new set of tools and procedures so that we can respond much more quickly should a similar issue arise. We are also reviewing our test coverage for the scheduler to find any similar corner cases we may have missed.

Posted Aug 15, 2019 - 17:46 PDT

Resolved
All services are now fully operational.
Posted Aug 08, 2019 - 13:22 PDT
Update
The PC cloud is recovering but still experiencing high wait times and failures. We are taking remedial action.
Posted Aug 08, 2019 - 11:12 PDT
Update
Mac and iOS Simulator tests are now running normally. We continue to investigate high wait times in our PC cloud.
Posted Aug 08, 2019 - 09:13 PDT
Update
We are continuing to investigate this issue.
Posted Aug 08, 2019 - 08:49 PDT
Investigating
We are experiencing high wait times on iOS simulators, PC and Mac. We are investigating.
Posted Aug 08, 2019 - 08:07 PDT
This incident affected: Manual Testing (Manual VM Testing) and Automated VM Testing (Automated PC Testing, Automated Mac Testing, Automated iOS Simulator Testing).