2019-December-12 Service Incident
Incident Report for Sauce Labs US West Data Center
Postmortem

Date:

December 12th, 10:40 - 11:10 PST

What happened:

Some customers experienced high wait times when running tests in our PC, Mac, and/or Android clouds in our US West data center.

Why it happened:

A failure within our DNS infrastructure resulted in a loss of connectivity from an availability zone to our control plane for 20 to 40 minutes at a time.

How we fixed it:

We’ve modified the configuration of our DNS servers to respond to the error condition and recover immediately.

What we are doing to prevent it from happening again:

  • We’ve developed extensive playbooks that will allow us to diagnose and respond more rapidly to an event where our VMs have lost connectivity to our control plane.
  • We’ve deployed new test infrastructure that validates our configuration and gives us rapid feedback on the condition and quality of our environment.
  • We’ve deployed additional logging and monitoring of our Load Balancing and DNS infrastructure.
  • We’ve configured additional network-level logging to provide advanced diagnostics and replay of failure scenarios.
Posted Jan 09, 2020 - 10:25 PST

Resolved
Wait times are back to normal. All services are now fully operational.
Posted Dec 12, 2019 - 11:54 PST
Monitoring
We have taken remedial action and tests should be starting. We are monitoring closely.
Posted Dec 12, 2019 - 11:29 PST
Investigating
Wait times on our Mac and iOS Simulator Cloud are high. We are investigating.
Posted Dec 12, 2019 - 11:06 PST
This incident affected: Automated VM Testing (Automated Mac Testing, Automated iOS Simulator Testing).