2019-December-13 Service Incident
Incident Report for Sauce Labs US West Data Center
Postmortem

Date:

December 13th, 9:02 - 9:17 PST

What happened:

Some customers experienced high wait times when running tests in our PC, Mac, and/or Android clouds in our US West data center.

Why it happened:

A failure within our DNS infrastructure resulted in a loss of connectivity from an availability zone to our control plane for 20 to 40 minutes at a time.

How we fixed it:

We’ve modified the configuration of our DNS servers to respond to the error condition and recover immediately.

What we are doing to prevent it from happening again:

  • We’ve developed extensive playbooks that will allow us to diagnose and respond more rapidly to an event where our VMs have lost connectivity to our control plane.
  • We’ve deployed new test infrastructure that validates our configuration and gives us rapid feedback on the condition and quality of our environment.
  • We’ve deployed additional logging and monitoring of our Load Balancing and DNS infrastructure.
  • We’ve configured additional network-level logging to provide advanced diagnostics and replay of failure scenarios.
Posted Jan 09, 2020 - 10:25 PST

Resolved
Wait times have returned to normal levels on our PC Cloud. All services are now fully operational.
Posted Dec 13, 2019 - 09:17 PST
Investigating
Wait times are high on our PC Cloud. We are investigating.
Posted Dec 13, 2019 - 09:02 PST
This incident affected: Automated VM Testing (Automated PC Testing).