Wait times are intermittently spiking to 100 or more seconds on all clouds. These spikes only last about three minutes and occur about once every two hours. We are investigating the cause.
2020-January-17 Resolved Service Incident
Incident Report for Sauce Labs US West Data Center
Postmortem

Dates:

January 17th, 2020 6:29 am - 6:39 am PST

What happened:

Some customers were intermittently unable to start tests, and wait times exceeded normal durations on all clouds.

Why it happened:

A configuration change broke communication between two services. The change affected all customers that are part of the extended team management platform. The scheduling system was unable to retrieve entitlements, so customer sessions were queued instead of being assigned cloud resources.

How we fixed it:

We reverted to the previous stable service version immediately after detecting the problem. This resulted in rapid recovery of the system.

What we are doing to prevent it from happening again:

We are adding an extra automated step to the deployment pipeline that will confirm the validity of the configuration from the application's perspective. In addition, we are reviewing our monitoring so that we can detect possible connection issues sooner, before they propagate to the rest of the system.
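
As a rough illustration of what such a pre-deployment check could look like, the sketch below validates a candidate configuration before rollout. It is not the actual pipeline step: the setting names (`entitlements_url`, `timeout_seconds`), the function `validate_config`, and the optional `probe` callable are all hypothetical and chosen only for the example.

```python
from urllib.parse import urlparse

# Hypothetical setting names; the real service configuration is not
# described in this report.
REQUIRED_KEYS = ("entitlements_url", "timeout_seconds")


def validate_config(config, probe=None):
    """Return a list of problems found in config (empty list means it passed).

    probe is an optional callable taking the URL and returning True when the
    dependency is reachable; it stands in for a real connectivity check.
    """
    problems = []

    # Every required setting must be present.
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing setting: {key}")

    # The dependency URL must be well formed and, if a probe is given, reachable.
    url = config.get("entitlements_url")
    if url is not None:
        parsed = urlparse(str(url))
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            problems.append(f"invalid entitlements_url: {url!r}")
        elif probe is not None and not probe(url):
            problems.append(f"cannot reach entitlements service at {url}")

    # The timeout must be a positive number.
    timeout = config.get("timeout_seconds")
    if timeout is not None:
        is_number = isinstance(timeout, (int, float)) and not isinstance(timeout, bool)
        if not is_number or timeout <= 0:
            problems.append(f"invalid timeout_seconds: {timeout!r}")

    return problems
```

A pipeline step built this way would block the deployment whenever the returned list is non-empty, so a configuration that breaks communication between services is caught before it reaches production.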

Posted Feb 06, 2020 - 14:16 PST

Resolved
Wait times exceeded 30 seconds between 6:30 am and 7:00 am (PST), leading to some jobs timing out before starting.
We have taken remedial action, and all services are now fully operational.
Posted Jan 17, 2020 - 08:15 PST