2019-February-13 Service Incident
Incident Report for Sauce Labs US West Data Center
Postmortem

Dates:

February 13th, 2019 1:35 - 5:00 pm PST

What happened:

Some tests failed to run and new tunnels would not start reliably.

Why it happened:

Shortly after 1:00 pm, our Sauce Connect Tunnel Cloud experienced a burst in peak usage and could not boot new tunnels fast enough to meet demand. We also discovered a bug that increased boot times under extreme contention, which prolonged the recovery.
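
The interaction between a usage burst and contention-dependent boot times can be illustrated with a toy queue model. This is only a sketch under stated assumptions: the function name, the parameters, and every number below are hypothetical and are not measurements from the incident. It shows how a backlog of pending tunnel starts drains steadily when the boot rate is constant, and much more slowly when the boot rate drops as the backlog grows.

```python
def simulate_backlog(arrivals, base_boot_rate, penalty_step=None):
    """Return the number of pending tunnel-start requests after each time step.

    arrivals:       new start requests arriving in each step (hypothetical data)
    base_boot_rate: tunnels that can be booted per step without contention
    penalty_step:   if set, one boot per step is lost for every `penalty_step`
                    pending requests (a hypothetical stand-in for the
                    contention bug); None means the boot rate stays constant
    """
    backlog = 0
    history = []
    for arriving in arrivals:
        backlog += arriving
        rate = base_boot_rate
        if penalty_step:
            # Boot rate degrades as more requests contend, but never below 1.
            rate = max(1, base_boot_rate - backlog // penalty_step)
        backlog -= min(backlog, rate)
        history.append(backlog)
    return history


# A steady load of 5 requests per step with a single burst of 40.
arrivals = [5, 5, 40, 5, 5, 5, 5, 5, 5, 5]
uncontended = simulate_backlog(arrivals, base_boot_rate=10)
contended = simulate_backlog(arrivals, base_boot_rate=10, penalty_step=10)
```

With these made-up inputs, the constant-rate run clears its backlog before the last step, while the contended run still has requests pending at the end, which is the "prolonged recovery" pattern described above.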

How we fixed it:

We throttled requests to start Sauce Connect, allowing our Tunnel Cloud to stabilize.
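
As an illustration of what throttling start requests can look like, here is a minimal token-bucket sketch. It is not the mechanism actually used during the incident, which this report does not describe; the class name, the `rate` and `capacity` parameters, and the example values are all assumptions made for illustration.

```python
class TokenBucket:
    """Admit roughly `rate` requests per second, with bursts up to `capacity`.

    Hypothetical helper for illustration only.
    """

    def __init__(self, rate, capacity):
        self.rate = float(rate)          # tokens refilled per second
        self.capacity = float(capacity)  # maximum number of stored tokens
        self.tokens = float(capacity)    # start with a full bucket
        self.last = 0.0                  # timestamp of the previous call

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) may proceed."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical limits: 2 tunnel starts per second, bursts of up to 3.
bucket = TokenBucket(rate=2.0, capacity=3.0)
burst = [bucket.allow(now=0.0) for _ in range(5)]  # five requests at once
later = bucket.allow(now=0.5)                      # one request 0.5 s later
```

Passing `now` explicitly keeps the sketch deterministic; a real service would more likely read a monotonic clock such as `time.monotonic()` and decide separately whether rejected requests are queued or returned to the client for retry.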

What we are doing to prevent it from happening again:

Between February 13th and February 20th we added 50% more capacity to our Tunnel Cloud, and we are working on changes that will allow us to scale more effectively under peak usage. We have also prepared a number of new nodes with an updated technology stack, which we will use to validate improvements in performance and tunnel density.

Additional monitoring and alerting are now in place, allowing us to respond rapidly to the kind of tunnel node degradation that has led to poor performance and higher wait times for affected customers over the last two weeks. We have not experienced any tunnel downtime since the outage on Wednesday, February 13th, and we continue to respond proactively to tunnel issues before they affect service availability.

Longer term, we will continue tuning our code, updating our technology stack, and implementing better guard rails to prevent irregular behaviour. We will also improve our tunnel load balancing implementation and capacity modelling to provide a more performant and reliable tunnel experience.

Posted Mar 04, 2019 - 04:25 PST

Resolved
Tunnel startup and wait times are back to normal. All systems are operational.
Posted Feb 13, 2019 - 18:11 PST
Monitoring
We’ve taken remedial measures, and both wait times and tunnels are back to normal. We will continue to monitor.
Posted Feb 13, 2019 - 18:01 PST
Update
We are still seeing high wait times on all our clouds and Sauce Connect tunnels are failing to start. We are still investigating.
Posted Feb 13, 2019 - 15:34 PST
Investigating
We are experiencing high wait times on all our clouds and some Sauce Connect tunnels are failing to start. We are investigating.
Posted Feb 13, 2019 - 14:59 PST
This incident affected: Automated VM Testing (Automated PC Testing, Automated Mac Testing, Automated iOS Simulator Testing, Automated Android Emulator Testing) and Sauce Connect (Sauce Connect VM).