July 8th, 2019 7:00 PM - 8:20 PM PDT
New Sauce Connect tunnels were unable to start for a short period of time. Afterwards, some tunnels started while others failed, because we throttled new requests to let our US-West Tunnels Cloud work through the backlog of requests and stabilize.
During a deploy to our US-West region, a subsystem's configuration was changed to an incorrect value, which left our Tunnels Cloud unable to accept new tunnel requests. The change was quickly reverted, but by then a backlog of new tunnel requests had accumulated, and our Tunnels Cloud became overwhelmed working through it.
We corrected the original issue by returning the configuration to its proper value and restarting the related subsystem. We also throttled new tunnel requests until our Tunnels Cloud was able to stabilize.
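For readers unfamiliar with request throttling, the idea can be illustrated with a token bucket, a common rate-limiting technique. This is a minimal sketch only: the report does not say which mechanism was actually used, and the `TokenBucket` class, its parameters, and the numbers below are hypothetical.

```python
import time


class TokenBucket:
    """Hypothetical throttle: admit about `rate` requests per second,
    allowing bursts of up to `capacity` requests."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum tokens held at once
        self.clock = clock          # injectable so the sketch can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if a new request may proceed, False if it is throttled."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With a scheme like this, requests that arrive faster than the refill rate are rejected rather than queued, which matches the behavior described above: some new tunnels start while others fail until the backlog clears.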
We’ve begun auditing all configuration settings across both regions and documenting known variances. We’ve added logic to our Tunnels Cloud services to better handle this edge case and prevent this error from recurring. We’ve also introduced a new module and method that verifies service health as part of our deploy process.
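As an illustration of what a cross-region configuration audit can look like, the sketch below compares two settings dictionaries and reports every difference that is not on a list of documented variances. It is an assumption-laden example, not our actual tooling: the function name, the setting keys, and the values are all hypothetical.

```python
def find_unexpected_variances(region_a, region_b, known_variances=frozenset()):
    """Return {key: (value_a, value_b)} for settings that differ between two
    regions and are not listed as known variances.

    A setting missing from one region is reported with None on that side.
    """
    diffs = {}
    for key in sorted(set(region_a) | set(region_b)):
        a, b = region_a.get(key), region_b.get(key)
        if a != b and key not in known_variances:
            diffs[key] = (a, b)
    return diffs


# Hypothetical settings, for illustration only.
us_west = {"region_name": "us-west", "max_pending_requests": 500, "queue_timeout_s": 30}
other_region = {"region_name": "other", "max_pending_requests": 500, "queue_timeout_s": 300}

unexpected = find_unexpected_variances(us_west, other_region, known_variances={"region_name"})
```

Here `unexpected` contains only `queue_timeout_s`, because `region_name` is expected to differ and `max_pending_requests` matches. A check of this kind could run before a deploy so that an undocumented difference is flagged for review.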