RSA SecurID Access Service Incident (NA Region)
Incident Report for SecurID
Postmortem

Starting at 14:04 UTC on August 25, 2021, three significant events occurred simultaneously, leading to an authentication outage in [NA3.securid.com](http://na3.securid.com).

  1. Our cloud provider reported a database outage lasting two minutes.
  2. Our failure handling had three previously unidentified defects.
  3. The service received a high volume of API traffic.

The three simultaneous issues consumed all available processing resources, which caused a unique failure in the authentication front-end that prevented it from accepting connections.

We attempted to resolve this issue according to best practice by performing rolling restarts of affected nodes. However, this did not resolve the issue because the restarted nodes were immediately overwhelmed.

We reverted the software on the affected nodes to the July 2021 release, which allowed us to move the affected customers to a set of servers that was not impacted by these issues. The August 2021 release did not introduce the issues that caused the outage; the outage was caused by the three issues described above occurring simultaneously.

The following mitigations are being investigated to help avoid similar situations in the future:

  • Fix the three failure-handling defects.
  • Throttle unusual or incorrect admin API traffic.
  • Review our outage communication procedures and revise them as needed so that customers receive more timely updates.
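
As an illustration of the throttling mitigation listed above, the following is a minimal sketch of a per-client token-bucket limiter. It is not the SecurID implementation; the class names (`TokenBucket`, `AdminApiThrottle`), the `client_id` key, and the rate and capacity values are assumptions made for this example only.

```python
import time


class TokenBucket:
    """Permits `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity      # start with a full bucket
        self.updated = clock()

    def allow(self):
        # Refill in proportion to elapsed time, never beyond capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class AdminApiThrottle:
    """Hypothetical wrapper that keeps one bucket per API client."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, client_id):
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()
```

In a sketch like this, a request for which `allow()` returns `False` would typically be rejected with an HTTP 429 response, so that one client sending unusual traffic cannot consume capacity needed by others.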

Thank you,
The SecurID SaaS Operations Team

Posted Aug 31, 2021 - 16:26 UTC

Resolved
The SecurID team continues to investigate today’s outage on NA3. Out of an abundance of caution, our SaaS Operations Team has rolled back our August 2021 release and restored customers on na3.access.securid.com to our July 2021 release.

As a result of this rollback, your administrators will notice that recent user interface enhancements and identity router status enhancements have been removed from the system. These enhancements will be restored once we have completed our investigation and have a fix in place.

We apologize for any inconvenience this has caused. RSA will post a root cause analysis as soon as it is available.

Thank you,
The SecurID Team
Posted Aug 25, 2021 - 21:54 UTC

Monitoring
The issue affecting RSA SecurID Access has been corrected. The RSA SaaS Operations team is monitoring the fix.

RSA will post a root cause analysis as soon as it is available.
Posted Aug 25, 2021 - 17:40 UTC

Identified
RSA SaaS Operations has identified the cause of the issue and is working to implement a fix.
Posted Aug 25, 2021 - 17:20 UTC

Investigating
RSA has detected an issue affecting RSA SecurID Access.
RSA SaaS Operations is investigating the issue and will post updates as they become available.
Posted Aug 25, 2021 - 14:14 UTC
This incident affected: NA (na3.access Authentication Service).