RSA ID Plus Service Incident (NA Region)
Incident Report for RSA ID Plus
Postmortem

PRELIMINARY RCA

Summary:

On May 7, 2024, between approximately 12:45 and 14:20 UTC, a service degradation affected a subset of our customers in the North American region who are hosted on our NA2 authentication components. The degradation primarily affected browser-based authentication workflows; workflows that do not rely on the hosted Authentication UI experienced minimal disruption.

Root Cause Analysis:

The incident stemmed from a failure in our front-end service tier that caused requests to be routed to a degraded node. The issue was compounded by inconsistent results from internal health service checks, which returned the node to service before it had fully recovered. As a result, the node processed incoming requests too slowly, causing authentication requests to time out.
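
The failure mode described above, a node returning to service on inconsistent health results, is commonly guarded against with hysteresis: a node is re-admitted only after several consecutive passing probes, and a slow probe is treated as a failure. The following is a minimal illustrative sketch of that general technique, not a description of the ID Plus implementation. The class name `NodeHealthGate` and the `rise`, `fall`, and `max_latency_ms` values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeHealthGate:
    """Tracks probe results for one node and decides whether it may take traffic.

    All thresholds are illustrative assumptions, not values from the incident.
    """

    rise: int = 3                  # consecutive healthy probes needed to re-admit
    fall: int = 2                  # consecutive unhealthy probes needed to eject
    max_latency_ms: float = 500.0  # a probe slower than this counts as unhealthy
    in_service: bool = True
    _passes: int = 0
    _fails: int = 0

    def record_probe(self, ok: bool, latency_ms: float) -> bool:
        """Record one probe result and return whether the node is in service."""
        healthy = ok and latency_ms <= self.max_latency_ms
        if healthy:
            self._passes += 1
            self._fails = 0
            # Re-admit only after an unbroken run of healthy probes.
            if not self.in_service and self._passes >= self.rise:
                self.in_service = True
        else:
            self._fails += 1
            self._passes = 0
            # Eject after an unbroken run of unhealthy probes.
            if self.in_service and self._fails >= self.fall:
                self.in_service = False
        return self.in_service
```

With these assumed thresholds, a single passing probe that follows a failure does not return the node to service; any failure resets the count of passes, so intermittent results keep the node out of rotation until it is consistently healthy.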

Mitigations:

In response to this incident, RSA is actively enhancing the ID Plus service and related processes with the following measures:

  1. Upgrade of SSL Library: Addressing an edge-case performance flaw in a specific SSL library, which was identified as the ultimate root cause of the node failure.

  2. Enhanced Monitoring and Alerting: Implementation of advanced monitoring and alerting systems to promptly detect and mitigate degraded performance in front-end clustering.

  3. Incident Response Enhancement: Revision and enhancement of incident response procedures to incorporate specific protocols for managing failures in front-end clustering and traffic misrouting.

These proactive steps aim to fortify our systems and processes, ensuring improved resilience and reliability in service delivery.
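
As an illustration of the kind of check that monitoring for degraded front-end nodes might include, the sketch below flags nodes whose median request latency is far above the cluster-wide median. It is a hedged example under stated assumptions, not RSA's monitoring design. The function name `find_slow_nodes`, the input shape, and the `ratio` threshold are hypothetical.

```python
from statistics import median


def find_slow_nodes(
    latency_ms_by_node: dict[str, list[float]],
    ratio: float = 3.0,  # assumed threshold: flag nodes slower than 3x the cluster median
) -> list[str]:
    """Return nodes whose median latency exceeds `ratio` times the cluster median."""
    # Median latency per node, skipping nodes with no samples.
    node_medians = {
        node: median(samples)
        for node, samples in latency_ms_by_node.items()
        if samples
    }
    # With fewer than three nodes, a single outlier skews the cluster median too much.
    if len(node_medians) < 3:
        return []
    cluster_median = median(node_medians.values())
    return sorted(
        node for node, value in node_medians.items() if value > ratio * cluster_median
    )
```

Comparing each node against its peers, rather than against a fixed latency limit, is one way to surface a single slow node even when overall cluster latency still looks acceptable.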

Posted May 16, 2024 - 21:21 UTC

Resolved
After monitoring the fix, SaaS Operations has determined that the incident affecting RSA ID Plus has been resolved.

We will post a root cause analysis as soon as it is available.
Posted May 07, 2024 - 15:55 UTC

Monitoring
The issue affecting RSA ID Plus has been corrected. The SaaS Operations team is monitoring the fix.

We will post a root cause analysis as soon as it is available.
Posted May 07, 2024 - 15:19 UTC

Investigating
We have detected an issue affecting RSA ID Plus.
SaaS Operations is investigating the issue and will post updates as they become available.
Posted May 07, 2024 - 14:05 UTC

This incident affected: NA (na2.access Authentication Service).