Service Incident Notification – RSA SecurID Access - EU Region
Incident Report for RSA SecurID Access
Postmortem

Published: July 30, 2020

ROOT CAUSE

The Cloud Authentication Services were affected by high CPU load and latency on the database cluster deployed in the EU region. The high CPU load was caused by the Azure MSSQL database engine selecting a poor query plan, because of outdated statistics, for a query used in all authentications.

RSA Engineering is working closely with our cloud service provider to determine why the database engine chose a query plan different from the usual one at that time. RSA SaaS Operations has dramatically increased the base processing power of the database cluster in the EU region to mitigate this risk. We have been continuously monitoring for this event since that time, and there have been no further signs of excessive database usage in this environment.

RSA Engineering continues to investigate the problem and will issue an update to this RCA if further information about the cause is determined.

RECOVERY

RSA is continuously taking steps to improve the RSA SecurID Access service and our processes to help ensure such incidents do not occur in the future. In this case, steps include (but are not limited to):

  • Validate statistics updates and explore the use of automatic tuning options.
  • Optimize the query used in all authentications so that a sub-optimal query plan is less likely to be selected.
  • Increase the base processing power of the database cluster in the EU region to mitigate this risk (already completed).
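
As an illustration of what validating statistics updates could involve, the following is a minimal sketch only, not RSA's actual tooling. It assumes that per-statistic metadata (last update time, row count, rows modified since the last update) has already been read from the database. The names StatsInfo, is_stale and stale_statistics, and both thresholds, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class StatsInfo:
    """Hypothetical snapshot of one statistics object's metadata."""
    name: str
    last_updated: datetime   # when the statistics were last refreshed
    rows: int                # rows in the table at the last refresh
    modified_rows: int       # rows changed since the last refresh


def is_stale(info, now, max_age=timedelta(days=1), max_modified_fraction=0.10):
    """Return True if the statistics look outdated.

    Both thresholds are illustrative assumptions, not values from the
    incident report or from any database engine's defaults.
    """
    if now - info.last_updated > max_age:
        return True
    if info.rows == 0:
        # Nothing was sampled before; any change makes the stats outdated.
        return info.modified_rows > 0
    return info.modified_rows / info.rows > max_modified_fraction


def stale_statistics(infos, now, **thresholds):
    """Return the names of all statistics flagged as stale, in input order."""
    return [i.name for i in infos if is_stale(i, now, **thresholds)]


if __name__ == "__main__":
    now = datetime(2020, 7, 16, 6, 10, tzinfo=timezone.utc)
    sample = [
        StatsInfo("ix_fresh", now - timedelta(hours=1), 1000, 10),
        StatsInfo("ix_old", now - timedelta(days=3), 1000, 0),
        StatsInfo("ix_churned", now - timedelta(hours=1), 1000, 500),
    ]
    print(stale_statistics(sample, now))
```

A check of this kind could run on a schedule and raise an alert before outdated statistics lead the query optimizer to a poor plan. Suitable thresholds would depend on the workload.
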
Posted Oct 02, 2020 - 16:24 UTC

Resolved
Between approximately 06:10 and 07:05 GMT on July 16, 2020, RSA SecurID Access hosted in the EU region experienced an incident caused by high load on its database resources.
Posted Jul 16, 2020 - 06:10 UTC