June 7, 2017
San Francisco-based identity management company OneLogin Inc. suffered a major breach last week on Wednesday, May 31st. According to information provided by OneLogin, a malicious actor obtained access to the company's Amazon Web Services (AWS) account through a set of compromised API keys. Using the API keys as the primary attack vector, the threat actor (i.e., the hacker) was able to steal sensitive customer data, including encryption keys and passwords. OneLogin provides single sign-on (SSO) as a service using protocols such as SAML (Security Assertion Markup Language), along with password, policy and identity management. And the unfortunate thing? It was all easily preventable.
Around 2:00 a.m. San Francisco time, a threat actor began starting new virtual instances inside OneLogin's AWS account. Using these instances and the stolen API credentials, the actor conducted reconnaissance, searching for information to steal. Over seven hours later, OneLogin staff, beginning their morning at 9:00 a.m., finally noticed "unusual activity" within their production databases containing sensitive customer information. At this point, OneLogin staff discovered the beachhead created by the attacker and began to secure the environment, but not before the attacker was able to steal any and all information contained within OneLogin's databases.
According to OneLogin, the threat actor had sufficient time and authorization to access information about users, including their full names and email addresses, stored passwords and encryption keys. While some data deemed sensitive by OneLogin was encrypted at rest, the company could not rule out the possibility that the attacker had gained the ability to decrypt it, although OneLogin did not say why they believe this or how it would have been possible. In the notice sent to customers two days after the breach, customers were asked to change all of their passwords, generate new encryption keys and certificates, and have applications using OAuth generate new authentication credentials, essentially replacing all of the data customers use for SSO, authentication and identity and access management.
API keys provide the ability to make remote program calls to create, update, modify or destroy configuration within an account based upon the role(s) assigned to the API key credentials. Roles contain permissions that permit or deny access to features and functions within an account, sometimes limiting access to specific regions or more granular containers. The role(s) assigned to the stolen credentials apparently had sufficient rights to create new instances, modify security groups and access databases.
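To make the idea concrete, here is a minimal sketch of a least-privilege policy that could be attached to an API key's role. The resource scope and CIDR range are hypothetical: it allows only read-only EC2 describe calls, and only from a known corporate network.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadOnlyFromKnownNetwork",
      "Effect": "Allow",
      "Action": ["ec2:Describe*"],
      "Resource": "*",
      "Condition": {
        "IpAddress": {"aws:SourceIp": "203.0.113.0/24"}
      }
    }
  ]
}
```

A key scoped like this could not launch instances, modify security groups or touch databases, no matter where it was stolen from.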
Since 2012, AWS has provided the ability to attach roles to instances when they are launched, and since February 2017, users can attach roles to instances that are already running. When a role is attached to an instance, any API call made from that instance carries the authority of the role: temporary IAM credentials are automatically provisioned to the instance, allowing calls to the AWS APIs to assume the permissions of the assigned role(s). This eliminates the need to generate and distribute long-lived access keys to use the APIs within AWS.
Since 2011, temporary credentials could be created through the AWS Security Token Service (STS), enabling federated access to AWS APIs and services. Given an IAM role, a call to STS generates a temporary access token with privileges no greater than the requestor's. Let's say a software-as-a-service (SaaS) application needs to grant a user permission to write to their own specific portion of an S3 bucket. The application's IAM credential, granted either explicitly with a long-lived access key (NO NO!) or, ideally, through an assigned EC2 instance role, requests a temporary credential from STS with rights to only that slice of S3. The application can even limit the time the temporary credential is valid. This temporary token can be used directly by the SaaS or returned to other instances or applications for direct use, ensuring that the principle of least privilege is maintained and limited to a specific duration.
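A sketch of this pattern in Python follows. The bucket and user names are hypothetical, and the actual STS call (here `get_federation_token`; `assume_role` with a session policy is another way to do it) requires boto3 and valid AWS credentials, so it is isolated in its own function. The policy-building logic itself is plain Python.

```python
import json

def scoped_s3_policy(bucket, user_id):
    """Session policy granting access only to one user's prefix in a bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": f"arn:aws:s3:::{bucket}/{user_id}/*",
        }],
    }

def get_scoped_credentials(bucket, user_id, ttl_seconds=900):
    """Ask STS for short-lived credentials limited by the session policy.

    Sketch only: requires boto3 and AWS credentials, so the import is
    local and the function is never called in this example.
    """
    import boto3
    sts = boto3.client("sts")
    resp = sts.get_federation_token(
        Name=f"user-{user_id}",
        Policy=json.dumps(scoped_s3_policy(bucket, user_id)),
        DurationSeconds=ttl_seconds,  # 900 seconds is the STS minimum
    )
    return resp["Credentials"]
```

Handing out credentials built this way means a stolen token exposes one user's prefix for fifteen minutes, not the whole bucket forever.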
Consider the last hotel you stayed at. More than likely, the access control to your room used some form of electronic key. Think of this as your temporary access token. The holder of the most authority, the front desk, created a temporary access token, your room key, with access only to your room and only for the duration of your stay. This is analogous to temporary access tokens issued by STS in AWS.
Multi-factor authentication (MFA) is becoming ubiquitous among vendors who want to give customers the ability to secure their accounts with more than just a password. AWS prompts for MFA natively at console sign-in, and IAM policies can also require MFA for API and CLI access through short-lived session tokens. Since we don't know how the API keys were stolen, we would be remiss if we did not point out this extremely important feature of AWS Identity and Access Management (IAM).
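A widely used pattern for enforcing this is a policy that denies every action unless the caller authenticated with MFA. A hedged sketch (the statement ID is arbitrary):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyAllWithoutMFA",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
      }
    }
  ]
}
```

With a policy like this attached, a stolen long-lived access key is useless on its own: the attacker would also need a valid MFA token to mint a session that passes the condition.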
While we do not yet know the details of how the API keys were used to gain elevated privilege and access the customer databases, another surprising element of the OneLogin breach is the length of time the threat actor spent inside the account without detection. Vulnerabilities are part of any risk management program, and not every risk or vulnerability can be completely eliminated. For this reason, early detection and a rapid incident response team limit the damage caused by attacks such as these. Yet it took OneLogin staff over seven hours to detect the breach before implementing a response plan, and then only after noticing "unusual database activity." Why didn't anyone detect the newly created instances? Why didn't anyone detect the use of API keys from unknown source IPs? Where were the controls to alert the staff to this unusual activity?
Unless you encase your computer in a block of cement and drop it into the deepest trench in the ocean, every system is vulnerable. Early detection of a breach can mitigate or even prevent data loss or further compromise. In a case study of Sony's 2014 security breach, which led to the disclosure of over 4,000 private employee records, the findings concluded that while Sony failed to properly implement some critical security controls, the length of time before detection played a key role in the data loss.
It took over seven hours before OneLogin detected the breach, according to OneLogin's own account to its customers. There are several services within AWS that might have prevented any data loss had they been properly implemented. Simple Notification Service (SNS) enables notifications via SMS, email and push notification to mobile devices. AWS Config records the configuration of resources across an AWS account and tracks changes to them. And finally, AWS Lambda allows for serverless execution of functions (i.e., custom code) in response to events.
When Lambda, Config and SNS are properly wired together, they can detect changes to security groups, network access control lists, IAM credentials and basically anything that generates an AWS Config event. There are plenty of examples illustrating this concept. Using the API credentials alone would not have provided easy access to customer data. But the moment the attacker started an instance, cloned a volume or changed a security group, an alert could have gone to a security operations team and incident response could have begun right then.
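The core of such a Lambda is only a few lines. The following sketch follows the shape of CloudTrail records, but the list of sensitive calls is illustrative, and `publish` is an injected stand-in for an SNS publish call so the detection logic can be exercised offline.

```python
# API calls that warrant an immediate page -- an illustrative list.
SENSITIVE_CALLS = {
    "RunInstances",
    "AuthorizeSecurityGroupIngress",
    "CreateAccessKey",
    "CreateSnapshot",
}

def handler(event, publish=print):
    """Inspect CloudTrail-style records and alert on sensitive API calls.

    `publish` stands in for sns.publish; injecting it keeps the
    detection logic testable without AWS access.
    """
    alerts = []
    for record in event.get("Records", []):
        name = record.get("eventName")
        if name in SENSITIVE_CALLS:
            who = record.get("userIdentity", {}).get("arn", "unknown")
            ip = record.get("sourceIPAddress", "unknown")
            msg = f"ALERT: {name} by {who} from {ip}"
            publish(msg)
            alerts.append(msg)
    return alerts
```

In a real deployment, CloudWatch Events (or a Config rule) would invoke this handler and `publish` would be `boto3.client("sns").publish` targeting the security team's topic.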
Another approach to early detection could use SNS, AWS Redshift and AWS CloudTrail. CloudTrail produces logs of API transactions including information about the credentials used, source IP and result of the API call. AWS Redshift is a data warehouse solution that can ingest CloudTrail logs. Using Lambda, previously unknown source IPs (the identity of the attacker, essentially) could have triggered an SNS notification to a security operations team the first time the attacker used the credentials! This might have effectively reduced the response time to minutes instead of hours.
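The first-use check itself is simple enough to sketch in full. The allowlisted networks below are hypothetical, and `notify` again stands in for an SNS publish:

```python
import ipaddress

# Hypothetical networks the team's API calls normally originate from.
KNOWN_NETWORKS = [ipaddress.ip_network(n)
                  for n in ("198.51.100.0/24", "192.0.2.0/24")]

def is_unknown_source(source_ip):
    """True if a CloudTrail sourceIPAddress falls outside known networks."""
    try:
        addr = ipaddress.ip_address(source_ip)
    except ValueError:
        # CloudTrail records a service name (e.g. "ec2.amazonaws.com")
        # when AWS makes a call on a user's behalf; treat those as known.
        return False
    return not any(addr in net for net in KNOWN_NETWORKS)

def first_use_alerts(records, notify=print):
    """Alert the first time a given key is used from an unknown IP."""
    seen = set()
    alerts = []
    for r in records:
        ip = r.get("sourceIPAddress", "")
        key = (r.get("userIdentity", {}).get("accessKeyId"), ip)
        if is_unknown_source(ip) and key not in seen:
            seen.add(key)
            alert = f"New source IP {ip} for key {key[0]}"
            notify(alert)
            alerts.append(alert)
    return alerts
```

Whether the lookup behind `is_unknown_source` is an in-memory set, a DynamoDB table or a Redshift query over historical CloudTrail logs is an implementation detail; the point is that the very first API call from the attacker's IP raises a flag.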
While not particularly relevant to this breach, AWS provides anti-denial-of-service (DoS) tools such as AWS Shield and AWS WAF (web application firewall). When combined with auto-scaling groups (ASGs) and other technologies such as Lambda, a robust compute environment can mount a sizable defense against attackers wanting to bring down a company's workload. This is not applicable to this particular breach, but given the ubiquity of these types of attacks (think of the Dyn attack), we felt it was worth mentioning.
As of the time of this article, OneLogin has not provided information about how the attacker gained access to the databases themselves. If the data was properly encrypted with keys managed by AWS Key Management Service (KMS), then direct access to systems, volumes or databases should have been protected by encryption at the volume level, and restricting access to the decryption keys might have prevented the data loss entirely, or at least prevented access to secured database user passwords. Since the specific method used to reach the customer data is purely speculation at this point, we will not go too deep here, except to say that proper management of encryption keys is critical: it doesn't matter how good the cipher is if the key is easily accessible. Using some of the "fail-secure" methods discussed below, critical encryption keys could have been made "safe" upon detection of anomalous activity. That might have rendered production systems temporarily inoperative, but it also would have prevented the loss of customer information.
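Assuming the keys live in KMS, "making the key safe" is a single API call. A hedged sketch, with the KMS client injectable so nothing here actually touches AWS:

```python
def quarantine_key(key_id, kms_client=None):
    """Fail-secure sketch: disable a KMS key when anomalous access is seen.

    Data encrypted under the key becomes unreadable until the key is
    re-enabled -- production stops, but so does exfiltration of plaintext.
    `kms_client` would be boto3.client("kms") in a real Lambda.
    """
    if kms_client is not None:
        kms_client.disable_key(KeyId=key_id)
    return f"disabled {key_id}"
```

Disabling (rather than scheduling deletion of) the key is deliberate: it is instantly reversible once the incident is understood.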
Early detection is critical to initiating a rapid incident response. However, unless an organization has a robust on-call service or outsources incident response to an organization that operates 24/7, there will still be some delay. Automating a "fail-secure" program may reduce the time an attacker can exploit a breach to near zero. For example, consider the detection methods described above, using AWS Config and AWS CloudTrail as the sources of information about an attack: the source IP address and the API key (i.e., the IAM credentials). Once the source IP is known, Lambda could block that IP on organization resources behind AWS WAF (or any other WAF that Lambda can reach). And once the API key is known, Lambda could disable the key entirely, shutting down the attack.
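The kill switch for the key is a sketch worth showing. The key ID, user name and IP below are hypothetical; the real IAM call is `update_access_key` with `Status="Inactive"`, placed behind an injectable client so the control flow runs without AWS:

```python
def fail_secure(access_key_id, user_name, attacker_ip, iam_client=None):
    """Disable a compromised access key the moment it is flagged.

    `iam_client` would be boto3.client("iam") in a real Lambda; leaving
    it injectable lets the logic be exercised offline.
    """
    actions = []
    if iam_client is not None:
        # Deactivate the key: every subsequent API call with it fails.
        iam_client.update_access_key(
            UserName=user_name, AccessKeyId=access_key_id, Status="Inactive"
        )
    actions.append(f"disabled key {access_key_id}")
    # Blocking attacker_ip would follow the same pattern, e.g. by
    # updating a WAF IP set; omitted here for brevity.
    actions.append(f"flagged {attacker_ip} for WAF block")
    return actions
```

Deactivating rather than deleting the key preserves it as forensic evidence while still cutting off the attacker instantly.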
Another fail-secure technique is to roll back unauthorized changes, such as changes to security groups (i.e., the firewalls around instances), or to stop instances that are not properly authorized. And while we are discussing fail-secure in the context of an actual breach, the same approach can fail-secure accidental changes by authorized users and enforce security policies.
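The heart of such a rollback is a diff against an approved baseline. A minimal sketch, with rules simplified to tuples; real code would compare the `IpPermissions` structures returned by `ec2.describe_security_groups` and revoke the extras with `revoke_security_group_ingress`:

```python
def unauthorized_rules(current_rules, baseline_rules):
    """Return ingress rules present now but absent from the approved baseline.

    Rules are modeled here as (protocol, port, cidr) tuples for clarity.
    Anything in the returned list is a candidate for automatic revocation.
    """
    return sorted(set(current_rules) - set(baseline_rules))
```

If an attacker opens SSH to the world, the diff surfaces it immediately, and a Lambda can revoke the rule before a human has even seen the alert.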
Systems and data need to be accessible to authorized programs, networks and users in order to be useful, and it is therefore impossible to secure any system 100%. In this "game of threats" played out between information security professionals and threat actors, the goal is always to mitigate risk, reduce impact and minimize incident response times. For any company, losing customer data damages its brand; in this case, it also places additional customer data and systems at high risk. OneLogin's entire business is based on trust and assurance, and they have suffered the worst kind of loss: a loss of confidence and trust. Customers who choose to remain with OneLogin should ask tough questions.
There are likely some facts that are very sensitive and may never be released by OneLogin, certainly not while there is an ongoing forensic and criminal investigation. But given the facts presented by OneLogin, it seems that their risks were not properly identified, their controls were woefully inadequate and their staff was completely unprepared. We have presented several architectural and implementation options that might have eliminated the vulnerability or, at a minimum, reduced the breach and incident response time to minutes or seconds instead of seven-plus hours.
OneLogin should take a good hard look at its security program and those responsible for it before any more customer data is put at risk or, worse, lost.
Jacobian Engineering is an information security firm based in Oakland, CA. Erik D. Jones is the CEO and has been working in information security and national defense for over 25 years. He is a certified HITRUST practitioner, CISSP and AWS Solutions Architect Professional. Erik founded Jacobian Engineering 11 years ago and the company provides managed IT & security services, compliance certification, risk assessment, audit services and forensics. Jacobian operates a 24/7 network and security operations center out of its Omaha, Nebraska office. Prior to founding Jacobian Engineering, Erik worked for Lawrence Livermore National Laboratory in national security and defense. In addition to his work in security, Erik helped to build and sell a successful genealogy company to Ancestry and spent over 8 years working in big-data analytics and machine learning. Erik studied Electrical Engineering at The University of California, Davis where he graduated with high honors.