Scanning for Secrets in Source Code

To put this into context, let’s look at an instance of hardcoded credentials from a bug report submitted to reverb.com. The researcher discovered a pair of basic authentication credentials used to access Cloudinary, embedded in the source code of Reverb’s Android app. Anyone who downloads the app can extract these credentials and gain the ability to access, edit, and delete all files in the Cloudinary instance.

private static final java.lang.String CONFIG = "cloudinary://434762629765715:█████@reverb";

This type of vulnerability is by no means rare. As a penetration tester, I’ve found everything from basic auth credentials to AWS keys and GitHub API keys in many organizations’ public source code and binaries. Sometimes, all attackers need to do to compromise an organization is search its GitHub repositories for accidentally committed credentials.

How do we detect these secrets before they cause an information leak? The most straightforward way to detect hardcoded credentials is to use text searches and regular expressions (regex).

Hardcoded credentials such as API keys, encryption keys, and database passwords can often be discovered by grepping for keywords such as “key”, “secret”, “password”, or “aws”. These searches target identifiers, like variable names, that are used to refer to the secrets. Similarly, you can search for strings that mark the secret itself, such as known file names and file formats. RSA private key files, for instance, start with the string -----BEGIN RSA PRIVATE KEY-----.
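A minimal Java sketch of this keyword search follows, assuming you simply want to flag lines for manual review. The class name KeywordScanner and its keyword list are hypothetical, illustrative choices rather than a standard tool. Plain substring matching is noisy ("key" also matches "monkey"), so every hit still needs a human look.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: flags lines that mention common secret-related keywords
// or contain an RSA private key header. Expect false positives; review each hit.
final class KeywordScanner {
    // Illustrative keyword list (an assumption); extend it for your codebase.
    static final List<String> KEYWORDS = List.of("key", "secret", "password", "aws");
    static final String RSA_PRIVATE_KEY_HEADER = "-----BEGIN RSA PRIVATE KEY-----";

    // Returns the 1-based line numbers of suspicious lines.
    static List<Integer> suspiciousLines(String source) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            // Case-insensitive keyword match against the identifier-bearing text.
            String lower = lines[i].toLowerCase(Locale.ROOT);
            boolean keywordHit = KEYWORDS.stream().anyMatch(lower::contains);
            if (keywordHit || lines[i].contains(RSA_PRIVATE_KEY_HEADER)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "int retries = 3;",
            "String dbPassword = \"hunter2\";",
            "-----BEGIN RSA PRIVATE KEY-----");
        System.out.println(suspiciousLines(sample)); // prints [2, 3]
    }
}
```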

Many API keys also adhere to a specific format, so you can detect them by looking for patterns in source code using regex searches. For instance, AWS access key IDs commonly start with the string “AKIA”, followed by 16 uppercase letters and digits, so a regex search for AKIA[0-9A-Z]{16} identifies strings of this format very reliably. Twilio API keys start with “SK” followed by 32 lowercase alphanumeric characters, so you can locate them with the regex pattern SK[a-z0-9]{32}. Passwords in URLs can be detected by searching for the syntax of basic authentication credentials embedded in a URL: [a-zA-Z]{3,15}:\/\/[^\/\\:@]+:[^\/\\:@]+@.{1,100}. This pattern matches credentials in URLs of the form protocol://username:password@example.com. Identify the key formats for the services you use, and target your search using those patterns.
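Here is a small Java sketch that runs these three patterns over a block of text. The class name PatternScanner and the finding labels are hypothetical. The regexes are the ones described above, written with plain ASCII hyphens in the character ranges and without the slash escapes, which Java regexes do not need. Dedicated scanners ship many more rules, so treat this as a starting point.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: matches the secret formats described in the text.
final class PatternScanner {
    // LinkedHashMap keeps findings in a predictable, rule-by-rule order.
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        // AWS access key ID: "AKIA" followed by 16 uppercase letters or digits.
        PATTERNS.put("aws-access-key-id", Pattern.compile("AKIA[0-9A-Z]{16}"));
        // Twilio API key: "SK" followed by 32 lowercase alphanumerics.
        PATTERNS.put("twilio-api-key", Pattern.compile("SK[a-z0-9]{32}"));
        // Credentials in a URL: protocol://username:password@host...
        PATTERNS.put("url-credentials",
            Pattern.compile("[a-zA-Z]{3,15}://[^/\\\\:@]+:[^/\\\\:@]+@.{1,100}"));
    }

    // Returns "rule: matched text" for every hit, grouped by rule.
    static List<String> scan(String text) {
        List<String> findings = new ArrayList<>();
        for (Map.Entry<String, Pattern> rule : PATTERNS.entrySet()) {
            Matcher m = rule.getValue().matcher(text);
            while (m.find()) {
                findings.add(rule.getKey() + ": " + m.group());
            }
        }
        return findings;
    }
}
```

Run against the Reverb line shown earlier (with the redacted secret replaced by a placeholder), the url-credentials rule reports the whole cloudinary://… value, because the Cloudinary configuration string uses exactly the username:password@host syntax.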

These two strategies can discover most hardcoded credentials. But by relying on text searches, you risk missing secrets that don’t adhere to a specific format. This is where entropy scanning comes in.

For our purposes, you can think of entropy as a measure of how random and unpredictable something is. For instance, a string made up of a single repeated character, such as aaaaa, has very low entropy. A longer string drawn from a larger set of characters, such as wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY, has higher entropy.
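A common way to put a number on this is Shannon entropy, measured in bits per character, where p_i is the fraction of the string made up of the i-th of its k distinct characters. Working it through for aaaaa and for a second, purely illustrative string, abcd:

```latex
H(s) = -\sum_{i=1}^{k} p_i \log_2 p_i

% "aaaaa": one distinct character, so p_a = 5/5 = 1
H(\texttt{aaaaa}) = -1 \cdot \log_2 1 = 0

% "abcd": four distinct characters, each with p = 1/4
H(\texttt{abcd}) = -4 \cdot \tfrac{1}{4} \log_2 \tfrac{1}{4} = 2
```

More generally, a string whose k distinct characters occur equally often has entropy log_2 k, so a larger character set raises the ceiling.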

You can test these strings and see how their entropy is calculated with an online Shannon entropy calculator (Kozlowski, L. Shannon entropy calculator).

Entropy is a good tool for finding highly randomized, complex strings, which often indicate a secret. By measuring the entropy of the string literals in your source code, you can discover suspicious strings of any format.
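The following Java sketch applies that idea: it pulls double-quoted string literals out of source text and flags those that are both long and high-entropy. The class name EntropyScanner and both thresholds (4.0 bits per character, at least 20 characters) are assumptions chosen for illustration, and the literal-matching regex is deliberately naive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flags string literals whose Shannon entropy is high.
final class EntropyScanner {
    // Illustrative thresholds (assumptions); tune them against your own code.
    static final double MIN_ENTROPY_BITS = 4.0;
    static final int MIN_LENGTH = 20;

    // Naive matcher for double-quoted literals. It handles \" escapes but not
    // text blocks, char literals, or quotes inside comments.
    static final Pattern STRING_LITERAL = Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Shannon entropy in bits per character: H = -sum(p_i * log2(p_i)).
    static double shannonEntropy(String s) {
        if (s.isEmpty()) {
            return 0.0;
        }
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        double entropy = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / s.length();
            entropy -= p * (Math.log(p) / Math.log(2));
        }
        return entropy;
    }

    // Returns the contents of every literal that is both long and high-entropy.
    static List<String> findHighEntropyLiterals(String source) {
        List<String> hits = new ArrayList<>();
        Matcher m = STRING_LITERAL.matcher(source);
        while (m.find()) {
            String literal = m.group(1);
            if (literal.length() >= MIN_LENGTH && shannonEntropy(literal) >= MIN_ENTROPY_BITS) {
                hits.add(literal);
            }
        }
        return hits;
    }
}
```

With these thresholds, the example key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY (about 4.7 bits per character) is flagged, while an ordinary English sentence of similar length usually is not. Lowering the thresholds catches more secrets but also flags more harmless random-looking strings, such as hashes and encoded test data.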

You should monitor your public repositories for accidentally committed secrets. Any credentials that are leaked to public repositories should be considered stolen and should be rotated.
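Deleting a secret in a later commit does not remove it from the repository’s history, so monitoring should cover every commit, not just the current files. One way to sketch this in Java is to run `git log -p --all` and scan only the added lines of each patch; removed lines can be skipped because every removed line was added by some earlier commit whose patch is scanned too. The class name HistoryScanner is hypothetical, only the AWS access key ID pattern described earlier is wired in, and the whole history is read into memory, which is fine for a sketch but not for very large repositories.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: scans lines added anywhere in a repository's history.
final class HistoryScanner {
    // Example rule from the text; add your other patterns alongside it.
    static final Pattern AWS_KEY_ID = Pattern.compile("AKIA[0-9A-Z]{16}");

    // Keeps added lines (prefixed "+") that match, skipping lines that start
    // with "+++" (file headers, plus the rare added line beginning with "++").
    static List<String> scanPatch(String patch) {
        List<String> hits = new ArrayList<>();
        for (String line : patch.split("\n")) {
            if (line.startsWith("+") && !line.startsWith("+++")
                    && AWS_KEY_ID.matcher(line).find()) {
                hits.add(line.substring(1));
            }
        }
        return hits;
    }

    // Runs `git log -p --all` in the current directory and scans its output.
    public static void main(String[] args) throws IOException, InterruptedException {
        Process git = new ProcessBuilder("git", "log", "-p", "--all")
                .redirectErrorStream(true)
                .start();
        String patch;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(git.getInputStream(), StandardCharsets.UTF_8))) {
            patch = reader.lines().collect(Collectors.joining("\n"));
        }
        git.waitFor();
        scanPatch(patch).forEach(System.out::println);
    }
}
```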

Of course, not all code is open source, and not all hardcoded secrets will be committed to public repositories. But hardcoded secrets can still be a problem if they leak through application binaries, logs, or stolen source code. A good strategy to minimize this risk is twofold: run a scan that combines pattern searching with entropy analysis before code makes it to production, and store secrets in configuration files or secret management services instead of in code.
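For code that runs on infrastructure you control, a minimal sketch of keeping the secret out of the source is to read it at runtime from an environment variable that your deployment or secret management service populates. This swaps in environment variables as one concrete option; it is not the only way to do it. The class SecretConfig, the method requireSecret, and the variable name CLOUDINARY_URL are hypothetical stand-ins for the hardcoded CONFIG constant shown earlier.

```java
import java.util.Map;

// Hypothetical replacement for a hardcoded credential constant: the value is
// supplied at runtime instead of being compiled into the application.
final class SecretConfig {
    // Looks up a required secret and fails fast if it is missing or blank.
    // The error message names the variable but never echoes a value.
    static String requireSecret(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required secret: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        // CLOUDINARY_URL is an assumed variable name; use whatever your deployment sets.
        String config = requireSecret(System.getenv(), "CLOUDINARY_URL");
        // Log only non-sensitive metadata about the secret, never the secret itself.
        System.out.println("Loaded Cloudinary config (" + config.length() + " chars)");
    }
}
```

Taking the environment as a Map parameter keeps the lookup testable without touching real environment variables. Note that this does not help for secrets shipped inside a client app, which anyone who downloads the app can still inspect.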

Sometimes, it might feel necessary to store secrets in code that users can get their hands on. API keys used in mobile applications are a common example. In this case, you can take steps to make these keys harder to find. For instance, avoid naming sensitive variables with easily guessable identifiers like “api_key” or “password”, and obfuscate your code so that it’s harder to extract secrets from your application. Finally, run the parts of your application that require third-party services on your server, so that you avoid packaging keys into application files at all.

Always scan your codebase for hardcoded secrets and analyze whether they could make it onto an attacker’s screen. Check whether each secret needs to be there at all and, if it does, whether you are protecting it properly.