Look, Ma, No Passwords: How & Why Blackfish uses Bloom Filters

When NIST issued guidelines in 2017 advising organizations to check new users’ credentials against a password “breach corpus,” one of the first questions was how to ensure the breach corpus itself didn’t get compromised.

Shape’s game-changing product, Blackfish, solved that problem by designing a patented approach to credential storage involving Bloom filters.

What is a Bloom filter?

A “Bloom filter” is a probabilistic data structure which can be queried for set membership, but which cannot be used to reproduce the original data that defines the set. This makes the construct ideal for storing highly sensitive data such as login credentials.

Bloom filters work by applying multiple hash functions to each input datum and translating each resulting hash value into an index in a bit-field. Adding an item sets the bits at all of its indices. Since the same input value always maps to the same bit positions, a query only has to check those positions: if any of them is 0, the item was definitely never added; if all of them are already set, the item has probably been seen before.
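To make the mechanism concrete, here is a minimal Python sketch. It derives the k hash functions by prefixing the item with the hash index before running SHA-256. That derivation, the class name `BloomFilter`, and the parameters `m` and `k` are illustrative assumptions, not Blackfish's implementation.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array of m bits probed by k hash functions.

    Illustrative sketch only. The k hash functions are derived by hashing
    "<index>:<item>" with SHA-256, which is an assumption for this example.
    """

    def __init__(self, m: int, k: int) -> None:
        self.m = m              # number of bits in the array
        self.k = k              # number of hash functions per item
        self.bits = [0] * m     # the bit-field, initially all zeros

    def _indices(self, item: str) -> list[int]:
        # Map the item to k bit positions, one per derived hash function.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, item: str) -> None:
        # Set every bit the item maps to.
        for pos in self._indices(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # Any 0 bit means "definitely not added"; all 1s means "probably added".
        return all(self.bits[pos] for pos in self._indices(item))
```

Added items are always reported as present (there are no false negatives), while an item that was never added is usually, but not always, reported as absent.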

For example, let’s say three pieces of data are added to a Bloom filter:

Red

Blue

Green

Using the Bloom filter’s hashing algorithm, each of them maps to a 16-bit pattern:

Red: <1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0>

Blue: <0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0>

Green: <0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0>

The Bloom filter set is then the bitwise OR of those three patterns:

Set: <1,0,0,1,0,0,0,1,1,1,0,1,1,0,0,0>

Now suppose we want to query whether “Black” is in the dataset. Using the same hashing algorithm, “Black” becomes

Black: <1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0>

Because the 5th bit position in the aggregate Bloom filter set is 0, we know “Black” is not part of the set. Black’s other three bits (positions 1, 9, and 13) are all set, which shows that a single unset bit is enough to rule an item out.
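The worked example can be checked mechanically. The sketch below takes the bit patterns exactly as given above rather than computing them from real hash functions; the helper names `union`, `might_contain`, and `first_missing_position` are hypothetical.

```python
# Bit patterns copied from the example above (16 bits, index 0 = 1st position).
RED = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
BLUE = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
GREEN = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]
BLACK = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]


def union(*patterns):
    """Bitwise OR of equal-length bit patterns: the filter's aggregate set."""
    return [int(any(bits)) for bits in zip(*patterns)]


def might_contain(filter_bits, pattern):
    """True only if every bit set in `pattern` is also set in `filter_bits`."""
    return all(f for f, p in zip(filter_bits, pattern) if p)


def first_missing_position(filter_bits, pattern):
    """1-based position of the first bit that rules the item out, or None."""
    for i, (f, p) in enumerate(zip(filter_bits, pattern), start=1):
        if p and not f:
            return i
    return None


SET = union(RED, BLUE, GREEN)
print(SET)                                 # [1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0]
print(might_contain(SET, BLACK))           # False
print(first_missing_position(SET, BLACK))  # 5
```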

How does Blackfish use Bloom filters?

Blackfish uses Bloom filters to safely determine whether queried credentials were previously identified by Blackfish as compromised.

When a credential stuffing attack is observed on a Shape customer’s website or mobile app, Shape AI identifies the credentials used by the attacker and considers them compromised. The username and password pairs are then hashed, salted, re-hashed, and added to the Bloom filter. Once added to the set, the original credentials are destroyed.

Every time a login request is made on a Blackfish customer’s website or mobile app, Blackfish hashes the username and password combination and then checks the credential against the Bloom filter to determine if that particular username and password pair is part of the Bloom filter’s set.
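The ingestion and lookup steps can be sketched as follows. Everything specific here is an assumption for illustration: SHA-256 for both hashing rounds, a single secret system-wide salt (a lookup must be able to recompute exactly the same value, so a random per-record salt would not work), the NUL separator between username and password, and the filter size and hash count. The names `credential_digest`, `bit_positions`, and `CredentialFilter` are hypothetical.

```python
import hashlib

# Hypothetical parameters, not Blackfish's real values.
SALT = b"example-secret-salt"   # assumed single system-wide secret salt
M_BITS = 1 << 20                # assumed bit-array size (1,048,576 bits)
K_HASHES = 7                    # assumed number of hash functions


def credential_digest(username: str, password: str) -> bytes:
    """Hash the pair, salt the result, and re-hash it (SHA-256 assumed)."""
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    first = hashlib.sha256(f"{username}\x00{password}".encode("utf-8")).digest()
    return hashlib.sha256(SALT + first).digest()


def bit_positions(digest: bytes) -> list[int]:
    """Derive K_HASHES bit positions from the salted digest."""
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + digest).digest()[:8], "big") % M_BITS
        for i in range(K_HASHES)
    ]


class CredentialFilter:
    def __init__(self) -> None:
        self.bits = bytearray(M_BITS // 8)  # packed bit array, all zeros

    def add_compromised(self, username: str, password: str) -> None:
        # Only bit positions are kept; the plaintext pair is not stored.
        for pos in bit_positions(credential_digest(username, password)):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def is_compromised(self, username: str, password: str) -> bool:
        # Recompute the same digest at login time and test every bit.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in bit_positions(credential_digest(username, password))
        )
```

For instance, after `add_compromised("alice", "hunter2")`, a login attempt with that exact pair makes `is_compromised` return `True`, while the same username with a different password is checked against an entirely different set of bit positions.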

If the credentials are found to be a match in the Bloom filter, Blackfish notifies the customer so that they can take appropriate action, e.g., temporarily suspending the account or forcing a password reset.

How is a knowledge base built on Bloom filters safer from attack than a database of hashed passwords?

The underlying bit-field of a Bloom filter encodes information about all of the supplied data at once: each item’s bits are scattered across the whole array and overlap with the bits of other items. This means that a portion of the datastore does not reveal meaningful information about any fraction of that data. For example, if an attacker got his hands on half of the Bloom filter set in the example above, he would not be able to leverage it in an attack, even if he had access to the hashing algorithm. All he would be able to do is determine that certain username and password pairs were not in the compromised credential set. That isn’t of much value to an attacker attempting to identify valid credentials!
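A small sketch of the half-leak scenario, reusing the bit patterns from the example and assuming the attacker holds only the first 8 of the 16 bits. The function `query_partial` is hypothetical. A "possibly present" answer is only reachable when every one of an item’s bits lies in the leaked half; with k hash positions spread uniformly, that happens for roughly (1/2)^k of items.

```python
# Aggregate set from the worked example; the attacker is assumed to hold only the first 8 bits.
FULL_SET = [1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0]
LEAKED = FULL_SET[:8] + [None] * 8   # None marks bits the attacker never obtained

RED = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
BLACK = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]


def query_partial(leaked, pattern):
    """Classify an item's bit pattern against a possibly incomplete filter."""
    positions = [i for i, bit in enumerate(pattern) if bit]
    known = [leaked[i] for i in positions if leaked[i] is not None]
    if 0 in known:
        return "definitely absent"   # one known 0 bit is conclusive
    if len(known) == len(positions):
        return "possibly present"    # every bit was leaked and every bit is set
    return "unknown"                 # some bits are missing, so nothing can be said


print(query_partial(LEAKED, BLACK))    # definitely absent
print(query_partial(LEAKED, RED))      # unknown
print(query_partial(FULL_SET, RED))    # possibly present
```

Note that “Red”, a genuine member, comes back "unknown" from the leaked half: the partial filter can rule “Black” out but cannot confirm “Red”.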

Contrast this scenario with one in which an attacker gains access to a correctly salted, hashed, and/or encrypted row of a password database. Each row stands on its own, so with enough time and compute power (made much cheaper thanks to Bitcoin and its need for inexpensive SHA-256 hashes), it is relatively straightforward to recover the passwords in that subset via brute force.

How “sure” can a Bloom filter be?

As a probabilistic structure, a Bloom filter has error inherent in it as a storage medium. It is possible for an item to be identified as a member of the set when it was never added, if all of its hash indices point to bits that were set by other members of the set. The probability of such a “false positive” is a function of the size of the bit array, the number of items stored in the array, and the number of hashes performed per item. However, the datastore can be sized so that the desired level of precision is maintained even when it is filled to its designed capacity.
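For reference, here is the textbook approximation behind that relationship. It writes m for the number of bits, n for the number of stored items, and k for the number of hashes per item (notation introduced here, not from the original post), and assumes independent, uniformly distributed hash functions:

```latex
% False-positive probability after inserting n items into m bits with k hashes
p \approx \left(1 - e^{-kn/m}\right)^{k}

% Number of hashes that minimizes p for a given m and n
k_{\text{opt}} = \frac{m}{n}\ln 2

% Bits required to reach a target false-positive rate p when k = k_opt
m = -\frac{n \ln p}{(\ln 2)^{2}}
```

These are general formulas for sizing any Bloom filter, not Blackfish’s published parameters.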

The likelihood that the Blackfish Bloom filter implementation will produce a false positive result is less than one in a million.
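To get a feel for what a one-in-a-million target costs, the standard sizing approximations can be evaluated directly. This is a sketch under the usual independence assumptions. The item count below is hypothetical, since the post does not state how many credentials Blackfish stores, and the functions `size_bloom_filter` and `false_positive_rate` are illustrative names.

```python
import math


def size_bloom_filter(n_items: int, target_fp: float) -> tuple[int, int]:
    """Approximately optimal (m_bits, k_hashes) for n_items at target_fp,
    using m = -n ln p / (ln 2)^2 and k = (m / n) ln 2 rounded to an integer."""
    m = math.ceil(-n_items * math.log(target_fp) / (math.log(2) ** 2))
    k = max(1, round(m / n_items * math.log(2)))
    return m, k


def false_positive_rate(m_bits: int, n_items: int, k_hashes: int) -> float:
    """Approximate false-positive probability (1 - e^(-kn/m))^k."""
    return (1 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


n = 1_000_000                      # hypothetical number of stored credentials
m, k = size_bloom_filter(n, 1e-6)  # one-in-a-million target
print(m, k, false_positive_rate(m, n, k))
```

For a hypothetical one million credentials this works out to roughly 28.8 million bits (about 3.6 MB, or about 29 bits per credential) and 20 hash functions. Because k must be an integer, the resulting rate lands very close to, rather than exactly at, the target.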

Should every organization be using Bloom filters to store passwords?

Bloom filters are a fantastic solution for securely storing passwords when checking for password reuse. Because of the potential for false positives, however small, they are ill-suited for credential validation.

Bloom filters work well for applications where the consequences of a false positive are small. For example, Google’s Chrome web browser uses a Bloom filter as a first-level screen for suspicious URLs, and positive results are subjected to a second-level test to confirm the issue before a warning is issued to the user.
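The two-level pattern can be sketched generically. This is not Chrome’s Safe Browsing code: the `TinyBloom` class, the `is_flagged` function, the example URLs, and the authoritative check passed in as a callable are all hypothetical. The point is that a Bloom-filter miss ends the query immediately, while a hit is never acted on until an exact check confirms it.

```python
import hashlib


class TinyBloom:
    """Small Bloom filter used only as a fast first-level screen (illustrative)."""

    def __init__(self, m: int = 4096, k: int = 4) -> None:
        self.m, self.k, self.bits = m, k, 0   # bits stored in one Python int

    def _positions(self, item: str):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}|{item}".encode("utf-8")).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def is_flagged(url: str, screen: TinyBloom, authoritative_check) -> bool:
    """Two-level check: a Bloom-filter miss is final; a hit must be confirmed."""
    if not screen.might_contain(url):
        return False                   # definitely not listed, no further work
    return authoritative_check(url)    # weed out false positives before warning


blocklist = {"http://bad.example/"}
screen = TinyBloom()
for u in blocklist:
    screen.add(u)
print(is_flagged("http://bad.example/", screen, blocklist.__contains__))    # True
print(is_flagged("https://good.example/", screen, blocklist.__contains__))  # False
```

Because every positive is confirmed, a false positive in the screen costs only one extra lookup, never a wrong warning.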

In Blackfish’s use case, a false positive (less than a one-in-a-million chance) means that an enterprise wrongly believes a user’s password is compromised and takes a precautionary action, such as forcing a password reset. If an enterprise were instead to use a Bloom filter to store the passwords it authenticates users against, a false positive could mean allowing an unauthorized user access to someone else’s account.

Care to learn more? Visit shapesecurity.com/blackfish or contact blackfish@shapesecurity.com to set up a demo. 

*** This is a Security Bloggers Network syndicated blog from Shape Security Blog authored by Shape Security. Read the original post at: https://blog.shapesecurity.com/2018/09/26/look-ma-no-passwords-how-why-blackfish-uses-bloom-filters/