IoT devices have serious security deficiencies due to bad random number generation

The confidentiality and integrity assurances of modern communication protocols rely on algorithms that generate secret tokens that attackers cannot guess. These tokens are used for authentication, encryption, access control and many other aspects of modern security, and they all require cryptographically secure random numbers: sequences of numbers or symbols chosen in a way that is unpredictable to an attacker.

All cryptographic algorithms that involve some sort of secure key or token generation need to be seeded with random numbers, so the mechanisms that produce these numbers, known as random number generators (RNGs), are the foundation of many security systems and features. While modern operating systems and computers have long had secure random number generation worked out, IoT devices, especially resource-constrained ones that have few interfaces and run simpler operating systems, have long struggled to find high-quality sources of randomness, also known as entropy.

In recent years, systems-on-a-chip (SoC) vendors have tried to solve this problem by incorporating peripheral controllers in their products that are designed to generate random numbers that the OS can then safely use. However, a team of researchers from security firm Bishop Fox who analyzed the way IoT developers use these hardware RNGs found major implementation issues that compromise the security of their systems. They recently presented their findings at the DEF CON security conference.

“How you use the peripheral is critically important, and the current state of the art in IoT can only be aptly described as ‘doing it wrong’,” the researchers concluded.

Hardware RNG vs software RNG

Software-based RNGs are also known as pseudo-random number generators (PRNGs) because they use deterministic algorithms and must be seeded with random values, usually from an entropy pool that combines values from different sources: network information, radio interfaces, various timing values and so on. There is a further distinction. General-purpose PRNGs are meant for non-security purposes, while cryptographically secure PRNGs (CSPRNGs) are designed to offer security guarantees against attacks that attempt to guess the seed values or predict the output in a computationally feasible amount of time.

CSPRNGs employ cryptographic ciphers and hashing functions to produce output that is then used for critical operations like key generation. They also perform tests to prevent known attacks. All these operations consume computing resources and memory, both of which are highly limited on IoT devices, which is why SoC vendors added hardware RNG capabilities.

Hardware RNGs are also known as true RNGs (TRNGs) because they generate random numbers from physical processes, so their output cannot be predicted the way the output of a software algorithm can. The output of TRNGs is supposed to be safe to use directly, but as the Bishop Fox researchers found, interfacing with them can be tricky, and mistakes have devastating consequences for the security systems built on top of them.

No checking for hardware RNG errors

SoC vendors allow developers to interact with their hardware RNG through the hardware abstraction layer (HAL) API by calling functions from their C code. The names of these RNG functions differ among vendors, but they typically write the random number (a 32-bit integer) through a pointer argument and, importantly, return a value that indicates potential errors.

“The HAL function to the RNG peripheral can fail for a variety of reasons, but by far the most common (and exploitable) is that the device has run out of entropy,” the researchers explained in a report. “Hardware RNG peripherals pull entropy out of the universe through a variety of means (such as analog sensors or EMF readings) but don’t have it in infinite supply. They’re only capable of producing so many random bits per second. If you try calling the RNG HAL function when it doesn’t have any random numbers to give you, it will fail and return an error code. Thus, if the device tries to get too many random numbers too quickly, the calls will begin to fail.”

This can happen quite often during the generation of large keys. For example, during the generation of a 2048-bit private key, the device calls the RNG HAL function repeatedly in a loop, and the hardware will often fail to keep up and start returning errors.

The problem is that, judging by code found on GitHub for interacting with SoCs and even the RNG handling code in popular embedded OSes like FreeRTOS, developers don’t seem to be checking for hardware RNG errors. “This is just how the IoT industry does it,” the Bishop Fox researchers said. “You’ll find this behavior across basically every SDK and IoT OS.”

This is a major problem, because depending on the hardware, when the HAL RNG function fails, the output can be partial entropy, the number 0 or uninitialized memory. None of these outputs are safe for cryptographic purposes.
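
As a rough illustration of what checking the return value could look like, here is a minimal C sketch. The names `hal_status_t`, `hal_rng_get_u32` and `rng_fill_checked` are hypothetical stand-ins and do not come from any vendor’s HAL. The peripheral is simulated so that the example compiles on its own, and the simulated values are not random.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes, standing in for a vendor HAL's return values. */
typedef enum { HAL_OK = 0, HAL_ERROR = 1 } hal_status_t;

/* Simulated peripheral: only this many words are ready to be read. */
static unsigned sim_words_ready = 2;

/* Hypothetical HAL call: writes one 32-bit word through `out` on success.
 * On failure it returns HAL_ERROR and leaves *out untouched. */
static hal_status_t hal_rng_get_u32(uint32_t *out)
{
    static uint32_t fake = 0x9E3779B9u;
    if (sim_words_ready == 0) {
        return HAL_ERROR;
    }
    sim_words_ready--;
    fake = fake * 1664525u + 1013904223u; /* placeholder values, NOT random */
    *out = fake;
    return HAL_OK;
}

/* Fill buf with n words. If any call fails, wipe the whole buffer and
 * report failure so the caller never uses partial or stale data as a key. */
static bool rng_fill_checked(uint32_t *buf, size_t n)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++) {
        if (hal_rng_get_u32(&buf[i]) != HAL_OK) {
            for (size_t j = 0; j < n; j++) {
                buf[j] = 0;
            }
            return false;
        }
    }
    return true;
}
```

A caller would treat a `false` result as "no key material available" rather than continuing with whatever happens to be in the buffer. What the caller should do next is a separate design question that this sketch does not answer.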

In 2019, researchers from Keyfactor analyzed 75 million digital certificates based on RSA keys and found that one in every 172 was vulnerable to attacks that could recover their secret keys. The culprit was poor random number generation, and most of the vulnerable certificates were found in IoT and embedded networking devices like routers, switches and firewalls. If the prime numbers used to generate RSA public keys are not random enough, two separate keys will likely share a factor. Then it’s extremely easy to recover their other factors and compromise them.

“While we can’t say for sure that our research is responsible for those results… widespread instances of weak RSA keys in IoT devices is exactly what you’d expect to find,” the Bishop Fox researchers said about their findings on how IoT devices handle hardware RNGs. “It sure seems like this is an exploitable large-scale issue in practice, not just in theory.”

Forced bad implementations

The researchers warn not to rush to blame IoT software developers for poor implementations because handling these errors in a proper way has other downsides that could impact the functioning of the device. “If you’re the networking stack in the middle of generating a crypto key for secure communications, how are you supposed to ‘handle’ the error? There are really only two options: Abort, killing the entire process, or spin loop on the HAL function for an indefinite amount of time until the call completes, blocking all other processes and using 100% CPU in the process. Neither are acceptable solutions. This is why developers ignore the error condition. The alternatives are terrible and the ecosystem around RNG hardware has done them no favors.”

On top of that, few devices come with proper documentation on how the RNG is supposed to work and how to handle or avoid errors. Information about expected operating speed, safe operating temperature ranges or statistical evidence of randomness is often missing. Even when documentation exists, getting it right is no easy task.

Hardware RNG output alone is not reliable

The Bishop Fox researchers found many other problems with trying to rely only on the hardware RNG as the source of entropy in IoT. For example, in the absence of a true CSPRNG, many SDKs and IoT operating systems that support hardware RNGs use the hardware RNG to seed an insecure PRNG such as libc rand(), whose output is then used for security purposes when it shouldn’t be.

“PRNGs such as libc rand() are wildly insecure since the numbers they produce reveal information about the internal state of the RNG,” the researchers said. “They’re fine for non-security-relevant contexts because they’re fast and easy to implement. But using them for things like encryption keys leads to catastrophic collapse of the device’s security, as all the numbers are predictable.”

In one test on an LPC54628 microcontroller, the researchers noticed that the random numbers produced by the hardware RNG were of poor quality and suspected that something was wrong with their code. Digging deeper into the manufacturer’s documentation, they found a recommendation against concatenating 32-bit numbers output by the RNG to construct larger numbers. For example, to obtain a 128-bit random number, developers shouldn’t concatenate four consecutive 32-bit RNG outputs, the manufacturer said. Instead, they should use one 32-bit number, discard the next 32 numbers, then use the next one, then discard another 32 numbers, and so on.
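
The discard rule described above might be sketched in C as follows. This is only an illustration of the pattern and not the manufacturer’s code: `hal_rng_get_u32` and `rng_read_spaced` are hypothetical names, and the peripheral is replaced by a simple counter so that it is easy to see which words are kept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DISCARD_COUNT 32u /* words thrown away after each word that is kept */

typedef enum { HAL_OK = 0, HAL_ERROR = 1 } hal_status_t;

/* Simulated peripheral: returns 0, 1, 2, ... so kept words are visible. */
static uint32_t sim_counter = 0;

/* Hypothetical HAL call standing in for the vendor's RNG read function. */
static hal_status_t hal_rng_get_u32(uint32_t *out)
{
    *out = sim_counter++;
    return HAL_OK;
}

/* Keep one word, then read and discard the next DISCARD_COUNT words,
 * and repeat until n words are collected. Returns 0 on success and
 * -1 if any HAL call reports an error. */
static int rng_read_spaced(uint32_t *buf, size_t n)
{
    assert(buf != NULL);
    uint32_t scratch;
    for (size_t i = 0; i < n; i++) {
        if (hal_rng_get_u32(&buf[i]) != HAL_OK) {
            return -1;
        }
        for (unsigned d = 0; d < DISCARD_COUNT; d++) {
            if (hal_rng_get_u32(&scratch) != HAL_OK) {
                return -1;
            }
        }
    }
    return 0;
}
```

With the counter standing in for the hardware, a request for four words keeps the values 0, 33, 66 and 99 and consumes 132 reads in total. That cost makes it clearer why the discard step looks like a tempting thing to delete.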

This API behavior is so odd and counterintuitive that it’s almost guaranteed to produce vulnerable implementations. Even if developers came across the correct code that discards 32 numbers, they might remove it because they think it’s unnecessary, the researchers said.

Furthermore, some hardware RNG implementations tested by the researchers failed statistical analysis tests or had clear patterns in the relative distribution of the bytes they produced. So, even if developers manage to overcome all the hurdles of proper error handling, API quirks and poor documentation, they still shouldn’t rely on the output of hardware RNGs alone.

The answer is to have proper CSPRNGs for IoT devices, where the output of the hardware RNG is just one of the sources in the entropy pool. All major operating systems, including Linux, Windows, macOS, Android, iOS and BSD, have CSPRNG subsystems that are robust against such errors and that application developers can easily use without interfacing directly with hardware or worrying that their code is incorrect or that the random numbers are insecure.

“The core vulnerability here doesn’t lie in a single device’s SDK or in any particular SoC implementation,” the researchers said. “The IoT needs a CSPRNG subsystem. This issue can’t be fixed by just changing the documentation and blaming users. The most elegant place for such a CSPRNG subsystem is in one of the increasingly popular IoT operating systems. If you’re designing a new device from scratch, we’d recommend implementing a CSPRNG in an operating system.”