Looking for sophisticated malware in IoT devices

One of the motivations for this post is to encourage other researchers who are interested in this topic to join in, to share ideas and knowledge and to help build more capabilities in order to better protect our smart devices.

Research background

Smart watches, smart home devices and even smart cars – as more and more connected devices join the IoT ecosystem, the importance of ensuring their security becomes patently obvious.

It’s widely known that the smart devices which are now inseparable parts of our lives are not very secure against cyberattacks. Malware targeting IoT devices has been around for more than a decade. Hydra, the first known router malware that operated automatically, appeared in 2008 as an open-source prototype. Soon after Hydra, in-the-wild malware was also found targeting network devices. Since then, different botnet families have emerged and become widespread, including Mirai, Hajime and Gafgyt.

Apart from the malware mentioned above, there are also vulnerabilities found in communication protocols used in IoT devices, such as Zigbee, which can be exploited by an attacker to target a device and to propagate malware to other devices in a network, similar to computer worms.

In this research, we are focusing on hunting low-level sophisticated attacks targeting IoT devices and, in particular, taking a closer look at the firmware of IoT devices to find backdoor implants, modifications to the boot process and other malicious alterations to different parts of the firmware.

Now, let’s talk about the structure of the firmware of an IoT device in order to get a better understanding of the different components.

IoT firmware structure

Regardless of the CPU architecture of an IoT device, the boot process consists of the following stages: the bootloader, the kernel and the file system (shown in the figure below). When an IoT device is switched on, the code from the onboard SoC (System on Chip) ROM transfers control to the bootloader, the bootloader loads the kernel, and the kernel then mounts the root file system.

The bootloader, the kernel and the file system also comprise the three main components of typical IoT firmware.

IoT boot process

There are a variety of CPU architectures used in IoT devices. Therefore, being able to analyze and understand the different components of firmware requires a good understanding of these architectures and their instruction sets. The most common CPU architectures among IoT devices are:

  • ARM
  • MIPS
  • PowerPC
  • SPARC

Possible attack scenarios

Understanding the firmware structure enables us to think about how an attacker might take advantage of the various components when deploying a stealth attack that’s difficult to detect.

The bootloader is the first component that takes control of the system. Therefore, targeting the bootloader offers an attacker a perfect opportunity to carry out malicious tasks. It also means that an attack can remain persistent after a reboot.

An attacker can also manipulate the kernel modules. The majority of IoT devices use the Linux kernel. As easy as it is for a developer to customize and choose whatever they need from the Linux kernel, an attacker who manages to access and manipulate the device firmware can also add or edit kernel modules.

Moving on to the file system, there are also a number of common file systems used in IoT devices. These file systems are usually easy to work with. An attacker can extract, decompress and also mount the original file system from the firmware, add malicious modules and compress it again using common utilities. For instance, SquashFS is a compressed file system for Linux that is quite common among IoT manufacturers. It’s very straightforward to mount a SquashFS image with the standard “mount” command, or to unpack and repack it using the Linux utilities “unsquashfs” and “mksquashfs”.
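To illustrate how easy it is to locate such a file system inside a raw image, here is a minimal Python sketch that searches a firmware blob for the little-endian SquashFS magic “hsqs”. The function name is hypothetical, and the field offsets (major version at byte 28, bytes used at byte 40) reflect our reading of the SquashFS 4.x superblock layout, so treat them as an assumption to verify against the format documentation.

```python
import struct

SQUASHFS_MAGIC_LE = b"hsqs"  # magic of a little-endian SquashFS superblock


def find_squashfs_candidates(blob):
    """Return (offset, bytes_used) for each plausible SquashFS 4.x superblock.

    Assumption: 4.x layout, with the major version as a u16 at offset 28
    and the total image size (bytes_used) as a u64 at offset 40.
    """
    hits = []
    pos = blob.find(SQUASHFS_MAGIC_LE)
    while pos != -1:
        if pos + 48 <= len(blob):
            (major,) = struct.unpack_from("<H", blob, pos + 28)
            (bytes_used,) = struct.unpack_from("<Q", blob, pos + 40)
            if major == 4:  # filter out accidental "hsqs" byte sequences
                hits.append((pos, bytes_used))
        pos = blob.find(SQUASHFS_MAGIC_LE, pos + 1)
    return hits
```

Each reported offset is only a candidate. The carved image still has to unpack cleanly before it can be trusted as a real file system.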

Challenges of this research

Obtaining firmware

There are different ways to obtain firmware. When deciding to investigate, sometimes you want the acquired firmware to belong to the exact same device with the same specifications, and you also want it to have been deployed on the device through some specific means. For example, you might suspect that the network through which the firmware is updated has been compromised and that the firmware may have been manipulated in transit between the vendor’s server and the device, so you want to investigate the updated firmware to validate its integrity. In another example scenario, you might have bought a device from a third-party vendor and have doubts about the firmware’s authenticity.
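One simple integrity check for the first scenario is to compare a hash of the acquired image against a reference digest. The sketch below assumes that the vendor publishes a SHA-256 digest through a channel you trust, which is not always the case. The function names are hypothetical.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large firmware images fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reference(path, reference_hex):
    """Compare the file's SHA-256 with a reference digest given as hex text."""
    expected = reference_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

A matching digest only shows that the image is identical to the one the reference was computed from. It says nothing about whether that reference image was clean in the first place.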

There are also a large number of IoT devices whose manufacturers don’t provide any way to access the firmware, not even for an update. Such a device ships with a single firmware image that stays on it for its entire lifetime.

In such cases, the surest way to obtain the exact firmware you are after is to extract it from the device itself.

The main challenge here is that this process requires domain-specific knowledge as well as specialist hardware and software experience with embedded systems. This approach also lacks scalability if you want to find sophisticated attacks targeting IoT devices in general.

Among the various ways of obtaining IoT firmware, the easiest is to download it from the device manufacturer’s website. However, not all manufacturers publish their firmware on their website. In general, a large number of IoT devices can only be updated through the device’s physical interface or via a specific software application (e.g. a mobile app) used to manage the device.

When downloading firmware from a vendor’s website, a common issue is that you might not be able to find older versions of the firmware for your specific device model. Let’s also not forget that in many cases the published firmware binaries are encrypted and can only be decrypted through the older firmware modules installed on the device.

Understanding firmware

According to Wikipedia, “firmware is a specific class of computer software that provides the low-level control for a device’s specific hardware. Firmware can either provide a standardized operating environment for more complex device software (allowing more hardware-independence), or, for less complex devices, act as the device’s complete operating system, performing all control, monitoring and data manipulation functions.”

Even though the main components of firmware are almost always the same, there is no standard architecture for firmware.

The main components of firmware are typically the bootloader, the kernel and the file system, but there are many other components that can be found in a firmware binary, such as the device tree, digital certificates, and other device-specific resources and components.

Once the firmware binary has been retrieved from the vendor’s website, we can then begin analyzing it and taking it apart. Given the specialized nature of the firmware, its analysis is very challenging and rather involved. To get some more details about these challenges and how to tackle them, refer to the “IoT firmware analysis” section.

Finding suspicious elements in firmware

After the components of the firmware have been extracted, you can start to look for suspicious modules, code snippets or any sort of malicious modifications to the components.

An easy first step is to scan the file system contents against a set of YARA rules, which can be based on known IoT malware or on heuristics. You can also scan the extracted file system contents with an antivirus scanner.

Something else you can do is look for the startup scripts inside the file system. These scripts contain lists of modules that get loaded every time the system boots up. The path to a malicious module might have been inserted into a script like this.
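As a rough illustration of this check, the following Python sketch walks an extracted root file system and lists the lines in common startup locations that load kernel modules. The list of locations and the focus on “insmod” and “modprobe” are our own assumptions, and real devices may use other paths or init systems.

```python
import os
import re

# Assumed startup locations, relative to the extracted root file system.
STARTUP_LOCATIONS = ("etc/inittab", "etc/rc.local", "etc/init.d", "etc/rc.d")
MODULE_LOADERS = re.compile(r"\b(insmod|modprobe)\b")


def list_module_loads(rootfs):
    """Return (relative_path, line_number, line) for module-loading lines."""
    findings = []
    for location in STARTUP_LOCATIONS:
        start = os.path.join(rootfs, location)
        if os.path.isfile(start):
            candidates = [start]
        elif os.path.isdir(start):
            candidates = [os.path.join(folder, name)
                          for folder, _, names in os.walk(start)
                          for name in names]
        else:
            continue
        for path in sorted(candidates):
            if os.path.islink(path):
                continue  # a link may point outside the extracted tree
            with open(path, "r", errors="replace") as handle:
                for number, raw in enumerate(handle, start=1):
                    line = raw.strip()
                    if not line or line.startswith("#"):
                        continue  # skip blank lines and comments
                    if MODULE_LOADERS.search(line):
                        relative = os.path.relpath(path, rootfs)
                        findings.append((relative, number, line))
    return findings
```

The output is a starting point for manual review: each listed module path can then be compared against a known-good firmware image from the same vendor.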

Here the Firmwalker tool can help with scanning an extracted file system for potentially vulnerable files.

Firmwalker capabilities (https://craigsmith.net/firmwalker/)

Another place to investigate is the bootloader component, though this is more challenging.

There are a number of common bootloaders used in IoT devices, with U-Boot being the most common. U-Boot is highly customizable, which makes it very difficult to determine whether the compiled code has been manipulated or not. Finding malicious modifications becomes even more complicated with uncommon or custom bootloaders.

IoT firmware analysis

There are a variety of open-source and closed-source tools that can help with firmware analysis. The best approach is to use a combination of the tools and techniques suggested by experienced firmware analysts.

Let’s begin with Binwalk, the most comprehensive firmware analysis tool. Binwalk scans the firmware binary and looks for known patterns and signatures.

It has a large collection of signatures for various bootloaders and file systems used in IoT devices. It also has signatures for common encryption and compression algorithms along with the respective routines for decompression and decoding.

Binwalk is also capable of extracting the components it finds in the firmware binary.

The following screenshot shows the output of a Binwalk scan on a sample firmware binary:

Binwalk scan output

In this screenshot, Binwalk has found and printed out the header, the bootloader and the Linux kernel as well as the file system. There are also metadata details that have been extracted from the headers and the components themselves, such as the type and size of each component, CRC checksums, important addresses, CPU architecture, image name and so on. Now you can go on and use Binwalk itself to extract the above-mentioned parts, or manually calculate the sizes and extract the parts based on the start offset found by Binwalk.
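The manual size calculation mentioned above is simple arithmetic: each component ends where the next one starts. A minimal Python sketch follows. It assumes the components are contiguous and that the scan found all of them. The function name and the offsets in the usage note are hypothetical and are not taken from the screenshot.

```python
def component_sizes(offsets, file_size):
    """Derive component sizes from start offsets.

    offsets: iterable of (start_offset, label) pairs, in any order.
    Returns a list of (label, start_offset, size), ordered by offset.
    The last component is assumed to run to the end of the file.
    """
    ordered = sorted(offsets)
    sizes = []
    for index, (start, label) in enumerate(ordered):
        if index + 1 < len(ordered):
            end = ordered[index + 1][0]
        else:
            end = file_size
        sizes.append((label, start, end - start))
    return sizes
```

For example, component_sizes([(0x0, "header"), (0x40, "kernel"), (0x100000, "rootfs")], 0x400000) assigns the kernel a size of 0x100000 - 0x40 bytes. Any padding between components is counted as part of the preceding component.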

After extracting the components of the firmware, you can go on and extract, decompress or even mount the file system and start investigating the file system content. You can also look at the bootloader code in a disassembler, or debug it through a debugger.

However, doing firmware analysis is not always that straightforward. Firmware is so varied and diverse that understanding its structure and extracting the components is usually quite complicated.

Let’s take a close look at another sample firmware and try to understand its structure.

1. Binwalk firmware.bin

The Binwalk scan shows nothing in the result. This means that Binwalk could not find any known signatures.

Binwalk scan output

We can see in this case that the simple Binwalk scan was not very helpful. However, be aware that there are other tools and techniques we can use to learn more about the structure of this firmware.

2. File firmware.bin

Let’s next try the Linux file utility on the firmware binary.

File utility output

The file utility reports the file type as Targa image data. A look at the beginning of the binary file, together with a Google search on the Targa image data signature, shows that this result is clearly a false positive.

First bytes of the firmware binary

This is because the first bytes of the firmware file, 0x01010000, match the Targa image data signature. See the screenshot above.

3. Binwalk -E firmware.bin

Let’s use another capability of Binwalk and check the entropy of the firmware binary.

Running Binwalk using the “-E” command option gives an entropy diagram for the firmware file and some additional details such as the offset for falling and rising entropy.

Entropy details

Entropy diagram

Entropy figures close to 1 indicate compressed or encrypted data, while lower figures indicate uncompressed and unencrypted areas. As can be seen from the screenshots above, the high-entropy part begins at offset 55296 (0xD800).
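To make the entropy figure concrete, here is a minimal Python sketch that computes the Shannon entropy of fixed-size blocks, normalized to the 0 to 1 range, and reports the first block that crosses a threshold. The block size and the threshold are our own assumptions and not necessarily the values Binwalk uses.

```python
import math
from collections import Counter


def block_entropy(block):
    """Shannon entropy of a bytes object, normalized to the range 0..1."""
    if not block:
        return 0.0
    total = len(block)
    bits = -sum((count / total) * math.log2(count / total)
                for count in Counter(block).values())
    return bits / 8.0  # 8 bits per byte is the maximum possible entropy


def first_high_entropy_offset(data, block_size=1024, threshold=0.95):
    """Offset of the first block whose entropy reaches the threshold."""
    for offset in range(0, len(data), block_size):
        if block_entropy(data[offset:offset + block_size]) >= threshold:
            return offset
    return None  # no block looked compressed or encrypted
```

Run on the whole firmware file, first_high_entropy_offset gives a rough counterpart to the rising entropy edge reported by the “-E” scan, with a resolution limited by the chosen block size.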

There is also another tool that can be helpful in visualizing the binary. With the help of binvis.io you can see the contents of the firmware file and its visualization in two side-by-side panes. Different parts are shown in different colors based on their entropy.

Visualization of the firmware created by binvis.io

4. Binwalk -A firmware.bin

Binwalk can also scan the binary file for common executable opcode signatures.

First function prologues found in the file

Last function prologues found in the file

As we can see from the screenshot above, the result of the opcode signature check is actually very helpful! First, we can see that the firmware belongs to an ARM device.

Second, if we consider the offsets of the first and last function prologue signatures, we get an indication that these are the sections of the firmware binary that contain code.

From the screenshot, we can also see that the last function is found at the address 0xD600, which is just 0x200 bytes before the part where the entropy goes up. From this, we can make an educated guess that this offset is likely the end of the code of the bootloader and the beginning of the compressed kernel modules.
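The idea behind the opcode scan can be sketched in a few lines of Python. This is a simplified heuristic of our own and not Binwalk’s actual signature set: it assumes little-endian 32-bit ARM code aligned to 4 bytes, and it looks only for the common prologue “PUSH {..., LR}”, which is encoded as 0xE92Dxxxx with bit 14 (LR) set in the register list. Thumb code is not covered.

```python
import struct


def find_arm_push_prologues(data, base=0):
    """Return offsets of words that look like an ARM 'PUSH {..., LR}'.

    Assumptions: little-endian A32 code, 4-byte alignment.
    """
    hits = []
    for offset in range(0, len(data) - 3, 4):
        (word,) = struct.unpack_from("<I", data, offset)
        is_push = (word & 0xFFFF0000) == 0xE92D0000  # STMDB SP!, {reglist}
        saves_lr = bool(word & 0x4000)                # LR is register 14
        if is_push and saves_lr:
            hits.append(base + offset)
    return hits
```

The lowest and highest offsets returned give a rough estimate of where the code region starts and ends. With the figures above, the gap between the last prologue and the entropy rise is 0xD800 - 0xD600 = 0x200 bytes.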

5. Hexdump -C

hexdump -C firmware.bin | grep -C 4 -e "^\*$"

Now that we know the rough boundaries of some of the components of the firmware file, we can try to confirm these boundary offsets by looking at the actual contents of the firmware file around these areas.

If we run the firmware file through a hexdump, and look for lines that contain only an asterisk “*”, we can locate the compiler-added padding for each of the firmware components.
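In hexdump output, an asterisk stands for one or more 16-byte lines that are identical to the line before them. The same search can be approximated in Python, as sketched below. The function name and the minimum run length are our own choices.

```python
def repeated_line_runs(data, line_size=16, min_lines=3):
    """Find runs of identical consecutive lines, as hexdump folds into '*'.

    Returns (start_offset, end_offset, fill_byte) tuples. fill_byte is the
    repeated byte value when the line consists of a single value (typical
    for 0x00 or 0xFF padding), otherwise None.
    """
    runs = []
    full_lines = len(data) // line_size
    index = 0
    while index < full_lines:
        line = data[index * line_size:(index + 1) * line_size]
        end = index + 1
        while end < full_lines and data[end * line_size:(end + 1) * line_size] == line:
            end += 1
        if end - index >= min_lines:
            fill = line[0] if len(set(line)) == 1 else None
            runs.append((index * line_size, end * line_size, fill))
        index = end
    return runs
```

Long runs with a fill byte of 0x00 or 0xFF are good candidates for padding between components, so their end offsets are natural places to look for the start of the next component.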

Contents of the firmware binary

Contents of other parts of the firmware binary

The output of the hexdump utility, together with the previous findings, confirms which section of the firmware binary contains ARM code. We previously suspected that this code belongs to the bootloader.

6. Strings --radix=x firmware.bin

Next, let’s extract the ASCII strings from the firmware together with their offsets.
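For scripted analysis, the same information can be collected with a short Python sketch. It assumes a minimum length of four printable ASCII characters, and its exact character set may differ slightly from that of the strings utility.

```python
import re


def ascii_strings(data, min_length=4):
    """Return (offset, text) for each run of printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [(match.start(), match.group().decode("ascii"))
            for match in pattern.finditer(data)]
```

Printing each offset in hexadecimal, for instance with format(offset, "x"), gives output comparable to the “--radix=x” option and makes it easy to relate a string to the component boundaries found earlier.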

Last ASCII strings found in the firmware binary

Looking at the screenshot above, there are some strings related to the module entry point. These strings can give us a good indication of the nature of the code involved.

We can see some other interesting strings from the beginning of the firmware binary in the screenshot below. For example, the “MctlApplet.cpp” library name can be used to find other binaries or packages from the same developers. Having other firmware images from the same vendor helps to better understand the binary structure.

Another interesting string from the same screenshot is “Not Booting from softloader” which can indicate the process state or perhaps the nature of this module.

Strings containing “Assert()” can suggest different information about the code. Using Asserts is a common practice in firmware development, as it helps the developer to debug and troubleshoot the code during the development and production phase.

First ASCII strings found in the firmware binary

7. IDA -parm firmware.bin

We can see that we have already collected lots of valuable information from this firmware binary that seemed quite incomprehensible at the beginning.

Let’s now use IDA to inspect the code. As this binary is not an ELF file with standard headers that show the ISA, we need to explicitly tell IDA to use the ARM instruction set to disassemble the code.

Disassembly view of part of a function in IDA

The above screenshot from IDA shows how the strings found in the previous analysis steps can be used to help find the call to the entry point of the kernel module.

8. dd

We can now go ahead and extract the part of the firmware binary which our analysis found to be the bootloader module.
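What dd does here is copy a byte range out of the firmware file. A minimal Python equivalent is sketched below. The function name is hypothetical, and the start offset and length have to come from your own analysis.

```python
def carve(source_path, dest_path, start, length, chunk_size=1 << 16):
    """Copy `length` bytes starting at `start` into a new file.

    Returns the number of bytes actually written, which is smaller than
    `length` if the source file ends early.
    """
    remaining = length
    with open(source_path, "rb") as src, open(dest_path, "wb") as dst:
        src.seek(start)
        while remaining > 0:
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:
                break  # reached the end of the source file
            dst.write(chunk)
            remaining -= len(chunk)
    return length - remaining
```

If we assume that the bootloader starts at offset 0 and runs up to the entropy rise at 0xD800, the call would be carve("firmware.bin", "bootloader.bin", 0x0, 0xD800). The start offset of 0 is an assumption and should be confirmed against the earlier findings.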

9. QEMU

After all the modules have been extracted from the firmware binary (the file system content, the kernel modules and other components), we can use QEMU to run the binaries, even those built for a different architecture from that of our own machine, and start interacting with them.
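For single executables, QEMU’s user-mode emulators are often sufficient. The Python sketch below only builds the command line. It assumes that the user-mode binaries (such as qemu-arm) are installed, that the target is a dynamically linked Linux ELF file, and that the extracted root file system provides its libraries through the “-L” option. The helper name is hypothetical.

```python
import os

# User-mode emulator binaries for some common IoT architectures.
QEMU_USER_BINARIES = {
    "arm": "qemu-arm",
    "mips": "qemu-mips",
    "mipsel": "qemu-mipsel",
    "ppc": "qemu-ppc",
    "sparc": "qemu-sparc",
}


def qemu_user_command(rootfs, program, arch="arm", args=()):
    """Build an argv list that runs `program` from the extracted rootfs.

    `program` is a path inside the firmware, e.g. "/bin/busybox".
    """
    emulator = QEMU_USER_BINARIES[arch]
    target = os.path.join(rootfs, program.lstrip("/"))
    # -L tells QEMU where to look for the target's dynamic loader and libraries
    return [emulator, "-L", rootfs, target, *args]
```

The resulting list can be passed to subprocess.run. Binaries that depend on device-specific hardware or kernel interfaces will usually need full system emulation instead.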

Conclusion

The number of IoT devices is growing every day, and they range from industrial control systems, smart cities and cars to consumer-grade devices such as mobile phones, networking devices, personal assistants, smart watches and a large variety of smart home appliances.

IoT devices are derived from embedded systems that have been around for many years. The manufacture and development of software for embedded devices has always had different priorities from those of general-purpose computer systems due to the different nature of these devices. These priorities have been shaped by the limited and specific functions of the devices themselves, the limited capabilities and capacities of the underlying hardware, and the inaccessibility of the developed code to subsequent alteration and modification. However, IoT devices differ significantly from traditional embedded systems. Most IoT devices nowadays run on hardware that has similar capabilities to a general-purpose computer system.

As IoT devices become more prevalent, they are now accessing and controlling many aspects of our lives and day-to-day interactions. IoT devices can now potentially give malicious actors unprecedented opportunities to do harm. This highlights the importance of security in IoT devices and also shows the relevance of research around this topic. The good news is that there are many tools and techniques available to assist current and future research in this field. Acquiring a good understanding of the architecture of IoT devices, learning the language these devices speak and a good dose of determination and perseverance are what it takes to enter this research field.

This post has been written primarily to motivate individuals who want to start diving into IoT security research. You can reach out to us regarding this research at iot_firmware_research@kaspersky.com or via my Twitter account, @Noushinshbb.

We’ll be publishing more in the future! Stay tuned!