Today, I came across a bloated malware sample (292 Mb) full of encoded strings being distributed in Brazil through compromised WordPress websites as fake Java Updates.
One common first step while analyzing malware samples is trying to decode its strings. Although we cannot rely solely on it to make our assumptions, they may take us to significant findings.
After digging into the decryption algorithm used to decode malware strings in runtime, it was possible to write a simple Python script to reverse them statically. This way, I ended up with a list of decoded strings. They gave me many insights about the malware intentions, but it would be better to have them back to the malware while analyzing it statically or dynamically.
Studying the subject, I learned that Floss , a tool developed by FireEye to automatically extract obfuscated strings from malware, can generate scripts to annotate Radare2 and IDA Pro databases with the decoded strings as comments by the assembly code. It was exactly what I was looking for, but, unfortunately, Floss wasn’t able to decode the binary I was analyzing.
So, I decided to go one step further in my decoding Python script to make it generating annotation scripts. Its implementation is straightforward and requires improvements, but it helped me attaching the decoded strings to the malware while statically and dynamically analyzing using Radare2 and x64dbg respectively.
In today’s malware sample, the strings were encrypted with XOR algorithm. XOR algorithms are frequently used in malware and are not so difficult to reverse using brute force with tools like XORSearch and XORStrings  depending on the encryption key size. Bigger key sizes, as in my case, are hard to brute-force and require code analysis as seen in Figure 1.
Figure 1 – Decoding function disassembly
Now that we understood the decoding algorithm and know the key, it is time to decode the strings. To this end, a simple Python script was implemented, as shown in Figure 2.
Figure 2 – Decoding function in Python
So, giving it one of the encoded strings and the key, it returns the decoded string.
Figure 3 – Decoding sample string
Once we have the decoding script working, it’s time decode all the malware embedded strings. To this end, it was necessary to extract all the binary strings and select all of those who matched to the encryption pattern. In Figure 4 it is possible to see part of the string list and its respective offsets.
Figure 4 – Encoded string list
The next step was to make the decoding script parse this file and generate the outputs. Note that the offset is necessary to further attach the decoded strings to the code.
In Figure 5 is shown the code snipped in charge of the parsing and output file generation.
Figure 5 – Generating Radare2 and x64dbg scripts
In Figure 6 we have part of the output for the x64dbg script.
Figure 6 – X64dbg script
Attaching strings back to the code
Now, it is time to run generated scripts. In Radare2 (Cutter), select and run the script by the menu File -> Run Script. In x64dbg, select and run the script while debugging the malware using the tab “Script.”
In both, the decoded strings will be annotated as comments by the assembly code which references the original encoded strings, as shown in Figures 7 and 8.
Figure 7 – Decoded strings in Radare2
Figure 8 – Decoded strings in x64dbg
As described, this was the way I found to make it easier to make sense of decoded strings while analyzing statically and dynamically this malware. I hope it can be helpful for other analysts. If you know or use different ways to this, share with us!