Developing YARA Rules Based on Byte Patterns: ROMCOM
YARA is an important tool for any aspiring threat intel analyst or reverse engineer, whether for detecting code reuse among different families, identifying samples utilising a certain technique, or even tracking the development of recently discovered malware.
While simple string patterns are an efficient way to build detections quickly, they fall short once the malware developers obfuscate embedded strings, whether through string encryption or by storing them as stack strings. If you’re developing rules to track a family over time, the format of its strings will likely change, rendering string-only rules ineffective. At that point, you’ll want to turn to byte patterns for more durable detection.
In this situation, we have a sample of ROMCOM, which is a pretty interesting backdoor that offers comms over ICMP, meaning all traffic is sent and received through ICMP packets, rather than the typical HTTP(S) communication. The origins of ROMCOM are fairly recent, but usage of the tool has been on the rise. In its early stage, ROMCOM stored all strings in cleartext, though over time the authors moved to a more obfuscated codebase, whereby the majority of strings were encrypted with a series of XOR operations.
Setup for call to string decryption function
As you may have guessed, this hindered detection of new samples, so the YARA rule had to be rewritten to rely on a mixture of byte patterns and strings in order to maximize the detection rate. As mentioned, the string encryption method was fairly distinctive: a series of XOR operations performed on the encrypted strings, which were stored as stack strings. As this appeared to be custom (i.e., not RC4, AES, or any other recognized algorithm), it was the ideal place to start looking for easily identifiable byte patterns.
String decryption function
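To make the idea of layered XOR over a stack string concrete, here is a minimal Python sketch. It is not the ROMCOM routine: the function name `xor_layers`, the key values, and the position-dependent masking are all assumptions chosen purely for illustration. What it shows is why such routines make good byte-pattern targets: the plaintext never appears in the file, but the code that undoes the XOR layers has to.

```python
from typing import Sequence


def xor_layers(data: bytes, keys: Sequence[int]) -> bytes:
    """Apply one XOR pass per key (hypothetical scheme, not ROMCOM's).

    Each pass XORs byte i with (key + i) & 0xFF. Because XOR is its own
    inverse, calling the function again with the same keys restores the
    original bytes.
    """
    out = bytearray(data)
    for key in keys:
        for i in range(len(out)):
            out[i] ^= (key + i) & 0xFF
    return bytes(out)


if __name__ == "__main__":
    keys = [0x5A, 0x13, 0xC7]  # made-up keys for the example
    encrypted = xor_layers(b"IcmpCreateFile", keys)
    print(encrypted.hex())            # what would sit on the stack
    print(xor_layers(encrypted, keys))  # decrypted again at runtime
```

A string rule looking for `IcmpCreateFile` would miss a binary that only carries the encrypted bytes, which is the gap that byte patterns over the decryption code are meant to close.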
To develop an effective YARA rule based on byte patterns, it is important to have multiple samples linked to the same family, both to identify commonalities between the functions and to see which assembly instructions, registers, and values change across different compilations.
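One way to see which bytes change between compilations is to line up the same code region from each sample and wildcard every position that differs. The sketch below assumes you have already extracted equal-length, aligned byte sequences from a disassembler; the helper name `common_pattern` and the second byte sequence are hypothetical, made up to show a register and an offset changing.

```python
def common_pattern(*samples: bytes) -> str:
    """Build a YARA-style hex string, using ?? where the samples disagree.

    All inputs must be the same length and already aligned on the same
    instructions; this helper does no alignment of its own.
    """
    if not samples:
        raise ValueError("need at least one byte sequence")
    if len({len(s) for s in samples}) != 1:
        raise ValueError("byte sequences must be the same length")

    tokens = []
    for column in zip(*samples):
        # A column holds the byte at one offset from every sample.
        if len(set(column)) == 1:
            tokens.append(f"{column[0]:02x}")
        else:
            tokens.append("??")
    return "{ " + " ".join(tokens) + " }"


if __name__ == "__main__":
    sample_a = bytes.fromhex("48 8d 41 01 48 89 43 10")
    sample_b = bytes.fromhex("48 8d 42 01 48 89 43 18")  # invented variant
    print(common_pattern(sample_a, sample_b))
    # prints: { 48 8d ?? 01 48 89 43 ?? }
```

The more samples you feed in, the more wildcards you tend to collect, so it is worth checking that enough fixed bytes remain for the pattern to stay specific.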
Examining two different samples of ROMCOM (both with string encryption), we can see they are identical in structure:
Side-by-side of 2 different ROMCOM string decryption functions
Unfortunately, there are multiple string decryption functions throughout the samples, with differing structures, so relying on the pattern of XORs seen above might not get us far in finding other samples.
Side-by-side of 2 different ROMCOM string decryption function entry points
At this point I would take a few different byte patterns from the samples and run quick searches on VirusTotal to see whether each pattern is shared by other ROMCOM samples, and whether it also appears in clearly benign files, which helps weed out false positives. This is harder if you don’t have access to VirusTotal; in that case I would recommend downloading as many samples linked to the family as you can (presuming it doesn’t have a huge sample set, like Emotet or IcedID), structuring the byte patterns as YARA rules, and testing them against all the downloaded samples. If you get a 100% match rate, you can be reasonably confident that the rule will detect further samples, but you should also add a few string-based conditions to limit the number of false positives.
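The match-rate check can be sketched without any third-party tooling. In practice you would compile the rule with YARA itself; the stdlib-only version below is just an approximation that handles plain hex bytes and full-byte `??` wildcards (no jumps, alternatives, or nibble wildcards). The function names and the idea of loading every file from one directory are assumptions for the example.

```python
import re
from pathlib import Path
from typing import List


def hex_pattern_to_regex(pattern: str) -> "re.Pattern[bytes]":
    """Convert a simple YARA hex string into a compiled bytes regex."""
    parts = []
    for token in pattern.strip("{} \n").split():
        if token == "??":
            parts.append(b".")  # any single byte (DOTALL covers newlines)
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    return re.compile(b"".join(parts), re.DOTALL)


def match_rate(pattern: str, blobs: List[bytes]) -> float:
    """Return the fraction of blobs that contain the pattern."""
    if not blobs:
        return 0.0
    regex = hex_pattern_to_regex(pattern)
    hits = sum(1 for blob in blobs if regex.search(blob))
    return hits / len(blobs)


def load_samples(directory: str) -> List[bytes]:
    """Read every regular file in a directory into memory."""
    return [p.read_bytes() for p in Path(directory).iterdir() if p.is_file()]
```

With a folder of downloaded samples (the folder name here is hypothetical), `match_rate("{48 89 50 e0 83 60 dc 00 48 83 22 00 48 83 62 10}", load_samples("romcom_samples"))` gives the share of files containing the pattern. Running the same check over a set of known-clean binaries gives a rough feel for false positives.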
In the case of ROMCOM, we’ve got a few different patterns to choose from, such as:
Block seen inside main decryption loop
Snippet seen at entry point of function
Running a quick search on VT with both of these byte patterns brings up over 19 files. The majority are tagged as ROMCOM, and the few false positives are easily removed with some basic filtering on detections.
VT Query
Putting these into a quick and easy rule, along with some cleartext strings, we can get the following:
rule romcom_example
{
    meta:
        description = "Basic ROMCOM Rule"
        author = "0ffset Training Solutions"
    strings:
        $s1 = "comDll.dll" nocase
        $s2 = "oleaut32." nocase
        $s3 = "IcmpCreateFile" nocase
        $b1 = {48 89 50 e0 83 60 dc 00 48 83 22 00 48 83 62 10}
        $b2 = {48 8d 41 01 48 89 43 10 48 8b c3 48 83 7b 18 10 72 03 48 8b 03 44 88 0c 08}
    condition:
        2 of ($s*) and 1 of ($b*)
}
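To spell out what the condition `2 of ($s*) and 1 of ($b*)` asks for, here is a plain-Python approximation of the rule's logic. It is a reading aid rather than a replacement for YARA: the function name `rule_would_match` is made up, and the case-insensitive check simply lowercases ASCII, which is my assumption of a close enough stand-in for `nocase` on these three strings.

```python
STRINGS = [b"comDll.dll", b"oleaut32.", b"IcmpCreateFile"]

BYTE_PATTERNS = [
    bytes.fromhex("48 89 50 e0 83 60 dc 00 48 83 22 00 48 83 62 10"),
    bytes.fromhex(
        "48 8d 41 01 48 89 43 10 48 8b c3 48 83 7b 18 10 "
        "72 03 48 8b 03 44 88 0c 08"
    ),
]


def rule_would_match(data: bytes) -> bool:
    """Approximate the rule: at least 2 strings AND at least 1 byte pattern."""
    lowered = data.lower()  # ASCII-only lowercasing, mimicking nocase
    string_hits = sum(1 for s in STRINGS if s.lower() in lowered)
    byte_hits = sum(1 for b in BYTE_PATTERNS if b in data)
    return string_hits >= 2 and byte_hits >= 1
```

The byte patterns are matched exactly, while the strings tolerate case changes. A sample that encrypts all three strings would fail the `2 of ($s*)` half even if the decryption code is present, so the threshold is a trade-off between false positives and resilience to obfuscation.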
And that’s about it for detection! While string-based rules can be effective, they can easily be bypassed through simple string obfuscation, and so being able to leverage custom functionality (whether algorithms, a certain API loading routine, or even unique constants) can give you the edge against the threat actors and allow you to detect samples regardless of string obfuscation or compiler modifications.