Contents

Hunting Opaque Predicates with YARA

Introduction

Malware tends to obfuscate itself using many different techniques from opaque predicates, garbage code, control flow manipulation with the stack and more. These techniques definitely make analysis more challening for reverse engineers. However, from a detection and hunting standpoint to find interesting samples to reverse engineer we can leverage our knowlege of these techniques to hunt for obfuscated code. In our case today, we will be developing a yara signature to hunt for one specific technique of opaque predicates, there are many variations and situations where this does not match and should only serve as a hunting signatures as more heuristic and programitic approaches for this are better for detection. With the limitations in mind, we must first understand what opaque predicates are.

In computer programming, an opaque predicate is a predicate, an expression that evaluates to either “true” or “false”, for which the outcome is known by the programmer a priori, but which, for a variety of reasons, still needs to be evaluated at run time. - Wikipedia

As the quote from Wikipedia states, opaque predicates are conditional checks in programming where the outcome is always predetermined. To understand this concept better let’s have a look at an example in C (Figure 1).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
// Compile: gcc -O0 op.c -o op
// Run    : ./op

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv){
	int a = 0;
	int b = 1;
	if (a != b){
		printf("foo\n");
	}
	printf("bar\n");
	return 0;
}

Figure 1. Opaque Predicate in C

In this case, a will never be equal to b, meaning, foo will always be printed to the console. Additionally, this should also be represented in the assembly code because no optimizations are performed by the compiler because of the argument -O0 being passed to gcc. Analysis of this code can be challenging as basic blocks are created and these conditions can be nested making a mess in our disassemblers and decompilers.

We can confirm this by performing the following operations.

1
2
3
4
5
6
7
gcc -O0 op.c -o op
objdump -Mintel --disassemble=main op
# 115c:	c7 45 f8 00 00 00 00 	mov    DWORD PTR [rbp-0x8],0x0
# 1163:	c7 45 fc 01 00 00 00 	mov    DWORD PTR [rbp-0x4],0x1
# 116a:	8b 45 f8             	mov    eax,DWORD PTR [rbp-0x8]
# 116d:	3b 45 fc             	cmp    eax,DWORD PTR [rbp-0x4]
# 1170:	74 0f                	je     1181 <main+0x38>

Now that we have the basic concept understood in C, let’s have a look at simpler example in assembly (Figure 2).

1
2
3
4
5
6
0:  b8 41 41 41 41          mov    eax,0x41414141
5:  bb 42 42 42 42          mov    ebx,0x42424242
a:  39 d8                   cmp    eax,ebx
c:  75 00                   jnz    e <test>
0000000e <test>:
e:  90                      nop

Figure 2. Opaque Predicate in Assembly

In this case, we have a combination of two mov instructions moving 32-bit immutable values into 32-bit registers, followed by a cmp instruction checking both the involved registers, and finally a conditional jnz instruction.

Signature Development

Before we can create a yara hunting signature, we need to create the wild carded hex string to match the bytes correctly. To accomplish this, we need to know about x86 instruction encoding for moving immutable values into registers (Table 1).

OpcodeMnemonicDescription
B0+ rbMOV r8,imm8Move imm8 to r8
B8+ rwMOV r16,imm16Move imm16 to r16
B8+ rdMOV r32,imm32Move imm32 to r32

Table 1. Instruction Encoding for mov reg,imm

This means we can detect each mov instruction with the yara search pattern b? ?? ?? ?? ??.

Next, we need to identify the cmp instruction, for this we can use the yara search pattern 39 ?? ??. Finally, we use the instruction encodings for conditional jump instructions (7?|e3). A reference to instruction encoding for x86 conditional jumps can be found here.

With all these considerations we can make the yara signature in Figure 3.

1
2
3
4
5
6
rule op {
    strings:
        $match = {b? ?? ?? ?? ?? b? ?? ?? ?? ?? 39 ?? (7?|e3)}
    condition:
		all of them
}

Figure 3. Opaque Predicate yara Signature

So we can dust our hands off and say we are done right? 🤔

Wrong! 😲

We need to verify the registers in from the mov instructions with the immutable values are used in the cmp instruction. Otherwise, there would be no context for the cmp instruction, as yara does not care at this point what registers are part of the compare only that it is simply a cmp instruction. To accomplish this we need to add logic to the condition.

First we must iterate all the matches of our $match string. To accomplish this, we can use a for loop in our condition based on the number of matches (#match) (Figure 4).

1
2
3
4
5
6
7
8
rule op {
    strings:
        $match = {b? ?? ?? ?? ?? b? ?? ?? ?? ?? 39 ?? (7?|e3)}
    condition:
        for any i in (0..#match):(
			// Additional Logic Here
		)
}

Figure 4 Loop Checking Matches for match

Next, using the for loop we can get each opcode we ae interested in using uint8(), address of matches @match with bitwise operations to extract and compare the registers in the encoded bytes.

In this case, the first two bits of the mov opcodes are the mod/rm bits, and the last three bits indicate what register is being used.

01334567
MODMODREGREGREGR/MR/MR/M

Table 2. Mod R/M Byte Encoding

REG Value81632
000alaxeax
001clcxecx
010dldxedx
011blbxebx
100ahspesp
101chbpebp
110dhsiesi
111bhdiedi

Table 3. Register Encoding

Now that we have the encodings, we know the first mov instruction involves eax, so and the byte is 0xb8, which in binary is 10 111 000 indicating the last three bits are the values we are interested in. We can single these bits out using a simple mask of 00 000 111 , which is equivelent to 0x7. Thus, 0xb8 & 0x7 will collect the value representing eax. This method will also work for our second mov instruction mathematicly decoding our second register of ebx.

Next, we need to decode the registers from the cmp instruction. The operands of the cmp instruction are encoded as 0xd8, which in binary is represented by 11 011 000. To get the first 3 bits, we first need to shift the bits right by 3, then perform masking to avoid trailing bits. This can be performed with the operation 0xd8 >> 3 & 0x7, which results in 011 or ebx. For the next operand we can simply perform the masking operation 0xd7 & 0x7, which results in 000 or eax.

Now that we have the bitwise operations to decode the respective operands, we can implement them in our condition (Figure 5).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
rule op {
    strings:
        $match = {b? ?? ?? ?? ?? b? ?? ?? ?? ?? 39 ?? (7?|e3)}
    condition:
        for any i in (0..#match):(
            (
                uint8(@match[i]) & 0x7 == uint8(@match[i]+11) & 0x7 and
                uint8(@match[i]+5) & 0x7 == uint8(@match[i]+11) >> 3 & 0x7
            )
        )
}

Figure 5. Opaque Predicate yara Signature

That was exhausting 😫, surely we are done right? 🤔

Wrong! 😲

We need to additionally check for the operands in reverse order, because the register options can swap positions. However, we can just add an or to our condition to make this work (Figure 6).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
rule op {
    strings:
        $match = {b? ?? ?? ?? ?? b? ?? ?? ?? ?? 39 ?? (7?|e3)}
    condition:
        for any i in (0..#match):(
            (
                uint8(@match[i]) & 0x7 == uint8(@match[i]+11) & 0x7 and
                uint8(@match[i]+5) & 0x7 == uint8(@match[i]+11) >> 3 & 0x7
            ) or
            (
                uint8(@match[i]) & 0x7 == uint8(@match[i]+11) >> 3 & 0x7 and
                uint8(@match[i]+5) & 0x7 == uint8(@match[i]+11) & 0x7
            )
        )
}

Figure 6. Opaque Predicate yara Signature

Next, we can reduce the potential for false positives by scanning only executable sections in an executable. This can be applied to any executable format. However, for the sake of simplicity we will focus on the PE file format using the pe module in yara. To accomplish this, we iterate over each section checking of the characteristics has the pe.SECTION_MEM_EXECUTE flag and that the address of the matched bytes is with the section offsets. We also should check to be sure it is a 32-bit executable, as decoding instructions on different architecures would be pointless and yield more false positives. Finally, we should also check our mod r/m bits for the mov and cmp instructions to validate they are the opcodes we are looking for that work with immutables and registers. Once completed, we have a signature we can use for hunting provided in Figure 7.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
import "pe"
import "dotnet"

rule op {
    meta:
        author      = "@c3rb3ru5d3d53c"
        description = "Potential 32-bit Immutable Opaque Predicates"
        reference   = "https://c3rb3ru5d3d53c.github.io/2023/02/opaque-predicate-hunting-with-yara.en.md/"
        tlp         = "white"
    strings:
	    // mov    <regiser>,<32_bit_imm>
		// mov    <register>,<32_bit_imm>
		// cmp    <regiser>,<register>
		// <conditional_jump>
        $match = {b? ?? ?? ?? ?? b? ?? ?? ?? ?? 39 ?? (7?|e3)}
    condition:
        uint16(0) == 0x5a4d and
        uint32(uint32(0x3c)) == 0x00004550 and
        pe.is_32bit() and not dotnet.is_dotnet and
        for any i in (0..#match):(
            for any j in (0..pe.number_of_sections):(
                pe.sections[j].characteristics & pe.SECTION_MEM_EXECUTE and
                @match[i] >= pe.sections[j].raw_data_offset and
                @match[i] < pe.sections[j].raw_data_offset + pe.sections[j].raw_data_size - 15
            ) and
            (
                uint8(@match[i]) >> 3 == 0x17 and   // Check mod r/m for mov
                uint8(@match[i]+5) >> 3 == 0x17 and // Check mod r/m for mov
                uint8(@match[i]+11) >> 6 == 0x3     // Check mod r/m for cmp
            ) and
            (
                (   // Check first mov instruction operand against first cmp operand
                    // Check second mov instruction operand against second cmd operand
                    uint8(@match[i]) & 0x7 == uint8(@match[i]+11) & 0x7 and
                    uint8(@match[i]+5) & 0x7 == uint8(@match[i]+11) >> 3 & 0x7
                ) or
                (
                    // Check first mov instruction operand against second cmp operand
                    // Check second mov instruction operand against first cmp operand
                    uint8(@match[i]) & 0x7 == uint8(@match[i]+11) >> 3 & 0x7 and
                    uint8(@match[i]+5) & 0x7 == uint8(@match[i]+11) & 0x7
                )
            )
        )
}

FIgure 7. Immutable 32-bit Opaque Predicate yara Signature for PE Files

Limitations

Because x86 is turring complete, we are limited by the halting problem.

In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running, or continue to run forever. - Wikipedia

This means that given any program with an input we cannot prove if the program will finish running or continue to execute forever. More in the context of our application, given any input, we cannot detect every possible combination of opaque predicates. It is undecidable for which no one algorithm has been proven to solve.

The more reliable way to detect opaque prediques would be to use heuristic and programmatic solutions to disassemble and inspect the executable code sections. However, this is also limited by encrypted code containing opaque predicates and of course edge cases and an infinite number of combinations that can be used.

In addition to these issue, when performing scans with this signature yara prints the message slowing down scanning, which means that while the signature maybe great for hunting, it is not great from a performance vs. detection standpoint. False positives could also be an issue, as yara scanning does not perform proper disassembly of the application and relies on pattern matching of bytes, which can cause false positives due to alignment issues.

We also didn’t account for 8, and 16 bit immutable values being moved into registers. However, with the knowledge you have now, I hope you will be able to make some interesting opaque prediquate hunting yara signatures of your own.

Conclusion

Knowing the limitations of the halting problem and slow scanning when detecting opaque prediques with yara; our opaque predique yara signature can still be used to hunt for interesting obfuscated samples. Additionally, we can use this example as a way to create more opaque predique hunting yara signatures. At the end of the day, hunting and detection is a cat and mouse game for which we do not have a mathematical solution for to be completely safe. Those who win, are those who detect the most, and that is something we can quantify.

References