Junk Code¶
Principle¶
Junk code is a method used to hide code blocks (or other functionalities) from reverse engineering. It inserts garbage code into the real code while ensuring the original program still executes correctly. However, the program cannot be decompiled properly, making it difficult to understand the program's content — achieving the goal of obfuscation.
Junk code is typically used to increase the difficulty of static analysis.
Writing Junk Code¶
The simplest junk code uses inline assembly. Below is an example of adding junk code in VC. GNU compilers can also add junk code in a similar way, but using AT&T assembly syntax:
// Normal function code
int add(int a, int b){
int c = 0;
c = a + b;
return c;
}
// Function code with junk code added
int add_with_junk(int a, int b){
int c = 0;
__asm{
jz label;
jnz label;
_emit 0xe8; call instruction, followed by a 4-byte address offset, causing the disassembler to fail to recognize properly
label:
}
c = a + b;
return c;
}
When decompiling with IDA, the function with junk code cannot be properly recognized. The result is as follows:
Pseudo-code:
// With junk code added
.text:00401070 loc_401070: ; CODE XREF: sub_401005↑j
.text:00401070 push ebp
.text:00401071 mov ebp, esp
.text:00401073 sub esp, 44h
.text:00401076 push ebx
.text:00401077 push esi
.text:00401078 push edi
.text:00401079 lea edi, [ebp-44h]
.text:0040107C mov ecx, 11h
.text:00401081 mov eax, 0CCCCCCCCh
.text:00401086 rep stosd
.text:00401088 mov dword ptr [ebp-4], 0
.text:0040108F jz short near ptr loc_401093+1
.text:00401091 jnz short near ptr loc_401093+1
.text:00401093
.text:00401093 loc_401093: ; CODE XREF: .text:0040108F↑j
.text:00401093 ; .text:00401091↑j
.text:00401093 call near ptr 3485623h
.text:00401098 inc ebp
.text:00401099 or al, 89h
.text:0040109B inc ebp
.text:0040109C cld
.text:0040109D mov eax, [ebp-4]
.text:004010A0 pop edi
.text:004010A1 pop esi
.text:004010A2 pop ebx
.text:004010A3 add esp, 44h
.text:004010A6 cmp ebp, esp
.text:004010A8 call __chkesp
.text:004010AD mov esp, ebp
.text:004010AF pop ebp
.text:004010B0 retn
In the example above, patching the obfuscating junk instructions with NOPs will fix it, and then normal analysis can proceed.
It is worth noting that IDA's stack analysis is quite strict, so junk instructions involving push and ret can interfere with the disassembler's normal operation. Below is a concrete example that readers can compile and reproduce themselves:
#include <stdio.h>
// Compile with gcc/g++
int main(){
__asm__(".byte 0x55;"); // push rbp save the stack
__asm__(".byte 0xe8,0,0,0,0;"); // call $5;
__asm__(".byte 0x5d;"); // pop rbp -> get the value of rip
__asm__(".byte 0x48,0x83,0xc5,0x08;"); // add rbp, 8
__asm__(".byte 0x55;"); // push rbp -> equivalent to modifying call's return value to jump below
__asm__("ret;");
__asm__(".byte 0xe8;"); // This is an obfuscation instruction that is not executed
__asm__(".byte 0x5d;"); // pop rbp restore the stack
printf("whoami \n");
return 0;
}
Example¶
Here we use the second challenge from the Kanxue.TSRC 2017 CTF Autumn Competition as an example. Download link: ctf2017_Fpc.exe
The program contains several functions designed as decoys to mislead analysis, and the critical verification logic is protected with junk code to prevent IDA's static analysis. Let's open the Fpc challenge in IDA. The program first prints some prompt information, then gets the user's input.

Here the unsafe scanf function is used, and the user input buffer is only 0xCh bytes long. Let's double-click v1 to enter the stack frame view:

Therefore, we can overflow the data to overwrite the return address, thereby redirecting execution to an arbitrary address.
I should also explain that the several decoy functions before scanf are simple equations that are actually unsolvable. The program obfuscates the real verification logic with junk code, preventing IDA from decompiling it properly. So our approach for this challenge is to use the overflow to jump to the actual verification code and continue execution.
During analysis, we can find the following data block not far from the code:

Since IDA failed to properly identify the data, we can move the cursor to the beginning of the data block and press the C key (code) to disassemble this data block into code:

It is worth noting that this code is located at address 0x00413131. 0x41 is the ASCII code for 'A', and 0x31 is the ASCII code for '1'. Due to the Kanxue competition restrictions, user input can only contain letters and digits, so we can indeed exploit the overflow vulnerability to execute this code.
Open with OllyDbg, then press Ctrl+G to navigate to 0x413131 and set a breakpoint. After running and entering 12345612345611A followed by Enter, the program successfully reaches 0x00413131. Then right-click Analysis -> Remove analysis from module to properly recognize the code.

After breaking at 0x413131, click the "View" menu, select "Run Trace", then click "Debug", select "Trace Into". The program will record the execution flow of the junk code, as shown below:

The junk code section is originally very long, but after using OllyDbg's trace feature, the execution flow of the junk code becomes very clear. A large number of jumps occur throughout the process — we only need to extract the effective instructions for analysis.
It should be noted that among the effective instructions, we still need to satisfy certain conditional jumps so that the program continues executing along the correct logic path.
For example, at 0x413420 there is jnz ctf2017_.00413B03. We need to start over and set a breakpoint at 0x413420.

Modify the flags register to satisfy the jump condition. Continue tracing into (there is also 0041362E jnz ctf2017_.00413B03 that needs to be satisfied). After ensuring the logic is correct, extract the effective instructions and continue the analysis.
