False Disassembly

Some widely used disassemblers, such as objdump and the tools built on top of it, have weaknesses in how they decode code. These weaknesses can be exploited to make objdump's output differ from the instructions that actually execute.

Jumping into the Middle of an Instruction

The simplest method is to use jmp to jump into the middle of an instruction. The real code then starts "inside" what the disassembler takes to be a single instruction. Because the disassembler decodes that instruction as a whole, its listing never shows the instructions that are actually executed.

This might sound confusing and hard to understand, so let's look at an example. Given the following assembly code:

start:
    jmp label+1
label:  
    DB 0x90
    mov eax, 0xf001

The byte emitted at label by DB 0x90 is the single-byte nop opcode. Here is objdump's disassembly of this code:

08048080 <start>:
  8048080:  e9 01 00 00 00  jmp 8048086 <label+0x1>
08048085 <label>:
  8048085:  90      nop
  8048086:  b8 01 f0 00 00  mov eax,0xf001

This looks fine: DB 0x90 is correctly disassembled as 90 nop.

But if we replace the nop with the first byte of an instruction longer than 1 byte, the result changes. objdump does not follow the jump; it simply keeps disassembling linearly from top to bottom (the linear sweep algorithm). For example, if we change DB 0x90 to DB 0xE9, objdump produces:

08048080 <start>:
  8048080:  e9 01 00 00 00  jmp 8048086 <label+0x1>
08048085 <label>:
  8048085:  e9 b8 01 f0 00  jmp 8f48242 <__bss_start+0xeff1b6>

Comparing with the previous disassembly result, you can clearly see what's happening. DB 0xE9 is purely data and won't be executed, but the disassembly result treats it as an instruction, and subsequent results are changed accordingly.

objdump ignores the code at the jmp destination and simply disassembles whatever bytes follow the jmp. This way, our real code is nicely "hidden".
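The linear sweep behaviour described above can be reproduced with a toy decoder. This is a minimal sketch, not how objdump is implemented: the helper names (`decode`, `linear_sweep`) are hypothetical, and the decoder only knows the three opcodes used in this example (0x90, 0xB8, 0xE9).

```python
import struct

# Instruction lengths for the three opcodes used in this example only;
# a real x86 decoder is far more involved.
LENGTHS = {0x90: 1, 0xB8: 5, 0xE9: 5}


def decode(code, offset, base):
    """Decode one instruction at `offset`; return (length, text)."""
    opcode = code[offset]
    length = LENGTHS[opcode]          # KeyError for opcodes outside the toy subset
    if offset + length > len(code):
        raise ValueError("truncated instruction")
    if opcode == 0x90:
        return length, "nop"
    (imm,) = struct.unpack_from("<i", code, offset + 1)
    if opcode == 0xB8:
        return length, "mov eax, 0x%x" % (imm & 0xFFFFFFFF)
    # jmp rel32: the target is relative to the end of the instruction.
    target = (base + offset + length + imm) & 0xFFFFFFFF
    return length, "jmp 0x%x" % target


def linear_sweep(code, base):
    """Decode back to back from the first byte, never following jumps."""
    listing, offset = [], 0
    while offset < len(code):
        try:
            length, text = decode(code, offset, base)
        except (KeyError, ValueError):
            length, text = 1, "db 0x%02x" % code[offset]
        listing.append((base + offset, text))
        offset += length
    return listing


# start: jmp label+1 / label: DB 0xE9 / mov eax, 0xf001
CODE = bytes.fromhex("e901000000" "e9" "b801f00000")

for address, text in linear_sweep(CODE, 0x08048080):
    print("%08x  %s" % (address, text))
```

The junk 0xE9 at label consumes the first four bytes of the mov as its rel32 operand, so the mov never appears in the listing.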

Solution

How do we solve this problem? The most straightforward method appears to be manually replacing the useless 0xE9 with 0x90 using a hex editor. However, if the program performs file integrity checks and calculates checksum values, this method won't work.
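As a rough illustration of why patching can be detected, the sketch below compares a checksum of the example bytes before and after overwriting the junk byte. CRC32 is only an assumption made for this example; a real program could use any checksum or hash.

```python
import zlib

# start: jmp label+1 / label: DB 0xE9 / mov eax, 0xf001
original = bytes.fromhex("e901000000" "e9" "b801f00000")

patched = bytearray(original)
patched[5] = 0x90   # overwrite the junk 0xE9 at label with nop

# Any single-byte change alters the CRC32, so a self-check would notice.
print(hex(zlib.crc32(original)), hex(zlib.crc32(bytes(patched))))
```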

So a better solution is to use IDA or similar disassemblers that perform control flow analysis. For the same problematic program, the disassembly result might look like:

  ---- section .text ----:
08048080    E9 01 00 00 00  jmp Label_08048086
                                                ; (08048086)
                                                ; (near + 0x1)
08048085    DB E9

Label_08048086:
08048086    B8 01 F0 00 00  mov eax, 0xF001
                                                ; xref ( 08048080 )

The disassembly result looks fine: the byte at 08048085 is kept as data (DB E9), and the mov at the jump target is decoded correctly.
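The idea of following control flow instead of sweeping linearly can be sketched as follows. This is a toy model under stated assumptions, not IDA's algorithm: `decode` and `follow_flow` are hypothetical names, only three opcodes are understood, and only unconditional jmp is followed.

```python
import struct


def decode(code, offset, base):
    """Return (length, text, jump target offset or None) for a 3-opcode toy subset."""
    opcode = code[offset]
    if opcode == 0x90:
        return 1, "nop", None
    (imm,) = struct.unpack_from("<i", code, offset + 1)
    if opcode == 0xB8:
        return 5, "mov eax, 0x%x" % (imm & 0xFFFFFFFF), None
    if opcode == 0xE9:
        target = offset + 5 + imm        # relative to the end of the jmp
        return 5, "jmp 0x%x" % (base + target), target
    raise ValueError("opcode outside the toy subset")


def follow_flow(code, base, entry=0):
    """Decode only bytes reachable from `entry`; everything else is reported as data."""
    listing, covered, pending = {}, set(), [entry]
    while pending:
        offset = pending.pop()
        while 0 <= offset < len(code) and offset not in covered:
            try:
                length, text, target = decode(code, offset, base)
            except (ValueError, struct.error):
                break
            listing[base + offset] = text
            covered.update(range(offset, offset + length))
            if target is not None:       # unconditional jmp: continue at the target only
                pending.append(target)
                break
            offset += length
    for offset, byte in enumerate(code):
        if offset not in covered:
            listing[base + offset] = "db 0x%02x" % byte
    return sorted(listing.items())


CODE = bytes.fromhex("e901000000" "e9" "b801f00000")

for address, text in follow_flow(CODE, 0x08048080):
    print("%08x  %s" % (address, text))
```

Because decoding resumes at the jmp target rather than at the next byte, the 0xE9 at label is never decoded and ends up listed as data.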

Computing Jump Addresses at Runtime

This method can defeat even disassemblers that analyze control flow. Let's look at some example code to understand it better:

; ----------------------------------------------------------------------------
    call earth+1
Return:
                    ; x instructions or random bytes here               x byte(s)
earth:              ; earth = Return + x
    DB 0xE9         ; misalign disassembly with a single junk opcode byte   1 byte
    pop eax         ; start of function: get return address ( Return )  1 byte
                    ; y instructions or random bytes here               y byte(s)
    add eax, x+1+1+y+3+1+1+z ; x+y+z+7                          3 bytes (imm8 form)
    push eax        ;                                                   1 byte
    ret             ;                                                   1 byte
                    ; z instructions or random bytes here               z byte(s)
; Code:
                    ; !! Code Continues Here !!
; ----------------------------------------------------------------------------

The program uses call followed by pop to read the return address that call saved on the stack, which is the address of the instruction immediately after the call (here, Return). Junk data can be placed at that return point because execution never resumes there: during the function, the return address is modified to point to Code. Therefore, when the earth function returns, it jumps to Code rather than continuing at Return.

Let's look at a simple demo:

; ----------------------------------------------------------------------------
    call earth+1
earth:  
    DB 0xE9             ; 1 <--- pushed return address;
                        ; E9 is the jmp opcode, used to misalign disassembly
    pop eax             ; 1 hidden
    nop                 ; 1
    add eax, 9          ; 3 hidden (imm8 form, 83 C0 09)
    push eax            ; 1 hidden
    ret                 ; 1 hidden
    DB 0xE9             ; 1 opcode for jmp to misalign disassembly
Code:                   ; code continues here <--- pushed return address + 9
    nop
    nop
    nop
    ret
; ----------------------------------------------------------------------------
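The constant added to eax is simply the number of bytes between the pushed return address and Code. The sketch below sums the sizes for the demo; `distance_to_code` is a hypothetical helper, and the sizes assume the standard encodings (`add eax, imm8` as 83 C0 ib, 3 bytes; `add eax, imm32` as 05 id, 5 bytes).

```python
def distance_to_code(items):
    """Sum the sizes of everything between the pushed return address and Code."""
    return sum(size for _, size in items)


def demo_layout(add_size):
    """Bytes from earth (the pushed return address) up to Code in the demo."""
    return [
        ("DB 0xE9", 1),
        ("pop eax", 1),
        ("nop", 1),
        ("add eax, imm", add_size),
        ("push eax", 1),
        ("ret", 1),
        ("DB 0xE9", 1),
    ]


print(distance_to_code(demo_layout(add_size=3)))   # imm8 form of add
print(distance_to_code(demo_layout(add_size=5)))   # imm32 form of add
```

With the 3-byte form the distance is 9, which matches `add eax, 9`. The listings that follow show the add in its 5-byte form (05 09 00 00 00), which places Code 11 bytes past the pushed return address, so for that encoding the constant would need to be 11.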

With objdump, the call earth+1 alone is enough to throw the disassembly off:

00000000 <earth-0x5>:
  0:    e8 01 00 00 00  call 6 <earth+0x1>
00000005 <earth>:
  5:    e9 58 90 05 09  jmp 9059062 <earth+0x905905d>
  a:    00 00           add %al,(%eax)
  c:    00 50 c3        add %dl,0xffffffc3(%eax)
  f:    e9 90 90 90 c3  jmp c39090a4 <earth+0xc390909f>

Now let's look at what IDA produces:

.text:08000000   ; Segment permissions: Read/Execute
.text:08000000   _text  segment para public 'CODE' use32
.text:08000000      assume cs:_text
.text:08000000      ;org 8000000h
.text:08000000      assume  es:nothing, ss:nothing, ds:_text,
.text:08000000          fs:nothing, gs:nothing
.text:08000000      dd 1E8h
.text:08000004 ; -------------------------------------------------------------
.text:08000004      add cl, ch
.text:08000006      pop eax
.text:08000007      nop
.text:08000008      add eax, 9
.text:0800000D      push eax
.text:0800000E      retn
.text:0800000E ; -------------------------------------------------------------
.text:0800000F      dd 909090E9h
.text:08000013 ; -------------------------------------------------------------
.text:08000013      retn
.text:08000013 _text        ends
.text:08000013
.text:08000013
.text:08000013      end

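The three nop instructions at Code are nicely hidden, swallowed into the data dword dd 909090E9h. The call that starts the EIP calculation is hidden as well, inside dd 1E8h, so nothing in the listing shows how the pop/add/push/retn sequence is reached. The disassembled code no longer matches the code that actually executes.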

How can this problem be solved? In reality, no tool can guarantee 100% accurate disassembly. A disassembler that emulates the code might be able to produce a completely correct listing.
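To give a feel for what emulation buys, here is a minimal sketch that executes the demo instead of decoding it statically. It is an illustration under assumptions, not a real emulator: `trace` is a hypothetical helper, only the handful of opcodes in the demo are supported, the bytes are hand-assembled at base 0, and `add eax, 9` is assumed to use the 3-byte imm8 encoding (83 C0 09).

```python
import struct


def trace(code, entry=0, max_steps=64):
    """Execute a toy subset of x86 and return the offsets actually executed."""
    eax, pc, stack, executed = 0, entry, [], []
    for _ in range(max_steps):
        executed.append(pc)
        opcode = code[pc]
        if opcode == 0xE8:                                # call rel32
            (rel,) = struct.unpack_from("<i", code, pc + 1)
            stack.append(pc + 5)                          # push return address
            pc = pc + 5 + rel
        elif opcode == 0x58:                              # pop eax
            eax = stack.pop()
            pc += 1
        elif opcode == 0x50:                              # push eax
            stack.append(eax)
            pc += 1
        elif opcode == 0x90:                              # nop
            pc += 1
        elif opcode == 0x83 and code[pc + 1] == 0xC0:     # add eax, imm8
            (imm,) = struct.unpack_from("<b", code, pc + 2)
            eax = (eax + imm) & 0xFFFFFFFF
            pc += 3
        elif opcode == 0xC3:                              # ret
            if not stack:
                break                                     # would return to the caller
            pc = stack.pop()
        else:
            raise ValueError("unsupported opcode 0x%02x at %d" % (opcode, pc))
    return executed


# call earth+1 / E9 / pop eax / nop / add eax, 9 / push eax / ret / E9 / nop*3 / ret
DEMO = bytes.fromhex("e801000000" "e9" "58" "90" "83c009" "50" "c3" "e9" "909090" "c3")

print([hex(offset) for offset in trace(DEMO)])
```

The two junk 0xE9 bytes (offsets 5 and 13) never show up in the trace, while the three nops at Code do, which is exactly the information the static listings miss.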

In real-world situations, this is not a particularly big problem. With an interactive disassembler, you can specify where the code starts, and during debugging you can see exactly which addresses the program jumps to.

So at this point, in addition to static analysis, we also need dynamic debugging.

Reference: Beginners Guide to Basic Linux Anti Anti Debugging Techniques