
Exploit Development Windows User Mode Fundamentals

Fundamentals

Before moving on to exploitation, it is necessary to learn some basic concepts. The exploitation examples target the x86 architecture, because once the technique is understood there it adapts to x64 with only minor changes.

Memory

When a 32-bit program runs on Windows, its process receives a virtual address space that, by default, is split in two: addresses 0x00000000 to 0x7fffffff belong to the “user-mode” range, and addresses 0x80000000 to 0xffffffff are reserved for “kernel-mode.”
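As a rough illustration of this split, the following Python sketch classifies a 32-bit virtual address using the default boundary described above. The function name `address_region` and the constant names are made up for this example; they are not part of any Windows API.

```python
# Assumption: the default 2 GB / 2 GB split of a 32-bit address space.
USER_MODE_MAX = 0x7FFFFFFF
ADDRESS_MAX = 0xFFFFFFFF


def address_region(addr: int) -> str:
    """Return which half of the 32-bit address space `addr` falls in."""
    if not 0 <= addr <= ADDRESS_MAX:
        raise ValueError("not a 32-bit address")
    return "user-mode" if addr <= USER_MODE_MAX else "kernel-mode"


print(address_region(0x00401000))  # an address in the lower half
print(address_region(0x80000000))  # the first address of the upper half
```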

When a process is created, the PEB and TEB structures are also created with it:

PEB (Process Environment Block): Contains the user-mode parameters of the current process, such as the base address of the executable image, the pointer to the loader data, and information about the heap.

TEB (Thread Environment Block): Contains information about the current thread, such as the address of the PEB structure, the location of the thread's stack, and the pointer to the SEH chain.


Stack

When a thread runs, it executes code from the program or its libraries. It needs a fast-access area for function call data, local variables, and other bookkeeping; this area is known as the stack. Each thread has its own stack.

The stack operates as a LIFO (Last-In-First-Out) structure: the last value pushed onto the stack is the first one removed when a pop instruction is executed.

When a stack is created, the stack pointer (ESP) points to its top. Each push decreases the pointer (by 4 bytes on x86), so the stack grows in the reverse direction, from the highest address toward the lowest.
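The behaviour described above can be sketched with a toy model in Python. The class name `Stack32` and the starting address are hypothetical, chosen only for illustration; a real stack lives in process memory rather than in a dictionary.

```python
class Stack32:
    """Toy model of an x86 stack: memory is a dict of address -> 32-bit value."""

    def __init__(self, top: int = 0x0012FFFC):  # hypothetical initial ESP
        self.esp = top
        self.memory = {}

    def push(self, value: int) -> None:
        self.esp -= 4                       # the stack grows toward lower addresses
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self) -> int:
        value = self.memory.pop(self.esp)   # read the value ESP points to
        self.esp += 4                       # releasing it moves ESP back up
        return value


stack = Stack32()
start = stack.esp
stack.push(0x11111111)
stack.push(0x22222222)
after_pushes = stack.esp    # 8 bytes lower than where it started
last = stack.pop()          # LIFO: the most recent push comes off first
```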

Calling Conventions

Calling conventions define how functions receive their parameters and the return address on each architecture. On x86, arguments are typically pushed onto the stack, followed by the return address, and the stack is cleaned up after the function call so the space can be used again. Depending on the convention, this cleanup is done by the caller (cdecl) or by the callee (stdcall).

When a function is called, it needs to know where to return after it finishes executing. Therefore, the call instruction saves the address of the instruction that follows it on the stack. When the function reaches the end and executes the ret instruction, it takes the saved address from the stack and jumps to it to continue executing from there.
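A minimal Python sketch of this call/ret mechanism follows. The helper names `run_call` and `run_ret`, and every address used, are hypothetical; the only goal is to show that ret jumps to whatever value sits on top of the stack.

```python
def run_call(esp: int, memory: dict, call_site: int, call_length: int, target: int):
    """Model `call target`: push the address of the next instruction, then jump."""
    return_address = call_site + call_length
    esp -= 4
    memory[esp] = return_address
    return esp, target          # new ESP, new EIP


def run_ret(esp: int, memory: dict):
    """Model `ret`: pop the saved address from the stack into EIP."""
    eip = memory[esp]
    esp += 4
    return esp, eip


memory = {}
esp = 0x0012FF80                # hypothetical stack pointer
# Hypothetical addresses: a 5-byte call at 0x00401000 targeting 0x00401500.
esp, eip = run_call(esp, memory, call_site=0x00401000, call_length=5, target=0x00401500)
inside_function = eip
esp, eip = run_ret(esp, memory)  # execution resumes right after the call
```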


Registers

To execute code efficiently, the CPU uses 9 32-bit registers. Registers are small storage locations inside the CPU where data can be read and modified quickly. Several of them also expose smaller views: every general-purpose register has a 16-bit subregister, and EAX, EBX, ECX, and EDX additionally split that 16-bit part into two 8-bit subregisters.

In the case of the 32-bit register EAX, its lower 16 bits form the subregister AX, which is in turn divided into two 8-bit subregisters: AH (high byte) and AL (low byte).

Here is the breakdown:

EAX (32-bit)
    AX (16-bit)
        AH (8-bit, high)
        AL (8-bit, low)
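The same breakdown can be reproduced with bit masks. This is only a sketch: the function name `subregisters` is invented for the example, and the input value is arbitrary.

```python
def subregisters(eax: int) -> dict:
    """Split a 32-bit EAX value into its AX, AH and AL views."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF           # low 16 bits of EAX
    ah = (ax >> 8) & 0xFF       # high byte of AX
    al = ax & 0xFF              # low byte of AX
    return {"EAX": eax, "AX": ax, "AH": ah, "AL": al}


parts = subregisters(0x12345678)
print({name: hex(value) for name, value in parts.items()})
```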


General-Purpose Registers

Several registers are used as general-purpose registers to store temporary data. Some of their specific purposes are:

EAX (Accumulator): Used for arithmetic and logical instructions.
EBX (Base): Serves as a base pointer for memory addresses.
ECX (Counter): Serves as the counter for loop, shift, and repeated string instructions.
EDX (Data): Used for I/O port addressing, multiplication, and division.
ESI (Source Index): Points to the source in string operations.
EDI (Destination Index): Points to the destination in string operations.

Important Pointer Registers

Other important registers that store pointers are:

ESP (Stack Pointer): Holds the pointer to the top of the stack.
EBP (Base Pointer): Points to the base of the current stack frame, that is, where the top of the stack was when the function was entered.
EIP (Instruction Pointer): Points to the address of the next instruction to be executed.

Endianness

There are different ways to represent values in memory, with the most common being:

Big Endian: Adopted by Motorola and others, it stores the most significant byte first. For instance, the hexadecimal value 0x01020304 is stored in memory as the bytes 01 02 03 04, the same order in which it is written.

Little Endian: Adopted by Intel, it stores the least significant byte first, so the same value 0x01020304 is stored as the bytes 04 03 02 01. Reading memory at increasing addresses therefore goes from the least significant byte to the most significant one.
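Both byte orders can be checked quickly with Python's standard `struct` module. This sketch uses the same value, 0x01020304; the variable names are arbitrary.

```python
import struct

value = 0x01020304

big = struct.pack(">I", value)      # big endian: most significant byte first
little = struct.pack("<I", value)   # little endian: least significant byte first

print(" ".join(f"{b:02x}" for b in big))     # 01 02 03 04
print(" ".join(f"{b:02x}" for b in little))  # 04 03 02 01

# Reading the little-endian bytes back yields the original value.
restored = struct.unpack("<I", little)[0]
```

This matters later when an address has to be written into an input string for an x86 target: its bytes must appear in little-endian order.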

Example

A vulnerable piece of code might look like this. The main() function calls vuln(), passing it the string from the first command-line argument. vuln() copies that string into dest, a buffer of only 64 bytes, without checking its length. So, what happens if more bytes are sent?

#include <stdio.h>
#include <string.h>

void vuln(char *str) {
    char dest[64];
    strcpy(dest, str);
    printf("Copied string: %s\n", dest);
}

int main(int argc, char *argv[]) {
    if (argc > 1) {
        vuln(argv[1]);
    } else {
        printf("Usage: %s <string>\n", argv[0]);
    }
    return 0;
}

If the input string does not fit in the 64 bytes of dest (strcpy also writes a terminating null byte), it overflows the buffer, overwriting adjacent memory, which can lead to unexpected behavior or exploitation.

i686-w64-mingw32-gcc vuln.c -no-pie -o vuln.exe 

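After compiling the code, we can open it in a disassembler such as IDA, where we see that the destination pointer passed to strcpy is ebp - 72 (0x48 in hexadecimal). The compiler reserves more than the 64 bytes declared for dest, so 72 bytes separate the start of the buffer from the saved ebp.

Here is how the disassembled code might look:

vuln:
    push    ebp
    mov     ebp, esp
    sub     esp, 48h          ; reserve 72 (0x48) bytes; dest starts at ebp-48h
    mov     eax, [ebp+8]      ; eax = str (first argument of vuln)
    push    eax               ; strcpy source
    lea     ecx, [ebp-48h]    ; ecx = address of dest
    push    ecx               ; strcpy destination
    call    _strcpy
    add     esp, 8            ; cdecl: the caller removes both arguments
    lea     eax, [ebp-48h]
    push    eax               ; printf argument: dest
    push    offset format     ; placeholder label for "Copied string: %s\n"
    call    _printf
    add     esp, 8
    leave
    ret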


We will use an argument consisting of 72 A's, which fill the space before the saved ebp, 4 B's that overwrite the saved ebp, 4 C's that overwrite the return address, and 20 D's that land on the stack right after it.

 python3 -q
>>> ("A" * 72) + ("B" * 4) + ("C" * 4) + ("D" * 20)
'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBCCCCDDDDDDDDDDDDDDDDDDDD'  
>>>
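The same string can be built as a small script that labels each part of the layout. This is a sketch: the constant name `OFFSET_TO_SAVED_EBP` is introduced here for readability, and the value 72 is taken from the ebp - 72 offset seen in the disassembly.

```python
OFFSET_TO_SAVED_EBP = 72   # distance from the start of dest to the saved EBP

payload = (
    b"A" * OFFSET_TO_SAVED_EBP  # fills dest and the padding up to the saved EBP
    + b"B" * 4                  # lands on the saved EBP
    + b"C" * 4                  # lands on the saved return address
    + b"D" * 20                 # spills past the return address
)

saved_ebp_bytes = payload[72:76]
return_address_bytes = payload[76:80]
print(len(payload), saved_ebp_bytes, return_address_bytes)
```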

To analyze the program, we need a debugger; in this case, we will use WinDbg.


If we break at the ret instruction, we can see that the A’s filled the stack up to the saved ebp, and that ebp now holds the value 0x42424242 (the hexadecimal value of BBBB), restored from the overwritten slot by the leave instruction. The C’s sit at the top of the stack, followed by the D’s.


When the ret instruction executes, it pops the address at the top of the stack and jumps to it. Since we have overwritten that address with CCCC, execution is transferred to 0x43434343. As this address is not mapped in the process, an access violation occurs and the program crashes.
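To tie the offsets together, the following Python sketch models the stack frame of vuln() as a byte array and applies the same input to it. The "legitimate" saved EBP and return address are made-up values, and the terminating null byte written by strcpy is ignored for simplicity.

```python
import struct

BUFFER_OFFSET = 72                               # dest starts 72 bytes below EBP
# Layout: locals (72) + saved EBP (4) + return address (4) + caller data (20).
frame = bytearray(BUFFER_OFFSET + 4 + 4 + 20)

# Hypothetical values present before the overflow.
struct.pack_into("<I", frame, 72, 0x0061FF28)    # saved EBP
struct.pack_into("<I", frame, 76, 0x00401563)    # return address into main

payload = b"A" * 72 + b"B" * 4 + b"C" * 4 + b"D" * 20
frame[0:len(payload)] = payload                  # effect of the unchecked copy

# Values are read back in little-endian order, as the CPU would.
ebp_after_leave = struct.unpack_from("<I", frame, 72)[0]
eip_after_ret = struct.unpack_from("<I", frame, 76)[0]
print(hex(ebp_after_leave), hex(eip_after_ret))
```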


In summary, the most well-known buffer overflow occurs when the amount of data written into a buffer is not checked against its size. By overwriting the saved return address with a value of their choosing, an attacker redirects execution and gains control of the program.

This post is licensed under CC BY 4.0 by the author.