Difference between revisions of "The Stack"

From SkullSecurity

Revision as of 18:30, 13 March 2007

Assembly Language Tutorial

The stack is, at best, a difficult concept to understand. However, understanding the stack is essential to reverse engineering code.

The stack pointer, esp, is a register that points into a region of memory called "the stack". The stack is just a really big section of memory where temporary data can be stored and retrieved. When a function is called, some stack space is allocated to the function, and when the function returns, the stack should be in the same state it started in.

The stack always grows downwards, towards lower addresses. The esp register always points to the lowest address currently in use on the stack (the "top" of the stack). Anything below esp is considered free memory that can be overwritten.
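The push/pop behaviour described above can be sketched with a toy model in Python. This is only an illustration, not how a real CPU is implemented: memory is a dictionary, esp is a plain integer, and the starting address 0x0012FF00 is an arbitrary assumption.

```python
# Toy model of a 32-bit stack: memory maps address -> value.
memory = {}
esp = 0x0012FF00  # hypothetical starting stack pointer


def push(value):
    """Move esp down by 4 bytes, then store the value at the new esp."""
    global esp
    esp -= 4
    memory[esp] = value


def pop():
    """Read the value at esp, then move esp back up by 4 bytes."""
    global esp
    value = memory[esp]
    esp += 4
    return value


start = esp
push(0xAAAA)
push(0xBBBB)
bytes_used = start - esp   # two pushes moved esp down by 8 bytes
first_popped = pop()       # the last value pushed comes off first
second_popped = pop()
print(bytes_used, hex(first_popped), hex(second_popped))
```

Note that pop does not erase anything: the old values are still in memory below esp, they are just no longer considered part of the stack.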

The stack stores function parameters, local variables, and the return address of every function.

Function Parameters

When a function is called, its parameters are typically stored on the stack before making the call. Here is an example of a function call in C:

func(1, 2, 3); 

And here is the equivalent call in assembly:

push 3
push 2
push 1
call func
add esp, 0Ch

The parameters are put on the stack, then the function is called. The function has to know it's getting 3 parameters, which is why function parameters have to be declared in C.

After the function returns, the stack pointer is still 12 bytes below where it started, because each of the three pushes subtracted 4 bytes from it. In order to restore the stack to where it used to be, 12 (0x0c) has to be added to the stack pointer.

Here is what the initial stack looked like (with ?'s representing unknown stack values):

esp ?
esp - 4 ?
esp - 8 ?
esp - 12 ?
esp - 16 ?

Note that the same five 32-bit stack slots are shown in all these examples; only the esp-relative labels on the left change as the stack pointer moves. The stack extends much further up and down, but that isn't shown here.

Here are the three pushes:


push 3
esp + 4 ?
esp 3
esp - 4 ?
esp - 8 ?
esp - 12 ?


push 2
esp + 8 ?
esp + 4 3
esp 2
esp - 4 ?
esp - 8 ?


push 1
esp + 12 ?
esp + 8 3
esp + 4 2
esp 1
esp - 4 ?

Now all three values are on the stack, and esp is pointing at the 1. The function is called, and returns, leaving the stack the way it was before the call. Now the final instruction runs:


add esp, 0Ch
esp ?
esp - 4 3
esp - 8 2
esp - 12 1
esp - 16 ?

Note that the 3, 2, and 1 are still in memory. However, they're below the stack pointer, which means that they are considered free memory and can be overwritten by any later push or function call.
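The arithmetic of this section can be checked with a small Python sketch. It is a toy model under stated assumptions: the starting value of esp (0x1000) is made up, and the call to func is left out because it returns with esp unchanged.

```python
memory = {}
esp = 0x1000  # hypothetical starting stack pointer


def push(value):
    """Model of the push instruction: esp moves down 4 bytes, then the store."""
    global esp
    esp -= 4
    memory[esp] = value


start = esp
for argument in (3, 2, 1):     # pushed right to left, as in the listing
    push(argument)
bytes_pushed = start - esp     # 3 pushes * 4 bytes each

# (func would run here and return with esp unchanged)

esp += 0x0C                    # add esp, 0Ch

# The old arguments are still in memory, just below the stack pointer.
stale = [memory[esp - 4], memory[esp - 8], memory[esp - 12]]
print(bytes_pushed, esp == start, stale)
```

The stale list holds 3, 2, 1 at esp - 4, esp - 8, and esp - 12, matching the final table.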

call and ret Revisited

The call instruction pushes the address of the next instruction onto the stack, then jumps to the specified function.

The ret instruction pops the next value off the stack, which should have been put there by a call, and jumps to it.

Here is some example code:

0x10000000 push 3
0x10000001 push 2
0x10000002 push 1
0x10000003 call 0x10000020
0x10000007 add esp, 12
0x10000011 exit ; This isn't a real instruction, but pretend it is
0x10000020 mov eax, 1
0x10000024 ret

Now here is what the stack looks like at each step in this code:


0x10000000 push 3
esp + 4 ?
esp 3
esp - 4 ?
esp - 8 ?
esp - 12 ?
esp - 16 ?
esp - 20 ?


0x10000001 push 2
esp + 8 ?
esp + 4 3
esp 2
esp - 4 ?
esp - 8 ?
esp - 12 ?
esp - 16 ?


0x10000002 push 1
esp + 12 ?
esp + 8 3
esp + 4 2
esp 1
esp - 4 ?
esp - 8 ?
esp - 12 ?


0x10000003 call 0x10000020
esp + 16 ?
esp + 12 3
esp + 8 2
esp + 4 1
esp 0x10000007
esp - 4 ?
esp - 8 ?


0x10000020 mov eax, 1
esp + 16 ?
esp + 12 3
esp + 8 2
esp + 4 1
esp 0x10000007
esp - 4 ?
esp - 8 ?


0x10000024 ret
esp + 12 ?
esp + 8 3
esp + 4 2
esp 1
esp - 4 0x10000007
esp - 8 ?
esp - 12 ?


0x10000007 add esp, 12
esp ?
esp - 4 3
esp - 8 2
esp - 12 1
esp - 16 0x10000007
esp - 20 ?
esp - 24 ?


0x10000011 exit ; This isn't a real instruction, but pretend it is
esp ?
esp - 4 3
esp - 8 2
esp - 12 1
esp - 16 0x10000007
esp - 20 ?
esp - 24 ?

Note the return address being pushed onto the stack by call, and being popped off the stack by ret.
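As a rough illustration of the walkthrough above, here is a toy Python model of call and ret. The Cpu class and its method names are invented for this sketch, the starting esp of 0x2000 is arbitrary, and the code addresses are simply copied from the listing rather than computed from real instruction sizes.

```python
class Cpu:
    """Minimal model: a stack pointer, an instruction pointer, and memory."""

    def __init__(self, esp):
        self.esp = esp
        self.eip = 0
        self.memory = {}

    def push(self, value):
        self.esp -= 4
        self.memory[self.esp] = value

    def pop(self):
        value = self.memory[self.esp]
        self.esp += 4
        return value

    def call(self, target, return_address):
        # call pushes the address of the next instruction, then jumps.
        self.push(return_address)
        self.eip = target

    def ret(self):
        # ret pops the return address and jumps to it.
        self.eip = self.pop()


cpu = Cpu(esp=0x2000)            # hypothetical starting stack pointer
for argument in (3, 2, 1):
    cpu.push(argument)
cpu.call(0x10000020, return_address=0x10000007)

top_inside = cpu.memory[cpu.esp]         # return address is at [esp]
first_arg = cpu.memory[cpu.esp + 4]      # first parameter is at [esp + 4]

cpu.ret()
eip_after_ret = cpu.eip
cpu.esp += 12                            # add esp, 12
print(hex(top_inside), first_arg, hex(eip_after_ret), hex(cpu.esp))
```

Inside the called function the return address sits at [esp] and the first parameter at [esp + 4], which is the layout shown in the tables for call and mov eax, 1.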

Saved Registers

Some registers (ebx, edi, esi, ebp) are generally considered to be non-volatile. What that means is that a function which changes any of those registers has to save the original value and restore it before returning, so the caller sees them unchanged. Typically, this is done by pushing them onto the stack at the start of a function, and popping them in reverse order at the end. Here is a simple example:

; function test()
push esi
push edi
.....
pop edi
pop esi
ret
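To see why the pops have to be in reverse order, here is a small Python sketch. It is a toy model: the register values, the starting esp, and the run helper are all made up for illustration.

```python
def run(pop_order):
    """Push esi then edi, clobber both, then pop in the given order."""
    regs = {"esp": 0x3000, "esi": 0x1111, "edi": 0x2222}  # hypothetical values
    memory = {}

    def push(name):
        regs["esp"] -= 4
        memory[regs["esp"]] = regs[name]

    def pop(name):
        regs[name] = memory[regs["esp"]]
        regs["esp"] += 4

    push("esi")
    push("edi")
    regs["esi"] = 0xDEAD   # the function body is free to overwrite these
    regs["edi"] = 0xBEEF
    for name in pop_order:
        pop(name)
    return regs


correct = run(("edi", "esi"))   # reverse of the push order
swapped = run(("esi", "edi"))   # same order as the pushes: a bug
print(hex(correct["esi"]), hex(correct["edi"]))
print(hex(swapped["esi"]), hex(swapped["edi"]))
```

With the correct order both registers come back with their original values. Popping in the same order as the pushes leaves the stack balanced but hands esi and edi each other's values.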

Local Variables

At the beginning of most functions, space to store local variables is allocated. This is done by subtracting the total size of all local variables from the stack pointer at the start of the function, then referencing them at fixed offsets from the stack pointer or the frame pointer. An example of this will be demonstrated in the following section.
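A minimal Python sketch of that allocation, assuming two 4-byte local variables and an arbitrary starting esp of 0x6000 (both are assumptions for illustration only):

```python
memory = {}
esp = 0x6000            # hypothetical stack pointer on entry to the function
entry_esp = esp

esp -= 8                # sub esp, 8: room for two 4-byte locals
local_a = esp           # first local lives at [esp]
local_b = esp + 4       # second local lives at [esp + 4]
memory[local_a] = 10
memory[local_b] = 20
offsets_before = (local_a - esp, local_b - esp)

esp -= 4                # a push inside the function body moves esp...
memory[esp] = 0x1234
offsets_after = (local_a - esp, local_b - esp)   # ...so the offsets shift

esp += 4                # pop
esp += 8                # add esp, 8: release the locals
print(offsets_before, offsets_after, esp == entry_esp)
```

The locals never move in memory, but their offsets from esp change from (0, 4) to (4, 8) after a single push. Tracking this by hand is error-prone.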

Frame Pointer

The frame pointer is the final piece of the puzzle. Unless a program has been optimized, ebp is set at the start of each function to point at a fixed spot in that function's stack frame: the local variables sit just below it, and the return address and parameters sit above it. The reason for this is that throughout a function, the stack pointer changes (due to saving variables, making function calls, and other reasons), so keeping track of where the local variables are relative to the stack pointer is tricky. The frame pointer, on the other hand, is stored in a non-volatile register, ebp, so it doesn't change during the function.

Here is an example of a swap function that uses two parameters passed on the stack and reserves space for two local variables (in this listing the interim values actually end up in esi and edi). If you don't fully understand this, don't worry too much -- I don't either. IDA tends to look after this kind of stuff for you automatically, so this is more theory than actual useful information:

0x400000 push ecx      ; A pointer to an integer in memory
0x400001 push edx      ; Another integer pointer
0x400002 call 0x401000 ; Call the swap function
0x400007 add esp, 8    ; Clear the stack
.....
0x401000 ; function swap(int *a, int *b)
0x401000 push ebp      ; Preserve ebp.
0x401001 mov ebp, esp  ; Set up the frame pointer.
0x401003 sub esp, 8    ; Make room for two local variables.
0x401007 push esi      ; Preserve esi on the stack.
0x401008 push edi      ; Preserve edi on the stack.

0x401009 mov ecx, [ebp+8]   ; Put the first parameter (a pointer) into ecx.
0x40100e mov edx, [ebp+12]  ; Put the second parameter (a pointer) into edx.

0x401013 mov esi, [ecx] ; Dereference the pointer to get the first parameter.
0x401018 mov edi, [edx] ; Dereference the pointer to get the second parameter.

0x40101d mov [ecx], edi ; Put the second value into the first address.
0x401022 mov [edx], esi ; Put the first value into the second address.
		
0x401027 pop edi        ; Restore the edi register
0x401028 pop esi        ; Restore the esi register
0x401029 add esp, 8     ; Remove the local variables from the stack
0x40102b pop ebp        ; Restore ebp
0x40102c ret            ; Return (eax isn't set, so there's no return value)

(You can download the complete code to test this example in Visual Studio here.)
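The listing can also be traced in a toy Python model. This is a sketch under stated assumptions, not a real emulator: the two integers live at made-up addresses 0x5000 and 0x5004, the initial register values are arbitrary, and each step is a direct translation of one instruction from the listing.

```python
memory = {0x5000: 111, 0x5004: 222}   # the two integers (hypothetical addresses)
regs = {"esp": 0x4000, "ebp": 0x4100, "ecx": 0x5000, "edx": 0x5004,
        "esi": 7, "edi": 9}


def push(value):
    regs["esp"] -= 4
    memory[regs["esp"]] = value


def pop():
    value = memory[regs["esp"]]
    regs["esp"] += 4
    return value


# Caller
push(regs["ecx"])                        # push ecx
push(regs["edx"])                        # push edx
push(0x400007)                           # call pushes the return address

# Prologue
push(regs["ebp"])                        # push ebp
regs["ebp"] = regs["esp"]                # mov ebp, esp
regs["esp"] -= 8                         # sub esp, 8
push(regs["esi"])                        # push esi
push(regs["edi"])                        # push edi

# Body: parameters sit at fixed offsets from ebp, whatever esp is doing
regs["ecx"] = memory[regs["ebp"] + 8]    # mov ecx, [ebp+8]
regs["edx"] = memory[regs["ebp"] + 12]   # mov edx, [ebp+12]
regs["esi"] = memory[regs["ecx"]]        # mov esi, [ecx]
regs["edi"] = memory[regs["edx"]]        # mov edi, [edx]
memory[regs["ecx"]] = regs["edi"]        # mov [ecx], edi
memory[regs["edx"]] = regs["esi"]        # mov [edx], esi

# Epilogue
regs["edi"] = pop()                      # pop edi
regs["esi"] = pop()                      # pop esi
regs["esp"] += 8                         # add esp, 8
regs["ebp"] = pop()                      # pop ebp
return_address = pop()                   # ret
regs["esp"] += 8                         # caller: add esp, 8

print(memory[0x5000], memory[0x5004], hex(regs["esp"]), hex(return_address))
```

In this model [ebp+4] holds the return address, [ebp+8] holds the last value pushed by the caller, and [ebp+12] the one before it. After the trace the two integers are swapped and esp, ebp, esi, and edi are back to their starting values, while ecx and edx (volatile registers) are not.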


Because this is such a complicated example, it's valuable to go through it step by step, keeping track of the stack (again, if you use IDA, the stack variables will automatically be identified, but you should still understand how this works):

0x400000 push ecx      ; A pointer to an integer in memory
0x400001 push edx      ; Another integer pointer

esp + 8 ?
esp + 4 ecx (pointer to an integer)
esp edx (pointer to an integer)
esp - 4 ?
esp - 8 ?

0x400002 call 0x401000 ; Call the swap function
0x400007 add esp, 8    ; Clear the stack
.....
0x401000 ; function swap(int *a, int *b)
0x401000 push ebp      ; Preserve ebp.
0x401001 mov ebp, esp  ; Set up the frame pointer.
0x401003 sub esp, 8    ; Make room for two local variables.
0x401007 push esi      ; Preserve esi on the stack.
0x401008 push edi      ; Preserve edi on the stack.

0x401009 mov ecx, [ebp+8]   ; Put the first parameter (a pointer) into ecx.
0x40100e mov edx, [ebp+12]  ; Put the second parameter (a pointer) into edx.

0x401013 mov esi, [ecx] ; Dereference the pointer to get the first parameter.
0x401018 mov edi, [edx] ; Dereference the pointer to get the second parameter.

0x40101d mov [ecx], edi ; Put the second value into the first address.
0x401022 mov [edx], esi ; Put the first value into the second address.

0x401027 pop edi        ; Restore the edi register
0x401028 pop esi        ; Restore the esi register
0x401029 add esp, 8     ; Remove the local variables from the stack
0x40102b pop ebp        ; Restore ebp
0x40102c ret            ; Return (eax isn't set, so there's no return value)

Balance

Questions

Feel free to edit this section and post questions; I'll do my best to answer them. But you may need to contact me to let me know that a question exists.