The Stack (revision 2011-01-16T17:21:19Z, https://wiki.skullsecurity.org/index.php?title=The_Stack&diff=3113)<p>Mogigoma: /* Function Parameters */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
The stack is, at best, a difficult concept to understand. However, understanding the stack is essential to reverse engineering code. <br />
<br />
The stack pointer, esp, is a register that holds the address of the current top of a region of memory called "the stack". The stack is just a really big section of memory where temporary data can be stored and retrieved. When a function is called, some stack space is allocated to it, and when the function returns, the stack should be in the same state it started in. <br />
<br />
The stack always grows downwards, towards lower addresses. The esp register always points to the lowest address in use on the stack, which holds the most recently pushed value. Anything below esp is considered free memory that can be overwritten. <br />
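<br />
The downward growth can be sketched in C. This is a minimal model rather than how a CPU is implemented: the Stack type, the 64-byte size, and the function names are assumptions made for illustration, and esp is an offset into an array instead of a real address.<br />

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64u

/* Hypothetical model: a small byte array stands in for stack memory,
   and esp is an offset into it rather than a real address. */
typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;              /* offset of the lowest in-use byte */
} Stack;

void stack_init(Stack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_BYTES;      /* empty: esp sits just past the highest byte */
}

/* push: move esp down by 4, then store the value at the new esp. */
void stack_push(Stack *s, uint32_t value) {
    assert(s->esp >= 4);       /* running out of room would be an overflow */
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, sizeof value);
}

/* pop: read the value at esp, then move esp back up by 4. */
uint32_t stack_pop(Stack *s) {
    uint32_t value;
    assert(s->esp + 4 <= STACK_BYTES);  /* popping an empty stack is a bug */
    memcpy(&value, &s->mem[s->esp], sizeof value);
    s->esp += 4;
    return value;
}
```

Each push lowers esp by 4 and each pop raises it by 4, so values come back off in the reverse of the order they went on.<br />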
<br />
The stack stores function parameters, local variables, and the return address of every function. <br />
<br />
== Function Parameters ==<br />
When a function is called, its parameters are typically stored on the stack before making the call. Here is an example of a function call in C:<br />
func(1, 2, 3); <br />
And here is the equivalent call in assembly:<br />
push 3<br />
push 2<br />
push 1<br />
call func<br />
add esp, 0Ch<br />
<br />
The parameters are put on the stack, then the function is called. The function has to know it's getting 3 parameters, which is why function parameters have to be declared in C. <br />
<br />
After the function returns, the stack pointer is still 12 bytes below where it started, because each of the three pushes subtracted 4 bytes from it. In order to restore the stack to where it used to be, 12 (0x0c) has to be added to the stack pointer. <br />
<br />
Here is what the initial stack looked like (with ?'s representing unknown stack values):<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
Note that the same five 32-bit stack values are shown in all of these examples; only the position of the stack pointer, shown in the left column, moves. The stack extends much further up and down, but that isn't shown here. <br />
<br />
Here are the three pushes:<br />
<br />
<br />
<br />
push 3<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 4</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
push 2<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 8</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
push 1<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 12</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
Now all three values are on the stack, and esp is pointing at the 1. The function is called, and returns, leaving the stack the way it started. Now the final instruction runs:<br />
<br />
<br />
<br />
add esp, 0Ch<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
Note that the 3, 2, and 1 are still on the stack. However, they're below the stack pointer, which means that they are considered free memory and will be overwritten by the next pushes or calls.<br />
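<br />
The same sequence can be replayed in a small C model to check the arithmetic. This is only a sketch: mem, esp, push, and caller_cleanup_demo are hypothetical names, the call itself is left out, and esp is a byte offset into an array rather than a real address.<br />

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 8

/* Hypothetical model: mem[] holds 32-bit stack slots and esp is a byte
   offset into it, so the slot at byte offset n is mem[n / 4]. */
static uint32_t mem[WORDS];
static uint32_t esp;

static void push(uint32_t value) {
    assert(esp >= 4);
    esp -= 4;
    mem[esp / 4] = value;
}

/* Mirrors the caller's side of func(1, 2, 3); the call is omitted. */
uint32_t caller_cleanup_demo(void) {
    esp = WORDS * 4;              /* start with an empty stack */
    uint32_t before = esp;

    push(3);
    push(2);
    push(1);
    assert(before - esp == 12);   /* three 4-byte pushes */

    esp += 0x0C;                  /* add esp, 0Ch */
    assert(esp == before);

    /* Nothing erased the slots: the old 1 is still at esp - 12. */
    return mem[(esp - 12) / 4];
}
```

The function returns the stale 1, which is still in memory even though the stack pointer has moved back above it.<br />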
<br />
== call and ret Revisited ==<br />
<br />
The ''call'' instruction pushes the address of the next instruction onto the stack, then jumps to the specified function. <br />
<br />
The ''ret'' instruction pops the next value off the stack, which should have been put there by a call, and jumps to it. <br />
<br />
Here is some example code:<br />
0x10000000 push 3<br />
0x10000001 push 2<br />
0x10000002 push 1<br />
0x10000003 call 0x10000020<br />
0x10000007 add esp, 12<br />
0x10000011 exit ; This isn't a real instruction, but pretend it is<br />
0x10000020 mov eax, 1<br />
0x10000024 ret<br />
<br />
Now here is what the stack looks like at each step in this code:<br />
<br />
<br />
<br />
0x10000000 push 3<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 4</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000001 push 2<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 8</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000002 push 1<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 12</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000003 call 0x10000020<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 16</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000020 mov eax, 1<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 16</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000024 ret<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 12</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000007 add esp, 12<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000011 exit ; This isn't a real instruction, but pretend it is<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
Note the return address being pushed onto the stack by call, and being popped off the stack by ret.<br />
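<br />
What call and ret do to esp and to the instruction pointer can be modelled in a few lines of C. This is a sketch under stated assumptions: the Cpu struct, do_call, and do_ret are hypothetical names, and the address of the next instruction is passed in by hand because the model does not decode instructions.<br />

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 8

/* Hypothetical model of the pieces of CPU state that call and ret touch. */
typedef struct {
    uint32_t mem[WORDS];  /* 32-bit stack slots */
    uint32_t esp;         /* byte offset into mem, not a real address */
    uint32_t eip;         /* address of the instruction being executed */
} Cpu;

static void push(Cpu *c, uint32_t value) {
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = value;
}

static uint32_t pop(Cpu *c) {
    assert(c->esp + 4 <= WORDS * 4);
    uint32_t value = c->mem[c->esp / 4];
    c->esp += 4;
    return value;
}

/* call: push the address of the next instruction, then jump to the target. */
void do_call(Cpu *c, uint32_t target, uint32_t next_instruction) {
    push(c, next_instruction);
    c->eip = target;
}

/* ret: pop whatever is on top of the stack and jump to it. */
void do_ret(Cpu *c) {
    c->eip = pop(c);
}
```

Calling do_call(&c, 0x10000020, 0x10000007) and then do_ret(&c) leaves eip at 0x10000007 and esp where it was before the call. Note that ret trusts whatever value is on top of the stack, which is why the stack has to be balanced before it runs.<br />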
<br />
== Saved Registers ==<br />
Some registers (ebx, edi, esi, ebp) are generally considered to be non-volatile. What that means is that a function which modifies those registers has to restore their original values before it returns. Typically, this is done by pushing them onto the stack at the start of the function, and popping them in reverse order at the end. Here is a simple example:<br />
<br />
; function test()<br />
push esi<br />
push edi<br />
.....<br />
pop edi<br />
pop esi<br />
ret<br />
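<br />
The reason for popping in reverse order can be checked with a small C model. It is only a sketch: the Cpu struct and test_function are hypothetical stand-ins for the registers and for function test() above, and the values written in the body are arbitrary.<br />

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 8

/* Hypothetical model: two non-volatile registers plus a small stack. */
typedef struct {
    uint32_t mem[WORDS];  /* 32-bit stack slots */
    uint32_t esp;         /* byte offset into mem, not a real address */
    uint32_t esi, edi;
} Cpu;

static void push(Cpu *c, uint32_t value) {
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = value;
}

static uint32_t pop(Cpu *c) {
    assert(c->esp + 4 <= WORDS * 4);
    uint32_t value = c->mem[c->esp / 4];
    c->esp += 4;
    return value;
}

/* Stands in for function test(): saves esi and edi, overwrites them,
   then restores them in the reverse of the order they were pushed. */
void test_function(Cpu *c) {
    push(c, c->esi);
    push(c, c->edi);

    c->esi = 0xAAAAAAAAu;  /* the body is free to use both registers */
    c->edi = 0xBBBBBBBBu;

    c->edi = pop(c);       /* last pushed, first popped */
    c->esi = pop(c);
}
```

If the two pops were written in the same order as the pushes, esi and edi would come back with each other's values.<br />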
<br />
== Local Variables ==<br />
<br />
At the beginning of most functions, space is allocated for local variables. This is done by subtracting the total size of all local variables from the stack pointer at the start of the function, then referencing them by their position on the stack. An example of this is shown in the following section. <br />
<br />
== Frame Pointer ==<br />
The frame pointer is the final piece of the puzzle. Unless a program has been optimized, ebp is set to point at the beginning of the local variables. The reason for this is that throughout a function, the stack pointer changes (due to saving variables, making function calls, and other reasons), so keeping track of where the local variables are relative to the stack pointer is tricky. The frame pointer, on the other hand, is stored in a non-volatile register, ebp, so it never changes during the function. <br />
<br />
Here is an example of a swap function that uses two parameters passed on the stack and a local variable to store the interim result (if you don't fully understand this, don't worry too much -- I don't either. IDA tends to look after this kind of stuff for you automatically, so this is more theory than actual useful information):<br />
<br />
<pre><br />
0x400000 push ecx ; A pointer to an integer in memory<br />
0x400001 push edx ; Another integer pointer<br />
0x400002 call 0x401000 ; Call the swap function<br />
0x400007 add esp, 8 ; Clear the stack<br />
.....<br />
0x401000 ; function swap(int *a, int *b)<br />
0x401000 push ebp ; Preserve ebp.<br />
0x401001 mov ebp, esp ; Set up the frame pointer.<br />
0x401002 sub esp, 8 ; Make room for two local variables.<br />
0x401003 push esi ; Preserve esi on the stack.<br />
0x401004 push edi ; Preserve edi on the stack.<br />
<br />
0x401005 mov ecx, [ebp+8] ; Put the first parameter (a pointer) into ecx.<br />
0x401006 mov edx, [ebp+12] ; Put the second parameter (a pointer) into edx.<br />
<br />
0x401007 mov esi, [ecx] ; Dereference the pointer to get the first parameter.<br />
0x401008 mov edi, [edx] ; Dereference the pointer to get the second parameter.<br />
<br />
0x401009 mov [ebp-4], esi ; Store the first as a local variable<br />
0x40100a mov [ebp-8], edi ; Store the second as a local variable<br />
<br />
0x40100b mov esi, [ebp-8] ; Retrieve them in reverse<br />
0x40100c mov edi, [ebp-4]<br />
<br />
0x40100d mov [ecx], esi ; Put the second value into the first address.<br />
0x40100e mov [edx], edi ; Put the first value into the second address.<br />
<br />
0x40100f pop edi ; Restore the edi register<br />
0x401010 pop esi ; Restore the esi register<br />
0x401011 add esp, 8 ; Remove the local variables from the stack<br />
0x401012 pop ebp ; Restore ebp<br />
0x401013 ret ; Return (eax isn't set, so there's no return value)<br />
</pre><br />
<br />
(You can download the complete code to test this example in Visual Studio [[Stack_Example|here]].)<br />
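<br />
For comparison, here is a C-level reading of what the listing is meant to do, going by its comments. It is a sketch and not the downloadable example: the names first and second are hypothetical, since the assembly only knows the locals as [ebp-4] and [ebp-8].<br />

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C equivalent of swap(int *a, int *b). The two locals play
   the role of the stack slots at [ebp-4] and [ebp-8]. */
void swap(int *a, int *b) {
    assert(a != NULL && b != NULL);

    int first  = *a;   /* dereference the first parameter, keep a copy */
    int second = *b;   /* dereference the second parameter, keep a copy */

    *a = second;       /* second value into the first address */
    *b = first;        /* first value into the second address */
}
```

An unoptimized compiler would typically keep first and second in the frame, which is where the two locals in the assembly come from. An optimizing compiler would be free to keep them in registers instead.<br />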
<br />
<br />
<br />
Because this is such a complicated example, it's valuable to go through it step by step, keeping track of the stack (again, if you use IDA, the stack variables will automatically be identified, but you should still understand how this works):<br />
<br />
Initial stack:<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp - 4</td><br />
<td align='center' width='100'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 32</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 36</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x400000 push ecx ; A pointer to an integer in memory<br />
0x400001 push edx ; Another integer pointer<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 4</td><br />
<td align='center' width='100' style='color: red;'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x400002 call 0x401000 ; Call the swap function<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 8</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401000 ; function swap(int *a, int *b)<br />
0x401000 push ebp ; Preserve ebp.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 12</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>(ebp's value)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401001 mov ebp, esp ; Set up the frame pointer.<br />
0x401002 sub esp, 8 ; Make room for two local variables.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 20</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 16</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 8, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center' style='color: red;'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401003 push esi ; Preserve esi on the stack.<br />
0x401004 push edi ; Preserve edi on the stack.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center' style='color: red;'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center' style='color: red;'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
Note how, in the following section, the variables are addressed relative to ebp. The first parameter is at ebp + 8, which is 2 values above ebp on the stack, and the second is at ebp + 12, which is 3 above ebp. Count them to confirm!<br />
<br />
0x401005 mov ecx, [ebp+8] ; Put the first parameter (a pointer) into ecx.<br />
0x401006 mov edx, [ebp+12] ; Put the second parameter (a pointer) into edx.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100' style='color: green;'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center' style='color: green;'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<br />
These lines don't use the stack, so the table will be omitted:<br />
0x401007 mov esi, [ecx] ; Dereference the pointer to get the first parameter.<br />
0x401008 mov edi, [edx] ; Dereference the pointer to get the second parameter.<br />
<br />
<br />
<br />
<br />
0x401009 mov [ebp-4], esi ; Store the first as a local variable<br />
0x40100a mov [ebp-8], edi ; Store the second as a local variable<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center' style='color: red;'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center' style='color: red;'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x40100b mov esi, [ebp-8] ; Retrieve them in reverse<br />
0x40100c mov edi, [ebp-4]<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center' style='color: green;'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center' style='color: green;'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<br />
0x40100d mov [ecx], esi ; Put the second value into the first address.<br />
0x40100e mov [edx], edi ; Put the first value into the second address.<br />
0x40100f pop edi ; Restore the edi register<br />
0x401010 pop esi ; Restore the esi register<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 20, ''ebp + 12''</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 16, ''ebp + 8''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp + 4''</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 8, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 4''</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan;'>'''''esp''''', ''ebp - 8''</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 12''</td><br />
<td align='center' style='color: green;'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8, ''ebp - 16''</td><br />
<td align='center' style='color: green;'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401011 add esp, 8 ; Remove the local variables from the stack<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 12, ''ebp + 12''</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp + 8''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp + 4''</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 4''</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8, ''ebp - 8''</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16, ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401012 pop ebp ; Restore ebp<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 8</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan;'>'''''esp'''''</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center' style='color: green;'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
0x401013 ret ; Return (eax isn't set, so there's no return value)<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 4</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center' style='color: green;'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
0x400007 add esp, 8 ; Clear the stack<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp - 4</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 32</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 36</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
== Balance ==<br />
<br />
This should be rather obvious from the examples shown above, but it is worth paying special attention to.<br />
<br />
Every function should leave the stack pointer in the exact place it received it. In other words, every amount subtracted from the stack pointer (either by sub or push) ''has to be added back'' (either by add or pop). If it isn't, the return address won't be where ret expects it, and the program will likely crash.<br />
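<br />
As a rough illustration of this rule, the sketch below adds up the stack-pointer changes made by a list of instructions. The StackOp type and the net_esp_change function are names invented for this example, and it assumes a 32-bit stack where every push and pop moves esp by 4 bytes. A balanced function should come out at zero.<br />
<br />
```c
#include <stddef.h>

/* Hypothetical model of the instructions that move esp. */
typedef enum { OP_PUSH, OP_POP, OP_SUB_ESP, OP_ADD_ESP } OpKind;

typedef struct {
    OpKind kind;
    int amount;   /* bytes for sub/add; ignored for push/pop */
} StackOp;

/* Returns the net change to esp, in bytes, after running every op.
   A negative result means esp ended up lower than it started. */
int net_esp_change(const StackOp *ops, size_t count) {
    int delta = 0;
    for (size_t i = 0; i < count; i++) {
        switch (ops[i].kind) {
        case OP_PUSH:    delta -= 4;             break;
        case OP_POP:     delta += 4;             break;
        case OP_SUB_ESP: delta -= ops[i].amount; break;
        case OP_ADD_ESP: delta += ops[i].amount; break;
        }
    }
    return delta;
}

/* The stack operations of the swap function shown above, in order. */
const StackOp swap_ops[] = {
    { OP_PUSH, 0 },     /* push ebp   */
    { OP_SUB_ESP, 8 },  /* sub esp, 8 */
    { OP_PUSH, 0 },     /* push esi   */
    { OP_PUSH, 0 },     /* push edi   */
    { OP_POP, 0 },      /* pop edi    */
    { OP_POP, 0 },      /* pop esi    */
    { OP_ADD_ESP, 8 },  /* add esp, 8 */
    { OP_POP, 0 },      /* pop ebp    */
};
```
<br />
Running net_esp_change over all eight entries of swap_ops gives 0, so ret finds the return address at esp. Stopping after the first four entries gives -20, which matches the walkthrough, where the return address sits at esp + 20 once edi has been pushed.<br />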
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Mogigomahttps://wiki.skullsecurity.org/index.php?title=The_Stack&diff=3112The Stack2011-01-16T17:20:38Z<p>Mogigoma: /* Function Parameters */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
The stack is, at best, a difficult concept to understand. However, understanding the stack is essential to reverse engineering code. <br />
<br />
The stack pointer, esp, is a register that points to the current end of a region of memory called "the stack". The stack is just a really big section of memory where temporary data can be stored and retrieved. When a function is called, some stack space is allocated to the function, and when a function returns the stack should be in the same state it started in. <br />
<br />
The stack always grows downwards, towards lower addresses. The esp register always points to the value at the lowest address in use on the stack. Anything below esp is considered free memory that can be overwritten. <br />
<br />
The stack stores function parameters, local variables, and the return address of every function. <br />
<br />
== Function Parameters ==<br />
When a function is called, its parameters are typically stored on the stack before making the call. Here is an example of a function call in C:<br />
func(1, 2, 3); <br />
And here is the equivalent call in assembly:<br />
push 3<br />
push 2<br />
push 1<br />
call func<br />
add esp, 0Ch<br />
<br />
The parameters are put on the stack, then the function is called. The function has to know it's getting 3 parameters, which is why function parameters have to be declared in C. <br />
<br />
After the function returns, the stack pointer is still 12 bytes below where it started. In order to restore the stack to where it used to be, 12 (0x0c) has to be added to the stack pointer. The three pushes, of 4 bytes each, mean that a total of 12 was subtracted from the stack pointer. <br />
<br />
Here is what the initial stack looked like (with ?'s representing unknown stack values):<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
Note that the same five 32-bit stack values are shown in all of these examples; only the labels in the left column change as the stack pointer moves. The stack goes much further up and down, but that isn't shown here. <br />
<br />
Here are the three pushes:<br />
<br />
<br />
<br />
push 3<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 4</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
push 2<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 8</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
push 1<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 12</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
Now all three values are on the stack, and esp is pointing at the 1. The function is called, and returns, leaving the stack the way it was just before the call. Now the final instruction runs:<br />
<br />
<br />
<br />
add esp, 0Ch<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
Note that the 3, 2, and 1 are still on the stack. However, they're below the stack pointer, which means that they are considered free memory and may be overwritten by the next push or call.<br />
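<br />
To make the pointer arithmetic concrete, here is a small C sketch of the three pushes and the final add. It is only a toy model: ToyStack, toy_init, toy_push, toy_read and toy_call_func are names made up for this example, the 64-byte size is arbitrary, and the call instruction itself is left out.<br />
<br />
```c
#include <assert.h>
#include <stdint.h>

/* Toy model: 64 bytes of stack memory, addressed by byte offset.
   esp starts in the middle so there is room above and below it. */
enum { TOY_STACK_BYTES = 64 };

typedef struct {
    uint32_t memory[TOY_STACK_BYTES / 4];
    int esp;   /* byte offset of the lowest value in use */
} ToyStack;

void toy_init(ToyStack *s) {
    for (int i = 0; i < TOY_STACK_BYTES / 4; i++) {
        s->memory[i] = 0;
    }
    s->esp = TOY_STACK_BYTES / 2;
}

/* push: move esp down 4 bytes, then store the value there. */
void toy_push(ToyStack *s, uint32_t value) {
    assert(s->esp >= 4);
    s->esp -= 4;
    s->memory[s->esp / 4] = value;
}

/* Read the 32-bit value at [esp + offset]. */
uint32_t toy_read(const ToyStack *s, int offset) {
    int address = s->esp + offset;
    assert(address >= 0 && address < TOY_STACK_BYTES && address % 4 == 0);
    return s->memory[address / 4];
}

/* push 3 / push 2 / push 1 / add esp, 0Ch (the call is left out). */
void toy_call_func(ToyStack *s) {
    toy_push(s, 3);
    toy_push(s, 2);
    toy_push(s, 1);
    s->esp += 12;   /* nothing is erased; only the pointer moves */
}
```
<br />
After toy_call_func returns, esp is back where it started, and toy_read with offsets -4, -8 and -12 still finds 3, 2 and 1 sitting below it.<br />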
<br />
== call and ret Revisited ==<br />
<br />
The ''call'' instruction pushes the address of the next instruction onto the stack, then jumps to the specified function. <br />
<br />
The ''ret'' instruction pops the next value off the stack, which should have been put there by a call, and jumps to it. <br />
<br />
Here is some example code:<br />
0x10000000 push 3<br />
0x10000001 push 2<br />
0x10000002 push 1<br />
0x10000003 call 0x10000020<br />
0x10000007 add esp, 12<br />
0x10000011 exit ; This isn't a real instruction, but pretend it is<br />
0x10000020 mov eax, 1<br />
0x10000024 ret<br />
<br />
Now here is what the stack looks like at each step in this code:<br />
<br />
<br />
<br />
0x10000000 push 3<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 4</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000001 push 2<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 8</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000002 push 1<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 12</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000003 call 0x10000020<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 16</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000020 mov eax, 1<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 16</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000024 ret<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 12</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000007 add esp, 12<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000011 exit ; This isn't a real instruction, but pretend it is<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
Note the return address being pushed onto the stack by call, and being popped off the stack by ret.<br />
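<br />
The same behaviour can be sketched in a few lines of C. This is a hypothetical toy CPU, not how any real emulator is written: ToyCpu, toy_call and toy_ret are invented names, the stack holds only 16 values, and esp is an array index rather than a real address.<br />
<br />
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy CPU: an instruction pointer, a stack pointer, and a
   small stack. A lower index in stack[] stands for a lower address. */
typedef struct {
    uint32_t eip;
    int esp;              /* index of the lowest value in use; 16 = empty */
    uint32_t stack[16];
} ToyCpu;

/* call: push the address of the next instruction, then jump. */
void toy_call(ToyCpu *cpu, uint32_t next_instruction, uint32_t target) {
    assert(cpu->esp > 0);
    cpu->esp -= 1;
    cpu->stack[cpu->esp] = next_instruction;
    cpu->eip = target;
}

/* ret: pop whatever esp points at and jump to it. */
void toy_ret(ToyCpu *cpu) {
    assert(cpu->esp < 16);
    cpu->eip = cpu->stack[cpu->esp];
    cpu->esp += 1;
}
```
<br />
Calling toy_call with a next instruction of 0x10000007 and a target of 0x10000020, followed by toy_ret, leaves eip at 0x10000007 and esp where it began. Note that toy_ret doesn't check what it pops: if the function leaves an extra value on the stack, that value is what gets jumped to.<br />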
<br />
== Saved Registers ==<br />
Some registers (ebx, edi, esi, ebp) are generally considered to be non-volatile. What that means is that a function which uses them has to restore their original values before it returns. Typically, this is done by pushing them onto the stack at the start of the function, and popping them in reverse order at the end. Here is a simple example:<br />
<br />
; function test()<br />
push esi<br />
push edi<br />
.....<br />
pop edi<br />
pop esi<br />
ret<br />
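<br />
Why the pops have to be in reverse order is easier to see in a model. The sketch below is an assumption-laden toy: ToyRegs, reg_push, reg_pop and toy_test_function are made-up names, only esi and edi are modelled, and the array grows upwards (the direction doesn't affect the ordering).<br />
<br />
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file with a tiny last-in, first-out stack. */
typedef struct {
    uint32_t esi, edi;
    uint32_t stack[8];
    int top;    /* number of values currently pushed */
} ToyRegs;

void reg_push(ToyRegs *r, uint32_t value) {
    assert(r->top < 8);
    r->stack[r->top++] = value;
}

uint32_t reg_pop(ToyRegs *r) {
    assert(r->top > 0);
    return r->stack[--r->top];
}

/* Mirrors test() above: save esi and edi, clobber them, restore them. */
void toy_test_function(ToyRegs *r) {
    reg_push(r, r->esi);    /* push esi */
    reg_push(r, r->edi);    /* push edi */
    r->esi = 0xAAAAAAAAu;   /* the function body is free to use both */
    r->edi = 0xBBBBBBBBu;
    r->edi = reg_pop(r);    /* pop edi (pushed last, so popped first) */
    r->esi = reg_pop(r);    /* pop esi */
}
```
<br />
If the two pops were swapped, esi would come back holding the caller's edi and vice versa, and the caller would carry on with the wrong values.<br />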
<br />
== Local Variables ==<br />
<br />
At the beginning of most functions, space is allocated for local variables. This is done by subtracting the total size of all local variables from the stack pointer at the start of the function; the variables are then referenced by their offset from the stack pointer or the frame pointer. An example of this will be demonstrated in the following section. <br />
<br />
== Frame Pointer ==<br />
The frame pointer is the final piece of the puzzle. Unless a program has been optimized, ebp is set to point just above the local variables, at the saved copy of the caller's ebp. The reason for this is that throughout a function, the stack pointer changes (due to saving variables, making function calls, and other reasons), so keeping track of where the local variables are relative to the stack pointer is tricky. The frame pointer, on the other hand, is stored in a non-volatile register, ebp, so it never changes during the function. <br />
<br />
Here is an example of a swap function that uses two parameters passed on the stack and a local variable to store the interim result (if you don't fully understand this, don't worry too much -- I don't either. IDA tends to look after this kind of stuff for you automatically, so this is more theory than actual useful information):<br />
<br />
<pre><br />
0x400000 push ecx ; A pointer to an integer in memory<br />
0x400001 push edx ; Another integer pointer<br />
0x400002 call 0x401000 ; Call the swap function<br />
0x400007 add esp, 8 ; Clear the stack<br />
.....<br />
0x401000 ; function swap(int *a, int *b)<br />
0x401000 push ebp ; Preserve ebp.<br />
0x401001 mov ebp, esp ; Set up the frame pointer.<br />
0x401002 sub esp, 8 ; Make room for two local variables.<br />
0x401003 push esi ; Preserve esi on the stack.<br />
0x401004 push edi ; Preserve edi on the stack.<br />
<br />
0x401005 mov ecx, [ebp+8] ; Put the first parameter (a pointer) into ecx.<br />
0x401006 mov edx, [ebp+12] ; Put the second parameter (a pointer) into edx.<br />
<br />
0x401007 mov esi, [ecx] ; Dereference the pointer to get the first parameter.<br />
0x401008 mov edi, [edx] ; Dereference the pointer to get the second parameter.<br />
<br />
0x401009 mov [ebp-4], esi ; Store the first as a local variable<br />
0x40100a mov [ebp-8], edi ; Store the second as a local variable<br />
<br />
0x40100b mov esi, [ebp-8] ; Retrieve them in reverse<br />
0x40100c mov edi, [ebp-4]<br />
<br />
0x40100d mov [ecx], esi ; Put the second value into the first address.<br />
0x40100e mov [edx], edi ; Put the first value into the second address.<br />
<br />
0x40100f pop edi ; Restore the edi register<br />
0x401010 pop esi ; Restore the esi register<br />
0x401011 add esp, 8 ; Remove the local variables from the stack<br />
0x401012 pop ebp ; Restore ebp<br />
0x401013 ret ; Return (eax isn't set, so there's no return value)<br />
</pre><br />
<br />
(You can download the complete code to test this example in Visual Studio [[Stack_Example|here]].)<br />
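<br />
For orientation, the listing is meant to do roughly what the following C does. This is a guess at equivalent source, not the code from the download linked above: the local names first and second are made up here, and a real compiler would probably not emit exactly the instructions shown.<br />
<br />
```c
/* A sketch of swap() as it might look in C. The two locals stand in for
   the stack slots at [ebp-4] and [ebp-8]. */
void swap(int *a, int *b) {
    int first = *a;    /* mov esi, [ecx] / mov [ebp-4], esi */
    int second = *b;   /* mov edi, [edx] / mov [ebp-8], edi */
    *a = second;       /* the second value goes to the first address */
    *b = first;        /* the first value goes to the second address */
}
```
<br />
Calling swap(&x, &y) with x holding 1 and y holding 2 leaves x holding 2 and y holding 1. Nothing is returned, which is why the assembly never sets eax.<br />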
<br />
<br />
<br />
Because this is such a complicated example, it's valuable to go through it step by step, keeping track of the stack (again, if you use IDA, the stack variables will automatically be identified, but you should still understand how this works):<br />
<br />
Initial stack:<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp - 4</td><br />
<td align='center' width='100'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 32</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 36</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x400000 push ecx ; A pointer to an integer in memory<br />
0x400001 push edx ; Another integer pointer<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 4</td><br />
<td align='center' width='100' style='color: red;'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x400002 call 0x401000 ; Call the swap function<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 8</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401000 ; function swap(int *a, int *b)<br />
0x401000 push ebp ; Preserve ebp.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 12</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>(ebp's value)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401001 mov ebp, esp ; Set up the frame pointer.<br />
0x401002 sub esp, 8 ; Make room for two local variables.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 20</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 16</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 8, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center' style='color: red;'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401003 push esi ; Preserve esi on the stack.<br />
0x401004 push edi ; Preserve edi on the stack.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center' style='color: red;'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center' style='color: red;'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
Note how in the following section the variables are addressed relative to ebp. The first parameter is at ebp + 8, which is 2 values above ebp on the stack, and the second is at ebp + 12, which is 3 above ebp. Count them to confirm!<br />
<br />
0x401005 mov ecx, [ebp+8] ; Put the first parameter (a pointer) into ecx.<br />
0x401006 mov edx, [ebp+12] ; Put the second parameter (a pointer) into edx.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100' style='color: green;'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center' style='color: green;'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
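<br />
The offsets in the table can be checked with a bit of arithmetic. The C sketch below tracks only the two registers through swap's prologue. FrameRegs and the three functions are hypothetical names for this example, and the starting address is arbitrary.<br />
<br />
```c
/* Toy register state; values are byte addresses in an imaginary stack. */
typedef struct {
    int esp;
    int ebp;
} FrameRegs;

/* Offset of a stack address relative to each register. */
int offset_from_esp(const FrameRegs *r, int address) { return address - r->esp; }
int offset_from_ebp(const FrameRegs *r, int address) { return address - r->ebp; }

/* Runs swap's prologue, starting from the state just after the call,
   when esp points at the return address. */
void run_swap_prologue(FrameRegs *r) {
    r->esp -= 4;        /* push ebp     */
    r->ebp = r->esp;    /* mov ebp, esp */
    r->esp -= 8;        /* sub esp, 8   */
    r->esp -= 4;        /* push esi     */
    r->esp -= 4;        /* push edi     */
}
```
<br />
Right after the call, the first parameter is at esp + 4. Once the prologue has run, the same address is at ebp + 8 and esp + 24, matching the table. Any further push would change the esp offset again, while the ebp offset stays at 8 for the rest of the function.<br />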
<br />
<br />
<br />
<br />
These lines don't use the stack, so the table will be omitted:<br />
0x401007 mov esi, [ecx] ; Dereference the pointer to get the first parameter.<br />
0x401008 mov edi, [edx] ; Dereference the pointer to get the second parameter.<br />
<br />
<br />
<br />
<br />
0x401009 mov [ebp-4], esi ; Store the first as a local variable<br />
0x40100a mov [ebp-8], edi ; Store the second as a local variable<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center' style='color: red;'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center' style='color: red;'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x40100b mov esi, [ebp-8] ; Retrieve them in reverse<br />
0x40100c mov edi, [ebp-4]<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center' style='color: green;'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center' style='color: green;'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<br />
0x40100d mov [ecx], esi ; Put the second value into the first address.<br />
0x40100e mov [edx], edi ; Put the first value into the second address.<br />
0x40100f pop edi ; Restore the edi register<br />
0x401010 pop esi ; Restore the esi register<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 20, ''ebp + 12''</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 16, ''ebp + 8''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp + 4''</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 8, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 4''</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan;'>'''''esp ''''', ''ebp - 8''</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 12''</td><br />
<td align='center' style='color: green;'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8, ''ebp - 16''</td><br />
<td align='center' style='color: green;'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401011 add esp, 8 ; Remove the local variables from the stack<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 12, ''ebp + 12''</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp + 8''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp + 4''</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 4''</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8, ''ebp - 8''</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16, ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401012 pop ebp ; Restore ebp<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 8</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan;'>'''''esp '''''</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center' style='color: green;'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
0x401013 ret ; Return (eax isn't set, so there's no return value)<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 4</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center' style='color: green;'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
0x400007 add esp, 8 ; Clear the stack<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp - 4</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>0x400007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 32</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 36</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
== Balance ==<br />
<br />
This should be rather obvious from the examples shown above, but it is worth paying special attention to.<br />
<br />
Every function should leave the stack pointer exactly where it found it. In other words, every amount subtracted from the stack pointer (either by sub or push) ''has to be added back'' (either by add or pop). If it isn't, the return address won't be at the top of the stack when ret executes, and the program will likely crash.<br />
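<br />
One way to check balance while reading a disassembly is to total up every change made to esp. The following is only a sketch: the stack_delta helper is made up for this page, and the instruction list assumes the usual prologue (push ebp, sub esp 8, push esi, push edi) in front of the epilogue traced above.<br />

```c
#include <stddef.h>

/* Hypothetical helper: each entry is the change one instruction makes to esp,
   in bytes. push and sub are negative, pop and add are positive. */
int stack_delta(const int *changes, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += changes[i];
    return total;
}

/* Net change to esp across the body of the traced function, assuming the
   usual prologue. Zero means esp is back on the return address at ret. */
int traced_function_delta(void)
{
    const int changes[] = {
        -4, /* push ebp   */
        -8, /* sub esp, 8 */
        -4, /* push esi   */
        -4, /* push edi   */
        +4, /* pop edi    */
        +4, /* pop esi    */
        +8, /* add esp, 8 */
        +4  /* pop ebp    */
    };
    return stack_delta(changes, sizeof changes / sizeof changes[0]);
}
```

If the total is anything other than zero, ret will pop something other than the return address.<br />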
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Mogigomahttps://wiki.skullsecurity.org/index.php?title=Example_2&diff=3111Example 22011-01-16T17:10:37Z<p>Mogigoma: /* Annotated Code */</p>
<hr />
<div>{{Infobox assembly}}<br />
[[Category: Assembly Examples]]<br />
<br />
This example is the first step in Starcraft's CDKey Decode. This shuffles the characters in the key in a predictable way. I made this the second example because it's a little trickier, but it's not terribly difficult. <br />
<br />
As in the previous example, try figuring this out on your own first! <br />
<br />
<pre><br />
lea edi, [esi+0Bh]<br />
mov ecx, 0C2h<br />
top:<br />
mov eax, ecx<br />
mov ebx, 0Ch<br />
cdq<br />
idiv ebx<br />
mov al, [edi]<br />
sub ecx, 11h<br />
dec edi<br />
cmp ecx, 7<br />
mov bl, [edx+esi]<br />
mov [edi+1], bl<br />
mov [edx+esi], al<br />
jge top<br />
</pre><br />
<br />
<br />
== Annotated Code ==<br />
<br />
Here, comments have been added explaining each line a little bit. These comments are added in an attempt to understand what the code's doing. <br />
<pre><br />
; This is actually continued from the last example, so esi contains the verified CDKey. <br />
lea edi, [esi+0Bh] ; edi is a pointer to the 12th character in the cdkey (0x0b = 11, and arrays start at 0).<br />
mov ecx, 0C2h ; Set ecx to 0xC2. Recall that ecx is often a loop counter. <br />
top:<br />
mov eax, ecx ; Move the loop counter to eax.<br />
mov ebx, 0Ch ; Set ebx to 0x0C (0x0C = 12, and arrays are indexed from 0, so the CDKey string goes from 0 to 12).<br />
cdq ; Get ready for a signed division.<br />
idiv ebx ; Divide the loop counter by 0x0C. It isn't clear yet whether this is division or modulus, but<br />
; because an accumulator is being divided by the CDKey length, it's logical to assume that <br />
; this is modular division. edx will likely be used, and eax will likely be overwritten. <br />
<br />
mov al, [edi] ; Move the value that edi points to (which is a character in the CDKey) to al. Recall that al is the<br />
; right-most byte in eax. This confirms two things: that edi points to a character in the CDKey<br />
; (since a character is 1 byte) and that the division above is modulus (because eax is overwritten). <br />
sub ecx, 11h ; Subtract 0x11 from ecx. Recall that ecx is often a loop counter, and likely is in this case. <br />
dec edi ; Decrement the pointer into the CDKey. edi started at the 12th character and is moving backwards. <br />
cmp ecx, 7 ; Compare ecx to 7, which confirms that ecx is the loop counter. The jump corresponding to this<br />
; comparison is a few lines below. <br />
mov bl, [edx+esi] ; Recall that edx is the remainder from the above division, which is (accumulator % 12), and that <br />
; esi points to the CDKey. So this takes the character corresponding to the accumulator and moves<br />
; it into bl, which is the right-most byte of ebx. <br />
mov [edi+1], bl ; Overwrite the character that edi pointed to at the top of the loop. Recall that [edi] is moved <br />
; into al, then decremented above, which is why this is +1 (to offset the decrement). <br />
mov [edx+esi], al ; Move al into the string corresponding to the modular division. <br />
jge top ; Jump as long as the ecx counter is greater than or equal to 7<br />
;<br />
; Note that the loop here is simply a swap. edi decrements, starting from the 12th character and<br />
; moving backwards. ecx is reduced by 0x11 each time, and the character at (ecx % 12) is swapped<br />
; with the character pointed at by edi. <br />
</pre><br />
<br />
== C Code ==<br />
<br />
Please, try this yourself first!<br />
<br />
This is an absolutely direct conversion from the annotated assembly to C. I added a main function that sends a bunch of test keys through the function to print out the results.<br />
<br />
Now that a driver function can test the CDKey shuffler, the code can be reduced and condensed. <br />
<br />
<pre><br />
#include <stdio.h><br />
<br />
/* Prototype */<br />
void shuffleCDKey(char *key);<br />
<br />
int main(int argc, char *argv[])<br />
{<br />
/* A series of test cases (I'm using fake keys here obviously, but real ones work even better) */<br />
char keys[3][14] = { "1212121212121", /* Valid */<br />
"1234567890123", /* Valid */<br />
"4962883551538" /* Valid */<br />
};<br />
char *results[] = { "1112222221111",<br />
"7130422865193",<br />
"3461239588558" };<br />
<br />
<br />
int i;<br />
<br />
for(i = 0; i < 3; i++)<br />
{<br />
printf("Original: %s\n", keys[i]);<br />
shuffleCDKey(keys[i]);<br />
printf("Shuffled: %s\n", keys[i]);<br />
printf("Should be: %s\n\n", results[i]);<br />
}<br />
<br />
return 0;<br />
}<br />
<br />
void shuffleCDKey(char *key)<br />
{<br />
int eax, ebx, ecx, edx;<br />
char *esi;<br />
char *edi;<br />
<br />
esi = key;<br />
<br />
// ; This is actually continued from the last example, so esi contains the verified CDKey.<br />
// lea edi, [esi+0Bh] ; edi is a pointer to the 12th character in the cdkey (0x0b = 11, and arrays start at 0).<br />
edi = (esi + 0x0b);<br />
// mov ecx, 0C2h ; Set ecx to 0xC2. Recall that ecx is often a loop counter.<br />
ecx = 0xc2;<br />
//top:<br />
do<br />
{<br />
// mov eax, ecx ; Move the loop counter to eax.<br />
eax = ecx;<br />
// mov ebx, 0Ch ; Set ebx to 0x0C (0x0C = 12, and arrays are indexed from 0, so the CDKey string goes from 0 to 12).<br />
ebx = 0x0c;<br />
// cdq ; Get ready for a signed division.<br />
// idiv ebx ; Divide the loop counter by 0x0C. It isn't clear yet whether this is division or modulus, but<br />
// ; because an accumulator is being divided by the CDKey length, it's logical to assume that<br />
// ; this is modular division. edx will likely be used, and eax will likely be overwritten.<br />
edx = eax % ebx;<br />
//<br />
// mov al, [edi] ; Move the value that edi points to (which is a character in the CDKey) to al. Recall that al is the<br />
// ; right-most byte in eax. This confirms two things: that edi points to a character in the CDKey<br />
// ; (since a character is 1 byte) and that the division above is modulus (because eax is overwritten).<br />
eax = (char) *edi;<br />
<br />
// sub ecx, 11h ; Subtract 0x11 from ecx. Recall that ecx is often a loop counter, and likely is in this case.<br />
ecx = ecx - 0x11;<br />
// dec edi ; Decrement the pointer into the CDKey. edi started at the 12th character and is moving backwards.<br />
edi--;<br />
// cmp ecx, 7 ; Compare ecx to 7, which confirms that ecx is the loop counter. The jump corresponding to this<br />
// ; comparison is a few lines below.<br />
/* will handle this compare later */<br />
// mov bl, [edx+esi] ; Recall that edx is the remainder from the above division, which is (accumulator % 12), and that<br />
// ; esi points to the CDKey. So this takes the character corresponding to the accumulator and moves<br />
// ; it into bl, which is the right-most byte of ebx.<br />
ebx = (char) *(edx + esi);<br />
// mov [edi+1], bl ; Overwrite the character that edi pointed to at the top of the loop. Recall that [edi] is moved<br />
// ; into al, then decremented above, which is why this is +1 (to offset the decrement).<br />
*(edi + 1) = (char) ebx;<br />
// mov [edx+esi], al ; Move al into the string corresponding to the modular division.<br />
*(edx + esi) = (char) eax;<br />
// jge top ; Jump as long as the ecx counter is greater than or equal to 7<br />
}<br />
while(ecx >= 7);<br />
// ;<br />
// ; Note that the loop here is simply a swap. edi decrements, starting from the 12th character and<br />
// ; moving backwards. ecx is reduced by 0x11 each time, and the character at (ecx % 12) is swapped<br />
// ; with the character pointed at by edi.<br />
//<br />
}<br />
</pre><br />
<br />
== Cleaned up C Code ==<br />
Here's the code after removing the assembly, fixing up the variable declarations, and changing most hex values to decimal:<br />
<pre><br />
void shuffleCDKey(char *key)<br />
{<br />
char *esi = key;<br />
char *edi = esi + 11;<br />
<br />
int ecx = 0xc2;<br />
int eax, ebx, edx;<br />
<br />
do<br />
{<br />
eax = ecx;<br />
ebx = 12;<br />
edx = eax % ebx;<br />
eax = (char) *edi;<br />
ecx = ecx - 17;<br />
edi--;<br />
ebx = (char) *(edx + esi);<br />
*(edi + 1) = (char) ebx;<br />
*(edx + esi) = (char) eax;<br />
}<br />
while(ecx >= 7);<br />
}<br />
</pre><br />
<br />
== Reduced C Code ==<br />
<br />
This code works, and can be used. However, for this exercise, reducing it all the way helps. <br />
<br />
Variables that aren't strictly necessary are substituted away:<br />
<pre><br />
void shuffleCDKey(char *key)<br />
{<br />
char *esi = key;<br />
char *edi = esi + 11;<br />
<br />
int ecx = 0xc2;<br />
int eax, ebx, edx;<br />
<br />
do<br />
{<br />
edx = ecx % 12;<br />
eax = (char) *edi;<br />
ecx = ecx - 17;<br />
edi--;<br />
*(edi + 1) = (char) *(edx + esi);<br />
*(edx + esi) = (char) eax;<br />
}<br />
while(ecx >= 7);<br />
}<br />
</pre><br />
<br />
Change the do loop to a for loop, rename ecx, and move the decrement of edi to the bottom of the loop:<br />
<pre><br />
void shuffleCDKey(char *key)<br />
{<br />
char *esi = key;<br />
char *edi = esi + 11;<br />
<br />
int i;<br />
int eax, ebx, edx;<br />
<br />
for(i = 0xc2; i >= 7; i -= 17)<br />
{<br />
edx = i % 12;<br />
eax = (char) *edi;<br />
*edi = (char) *(edx + esi);<br />
*(edx + esi) = (char) eax;<br />
edi--;<br />
}<br />
}<br />
</pre><br />
<br />
Replace esi with key, change pointers to arrays, declare eax as a char (to remove typecasting):<br />
<pre><br />
void shuffleCDKey(char *key)<br />
{<br />
char *edi = key + 11;<br />
<br />
int i;<br />
char eax;<br />
int ebx, edx;<br />
<br />
for(i = 0xc2; i >= 7; i -= 17)<br />
{<br />
edx = i % 12;<br />
eax = *edi;<br />
*edi = key[edx];<br />
key[edx] = eax;<br />
edi--;<br />
}<br />
}<br />
</pre><br />
<br />
Replacing the variable swap with a swap() function cleans things up significantly:<br />
<pre><br />
void swap (char *a, char *b)<br />
{<br />
char temp = *a;<br />
*a = *b;<br />
*b = temp;<br />
}<br />
<br />
void shuffleCDKey(char *key)<br />
{<br />
char *edi = key + 11;<br />
<br />
int i;<br />
char eax;<br />
int ebx, edx;<br />
<br />
for(i = 0xc2; i >= 7; i -= 17)<br />
{<br />
edx = i % 12;<br />
swap(edi, key + edx);<br />
edi--;<br />
}<br />
}<br />
</pre><br />
<br />
== Finished Code ==<br />
<br />
Finally, rename some variables, eliminate unused variables, and combine where possible, leaving this slick little function:<br />
<br />
<pre><br />
void shuffleCDKey(char *key)<br />
{<br />
char *position = key + 11;<br />
int i;<br />
for(i = 194; i >= 7; i -= 17)<br />
swap(position--, key + (i % 12));<br />
}<br />
</pre><br />
<br />
And that's it! Testing it with the driver function verifies that it still works.<br />
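<br />
To see which characters actually trade places, the loop arithmetic can be pulled out on its own. This is just a sketch with made-up helper names; it only restates the counter math from the finished function. It shows that the loop makes 12 passes and only ever touches indexes 0 to 11, so the last character of a 13-character key is never moved.<br />

```c
/* The loop counter on a given pass (pass 0 is the first trip through the loop). */
int counter_on_pass(int pass)
{
    return 194 - 17 * pass;
}

/* Index that "position" points at on that pass: 11, 10, 9, ... */
int position_on_pass(int pass)
{
    return 11 - pass;
}

/* Index of the character it is swapped with: counter % 12. */
int partner_on_pass(int pass)
{
    return counter_on_pass(pass) % 12;
}

/* Number of passes: keep going while the counter is still >= 7. */
int pass_count(void)
{
    int passes = 0;
    for (int i = 194; i >= 7; i -= 17)
        passes++;
    return passes;
}
```

For example, the first pass swaps index 11 with index 2, and the last pass swaps index 0 with index 7.<br />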
<br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Mogigomahttps://wiki.skullsecurity.org/index.php?title=Simple_Instructions&diff=3110Simple Instructions2011-01-16T16:09:16Z<p>Mogigoma: /* and, or, xor, neg */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section will go over some basic assembly instructions that you will likely see frequently. Some of the instructions shown here are tricky, and some have special properties (such as the registers they use). Additionally, x86 assembly consists of hundreds of different instructions, so you will likely want a complete reference book or website alongside you. This page, however, will give enough of an introduction to get you started.<br />
<br />
== Pointers and Dereferencing ==<br />
First, we will start with the hard stuff. If you understood the pointers section, this shouldn't be too bad. If you didn't, you should probably go back and refresh your memory. <br />
<br />
Recall that a pointer is a data type that stores an address as its value. Since registers are simply 32-bit values with no actual types, any register may or may not be a pointer, depending on what is stored. It is the responsibility of the program to treat pointers as pointers and to treat non-pointers as non-pointers. <br />
<br />
If a value is a pointer, it can be dereferenced. Recall that dereferencing a pointer retrieves the value stored at the address being pointed to. In assembly, this is generally done by putting square brackets ("[" and "]") around the register. For example:<br />
* eax -- is the value stored in eax<br />
* [eax] -- is the value pointed to by eax<br />
This will be thoroughly discussed in upcoming sections.<br />
<br />
== Doing Nothing ==<br />
The ''nop'' instruction is probably the simplest instruction in assembly. nop is short for "no operation" and it does nothing. This instruction is used for padding. <br />
<br />
== Moving Data Around ==<br />
The instructions in this section deal with relocating numbers and pointers. <br />
<br />
=== mov, movsx, movzx ===<br />
''mov'' is the instruction used for assignment, analogous to the "=" sign in most languages. mov can move data between a register and memory, two registers, or a constant to a register. Here are some examples:<br />
mov eax, 1 ; set eax to 1 (eax = 1)<br />
mov edx, ecx ; set edx to whatever ecx is (edx = ecx)<br />
mov eax, 18h ; set eax to 0x18<br />
mov eax, [ebx] ; set eax to the value in memory that ebx is pointing at<br />
mov dword ptr [ebx], 3 ; move the number 3 into the 4 bytes of memory that ebx is pointing at (the size has to be stated, since neither operand is a register)<br />
<br />
''movsx'' and ''movzx'' are special versions of mov designed for copying a value into a larger register, treating the value as either signed (movsx) or unsigned (movzx).<br />
<br />
''movsx'' means ''move with sign extension''. The data is moved from a smaller register into a bigger register, and the sign is preserved by either padding with 0's (for positive values) or F's (for negative values). Here are some examples:<br />
* '''0x1000''' becomes '''0x00001000''', since it was positive<br />
* '''0x7FFF''' becomes '''0x00007FFF''', since it was positive<br />
* '''0xFFFF''' becomes '''0xFFFFFFFF''', since it was negative (note that 0xFFFF is -1 in 16-bit signed, and 0xFFFFFFFF is -1 in 32-bit signed)<br />
* '''0x8000''' becomes '''0xFFFF8000''', since it was negative (note that 0x8000 is -32768 in 16-bit signed, and 0xFFFF8000 is -32768 in 32-bit signed)<br />
<br />
''movzx'' means ''move with zero extension''. The data is moved from a smaller register into a bigger register, and the sign is ignored. Here are some examples:<br />
* '''0x1000''' becomes '''0x00001000'''<br />
* '''0x7FFF''' becomes '''0x00007FFF'''<br />
* '''0xFFFF''' becomes '''0x0000FFFF'''<br />
* '''0x8000''' becomes '''0x00008000'''<br />
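<br />
The two kinds of extension can be sketched in C. The function names below are made up for illustration, and the code spells the padding out by hand rather than relying on casts, so that the "pad with 0's or F's" rule is visible.<br />

```c
#include <stdint.h>

/* movsx-style: if the top bit of the 16-bit value is set (negative),
   pad the upper half with F's; otherwise pad with 0's. */
uint32_t sign_extend16(uint16_t value)
{
    if (value & 0x8000u)
        return 0xFFFF0000u | value;
    return value;
}

/* movzx-style: always pad the upper half with 0's. */
uint32_t zero_extend16(uint16_t value)
{
    return (uint32_t)value;
}
```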
<br />
=== lea ===<br />
''lea'' is very similar to mov, except that math can be done on the original value before it is stored. The "[" and "]" characters always surround the second parameter, but in this case they ''do not indicate dereferencing''; it is easiest to think of them as just being part of the formula.<br />
<br />
lea is generally used for calculating array offsets, since the address of an element of the array can be found with [arraystart + offset*datasize]. lea can also be used for quickly doing math, often with an addition and a multiplication. Examples of both uses are below. <br />
<br />
Here are some examples of using lea:<br />
lea eax, [eax+eax] ; Double the value of eax -- eax = eax * 2<br />
lea edi, [esi+0Bh] ; Add 11 to esi and store the result in edi<br />
lea eax, [esi+ecx*4] ; This is generally used for indexing an array of integers. esi is a<br />
; pointer to the beginning of an array, and ecx is the index of the<br />
; element that is to be retrieved. The index is multiplied by 4<br />
; because integers are 4 bytes long. eax will end up storing the<br />
; address of the ecx'th element of the array.<br />
<br />
lea edi, [eax+eax*2] ; Triple the value of eax and store it in edi -- edi = eax * 3<br />
lea edi, [eax+ebx*2] ; This likely indicates that eax points to an array of 16-bit (2 byte)<br />
; values, and that ebx is an index into it. Note the similarities<br />
; between this and the previous example: the same kind of math is being done,<br />
; but for a different reason.<br />
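<br />
The address arithmetic that lea performs is the same arithmetic a C compiler does when indexing an array. The sketch below uses made-up function names and assumes a flat address space, where a pointer converts to a plain number (true on 32-bit x86, but not something the C standard promises).<br />

```c
#include <stdint.h>

/* lea dest, [base + index*scale]: pure arithmetic, no memory is read. */
uintptr_t lea_scaled(uintptr_t base, uintptr_t index, uintptr_t scale)
{
    return base + index * scale;
}

/* Returns 1 if lea-style math gives the same address as &array[index]
   for a small array of 4-byte integers. */
int lea_matches_indexing(void)
{
    static int32_t array[8];
    uintptr_t index = 3;
    return lea_scaled((uintptr_t)array, index, sizeof array[0])
           == (uintptr_t)&array[index];
}
```

When the "base" and "index" are the same register, as in [eax+eax*2], the same formula is just a quick multiplication.<br />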
<br />
== Math and Logic ==<br />
The instructions in this section deal with math and logic. Some are simple, and others (such as multiplication and division) are pretty tricky. <br />
<br />
=== add, sub ===<br />
A register can have either another register, a constant value, or a pointer added to or subtracted from it. The syntax of addition and subtraction is fairly simple:<br />
add eax, 3 ; Adds 3 to eax -- eax = eax + 3<br />
add ebx, eax ; Adds the value of eax to ebx -- ebx = ebx + eax<br />
sub ecx, 3 ; Subtracts 3 from ecx -- ecx = ecx - 3<br />
<br />
=== inc, dec ===<br />
These instructions simply increment and decrement a register. <br />
inc eax ; eax++<br />
dec ecx ; ecx--<br />
<br />
=== and, or, xor, neg ===<br />
All logical instructions are bitwise. If you don't know what "bitwise arithmetic" means, you should probably look it up. The simplest way of thinking of this is that the operation is applied to each pair of corresponding bits in the two operands, and the result is stored in the first operand.<br />
<br />
The instructions are pretty self-explanatory: and does a bitwise 'and', or does a bitwise 'or', and xor does a bitwise 'xor'. neg is the odd one out: it performs arithmetic (two's complement) negation, turning a value into its negative. The instruction that inverts every bit is ''not''.<br />
<br />
Here are some examples:<br />
and eax, 7 ; eax = eax & 7 -- because 7 is 000..000111, this clears all bits<br />
; except for the last three.<br />
or eax, 16 ; eax = eax | 16 -- because 16 is 000..00010000, this sets the 5th<br />
; bit from the right to "1".<br />
xor eax, 1 ; eax = eax ^ 1 -- this toggles the right-most bit in eax, 0=>1 or<br />
; 1=>0.<br />
xor eax, 0FFFFFFFFh ; eax = eax ^ 0xFFFFFFFF -- this toggles every bit in eax, which is<br />
; identical to a bitwise inversion (not eax).<br />
neg eax ; eax = -eax -- two's complement negation: inverts every bit, then adds 1.<br />
xor eax, eax ; eax = 0 -- this clears eax quickly, and is extremely<br />
; common.<br />
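<br />
The same operations can be tried out in C, where they use the operators shown in the comments above. This is a small sketch with made-up function names. Note that inverting every bit (the x86 ''not'' instruction, C's ~) is not the same thing as negating a number (x86 ''neg'', C's unary minus); the two results differ by one.<br />

```c
#include <stdint.h>

uint32_t keep_low_three_bits(uint32_t x) { return x & 7u; }          /* and eax, 7 */
uint32_t set_fifth_bit(uint32_t x)       { return x | 16u; }         /* or eax, 16 */
uint32_t toggle_low_bit(uint32_t x)      { return x ^ 1u; }          /* xor eax, 1 */
uint32_t invert_all_bits(uint32_t x)     { return x ^ 0xFFFFFFFFu; } /* same as ~x */
uint32_t negate(uint32_t x)              { return 0u - x; }          /* same as ~x + 1 */
```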
<br />
=== mul, imul, div, idiv, cdq ===<br />
Multiplication and division are the trickiest operations commonly used, because of how they deal with overflow issues. Both multiplication and division make use of the 64-bit register pair edx:eax.<br />
<br />
''mul'' multiplies the unsigned value in eax by the operand, and stores the 64-bit result in edx:eax. ''imul'' does the same thing, except the values are signed. Here are some examples:<br />
mul ecx ; edx:eax = eax * ecx (unsigned)<br />
imul edx ; edx:eax = eax * edx (signed)<br />
<br />
''imul'' (but not ''mul'') also has two- and three-operand forms, which store a truncated 32-bit result in the first operand and leave edx alone:<br />
imul ecx, 20h ; ecx = ecx * 0x20 (signed)<br />
imul eax, ecx, 10h ; eax = ecx * 0x10 (signed)<br />
<br />
''div'' divides the 64-bit value in edx:eax by the operand, and stores the quotient in eax. The remainder (modulus) is stored in edx. In other words, div does both division and modular division, at the same time. Typically, a program will only use one or the other, so you will have to check which instructions follow to see whether eax or edx is saved. Here are some examples:<br />
div ecx ; eax = edx:eax / ecx (unsigned)<br />
; edx = edx:eax % ecx (unsigned)<br />
<br />
idiv ecx ; eax = edx:eax / ecx (signed)<br />
; edx = edx:eax % ecx (signed)<br />
<br />
''cdq'' is generally used immediately before idiv. It stands for "convert doubleword to quadword." In other words, it converts the 32-bit value in eax to the 64-bit value in edx:eax, overwriting anything in edx with either 0's (if eax is zero or positive) or F's (if eax is negative). This is very similar to movsx, above.<br />
<br />
''xor edx, edx'' is generally used immediately before div. It clears edx to ensure that no leftover data is divided. <br />
<br />
Here is a common use of cdq and idiv:<br />
mov eax, 1007 ; 1007 will be divided<br />
mov ecx, 10 ; .. by 10<br />
cdq ; extends eax into edx<br />
idiv ecx ; eax will be 1007/10 = 100, and edx will be 1007%10 = 7<br />
<br />
Here is a common use of xor and div (the results are the same as the previous example):<br />
mov eax, 1007<br />
mov ecx, 10<br />
xor edx, edx<br />
div ecx<br />
<br />
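The cdq and idiv pair can be sketched in C as a 64-bit widening followed by a division that yields both results at once. The struct and function names are made up for illustration. Like idiv, C's / and % truncate toward zero, so the remainder takes the sign of the dividend.<br />

```c
#include <stdint.h>

typedef struct {
    int32_t quotient;  /* what idiv leaves in eax */
    int32_t remainder; /* what idiv leaves in edx */
} divide_result;

/* Sketch of "cdq / idiv divisor". The caller must not pass a divisor of 0,
   or divide INT32_MIN by -1; the real instruction faults in both cases. */
divide_result cdq_idiv(int32_t eax, int32_t divisor)
{
    int64_t dividend = (int64_t)eax; /* cdq: sign-extend eax into edx:eax */
    divide_result result;
    result.quotient = (int32_t)(dividend / divisor);
    result.remainder = (int32_t)(dividend % divisor);
    return result;
}
```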
=== shl, shr ===<br />
shl stands for shift left, and shr stands for shift right. They are used to do a binary shift, equivalent to the C operations << and >> on unsigned values.<br />
<br />
They each take two operands: the register to shift, and the distance to shift it.<br />
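<br />
A minimal C sketch of the two shifts follows; the function names are made up. Unsigned types are used because shr always fills the vacated high bits with zeros. The count is masked to 5 bits, which is what the processor does for 32-bit operands and also keeps the C shift well defined.<br />

```c
#include <stdint.h>

/* shl value, count */
uint32_t shift_left(uint32_t value, unsigned count)
{
    return value << (count & 31u);
}

/* shr value, count */
uint32_t shift_right(uint32_t value, unsigned count)
{
    return value >> (count & 31u);
}
```

Shifting left by n multiplies by 2 to the power n (discarding overflow), and shifting right by n does an unsigned division by the same amount.<br />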
<br />
== Jumping Around ==<br />
Instructions in this section are used to compare values and to make jumps. These jumps are used for calls, if statements, and every type of loop. The operand for most jump instructions is the address to jump to. <br />
<br />
=== jmp ===<br />
''jmp'', or jump, sends the program execution to the specified address no matter what. Here is an example:<br />
jmp 1400h ; jump to the address 0x1400<br />
<br />
=== call, ret ===<br />
''call'' is similar to jump, except that in addition to sending the program to the specified address, it also saves ("pushes") the address of the instruction immediately after the call (the return address) onto the stack. This will be explained more in a later section.<br />
<br />
''ret'' removes ("pops") the top value off of the stack, and jumps to it. In almost all cases, this value was placed onto the stack by the call instruction. If the stack pointer is at the wrong location, or the saved address was overwritten, ret attempts to jump to whatever value it finds there. Usually that is an invalid address, which crashes the program immediately; occasionally it is a valid but wrong address, where the program will almost inevitably crash later.<br />
<br />
''ret'' can also have a parameter. This parameter is added to the stack pointer immediately after the return address is popped, which allows the function to remove values that were pushed onto the stack before the call. This will be discussed in a later section.<br />
<br />
The combination of ''call'' and ''ret'' is used to implement functions. Here is an example of a simple function:<br />
<br />
<pre> call 4000h<br />
...... ; any amount of code<br />
4000h:<br />
mov eax, 1<br />
ret ; Because eax represents the return value, this function would return 1, and<br />
; nothing else would happen<br />
</pre><br />
<br />
=== cmp, test ===<br />
''cmp'', or compare, compares the two operands and sets a number of flags in a special-purpose register based on the result. Specialized jump commands can check these flags to jump on certain conditions. One way of remembering how ''cmp'' works is to think of it as subtracting the second parameter from the first, comparing the result to 0, and throwing away the result. <br />
<br />
''test'' is very similar to ''cmp'', except that it performs a bitwise 'and' operation between the two variables (and throws away the result), and compares it to zero. ''test'' is most commonly used to compare a variable to itself to check if it's zero. <br />
<br />
Conceptually, a comparison answers three questions:<br />
* Zero -- were the two elements equal (ie, was the result of the operation equal to zero)? This is recorded directly in the zero flag.<br />
* Greater than -- was the first element greater than the second (ie, was the result of the operation greater than zero)?<br />
* Less than -- was the first element less than the second (ie, was the result of the operation less than zero)?<br />
x86 has no literal 'greater than' or 'less than' flag; the processor works those conditions out from the sign, overflow, and carry flags. It is still convenient to think of them as flags.<br />
<br />
Flags are set by most arithmetic instructions. The instructions most commonly used to set them before a conditional jump are cmp, test, inc, and dec.<br />
<br />
=== jz/je, jnz/jne, jl/jb, jg, jle, jge ===<br />
* ''jz'' and ''je'' (which are synonyms) will jump to the address specified if and only if the 'zero' flag is set, which indicates that the two values were equal. In other words, "jump if equal". <br />
* ''jnz'' and ''jne'' (which are also synonyms) will jump to the address specified if and only if the 'zero' flag is not set, which indicates that the two values were not equal. In other words, "jump if different". <br />
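* ''jl'' jumps if the first parameter is less than the second in a signed comparison. ''jb'' ("jump if below") is the unsigned counterpart; the two are not synonyms, and which one appears tells you whether the values are being treated as signed or unsigned.<br />
* ''jg'' jumps if the first parameter is greater than the second.<br />
* ''jle'' jumps if the first parameter is less than or equal to the second.<br />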
* ''jge'' jumps if the first is "greater than or equal to" the second.<br />
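<br />
Whether a comparison is signed or unsigned matters as soon as the top bit is set. The sketch below (with made-up function names) compares the same two 32-bit patterns both ways: the signed version corresponds to cmp followed by jl, and the unsigned version to cmp followed by jb.<br />

```c
#include <stdint.h>

/* cmp a, b / jl: signed "less than". */
int signed_less(int32_t a, int32_t b)
{
    return a < b;
}

/* cmp a, b / jb: the same bits, compared as unsigned numbers. */
int unsigned_below(int32_t a, int32_t b)
{
    return (uint32_t)a < (uint32_t)b;
}
```

With a = -1 (0xFFFFFFFF) and b = 1, the signed comparison says a is less than b, but the unsigned comparison says it is not, because 0xFFFFFFFF is the largest unsigned 32-bit value.<br />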
<br />
These jumps are all used to implement various loops and conditions. For example, here is some C code:<br />
if(a == 3)<br />
b;<br />
else<br />
c;<br />
And here is how it might look in assembly (not exactly assembly, but this is an example):<br />
10 cmp a, 3<br />
20 jne 50<br />
30 b<br />
40 jmp 60<br />
50 c<br />
60<br />
<br />
Here is an example of a loop in C:<br />
for(i = 0; i < 5; i++)<br />
{<br />
a;<br />
b;<br />
}<br />
And here is the equivalent loop in assembly:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 inc ecx<br />
50 cmp ecx, 5<br />
60 jl 20<br />
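<br />
Written back in C with a goto, the assembly loop looks like the sketch below (the function names are made up, and a counter stands in for the "a" and "b" statements). Notice that the test sits at the bottom, so the body always runs at least once; it matches the for loop here only because 0 is less than 5 to begin with.<br />

```c
/* Shaped like the assembly: body first, then increment, compare, and jump back. */
int run_loop_like_assembly(void)
{
    int ecx = 0;       /* 10: mov ecx, 0 */
    int body_runs = 0;
top:
    body_runs++;       /* 20, 30: a; b */
    ecx++;             /* 40: inc ecx */
    if (ecx < 5)       /* 50: cmp ecx, 5 */
        goto top;      /* 60: jl 20 */
    return body_runs;
}

/* The original C loop, for comparison. */
int run_for_loop(void)
{
    int body_runs = 0;
    for (int i = 0; i < 5; i++)
        body_runs++;
    return body_runs;
}
```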
<br />
== Manipulating the Stack ==<br />
Instructions in this section are used for adding and removing data from the stack. The stack will be examined in detail in a later section; this section will simply show some commonly used instructions.<br />
<br />
=== push, pop ===<br />
''push'' decrements the stack pointer by the size of the operand, then saves the operand to the new address. This line:<br />
push ecx<br />
Is functionally equivalent to:<br />
sub esp, 4<br />
mov [esp], ecx<br />
<br />
''pop'' sets the operand to the value on the stack, then increments the stack pointer by the size of the operand. This assembly:<br />
pop ecx<br />
Is functionally equivalent to:<br />
mov ecx, [esp]<br />
add esp, 4<br />
<br />
This will be examined in detail in the Stack section of this tutorial.<br />
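<br />
The two equivalences above can be modelled with a toy stack in C. Everything here (the toy_stack type and its functions) is made up for illustration, and there is no bounds checking. The point is only that push moves esp down before storing, and pop loads before moving esp back up.<br />

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t memory[64]; /* toy stack memory */
    uint32_t esp;       /* offset into memory; starts at the high end */
} toy_stack;

void toy_init(toy_stack *stack)
{
    memset(stack->memory, 0, sizeof stack->memory);
    stack->esp = sizeof stack->memory;
}

void toy_push(toy_stack *stack, uint32_t value)
{
    stack->esp -= 4;                                /* sub esp, 4       */
    memcpy(&stack->memory[stack->esp], &value, 4);  /* mov [esp], value */
}

uint32_t toy_pop(toy_stack *stack)
{
    uint32_t value;
    memcpy(&value, &stack->memory[stack->esp], 4);  /* mov value, [esp] */
    stack->esp += 4;                                /* add esp, 4       */
    return value;
}

/* Returns 1 if values come back in reverse order and esp ends where it began. */
int toy_demo(void)
{
    toy_stack stack;
    toy_init(&stack);
    uint32_t start = stack.esp;
    toy_push(&stack, 1);
    toy_push(&stack, 2);
    uint32_t first = toy_pop(&stack);
    uint32_t second = toy_pop(&stack);
    return first == 2 && second == 1 && stack.esp == start;
}
```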
<br />
=== pushaw, pushad, popaw, popad ===<br />
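''pushaw'' and ''pushad'' save all of the 16-bit or 32-bit general-purpose registers (respectively) onto the stack.<br />
<br />
''popaw'' and ''popad'' restore the 16-bit or 32-bit general-purpose registers from the stack (the saved stack pointer value is discarded rather than restored).<br />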
<br />
== Questions ==<br />
Feel free to edit this section and post questions, and I will do my best to answer them; however, you may need to contact me to let me know that a question exists.<br />
<br />
Further explain bitwise<br />
<br />
In your example code:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 cmp ecx, 5<br />
50 jl 20<br />
Won't this just loop forever because ecx is never incremented? Total noob here so I may have missed something obvious.<br />
*** Yes, my mistake. <br />
<br />
Indeed, would not this be more accurate?:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 inc ecx<br />
50 cmp ecx, 5<br />
60 jl 20<br />
*** Changed. <br />
<br />
Also, the stack is decremented when pushed, but increased when popped? Isn't this counterintuitive?<br />
*** Yes, the stack starts high and grows downwards. Welcome to x86 assembler!</div>Mogigomahttps://wiki.skullsecurity.org/index.php?title=Simple_Instructions&diff=3109Simple Instructions2011-01-16T15:43:50Z<p>Mogigoma: /* Pointers and Dereferencing */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section will go over some basic assembly instructions that you will likely see frequently. Some of the instructions shown here are tricky, and some have special properties (such as the registers they use). Additionally, x86 assembly consists of hundreds of different instructions, so you will likely want a complete reference book or website alongside you. This page, however, will give enough of an introduction to get you started.<br />
<br />
== Pointers and Dereferencing==<br />
First, we will start with the hard stuff. If you understood the pointers section, this shouldn't be too bad. If you didn't, you should probably go back and refresh your memory. <br />
<br />
Recall that a pointer is a data type that stores an address as its value. Since registers are simply 32-bit values with no actual types, any register may or may not be a pointer, depending on what is stored. It is the responsibility of the program to treat pointers as pointers and to treat non-pointers as non-pointers. <br />
<br />
If a value is a pointer, it can be dereferenced. Recall that dereferencing a pointer retrieves the value stored at the address being pointed to. In assembly, this is generally done by putting square brackets ("[" and "]") around the register. For example:<br />
* eax -- is the value stored in eax<br />
* [eax] -- is the value pointed to by eax<br />
This will be thoroughly discussed in upcoming sections.<br />
<br />
== Doing Nothing ==<br />
The ''nop'' instruction is probably the simplest instruction in assembly. nop is short for "no operation" and it does nothing. This instruction is used for padding. <br />
<br />
== Moving Data Around ==<br />
The instructions in this section deal with relocating numbers and pointers. <br />
<br />
=== mov, movsx, movzx ===<br />
''mov'' is the instruction used for assignment, analogous to the "=" sign in most languages. mov can move data between a register and memory, two registers, or a constant to a register. Here are some examples:<br />
mov eax, 1 ; set eax to 1 (eax = 1)<br />
mov edx, ecx ; set edx to whatever ecx is (edx = ecx)<br />
mov eax, 18h ; set eax to 0x18<br />
mov eax, [ebx] ; set eax to the value in memory that ebx is pointing at<br />
mov dword ptr [ebx], 3 ; move the number 3 into the 4 bytes of memory that ebx is pointing at (the size has to be stated, since neither operand implies one)<br />
<br />
''movsx'' and ''movzx'' are special versions of mov that copy a smaller value into a larger register, treating the source as either signed (movsx) or unsigned (movzx). <br />
<br />
''movsx'' means ''move with sign extension''. The data is moved from a smaller register into a bigger register, and the sign is preserved by either padding with 0's (for positive values) or F's (for negative values). Here are some examples:<br />
* '''0x1000''' becomes '''0x00001000''', since it was positive<br />
* '''0x7FFF''' becomes '''0x00007FFF''', since it was positive<br />
* '''0xFFFF''' becomes '''0xFFFFFFFF''', since it was negative (note that 0xFFFF is -1 in 16-bit signed, and 0xFFFFFFFF is -1 in 32-bit signed)<br />
* '''0x8000''' becomes '''0xFFFF8000''', since it was negative (note that 0x8000 is -32768 in 16-bit signed, and 0xFFFF8000 is -32768 in 32-bit signed)<br />
<br />
''movzx'' means ''move with zero extension''. The data is moved from a smaller register into a bigger register, and the sign is ignored. Here are some examples:<br />
* '''0x1000''' becomes '''0x00001000'''<br />
* '''0x7FFF''' becomes '''0x00007FFF'''<br />
* '''0xFFFF''' becomes '''0x0000FFFF'''<br />
* '''0x8000''' becomes '''0x00008000'''<br />
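<br />
The two extensions can be modelled in C. This is only a sketch: the function names are made up for illustration, and the widening is done with explicit bit operations so that it mirrors the "pad with 0's or F's" description above.<br />

```c
#include <stdint.h>

/* movsx model: widen 16 bits to 32 bits, copying the sign bit
   into the upper half. */
uint32_t sign_extend16(uint16_t value)
{
    if (value & 0x8000u)
        return 0xFFFF0000u | value;   /* negative: pad with F's */
    return value;                     /* positive: pad with 0's */
}

/* movzx model: widen 16 bits to 32 bits, always padding with 0's. */
uint32_t zero_extend16(uint16_t value)
{
    return (uint32_t)value;
}
```

The only difference between the two shows up when the top bit of the source is set.<br />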
<br />
=== lea ===<br />
''lea'' ("load effective address") is very similar to mov, except that math can be done on the original value before it is used. The "[" and "]" characters always surround the second parameter, but in this case they ''do not indicate dereferencing''; it is easiest to think of them as just being part of the formula. <br />
<br />
lea is generally used for calculating array offsets, since the address of an element of the array can be found with [arraystart + offset*datasize]. lea can also be used for quickly doing math, often with an addition and a multiplication. Examples of both uses are below. <br />
<br />
Here are some examples of using lea:<br />
lea eax, [eax+eax] ; Double the value of eax -- eax = eax * 2<br />
lea edi, [esi+0Bh] ; Add 11 to esi and store the result in edi<br />
lea eax, [esi+ecx*4] ; This is generally used for indexing an array of integers. esi is a <br />
; pointer to the beginning of an array, and ecx is the index of the <br />
; element that is to be retrieved. The index is multiplied by 4 <br />
; because integers are 4 bytes long. eax will end up storing the <br />
; address of the ecx'th element of the array. <br />
<br />
lea edi, [eax+eax*2] ; Triple the value of eax and store the result in edi -- edi = eax * 3<br />
lea edi, [eax+ebx*2] ; This likely indicates that eax points to an array of 16-bit (2 byte) <br />
; values, and that ebx is an index into it. Note the similarities <br />
; between this and the previous example: the same kind of math is being done, <br />
; but for a different reason. <br />
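<br />
In C terms, the array-indexing form of lea computes an address without reading from it. The sketch below assumes 4-byte integers (hence int32_t), and the function name is hypothetical:<br />

```c
#include <stddef.h>
#include <stdint.h>

/* Model of "lea eax, [esi+ecx*4]": compute the address of element
   `index`, but do not read the element itself. */
int32_t *element_address(int32_t *base, size_t index)
{
    /* Byte-level form of base + index * 4. */
    return (int32_t *)((unsigned char *)base + index * sizeof(int32_t));
}
```

The result is the same address that the C expression &base[index] would give.<br />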
<br />
== Math and Logic ==<br />
The instructions in this section deal with math and logic. Some are simple, and others (such as multiplication and division) are pretty tricky. <br />
<br />
=== add, sub ===<br />
A register can have either another register, a constant value, or a pointer added to or subtracted from it. The syntax of addition and subtraction is fairly simple:<br />
add eax, 3 ; Adds 3 to eax -- eax = eax + 3<br />
add ebx, eax ; Adds the value of eax to ebx -- ebx = ebx + eax<br />
sub ecx, 3 ; Subtracts 3 from ecx -- ecx = ecx - 3<br />
<br />
=== inc, dec ===<br />
These instructions simply increment and decrement a register. <br />
inc eax ; eax++<br />
dec ecx ; ecx--<br />
<br />
=== and, or, xor, not, neg ===<br />
All logical instructions are bitwise. If you don't know what "bitwise arithmetic" means, you should probably look it up. The simplest way of thinking of this is that the operation is done between each pair of corresponding bits in the two operands, and the result is stored in the first one. <br />
<br />
The instructions are pretty self-explanatory: and does a bitwise 'and', or does a bitwise 'or', xor does a bitwise 'xor', and not does a bitwise inversion. ''neg'' is the odd one out: it is arithmetic rather than bitwise, and replaces its operand with its two's complement negative (0 minus the value).<br />
<br />
Here are some examples:<br />
and eax, 7 ; eax = eax & 7 -- because 7 is 000..000111, this clears all bits <br />
; except for the last three. <br />
or eax, 16 ; eax = eax | 16 -- because 16 is 000..00010000, this sets the 5th <br />
; bit from the right to "1". <br />
xor eax, 1 ; eax = eax ^ 1 -- this toggles the right-most bit in eax, 0=>1 or <br />
; 1=>0.<br />
xor eax, 0FFFFFFFFh ; eax = eax ^ 0xFFFFFFFF -- this toggles every bit in eax, which is <br />
; identical to a bitwise inversion.<br />
not eax ; eax = ~eax -- inverts every bit in eax, same as the previous.<br />
neg eax ; eax = -eax -- negates eax, the same as inverting every bit and adding 1.<br />
xor eax, eax ; eax = 0 -- this clears eax quickly, and is extremely <br />
; common.<br />
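<br />
C has an operator for each of these. The sketch below uses made-up function names and unsigned 32-bit values; note that C keeps bitwise inversion (~) and arithmetic negation (unary minus, written here as 0 - x) apart, just as x86 has separate ''not'' and ''neg'' instructions.<br />

```c
#include <stdint.h>

uint32_t keep_low_three_bits(uint32_t x) { return x & 7u; }   /* and eax, 7  */
uint32_t set_fifth_bit(uint32_t x)       { return x | 16u; }  /* or  eax, 16 */
uint32_t toggle_lowest_bit(uint32_t x)   { return x ^ 1u; }   /* xor eax, 1  */
uint32_t invert_bits(uint32_t x)         { return ~x; }       /* not eax     */
uint32_t negate(uint32_t x)              { return 0u - x; }   /* neg eax     */
uint32_t clear(uint32_t x)               { return x ^ x; }    /* xor eax, eax */
```

For any value, negate(x) equals invert_bits(x) + 1, which is the two's complement rule.<br />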
<br />
=== mul, imul, div, idiv, cdq ===<br />
Multiplication and division are the trickiest operations commonly used, because of how they deal with overflow issues. Both multiplication and division make use of the 64-bit register pair edx:eax (edx holds the upper 32 bits, and eax the lower 32 bits). <br />
<br />
''mul'' multiplies the unsigned value in eax with the operand, and stores the 64-bit result in edx:eax. ''imul'' does the same thing, except the values are treated as signed. Here are some examples:<br />
mul ecx ; edx:eax = eax * ecx (unsigned)<br />
imul edx ; edx:eax = eax * edx (signed)<br />
<br />
''imul'' (but not ''mul'') also has forms with two or three operands, which multiply as expected and keep only the lower 32 bits of the result:<br />
imul ecx, 20h ; ecx = ecx * 0x20 (signed)<br />
imul eax, ecx, 10h ; eax = ecx * 0x10 (signed)<br />
<br />
''div'' divides the 64-bit value in edx:eax by the operand, and stores the quotient in eax. The remainder (modulus) is stored in edx. In other words, div does both division and modular division, at the same time. Typically, a program will only use one or the other, so you will have to check which instructions follow to see whether eax or edx is saved. Here are some examples:<br />
div ecx ; eax = edx:eax / ecx (unsigned)<br />
; edx = edx:eax % ecx (unsigned)<br />
<br />
idiv ecx ; eax = edx:eax / ecx (signed)<br />
; edx = edx:eax % ecx (signed)<br />
<br />
''cdq'' is generally used immediately before idiv. It stands for "convert doubleword to quadword." In other words, it sign-extends the 32-bit value in eax into the 64-bit value in edx:eax, overwriting anything in edx with either 0's (if eax is positive) or F's (if eax is negative). This is very similar to movsx, above. <br />
<br />
''xor edx, edx'' is generally used immediately before div. It clears edx to ensure that no leftover data is divided. <br />
<br />
Here is a common use of cdq and idiv:<br />
mov eax, 1007 ; 1007 will be divided<br />
mov ecx, 10 ; .. by 10<br />
cdq ; extends eax into edx<br />
idiv ecx ; eax will be 1007/10 = 100, and edx will be 1007%10 = 7<br />
<br />
Here is a common use of xor and div (the results are the same as the previous example):<br />
mov eax, 1007<br />
mov ecx, 10<br />
xor edx, edx<br />
div ecx<br />
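<br />
The unsigned case can be modelled in C with a 64-bit integer standing in for edx:eax. This is a sketch with hypothetical names; it assumes the divisor is not zero and that the quotient fits in 32 bits, since the real instruction raises a divide error otherwise.<br />

```c
#include <stdint.h>

/* Result of a div: the quotient goes to eax, the remainder to edx. */
struct div_result {
    uint32_t eax;
    uint32_t edx;
};

/* Model of "div ecx": unsigned divide of the 64-bit value edx:eax. */
struct div_result model_div(uint32_t edx, uint32_t eax, uint32_t ecx)
{
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    struct div_result r;
    r.eax = (uint32_t)(dividend / ecx);   /* quotient */
    r.edx = (uint32_t)(dividend % ecx);   /* remainder */
    return r;
}
```

Passing 0 for edx corresponds to the "xor edx, edx" that usually precedes div.<br />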
<br />
=== shl, shr ===<br />
''shl'' stands for shift left, and ''shr'' stands for shift right. They are used to do a binary shift, equivalent to the C operations << and >> on unsigned values. <br />
<br />
They each take two operands: the register to shift, and the distance to shift it.<br />
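<br />
A minimal C sketch of the two shifts, using hypothetical function names and assuming the shift distance is between 0 and 31:<br />

```c
#include <stdint.h>

/* shl value, count -- bits move left; bits pushed off the top are lost. */
uint32_t shift_left(uint32_t value, unsigned count)
{
    return value << count;
}

/* shr value, count -- bits move right; zeros come in at the top. */
uint32_t shift_right(uint32_t value, unsigned count)
{
    return value >> count;
}
```

Shifting left by one doubles an unsigned value (until it overflows), and shifting right by one halves it, discarding the remainder.<br />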
<br />
== Jumping Around ==<br />
Instructions in this section are used to compare values and to make jumps. These jumps are used for calls, if statements, and every type of loop. The operand for most jump instructions is the address to jump to. <br />
<br />
=== jmp ===<br />
''jmp'', or jump, sends the program execution to the specified address no matter what. Here is an example:<br />
jmp 1400h ; jump to the address 0x1400<br />
<br />
=== call, ret ===<br />
''call'' is similar to jump, except that in addition to sending the program to the specified address, it also saves ("pushes") the address of the instruction immediately following the call (the return address) onto the stack. This will be explained more in a later section. <br />
<br />
''ret'' removes ("pops") the first value off of the stack, and jumps to it. In almost all cases, this value was placed onto the stack by the call instruction. If the stack pointer is at the wrong location, or the saved address was overwritten, ret attempts to jump to an invalid address, which usually crashes the program. In some cases, it may jump to a valid but wrong place, where the program will almost inevitably crash. <br />
<br />
''ret'' can also have a parameter. This parameter is added to the stack pointer immediately after ret pops the return address. This addition allows the function to remove values that were pushed onto the stack. This will be discussed in a later section. <br />
<br />
The combination of ''call'' and ''ret'' is used to implement functions. Here is an example of a simple function:<br />
<br />
<pre> call 4000h<br />
...... ; any amount of code<br />
4000h:<br />
mov eax, 1<br />
ret ; Because eax represents the return value, this function would return 1, and <br />
; nothing else would happen<br />
</pre><br />
<br />
=== cmp, test ===<br />
''cmp'', or compare, compares the two operands and sets a number of flags in a special-purpose register based on the result. Specialized jump commands can check these flags to jump on certain conditions. One way of remembering how ''cmp'' works is to think of it as subtracting the second parameter from the first, comparing the result to 0, and throwing away the result. <br />
<br />
''test'' is very similar to ''cmp'', except that it performs a bitwise 'and' operation between the two operands (and throws away the result), and compares it to zero. ''test'' is most commonly used to compare a register to itself to check if it's zero. <br />
<br />
Here are the most common conditions. Only 'zero' is a single flag on x86; "greater than" and "less than" are worked out from a combination of the zero, sign, carry, and overflow flags, but it is convenient to think of them as flags:<br />
* Zero -- set if and only if the two elements are equal (i.e., if the result of the operation was equal to zero)<br />
* Greater than -- set if the first element is greater than the second (i.e., if the result of the operation was greater than zero)<br />
* Less than -- set if the first element is less than the second (i.e., if the result of the operation was less than zero)<br />
<br />
Flags are set by most arithmetic instructions. The instructions most commonly used for comparisons are cmp, inc, and dec.<br />
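<br />
As a rough C model of the simplified picture above (the struct and function names are invented for illustration, and real x86 flags are lower-level than this):<br />

```c
#include <stdbool.h>
#include <stdint.h>

/* The three conditions described above, as seen after a comparison. */
struct conditions {
    bool zero;
    bool less;
    bool greater;
};

/* Model of "cmp a, b" for signed values. */
struct conditions model_cmp(int32_t a, int32_t b)
{
    struct conditions c;
    c.zero = (a == b);
    c.less = (a < b);
    c.greater = (a > b);
    return c;
}

/* Model of "test a, b": only the zero condition is modelled here. */
bool model_test_zero(uint32_t a, uint32_t b)
{
    return (a & b) == 0;
}
```

Calling model_test_zero(x, x) corresponds to the common "test eax, eax" idiom, which reports zero only when the register itself is zero.<br />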
<br />
=== jz/je, jnz/jne, jl/jb, jg, jle, jge ===<br />
* ''jz'' and ''je'' (which are synonyms) will jump to the address specified if and only if the 'zero' flag is set, which indicates that the two values were equal. In other words, "jump if equal". <br />
* ''jnz'' and ''jne'' (which are also synonyms) will jump to the address specified if and only if the 'zero' flag is not set, which indicates that the two values were not equal. In other words, "jump if different". <br />
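* ''jl'' and ''jb'' both jump if the first parameter is less than the second, but they are not synonyms: ''jl'' ("less") compares the values as signed numbers, while ''jb'' ("below") compares them as unsigned numbers. <br />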
* ''jg'' jumps if the first parameter is greater than the second. <br />
* ''jle'' jumps if the 'less than' or the 'zero' flag is set, so "less than or equal to". <br />
* ''jge'' jumps if the first is "greater than or equal to" the second.<br />
<br />
These jumps are all used to implement various loops and conditions. For example, here is some C code:<br />
if(a == 3)<br />
b;<br />
else<br />
c;<br />
And here is how it might look in assembly (not exactly assembly, but this is an example):<br />
10 cmp a, 3<br />
20 jne 50<br />
30 b<br />
40 jmp 60<br />
50 c<br />
60<br />
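<br />
The jump structure of that listing can be written in C with goto, which makes the correspondence line-by-line. This is only an illustration: the function name and the 'b'/'c' return values are stand-ins for the placeholder statements.<br />

```c
/* Returns 'b' or 'c' to show which branch ran. */
char branch_taken(int a)
{
    char result;

    if (a != 3)          /* 10 cmp a, 3 / 20 jne 50 */
        goto line_50;
    result = 'b';        /* 30 b */
    goto line_60;        /* 40 jmp 60 */
line_50:
    result = 'c';        /* 50 c */
line_60:
    return result;       /* 60 */
}
```

Note that the condition is inverted: the C source tests for equality, but the assembly jumps away when the values are ''not'' equal.<br />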
<br />
Here is an example of a loop in C:<br />
for(i = 0; i < 5; i++)<br />
{<br />
a;<br />
b;<br />
}<br />
And here is the equivalent loop in assembly:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 inc ecx<br />
50 cmp ecx, 5<br />
60 jl 20<br />
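<br />
Written back in C with goto, the same listing looks like the sketch below (the function name and the counter standing in for "a" and "b" are assumptions for illustration). The condition is checked at the bottom, so the body always runs at least once; that is safe here because the counter starts below 5.<br />

```c
/* Returns how many times the loop body ran. */
int count_iterations(void)
{
    int ecx = 0;         /* 10 mov ecx, 0 */
    int body_runs = 0;
line_20:
    body_runs++;         /* 20 a / 30 b */
    ecx++;               /* 40 inc ecx */
    if (ecx < 5)         /* 50 cmp ecx, 5 / 60 jl 20 */
        goto line_20;
    return body_runs;
}
```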
<br />
== Manipulating the Stack ==<br />
Instructions in this section are used for adding data to and removing data from the stack. The stack will be examined in detail in a later section; this section will simply show some commonly used instructions. <br />
<br />
=== push, pop ===<br />
''push'' decrements the stack pointer by the size of the operand, then saves the operand to the new address. This line:<br />
push ecx<br />
Is functionally equivalent to:<br />
sub esp, 4<br />
mov [esp], ecx<br />
<br />
''pop'' sets the operand to the value on the stack, then increments the stack pointer by the size of the operand. This assembly:<br />
pop ecx<br />
Is functionally equivalent to:<br />
mov ecx, [esp]<br />
add esp, 4<br />
<br />
This will be examined in detail in the Stack section of this tutorial.<br />
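<br />
Until then, a toy model in C may help. It is a sketch under simplifying assumptions: the stack is a small array of 4-byte slots, esp is an index into it rather than a real address, there are no bounds checks, and all names are made up.<br />

```c
#include <stdint.h>

#define STACK_SLOTS 16

struct toy_stack {
    uint32_t memory[STACK_SLOTS];  /* one slot per 4 bytes of stack */
    uint32_t esp;                  /* index of the current top of the stack */
};

/* An empty stack has esp just past the highest slot. */
void stack_init(struct toy_stack *s)
{
    s->esp = STACK_SLOTS;
}

/* push: sub esp, 4 / mov [esp], value */
void stack_push(struct toy_stack *s, uint32_t value)
{
    s->esp -= 1;
    s->memory[s->esp] = value;
}

/* pop: mov value, [esp] / add esp, 4 */
uint32_t stack_pop(struct toy_stack *s)
{
    uint32_t value = s->memory[s->esp];
    s->esp += 1;
    return value;
}
```

Values come back in the reverse of the order they were pushed, and esp ends up where it started once every push has been matched by a pop.<br />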
<br />
=== pushaw, pushad, popaw, popad ===<br />
''pushaw'' and ''pushad'' save all 16-bit or 32-bit general-purpose registers (respectively) onto the stack. <br />
<br />
''popaw'' and ''popad'' restore all 16-bit or 32-bit general-purpose registers from the stack. <br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, and I will do my best to answer them; however, you may need to contact me to let me know that a question exists.<br />
<br />
Further explain bitwise<br />
<br />
In your example code:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 cmp ecx, 5<br />
50 jl 20<br />
Won't this just loop forever because ecx is never incremented? Total noob here so I may have missed something obvious.<br />
*** Yes, my mistake. <br />
<br />
Indeed, would not this be more accurate?:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 inc ecx<br />
50 cmp ecx, 5<br />
60 jl 20<br />
*** Changed. <br />
<br />
Also, the stack is decremented when pushed, but increased when popped? Isn't this counterintuitive?<br />
*** Yes, the stack starts high and grows downwards. Welcome to x86 assembler!</div>Mogigomahttps://wiki.skullsecurity.org/index.php?title=Fundamentals&diff=3108Fundamentals2011-01-16T15:18:49Z<p>Mogigoma: /* Arrays */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This page is going to be about the fundamentals that you have to understand before you can make any sense out of assembly. Most of this stuff you'll learn if you learn to program in C. If this is old or boring stuff to you, feel free to skip this section entirely. <br />
<br />
The topics here are going to be a short overview of each section. If you want a more complete explanation, you should find an actual reference, or look it up on the Internet. This is only meant to be a quick and dirty primer. <br />
<br />
<br />
<br />
== Hexadecimal ==<br />
To work in assembly, you have to be able to read hexadecimal fairly comfortably. Converting to decimal in your mind isn't necessary, but being able to do some simple arithmetic is. <br />
<br />
Hex can be denoted in a number of ways, but the two most common are:<br />
* Prefixed with a 0x, eg. 0x1ef7<br />
* Postfixed with a h, eg. 1ef7h<br />
<br />
The characters 0 - f represent the decimal numbers 0 - 15:<br />
* 0 = 0<br />
* 1 = 1<br />
* ...<br />
* 9 = 9<br />
* a = 10<br />
* b = 11<br />
* c = 12<br />
* d = 13<br />
* e = 14<br />
* f = 15<br />
<br />
To convert from hex to decimal, multiply each digit, starting with the right-most, by 16<sup>0</sup>, 16<sup>1</sup>, 16<sup>2</sup>, etc., and add the results together. So in the example of 0x1ef7, the conversion is this:<br />
* (7 * 16<sup>0</sup>) + (f * 16<sup>1</sup>) + (e * 16<sup>2</sup>) + (1 * 16<sup>3</sup>)<br />
* = (7 * 16<sup>0</sup>) + (15 * 16<sup>1</sup>) + (14 * 16<sup>2</sup>) + (1 * 16<sup>3</sup>)<br />
* = (7 * 1) + (15 * 16) + (14 * 256) + (1 * 4096)<br />
* = 7 + 240 + 3584 + 4096<br />
* = 7927<br />
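<br />
The same conversion can be sketched in C. The function names are made up, and there is no overflow check, so this only suits inputs of up to 8 hex digits. It works left to right, multiplying the running total by 16 before adding each digit, which gives the same sum of powers as the calculation above.<br />

```c
#include <stdint.h>

/* Value of one hex digit, or -1 if the character is not a hex digit. */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Convert a string of hex digits (no "0x" prefix) to a number.
   Stops at the first character that is not a hex digit. */
uint32_t hex_to_number(const char *digits)
{
    uint32_t total = 0;
    for (; hex_digit_value(*digits) >= 0; digits++)
        total = total * 16 + (uint32_t)hex_digit_value(*digits);
    return total;
}
```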
<br />
It isn't necessary to do that constantly; that's why we have calculators. But you should be fairly familiar with the numbers 00 - FF (0 - 255), since they come up often and you will otherwise spend a lot of time looking them up. <br />
<br />
<br />
<br />
== Binary == <br />
Binary, as we all know, is a number system using only 0's and 1's. The usage is basically the same as hex, but change powers of 16 to powers of 2. <br />
<br />
1011 to decimal:<br />
* (1 * 2<sup>0</sup>) + (1 * 2<sup>1</sup>) + (0 * 2<sup>2</sup>) + (1 * 2<sup>3</sup>)<br />
* = (1 * 1) + (1 * 2) + (0 * 4) + (1 * 8)<br />
* = 1 + 2 + 0 + 8<br />
* = 11<br />
<br />
Conversion between decimal and binary is rare; it's much more common to convert between hexadecimal and binary. This conversion is common because it's so easy: every group of 4 binary digits is converted to a single hex digit. So all you really need to know are the first 16 binary to hex conversions:<br />
* 0x0 = 0000<br />
* 0x1 = 0001<br />
* 0x2 = 0010<br />
* 0x3 = 0011<br />
* 0x4 = 0100<br />
* 0x5 = 0101<br />
* 0x6 = 0110<br />
* 0x7 = 0111<br />
* 0x8 = 1000<br />
* 0x9 = 1001<br />
* 0xa = 1010<br />
* 0xb = 1011<br />
* 0xc = 1100<br />
* 0xd = 1101<br />
* 0xe = 1110<br />
* 0xf = 1111<br />
<br />
So take the binary number 100101101001110, for example. <br />
# Pad the front with zeros to make its length a multiple of 4: 0100101101001110<br />
# Break it into 4-digit groups: 0100 1011 0100 1110<br />
# Look up each set of 4-digits on the table: 0x4 0xb 0x4 0xe<br />
# Put them all together: 0x4b4e<br />
<br />
To go the other way is even easier, using 0x469e for example:<br />
# Separate the digits: 0x4 0x6 0x9 0xe<br />
# Convert each of them to binary, by the table: 0100 0110 1001 1110<br />
# Put them together, and leading zeros on the first group can be removed: 100011010011110<br />
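<br />
Both worked examples can be checked with a few lines of C. This is a sketch with a hypothetical function name; it reads a string of '0' and '1' characters and builds the number one bit at a time, which can then be compared against the hex value.<br />

```c
#include <stdint.h>

/* Convert a string of '0' and '1' characters to a number.
   Stops at the first character that is neither. */
uint32_t binary_to_number(const char *bits)
{
    uint32_t total = 0;
    for (; *bits == '0' || *bits == '1'; bits++)
        total = (total << 1) | (uint32_t)(*bits - '0');
    return total;
}
```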
<br />
<br />
<br />
== Datatypes ==<br />
<br />
A datatype basically refers to how digits in hex are partitioned off and divided into numbers. Datatypes are typically measured by two factors: the number of bits (or bytes), and whether or not negative numbers are allowed.<br />
<br />
The number of bits (or bytes) refers to the length of the number. An 8-bit (or 1-byte) number is made up of two hexadecimal digits. For example, 0x03, 0x34, and 0xFF are all 8-bit, while 0x1234, 0x0001, and 0xFFFF are 16-bit. <br />
<br />
The signed or unsigned property refers to whether or not the number can have negative values. If it can, then the maximum positive value is about half of what it would otherwise be, with the other half of the range used for negatives. The sign is determined by looking at the very first (most significant) bit. If the first bit is a 1, or the first hex digit is 8 - F, then the number is negative and the rest of the number, inverted plus one, gives the magnitude. This scheme is known as two's complement. <br />
<br />
For example (use a calculator to convert to binary):<br />
* 0x10 in binary is 0001 0000, so it's positive 16<br />
* 0xFF in binary is 1111 1111, so it's negative. The rest of the number is the 7-bits, 1111111, inverted to 0000000, plus one is 0000001, or -1 in decimal. <br />
* 0x80 in binary is 1000 0000, so it's negative. The rest of the number is the 7-bits, 0000000, inverted to 1111111, plus one is 10000000, or -128 in decimal. <br />
* 0x7F in binary is 0111 1111, so it's positive 127. <br />
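<br />
The "invert, add one" rule can be written out in C. This sketch uses a made-up function name and handles 8-bit values only; it inverts all 8 bits rather than just the lower 7, which gives the same magnitude.<br />

```c
#include <stdint.h>

/* Interpret an 8-bit pattern as a signed (two's complement) number. */
int as_signed8(uint8_t bits)
{
    if (bits & 0x80) {
        /* Top bit set: invert the bits, add one, and negate the result. */
        int magnitude = (int)((uint8_t)~bits) + 1;
        return -magnitude;
    }
    return bits;   /* top bit clear: the value is already positive */
}
```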
<br />
Although different data lengths are called different things, here are some common ones by their usual name:<br />
<br />
* 8-bit (1 byte) = char (or BYTE)<br />
** In hex, can be 0x00 to 0xFF<br />
** Signed: ranges from -128 to 127<br />
** Unsigned: ranges from 0 to 255<br />
<br />
<br />
* 16-bit (2 bytes) = short int (often referred to as a WORD)<br />
** In hex, can be 0x0000 to 0xFFFF<br />
** Signed: ranges from -32768 to 32767<br />
** Unsigned: ranges from 0 to 65535<br />
<br />
<br />
* 32-bit (4 bytes) = int, or long int on 32-bit compilers (often referred to as a DWORD or double-WORD)<br />
** In hex, can be 0x00000000 to 0xFFFFFFFF<br />
** Signed: ranges from -2147483648 to 2147483647<br />
** Unsigned: ranges from 0 to 4294967295<br />
<br />
<br />
* 64-bit (8 bytes) = long long (often referred to as a QWORD or quad-WORD)<br />
** In hex, can be from 0x0000000000000000 to 0xFFFFFFFFFFFFFFFF<br />
** Signed: -9223372036854775808 to 9223372036854775807<br />
** Unsigned: 0 to 18446744073709551615<br />
<br />
== Memory ==<br />
<br />
Each running program has its own space of memory that isn't shared with any other process. Within this memory can be found everything the program needs to be able to run, including the program's code, variables, loaded .dll's, and the program stack. <br />
<br />
When a program runs, the code from the .exe file is loaded into memory, and the instructions are executed from this memory image. This will become important, since we can modify the image loaded in memory without touching the .exe on the physical disk. <br />
<br />
In addition to the program, any .dll files it uses are loaded into the process's memory space. Each .dll gets a chunk of memory whose address may or may not be the same every time it is loaded. Each .dll also has its own section for its variables. <br />
<br />
All variables in memory are stored in a certain byte order, which can be either little endian or big endian format. This is determined by the architecture: every Intel x86 processor uses little endian, while PowerPC traditionally uses big endian. Since this guide is about Intel x86, we won't worry about big endian. <br />
<br />
In little endian, the bytes are stored in reverse order. So for example:<br />
* 0x12345678 (4 bytes) is stored as 78 56 34 12<br />
* 0x00001234 (4 bytes) is stored as 34 12 00 00<br />
* 0xaabb (2 bytes) is stored as bb aa<br />
<br />
This will be confusing at first, but you'll get used to seeing numbers backwards.<br />
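<br />
To see the byte order spelled out, here is a small C sketch (the function name is an assumption). It writes the bytes in little endian order using shifts, so it produces the same layout no matter what machine it runs on.<br />

```c
#include <stdint.h>

/* Write the four bytes of `value` into `out` in little endian order,
   lowest byte first. */
void store_little_endian32(uint32_t value, uint8_t out[4])
{
    out[0] = (uint8_t)(value & 0xFF);
    out[1] = (uint8_t)((value >> 8) & 0xFF);
    out[2] = (uint8_t)((value >> 16) & 0xFF);
    out[3] = (uint8_t)((value >> 24) & 0xFF);
}
```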
<br />
== Pointers ==<br />
<br />
This is quite possibly the most difficult part of the C language to understand, and I won't pretend that I'm such an amazing teacher that you'll understand this completely after reading my blurb. You HAVE to know this to get by in assembly, so if it isn't perfectly clear after you read this, find a tutorial and make sure you understand! <br />
<br />
A pointer is, simply, a variable that stores a memory address as its value. The memory address it stores can be anything. The value of the referenced memory address can be obtained by "dereferencing" the pointer. Dereferencing means retrieving the value being pointed to, rather than the pointer itself. <br />
<br />
This is the C code to declare a variable that will point to an integer:<br />
int *i;<br />
And here is the C code to declare a pointer to a character:<br />
char *c;<br />
<br />
These are declared like any other variable, except for the asterisk. When declaring variables, the asterisk simply means it's a pointer, and nothing else. <br />
<br />
The address of any variable (pointer or otherwise) can be obtained with the "address of" operator, '&'. By putting '&' in front of a variable, its address is returned. This address can then be stored by a pointer. In other words, the pointer can point to some variable. Here's an example of using the "address of" operator:<br />
<br />
<pre><br />
int *i; /* Declare i as a pointer. */<br />
int somevar = 7; /* Declare a variable called somevar, and set it to 7. */<br />
<br />
i = &somevar; /* Set the value of i (which is the address) to the address of somevar. */<br />
</pre><br />
<br />
The final use of a pointer is "dereferencing", as discussed before. To dereference a pointer, an asterisk ("*") is put before it. This should not be confused with the asterisk used to declare a pointer, which is completely different. Here is an example of dereferencing, be sure you fully understand this (note that I'm using a function called 'print()' which doesn't actually exist to illustrate the point):<br />
<br />
<pre><br />
int *i; /* Declare i as a pointer. */<br />
int somevar = 7; /* Declare a variable called somevar, and set it to 7. */<br />
<br />
i = &somevar; /* Set the value of i (which is the address) to the address of somevar. */<br />
<br />
print(i); /* This will print out the ADDRESS stored in i, which is the address of somevar. */<br />
print(*i); /* This dereferences i, and prints out 7, because i points to the address of somevar and somevar is 7. */<br />
<br />
*i = 10; /* This sets the value that i points to (somevar) to 10. */<br />
print(somevar); /* This will print out 10, because its memory has just been changed via the pointer i. */ <br />
</pre><br />
<br />
Hopefully that all makes sense. To summarize:<br />
* An asterisk ("*") is used in declaration to show a variable is a pointer: int *i<br />
* An ampersand ("&") is used in front of any variable to retrieve its address: i = &somevar<br />
* An asterisk ("*") is used on a pointer to dereference it, to get the value it's pointing to: print(*i)<br />
<br />
Note that if a pointer without a valid address is dereferenced, the program will crash.<br />
<br />
Some arithmetic can be done on pointers, such as addition and subtraction. Doing addition/subtraction on pointers is different than on a normal variable because the size of the data type is taken into account. That is, instead of "ptr + 1" going to the next whole number, if ptr points to an integer it goes to the next possible integer in memory, which is 4 bytes away. If ptr points to a short (2 bytes), "ptr + 1" goes ahead 2 bytes, to the next short in memory. The reason for this is so that arrays can be easily stepped through, which will be shown below.<br />
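<br />
The step size can be measured directly. In this sketch (hypothetical function names, fixed-width types so the sizes are certain), each function reports how many bytes "ptr + 1" moves:<br />

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes between ptr and ptr + 1 for a pointer to a 4-byte integer. */
size_t int32_step(int32_t *ptr)
{
    return (size_t)((unsigned char *)(ptr + 1) - (unsigned char *)ptr);
}

/* Bytes between ptr and ptr + 1 for a pointer to a 2-byte integer. */
size_t int16_step(int16_t *ptr)
{
    return (size_t)((unsigned char *)(ptr + 1) - (unsigned char *)ptr);
}
```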
<br />
<br />
<br />
== Ascii ==<br />
<br />
Ascii is the way in which letters, numbers, and symbols are represented in memory. <br />
<br />
A single ascii character is a 1-byte value. Typically, ascii characters fall between 0x00 and 0x7F. The important ones are:<br />
* 0x00 or '\0' signals the end of a string (a sequence of ascii characters)<br />
* 0x0d and 0x0a are carriage return and linefeed, respectively. They are used to add a new-line within a string. <br />
* 0x20 is space, ' '<br />
* 0x30 - 0x39 are '0' - '9' (0x30 = '0', 0x31 = '1', 0x32 = '2', etc)<br />
* 0x41 - 0x5A are 'A' - 'Z'<br />
* 0x61 - 0x7A are 'a' - 'z'<br />
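<br />
Because the digits and letters sit in unbroken ranges, simple arithmetic converts between them. A sketch in C, with made-up function names, assuming the compiler uses ascii for its characters:<br />

```c
/* Convert an ascii digit character ('0' - '9') to its numeric value,
   or return -1 for any other character. */
int digit_value(char c)
{
    if (c >= 0x30 && c <= 0x39)
        return c - 0x30;
    return -1;
}

/* Convert an uppercase ascii letter to lowercase; other characters
   are returned unchanged. */
char to_lower_ascii(char c)
{
    if (c >= 0x41 && c <= 0x5A)
        return (char)(c + 0x20);   /* 0x61 - 0x41 is 0x20 */
    return c;
}
```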
<br />
<br />
<br />
== Arrays ==<br />
<br />
An array is a sequence of 1 or more values of the same type. In memory, all entries in an array are stored sequentially. <br />
<br />
For example, an array of these five integers {1, 2, 3, 0xaabb, 0xccdd} will be stored like this:<br />
01 00 00 00 | 02 00 00 00 | 03 00 00 00 | bb aa 00 00 | dd cc 00 00<br />
<br />
Note that the values stored are all expanded to the full size of integers, padded with zeros. Also note that the values are stored in little endian, so the order of the bytes is reversed.<br />
<br />
An array in C is declared like this:<br />
int arr[5] = { 1, 2, 3, 0xaabb, 0xccdd }; <br />
<br />
This creates an array of 5 integers, which reserves 20 bytes (5 integers * 4 bytes/integer) to store them. An array in C must have a static length, because the space is allocated before the program ever runs. <br />
<br />
When an array is created this way, the array variable ("arr") acts as a ''pointer'' to the first element when it is used in an expression. Then when an array is accessed, an ''addition'' and a ''dereference'' occur. <br />
<br />
This code:<br />
int arr[5] = { 1, 2, 3, 0xaabb, 0xccdd };<br />
print(arr[2]); <br />
Will display the third element in the array, the number 3. This code:<br />
int arr[5] = { 1, 2, 3, 0xaabb, 0xccdd };<br />
print( *(arr + 2) );<br />
Is identical. The address that is 2 past the first element will be dereferenced, and that address contains the third value. Recall that addition on a pointer increments based on the type, so "arr + 2" in this case goes ahead by 2 integers, or 8 bytes. <br />
<br />
Here is a way to loop through an array using pointers:<br />
<pre><br />
int arr[5] = { 1, 2, 3, 0xaabb, 0xccdd };<br />
int *ptr;<br />
int i;<br />
<br />
ptr = arr; // Point ptr at arr. Note that we don't use "address of" on arr, since arr is already a pointer and therefore already contains an address. <br />
for(i = 0; i < 5; i++)<br />
{<br />
print(*ptr); // Print the value of the element, starting at 0, ending with 4<br />
ptr++; // Go to the next element in the array<br />
}<br />
</pre><br />
<br />
== Strings ==<br />
After all that tough stuff, strings are actually pretty easy! <br />
<br />
A string is an array of ascii characters, ended with a null '\0' character. <br />
<br />
This:<br />
char str[] = "Hello";<br />
creates an array of 6 characters, and copies in the string, creating the array shown here: <br />
{ 'H', 'e', 'l', 'l', 'o', '\0' }<br />
Or:<br />
{ 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x00 }<br />
<br />
This similar construct:<br />
char *str = "Hello";<br />
creates a pointer to a static string, "Hello". This is very similar, but not exactly the same: a string created in this way can't be changed.<br />
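<br />
Since the null character marks the end, the length of a string is found by walking the array until it is reached. A minimal sketch (the function name is made up; the standard library's strlen does the same job):<br />

```c
#include <stddef.h>

/* Count the characters before the terminating '\0'. */
size_t string_length(const char *str)
{
    size_t length = 0;
    while (str[length] != '\0')
        length++;
    return length;
}
```

For "Hello" this returns 5, one less than the 6 bytes the array occupies, because the null character is stored but not counted.<br />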
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Mogigomahttps://wiki.skullsecurity.org/index.php?title=PDDB&diff=2996PDDB2010-06-29T21:16:51Z<p>Mogigoma: /* Dissemination */</p>
<hr />
<div>= Public DNS Database =<br />
<br />
The core idea of this project is the collection, aggregation, and dissemination of the data stored in the Internet's DNS hierarchy.<br />
<br />
== Dissemination ==<br />
<br />
Our goal is for the data we collect to be useful in the widest variety of ways. To this end, there will be three ways to access the data in the beginning of the project:<br />
<br />
# A user may access our website, and perform queries there. Users may query by specific IP addresses, ranges of IP addresses, and with wildcards against the text fields with which the IP addresses are connected. No login will be required to access this service.<br />
# Programs may query our web API, enabling them to use the data in new and exciting ways.<br />
# On a monthly basis, a highly-compressed torrent of the current ''snapshot'' view of the DNS hierarchy will be published. This torrent will not include historical data, to prevent it from growing too quickly. We may also provide a full, historical dump of our data at wider intervals. We expect the torrents will be useful to both researchers, and those who'd prefer to keep their queries as private as possible.<br />
<br />
Additional methods of accessing the data have been proposed, but are of a lower priority. Such methods include:<br />
<br />
# Custom reports that are available as Atom/RSS feeds, that let users keep up-to-date.<br />
# Custom reports that are sent out periodically through email, that let users keep up-to-date.<br />
<br />
== Collection ==<br />
<br />
== Verification ==<br />
<br />
We can never be absolutely sure of the veracity of the records in our database. The problem stems from many sources:<br />
<br />
* A distributed effort that anyone can assist in is vulnerable to malicious users, any number of whom can submit erroneous data in hopes of either skewing our results or compromising our system.<br />
* We cannot trust that all of the DNS responses we receive from the servers will be accurate.<br />
** Intermediate servers may have a cached record that is stale.<br />
** Servers may have fallen victim to [http://en.wikipedia.org/wiki/DNS_cache_poisoning cache poisoning].<br />
** Servers may give different responses based on where a query comes from (e.g. [http://en.wikipedia.org/wiki/Content_delivery_network CDNs], [http://en.wikipedia.org/wiki/Anycast anycast]).<br />
** Some ISPs [http://en.wikipedia.org/wiki/DNS_hijacking hijack] NXDOMAIN responses.<br />
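<br />
One way to make the uncertainty above visible is to record how often independent observations of the same name agree. The sketch below is an assumption about how that could be measured, not a description of the project's design; the function name and the returned fields are hypothetical. Low agreement does not by itself mean bad data, since CDNs and anycast legitimately return different answers.<br />

```python
from collections import Counter


def summarize(observations):
    """Summarize the answers seen for one DNS name, one answer per vantage point.

    Returns the most common answer, the fraction of observations that
    agree with it, and the number of distinct answers seen.
    """
    if not observations:
        raise ValueError("at least one observation is required")
    counts = Counter(observations)
    answer, votes = counts.most_common(1)[0]
    return {
        "answer": answer,
        "agreement": votes / len(observations),
        "distinct": len(counts),
    }
```

A record whose observations all match would report an agreement of 1.0; anything lower marks the record as one where stale caches, poisoning, location-dependent answers, or hijacking may be involved.<br />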
<br />
== Aggregation ==</div>Mogigomahttps://wiki.skullsecurity.org/index.php?title=PDDB&diff=2995PDDB2010-06-29T18:58:24Z<p>Mogigoma: /* Dissemination */</p>
<hr />
<div>= Public DNS Database =<br />
<br />
The core idea of this project is the collection, aggregation, and dissemination of the data stored in the Internet's DNS hierarchy.<br />
<br />
== Dissemination ==<br />
<br />
Our goal is for the data we collect to be useful in the widest variety of ways. To this end, there will be three ways to access the data at the beginning of the project:<br />
<br />
# A user may access our website and perform queries there. Users may query by specific IP address, by range of IP addresses, and with wildcards against the text fields associated with those IP addresses. No login will be required to access this service.<br />
# Programs may query our web API, enabling them to use the data in new and exciting ways.<br />
# On a monthly basis, a highly compressed torrent of the current ''snapshot'' view of the DNS hierarchy will be published. This torrent will not include historical data, to prevent it from growing too quickly. We expect the snapshot will be useful both to researchers and to those who'd prefer to keep their queries as private as possible.<br />
<br />
Additional methods of accessing the data have been proposed, but are of a lower priority. Such methods include:<br />
<br />
# Custom reports, available as Atom/RSS feeds, that let users keep up to date.<br />
# Custom reports, sent out periodically by email, that let users keep up to date.<br />
<br />
== Collection ==<br />
<br />
== Verification ==<br />
<br />
We can never be absolutely sure of the veracity of the records in our database. The problem stems from many sources:<br />
<br />
* A distributed effort that anyone can assist in is vulnerable to malicious users, any number of whom can submit erroneous data in hopes of either skewing our results or compromising our system.<br />
* We cannot trust that all of the DNS responses we receive from the servers will be accurate.<br />
** Intermediate servers may have a cached record that is stale.<br />
** Servers may have fallen victim to [http://en.wikipedia.org/wiki/DNS_cache_poisoning cache poisoning].<br />
** Servers may give different responses based on where a query comes from (e.g. [http://en.wikipedia.org/wiki/Content_delivery_network CDNs], [http://en.wikipedia.org/wiki/Anycast anycast]).<br />
** Some ISPs [http://en.wikipedia.org/wiki/DNS_hijacking hijack] NXDOMAIN responses.<br />
<br />
== Aggregation ==</div>Mogigomahttps://wiki.skullsecurity.org/index.php?title=PDDB&diff=2994PDDB2010-06-29T18:32:10Z<p>Mogigoma: /* Verification */</p>
<hr />
<div>= Public DNS Database =<br />
<br />
The core idea of this project is the collection, aggregation, and dissemination of the data stored in the Internet's DNS hierarchy.<br />
<br />
== Dissemination ==<br />
<br />
Our goal is for the data we collect to be useful in the widest variety of ways. To this end, there will be three ways to access the data at the beginning of the project:<br />
<br />
# A user may access our website and perform queries there. Users may query by specific IP address, by range of IP addresses, and by regular expressions against the text fields associated with those IP addresses. No login will be required to access this service.<br />
# Programs may query our web API, enabling them to use the data in new and exciting ways.<br />
# On a monthly basis, a highly compressed torrent of the current ''snapshot'' view of the DNS hierarchy will be published. This torrent will not include historical data, to prevent it from growing too quickly. We expect the snapshot will be useful both to researchers and to those who'd prefer to keep their queries as private as possible.<br />
<br />
Additional methods of accessing the data have been proposed, but are of a lower priority. Such methods include:<br />
<br />
# Custom reports, available as Atom/RSS feeds, that let users keep up to date.<br />
# Custom reports, sent out periodically by email, that let users keep up to date.<br />
<br />
== Collection ==<br />
<br />
== Verification ==<br />
<br />
We can never be absolutely sure of the veracity of the records in our database. The problem stems from many sources:<br />
<br />
* A distributed effort that anyone can assist in is vulnerable to malicious users, any number of whom can submit erroneous data in hopes of either skewing our results or compromising our system.<br />
* We cannot trust that all of the DNS responses we receive from the servers will be accurate.<br />
** Intermediate servers may have a cached record that is stale.<br />
** Servers may have fallen victim to [http://en.wikipedia.org/wiki/DNS_cache_poisoning cache poisoning].<br />
** Servers may give different responses based on where a query comes from (e.g. [http://en.wikipedia.org/wiki/Content_delivery_network CDNs], [http://en.wikipedia.org/wiki/Anycast anycast]).<br />
** Some ISPs [http://en.wikipedia.org/wiki/DNS_hijacking hijack] NXDOMAIN responses.<br />
<br />
== Aggregation ==</div>Mogigomahttps://wiki.skullsecurity.org/index.php?title=PDDB&diff=2993PDDB2010-06-29T18:30:58Z<p>Mogigoma: /* Verification */</p>
<hr />
<div>= Public DNS Database =<br />
<br />
The core idea of this project is the collection, aggregation, and dissemination of the data stored in the Internet's DNS hierarchy.<br />
<br />
== Dissemination ==<br />
<br />
Our goal is for the data we collect to be useful in the widest variety of ways. To this end, there will be three ways to access the data at the beginning of the project:<br />
<br />
# A user may access our website and perform queries there. Users may query by specific IP address, by range of IP addresses, and by regular expressions against the text fields associated with those IP addresses. No login will be required to access this service.<br />
# Programs may query our web API, enabling them to use the data in new and exciting ways.<br />
# On a monthly basis, a highly compressed torrent of the current ''snapshot'' view of the DNS hierarchy will be published. This torrent will not include historical data, to prevent it from growing too quickly. We expect the snapshot will be useful both to researchers and to those who'd prefer to keep their queries as private as possible.<br />
<br />
Additional methods of accessing the data have been proposed, but are of a lower priority. Such methods include:<br />
<br />
# Custom reports, available as Atom/RSS feeds, that let users keep up to date.<br />
# Custom reports, sent out periodically by email, that let users keep up to date.<br />
<br />
== Collection ==<br />
<br />
== Verification ==<br />
<br />
We can never be absolutely sure of the veracity of the records in our database. The problem stems from many sources:<br />
<br />
* A distributed effort that anyone can assist in is vulnerable to malicious users, any number of whom can submit erroneous data in hopes of either skewing our results or compromising our system.<br />
* We cannot trust that all of the DNS responses we receive from the servers will be accurate.<br />
** Intermediate servers may have a cached record that is stale.<br />
** Servers may have fallen victim to [http://en.wikipedia.org/wiki/DNS_cache_poisoning cache poisoning].<br />
** Servers may give different responses based on where a query comes from (e.g. [http://en.wikipedia.org/wiki/Content_delivery_network CDNs], [http://en.wikipedia.org/wiki/Anycast anycast]).<br />
** Some ISPs [http://en.wikipedia.org/wiki/DNS_hijacking hijack] NXDOMAIN responses.<br />
<br />
== Aggregation ==</div>Mogigomahttps://wiki.skullsecurity.org/index.php?title=PDDB&diff=2992PDDB2010-06-29T18:14:49Z<p>Mogigoma: /* Dissemination */</p>
<hr />
<div>= Public DNS Database =<br />
<br />
The core idea of this project is the collection, aggregation, and dissemination of the data stored in the Internet's DNS hierarchy.<br />
<br />
== Dissemination ==<br />
<br />
Our goal is for the data we collect to be useful in the widest variety of ways. To this end, there will be three ways to access the data at the beginning of the project:<br />
<br />
# A user may access our website and perform queries there. Users may query by specific IP address, by range of IP addresses, and by regular expressions against the text fields associated with those IP addresses. No login will be required to access this service.<br />
# Programs may query our web API, enabling them to use the data in new and exciting ways.<br />
# On a monthly basis, a highly compressed torrent of the current ''snapshot'' view of the DNS hierarchy will be published. This torrent will not include historical data, to prevent it from growing too quickly. We expect the snapshot will be useful both to researchers and to those who'd prefer to keep their queries as private as possible.<br />
<br />
Additional methods of accessing the data have been proposed, but are of a lower priority. Such methods include:<br />
<br />
# Custom reports, available as Atom/RSS feeds, that let users keep up to date.<br />
# Custom reports, sent out periodically by email, that let users keep up to date.<br />
<br />
== Collection ==<br />
<br />
== Verification ==<br />
<br />
== Aggregation ==</div>Mogigomahttps://wiki.skullsecurity.org/index.php?title=PDDB&diff=2991PDDB2010-06-29T18:09:50Z<p>Mogigoma: </p>
<hr />
<div>= Public DNS Database =<br />
<br />
The core idea of this project is the collection, aggregation, and dissemination of the data stored in the Internet's DNS hierarchy.<br />
<br />
== Dissemination ==<br />
<br />
Our goal is for the data we collect to be useful in the widest variety of ways. To this end, there will be three ways to access the data at the beginning of the project.<br />
<br />
# A user may access our website and perform queries there. Users may query by specific IP address, by range of IP addresses, and by regular expressions against the text fields associated with those IP addresses. No login will be required to access this service.<br />
# Programs may query our web API, enabling them to use the data in new and exciting ways.<br />
# On a monthly basis, a highly compressed torrent of the current ''snapshot'' view of the DNS hierarchy will be published. This torrent will not include historical data, to prevent it from growing too quickly. We expect the snapshot will be useful both to researchers and to those who'd prefer to keep their queries as private as possible.<br />
<br />
== Collection ==<br />
<br />
== Verification ==<br />
<br />
== Aggregation ==</div>Mogigomahttps://wiki.skullsecurity.org/index.php?title=PDDB&diff=2990PDDB2010-06-29T17:58:34Z<p>Mogigoma: New page: = Public DNS Database = The core idea of this project is the collection, aggregation, and dissemination of the data stored in the Internet's DNS hierarchy. == Dissemination == == Collec...</p>
<hr />
<div>= Public DNS Database =<br />
<br />
The core idea of this project is the collection, aggregation, and dissemination of the data stored in the Internet's DNS hierarchy.<br />
<br />
== Dissemination ==<br />
<br />
== Collection ==<br />
<br />
== Verification ==<br />
<br />
== Aggregation ==</div>Mogigoma