Simple Instructions

From SkullSecurity
Revision as of 21:31, 12 March 2007 by Ron (talk | contribs) (→‎call, ret)
Assembly Language Tutorial

This section will go over some basic assembly instructions that you'll likely see frequently. Some of the instructions shown here are tricky, and some have special properties (such as the registers they use). Additionally, x86 assembly comprises hundreds of different instructions. As a result, you'll likely want to find a complete reference book or website to have alongside you. This page, however, will give enough of an introduction to get you started.

Pointers and Dereferencing

First, we'll start with the hard stuff. If you understood the pointers section, this shouldn't be too bad. If you didn't, you should probably go back and refresh your memory.

Recall that a pointer is a datatype that stores an address as its value. Since registers are simply 32-bit values with no actual types, any register may or may not be a pointer, depending on what is stored. It is the responsibility of the program to treat pointers as pointers and to treat non-pointers as non-pointers.

If a value is a pointer, it can be dereferenced. Recall that dereferencing a pointer retrieves the value stored at the address being pointed to. In assembly, this is generally done by putting square brackets ("[" and "]") around the register. For example:

  • eax -- is the value stored in eax
  • [eax] -- is the value pointed to by eax

This will be discussed more in upcoming sections.

Moving Data Around

The instructions in this section deal with moving around numbers and pointers.

mov, movsx, movzx

mov is the instruction used for assignment, much like the "=" sign in most languages. mov can move data between a register and memory, between two registers, or from a constant into a register or memory. Here are some examples:

mov eax, 1     ; set eax to 1 (eax = 1)
mov edx, ecx   ; set edx to whatever ecx is (edx = ecx)
mov eax, 18h   ; set eax to 0x18
mov eax, [ebx] ; set eax to the value in memory that ebx is pointing at
mov dword ptr [ebx], 3 ; move the number 3 into the 4 bytes of memory that ebx is pointing at (the size has to be stated, since neither operand is a register)

movsx and movzx are special versions of mov that copy a smaller value into a bigger register, treating the value as either signed (movsx) or unsigned (movzx).

movsx means move with sign extension. The data is moved from a smaller register into a bigger register, and the sign is preserved by either padding with 0's (for positive values) or F's (for negative values). Here are some examples:

  • 0x1000 becomes 0x00001000, since it was positive
  • 0x7FFF becomes 0x00007FFF, since it was positive
  • 0xFFFF becomes 0xFFFFFFFF, since it was negative (note that 0xFFFF is -1 in 16-bit signed, and 0xFFFFFFFF is -1 in 32-bit signed)
  • 0x8000 becomes 0xFFFF8000, since it was negative (note that 0x8000 is -32768 in 16-bit signed, and 0xFFFF8000 is -32768 in 32-bit signed)

movzx means move with zero extension. The data is moved from a smaller register into a bigger register, and the sign is ignored: the upper bits are always padded with 0's. Here are some examples:

  • 0x1000 becomes 0x00001000
  • 0x7FFF becomes 0x00007FFF
  • 0xFFFF becomes 0x0000FFFF
  • 0x8000 becomes 0x00008000

lea

lea is very similar to mov, except that math can be done on the original value before it's used. The "[" and "]" characters always surround the second parameter, but in this case they don't indicate dereferencing. It is easiest to think of them as just being part of the formula.

lea is generally used for calculating array offsets, since the address of an element of the array can be found with [arraystart + offset*datasize]. lea can also be used for quickly doing math, often with an addition and a multiplication. Examples of both uses are below.

Here are some examples of using lea:

lea     eax, [eax+eax]   ; Double the value of eax -- eax = eax * 2
lea     edi, [esi+0Bh]   ; Add 11 to esi and store the result in edi
lea     eax, [esi+ecx*4] ; This is generally used for indexing an array of integers. esi is a pointer to the beginning of an array, and ecx is the index of the element that is to be retrieved. The index is multiplied by 4 because Integers are 4 bytes long. eax will end up storing the address of the ecx'th element of the array. 
lea     edi, [eax+eax*2] ; Triple the value of eax and store the result in edi -- edi = eax * 3
lea     edi, [eax+ebx*2] ; This likely indicates that eax points to an array of 16-bit (2 byte) values, and that ebx is an index into it. Note the similarities between this and the previous example: the same kind of math is being done, but for a different reason.

Math and Logic

The instructions in this section deal with math and logic. Some are simple, and others (like multiplication and division) are pretty tricky.

add, sub

A register can have either another register, a constant value, or a pointer added to or subtracted from it. The syntax of addition and subtraction is fairly simple:

add eax, 3   ; Adds 3 to eax -- eax = eax + 3
add ebx, eax ; Adds the value of eax to ebx -- ebx = ebx + eax
sub ecx, 3   ; Subtracts 3 from ecx -- ecx = ecx - 3

inc, dec

These instructions simply increment and decrement a register.

inc eax   ; eax++
dec ecx   ; ecx--

and, or, xor, not, neg

All logical instructions are bitwise. If you don't know what "bitwise arithmetic" means, you should probably look it up. But the simplest way of thinking of this is that each pair of corresponding bits in the two operands has the operation done between them, and the result is stored in the first operand.

The instructions are pretty self-explanatory: and does a bitwise 'and', or does a bitwise 'or', xor does a bitwise 'xor', and not does a bitwise negation. neg looks similar but is arithmetic: it replaces the value with its two's-complement negative (eax = -eax).

Here are some examples:

and eax, 7         ; eax = eax & 7          -- because 7 is 000..000111, this clears all bits except for the last three.
or  eax, 16        ; eax = eax | 16         -- because 16 is 000..00010000, this sets the 5th bit from the right to "1".
xor eax, 1         ; eax = eax ^ 1          -- this toggles the right-most bit in eax, 0=>1 or 1=>0.
xor eax, FFFFFFFFh ; eax = eax ^ 0xFFFFFFFF -- this toggles every bit in eax, which is identical to a bitwise negation.
not eax            ; eax = ~eax             -- inverts every bit in eax, same as the previous.
neg eax            ; eax = -eax             -- arithmetic negation, equal to inverting every bit and then adding 1.
xor eax, eax       ; eax = 0                -- this clears eax quickly, and is extremely common.

mul, imul, div, idiv, cdq

Multiplication and division are the trickiest operations commonly used, because of how they deal with overflow issues. Both multiplication and division make use of the register pair edx:eax, which is treated as a single 64-bit value with edx holding the upper 32 bits and eax the lower 32.

mul multiplies the unsigned value in eax with the operand, and stores the result in the 64-bit pair edx:eax. imul does the same thing, but the values are treated as signed. Here are some examples of mul and imul:

mul ecx  ; edx:eax = eax * ecx (unsigned)
imul edx ; edx:eax = eax * edx (signed)

div divides the 64-bit value in edx:eax by the operand, and stores the quotient in eax. The remainder (modulus) is stored in edx. In other words, div does both division and modular division, at the same time. Typically, a program will only use one or the other, so you have to check which instructions follow to see whether eax or edx is saved. Here are some examples:

div ecx  ; eax = edx:eax / ecx (unsigned)
         ; edx = edx:eax % ecx (unsigned)
idiv ebx ; eax = edx:eax / ebx (signed)
         ; edx = edx:eax % ebx (signed)

cdq is generally used immediately before idiv. It stands for "convert double to quad" (doubleword to quadword). In other words, it sign-extends the 32-bit value in eax to the 64-bit value in edx:eax, overwriting anything in edx with either 0's (if eax is non-negative) or F's (if eax is negative). This is very similar to movsx, above.

xor edx, edx is generally used immediately before div. It clears edx to ensure that no leftover data is divided.

Here is a common use of cdq and idiv:

mov eax, 1007 ; 1007 will be divided
mov ecx, 10   ; .. by 10
cdq           ; extends eax into edx
idiv ecx      ; eax will be 1007/10 = 100, and edx will be 1007%10 = 7

Here is a common use of xor and div (the results are the same as the previous example):

mov eax, 1007
mov ecx, 10
xor edx, edx
div ecx

Jumping Around

Instructions in this section are used to compare values and make jumps. These jumps are used for calls, if statements, and every type of loop. The operand for most jump instructions is the address to jump to.

jmp

jmp, or jump, sends the program execution to the specified address no matter what. Here is an example:

jmp 1400h ; jump to the address 0x1400

call, ret

call is similar to jump, except that in addition to sending the program to the specified address, it also saves ("pushes") the address of the next instruction it would have executed on the stack. This will be explained more in a later section.

ret removes ("pops") the top value off the stack, and jumps to it. In almost all cases, this value was placed on the stack by the call instruction. If the stack pointer is at the wrong location, or the saved address was overwritten, ret attempts to jump to an invalid address, which usually crashes the program. In some cases, it may jump to the wrong place, where the program will almost inevitably crash eventually.

ret can also have a parameter. This parameter is a number of bytes that is added to the stack pointer (esp) right after the return address has been popped. This addition allows the function to remove values that were pushed onto the stack before the call. This will be discussed in a later section.

The combination of call and ret is used to implement functions. Here is an example of a simple function:

 call 4000h
 ...... ; any amount of code
 4000h:
  mov eax, 1
  ret         ; Because eax represents the return value, this function would return 1, and nothing else would happen

cmp

cmp, or compare, compares the two operands and sets a number of flags in a special-purpose register based on the result. Specialized jump commands can check these flags to jump on certain conditions.

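Here are the most common conditions to think about after a comparison:

  • Zero -- set if and only if the two elements are equal
  • Greater than -- the first element is greater than the second
  • Less than -- the first element is less than the second

Only 'zero' is a literal flag on x86. "Greater than" and "less than" are not single flags: the processor derives them from the sign, overflow, and carry flags. It is still convenient to think of them as flags when reading code.

Flags are set by most arithmetic commands. The most common commands used for comparisons are cmp, inc, and dec.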

jz/je, jnz/jne, jl, jg, jle, jge

  • jz and je (which are synonyms) will jump to the address specified if and only if the 'zero' flag is set, which indicates that the two values were equal. In other words, "jump if equal".
  • jnz and jne (which are also synonyms) will jump to the address specified if and only if the 'zero' flag is not set, which indicates that the two values were not equal. In other words, "jump if different".
  • jl jumps if the first operand of the comparison is less than the second, treating both as signed.
  • jg jumps if the first operand is greater than the second, treating both as signed.
  • jle jumps if the 'less than' condition holds or the 'zero' flag is set, so "less than or equal to" (signed).
  • jge jumps if the first is "greater than or equal to" the second (signed).

Manipulating the Stack

push, pop

ret
