Difference between revisions of "Fundamentals"

From SkullSecurity

Revision as of 00:12, 12 March 2007

This page is going to be about the fundamentals that you have to understand before you can make any sense out of assembly. Most of this stuff you'll learn if you learn to program in C. If this is old or boring stuff to you, feel free to skip this section entirely.

The topics here are going to be a short overview of each section. If you want a more complete explanation, you should find an actual reference, or look it up on the Internet. This is only meant to be a quick and dirty primer.

Hexadecimal

To work in assembly, you have to be able to read hexadecimal fairly comfortably. Converting to decimal in your mind isn't necessary, but being able to do some simple arithmetic is.

Hex can be denoted in a number of ways, but the two most common are:

  • Prefixed with 0x, e.g. 0x1ef7
  • Suffixed with h, e.g. 1ef7h

The characters 0 - f represent the decimal numbers 0 - 15:

  • 0 = 0
  • 1 = 1
  • ...
  • 9 = 9
  • a = 10
  • b = 11
  • c = 12
  • d = 13
  • e = 14
  • f = 15

To convert from hex to decimal, multiply each digit, starting with the right-most, by 16^0, 16^1, 16^2, etc., and add up the results. So in the example of 0x1ef7, the conversion is this:

  • (7 * 16^0) + (f * 16^1) + (e * 16^2) + (1 * 16^3)
  • = (7 * 16^0) + (15 * 16^1) + (14 * 16^2) + (1 * 16^3)
  • = (7 * 1) + (15 * 16) + (14 * 256) + (1 * 4096)
  • = 7 + 240 + 3584 + 4096
  • = 7927
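
The conversion above can be sketched in a few lines of Python. This is only an illustration of the positional method; the function name hex_to_decimal is made up for this example, and in practice the built-in int(text, 16) does the same job.

```python
def hex_to_decimal(text):
    """Convert a hex string such as '0x1ef7' or '1ef7h' to an integer by hand."""
    digits = "0123456789abcdef"
    text = text.lower()
    # Strip the two common notations: a 0x prefix or an h suffix.
    if text.startswith("0x"):
        text = text[2:]
    elif text.endswith("h"):
        text = text[:-1]
    total = 0
    # Walk the digits right to left, multiplying each by 16^0, 16^1, 16^2, ...
    for position, char in enumerate(reversed(text)):
        total += digits.index(char) * 16 ** position
    return total


print(hex_to_decimal("0x1ef7"))  # 7927, matching the worked example
```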

It isn't necessary to do that constantly; that's why we have calculators. But you should be fairly familiar with the numbers 00 - FF (0 - 255): they come up often, and otherwise you will spend a lot of time looking them up.

Binary

Binary, as we all know, is a number system using only 0s and 1s. Conversion works the same way as for hex, but with powers of 2 instead of powers of 16.

1011 to decimal:

  • (1 * 2^0) + (1 * 2^1) + (0 * 2^2) + (1 * 2^3)
  • = (1 * 1) + (1 * 2) + (0 * 4) + (1 * 8)
  • = 1 + 2 + 0 + 8
  • = 11

Conversion between decimal and binary is rare; it's much more common to convert between hexadecimal and binary. This conversion is common because it's so easy: every group of 4 binary digits converts to a single hex digit. So all you really need to know are the first 16 binary to hex conversions:

  • 0x0 = 0000
  • 0x1 = 0001
  • 0x2 = 0010
  • 0x3 = 0011
  • 0x4 = 0100
  • 0x5 = 0101
  • 0x6 = 0110
  • 0x7 = 0111
  • 0x8 = 1000
  • 0x9 = 1001
  • 0xa = 1010
  • 0xb = 1011
  • 0xc = 1100
  • 0xd = 1101
  • 0xe = 1110
  • 0xf = 1111

So take the binary number 100101101001110, for example.

  1. Pad the front with zeros to make its length a multiple of 4: 0100101101001110
  2. Break it into 4-digit groups: 0100 1011 0100 1110
  3. Look up each group of 4 digits in the table: 0x4 0xb 0x4 0xe
  4. Put them all together: 0x4b4e

To go the other way is even easier, using 0x469e for example:

  1. Separate the digits: 0x4 0x6 0x9 0xe
  2. Convert each of them to binary, by the table: 0100 0110 1001 1110
  3. Put them together, and leading zeros on the first group can be removed: 100011010011110
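
Both procedures can be sketched in Python as follows. The names HEX_TO_BITS, BITS_TO_HEX, binary_to_hex and hex_to_binary are invented for this illustration; the lookup table is simply the 16-entry list above built programmatically.

```python
# Lookup table from the list above: hex digit -> 4 binary digits.
HEX_TO_BITS = {format(n, "x"): format(n, "04b") for n in range(16)}
BITS_TO_HEX = {bits: digit for digit, bits in HEX_TO_BITS.items()}


def binary_to_hex(bits):
    # 1. Pad the front with zeros to make the length a multiple of 4.
    padded_length = -(-len(bits) // 4) * 4
    bits = bits.zfill(padded_length)
    # 2. Break into 4-digit groups, 3. look each one up, 4. put them together.
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "0x" + "".join(BITS_TO_HEX[group] for group in groups)


def hex_to_binary(text):
    text = text.lower()
    if text.startswith("0x"):
        text = text[2:]
    # 1. Separate the digits, 2. convert each one using the table.
    bits = "".join(HEX_TO_BITS[digit] for digit in text)
    # 3. Leading zeros on the first group can be removed.
    return bits.lstrip("0") or "0"


print(binary_to_hex("100101101001110"))  # 0x4b4e
print(hex_to_binary("0x469e"))           # 100011010011110
```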

Datatypes

A datatype basically refers to how the hex digits in memory are partitioned and interpreted as numbers. Datatypes are typically described by two factors: the number of bits (or bytes), and whether or not negative numbers are allowed.

The number of bits (or bytes) refers to the length of the number. An 8-bit (or 1-byte) number is made up of two hexadecimal digits. For example, 0x03, 0x34, and 0xFF are all 8-bit, while 0x1234, 0x0001, and 0xFFFF are 16-bit.

The signed or unsigned property refers to whether or not the number can have negative values. If it can, then the maximum value is about half of what it would otherwise be, with the other half of the range used for negatives. The sign is determined by the very first (most significant) bit. If the first bit is a 1, that is, the first hex digit is 8 - F, then the number is negative, and the rest of the number, inverted plus one, gives the magnitude. This scheme is known as two's complement.

For example (use a calculator to convert to binary):

  • 0x10 in binary is 0001 0000, so it's positive 16
  • 0xFF in binary is 1111 1111, so it's negative. The rest of the number is the remaining 7 bits, 1111111; inverted that is 0000000, and adding one gives 0000001, so the value is -1 in decimal.
  • 0x80 in binary is 1000 0000, so it's negative. The rest of the number is the remaining 7 bits, 0000000; inverted that is 1111111, and adding one gives 10000000 (128), so the value is -128 in decimal.
  • 0x7F in binary is 0111 1111, so it's positive 127.
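
The sign rule described above can be written out as a short Python sketch. The function name to_signed and its bits parameter are made up for this example; it follows the "rest of the number, inverted, plus one" recipe step by step rather than using a library shortcut.

```python
def to_signed(value, bits=8):
    """Interpret an unsigned value of the given width as a signed number."""
    sign_bit = 1 << (bits - 1)           # 0x80 for an 8-bit value
    if (value & sign_bit) == 0:
        return value                     # first bit is 0: positive as-is
    rest = value & (sign_bit - 1)        # the remaining bits (7 of them for 8-bit)
    inverted = rest ^ (sign_bit - 1)     # flip each of those bits
    return -(inverted + 1)               # plus one gives the magnitude


for example in (0x10, 0xFF, 0x80, 0x7F):
    print(hex(example), "=", to_signed(example))
```

Passing bits=16 or bits=32 applies the same rule to wider values, e.g. to_signed(0xFFFF, 16) is also -1.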

Different data lengths go by different names; here are some common ones by their usual names:

  • 8-bit (1 byte) = char (or BYTE)
    • In hex, can be 0x00 to 0xFF
    • Signed: ranges from -128 to 127
    • Unsigned: ranges from 0 to 255


  • 16-bit (2 bytes) = short int (often referred to as a WORD)
    • In hex, can be 0x0000 to 0xFFFF
    • Signed: ranges from -32768 to 32767
    • Unsigned: ranges from 0 to 65535


  • 32-bit (4 bytes) = int or long int on 32-bit x86 (often referred to as a DWORD or double-WORD)
    • In hex, can be 0x00000000 to 0xFFFFFFFF
    • Signed: ranges from -2147483648 to 2147483647
    • Unsigned: ranges from 0 to 4294967295


  • 64-bit (8 bytes) = long long (often referred to as a QWORD or quad-WORD)
    • In hex, can be 0x0000000000000000 to 0xFFFFFFFFFFFFFFFF
    • Signed: ranges from -9223372036854775808 to 9223372036854775807
    • Unsigned: ranges from 0 to 18446744073709551615
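
The ranges listed above all follow from the number of bits, which a short Python sketch can confirm. The function name integer_ranges is invented for this illustration.

```python
def integer_ranges(bits):
    """Return (signed_min, signed_max, unsigned_max) for a given bit width."""
    unsigned_max = 2 ** bits - 1         # all bits set to 1
    signed_min = -(2 ** (bits - 1))      # only the sign bit set
    signed_max = 2 ** (bits - 1) - 1     # every bit except the sign bit set
    return signed_min, signed_max, unsigned_max


for bits, name in [(8, "BYTE"), (16, "WORD"), (32, "DWORD"), (64, "QWORD")]:
    signed_min, signed_max, unsigned_max = integer_ranges(bits)
    print(f"{name}: signed {signed_min} to {signed_max}, unsigned 0 to {unsigned_max}")
```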

Memory

Each running program has its own space of memory that isn't shared with any other process. Within this memory can be found everything the program needs to be able to run, including the program's code, variables, loaded .dlls, and the program stack.

When a program runs, the code from the .exe file is loaded into memory, and the instructions are executed from this memory image. This will become important, since we can modify the image loaded in memory without touching the .exe on the physical disk.

In addition to the program, any .dll files are loaded into the process's memory space. Each .dll gets a chunk of memory whose address may or may not be the same every time it's loaded. Each .dll also has its own section for its variables.

All variables in memory are stored in a certain byte order, which can be either little endian or big endian. The byte order is fixed by the architecture: every Intel x86 processor uses little endian, while PowerPC traditionally uses big endian. Since this guide is about Intel x86, we won't worry about big endian.

In little endian, the bytes are stored in reverse order, least significant byte first. So for example:

  • 0x12345678 (4 bytes) is stored as 78 56 34 12
  • 0x00001234 (4 bytes) is stored as 34 12 00 00
  • 0xaabb (2 bytes) is stored as bb aa

This will be confusing at first, but you'll get used to seeing numbers backwards.
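
To see the byte order for yourself, here is a minimal Python sketch. The helper name memory_bytes is made up for this example; it relies on the standard int.to_bytes and int.from_bytes methods to lay a number out the way an x86 processor would store it.

```python
def memory_bytes(value, size):
    """Show a value as the hex bytes it would occupy in little endian memory."""
    raw = value.to_bytes(size, byteorder="little")
    return " ".join(f"{byte:02x}" for byte in raw)


print(memory_bytes(0x12345678, 4))  # 78 56 34 12
print(memory_bytes(0xaabb, 2))      # bb aa

# Reading the bytes back in the same order recovers the original number.
stored = bytes([0x78, 0x56, 0x34, 0x12])
print(hex(int.from_bytes(stored, byteorder="little")))  # 0x12345678
```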

Pointers

Ascii

Arrays

Strings

Questions

Feel free to edit this section and post questions; I'll do my best to answer them. You may need to contact me to let me know that a question exists.