Reverse Engineering Guide on x86 Assembly: Part 1 Intro To Registers




Learning x86 assembly is critical when you’re analysing malware, deconstructing executable files and developing your own exploits. However, before you’re able to embark on this journey, it’s crucial you’re familiar with C and compilation.

What are registers?

A register is a small storage location inside the CPU that’s much faster to access than RAM. A 32-bit x86 CPU has 8 general-purpose registers. Each is 32 bits wide, and the lower 16 bits of each can also be accessed on their own. Although they’re all called ‘general purpose’ registers, several of them have conventional or reserved roles: ESP and EBP, for example, are used to manage the stack.

Introduction to the 8 Registers

Here are the 8 registers with their register names (the acronym) and their meaning:
  • EAX (extended accumulator register, used for major calculations)
  • EBX (extended base register, historically a base pointer for memory access and often used for storing data)
  • ECX (extended counter register, used as the counter for loops and repeated string instructions)
  • EDX (extended data register, used for storing data related to the accumulator’s current calculation)
  • ESI (extended source index, points to the source data in string and memory-copy operations)
  • EDI (extended destination index, points to the destination in string and memory-copy operations)
  • EBP (extended base pointer, points to the base of the current stack frame)
  • ESP (extended stack pointer, points to the top of the stack)
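
To make the roles of ECX, ESI and EDI more concrete, here is a rough Python sketch of what a repeated byte copy does with them. The function name rep_movsb and the bytearray standing in for memory are illustrative assumptions, not real CPU behaviour in full detail; it only models the forward-copy case.

```python
def rep_movsb(memory: bytearray, esi: int, edi: int, ecx: int):
    """Copy ECX bytes from memory[ESI] to memory[EDI], moving forwards."""
    while ecx != 0:
        memory[edi] = memory[esi]  # move one byte from source to destination
        esi += 1                   # ESI advances through the source
        edi += 1                   # EDI advances through the destination
        ecx -= 1                   # ECX counts the remaining iterations
    return esi, edi, ecx

# Toy memory: five bytes of data followed by five empty bytes.
memory = bytearray(b"HELLO" + bytes(5))
print(rep_movsb(memory, esi=0, edi=5, ecx=5))  # (5, 10, 0)
print(memory)                                   # bytearray(b'HELLOHELLO')
```

When the loop ends, ECX has reached zero and ESI and EDI both point just past the data they walked over.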

Register Access

Some of these registers can be accessed in 16-bit or 8-bit subsections rather than as a whole, depending on what the program needs. On 32-bit x86, EAX, EBX, ECX and EDX each expose a 16-bit lower half plus the two 8-bit halves of that, while ESI, EDI, EBP and ESP only expose their lower 16 bits (SI, DI, BP and SP). Taking EAX as an example, it can be broken down into these subsections:
  • AX (16 least significant bits of EAX)
  • AH (8 most significant bits of AX)
  • AL (8 least significant bits of AX)
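
The breakdown above can be sketched with bit masks and shifts. This is a minimal Python illustration, and the helper names sub_registers and write_al are made up for this example.

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into its AX, AH and AL views."""
    eax &= 0xFFFFFFFF      # keep only 32 bits
    ax = eax & 0xFFFF      # 16 least significant bits of EAX
    ah = (ax >> 8) & 0xFF  # 8 most significant bits of AX
    al = ax & 0xFF         # 8 least significant bits of AX
    return {"EAX": eax, "AX": ax, "AH": ah, "AL": al}

def write_al(eax: int, value: int) -> int:
    """Writing AL replaces only the low 8 bits; the rest of EAX is untouched."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)

parts = sub_registers(0x12345678)
print({name: hex(v) for name, v in parts.items()})
# {'EAX': '0x12345678', 'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
print(hex(write_al(0x12345678, 0xFF)))  # 0x123456ff
```

Note that AX, AH and AL are not separate storage: they are views onto the same bits as EAX, which is why writing to AL changes the value of EAX.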

Stack Frames: ESP and EBP Registers

A running program’s memory has four main regions, referred to here as .text, .data, .stack and .heap. The program’s code is stored in .text, global data is stored in .data, .stack holds local variables and function arguments and, finally, .heap holds dynamically allocated memory (malloc and calloc calls in C).

You should be familiar with how the stack works: LIFO (last-in-first-out). Let’s refresh this quickly. Whenever something is added to the stack, it’s placed on top of the most recent addition. Adding something is called a ‘push’. Think of a stack of paper on a table: when a sheet is removed, it’s always the last one that was placed down. Removing something is called a ‘pop’. On x86 the stack also grows backwards in memory, from the highest address towards the lowest.

Two registers, ESP and EBP, work closely with the stack. ESP points to the top of the stack, and every time the stack grows the address held in ESP is updated. Because the stack grows from high addresses to low, a push decrements ESP and a pop increments it. EBP points to the base of the current stack frame, the point where the current function’s portion of the stack begins.
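
Here is a toy Python model of how ESP moves as 32-bit values are pushed and popped. The class name Stack32 and the starting address 0x00100000 are arbitrary assumptions chosen for illustration; a real process gets its stack address from the operating system.

```python
class Stack32:
    """Toy model of a 32-bit stack; memory is a dict of address -> dword."""

    def __init__(self, top=0x00100000):  # starting address is an assumption
        self.esp = top
        self.memory = {}

    def push(self, value):
        self.esp -= 4                    # stack grows towards lower addresses
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory[self.esp]    # read the dword ESP points to
        self.esp += 4                    # stack shrinks towards higher addresses
        return value                     # the old bytes are left in memory

stack = Stack32()
stack.push(0xAAAA)
stack.push(0xBBBB)
print(hex(stack.esp))    # 0xffff8
print(hex(stack.pop()))  # 0xbbbb
print(hex(stack.pop()))  # 0xaaaa
```

Two pushes lower ESP by 8 bytes, and the values come back in the reverse of the order they went in.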

FLAGS Register

The FLAGS register holds individual bits that represent the current status of the processor and the outcome of recent instructions. Each bit is either set (1) or clear (0). The FLAGS register is 16 bits wide, EFLAGS is 32 bits wide and RFLAGS is 64 bits wide. Here are some of the more common flags:
  • ZF – zero flag, set when the result of the last operation is zero
  • CF – carry flag, set when the last arithmetic operation carries out of (or borrows into) the most significant bit, i.e. an unsigned overflow
  • SF – sign flag, a copy of the most significant bit of the result, so it’s set when the result is negative if read as a signed value
  • OF – overflow flag, set when the last arithmetic operation overflows as a signed value, i.e. the result has the wrong sign
  • PF – parity flag, set when the least significant byte of the result contains an even number of set bits
  • DF – direction flag, controls whether string instructions move forwards (0) or backwards (1) through memory as bytes are copied
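
As a rough sketch of how these flags come out of a calculation, the Python below adds two 8-bit values and derives ZF, CF, SF, OF and PF from the result. The function name add8 is made up for this example, and it models only a simple addition, not every instruction that touches the flags.

```python
def add8(a: int, b: int):
    """Add two 8-bit values and return (result, flags) as an 8-bit add would."""
    a &= 0xFF
    b &= 0xFF
    full = a + b           # the untruncated sum
    result = full & 0xFF   # what actually fits in 8 bits
    flags = {
        "ZF": int(result == 0),                      # result is zero
        "CF": int(full > 0xFF),                      # carry out of the top bit
        "SF": (result >> 7) & 1,                     # copy of the top bit
        # Signed overflow: both inputs share a sign that the result lacks.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
        "PF": int(bin(result).count("1") % 2 == 0),  # even number of set bits
    }
    return result, flags

print(add8(0xFF, 0x01))
# (0, {'ZF': 1, 'CF': 1, 'SF': 0, 'OF': 0, 'PF': 1})
print(add8(0x7F, 0x01))
# (128, {'ZF': 0, 'CF': 0, 'SF': 1, 'OF': 1, 'PF': 0})
```

The two calls show the difference between CF and OF: 0xFF + 1 wraps around as an unsigned value (carry), while 0x7F + 1 flips from the largest positive signed byte to a negative one (overflow).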

AT&T vs Intel syntax

Depending on the program you’re using (e.g. radare2, IDA Pro or gdb), the disassembly will appear in one of two syntaxes. They are:
  • Intel: mov eax, 1
  • AT&T: mov $1, %eax
Both of these say the same thing: put the value 1 into EAX. Intel syntax puts the destination first, while AT&T syntax puts the source first and prefixes constants with $ and registers with %. We will only be focusing on Intel syntax in this guide.

Refresher: Bits, Bytes & Dwords

Bits, bytes, words and dwords are all units of data. A bit is either 0 or 1. There are 8 bits in a byte, which can hold a value between 0 and 255. A word consists of two bytes, or 16 bits, and can hold a value up to and including 65535. A dword (double word) is made up of two words, i.e. four bytes or 32 bits.
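
These sizes are easy to check with a few lines of Python. The helper names max_unsigned and split_dword are illustrative only.

```python
def max_unsigned(bits: int) -> int:
    """Largest unsigned value that fits in the given number of bits."""
    return (1 << bits) - 1

for name, bits in [("byte", 8), ("word", 16), ("dword", 32)]:
    print(name, bits, max_unsigned(bits))
# byte 8 255
# word 16 65535
# dword 32 4294967295

def split_dword(value: int):
    """Break a dword into its two words and its four bytes, high part first."""
    value &= 0xFFFFFFFF
    words = (value >> 16, value & 0xFFFF)
    byte_list = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return words, byte_list

words, byte_list = split_dword(0xDEADBEEF)
print([hex(w) for w in words])      # ['0xdead', '0xbeef']
print([hex(b) for b in byte_list])  # ['0xde', '0xad', '0xbe', '0xef']
```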

In part 2, we will start to analyse the different instructions and disassemble a program together. For any questions or enquiries, leave a comment below.
