2.1. X86lite Specification
2.1.1. X86lite Machine State
The X86lite machine state consists of sixteen general-purpose 64-bit registers, an instruction pointer register that can only be manipulated indirectly by control flow instructions, three condition flags, and a memory consisting of 2^64 bytes. Instructions are represented as 8-byte words using an unspecified fixed-length encoding.
Register file
The 16 64-bit registers in X86lite and their common uses in the full X86 architecture are given below. In X86lite, most of the registers can be used for general purpose calculation, but some X86lite instructions make special use of some of the registers; see the instruction descriptions below.
Register | Description / common use on X86 |
---|---|
rax | General purpose accumulator |
rbx | Base register, pointer to data |
rcx | Counter register for strings and loops |
rdx | Data register for I/O |
rsi | Pointer register, string source register |
rdi | Pointer register, string destination register |
rbp | Base pointer, points to the stack frame |
rsp | Stack pointer, points to the value at the top of the stack |
r08 - r15 | General purpose |
In addition, the instruction pointer register rip contains the address of the next instruction to execute. The address in rip is used to load the next instruction, then rip is increased by the size of the instruction (always 8 bytes, since we use a fixed-length encoding), and then the instruction is executed.
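The fetch, increment, execute order described above can be sketched as follows. This is a minimal illustration, not the X86lite simulator: the `step` function, the dictionary-based `state` and `program`, and the two toy instructions (`inc_rax`, `jmp`) are hypothetical names introduced only for this example.

```python
MASK64 = (1 << 64) - 1   # register values wrap modulo 2^64
INS_SIZE = 8             # every X86lite instruction occupies 8 bytes

def step(state, program):
    """Run one instruction: fetch at rip, advance rip, then execute."""
    op, arg = program[state["rip"]]                      # 1. fetch
    state["rip"] = (state["rip"] + INS_SIZE) & MASK64    # 2. advance rip
    if op == "jmp":                                      # 3. execute
        state["rip"] = arg   # control flow overwrites the advanced rip
    elif op == "inc_rax":
        state["rax"] = (state["rax"] + 1) & MASK64
    else:
        raise ValueError(f"unknown toy instruction: {op}")

# Hypothetical two-instruction program: increment rax, then jump back.
program = {0x400000: ("inc_rax", None), 0x400008: ("jmp", 0x400000)}
state = {"rip": 0x400000, "rax": 0}

step(state, program)
rip_after_first = state["rip"]   # rip already points at the next instruction
step(state, program)             # the jump replaces rip with its target
```

Because rip is advanced before the instruction runs, an instruction that reads rip during execution observes the address of the following instruction, not its own.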
Condition flags
The X86 architecture provides conditional branch and conditional move instructions. The processor maintains a set of single-bit flags that record conditions arising from arithmetic and comparison operations. The arithmetic instructions set these condition flags, and the conditional jump and move instructions test them. X86lite provides only three condition flags (the full X86 architecture has several more).
Condition Flag | Description |
---|---|
OF | Overflow: set when the result is too big or too small to fit in a 64-bit value and cleared otherwise. This is overflow/underflow for signed (two's complement) 64-bit arithmetic. |
SF | Sign: equal to the most significant bit of the result (0=positive, 1=negative) |
ZF | Zero: set if the result is 0 and cleared otherwise |
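As an illustration of the three flag definitions, the sketch below computes them for a 64-bit addition. It assumes registers are modelled as unsigned Python integers in the range 0 to 2^64 - 1; the helper names `to_signed` and `add_with_flags` are hypothetical and not part of X86lite.

```python
MASK64 = (1 << 64) - 1
SIGN_BIT = 1 << 63

def to_signed(x):
    """Interpret a 64-bit pattern as a two's complement signed integer."""
    x &= MASK64
    return x - (1 << 64) if x & SIGN_BIT else x

def add_with_flags(a, b):
    """Return (64-bit result, flags) for a + b under the table's definitions."""
    exact = to_signed(a) + to_signed(b)      # mathematically exact sum
    result = exact & MASK64                  # what actually fits in 64 bits
    flags = {
        "OF": not (-(1 << 63) <= exact < (1 << 63)),  # exact sum does not fit
        "SF": bool(result & SIGN_BIT),                # most significant bit
        "ZF": result == 0,                            # result is zero
    }
    return result, flags

# Largest positive value plus one overflows into the negative range.
res_overflow, flags_overflow = add_with_flags(0x7FFFFFFFFFFFFFFF, 1)
# -1 + 1 is zero with no signed overflow.
res_zero, flags_zero = add_with_flags(MASK64, 1)
```

Note that OF is defined on the signed interpretation only: adding 1 to the all-ones pattern wraps as an unsigned number, but as signed arithmetic it is simply -1 + 1 = 0, so OF stays clear.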
Memory, addresses, and the stack
The X86lite memory consists of 2^64 bytes numbered 0x0000000000000000 through 0xffffffffffffffff. All of the X86lite instructions operate on 8-byte quadwords (i.e., blocks of 64 bits or 4 * 16-bit words), but memory is byte-addressable, so a quadword may start at any byte address. That is, unaligned memory accesses are legal.
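To make byte-addressable quadword access concrete, here is a small sketch that stores and loads a quadword at an address that is not a multiple of 8. It assumes little-endian byte order (as on real X86 hardware; this section does not state it) and models memory as a sparse dictionary in which unwritten bytes read as 0, which is a simplification chosen for the example. The function names are hypothetical.

```python
def write_quad(mem, addr, value):
    """Store a 64-bit value as 8 consecutive bytes, least significant first."""
    for i in range(8):
        mem[addr + i] = (value >> (8 * i)) & 0xFF

def read_quad(mem, addr):
    """Load 8 consecutive bytes starting at addr as one 64-bit value."""
    return sum(mem.get(addr + i, 0) << (8 * i) for i in range(8))

mem = {}
write_quad(mem, 0x1003, 0x1122334455667788)   # 0x1003 is not 8-byte aligned
same = read_quad(mem, 0x1003)                 # reads back the stored value
shifted = read_quad(mem, 0x1004)              # overlapping read one byte later
```

The overlapping read at 0x1004 picks up the upper seven bytes of the stored value plus one unwritten byte, which shows that quadword accesses are not tied to any alignment grid.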
The only general-purpose register that is treated specially by the X86lite ISA is rsp, which contains the address of the top of the stack of the executing program. By convention on X86 machines, the program stack starts at the high addresses of virtual memory and grows toward the low addresses. Instructions like pushq, popq, callq, and retq increment and decrement rsp as needed to maintain this invariant.
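The downward-growing stack can be sketched as below. The sketch follows the usual X86-64 behavior, where a push decrements rsp by 8 before storing and a pop loads before incrementing rsp by 8; the Python functions `pushq` and `popq`, the `state` dictionary, and the quadword-granular memory model are assumptions made for the example.

```python
MASK64 = (1 << 64) - 1

def pushq(state, value):
    """Grow the stack toward lower addresses, then store at the new top."""
    state["rsp"] = (state["rsp"] - 8) & MASK64
    state["mem"][state["rsp"]] = value & MASK64

def popq(state):
    """Load the value at the top of the stack, then shrink the stack."""
    value = state["mem"][state["rsp"]]
    state["rsp"] = (state["rsp"] + 8) & MASK64
    return value

# Memory is simplified to a dict from addresses to whole quadwords.
state = {"rsp": 0x10000, "mem": {}}
pushq(state, 1)
pushq(state, 2)
rsp_after_pushes = state["rsp"]   # two quadwords below the starting address
first = popq(state)               # last value pushed comes off first
second = popq(state)
```

After a balanced sequence of pushes and pops, rsp returns to its starting value, which is the invariant these instructions maintain.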
X86lite Operands and Condition Codes
This section describes the operands and condition codes used by the X86lite instruction set.
Operands
X86lite instructions manipulate data stored in memory or in registers. The values operated on by a given instruction are described by operands, which are constant values like integers and statically known memory addresses, or dynamic values such as the contents of a register or a computed memory address.
Operands can take one of several forms, described below:
Operand kind | Description |
---|---|
Imm : imm | An immediate: either a 64-bit constant literal or a symbolic label that is resolved by the assembler/linker/loader to a 64-bit constant. Label values typically denote targets of Jmp or Call instructions. |
Reg : reg | One of the sixteen machine registers, or rip. The value of a register is its contents. |
Ind1 : imm | An indirect address consisting only of a displacement by a literal or symbolic label immediate value. An example use is leaq _Label, %rax, which loads the address denoted by the symbolic label into register rax. |
Ind2 : reg | An indirect reference to an address held in a register. For example, movq %rbx, (%rax) moves the contents of register rbx into the memory location at the address held in rax. |
Ind3 : imm * reg | An indirect reference to an offset of an address held in a register. For example, movq %rbx, 8(%rax) moves the contents of register rbx into the memory location 8 bytes past the address held in rax. |
In its full generality, an X86 indirect reference operand consists of three optional components:
[base : reg] [index : reg, scale : int64] [disp : (int64 | Label)]
The effective address denoted by an indirect address is calculated by:
addr(Ind) = base + (index * scale) + disp.
In the formula above, a missing optional component's value is 0. For the purposes of X86lite, we disregard the index and scale parts, which yields the three combinations given by Ind1, Ind2, and Ind3 described above.
When an Ind operand is used as a value (not a location) the operand denotes Mem[addr(Ind)], the contents of the machine memory at the effective address denoted by Ind.
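The operand forms and the effective-address formula can be sketched as follows. Operands are encoded here as tuples such as `("Ind3", 8, "rax")`, memory is a dictionary from addresses to quadwords, and labels are assumed to be already resolved to integers; all of these representations, and the function names `addr` and `value`, are illustrative assumptions, not the X86lite implementation.

```python
MASK64 = (1 << 64) - 1

def addr(op, regs):
    """Effective address base + disp; a missing component counts as 0."""
    kind = op[0]
    if kind == "Ind1":                       # displacement only
        return op[1] & MASK64
    if kind == "Ind2":                       # base register only
        return regs[op[1]] & MASK64
    if kind == "Ind3":                       # displacement and base register
        return (regs[op[2]] + op[1]) & MASK64
    raise ValueError(f"{kind} does not denote an address")

def value(op, regs, mem):
    """Value denoted by an operand when it is used as a source."""
    kind = op[0]
    if kind == "Imm":
        return op[1] & MASK64
    if kind == "Reg":
        return regs[op[1]]
    return mem.get(addr(op, regs), 0)        # Mem[addr(Ind)]

regs = {"rax": 0x1000}
mem = {0x1000: 7, 0x1008: 42}

v_ind2 = value(("Ind2", "rax"), regs, mem)       # contents at (%rax)
v_ind3 = value(("Ind3", 8, "rax"), regs, mem)    # contents at 8(%rax)
a_neg = addr(("Ind3", -8, "rax"), regs)          # -8(%rax)
v_imm = value(("Imm", 5), regs, mem)
```

The split between `addr` and `value` mirrors the distinction in the text: an instruction like leaq uses only the address, while a source operand of movq uses the memory contents at that address.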
Condition codes
The X86lite cmpq SRC1, SRC2 instruction is used to compare two 64-bit operands (SRC1 and SRC2). It works by subtracting SRC1 from SRC2 (i.e., SRC2 - SRC1) and setting the condition flags according to the result; the result of the subtraction itself is discarded.
The X86lite conditional branch (J) and conditional set (setb) instructions specify condition codes that look at the condition flags to determine whether or not the condition is satisfied. The six condition codes and their interpretation in terms of condition flags are given in the following table:
Condition code | Description |
---|---|
eq | Equals: This condition holds when ZF is set. (Intuitively SRC1 = SRC2 when SRC2 - SRC1 = 0.) |
neq | Not equals: This condition holds when ZF is not set. |
lt | (Signed) less than: This condition holds when SF does not equal OF. Equivalently, this condition holds when ((SF = 1 and OF = 0) or (SF = 0 and OF = 1)). The first case holds when the result of SRC2 - SRC1 is negative and there has been no overflow; the second case holds when the result of SRC2 - SRC1 is positive and there has been an overflow. |
le | (Signed) less than or equal: This condition holds when (SF is not equal to OF) or ZF is set. This is equivalent to (lt or eq). |
gt | (Signed) greater than: This condition holds when (not le) holds. |
ge | (Signed) greater than or equal: This condition holds when (not lt) holds. Or, equivalently, if SF equals OF. |
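The table can be checked against a small model of cmpq. The sketch below computes the flags for SRC2 - SRC1 and then evaluates each condition code from those flags; it assumes 64-bit values are represented as unsigned Python integers, and the names `to_signed`, `cmpq_flags`, and `holds` are hypothetical helpers introduced for the example.

```python
MASK64 = (1 << 64) - 1
SIGN_BIT = 1 << 63

def to_signed(x):
    """Interpret a 64-bit pattern as a two's complement signed integer."""
    x &= MASK64
    return x - (1 << 64) if x & SIGN_BIT else x

def cmpq_flags(src1, src2):
    """Flags produced by cmpq SRC1, SRC2, i.e. by computing SRC2 - SRC1."""
    exact = to_signed(src2) - to_signed(src1)
    result = exact & MASK64
    return {
        "OF": not (-(1 << 63) <= exact < (1 << 63)),
        "SF": bool(result & SIGN_BIT),
        "ZF": result == 0,
    }

def holds(cc, flags):
    """Evaluate a condition code using only the three condition flags."""
    eq = flags["ZF"]
    lt = flags["SF"] != flags["OF"]
    table = {
        "eq": eq, "neq": not eq,
        "lt": lt, "le": lt or eq,
        "gt": not (lt or eq), "ge": not lt,
    }
    return table[cc]

plain = cmpq_flags(5, 3)                       # 3 - 5 is negative, no overflow
wrapped = cmpq_flags(1, 0x8000000000000000)    # minimum value minus 1 overflows
equal = cmpq_flags(3, 3)
```

In the `wrapped` case the stored result looks positive (SF = 0), yet OF = 1 signals that the true difference was negative, so lt still holds. This is why lt is defined as SF not equal to OF rather than by SF alone.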