X86 instruction listings

The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable program, often stored as a computer file and executed on the processor.

The x86 instruction set has been extended several times, introducing wider registers and datatypes as well as new functionality.^[1]

x86 integer instructions

Below is the full 8086/8088 instruction set of Intel (81 instructions total).^[2] These instructions are also available in 32-bit mode, in which they operate on 32-bit registers (eax, ebx, etc.) and values instead of their 16-bit (ax, bx, etc.) counterparts. The updated instruction set is grouped according to architecture (i186, i286, i386, i486, i586/i686) and is referred to as (32-bit) x86 and (64-bit) x86-64 (also known as AMD64).

Original 8086/8088 instructions

This is the original instruction set. In the 'Notes' column, r means register, m means memory address and imm means immediate (i.e. a value).

Original 8086/8088 instruction set
Instruction	Meaning	Notes	Opcode
AAA	ASCII adjust AL after addition	used with unpacked binary-coded decimal	0x37
AAD	ASCII adjust AX before division	The 8086/8088 datasheet documents only the base-10 version of the AAD instruction (opcode 0xD5 0x0A), but any other base will work. Later Intel documentation describes the generic form as well. NEC V20 and V30 (and possibly other NEC V-series CPUs) always use base 10 and ignore the argument, causing a number of incompatibilities	0xD5
AAM	ASCII adjust AX after multiplication	Only base 10 version (Operand is 0xA) is documented, see notes for AAD	0xD4
AAS	ASCII adjust AL after subtraction		0x3F
ADC	Add with carry	(1) `r += (r/m/imm+CF);` (2) `m += (r/imm+CF);`	0x10...0x15, 0x80...0x81/2, 0x83/2
ADD	Add	(1) `r += r/m/imm;` (2) `m += r/imm;`	0x00...0x05, 0x80/0...0x81/0, 0x83/0
AND	Logical AND	(1) `r &= r/m/imm;` (2) `m &= r/imm;`	0x20...0x25, 0x80...0x81/4, 0x83/4
CALL	Call procedure	`push eip; eip points to the instruction directly after the call`	0x9A, 0xE8, 0xFF/2, 0xFF/3
CBW	Convert byte to word	`AX = AL ; sign extended`	0x98
CLC	Clear carry flag	`CF = 0;`	0xF8
CLD	Clear direction flag	`DF = 0;`	0xFC
CLI	Clear interrupt flag	`IF = 0;`	0xFA
CMC	Complement carry flag	`CF = !CF;`	0xF5
CMP	Compare operands	(1) `r - r/m/imm;` (2) `m - r/imm;`	0x38...0x3D, 0x80...0x81/7, 0x83/7
CMPSB	Compare bytes in memory. May be used with a REPE or REPNE prefix to test and repeat the instruction CX times.	if (DF==0) (byte)SI++ - (byte)ES:DI++; else (byte)SI-- - (byte)ES:DI--;	0xA6
CMPSW	Compare words. May be used with a REPE or REPNE prefix to test and repeat the instruction CX times.	if (DF==0) (word)SI++ - (word)ES:DI++; else (word)SI-- - (word)ES:DI--;	0xA7
CWD	Convert word to doubleword		0x99
DAA	Decimal adjust AL after addition	(used with packed binary-coded decimal)	0x27
DAS	Decimal adjust AL after subtraction		0x2F
DEC	Decrement by 1		0x48...0x4F, 0xFE/1, 0xFF/1
DIV	Unsigned divide	(1) `AX = DX:AX / r/m;` resulting `DX = remainder` (2) `AL = AX / r/m;` resulting `AH = remainder`	0xF7/6, 0xF6/6
ESC	Used with floating-point unit		0xD8..0xDF
HLT	Enter halt state		0xF4
IDIV	Signed divide	(1) `AX = DX:AX / r/m;` resulting `DX = remainder` (2) `AL = AX / r/m;` resulting `AH = remainder`	0xF7/7, 0xF6/7
IMUL	Signed multiply in One-operand form	(1) `DX:AX = AX * r/m;` (2) `AX = AL * r/m`	0xF7/5, 0xF6/5
IN	Input from port	(1) `AL = port[imm];` (2) `AL = port[DX];` (3) `AX = port[imm];` (4) `AX = port[DX];`	0xE4, 0xE5, 0xEC, 0xED
INC	Increment by 1		0x40...0x47, 0xFE/0, 0xFF/0
INT	Call to interrupt		0xCC, 0xCD
INTO	Call to interrupt if overflow		0xCE
IRET	Return from interrupt		0xCF
Jcc	Jump if condition	JA, JAE, JB, JBE, JC (same as JB), JE, JG, JGE, JL, JLE, JNA (same as JBE), JNAE (same as JB), JNB (same as JAE), JNBE, JNC (same as JAE), JNE, JNG (same as JLE), JNGE (same as JL), JNL (same as JGE), JNLE (same as JG), JNO, JNP, JNS, JNZ (same as JNE), JO, JP, JPE (same as JP), JPO (same as JNP), JS, JZ (same as JE)^[3]	0x70...0x7F
JCXZ	Jump if CX is zero	JECXZ for ECX instead of CX in 32-bit mode (same opcode).	0xE3
JMP	Jump		0xE9...0xEB, 0xFF/4, 0xFF/5
LAHF	Load FLAGS into AH register		0x9F
LDS	Load DS:r with far pointer	`r = m; DS = 2 + m;`	0xC5
LEA	Load Effective Address		0x8D
LES	Load ES:r with far pointer	`r = m; ES = 2 + m;`	0xC4
LOCK	Assert BUS LOCK# signal	(for multiprocessing)	0xF0
LODSB	Load string byte. May be used with a REP prefix to repeat the instruction CX times.	`if (DF==0) AL = SI++; else AL = SI--;`	0xAC
LODSW	Load string word. May be used with a REP prefix to repeat the instruction CX times.	`if (DF==0) AX = SI++; else AX = SI--;`	0xAD
LOOP/ LOOPx	Loop control	(LOOPE, LOOPNE, LOOPNZ, LOOPZ) `if (x && --CX) goto lbl;`	0xE0...0xE2
MOV	Move	(1) `r = r/m/imm;` (2) `m = r/imm;` (3) `r/m = sreg;` (4) `sreg = r/m;`	0xA0...0xA3, 0x8C, 0x8E
MOVSB	Move byte from string to string. May be used with a REP prefix to repeat the instruction CX times.	if (DF==0) (byte)ES:DI++ = (byte)SI++; else (byte)ES:DI-- = (byte)SI--;	0xA4
MOVSW	Move word from string to string. May be used with a REP prefix to repeat the instruction CX times.	if (DF==0) (word)ES:DI++ = (word)SI++; else (word)ES:DI-- = (word)SI--;	0xA5
MUL	Unsigned multiply	(1) `DX:AX = AX * r/m;` (2) `AX = AL * r/m;`	0xF7/4, 0xF6/4
NEG	Two's complement negation	`r/m = 0 - r/m;`	0xF6/3...0xF7/3
NOP	No operation	opcode equivalent to `XCHG EAX, EAX`	0x90
NOT	Negate the operand, logical NOT	`r/m ^= -1;`	0xF6/2...0xF7/2
OR	Logical OR	(1) `r |= r/m/imm;` (2) `m |= r/imm;`	0x08...0x0D, 0x80...0x81/1, 0x83/1
OUT	Output to port	(1) `port[imm] = AL;` (2) `port[DX] = AL;` (3) `port[imm] = AX;` (4) `port[DX] = AX;`	0xE6, 0xE7, 0xEE, 0xEF
POP	Pop data from stack	`r/m/sreg = *SP++;`	0x07, 0x17, 0x1F, 0x58...0x5F, 0x8F/0
POPF	Pop FLAGS register from stack	`FLAGS = *SP++;`	0x9D
PUSH	Push data onto stack	`*--SP = r/m/sreg;`	0x06, 0x0E, 0x16, 0x1E, 0x50...0x57, 0xFF/6
PUSHF	Push FLAGS onto stack	`*--SP = FLAGS;`	0x9C
RCL	Rotate left (with carry)		0xC0...0xC1/2 (186+), 0xD0...0xD3/2
RCR	Rotate right (with carry)		0xC0...0xC1/3 (186+), 0xD0...0xD3/3
REPxx	Repeat MOVS/STOS/CMPS/LODS/SCAS	(REP, REPE, REPNE, REPNZ, REPZ)	0xF2, 0xF3
RET	Return from procedure	Not a real instruction. The assembler will translate these to a RETN or a RETF depending on the memory model of the target system.
RETN	Return from near procedure		0xC2, 0xC3
RETF	Return from far procedure		0xCA, 0xCB
ROL	Rotate left		0xC0...0xC1/0 (186+), 0xD0...0xD3/0
ROR	Rotate right		0xC0...0xC1/1 (186+), 0xD0...0xD3/1
SAHF	Store AH into FLAGS		0x9E
SAL	Shift arithmetically left (signed shift left)	(1) `r/m <<= 1;` (2) `r/m <<= CL;`	0xC0...0xC1/4 (186+), 0xD0...0xD3/4
SAR	Shift arithmetically right (signed shift right)	(1) `(signed) r/m >>= 1;` (2) `(signed) r/m >>= CL;`	0xC0...0xC1/7 (186+), 0xD0...0xD3/7
SBB	Subtraction with borrow	(1) `r -= (r/m/imm+CF);` (2) `m -= (r/imm+CF);` alternative 1-byte encoding of `SBB AL, AL` is available via undocumented SALC instruction	0x18...0x1D, 0x80...0x81/3, 0x83/3
SCASB	Compare byte string. May be used with a REPE or REPNE prefix to test and repeat the instruction CX times.	`if (DF==0) AL - ES:DI++; else AL - ES:DI--;`	0xAE
SCASW	Compare word string. May be used with a REPE or REPNE prefix to test and repeat the instruction CX times.	`if (DF==0) AX - ES:DI++; else AX - ES:DI--;`	0xAF
SHL	Shift left (unsigned shift left)	Same opcode as SAL, since logical left shifts are equal to arithmetical left shifts.	0xC0...0xC1/4 (186+), 0xD0...0xD3/4
SHR	Shift right (unsigned shift right)		0xC0...0xC1/5 (186+), 0xD0...0xD3/5
STC	Set carry flag	`CF = 1;`	0xF9
STD	Set direction flag	`DF = 1;`	0xFD
STI	Set interrupt flag	`IF = 1;`	0xFB
STOSB	Store byte in string. May be used with a REP prefix to repeat the instruction CX times.	`if (DF==0) ES:DI++ = AL; else ES:DI-- = AL;`	0xAA
STOSW	Store word in string. May be used with a REP prefix to repeat the instruction CX times.	`if (DF==0) ES:DI++ = AX; else ES:DI-- = AX;`	0xAB
SUB	Subtraction	(1) `r -= r/m/imm;` (2) `m -= r/imm;`	0x28...0x2D, 0x80...0x81/5, 0x83/5
TEST	Logical compare (AND)	(1) `r & r/m/imm;` (2) `m & r/imm;`	0x84, 0x85, 0xA8, 0xA9, 0xF6/0, 0xF7/0
WAIT	Wait until not busy	Waits until BUSY# pin is inactive (used with floating-point unit)	0x9B
XCHG	Exchange data	`r :=: r/m;` A spinlock typically uses xchg as an atomic operation. (coma bug).	0x86, 0x87, 0x91...0x97
XLAT	Table look-up translation	behaves like `MOV AL, [BX+AL]`	0xD7
XOR	Exclusive OR	(1) `r ^= r/m/imm;` (2) `m ^= r/imm;`	0x30...0x35, 0x80...0x81/6, 0x83/6
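
The notes for AAD and AAM state that the immediate byte acts as a number base. A minimal Python sketch of that behaviour follows, assuming the commonly documented semantics (AAM: AH = AL div base, AL = AL mod base; AAD: AL = (AL + AH × base) mod 256, AH = 0). Flags are not modelled, and the function names are illustrative only.

```python
def aam(al, base=10):
    """Model AAM imm8: split AL into a quotient (AH) and a remainder (AL)."""
    if base == 0:
        # On real hardware an immediate of 0 raises a divide error (#DE).
        raise ZeroDivisionError("AAM with a zero immediate")
    return al // base, al % base          # (AH, AL)


def aad(ah, al, base=10):
    """Model AAD imm8: fold the digits in AH:AL back into a binary value in AL."""
    return 0, (al + ah * base) & 0xFF     # (AH, AL)


print(aam(0x2F))       # (4, 7): 47 split into unpacked BCD digits
print(aad(4, 7))       # (0, 47)
print(aam(0xAB, 16))   # (10, 11): base 16 splits AL into its two nibbles
```

On the NEC V20/V30 behaviour described in the AAD note, the `base` argument would be ignored and treated as 10.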

Added in specific processors

Added with 80186/80188

Instruction	Opcode	Meaning	Notes
BOUND	62 /r	Check array index against bounds	raises software interrupt 5 if test fails
ENTER	C8 iw ib	Enter stack frame	Modifies stack for entry to procedure for high level language. Takes two operands: the amount of storage to be allocated on the stack and the nesting level of the procedure.
INSB/INSW	6C	Input from port to string. May be used with a REP prefix to repeat the instruction CX times.	equivalent to: IN AL, DX MOV ES:[DI], AL INC DI ; adjust DI according to operand size and DF
INSB/INSW	6D
LEAVE	C9	Leave stack frame	Releases the local stack storage created by the previous ENTER instruction.
OUTSB/OUTSW	6E	Output string to port. May be used with a REP prefix to repeat the instruction CX times.	equivalent to: MOV AL, DS:[SI] OUT DX, AL INC SI ; adjust SI according to operand size and DF
OUTSB/OUTSW	6F
POPA	61	Pop all general purpose registers from stack	equivalent to: POP DI POP SI POP BP POP AX ; no POP SP here, all it does is ADD SP, 2 (since AX will be overwritten later) POP BX POP DX POP CX POP AX
PUSHA	60	Push all general purpose registers onto stack	equivalent to: PUSH AX PUSH CX PUSH DX PUSH BX PUSH SP ; The value stored is the initial SP value PUSH BP PUSH SI PUSH DI
PUSH immediate	6A ib	Push an immediate byte/word value onto the stack	example: PUSH 12h PUSH 1200h
PUSH immediate	68 iw	Push an immediate byte/word value onto the stack	example: PUSH 12h PUSH 1200h
IMUL immediate	6B /r ib	Signed and unsigned multiplication of immediate byte/word value	example: IMUL BX,12h IMUL DX,1200h IMUL CX, DX, 12h IMUL BX, SI, 1200h IMUL DI, word ptr [BX+SI], 12h IMUL SI, word ptr [BP-4], 1200h Note that since the lower half is the same for unsigned and signed multiplication, this version of the instruction can be used for unsigned multiplication as well.
IMUL immediate	69 /r iw
SHL/SHR/SAL/SAR/ROL/ROR/RCL/RCR immediate	C0	Rotate/shift bits with an immediate value greater than 1	example: ROL AX,3 SHR BL,3
SHL/SHR/SAL/SAR/ROL/ROR/RCL/RCR immediate	C1	Rotate/shift bits with an immediate value greater than 1	example: ROL AX,3 SHR BL,3
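
The IMUL immediate entry notes that the lower half of a product is the same for signed and unsigned multiplication. The following Python sketch checks this for 16-bit operands; the helper names are hypothetical and flags (which do differ between signed and unsigned interpretations) are not modelled.

```python
def to_signed16(x):
    """Interpret a 16-bit pattern as a two's-complement signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def imul16_low(a, b):
    """Low 16 bits of the signed product, as kept by the truncating IMUL forms."""
    return (to_signed16(a) * to_signed16(b)) & 0xFFFF


def mul16_low(a, b):
    """Low 16 bits of the unsigned product of the same two bit patterns."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) & 0xFFFF


# The truncated results agree even when an operand is "negative".
for a, b in [(0xFFFE, 0x0012), (0x8000, 0x1200), (0x1234, 0xFFFF)]:
    assert imul16_low(a, b) == mul16_low(a, b)
```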

Added with 80286

The new instructions added in 80286 add support for x86 protected mode. Some but not all of the instructions are available in real mode as well.

Instruction	Opcode	Instruction description	Real mode	Ring

`LGDT m16&32`^[a]	`0F 01 /2`	Load GDTR (Global Descriptor Table Register) from memory.^[b]	Yes	0
`LIDT m16&32`^[a]	`0F 01 /3`	Load IDTR (Interrupt Descriptor Table Register) from memory.^[b] The IDTR controls not just the address/size of the IDT (Interrupt Descriptor Table) in protected mode, but the IVT (Interrupt Vector Table) in real mode as well.
`LMSW r/m16`	`0F 01 /6`	Load MSW (Machine Status Word) from 16-bit register or memory.^[c]^[d]
`CLTS`	`0F 06`	Clear task-switched flag in the MSW.
`LLDT r/m16`	`0F 00 /2`	Load LDTR (Local Descriptor Table Register) from 16-bit register or memory.^[b]	#UD
`LTR r/m16`	`0F 00 /3`	Load TR (Task Register) from 16-bit register or memory.^[b] The TSS (Task State Segment) specified by the 16-bit argument is marked busy, but a task switch is not done.	#UD

`SGDT m16&32`^[a]	`0F 01 /0`	Store GDTR to memory.	Yes	Usually 3^[e]
`SIDT m16&32`^[a]	`0F 01 /1`	Store IDTR to memory.
`SMSW r/m16`	`0F 01 /4`	Store MSW to register or 16-bit memory.^[f]
`SLDT r/m16`	`0F 00 /0`	Store LDTR to register or 16-bit memory.^[f]	#UD
`STR r/m16`	`0F 00 /1`	Store TR to register or 16-bit memory.^[f]	#UD

`ARPL r/m16,r16`	`63 /r`^[g]	Adjust RPL (Requested Privilege Level) field of selector. The operation performed is: if (dst & 3) < (src & 3) then dst = (dst & 0xFFFC) | (src & 3); eflags.zf = 1 else eflags.zf = 0	#UD^[h]	3
`LAR r,r/m16`	`0F 02 /r`	Load access rights byte from the specified segment descriptor. Reads bytes 4-7 of segment descriptor, bitwise-ANDs it with `0x00FxFF00`,^[i] then stores the bottom 16/32 bits of the result in destination register. Sets EFLAGS.ZF=1 if the descriptor could be loaded, ZF=0 otherwise.^[j]	#UD
`LSL r,r/m16`	`0F 03 /r`	Load segment limit from the specified segment descriptor. Sets ZF=1 if the descriptor could be loaded, ZF=0 otherwise.^[j]
`VERR r/m16`	`0F 00 /4`	Verify a segment for reading. Sets ZF=1 if segment can be read, ZF=0 otherwise.
`VERW r/m16`	`0F 00 /5`	Verify a segment for writing. Sets ZF=1 if segment can be written, ZF=0 otherwise.^[k]

LOADALL^[l]	0F 05	Load all CPU registers from a 102-byte data structure starting at physical address `800h`, including "hidden" part of segment descriptor registers.	Yes	0
STOREALL^[l]	F1 0F 04	Store all CPU registers to a 102-byte data structure starting at physical address `800h`, then shut down CPU.	Yes	0
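
The ARPL operation given in the table can be written as a small Python function. This is a sketch of the pseudocode from the table only; the function name and the returned tuple are illustrative choices, not part of any real API.

```python
def arpl(dst, src):
    """Return (new_dst, zf) for ARPL dst, src operating on 16-bit selectors."""
    if (dst & 3) < (src & 3):
        # Raise the destination's RPL to the source's RPL and set ZF.
        return (dst & 0xFFFC) | (src & 3), 1
    # Destination RPL is already at least as high: leave it alone, clear ZF.
    return dst, 0


print(arpl(0x0010, 0x0023))   # (19, 1): RPL raised from 0 to 3, i.e. 0x0013
print(arpl(0x0013, 0x0020))   # (19, 0): selector unchanged
```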

^ ^a ^b ^c ^d The descriptors used by the LGDT, LIDT, SGDT and SIDT instructions consist of a 2-part data structure. The first part is a 16-bit value, specifying table size in bytes minus 1. The second part is a 32-bit value (64-bit value in 64-bit mode), specifying the linear start address of the table.
For LGDT and LIDT with a 16-bit operand size, the address is ANDed with 00FFFFFFh. On Intel (but not AMD) CPUs, the SGDT and SIDT instructions with a 16-bit operand size are documented (as of Intel SDM revision 079, March 2023) to write a descriptor to memory with the last byte being set to 0. However, observed behavior is that bits 31:24 of the descriptor table address are written instead.^[4]
^ ^a ^b ^c ^d The LGDT, LIDT, LLDT and LTR instructions are serializing on Pentium and later processors.
^ The LMSW instruction is serializing on Intel processors from Pentium onwards, but not on AMD processors.
^ On 80386 and later, the "Machine Status Word" is the same as the CR0 control register – however, the LMSW instruction can only modify the bottom 4 bits of this register and cannot clear bit 0. The inability to clear bit 0 means that LMSW can be used to enter but not leave x86 Protected Mode.
On 80286, it is not possible to leave Protected Mode at all (neither with LMSW nor with LOADALL^[5]) without a CPU reset – on 80386 and later, it is possible to leave Protected Mode, but this requires the use of the 80386-and-later MOV to CR0 instruction.
^ If CR4.UMIP=1 is set, then the SGDT, SIDT, SLDT, SMSW and STR instructions can only run in Ring 0.
These instructions were unprivileged on all x86 CPUs from 80286 onwards until the introduction of UMIP in 2017.^[6] This has been a significant security problem for software-based virtualization, since it enables these instructions to be used by a VM guest to detect that it is running inside a VM.^[7]^[8]
^ ^a ^b ^c
The SMSW, SLDT and STR instructions always use an operand size of 16 bits when used with a memory argument. With a register argument on 80386 or later processors, wider destination operand sizes are available and behave as follows:
- SMSW: Stores full CR0 in x86-64 long mode, undefined otherwise.
- SLDT: Zero-extends 16-bit argument on Pentium Pro and later processors, undefined on earlier processors.
- STR: Zero-extends 16-bit argument.
^ In 64-bit long mode, the ARPL instruction is not available – the 63 /r opcode has been reassigned to the 64-bit-mode-only MOVSXD instruction.
^ The ARPL instruction causes #UD in Real mode and Virtual 8086 Mode – Windows 95 and OS/2 2.x are known to make extensive use of this #UD to use the 63 opcode as a one-byte breakpoint to transition from Virtual 8086 Mode to kernel mode.^[9]^[10]
^ Bits 19:16 of this mask are documented as "undefined" on Intel CPUs.^[11] On AMD CPUs, the mask is documented as 0x00FFFF00.
^ ^a ^b For the LAR and LSL instructions, if the specified segment descriptor could not be loaded, then the instruction's destination register is left unmodified.
^ On some Intel CPU/microcode combinations from 2019 onwards, the VERW instruction also flushes microarchitectural data buffers. This enables it to be used as part of workarounds for Microarchitectural Data Sampling security vulnerabilities.^[12]^[13] Some of the microarchitectural buffer-flushing functions that have been added to VERW may require the instruction to be executed with a memory operand.^[14]
^ ^a ^b Undocumented, 80286 only.^[5]^[15]^[16] (A different variant of LOADALL with a different opcode and memory layout exists on 80386.)
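
The two-part memory operand used by LGDT, LIDT, SGDT and SIDT (a 16-bit "size minus 1" value followed by a 32-bit linear base address) can be illustrated with Python's `struct` module. This sketch covers only the 6-byte form used outside 64-bit mode; the function names are hypothetical.

```python
import struct


def pack_table_descriptor(base, size_in_bytes):
    """Build the 6-byte m16&32 operand: 16-bit limit (size - 1), then 32-bit base."""
    return struct.pack("<HI", size_in_bytes - 1, base)


def unpack_table_descriptor(blob, operand_size=32):
    """Decode the operand as LGDT/LIDT would; returns (base, size_in_bytes)."""
    limit, base = struct.unpack("<HI", blob)
    if operand_size == 16:
        base &= 0x00FFFFFF        # 16-bit operand size keeps only 24 address bits
    return base, limit + 1


blob = pack_table_descriptor(0x12345678, 0x800)
print(blob.hex())                          # ff0778563412
print(unpack_table_descriptor(blob, 16))   # (3430008, 2048), i.e. base 0x345678
```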

Added with 80386

The 80386 added support for 32-bit operation to the x86 instruction set. This was done by widening the general-purpose registers to 32 bits and introducing the concepts of OperandSize and AddressSize – most instruction forms that would previously take 16-bit data arguments were given the ability to take 32-bit arguments by setting their OperandSize to 32 bits, and instructions that could take 16-bit address arguments were given the ability to take 32-bit address arguments by setting their AddressSize to 32 bits. (Instruction forms that work on 8-bit data continue to be 8-bit regardless of OperandSize. Using a data size of 16 bits will cause only the bottom 16 bits of the 32-bit general-purpose registers to be modified – the top 16 bits are left unchanged.)

The default OperandSize and AddressSize to use for each instruction is given by the D bit of the segment descriptor of the current code segment - D=0 makes both 16-bit, D=1 makes both 32-bit. Additionally, they can be overridden on a per-instruction basis with two new instruction prefixes that were introduced in the 80386:

66h: OperandSize override. Will change OperandSize from 16-bit to 32-bit if CS.D=0, or from 32-bit to 16-bit if CS.D=1.
67h: AddressSize override. Will change AddressSize from 16-bit to 32-bit if CS.D=0, or from 32-bit to 16-bit if CS.D=1.
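
The interaction between the CS.D bit and the two override prefixes can be summarised in a few lines of Python. This is a sketch for 16/32-bit code segments only (64-bit mode is not modelled), and `effective_sizes` is an illustrative name.

```python
def effective_sizes(cs_d, prefixes=()):
    """Return (OperandSize, AddressSize) in bits for a single instruction.

    cs_d     -- D bit of the current code segment descriptor (0 or 1)
    prefixes -- iterable of prefix bytes present on the instruction
    """
    default = 32 if cs_d else 16
    toggled = 16 if cs_d else 32
    operand_size = toggled if 0x66 in prefixes else default
    address_size = toggled if 0x67 in prefixes else default
    return operand_size, address_size


print(effective_sizes(0))                  # (16, 16)
print(effective_sizes(0, [0x66]))          # (32, 16)
print(effective_sizes(1, [0x66, 0x67]))    # (16, 16)
```

Instruction forms that operate on 8-bit data are unaffected by OperandSize, so the first element of the result does not apply to them.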

The 80386 also introduced the two new segment registers FS and GS as well as the x86 control, debug and test registers.

The new instructions introduced in the 80386 can broadly be subdivided into two classes:

Pre-existing opcodes that needed new mnemonics for their 32-bit OperandSize variants (e.g. CWDE, LODSD)
New opcodes that introduced new functionality (e.g. SHLD, SETcc)

For instruction forms where the operand size can be inferred from the instruction's arguments (e.g. ADD EAX,EBX can be inferred to have a 32-bit OperandSize due to its use of EAX as an argument), new instruction mnemonics are not needed and not provided.

80386: new instruction mnemonics for 32-bit variants of older opcodes
Type	Instruction mnemonic	Opcode	Description	Mnemonic for older 16-bit variant	Ring
String instructions^[a]^[b]	`LODSD`	`AD`	Load string doubleword: `EAX := DS:[rSI±±]`	`LODSW`	3
	`STOSD`	`AB`	Store string doubleword: `ES:[rDI±±] := EAX`	`STOSW`
	`MOVSD`	`A5`	Move string doubleword: `ES:[rDI±±] := DS:[rSI±±]`	`MOVSW`
	`CMPSD`	`A7`	Compare string doubleword: temp1 := DS:[rSI±±] temp2 := ES:[rDI±±] CMP temp1, temp2 /* 32-bit compare and set EFLAGS */	`CMPSW`
	`SCASD`	`AF`	Scan string doubleword: temp1 := ES:[rDI±±] CMP EAX, temp1 /* 32-bit compare and set EFLAGS */	`SCASW`
	`INSD`	`6D`	Input string from doubleword I/O port: `ES:[rDI±±] := port[DX]`^[c]	`INSW`	Usually 0^[d]
	`OUTSD`	`6F`	Output string to doubleword I/O port: `port[DX] := DS:[rSI±±]`	`OUTSW`	Usually 0^[d]

Other	`CWDE`	`98`	Sign-extend 16-bit value in AX to 32-bit value in EAX^[e]	`CBW`	3
	`CDQ`	`99`	Sign-extend 32-bit value in EAX to 64-bit value in EDX:EAX. Mainly used to prepare a dividend for the 32-bit `IDIV` (signed divide) instruction.	`CWD`
	`JECXZ rel8`	`E3 cb`^[f]	Jump if ECX is zero	`JCXZ`
	`PUSHAD`	`60`	Push all 32-bit registers onto stack^[g]	`PUSHA`
	`POPAD`	`61`	Pop all 32-bit general-purpose registers off stack^[h]	`POPA`
	`PUSHFD`	`9C`	Push 32-bit EFLAGS register onto stack	`PUSHF`	Usually 3^[i]
	`POPFD`	`9D`	Pop 32-bit EFLAGS register off stack	`POPF`
	`IRETD`	`CF`	32-bit interrupt return. Differs from the older 16-bit `IRET` instruction in that it will pop interrupt return items (EIP,CS,EFLAGS; also ESP^[j] and SS if there is a CPL change; and also ES,DS,FS,GS if returning to virtual 8086 mode) off the stack as 32-bit items instead of 16-bit items. Should be used to return from interrupts when the interrupt handler was entered through a 32-bit IDT interrupt/trap gate. Instruction is serializing.	`IRET`
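
The sign-extension performed by CWDE and CDQ in the table above can be sketched in Python as follows. Registers are modelled as plain integers and the function names simply mirror the mnemonics; this is an illustration, not an emulator.

```python
def cwde(eax):
    """Sign-extend AX (the low 16 bits of EAX) into the full 32-bit EAX."""
    ax = eax & 0xFFFF
    return (ax | 0xFFFF0000) if ax & 0x8000 else ax


def cdq(eax):
    """Return (EDX, EAX): EDX is filled with copies of EAX's sign bit."""
    eax &= 0xFFFFFFFF
    edx = 0xFFFFFFFF if eax & 0x80000000 else 0
    return edx, eax


print(hex(cwde(0x12348000)))   # 0xffff8000: upper half of EAX is overwritten
print(cdq(5))                  # (0, 5)
```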

^ For the 32-bit string instructions, the ±± notation is used to indicate that the indicated register is post-decremented by 4 if EFLAGS.DF=1 and post-incremented by 4 otherwise.
For the operands where the DS segment is indicated, the DS segment can be overridden by a segment-override prefix – where the ES segment is indicated, the segment is always ES and cannot be overridden.
The choice of whether to use the 16-bit SI/DI registers or the 32-bit ESI/EDI registers as the address registers to use is made by AddressSize, overridable with the 67 prefix.
^ The 32-bit string instructions accept repeat-prefixes in the same way as older 8/16-bit string instructions.
For LODSD, STOSD, MOVSD, INSD and OUTSD, the REP prefix (F3) will repeat the instruction the number of times specified in rCX (CX or ECX, decided by AddressSize), decrementing rCX for each iteration (with rCX=0 resulting in no-op and proceeding to the next instruction).
For CMPSD and SCASD, the REPE (F3) and REPNE (F2) prefixes are available, which will repeat the instruction, decrementing rCX for each iteration, but only as long as the flag condition (ZF=1 for REPE, ZF=0 for REPNE) holds true AND rCX ≠ 0.
^ For the INSB/W/D instructions, the memory access rights for the ES:[rDI] memory address might not be checked until after the port access has been performed – if this check fails (e.g. page fault or other memory exception), then the data item read from the port is lost. As such, it is not recommended to use this instruction to access an I/O port that performs any kind of side effect upon read.
^ I/O port access is only allowed when CPL≤IOPL or the I/O port permission bitmap bits for the port to access are all set to 0.
^ The CWDE instruction differs from the older CWD instruction in that CWD would sign-extend the 16-bit value in AX into a 32-bit value in the DX:AX register pair.
^ For the E3 opcode (JCXZ/JECXZ), the choice of whether the instruction will use CX or ECX for its comparison (and consequently which mnemonic to use) is based on the AddressSize, not OperandSize. (OperandSize instead controls whether the jump destination should be truncated to 16 bits or not).
This also applies to the loop instructions LOOP,LOOPE,LOOPNE (opcodes E0,E1,E2), however, unlike JCXZ/JECXZ, these instructions have not been given new mnemonics for their ECX-using variants.
^ For PUSHA(D), the value of SP/ESP pushed onto the stack is the value it had just before the PUSHA(D) instruction started executing.
^ For POPA/POPAD, the stack item corresponding to SP/ESP is popped off the stack (performing a memory read), but not placed into SP/ESP.
^ The PUSHFD and POPFD instructions will cause a #GP exception when executed in Virtual-8086 mode if IOPL is not 3.
The PUSHF, POPF, IRET and IRETD instructions will cause a #GP exception when executed in Virtual-8086 mode if IOPL is not 3 and VME is not enabled.
^ If IRETD is used to return from kernel mode to user mode (which will entail a CPL change) and the user-mode stack segment indicated by SS is a 16-bit segment, then the IRETD instruction will only restore the low 16 bits of the stack pointer (ESP/RSP), with the remaining bits keeping whatever value they had in kernel code before the IRETD. This has necessitated complex workarounds on both Linux ("ESPFIX")^[17] and Windows.^[18] This issue also affects the later 64-bit IRETQ instruction.
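
The repeat-prefix behaviour described in the notes above (post-increment or post-decrement by 4 depending on DF, rCX counting down, and the extra ZF test for REPE) can be sketched in Python. The model below treats memory as a flat `bytearray` and ignores segmentation, address wrap-around and faults; the function names are illustrative.

```python
def rep_movsd(mem, rsi, rdi, rcx, df=0):
    """Model REP MOVSD; returns the final (rSI, rDI, rCX)."""
    step = -4 if df else 4
    while rcx != 0:                        # rCX == 0 on entry: nothing is copied
        mem[rdi:rdi + 4] = mem[rsi:rsi + 4]
        rsi += step
        rdi += step
        rcx -= 1
    return rsi, rdi, rcx


def repe_cmpsd(mem, rsi, rdi, rcx, df=0):
    """Model REPE CMPSD; returns the final (rSI, rDI, rCX, ZF)."""
    step = -4 if df else 4
    zf = None                              # flags are untouched if rCX == 0 on entry
    while rcx != 0:
        zf = int(mem[rsi:rsi + 4] == mem[rdi:rdi + 4])
        rsi += step
        rdi += step
        rcx -= 1
        if zf == 0:                        # REPE stops once the compare clears ZF
            break
    return rsi, rdi, rcx, zf


mem = bytearray(range(32))
print(rep_movsd(mem, 0, 16, 2))     # (8, 24, 0)
print(repe_cmpsd(mem, 0, 16, 4))    # (12, 28, 1, 0): third doubleword differs
```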

80386: new opcodes introduced
Instruction mnemonics	Opcode	Description	Ring
`BT r/m, r`	`0F A3 /r`	Bit Test.^[a] Second operand specifies which bit of the first operand to test. The bit to test is copied to EFLAGS.CF.	3
`BT r/m, imm8`	`0F BA /4 ib`
`BTS r/m, r`	`0F AB /r`	Bit Test-and-set.^[a]^[b] Second operand specifies which bit of the first operand to test and set.
`BTS r/m, imm8`	`0F BA /5 ib`
`BTR r/m, r`	`0F B3 /r`	Bit Test and Reset.^[a]^[b] Second operand specifies which bit of the first operand to test and clear.
`BTR r/m, imm8`	`0F BA /6 ib`
`BTC r/m, r`	`0F BB /r`	Bit Test and Complement.^[a]^[b] Second operand specifies which bit of the first operand to test and toggle.
`BTC r/m, imm8`	`0F BA /7 ib`

`BSF r, r/m`	`NFx 0F BC /r`^[c]	Bit scan forward. Returns bit index of lowest set bit in input.^[d]	3
`BSR r, r/m`	`NFx 0F BD /r`^[e]	Bit scan reverse. Returns bit index of highest set bit in input.^[d]
`SHLD r/m, r, imm8`	`0F A4 /r ib`	Shift Left Double. The operation of `SHLD arg1,arg2,shamt` is: `arg1 := (arg1<<shamt) | (arg2>>(operand_size - shamt))`^[f]
`SHLD r/m, r, CL`	`0F A5 /r`
`SHRD r/m, r, imm8`	`0F AC /r ib`	Shift Right Double. The operation of `SHRD arg1,arg2,shamt` is: `arg1 := (arg1>>shamt) | (arg2<<(operand_size - shamt))`^[f]
`SHRD r/m, r, CL`	`0F AD /r`

`MOVZX reg, r/m8`	`0F B6 /r`	Move from 8/16-bit source to 16/32-bit register with zero-extension.	3
`MOVZX reg, r/m16`	`0F B7 /r`
`MOVSX reg, r/m8`	`0F BE /r`	Move from 8/16-bit source to 16/32/64-bit register with sign-extension.
`MOVSX reg, r/m16`	`0F BF /r`
`SETcc r/m8`	`0F 9x /0`^[g]^[h]	Set byte to 1 if condition is satisfied, 0 otherwise.
`Jcc rel16` `Jcc rel32`	`0F 8x cw` `0F 8x cd`^[g]	Conditional jump near. Differs from older variants of conditional jumps in that they accept a 16/32-bit offset rather than just an 8-bit offset.
`IMUL r, r/m`	`0F AF /r`	Two-operand non-widening integer multiply.

`FS:`	`64`	Segment-override prefixes for FS and GS segment registers.	3
`GS:`	`65`	Segment-override prefixes for FS and GS segment registers.
`PUSH FS`	`0F A0`	Push/pop FS and GS segment registers.
`POP FS`	`0F A1`
`PUSH GS`	`0F A8`
`POP GS`	`0F A9`
`LFS r16, m16&16` `LFS r32, m32&16`	`0F B4 /r`	Load far pointer from memory. Offset part is stored in destination register argument, segment part in FS/GS/SS segment register as indicated by the instruction mnemonic.^[i]
`LGS r16, m16&16` `LGS r32, m32&16`	`0F B5 /r`
`LSS r16, m16&16` `LSS r32, m32&16`	`0F B2 /r`

`MOV reg,CRx`	`0F 20 /r`^[j]	Move from control register to general register.^[k]	0
`MOV CRx,reg`	`0F 22 /r`^[j]	Move from general register to control register.^[k] Moves to the `CR3` control register are serializing and will flush the TLB.^[l] On Pentium and later processors, moves to the `CR0` and `CR4` control registers are also serializing.^[m]
`MOV reg,DRx`	`0F 21 /r`^[j]	Move from x86 debug register to general register.^[k]
`MOV DRx,reg`	`0F 23 /r`^[j]	Move from general register to x86 debug register.^[k] On Pentium and later processors, moves to the DR0-DR7 debug registers are serializing.
`MOV reg,TRx`	`0F 24 /r`^[j]	Move from x86 test register to general register.^[n]
`MOV TRx,reg`	`0F 26 /r`^[j]	Move from general register to x86 test register.^[n]

ICEBP, INT01, INT1^[o]	F1	In-circuit emulation breakpoint. Performs software interrupt #1 if executed when not using in-circuit emulation.^[p]	3
UMOV r/m, r8	0F 10 /r	User Move – perform data moves that can access user memory while in In-circuit emulation HALT mode. Performs same operation as `MOV` if executed when not doing in-circuit emulation.^[q]
UMOV r/m, r16/32	0F 11 /r
UMOV r8, r/m	0F 12 /r
UMOV r16/32, r/m	0F 13 /r
XBTS reg,r/m	0F A6 /r	Bitfield extract (early 386 only).^[r]^[s]
IBTS r/m,reg	0F A7 /r	Bitfield insert (early 386 only).^[r]^[s]
LOADALLD, LOADALL386^[t]	0F 07	Load all CPU registers from a 296-byte data structure starting at ES:EDI, including "hidden" part of segment descriptor registers.	0
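
The SHLD and SHRD formulas from the table, combined with the shift-amount masking described in the notes for this table, can be sketched in Python. Flags are not modelled, and the sketch raises an error for the 16-bit case that the notes describe as undefined rather than guessing a result; the function names mirror the mnemonics for illustration only.

```python
def _masked_shamt(shamt, operand_size):
    """Apply the shift-amount mask: 6 bits for 64-bit operands, 5 bits otherwise."""
    shamt &= 63 if operand_size == 64 else 31
    if shamt > operand_size:
        raise ValueError("undefined result: 16-bit operand with shift amount > 16")
    return shamt


def shld(arg1, arg2, shamt, operand_size=32):
    """Shift arg1 left, filling vacated low bits from the top of arg2."""
    mask = (1 << operand_size) - 1
    shamt = _masked_shamt(shamt, operand_size)
    if shamt == 0:
        return arg1 & mask
    return ((arg1 << shamt) | ((arg2 & mask) >> (operand_size - shamt))) & mask


def shrd(arg1, arg2, shamt, operand_size=32):
    """Shift arg1 right, filling vacated high bits from the bottom of arg2."""
    mask = (1 << operand_size) - 1
    shamt = _masked_shamt(shamt, operand_size)
    if shamt == 0:
        return arg1 & mask
    return (((arg1 & mask) >> shamt) | (arg2 << (operand_size - shamt))) & mask


print(hex(shld(0x12345678, 0x9ABCDEF0, 8)))   # 0x3456789a
print(hex(shrd(0x12345678, 0x9ABCDEF0, 8)))   # 0xf0123456
```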

^ ^a ^b ^c ^d
For the BT, BTS, BTR and BTC instructions:
- If the first argument to the instruction is a register operand and/or the second argument is an immediate, then the bit-index in the second argument is taken modulo operand size (16/32/64, in effect using only the bottom 4, 5 or 6 bits of the index.)
- If the first argument is a memory operand and the second argument is a register operand, then the bit-index in the second argument is used in full – it is interpreted as a signed bit-index that is used to offset the memory address to use for the bit test.
^ ^a ^b ^c The BTS, BTC and BTR instructions accept the LOCK (F0) prefix when used with a memory argument – this results in the instruction executing atomically.
^ If the F3 prefix is used with the 0F BC /r opcode, then the instruction will execute as TZCNT on systems that support the BMI1 extension. TZCNT differs from BSF in that TZCNT but not BSF is defined to return operand size if the source operand is zero; for other source operand values, they produce the same result (except for flags).
^ ^a ^b BSF and BSR set the EFLAGS.ZF flag to 1 if the source argument was all-0s and 0 otherwise.
If the source argument was all-0s, then the destination register is documented as being left unchanged on AMD processors, but set to an undefined value on Intel processors.
^ If the F3 prefix is used with the 0F BD /r opcode, then the instruction will execute as LZCNT on systems that support the ABM or LZCNT extensions. LZCNT produces a different result from BSR for most input values.
^ ^a ^b For SHLD and SHRD, the shift-amount is masked – the bottom 5 bits are used for 16/32-bit operand size and 6 bits for 64-bit operand size.
SHLD and SHRD with 16-bit arguments and a shift-amount greater than 16 produce undefined results. (Actual results differ between different Intel CPUs, with at least three different behaviors known.^[19])

^ ^a ^b The condition codes supported for the SETcc and Jcc near instructions (opcodes 0F 9x /0 and 0F 8x respectively, with the x nibble specifying the condition) are:

x	cc	Condition (EFLAGS)
0	O	OF=1: "Overflow"
1	NO	OF=0: "Not Overflow"
2	C,B,NAE	CF=1: "Carry", "Below", "Not Above or Equal"
3	NC,NB,AE	CF=0: "Not Carry", "Not Below", "Above or Equal"
4	Z,E	ZF=1: "Zero", "Equal"
5	NZ,NE	ZF=0: "Not Zero", "Not Equal"
6	NA,BE	(CF=1 or ZF=1): "Not Above", "Below or Equal"
7	A,NBE	(CF=0 and ZF=0): "Above", "Not Below or Equal"
8	S	SF=1: "Sign"
9	NS	SF=0: "Not Sign"
A	P,PE	PF=1: "Parity", "Parity Even"
B	NP,PO	PF=0: "Not Parity", "Parity Odd"
C	L,NGE	SF≠OF: "Less", "Not Greater Or Equal"
D	NL,GE	SF=OF: "Not Less", "Greater Or Equal"
E	LE,NG	(ZF=1 or SF≠OF): "Less or Equal", "Not Greater"
F	NLE,G	(ZF=0 and SF=OF): "Not Less or Equal", "Greater"
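
The table above has a regular structure: each odd value of x is the logical negation of the even value before it. The following Python sketch evaluates a condition nibble against a set of flag values on that basis; `condition_met` and its keyword arguments are illustrative names.

```python
def condition_met(x, cf=0, zf=0, sf=0, of=0, pf=0):
    """Evaluate condition nibble x (0..15) against the given EFLAGS bits."""
    even_conditions = [
        of == 1,                 # 0: O
        cf == 1,                 # 2: C, B, NAE
        zf == 1,                 # 4: Z, E
        cf == 1 or zf == 1,      # 6: NA, BE
        sf == 1,                 # 8: S
        pf == 1,                 # A: P, PE
        sf != of,                # C: L, NGE
        zf == 1 or sf != of,     # E: LE, NG
    ]
    result = even_conditions[x >> 1]
    # The low bit of the nibble inverts the condition (e.g. 4 = Z, 5 = NZ).
    return (not result) if x & 1 else result


print(condition_met(0x4, zf=1))          # True  (E/Z)
print(condition_met(0xF, sf=1, of=1))    # True  (G/NLE: ZF=0 and SF=OF)
print(condition_met(0x2, cf=0))          # False (B/C)
```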

^ For SETcc, while the opcode is commonly specified as /0 – implying that bits 5:3 of the instruction's ModR/M byte should be 000 – modern x86 processors (Pentium and later) ignore bits 5:3 and will execute the instruction as SETcc regardless of the contents of these bits.
^ For LFS, LGS and LSS, the size of the offset part of the far pointer is given by operand size – the size of the segment part is always 16 bits. In 64-bit mode, using the REX.W prefix with these instructions will cause them to load a far pointer with a 64-bit offset on Intel but not AMD processors.
^ ^a ^b ^c ^d ^e ^f For MOV to/from the CRx, DRx and TRx registers, the reg part of the ModR/M byte is used to indicate the CRx/DRx/TRx register and the r/m part the general register. Uniquely for the MOV CRx/DRx/TRx opcodes, the top two bits of the ModR/M byte are ignored: these opcodes are decoded and executed as if the top two bits of the ModR/M byte were 11b.
^ ^a ^b ^c ^d For moves to/from the CRx and DRx registers, the operand size is always 64 bits in 64-bit mode and 32 bits otherwise.
^ On processors that support global pages (Pentium and later), global page table entries will not be flushed by a MOV to CR3; instead, these entries can be flushed by toggling the CR4.PGE bit.
On processors that support PCIDs, writing to CR3 while PCIDs are enabled will only flush TLB entries belonging to the PCID specified in bits 11:0 of the value written to CR3 (this flush can be suppressed by setting bit 63 of the written value to 1). Flushing pages belonging to other PCIDs can instead be done by toggling the CR4.PGE bit, clearing the CR4.PCIDE bit, or using the INVPCID instruction.
^ On processors prior to Pentium, moves to CR0 would not serialize the instruction stream – in part for this reason, it is usually required to perform a far jump^[20] immediately after a MOV to CR0 if such a MOV is used to enable/disable protected mode and/or memory paging.
MOV to CR2 is architecturally listed as serializing, but has been reported to be non-serializing on at least some Intel Core-i7 processors.^[21]
MOV to CR8 (introduced with x86-64) is serializing on AMD but not Intel processors.
^ ^a ^b The MOV TRx instructions were discontinued from Pentium onwards.
^ The INT1/ICEBP (F1) instruction is present on all known Intel x86 processors from the 80386 onwards,^[22] but only fully documented for Intel processors from the May 2018 release of the Intel SDM (rev 067) onwards.^[23] Before this release, mention of the instruction in Intel material was sporadic, e.g. AP-526 rev 001.^[24]
For AMD processors, the instruction has been documented since 2002.^[25]
^ The operation of the F1 (ICEBP) opcode differs from the operation of the regular software interrupt opcode CD 01 in several ways:
- In protected mode, CD 01 will check CPL against the interrupt descriptor's DPL field as an access-rights check, while F1 will not.
- In virtual-8086 mode, CD 01 will also check CPL against IOPL as an access-rights check, while F1 will not.
- In virtual-8086 mode with VME enabled, interrupt redirection is supported for CD 01 but not F1.
^ The UMOV instruction is present on 386 and 486 processors only.^[22]
^ ^a ^b The XBTS and IBTS instructions were discontinued with the B1 stepping of 80386.
They have been used by software mainly for detection of the buggy^[26] B0 stepping of the 80386. Microsoft Windows (v2.01 and later) will attempt to run the XBTS instruction as part of its CPU detection if CPUID is not present, and will refuse to boot if XBTS is found to be working.^[27]
^ ^a ^b For XBTS and IBTS, the r/m argument represents the data to extract/insert a bitfield from/to, the reg argument the bitfield to be inserted/extracted, AX/EAX a bit-offset and CL a bitfield length.^[28]
^ Undocumented, 80386 only.^[29]

Added with 80486

Instruction	Opcode	Description	Ring
`BSWAP r32`	`0F C8+r`	Byte Order Swap. Usually used to convert between big-endian and little-endian data representations. For 32-bit registers, the operation performed is: `r = (r << 24) | ((r << 8) & 0x00FF0000) | ((r >> 8) & 0x0000FF00) | (r >> 24);` Using `BSWAP` with a 16-bit register argument produces an undefined result.^[a]	3
`CMPXCHG r/m8,r8`	`0F B0 /r`^[b]	Compare and Exchange. If accumulator (AL/AX/EAX/RAX) compares equal to first operand,^[c] then `EFLAGS.ZF` is set to 1 and the first operand is overwritten with the second operand. Otherwise, `EFLAGS.ZF` is set to 0, and first operand is copied into the accumulator. Instruction atomic only if used with `LOCK` prefix.
`CMPXCHG r/m,r16` `CMPXCHG r/m,r32`	`0F B1 /r`^[b]
`XADD r/m,r8`	`0F C0 /r`	eXchange and ADD. Exchanges the first operand with the second operand, then stores the sum of the two values into the destination operand. Instruction atomic only if used with `LOCK` prefix.
`XADD r/m,r16` `XADD r/m,r32`	`0F C1 /r`
`INVLPG m8`	`0F 01 /7`	Invalidate the TLB entries that would be used for the 1-byte memory operand.^[d] Instruction is serializing.	0
`INVD`	`0F 08`	Invalidate Internal Caches.^[e] Modified data in the cache are not written back to memory, potentially causing data loss.^[f]
`WBINVD`	`NFx 0F 09`^[g]	Write Back and Invalidate Cache.^[e] Writes back all modified cache lines in the processor's internal cache to main memory and invalidates the internal caches.

^ Using BSWAP with 16-bit registers is not disallowed per se (it will execute without producing an #UD or other exceptions) but is documented to produce undefined results – it is reported to produce various different results on 486,^[30] 586, and Bochs/QEMU.^[31]
^ ^a ^b On Intel 80486 stepping A,^[32] the CMPXCHG instruction uses a different encoding - 0F A6 /r for 8-bit variant, 0F A7 /r for 16/32-bit variant. The 0F B0/B1 encodings are used on 80486 stepping B and later.^[33]^[34]
^ The CMPXCHG instruction sets EFLAGS in the same way as a CMP instruction that uses the accumulator (AL/AX/EAX/RAX) as its first argument would do.
^ INVLPG executes as no-operation if the m8 argument is invalid (e.g. unmapped page or non-canonical address).
INVLPG can be used to invalidate TLB entries for individual global pages.
^ ^a ^b The INVD and WBINVD instructions will invalidate all cache lines in the CPU's L1 caches. It is implementation-defined whether they will invalidate L2/L3 caches as well.
These instructions are serializing – on some processors, they may block interrupts until completion as well.
^ Under Intel VT-x virtualization, the INVD instruction will cause a mandatory #VMEXIT. Also, on processors that support Intel SGX, if the PRM (Processor Reserved Memory) has been set up by using the PRMRRs (PRM range registers), then the INVD instruction is not permitted and will cause a #GP(0) exception.^[35]
^ If the F3 prefix is used with the 0F 09 opcode, then the instruction will execute as WBNOINVD on processors that support the WBNOINVD extension – this will not invalidate the cache.
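
The data-movement part of the three 80486 integer instructions above can be modelled in a few lines. The following is only a sketch under simplifying assumptions: `bswap32`, `cmpxchg` and `xadd` are hypothetical helper names, operands are plain Python integers, only ZF is modelled for `CMPXCHG`, and atomicity (the `LOCK` prefix) is not represented at all.

```python
MASK32 = 0xFFFFFFFF

def bswap32(r):
    """Byte-swap a 32-bit value, following the formula given for BSWAP."""
    r &= MASK32
    return ((r << 24) | ((r << 8) & 0x00FF0000)
            | ((r >> 8) & 0x0000FF00) | (r >> 24)) & MASK32

def cmpxchg(acc, dest, src):
    """Model CMPXCHG dest, src; returns (new_acc, new_dest, zf)."""
    if acc == dest:
        return acc, src, 1       # equal: dest <- src, ZF=1
    return dest, dest, 0         # not equal: accumulator <- dest, ZF=0

def xadd(dest, src, mask=MASK32):
    """Model XADD dest, src; returns (new_dest, new_src)."""
    return (dest + src) & mask, dest   # dest <- sum, src <- old dest
```

For example, `bswap32(0x12345678)` yields `0x78563412`, and `xadd(3, 4)` yields `(7, 3)`.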

Added in P5/P6-class processors

Integer/system instructions that were not present in the basic 80486 instruction set, but were added in various x86 processors prior to the introduction of SSE. (Discontinued instructions are not included.)

Instruction	Opcode	Description	Ring	Added in

`RDMSR`	`0F 32`	Read Model-specific register. The MSR to read is specified in ECX. The value of the MSR is then returned as a 64-bit value in EDX:EAX.^[a]	0	IBM 386SLC,^[36] Intel Pentium, AMD K5, Cyrix 6x86MX,MediaGXm, IDT WinChip C6, Transmeta Crusoe, DM&P Vortex86DX3
`WRMSR`	`0F 30`	Write Model-specific register. The MSR to write is specified in ECX, and the data to write is given in EDX:EAX.^[b] Instruction is, with some exceptions, serializing.^[c]	0
`RSM`^[43]	`0F AA`	Resume from System Management Mode. Instruction is serializing.	-2 (SMM)	Intel 386SL,^[44]^[45] 486SL,^[d] Intel Pentium, AMD 5x86, Cyrix 486SLC/e,^[46] IDT WinChip C6, Transmeta Crusoe, Rise mP6
`CPUID`	`0F A2`	CPU Identification and feature information. Takes as input a CPUID leaf index in EAX and, depending on leaf, a sub-index in ECX. Result is returned in EAX,EBX,ECX,EDX.^[e] Instruction is serializing, and causes a mandatory #VMEXIT under virtualization. Support for `CPUID` can be checked by toggling bit 21 of EFLAGS (EFLAGS.ID) – if this bit can be toggled, `CPUID` is present.	Usually 3^[f]	Intel Pentium,^[g] AMD 5x86,^[g] Cyrix 5x86,^[h] IDT WinChip C6, Transmeta Crusoe, Rise mP6, NexGen Nx586,^[i] UMC Green CPU
`CMPXCHG8B m64`	`0F C7 /1`	Compare and Exchange 8 bytes. Compares EDX:EAX with m64. If equal, set ZF^[j] and store ECX:EBX into m64. Else, clear ZF and load m64 into EDX:EAX. Instruction atomic only if used with `LOCK` prefix.^[k]	3	Intel Pentium, AMD K5, Cyrix 6x86L,MediaGXm, IDT WinChip C6,^[l] Transmeta Crusoe,^[l] Rise mP6^[l]
`RDTSC`	`0F 31`	Read 64-bit Time Stamp Counter (TSC) into EDX:EAX.^[m]^[a] In early processors, the TSC was a cycle counter, incrementing by 1 for each clock cycle (which could cause its rate to vary on processors that could change clock speed at runtime) – in later processors, it increments at a fixed rate that doesn't necessarily match the CPU clock speed.^[n]	Usually 3^[o]	Intel Pentium, AMD K5, Cyrix 6x86MX,MediaGXm, IDT WinChip C6, Transmeta Crusoe, Rise mP6

`RDPMC`	`0F 33`	Read Performance Monitoring Counter. The counter to read is specified by ECX and its value is returned in EDX:EAX.^[m]^[a]	Usually 3^[p]	Intel Pentium MMX, Intel Pentium Pro, AMD K7, Cyrix 6x86MX, IDT WinChip C6, AMD Geode LX, VIA Nano^[q]
`CMOVcc reg,r/m`	`0F 4x /r`^[r]	Conditional move to register. The source operand may be either register or memory.^[s]	3	Intel Pentium Pro, AMD K7, Cyrix 6x86MX,MediaGXm, Transmeta Crusoe, VIA C3 "Nehemiah",^[t] DM&P Vortex86DX3

`NOP r/m`, `NOPL r/m`	`NFx 0F 1F /0`^[u]	Official long NOP. Other than AMD K7/K8, broadly unsupported in non-Intel processors released before 2005.^[v]^[63]	3	Intel Pentium Pro,^[w] AMD K7, x86-64,^[x] VIA C7^[67]
`UD2`,^[y] `UD2A`^[z]	`0F 0B`	Undefined Instructions – will generate an invalid opcode (#UD) exception in all operating modes.^[aa] These instructions are provided for software testing to explicitly generate invalid opcodes. The opcodes for these instructions are reserved for this purpose.	(3)	(80186),^[ab] Intel Pentium^[72]
`UD1 reg,r/m`,^[ac] `UD2B reg,r/m`^[z]	`0F B9`, `0F B9 /r`^[ad]			(80186),^[ab] Intel Pentium^[72]
`OIO`, `UD0`, `UD0 reg,r/m`^[ae]	`0F FF`, `0F FF /r`^[ad]			(80186),^[ab] Cyrix 6x86,^[78] AMD K5^[80]

`SYSCALL`	`0F 05`	Fast System call.	3	AMD K6,^[af] x86-64^[ag]^[ah]
`SYSRET`	`0F 07`^[ai]	Fast Return from System Call. Designed to be used together with `SYSCALL`.	0^[aj]	AMD K6,^[af] x86-64^[ag]^[ah]
`SYSENTER`	`0F 34`	Fast System call.	3^[aj]	Intel Pentium II,^[ak] AMD K7,^[85]^[al] Transmeta Crusoe,^[am] NatSemi Geode GX2, VIA C3 "Nehemiah",^[an] DM&P Vortex86DX3
`SYSEXIT`	`0F 35`^[ai]	Fast Return from System Call. Designed to be used together with `SYSENTER`.	0^[aj]

^ ^a ^b ^c In 64-bit mode, the RDMSR, RDTSC and RDPMC instructions will set the top 32 bits of RDX and RAX to zero.
^ On Intel and AMD CPUs, the WRMSR instruction is also used to update the CPU microcode. This is done by writing the virtual address of the new microcode to upload to MSR 79h on Intel CPUs and MSR C001_0020h^[37] on AMD CPUs.

^ Writes to the following MSRs are not serializing:^[38]^[39]

Number	Name
`48h`	SPEC_CTRL
`49h`	PRED_CMD
`10Bh`	FLUSH_CMD
`122h`	TSX_CTRL
`6E0h`	TSC_DEADLINE
`6E1h`	PKRS
`774h`	HWP_REQUEST (non-serializing only if the FAST_IA32_HWP_REQUEST bit is set)
`802h` to `83Fh`	(x2APIC MSRs)
`1B01h`	UARCH_MISC_CTL
`C001_0100h`	FS_BASE (non-serializing on AMD Zen 4 and later)^[40]
`C001_0101h`	GS_BASE (Zen 4 and later)
`C001_0102h`	KernelGSbase (Zen 4 and later)
`C001_011Bh`	Doorbell Register (AMD-specific)

WRMSR to the x2APIC ICR (Interrupt Command Register; MSR 830h) is commonly used to produce an IPI (Inter-processor interrupt) - on Intel^[41] but not AMD^[42] CPUs, such an IPI can be reordered before an older memory store.

^ System Management Mode and the RSM instruction were made available on non-SL variants of the Intel 486 only after the initial release of the Intel Pentium in 1993.
^ On some older 32-bit processors, executing CPUID with a leaf index (EAX) greater than 0 may leave EBX and ECX unmodified, keeping their old values. For this reason, it is recommended to zero out EBX and ECX before executing CPUID.
Processors noted to exhibit this behavior include Cyrix MII^[47] and IDT WinChip 2.^[48]

In 64-bit mode, CPUID will set the top 32 bits of RAX, RBX, RCX and RDX to zero.
^ On some Intel processors starting from Ivy Bridge, there exist MSRs that can be used to restrict CPUID to ring 0. Such MSRs are documented for at least Ivy Bridge^[49] and Denverton.^[50]
The ability to restrict CPUID to ring 0 also exists on AMD processors supporting the "CpuidUserDis" feature (Zen 4 "Raphael" and later).^[51]
^ ^a ^b CPUID is also available on some Intel and AMD 486 processor variants that were released after the initial release of the Intel Pentium.
^ On the Cyrix 5x86 and 6x86 CPUs, CPUID is not enabled by default and must be enabled through a Cyrix configuration register.
^ On NexGen CPUs, CPUID is only supported with some system BIOSes. On some NexGen CPUs that do support CPUID, EFLAGS.ID is not supported but EFLAGS.AC is, complicating CPU detection.^[52]
^ Unlike the older CMPXCHG instruction, the CMPXCHG8B instruction does not modify any EFLAGS bits other than ZF.
^ LOCK CMPXCHG8B with a register operand (which is an invalid encoding) will, on some Intel Pentium CPUs, cause a hang rather than the expected #UD exception - this is known as the Pentium F00F bug.
^ ^a ^b ^c On IDT WinChip, Transmeta Crusoe and Rise mP6 processors, the CMPXCHG8B instruction is always supported, however its CPUID bit may be missing. This is a workaround for a bug in Windows NT.^[53]
^ ^a ^b The RDTSC and RDPMC instructions are not ordered with respect to other instructions, and may sample their respective counters before earlier instructions are executed or after later instructions have executed. Invocations of RDPMC (but not RDTSC) may be reordered relative to each other even for reads of the same counter.
In order to impose ordering with respect to other instructions, LFENCE or serializing instructions (e.g. CPUID) are needed.^[54]
^ Fixed-rate TSC was introduced in two stages:
- Constant TSC: TSC running at a fixed rate as long as the processor core is not in a deep-sleep (C2 or deeper) mode, but not synchronized between CPU cores. Introduced in Intel Prescott, Yonah and Bonnell. Also present in all Transmeta and VIA Nano^[55] CPUs, as well as AMD Geode LX.^[56] Does not have a CPUID bit.
- Invariant TSC: TSC running at a fixed rate, and remaining synchronized between CPU cores in all P-, C- and T-states (but not necessarily S-states). Present in AMD K10 and later; Intel Nehalem/Saltwell^[57] and later; Zhaoxin WuDaoKou^[58] and later. Indicated with a CPUID bit (leaf 8000_0007:EDX[8]).
^ RDTSC can be run outside Ring 0 only if CR4.TSD=0.
On Intel Pentium and AMD K5/K6, RDTSC cannot be run in Virtual-8086 mode.^[59]^[60] Later processors (Pentium Pro, Athlon 64) removed this restriction.
^ RDPMC can be run outside Ring 0 only if CR4.PCE=1.
^ The RDPMC instruction is not present in VIA processors prior to the Nano.

^ The condition codes supported for CMOVcc instruction (opcode 0F 4x /r, with the x nibble specifying the condition) are:

x	cc	Condition (EFLAGS)
0	O	OF=1: "Overflow"
1	NO	OF=0: "Not Overflow"
2	C,B,NAE	CF=1: "Carry", "Below", "Not Above or Equal"
3	NC,NB,AE	CF=0: "Not Carry", "Not Below", "Above or Equal"
4	Z,E	ZF=1: "Zero", "Equal"
5	NZ,NE	ZF=0: "Not Zero", "Not Equal"
6	NA,BE	(CF=1 or ZF=1): "Not Above", "Below or Equal"
7	A,NBE	(CF=0 and ZF=0): "Above", "Not Below or Equal"
8	S	SF=1: "Sign"
9	NS	SF=0: "Not Sign"
A	P,PE	PF=1: "Parity", "Parity Even"
B	NP,PO	PF=0: "Not Parity", "Parity Odd"
C	L,NGE	SF≠OF: "Less", "Not Greater Or Equal"
D	NL,GE	SF=OF: "Not Less", "Greater Or Equal"
E	LE,NG	(ZF=1 or SF≠OF): "Less or Equal", "Not Greater"
F	NLE,G	(ZF=0 and SF=OF): "Not Less or Equal", "Greater"

^ In 64-bit mode, CMOVcc with a 32-bit operand size will clear the upper 32 bits of the destination register even if the condition is false.
For CMOVcc with a memory source operand, the CPU will always read the operand from memory – potentially causing memory exceptions and cache line-fills – even if the condition for the move is not satisfied. (The Intel APX extension defines a set of new EVEX-encoded variants of CMOVcc that will suppress memory exceptions if the condition is false.)
^ On pre-Nehemiah VIA C3 variants ("Samuel"/"Ezra"), the reg,reg but not reg,[mem] forms of the CMOVcc instructions have been reported to be present as undocumented instructions.^[61]
^ Intel's recommended byte encodings for multi-byte NOPs of lengths 2 to 9 bytes in 32/64-bit mode are (in hex):^[62]

Length	Byte Sequence
2	66 90
3	0F 1F 00
4	0F 1F 40 00
5	0F 1F 44 00 00
6	66 0F 1F 44 00 00
7	0F 1F 80 00 00 00 00
8	0F 1F 84 00 00 00 00 00
9	66 0F 1F 84 00 00 00 00 00

For cases where there is a need to use more than 9 bytes of NOP padding, it is recommended to use multiple NOPs.
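
A padding routine built on the byte sequences listed above might look like the following sketch. The names `NOPS` and `nop_padding` are illustrative, and the one-byte entry (`90`, the classic single-byte NOP) is added here as an assumption so that every length can be covered; it is not part of the table. Longer runs are split greedily into 9-byte pieces, which is one possible policy and not necessarily what any particular assembler does.

```python
# Recommended multi-byte NOP encodings, keyed by length in bytes.
NOPS = {n: bytes.fromhex(s) for n, s in {
    1: "90",                            # assumption: plain one-byte NOP
    2: "66 90",
    3: "0F 1F 00",
    4: "0F 1F 40 00",
    5: "0F 1F 44 00 00",
    6: "66 0F 1F 44 00 00",
    7: "0F 1F 80 00 00 00 00",
    8: "0F 1F 84 00 00 00 00 00",
    9: "66 0F 1F 84 00 00 00 00 00",
}.items()}

def nop_padding(n):
    """Return n bytes of padding built from multiple NOPs of at most 9 bytes."""
    out = b""
    while n > 0:
        k = min(n, 9)       # take the longest available NOP first
        out += NOPS[k]
        n -= k
    return out
```

With this policy, `nop_padding(10)` is a 9-byte NOP followed by a 1-byte NOP.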
^ Unlike other instructions added in Pentium Pro, long NOP does not have a CPUID feature bit.
^ 0F 1F /0 as long-NOP was introduced in the Pentium Pro, but remained undocumented until 2006.^[64] The whole 0F 18..1F opcode range was NOP in Pentium Pro. However, except for 0F 1F /0, Intel does not guarantee that these opcodes will remain NOP in future processors, and has indeed assigned some of these opcodes to other instructions in at least some processors.^[65]
^ Documented for AMD x86-64 since 2002.^[66]
^ While the 0F 0B opcode was officially reserved as an invalid opcode from Pentium onwards, it only got assigned the mnemonic UD2 from Pentium Pro onwards.^[68]
^ ^a ^b GNU Binutils have used the UD2A and UD2B mnemonics for the 0F 0B and 0F B9 opcodes since version 2.7.^[69]
Neither UD2A nor UD2B originally took any arguments - UD2B was later modified to accept a ModR/M byte, in Binutils version 2.30.^[70]
^ The UD2 (0F 0B) instruction will additionally stop subsequent bytes from being decoded as instructions, even speculatively. For this reason, if an indirect branch instruction is followed by something that is not code, it is recommended to place an UD2 instruction after the indirect branch.^[71]

[gdt_idt_descriptor-5] The descriptors used by the LGDT, LIDT, SGDT and SIDT instructions consist of a 2-part data structure. The first part is a 16-bit value, specifying table size in bytes minus 1. The second part is a 32-bit value (64-bit value in 64-bit mode), specifying the linear start address of the table.
For LGDT and LIDT with a 16-bit operand size, the address is ANDed with 00FFFFFFh. On Intel (but not AMD) CPUs, the SGDT and SIDT instructions with a 16-bit operand size are – as of Intel SDM revision 079, March 2023 – documented to write a descriptor to memory with the last byte being set to 0. However, observed behavior is that bits 31:24 of the descriptor table address are written instead.^[4]
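
The layout described above can be sketched with Python's `struct` module for the non-64-bit case. `pack_descriptor32` and `load_descriptor32` are hypothetical helpers; the second one models only the address masking that LGDT/LIDT apply with a 16-bit operand size, not the SGDT/SIDT store behaviour.

```python
import struct

def pack_descriptor32(base, size_bytes):
    """Build the 6-byte operand: 16-bit (size - 1), then 32-bit linear base."""
    return struct.pack("<HI", size_bytes - 1, base)

def load_descriptor32(desc, operand16=False):
    """Return (base, size_bytes) as LGDT/LIDT would interpret the operand."""
    limit, base = struct.unpack("<HI", desc)
    if operand16:
        base &= 0x00FFFFFF      # 16-bit operand size: address ANDed with 00FFFFFFh
    return base, limit + 1
```

For a 2048-byte table at linear address 12345678h, the packed bytes are `FF 07 78 56 34 12`.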

[i286_serialize-6] The LGDT, LIDT, LLDT and LTR instructions are serializing on Pentium and later processors.

[7] The LMSW instruction is serializing on Intel processors from Pentium onwards, but not on AMD processors.

[9] On 80386 and later, the "Machine Status Word" is the same as the CR0 control register – however, the LMSW instruction can only modify the bottom 4 bits of this register and cannot clear bit 0. The inability to clear bit 0 means that LMSW can be used to enter but not leave x86 Protected Mode.
On 80286, it is not possible to leave Protected Mode at all (neither with LMSW nor with LOADALL^[5]) without a CPU reset – on 80386 and later, it is possible to leave Protected Mode, but this requires the use of the 80386-and-later MOV to CR0 instruction.
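
The restriction on LMSW can be expressed as a small model. This is a sketch only: `lmsw` is a hypothetical function that treats CR0 as a plain integer and applies the two rules stated above (only the bottom 4 bits are writable, and bit 0 cannot be cleared).

```python
def lmsw(cr0, src16):
    """Return the CR0 value after LMSW src16 on an 80386 or later."""
    low4 = src16 & 0xF          # only bits 3:0 of the source are used
    sticky_pe = cr0 & 1         # bit 0 can be set but never cleared
    return (cr0 & ~0xF) | low4 | sticky_pe
```

Here `lmsw(0, 1)` returns 1 (Protected Mode entered), while `lmsw(1, 0)` still returns 1.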

[13] If CR4.UMIP=1 is set, then the SGDT, SIDT, SLDT, SMSW and STR instructions can only run in Ring 0.
These instructions were unprivileged on all x86 CPUs from 80286 onwards until the introduction of UMIP in 2017.^[6] This has been a significant security problem for software-based virtualization, since it enables these instructions to be used by a VM guest to detect that it is running inside a VM.^[7]^[8]

[i286_extend16-14] The SMSW, SLDT and STR instructions always use an operand size of 16 bits when used with a memory argument. With a register argument on 80386 or later processors, wider destination operand sizes are available and behave as follows:
SMSW: Stores full CR0 in x86-64 long mode, undefined otherwise.

SLDT: Zero-extends 16-bit argument on Pentium Pro and later processors, undefined on earlier processors.

STR: Zero-extends 16-bit argument.

[15] In 64-bit long mode, the ARPL instruction is not available – the 63 /r opcode has been reassigned to the 64-bit-mode-only MOVSXD instruction.

[18] The ARPL instruction causes #UD in Real mode and Virtual 8086 Mode – Windows 95 and OS/2 2.x are known to make extensive use of this #UD to use the 63 opcode as a one-byte breakpoint to transition from Virtual 8086 Mode to kernel mode.^[9]^[10]

[20] Bits 19:16 of this mask are documented as "undefined" on Intel CPUs.^[11] On AMD CPUs, the mask is documented as 0x00FFFF00.

[lar_lsl_unmod-21] For the LAR and LSL instructions, if the specified segment descriptor could not be loaded, then the instruction's destination register is left unmodified.

[25] On some Intel CPU/microcode combinations from 2019 onwards, the VERW instruction also flushes microarchitectural data buffers. This enables it to be used as part of workarounds for Microarchitectural Data Sampling security vulnerabilities.^[12]^[13] Some of the microarchitectural buffer-flushing functions that have been added to VERW may require the instruction to be executed with a memory operand.^[14]

[i286_undoc-28] Undocumented, 80286 only.^[5]^[15]^[16] (A different variant of LOADALL with a different opcode and memory layout exists on 80386.)

[29] For the 32-bit string instructions, the ±± notation is used to indicate that the indicated register is post-decremented by 4 if EFLAGS.DF=1 and post-incremented by 4 otherwise.
For the operands where the DS segment is indicated, the DS segment can be overridden by a segment-override prefix – where the ES segment is indicated, the segment is always ES and cannot be overridden.
Whether the 16-bit SI/DI registers or the 32-bit ESI/EDI registers are used as the address registers is determined by AddressSize, which can be overridden with the 67 prefix.

[30] The 32-bit string instructions accept repeat-prefixes in the same way as older 8/16-bit string instructions.
For LODSD, STOSD, MOVSD, INSD and OUTSD, the REP prefix (F3) will repeat the instruction the number of times specified in rCX (CX or ECX, decided by AddressSize), decrementing rCX for each iteration (with rCX=0 resulting in no-op and proceeding to the next instruction).
For CMPSD and SCASD, the REPE (F3) and REPNE (F2) prefixes are available, which will repeat the instruction, decrementing rCX for each iteration, but only as long as the flag condition (ZF=1 for REPE, ZF=0 for REPNE) holds true AND rCX ≠ 0.
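
The repeat rules for the compare-type string instructions can be sketched as a loop. The model below is an illustration under stated assumptions: `repe_cmpsd` is a hypothetical function, the two memory regions are represented as Python lists of 32-bit values, and EFLAGS.DF is assumed to be 0 so that addresses advance upwards.

```python
def repe_cmpsd(src, dst, rcx):
    """Model REPE CMPSD with DF=0 on two lists of 32-bit values.

    Returns (bytes_advanced, remaining_rcx, zf); zf is None when the
    instruction never executed, i.e. the flags were left untouched.
    """
    i = 0
    zf = None
    while rcx != 0:                 # rCX = 0 on entry: no iterations at all
        zf = 1 if src[i] == dst[i] else 0
        i += 1                      # ESI/EDI are post-incremented by 4
        rcx -= 1                    # rCX is decremented on every iteration
        if zf != 1:                 # REPE stops once ZF=0
            break
    return 4 * i, rcx, zf
```

Note that rCX is decremented even for the iteration that ends the loop, so after a mismatch it counts the elements that were not examined.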

[31] For the INSB/W/D instructions, the memory access rights for the ES:[rDI] memory address might not be checked until after the port access has been performed – if this check fails (e.g. page fault or other memory exception), then the data item read from the port is lost. As such, it is not recommended to use this instruction to access an I/O port that performs any kind of side effect upon read.

[32] I/O port access is only allowed when CPL≤IOPL or the I/O port permission bitmap bits for the port to access are all set to 0.
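
The access rule above can be written out as a predicate. The sketch below is hedged in several ways: `io_allowed` is a hypothetical name, the bitmap is assumed to use one bit per port with port `p` stored in bit `p & 7` of byte `p >> 3`, and ports that fall outside the supplied bitmap are treated as denied.

```python
def io_allowed(cpl, iopl, bitmap, port, width=1):
    """Return True if an access of `width` bytes at `port` is permitted."""
    if cpl <= iopl:
        return True                      # privileged enough: bitmap not consulted
    for p in range(port, port + width):  # every accessed port must be clear
        byte = p >> 3
        if byte >= len(bitmap) or (bitmap[byte] >> (p & 7)) & 1:
            return False
    return True
```

A multi-byte access is only allowed when the bits for all of the ports it covers are 0.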

[33] The CWDE instruction sign-extends the 16-bit value in AX into the 32-bit EAX register. It differs from the older CWD instruction, which sign-extends the 16-bit value in AX into a 32-bit value in the DX:AX register pair.

[34] For the E3 opcode (JCXZ/JECXZ), the choice of whether the instruction will use CX or ECX for its comparison (and consequently which mnemonic to use) is based on the AddressSize, not OperandSize. (OperandSize instead controls whether the jump destination should be truncated to 16 bits or not).
This also applies to the loop instructions LOOP,LOOPE,LOOPNE (opcodes E0,E1,E2), however, unlike JCXZ/JECXZ, these instructions have not been given new mnemonics for their ECX-using variants.

[35] For PUSHA(D), the value of SP/ESP pushed onto the stack is the value it had just before the PUSHA(D) instruction started executing.

[36] For POPA/POPAD, the stack item corresponding to SP/ESP is popped off the stack (performing a memory read), but not placed into SP/ESP.

[37] The PUSHFD and POPFD instructions will cause a #GP exception if executed in Virtual-8086 mode while IOPL is not 3.
The PUSHF, POPF, IRET and IRETD instructions will cause a #GP exception if executed in Virtual-8086 mode if IOPL is not 3 and VME is not enabled.

[40] If IRETD is used to return from kernel mode to user mode (which will entail a CPL change) and the user-mode stack segment indicated by SS is a 16-bit segment, then the IRETD instruction will only restore the low 16 bits of the stack pointer (ESP/RSP), with the remaining bits keeping whatever value they had in kernel code before the IRETD. This has necessitated complex workarounds on both Linux ("ESPFIX")^[17] and Windows.^[18] This issue also affects the later 64-bit IRETQ instruction.

[bt_offsetting-41] For the BT, BTS, BTR and BTC instructions:
If the first argument to the instruction is a register operand and/or the second argument is an immediate, then the bit-index in the second argument is taken modulo operand size (16/32/64, in effect using only the bottom 4, 5 or 6 bits of the index.)

If the first argument is a memory operand and the second argument is a register operand, then the bit-index in the second argument is used in full – it is interpreted as a signed bit-index that is used to offset the memory address to use for the bit test.
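
The two cases can be contrasted in a short sketch. Both function names are hypothetical, and the assumption that the memory form steps the address in units of the operand size (for example 4 bytes per 32 bits of index) reflects a reading of the vendor manuals that should be checked against them before relying on it.

```python
def bt_masked_index(bit_index, opsize=32):
    """Register destination or immediate index: index taken modulo operand size."""
    return bit_index % opsize

def bt_memory_target(base_addr, bit_index, opsize=32):
    """Memory destination with register index: signed index offsets the address.

    Returns (address_of_unit, bit_within_unit).
    """
    unit_bytes = opsize // 8
    # Python's // and % floor towards minus infinity, so negative indexes
    # select units below base_addr with a non-negative bit position.
    return base_addr + unit_bytes * (bit_index // opsize), bit_index % opsize
```

For example, with a 32-bit operand size a bit index of -1 at address 1000h refers to bit 31 of the dword at 0FFCh.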

[bt_atomic-42] The BTS, BTC and BTR instructions accept the LOCK (F0) prefix when used with a memory argument – this results in the instruction executing atomically.

[43] If the F3 prefix is used with the 0F BC /r opcode, then the instruction will execute as TZCNT on systems that support the BMI1 extension. TZCNT differs from BSF in that TZCNT but not BSF is defined to return operand size if the source operand is zero – for other source operand values, they produce the same result (except for flags).

[bsf_bsr_zero-44] BSF and BSR set the EFLAGS.ZF flag to 1 if the source argument was all-0s and 0 otherwise.
If the source argument was all-0s, then the destination register is documented as being left unchanged on AMD processors, but set to an undefined value on Intel processors.

[45] If the F3 prefix is used with the 0F BD /r opcode, then the instruction will execute as LZCNT on systems that support the ABM or LZCNT extensions. LZCNT produces a different result from BSR for most input values.
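
The relationship between the four bit-scan operations can be illustrated with a small model. The function names below simply reuse the mnemonics and are not a real API; `None` is used here as a stand-in for the "unchanged or undefined" destination of BSF/BSR on a zero source, and flags are not modelled.

```python
def bsf(x, bits=32):
    """Index of the lowest set bit, or None for a zero source."""
    x &= (1 << bits) - 1
    return (x & -x).bit_length() - 1 if x else None

def tzcnt(x, bits=32):
    """Count of trailing zero bits; returns operand size for a zero source."""
    x &= (1 << bits) - 1
    return (x & -x).bit_length() - 1 if x else bits

def bsr(x, bits=32):
    """Index of the highest set bit, or None for a zero source."""
    x &= (1 << bits) - 1
    return x.bit_length() - 1 if x else None

def lzcnt(x, bits=32):
    """Count of leading zero bits; returns operand size for a zero source."""
    x &= (1 << bits) - 1
    return bits - x.bit_length()
```

For a nonzero 32-bit source, `lzcnt(x)` equals `31 - bsr(x)`, which is why the two results coincide only in rare cases, whereas `tzcnt(x)` and `bsf(x)` agree for every nonzero source.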

[shld_shamt-47] For SHLD and SHRD, the shift-amount is masked – the bottom 5 bits are used for 16/32-bit operand size and 6 bits for 64-bit operand size.
SHLD and SHRD with 16-bit arguments and a shift-amount greater than 16 produce undefined results. (Actual results differ between different Intel CPUs, with at least three different behaviors known.^[19])
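
The masking of the shift amount can be sketched as follows. `shld` is a hypothetical model function; flags are ignored, and the undefined 16-bit case is rejected with an exception as a modelling choice, since real processors produce some implementation-specific result instead.

```python
def shld(dest, src, count, bits=32):
    """Model SHLD dest, src, count: shift dest left, filling from the top of src."""
    count &= 63 if bits == 64 else 31     # 6 bits for 64-bit operands, else 5
    if bits == 16 and count > 16:
        raise ValueError("undefined result for 16-bit operands")
    if count == 0:
        return dest                       # masked count of 0: no change
    mask = (1 << bits) - 1
    return ((dest << count) | ((src & mask) >> (bits - count))) & mask
```

Because of the masking, a 32-bit `SHLD` by 40 behaves like a shift by 8, and a shift by 32 leaves the destination unchanged.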

[63] The operation of the F1(ICEBP) opcode differs from the operation of the regular software interrupt opcode CD 01 in several ways:
In protected mode, CD 01 will check CPL against the interrupt descriptor's DPL field as an access-rights check, while F1 will not.
In virtual-8086 mode, CD 01 will also check CPL against IOPL as an access-rights check, while F1 will not.
In virtual-8086 mode with VME enabled, interrupt redirection is supported for CD 01 but not F1.

[44] In virtual-8086 mode, CD 01 will also check CPL against IOPL as an access-rights check, while F1 will not.

[45] In virtual-8086 mode with VME enabled, interrupt redirection is supported for CD 01 but not F1.

[64] The UMOV instruction is present on 386 and 486 processors only.^[22]

[xbts_discon-67] The XBTS and IBTS instructions were discontinued with the B1 stepping of 80386.
They have been used by software mainly for detection of the buggy^[26] B0 stepping of the 80386. Microsoft Windows (v2.01 and later) will attempt to run the XBTS instruction as part of its CPU detection if CPUID is not present, and will refuse to boot if XBTS is found to be working.^[27]

[xbts_op-69] For XBTS and IBTS, the r/m argument represents the data to extract/insert a bitfield from/to, the reg argument the bitfield to be inserted/extracted, AX/EAX a bit-offset and CL a bitfield length.^[28]

[i386_loadall-71] Undocumented, 80386 only.^[29]

[74] Using BSWAP with 16-bit registers is not disallowed per se (it will execute without producing an #UD or other exceptions) but is documented to produce undefined results – it is reported to produce various different results on 486,^[30] 586, and Bochs/QEMU.^[31]

[i486_cmpxchg-78] On Intel 80486 stepping A,^[32] the CMPXCHG instruction uses a different encoding: 0F A6 /r for the 8-bit variant and 0F A7 /r for the 16/32-bit variant. The 0F B0/B1 encodings are used on 80486 stepping B and later.^[33]^[34]

[79] The CMPXCHG instruction sets EFLAGS in the same way as a CMP instruction that uses the accumulator (AL/AX/EAX/RAX) as its first argument would do.
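
The flag behaviour described in this note can be illustrated with a small Python model of a 32-bit CMPXCHG. This is only a sketch under stated assumptions: the names `cmp_flags32` and `cmpxchg32` are hypothetical, only ZF, CF, SF and OF are modelled (PF and AF are omitted), and the LOCK prefix and memory semantics are ignored.

```python
MASK32 = 0xFFFFFFFF

def _sign(v):
    """Return the sign bit (bit 31) of a 32-bit value."""
    return (v >> 31) & 1

def cmp_flags32(a, b):
    """Flags that a 32-bit `CMP a, b` would produce (ZF, CF, SF, OF only)."""
    a &= MASK32
    b &= MASK32
    diff = (a - b) & MASK32
    return {
        "ZF": int(diff == 0),
        "CF": int(a < b),  # unsigned borrow
        "SF": _sign(diff),
        # Signed overflow: operands differ in sign and the result's sign
        # differs from the sign of the first operand.
        "OF": int(_sign(a) != _sign(b) and _sign(diff) != _sign(a)),
    }

def cmpxchg32(eax, dest, src):
    """Hypothetical model of `CMPXCHG dest, src`.

    Returns (new_eax, new_dest, flags).
    """
    flags = cmp_flags32(eax, dest)  # accumulator is the first CMP operand
    if flags["ZF"]:
        dest = src & MASK32         # equal: src is stored into dest
    else:
        eax = dest & MASK32         # not equal: dest is loaded into EAX
    return eax, dest, flags
```

For instance, `cmpxchg32(1, 2, 9)` leaves the destination unchanged, loads 2 into the accumulator, and reports the same flags as `CMP 1, 2` (a borrow, so CF is set).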

[80] INVLPG executes as a no-operation if the m8 argument is invalid (e.g. unmapped page or non-canonical address).
INVLPG can be used to invalidate TLB entries for individual global pages.

[invd_scope-81] The INVD and WBINVD instructions will invalidate all cache lines in the CPU's L1 caches. It is implementation-defined whether they will invalidate L2/L3 caches as well.
These instructions are serializing – on some processors, they may block interrupts until completion as well.

[83] Under Intel VT-x virtualization, the INVD instruction will cause a mandatory #VMEXIT. Also, on processors that support Intel SGX, if the PRM (Processor Reserved Memory) has been set up by using the PRMRRs (PRM range registers), then the INVD instruction is not permitted and will cause a #GP(0) exception.^[35]

[84] If the F3 prefix is used with the 0F 09 opcode, then the instruction will execute as WBNOINVD on processors that support the WBNOINVD extension – this will not invalidate the cache.

[p5rd_clear_hi32-85] In 64-bit mode, the RDMSR, RDTSC and RDPMC instructions will set the top 32 bits of RDX and RAX to zero.
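
One practical consequence is that, in 64-bit mode, the 64-bit result can be reassembled from RDX and RAX with a plain shift and OR, with no extra masking. The Python sketch below models this for RDTSC; the function names `rdtsc_writeback` and `combine` are hypothetical and the counter value is supplied by the caller rather than read from hardware.

```python
MASK32 = 0xFFFFFFFF

def rdtsc_writeback(tsc):
    """Model the (rax, rdx) register pair after RDTSC in 64-bit mode."""
    rax = tsc & MASK32          # low half; upper 32 bits of RAX are zero
    rdx = (tsc >> 32) & MASK32  # high half; upper 32 bits of RDX are zero
    return rax, rdx

def combine(rax, rdx):
    """Rebuild the 64-bit value; valid because both upper halves are zero."""
    return (rdx << 32) | rax
```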

[88] On Intel and AMD CPUs, the WRMSR instruction is also used to update the CPU microcode. This is done by writing the virtual address of the new microcode to upload to MSR 79h on Intel CPUs and MSR C001_0020h^[37] on AMD CPUs.

[94] Writes to the following MSRs are not serializing:^[38]^[39]

Number	Name
48h	SPEC_CTRL
49h	PRED_CMD
10Bh	FLUSH_CMD
122h	TSX_CTRL
6E0h	TSC_DEADLINE
6E1h	PKRS
774h	HWP_REQUEST (non-serializing only if the FAST_IA32_HWP_REQUEST bit is set)
802h to 83Fh	(x2APIC MSRs)
1B01h	UARCH_MISC_CTL
C001_0100h	FS_BASE (non-serializing on AMD Zen 4 and later)^[40]
C001_0101h	GS_BASE (Zen 4 and later)
C001_0102h	KernelGSbase (Zen 4 and later)
C001_011Bh	Doorbell Register (AMD-specific)

WRMSR to the x2APIC ICR (Interrupt Command Register; MSR 830h) is commonly used to produce an IPI (inter-processor interrupt). On Intel^[41] but not AMD^[42] CPUs, such an IPI can be reordered before an older memory store.

[98] System Management Mode and the RSM instruction were made available on non-SL variants of the Intel 486 only after the initial release of the Intel Pentium in 1993.

[102] On some older 32-bit processors, executing CPUID with a leaf index (EAX) greater than 0 may leave EBX and ECX unmodified, keeping their old values. For this reason, it is recommended to zero out EBX and ECX before executing CPUID.
Processors noted to exhibit this behavior include Cyrix MII^[47] and IDT WinChip 2.^[48]

In 64-bit mode, CPUID will set the top 32 bits of RAX, RBX, RCX and RDX to zero.

[106] On some Intel processors starting from Ivy Bridge, there exist MSRs that can be used to restrict CPUID to ring 0. Such MSRs are documented for at least Ivy Bridge^[49] and Denverton.^[50]
The ability to restrict CPUID to ring 0 also exists on AMD processors supporting the "CpuidUserDis" feature (Zen 4 "Raphael" and later).^[51]

[cpuid_backported-107] CPUID is also available on some Intel and AMD 486 processor variants that were released after the initial release of the Intel Pentium.

[108] On the Cyrix 5x86 and 6x86 CPUs, CPUID is not enabled by default and must be enabled through a Cyrix configuration register.

[110] On NexGen CPUs, CPUID is only supported with some system BIOSes. On some NexGen CPUs that do support CPUID, EFLAGS.ID is not supported but EFLAGS.AC is, complicating CPU detection.^[52]

[111] Unlike the older CMPXCHG instruction, the CMPXCHG8B instruction does not modify any EFLAGS bits other than ZF.

[112] LOCK CMPXCHG8B with a register operand (which is an invalid encoding) will, on some Intel Pentium CPUs, cause a hang rather than the expected #UD exception. This is known as the Pentium F00F bug.

[cmpxchg8b_ntbug-114] On IDT WinChip, Transmeta Crusoe and Rise mP6 processors, the CMPXCHG8B instruction is always supported, but its CPUID bit may be missing. This is a workaround for a bug in Windows NT.^[53]

[rdtsc_pmc_unordered-116] The RDTSC and RDPMC instructions are not ordered with respect to other instructions, and may sample their respective counters before earlier instructions are executed or after later instructions have executed. Invocations of RDPMC (but not RDTSC) may be reordered relative to each other even for reads of the same counter.
In order to impose ordering with respect to other instructions, LFENCE or serializing instructions (e.g. CPUID) are needed.^[54]

[121] Fixed-rate TSC was introduced in two stages:
Constant TSC
TSC running at a fixed rate as long as the processor core is not in a deep-sleep (C2 or deeper) mode, but not synchronized between CPU cores. Introduced in Intel Prescott, Yonah and Bonnell. Also present in all Transmeta and VIA Nano^[55] CPUs, as well as AMD Geode LX.^[56] Does not have a CPUID bit.
Invariant TSC
TSC running at a fixed rate, and remaining synchronized between CPU cores in all P-, C- and T-states (but not necessarily S-states).
Present in AMD K10 and later; Intel Nehalem/Saltwell^[57] and later; Zhaoxin WuDaoKou^[58] and later. Indicated with a CPUID bit (leaf 8000_0007:EDX[8]).
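
As a minimal sketch of how the invariant-TSC indication might be checked, the Python function below tests bit 8 of an EDX value that is assumed to have been obtained from CPUID leaf 8000_0007h by some other means. The function name `has_invariant_tsc` is hypothetical, and the code does not execute CPUID itself.

```python
INVARIANT_TSC_BIT = 8  # CPUID leaf 8000_0007h, EDX bit 8

def has_invariant_tsc(edx_leaf_80000007):
    """Return True if the invariant-TSC bit is set in the given EDX value."""
    return bool((edx_leaf_80000007 >> INVARIANT_TSC_BIT) & 1)
```

Since constant TSC has no CPUID bit, a check of this kind can only detect the invariant variant.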

[124] RDTSC can be run outside Ring 0 only if CR4.TSD=0.
On Intel Pentium and AMD K5/K6, RDTSC cannot be run in Virtual-8086 mode.^[59]^[60] Later processors (Pentium Pro, Athlon 64) removed this restriction.

[125] RDPMC can be run outside Ring 0 only if CR4.PCE=1.

[126] The RDPMC instruction is not present in VIA processors prior to the Nano.

[127] The condition codes supported for the CMOVcc instruction (opcode 0F 4x /r, with the x nibble specifying the condition) are:

x	cc	Condition (EFLAGS)
0	O	OF=1: "Overflow"
1	NO	OF=0: "Not Overflow"
2	C,B,NAE	CF=1: "Carry", "Below", "Not Above or Equal"
3	NC,NB,AE	CF=0: "Not Carry", "Not Below", "Above or Equal"
4	Z,E	ZF=1: "Zero", "Equal"
5	NZ,NE	ZF=0: "Not Zero", "Not Equal"
6	NA,BE	(CF=1 or ZF=1): "Not Above", "Below or Equal"
7	A,NBE	(CF=0 and ZF=0): "Above", "Not Below or Equal"
8	S	SF=1: "Sign"
9	NS	SF=0: "Not Sign"
A	P,PE	PF=1: "Parity", "Parity Even"
B	NP,PO	PF=0: "Not Parity", "Parity Odd"
C	L,NGE	SF≠OF: "Less", "Not Greater Or Equal"
D	NL,GE	SF=OF: "Not Less", "Greater Or Equal"
E	LE,NG	(ZF=1 or SF≠OF): "Less or Equal", "Not Greater"
F	NLE,G	(ZF=0 and SF=OF): "Not Less or Equal", "Greater"
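
The table above has a regular structure: each odd nibble is the logical negation of the even nibble before it. The following Python sketch uses that observation to evaluate a condition nibble against a set of flag values. The function name `cmov_condition` and its keyword parameters are hypothetical and serve only to illustrate the table.

```python
def cmov_condition(x, cf=0, zf=0, sf=0, of=0, pf=0):
    """Evaluate condition nibble x (0..15) against the given EFLAGS bits."""
    if not 0 <= x <= 15:
        raise ValueError("condition nibble must be in the range 0..15")
    # Conditions for the even nibbles 0, 2, 4, ..., E, indexed by x >> 1.
    even = [
        of == 1,              # 0: O
        cf == 1,              # 2: C, B, NAE
        zf == 1,              # 4: Z, E
        cf == 1 or zf == 1,   # 6: NA, BE
        sf == 1,              # 8: S
        pf == 1,              # A: P, PE
        sf != of,             # C: L, NGE
        zf == 1 or sf != of,  # E: LE, NG
    ][x >> 1]
    # The low bit of the nibble inverts the condition.
    return even != bool(x & 1)
```

For example, `cmov_condition(0xF, zf=0, sf=1, of=1)` evaluates the "Greater" condition with ZF=0 and SF=OF, which holds.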

[128] In 64-bit mode, CMOVcc with a 32-bit operand size will clear the upper 32 bits of the destination register even if the condition is false.
For CMOVcc with a memory source operand, the CPU will always read the operand from memory – potentially causing memory exceptions and cache line-fills – even if the condition for the move is not satisfied. (The Intel APX extension defines a set of new EVEX-encoded variants of CMOVcc that will suppress memory exceptions if the condition is false.)
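
The zero-extension behaviour in 64-bit mode can be modelled as follows. This is an illustrative Python sketch only; the function name `cmov32_in_64bit_mode` is hypothetical, and the condition is passed in as a boolean rather than derived from EFLAGS.

```python
MASK32 = 0xFFFFFFFF

def cmov32_in_64bit_mode(dest64, src32, condition):
    """Model the 64-bit destination register after a 32-bit CMOVcc."""
    chosen = src32 if condition else dest64
    # The upper 32 bits are cleared whether or not the move takes place.
    return chosen & MASK32
```

With a false condition the low 32 bits of the destination are preserved, but its upper half is still lost, so a 32-bit CMOVcc cannot be treated as a no-op on the full 64-bit register.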

[130] On pre-Nehemiah VIA C3 variants ("Samuel"/"Ezra"), the reg,reg but not reg,[mem] forms of the CMOVcc instructions have been reported to be present as undocumented instructions.^[61]

[132] Intel's recommended byte encodings for multi-byte NOPs of lengths 2 to 9 bytes in 32/64-bit mode are (in hex):^[62]

Length	Byte Sequence
2	66 90
3	0F 1F 00
4	0F 1F 40 00
5	0F 1F 44 00 00
6	66 0F 1F 44 00 00
7	0F 1F 80 00 00 00 00
8	0F 1F 84 00 00 00 00 00
9	66 0F 1F 84 00 00 00 00 00

For cases where there is a need to use more than 9 bytes of NOP padding, it is recommended to use multiple NOPs.
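
A padding generator based on the table above might look like the following Python sketch. The names `LONG_NOPS` and `nop_padding` are hypothetical. The 1-byte entry (the ordinary NOP, opcode 90) is not part of the table and is added here as an assumption so that every length can be covered; using the largest available NOP first is one possible strategy, not a documented requirement.

```python
# Byte sequences from the table, plus the ordinary 1-byte NOP (90).
LONG_NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("66 90"),
    3: bytes.fromhex("0F 1F 00"),
    4: bytes.fromhex("0F 1F 40 00"),
    5: bytes.fromhex("0F 1F 44 00 00"),
    6: bytes.fromhex("66 0F 1F 44 00 00"),
    7: bytes.fromhex("0F 1F 80 00 00 00 00"),
    8: bytes.fromhex("0F 1F 84 00 00 00 00 00"),
    9: bytes.fromhex("66 0F 1F 84 00 00 00 00 00"),
}

def nop_padding(n):
    """Return n bytes of NOP padding built from multiple NOPs of up to 9 bytes."""
    if n < 0:
        raise ValueError("padding length must be non-negative")
    out = bytearray()
    while n > 0:
        chunk = min(n, 9)  # use the longest NOP that still fits
        out += LONG_NOPS[chunk]
        n -= chunk
    return bytes(out)
```

A request for 20 bytes, for example, is filled with two 9-byte NOPs followed by one 2-byte NOP.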

[133] Unlike other instructions added in Pentium Pro, long NOP does not have a CPUID feature bit.

[137] 0F 1F /0 as long-NOP was introduced in the Pentium Pro, but remained undocumented until 2006.^[64] The whole 0F 18..1F opcode range was NOP in Pentium Pro. However, except for 0F 1F /0, Intel does not guarantee that these opcodes will remain NOP in future processors, and has indeed assigned some of these opcodes to other instructions in at least some processors.^[65]

[139] Documented for AMD x86-64 since 2002.^[66]

[142] While the 0F 0B opcode was officially reserved as an invalid opcode from Pentium onwards, it only got assigned the mnemonic UD2 from Pentium Pro onwards.^[68]

[ud2_binutils-145] GNU Binutils have used the UD2A and UD2B mnemonics for the 0F 0B and 0F B9 opcodes since version 2.7.^[69]
Neither UD2A nor UD2B originally took any arguments; UD2B was later modified to accept a ModR/M byte in Binutils version 2.30.^[70]

[147] The UD2 (0F 0B) instruction will additionally stop subsequent bytes from being decoded as instructions, even speculatively. For this reason, if an indirect branch instruction is followed by something that is not code, it is recommended to place a UD2 instruction after the indirect branch.^[71]
