
X86 instruction listings

The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable program, often stored as a computer file and executed on the processor.

The x86 instruction set has been extended several times, introducing wider registers and datatypes as well as new functionality.[1]

x86 integer instructions

Below is the full 8086/8088 instruction set of Intel (81 instructions total).[2] These instructions are also available in 32-bit mode, in which they operate on 32-bit registers (eax, ebx, etc.) and values instead of their 16-bit (ax, bx, etc.) counterparts. The updated instruction set is grouped according to architecture (i186, i286, i386, i486, i586/i686) and is referred to as (32-bit) x86 and (64-bit) x86-64 (also known as AMD64).

Original 8086/8088 instructions

This is the original instruction set. In the 'Notes' column, r means register, m means memory address and imm means immediate (i.e. a value).

Added in specific processors

Added with 80186/80188

Added with 80286

The new instructions added in 80286 add support for x86 protected mode. Some but not all of the instructions are available in real mode as well.

  1. ^ a b c d The descriptors used by the LGDT, LIDT, SGDT and SIDT instructions consist of a 2-part data structure. The first part is a 16-bit value, specifying table size in bytes minus 1. The second part is a 32-bit value (64-bit value in 64-bit mode), specifying the linear start address of the table.
    For LGDT and LIDT with a 16-bit operand size, the address is ANDed with 00FFFFFFh. On Intel (but not AMD) CPUs, the SGDT and SIDT instructions with a 16-bit operand size are documented (as of Intel SDM revision 079, March 2023) to write a descriptor to memory with the last byte set to 0. However, the observed behavior is that bits 31:24 of the descriptor table address are written instead.[4]
  2. ^ a b c d The LGDT, LIDT, LLDT and LTR instructions are serializing on Pentium and later processors.
  3. ^ The LMSW instruction is serializing on Intel processors from Pentium onwards, but not on AMD processors.
  4. ^ On 80386 and later, the "Machine Status Word" is the same as the CR0 control register – however, the LMSW instruction can only modify the bottom 4 bits of this register and cannot clear bit 0. The inability to clear bit 0 means that LMSW can be used to enter but not leave x86 Protected Mode.
    On 80286, it is not possible to leave Protected Mode at all (neither with LMSW nor with LOADALL[5]) without a CPU reset – on 80386 and later, it is possible to leave Protected Mode, but this requires the use of the 80386-and-later MOV to CR0 instruction.
  5. ^ If CR4.UMIP=1 is set, then the SGDT, SIDT, SLDT, SMSW and STR instructions can only run in Ring 0.
    These instructions were unprivileged on all x86 CPUs from 80286 onwards until the introduction of UMIP in 2017.[6] This has been a significant security problem for software-based virtualization, since it enables these instructions to be used by a VM guest to detect that it is running inside a VM.[7][8]
  6. ^ a b c The SMSW, SLDT and STR instructions always use an operand size of 16 bits when used with a memory argument. With a register argument on 80386 or later processors, wider destination operand sizes are available and behave as follows:
    • SMSW: Stores full CR0 in x86-64 long mode, undefined otherwise.
    • SLDT: Zero-extends 16-bit argument on Pentium Pro and later processors, undefined on earlier processors.
    • STR: Zero-extends 16-bit argument.
  7. ^ In 64-bit long mode, the ARPL instruction is not available – the 63 /r opcode has been reassigned to the 64-bit-mode-only MOVSXD instruction.
  8. ^ The ARPL instruction causes #UD in Real mode and Virtual 8086 Mode – Windows 95 and OS/2 2.x are known to make extensive use of this #UD to use the 63 opcode as a one-byte breakpoint to transition from Virtual 8086 Mode to kernel mode.[9][10]
  9. ^ Bits 19:16 of this mask are documented as "undefined" on Intel CPUs.[11] On AMD CPUs, the mask is documented as 0x00FFFF00.
  10. ^ a b For the LAR and LSL instructions, if the specified segment descriptor could not be loaded, then the instruction's destination register is left unmodified.
  11. ^ On some Intel CPU/microcode combinations from 2019 onwards, the VERW instruction also flushes microarchitectural data buffers. This enables it to be used as part of workarounds for Microarchitectural Data Sampling security vulnerabilities.[12][13] Some of the microarchitectural buffer-flushing functions that have been added to VERW may require the instruction to be executed with a memory operand.[14]
  12. ^ a b Undocumented, 80286 only.[5][15][16] (A different variant of LOADALL with a different opcode and memory layout exists on 80386.)
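
The LMSW restriction described in the notes above (only the bottom 4 bits of CR0 can be modified, and bit 0 cannot be cleared) can be sketched as a small C model. This is an illustrative simulation, not code that touches real control registers; the function name `lmsw_apply` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of LMSW on an 80386-or-later CPU.
   cr0: current CR0 value; msw: the 16-bit LMSW source operand.
   Returns the CR0 value after the instruction. */
uint32_t lmsw_apply(uint32_t cr0, uint16_t msw)
{
    uint32_t low4 = msw & 0xFu;   /* only bits 3:0 of the operand are used */
    low4 |= cr0 & 0x1u;           /* bit 0 can be set but never cleared */
    return (cr0 & ~0xFu) | low4;  /* bits 31:4 of CR0 are left untouched */
}
```

In this model, `lmsw_apply(cr0, 0)` applied to a CR0 value with bit 0 set still returns a value with bit 0 set, which mirrors why LMSW can enter but not leave protected mode.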

Added with 80386

The 80386 added support for 32-bit operation to the x86 instruction set. This was done by widening the general-purpose registers to 32 bits and introducing the concepts of OperandSize and AddressSize – most instruction forms that would previously take 16-bit data arguments were given the ability to take 32-bit arguments by setting their OperandSize to 32 bits, and instructions that could take 16-bit address arguments were given the ability to take 32-bit address arguments by setting their AddressSize to 32 bits. (Instruction forms that work on 8-bit data continue to be 8-bit regardless of OperandSize. Using a data size of 16 bits will cause only the bottom 16 bits of the 32-bit general-purpose registers to be modified – the top 16 bits are left unchanged.)
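
The partial-register behavior described above can be illustrated with a short C model. It is a sketch of the architectural rule only; the function names `write16` and `write8_low` are made up for this example.

```c
#include <stdint.h>

/* Model of writing a 16-bit result into a 32-bit general-purpose
   register (e.g. writing AX within EAX): bits 31:16 are preserved. */
uint32_t write16(uint32_t reg, uint16_t value)
{
    return (reg & 0xFFFF0000u) | value;
}

/* Model of writing an 8-bit result into the low byte of a register
   (e.g. writing AL within EAX): bits 31:8 are preserved. */
uint32_t write8_low(uint32_t reg, uint8_t value)
{
    return (reg & 0xFFFFFF00u) | value;
}
```

For example, `write16(0x12345678, 0xABCD)` yields 0x1234ABCD in this model.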

The default OperandSize and AddressSize to use for each instruction is given by the D bit of the segment descriptor of the current code segment - D=0 makes both 16-bit, D=1 makes both 32-bit. Additionally, they can be overridden on a per-instruction basis with two new instruction prefixes that were introduced in the 80386:

  • 66h: OperandSize override. Will change OperandSize from 16-bit to 32-bit if CS.D=0, or from 32-bit to 16-bit if CS.D=1.
  • 67h: AddressSize override. Will change AddressSize from 16-bit to 32-bit if CS.D=0, or from 32-bit to 16-bit if CS.D=1.
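
Because each prefix simply toggles the default selected by CS.D, the effective size can be modelled as an exclusive-or of the two inputs. The following C sketch assumes 80386-style 16/32-bit operation only (no 64-bit mode); the function name is hypothetical.

```c
#include <stdbool.h>

/* Returns the effective OperandSize or AddressSize in bits (16 or 32).
   cs_d:       D bit of the current code segment descriptor.
   has_prefix: true if the matching override prefix is present
               (66h for OperandSize, 67h for AddressSize). */
int effective_size_bits(bool cs_d, bool has_prefix)
{
    bool is_32bit = (cs_d != has_prefix);  /* the prefix flips the default */
    return is_32bit ? 32 : 16;
}
```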

The 80386 also introduced the two new segment registers FS and GS as well as the x86 control, debug and test registers.

The new instructions introduced in the 80386 can broadly be subdivided into two classes:

  • Pre-existing opcodes that needed new mnemonics for their 32-bit OperandSize variants (e.g. CWDE, LODSD)
  • New opcodes that introduced new functionality (e.g. SHLD, SETcc)

For instruction forms where the operand size can be inferred from the instruction's arguments (e.g. ADD EAX,EBX can be inferred to have a 32-bit OperandSize due to its use of EAX as an argument), new instruction mnemonics are not needed and not provided.

  1. ^ For the 32-bit string instructions, the ±± notation is used to indicate that the indicated register is post-decremented by 4 if EFLAGS.DF=1 and post-incremented by 4 otherwise.
    For the operands where the DS segment is indicated, the DS segment can be overridden by a segment-override prefix – where the ES segment is indicated, the segment is always ES and cannot be overridden.
    The choice of whether to use the 16-bit SI/DI registers or the 32-bit ESI/EDI registers as the address registers to use is made by AddressSize, overridable with the 67 prefix.
  2. ^ The 32-bit string instructions accept repeat-prefixes in the same way as older 8/16-bit string instructions.
    For LODSD, STOSD, MOVSD, INSD and OUTSD, the REP prefix (F3) will repeat the instruction the number of times specified in rCX (CX or ECX, decided by AddressSize), decrementing rCX for each iteration (with rCX=0 resulting in no-op and proceeding to the next instruction).
    For CMPSD and SCASD, the REPE (F3) and REPNE (F2) prefixes are available, which will repeat the instruction, decrementing rCX for each iteration, but only as long as the flag condition (ZF=1 for REPE, ZF=0 for REPNE) holds true AND rCX ≠ 0.
  3. ^ For the INSB/W/D instructions, the memory access rights for the ES:[rDI] memory address might not be checked until after the port access has been performed – if this check fails (e.g. page fault or other memory exception), then the data item read from the port is lost. As such, it is not recommended to use this instruction to access an I/O port that performs any kind of side effect upon read.
  4. ^ I/O port access is only allowed when CPL≤IOPL or the I/O port permission bitmap bits for the port to access are all set to 0.
  5. ^ The CWDE instruction sign-extends the 16-bit value in AX into a 32-bit value in EAX. It differs from the older CWD instruction, which sign-extends the 16-bit value in AX into a 32-bit value in the DX:AX register pair.
  6. ^ For the E3 opcode (JCXZ/JECXZ), the choice of whether the instruction will use CX or ECX for its comparison (and consequently which mnemonic to use) is based on the AddressSize, not OperandSize. (OperandSize instead controls whether the jump destination should be truncated to 16 bits or not).
    This also applies to the loop instructions LOOPNE, LOOPE and LOOP (opcodes E0, E1 and E2 respectively). However, unlike JCXZ/JECXZ, these instructions have not been given new mnemonics for their ECX-using variants.
  7. ^ For PUSHA(D), the value of SP/ESP pushed onto the stack is the value it had just before the PUSHA(D) instruction started executing.
  8. ^ For POPA/POPAD, the stack item corresponding to SP/ESP is popped off the stack (performing a memory read), but not placed into SP/ESP.
  9. ^ The PUSHFD and POPFD instructions will cause a #GP exception if executed in Virtual-8086 mode while IOPL is not 3.
    The PUSHF, POPF, IRET and IRETD instructions will cause a #GP exception if executed in Virtual-8086 mode while IOPL is not 3 and VME is not enabled.
  10. ^ If IRETD is used to return from kernel mode to user mode (which will entail a CPL change) and the user-mode stack segment indicated by SS is a 16-bit segment, then the IRETD instruction will only restore the low 16 bits of the stack pointer (ESP/RSP), with the remaining bits keeping whatever value they had in kernel code before the IRETD. This has necessitated complex workarounds on both Linux ("ESPFIX")[17] and Windows.[18] This issue also affects the later 64-bit IRETQ instruction.
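
The repeat-prefix rules in the notes above can be sketched as a C model of REPNE SCASD with EFLAGS.DF=0. This is a simplified simulation under stated assumptions: memory is a plain array of dwords, rDI is represented as an array index, and the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t   index;  /* stands in for rDI, counted in dwords */
    uint32_t count;  /* stands in for rCX */
    bool     zf;     /* EFLAGS.ZF after the last comparison */
} ScanState;

/* Model of REPNE SCASD with DF=0: compare EAX against successive
   dwords, stopping when a match is found (ZF=1) or rCX reaches 0.
   zf_in is the incoming ZF, kept unchanged if rCX is already 0. */
ScanState repne_scasd(const uint32_t *mem, size_t start,
                      uint32_t count, uint32_t eax, bool zf_in)
{
    ScanState s = { start, count, zf_in };
    while (s.count != 0) {
        s.zf = (mem[s.index] == eax);  /* the SCASD comparison */
        s.index += 1;                  /* post-increment by one dword */
        s.count -= 1;                  /* rCX decremented every iteration */
        if (s.zf) {
            break;                     /* REPNE continues only while ZF=0 */
        }
    }
    return s;
}
```

Note that the index and count are updated even on the iteration that finds the match, so after a hit the index points one element past the matching dword.
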
  1. ^ a b c d For the BT, BTS, BTR and BTC instructions:
    • If the first argument to the instruction is a register operand and/or the second argument is an immediate, then the bit-index in the second argument is taken modulo operand size (16/32/64, in effect using only the bottom 4, 5 or 6 bits of the index.)
    • If the first argument is a memory operand and the second argument is a register operand, then the bit-index in the second argument is used in full – it is interpreted as a signed bit-index that is used to offset the memory address to use for the bit test.
  2. ^ a b c The BTS, BTC and BTR instructions accept the LOCK (F0) prefix when used with a memory argument – this results in the instruction executing atomically.
  3. ^ If the F3 prefix is used with the 0F BC /r opcode, then the instruction will execute as TZCNT on systems that support the BMI1 extension. TZCNT differs from BSF in that TZCNT, but not BSF, is defined to return the operand size if the source operand is zero. For other source operand values, they produce the same result (except for flags).
  4. ^ a b BSF and BSR set the EFLAGS.ZF flag to 1 if the source argument was all-0s and 0 otherwise.
    If the source argument was all-0s, then the destination register is documented as being left unchanged on AMD processors, but set to an undefined value on Intel processors.
  5. ^ If the F3 prefix is used with the 0F BD /r opcode, then the instruction will execute as LZCNT on systems that support the ABM or LZCNT extensions. LZCNT produces a different result from BSR for most input values.
  6. ^ a b For SHLD and SHRD, the shift-amount is masked – the bottom 5 bits are used for 16/32-bit operand size and 6 bits for 64-bit operand size.
    SHLD and SHRD with 16-bit arguments and a shift-amount greater than 16 produce undefined results. (Actual results differ between different Intel CPUs, with at least three different behaviors known.[19])
  7. ^ a b The condition codes supported for the SETcc and Jcc near instructions (opcodes 0F 9x /0 and 0F 8x respectively, with the x nibble specifying the condition) are:
    x cc Condition (EFLAGS)
    0 O OF=1: "Overflow"
    1 NO OF=0: "Not Overflow"
    2 C,B,NAE CF=1: "Carry", "Below", "Not Above or Equal"
    3 NC,NB,AE CF=0: "Not Carry", "Not Below", "Above or Equal"
    4 Z,E ZF=1: "Zero", "Equal"
    5 NZ,NE ZF=0: "Not Zero", "Not Equal"
    6 NA,BE (CF=1 or ZF=1): "Not Above", "Below or Equal"
    7 A,NBE (CF=0 and ZF=0): "Above", "Not Below or Equal"
    8 S SF=1: "Sign"
    9 NS SF=0: "Not Sign"
    A P,PE PF=1: "Parity", "Parity Even"
    B NP,PO PF=0: "Not Parity", "Parity Odd"
    C L,NGE SF≠OF: "Less", "Not Greater Or Equal"
    D NL,GE SF=OF: "Not Less", "Greater Or Equal"
    E LE,NG (ZF=1 or SF≠OF): "Less or Equal", "Not Greater"
    F NLE,G (ZF=0 and SF=OF): "Not Less or Equal", "Greater"
  8. ^ For SETcc, while the opcode is commonly specified as /0 – implying that bits 5:3 of the instruction's ModR/M byte should be 000 – modern x86 processors (Pentium and later) ignore bits 5:3 and will execute the instruction as SETcc regardless of the contents of these bits.
  9. ^ For LFS, LGS and LSS, the size of the offset part of the far pointer is given by operand size – the size of the segment part is always 16 bits. In 64-bit mode, using the REX.W prefix with these instructions will cause them to load a far pointer with a 64-bit offset on Intel but not AMD processors.
  10. ^ a b c d e f For MOV to/from the CRx, DRx and TRx registers, the reg part of the ModR/M byte is used to indicate the CRx/DRx/TRx register and the r/m part the general-purpose register. Uniquely for the MOV CRx/DRx/TRx opcodes, the top two bits of the ModR/M byte are ignored: these opcodes are decoded and executed as if the top two bits of the ModR/M byte were 11b.
  11. ^ a b c d For moves to/from the CRx and DRx registers, the operand size is always 64 bits in 64-bit mode and 32 bits otherwise.
  12. ^ On processors that support global pages (Pentium and later), global page table entries will not be flushed by a MOV to CR3 − instead, these entries can be flushed by toggling the CR4.PGE bit.
    On processors that support PCIDs, writing to CR3 while PCIDs are enabled will only flush TLB entries belonging to the PCID specified in bits 11:0 of the value written to CR3 (this flush can be suppressed by setting bit 63 of the written value to 1). Flushing pages belonging to other PCIDs can instead be done by toggling the CR4.PGE bit, clearing the CR4.PCIDE bit, or using the INVPCID instruction.
  13. ^ On processors prior to Pentium, moves to CR0 would not serialize the instruction stream – in part for this reason, it is usually required to perform a far jump[20] immediately after a MOV to CR0 if such a MOV is used to enable/disable protected mode and/or memory paging.
    MOV to CR2 is architecturally listed as serializing, but has been reported to be non-serializing on at least some Intel Core-i7 processors.[21]
    MOV to CR8 (introduced with x86-64) is serializing on AMD but not Intel processors.
  14. ^ a b The MOV TRx instructions were discontinued from Pentium onwards.
  15. ^ The INT1/ICEBP (F1) instruction is present on all known Intel x86 processors from the 80386 onwards,[22] but only fully documented for Intel processors from the May 2018 release of the Intel SDM (rev 067) onwards.[23] Before this release, mention of the instruction in Intel material was sporadic, e.g. AP-526 rev 001.[24]
    For AMD processors, the instruction has been documented since 2002.[25]
  16. ^ The operation of the F1(ICEBP) opcode differs from the operation of the regular software interrupt opcode CD 01 in several ways:
    • In protected mode, CD 01 will check CPL against the interrupt descriptor's DPL field as an access-rights check, while F1 will not.
    • In virtual-8086 mode, CD 01 will also check CPL against IOPL as an access-rights check, while F1 will not.
    • In virtual-8086 mode with VME enabled, interrupt redirection is supported for CD 01 but not F1.
  17. ^ The UMOV instruction is present on 386 and 486 processors only.[22]
  18. ^ a b The XBTS and IBTS instructions were discontinued with the B1 stepping of 80386.
    They have been used by software mainly for detection of the buggy[26] B0 stepping of the 80386. Microsoft Windows (v2.01 and later) will attempt to run the XBTS instruction as part of its CPU detection if CPUID is not present, and will refuse to boot if XBTS is found to be working.[27]
  19. ^ a b For XBTS and IBTS, the r/m argument represents the data to extract/insert a bitfield from/to, the reg argument the bitfield to be inserted/extracted, AX/EAX a bit-offset and CL a bitfield length.[28]
  20. ^ Undocumented, 80386 only.[29]
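
The condition-code table in the notes above has a regular structure: bits 3:1 of the x nibble select a base condition, and bit 0 negates it. A minimal C sketch of an evaluator follows; the macro and function names are hypothetical, while the EFLAGS bit positions are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFLAGS bit positions used by the condition codes. */
#define FLAG_CF (1u << 0)
#define FLAG_PF (1u << 2)
#define FLAG_ZF (1u << 6)
#define FLAG_SF (1u << 7)
#define FLAG_OF (1u << 11)

/* Returns true if condition code x (0x0..0xF) holds for eflags. */
bool condition_holds(unsigned x, uint32_t eflags)
{
    bool cf = (eflags & FLAG_CF) != 0;
    bool pf = (eflags & FLAG_PF) != 0;
    bool zf = (eflags & FLAG_ZF) != 0;
    bool sf = (eflags & FLAG_SF) != 0;
    bool of = (eflags & FLAG_OF) != 0;
    bool base;

    switch ((x >> 1) & 7u) {
    case 0:  base = of;               break;  /* O   */
    case 1:  base = cf;               break;  /* C/B */
    case 2:  base = zf;               break;  /* Z/E */
    case 3:  base = cf || zf;         break;  /* BE  */
    case 4:  base = sf;               break;  /* S   */
    case 5:  base = pf;               break;  /* P   */
    case 6:  base = (sf != of);       break;  /* L   */
    default: base = zf || (sf != of); break;  /* LE  */
    }
    return (x & 1u) ? !base : base;           /* odd codes are negations */
}
```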

Added with 80486

Instruction (opcode; ring)
    Description

BSWAP r32 (0F C8+r; ring 3)
    Byte Order Swap. Usually used to convert between big-endian and little-endian data representations. For 32-bit registers, the operation performed is:
    r =   (r << 24)
        | ((r << 8) & 0x00FF0000)
        | ((r >> 8) & 0x0000FF00)
        | (r >> 24);
    Using BSWAP with a 16-bit register argument produces an undefined result.[a]

CMPXCHG r/m8,r8 (0F B0 /r[b]; ring 3)
CMPXCHG r/m,r16 and CMPXCHG r/m,r32 (0F B1 /r[b]; ring 3)
    Compare and Exchange. If accumulator (AL/AX/EAX/RAX) compares equal to first operand,[c] then EFLAGS.ZF is set to 1 and the first operand is overwritten with the second operand. Otherwise, EFLAGS.ZF is set to 0, and first operand is copied into the accumulator.
    Instruction atomic only if used with LOCK prefix.

XADD r/m,r8 (0F C0 /r; ring 3)
XADD r/m,r16 and XADD r/m,r32 (0F C1 /r; ring 3)
    eXchange and ADD. Exchanges the first operand with the second operand, then stores the sum of the two values into the destination operand.
    Instruction atomic only if used with LOCK prefix.

INVLPG m8 (0F 01 /7; ring 0)
    Invalidate the TLB entries that would be used for the 1-byte memory operand.[d]
    Instruction is serializing.

INVD (0F 08; ring 0)
    Invalidate Internal Caches.[e] Modified data in the cache are not written back to memory, potentially causing data loss.[f]

WBINVD (NFx 0F 09[g]; ring 0)
    Write Back and Invalidate Cache.[e] Writes back all modified cache lines in the processor's internal cache to main memory and invalidates the internal caches.

  1. ^ Using BSWAP with 16-bit registers is not disallowed per se (it will execute without producing a #UD or other exception) but is documented to produce undefined results. It is reported to produce various different results on 486,[30] 586, and Bochs/QEMU.[31]
  2. ^ a b On Intel 80486 stepping A,[32] the CMPXCHG instruction uses a different encoding - 0F A6 /r for 8-bit variant, 0F A7 /r for 16/32-bit variant. The 0F B0/B1 encodings are used on 80486 stepping B and later.[33][34]
  3. ^ The CMPXCHG instruction sets EFLAGS in the same way as a CMP instruction that uses the accumulator (AL/AX/EAX/RAX) as its first argument would do.
  4. ^ INVLPG executes as no-operation if the m8 argument is invalid (e.g. unmapped page or non-canonical address).
    INVLPG can be used to invalidate TLB entries for individual global pages.
  5. ^ a b The INVD and WBINVD instructions will invalidate all cache lines in the CPU's L1 caches. It is implementation-defined whether they will invalidate L2/L3 caches as well.
    These instructions are serializing – on some processors, they may block interrupts until completion as well.
  6. ^ Under Intel VT-x virtualization, the INVD instruction will cause a mandatory #VMEXIT. Also, on processors that support Intel SGX, if the PRM (Processor Reserved Memory) has been set up by using the PRMRRs (PRM range registers), then the INVD instruction is not permitted and will cause a #GP(0) exception.[35]
  7. ^ If the F3 prefix is used with the 0F 09 opcode, then the instruction will execute as WBNOINVD on processors that support the WBNOINVD extension – this will not invalidate the cache.
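
The data movement performed by CMPXCHG and XADD can be sketched in C for the 32-bit operand size. This is a single-threaded model of the architectural result only: it is not atomic, and it tracks only ZF for CMPXCHG rather than the full set of flags. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of CMPXCHG r/m32,r32. Returns the resulting EFLAGS.ZF.
   dest:        the first (r/m) operand
   accumulator: EAX
   src:         the second (register) operand */
bool cmpxchg32(uint32_t *dest, uint32_t *accumulator, uint32_t src)
{
    if (*accumulator == *dest) {
        *dest = src;          /* equal: store the second operand */
        return true;          /* ZF = 1 */
    }
    *accumulator = *dest;     /* not equal: load the current value */
    return false;             /* ZF = 0 */
}

/* Model of XADD r/m32,r32: dest receives the sum, src receives
   the value that dest held before the instruction. */
void xadd32(uint32_t *dest, uint32_t *src)
{
    uint32_t old = *dest;
    *dest = old + *src;
    *src = old;
}
```

On failure, CMPXCHG leaves the current memory value in the accumulator, which is what allows a retry loop to recompute its new value without a separate load.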

Added in P5/P6-class processors

These are integer/system instructions that were not present in the basic 80486 instruction set but were added in various x86 processors prior to the introduction of SSE. (Discontinued instructions are not included.)

  1. ^ a b c In 64-bit mode, the RDMSR, RDTSC and RDPMC instructions will set the top 32 bits of RDX and RAX to zero.
  2. ^ On Intel and AMD CPUs, the WRMSR instruction is also used to update the CPU microcode. This is done by writing the virtual address of the new microcode to upload to MSR 79h on Intel CPUs and MSR C001_0020h[37] on AMD CPUs.
  3. ^ Writes to the following MSRs are not serializing:[38][39]
    Number Name
    48h SPEC_CTRL
    49h PRED_CMD
    10Bh FLUSH_CMD
    122h TSX_CTRL
    6E0h TSC_DEADLINE
    6E1h PKRS
    774h HWP_REQUEST
    (non-serializing only if the FAST_IA32_HWP_REQUEST bit is set)
    802h to 83Fh (x2APIC MSRs)
    1B01h UARCH_MISC_CTL
    C001_0100h FS_BASE (non-serializing on AMD Zen 4 and later)[40]
    C001_0101h GS_BASE (Zen 4 and later)
    C001_0102h KernelGSbase (Zen 4 and later)
    C001_011Bh Doorbell Register (AMD-specific)

    WRMSR to the x2APIC ICR (Interrupt Command Register; MSR 830h) is commonly used to produce an IPI (Inter-processor interrupt) - on Intel[41] but not AMD[42] CPUs, such an IPI can be reordered before an older memory store.

  4. ^ System Management Mode and the RSM instruction were made available on non-SL variants of the Intel 486 only after the initial release of the Intel Pentium in 1993.
  5. ^ On some older 32-bit processors, executing CPUID with a leaf index (EAX) greater than 0 may leave EBX and ECX unmodified, keeping their old values. For this reason, it is recommended to zero out EBX and ECX before executing CPUID.
    Processors noted to exhibit this behavior include Cyrix MII[47] and IDT WinChip 2.[48]

    In 64-bit mode, CPUID will set the top 32 bits of RAX, RBX, RCX and RDX to zero.
  6. ^ On some Intel processors starting from Ivy Bridge, there exist MSRs that can be used to restrict CPUID to ring 0. Such MSRs are documented for at least Ivy Bridge[49] and Denverton.[50]
    The ability to restrict CPUID to ring 0 also exists on AMD processors supporting the "CpuidUserDis" feature (Zen 4 "Raphael" and later).[51]
  7. ^ a b CPUID is also available on some Intel and AMD 486 processor variants that were released after the initial release of the Intel Pentium.
  8. ^ On the Cyrix 5x86 and 6x86 CPUs, CPUID is not enabled by default and must be enabled through a Cyrix configuration register.
  9. ^ On NexGen CPUs, CPUID is only supported with some system BIOSes. On some NexGen CPUs that do support CPUID, EFLAGS.ID is not supported but EFLAGS.AC is, complicating CPU detection.[52]
  10. ^ Unlike the older CMPXCHG instruction, the CMPXCHG8B instruction does not modify any EFLAGS bits other than ZF.
  11. ^ LOCK CMPXCHG8B with a register operand (which is an invalid encoding) will, on some Intel Pentium CPUs, cause a hang rather than the expected #UD exception - this is known as the Pentium F00F bug.
  12. ^ a b c On IDT WinChip, Transmeta Crusoe and Rise mP6 processors, the CMPXCHG8B instruction is always supported; however, its CPUID bit may be missing. This is a workaround for a bug in Windows NT.[53]
  13. ^ a b The RDTSC and RDPMC instructions are not ordered with respect to other instructions, and may sample their respective counters before earlier instructions are executed or after later instructions have executed. Invocations of RDPMC (but not RDTSC) may be reordered relative to each other even for reads of the same counter.
    In order to impose ordering with respect to other instructions, LFENCE or serializing instructions (e.g. CPUID) are needed.[54]
  14. ^ Fixed-rate TSC was introduced in two stages:
    Constant TSC
    TSC running at a fixed rate as long as the processor core is not in a deep-sleep (C2 or deeper) mode, but not synchronized between CPU cores. Introduced in Intel Prescott, Yonah and Bonnell. Also present in all Transmeta and VIA Nano[55] CPUs, as well as AMD Geode LX.[56] Does not have a CPUID bit.
    Invariant TSC
    TSC running at a fixed rate, and remaining synchronized between CPU cores in all P-,C- and T-states (but not necessarily S-states).
    Present in AMD K10 and later; Intel Nehalem/Saltwell[57] and later; Zhaoxin WuDaoKou[58] and later. Indicated with a CPUID bit (leaf 8000_0007:EDX[8]).
  15. ^ RDTSC can be run outside Ring 0 only if CR4.TSD=0.
    On Intel Pentium and AMD K5/K6, RDTSC cannot be run in Virtual-8086 mode.[59][60] Later processors (Pentium Pro, Athlon 64) removed this restriction.
  16. ^ RDPMC can be run outside Ring 0 only if CR4.PCE=1.
  17. ^ The RDPMC instruction is not present in VIA processors prior to the Nano.
  18. ^ The condition codes supported for CMOVcc instruction (opcode 0F 4x /r, with the x nibble specifying the condition) are:
    x cc Condition (EFLAGS)
    0 O OF=1: "Overflow"
    1 NO OF=0: "Not Overflow"
    2 C,B,NAE CF=1: "Carry", "Below", "Not Above or Equal"
    3 NC,NB,AE CF=0: "Not Carry", "Not Below", "Above or Equal"
    4 Z,E ZF=1: "Zero", "Equal"
    5 NZ,NE ZF=0: "Not Zero", "Not Equal"
    6 NA,BE (CF=1 or ZF=1): "Not Above", "Below or Equal"
    7 A,NBE (CF=0 and ZF=0): "Above", "Not Below or Equal"
    8 S SF=1: "Sign"
    9 NS SF=0: "Not Sign"
    A P,PE PF=1: "Parity", "Parity Even"
    B NP,PO PF=0: "Not Parity", "Parity Odd"
    C L,NGE SF≠OF: "Less", "Not Greater Or Equal"
    D NL,GE SF=OF: "Not Less", "Greater Or Equal"
    E LE,NG (ZF=1 or SF≠OF): "Less or Equal", "Not Greater"
    F NLE,G (ZF=0 and SF=OF): "Not Less or Equal", "Greater"
  19. ^ In 64-bit mode, CMOVcc with a 32-bit operand size will clear the upper 32 bits of the destination register even if the condition is false.
    For CMOVcc with a memory source operand, the CPU will always read the operand from memory – potentially causing memory exceptions and cache line-fills – even if the condition for the move is not satisfied. (The Intel APX extension defines a set of new EVEX-encoded variants of CMOVcc that will suppress memory exceptions if the condition is false.)
  20. ^ On pre-Nehemiah VIA C3 variants ("Samuel"/"Ezra"), the reg,reg but not reg,[mem] forms of the CMOVcc instructions have been reported to be present as undocumented instructions.[61]
  21. ^ Intel's recommended byte encodings for multi-byte NOPs of lengths 2 to 9 bytes in 32/64-bit mode are (in hex):[62]
    Length Byte Sequence
    2 66 90
    3 0F 1F 00
    4 0F 1F 40 00
    5 0F 1F 44 00 00
    6 66 0F 1F 44 00 00
    7 0F 1F 80 00 00 00 00
    8 0F 1F 84 00 00 00 00 00
    9 66 0F 1F 84 00 00 00 00 00

    For cases where there is a need to use more than 9 bytes of NOP padding, it is recommended to use multiple NOPs.

  22. ^ Unlike other instructions added in Pentium Pro, long NOP does not have a CPUID feature bit.
  23. ^ 0F 1F /0 as long-NOP was introduced in the Pentium Pro, but remained undocumented until 2006.[64] The whole 0F 18..1F opcode range was NOP in Pentium Pro. However, except for 0F 1F /0, Intel does not guarantee that these opcodes will remain NOP in future processors, and has indeed assigned some of these opcodes to other instructions in at least some processors.[65]
  24. ^ Documented for AMD x86-64 since 2002.[66]
  25. ^ While the 0F 0B opcode was officially reserved as an invalid opcode from Pentium onwards, it only got assigned the mnemonic UD2 from Pentium Pro onwards.[68]
  26. ^ a b GNU Binutils have used the UD2A and UD2B mnemonics for the 0F 0B and 0F B9 opcodes since version 2.7.[69]
    Neither UD2A nor UD2B originally took any arguments - UD2B was later modified to accept a ModR/M byte, in Binutils version 2.30.[70]
  27. ^ The UD2 (0F 0B) instruction will additionally stop subsequent bytes from being decoded as instructions, even speculatively. For this reason, if an indirect branch instruction is followed by something that is not code, it is recommended to place an UD2 instruction after the indirect branch.[71]
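
The multi-byte NOP table in the notes above can be turned into a small padding routine. The C sketch below copies the 2-to-9-byte encodings from that table and adds the ordinary one-byte NOP (90), which is not part of the table. The greedy "longest NOP first" split for padding longer than 9 bytes is an assumption of this example, since the text only says to use multiple NOPs. The names `NOP_TABLE` and `fill_nops` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Row n holds the n-byte NOP encoding; row 0 is unused. */
static const uint8_t NOP_TABLE[10][9] = {
    { 0 },
    { 0x90 },                                   /* 1-byte NOP (not in the table) */
    { 0x66, 0x90 },
    { 0x0F, 0x1F, 0x00 },
    { 0x0F, 0x1F, 0x40, 0x00 },
    { 0x0F, 0x1F, 0x44, 0x00, 0x00 },
    { 0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00 },
    { 0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00 },
    { 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
    { 0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
};

/* Fills buf[0..len) with NOP instructions, emitting 9-byte NOPs
   until fewer than 9 bytes remain, then one NOP of the exact
   remaining length. Returns the number of instructions emitted. */
size_t fill_nops(uint8_t *buf, size_t len)
{
    size_t emitted = 0;
    while (len > 0) {
        size_t n = (len > 9) ? 9 : len;
        memcpy(buf, NOP_TABLE[n], n);
        buf += n;
        len -= n;
        emitted++;
    }
    return emitted;
}
```

With this routine, 12 bytes of padding become one 9-byte NOP followed by one 3-byte NOP.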