X86 instruction listings
List of x86 microprocessor instructions From Wikipedia, the free encyclopedia
The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable program, often stored as a computer file and executed on the processor.
The x86 instruction set has been extended several times, introducing wider registers and datatypes as well as new functionality.[1]
x86 integer instructions
Below is the full 8086/8088 instruction set of Intel (81 instructions total).[2] These instructions are also available in 32-bit mode, in which they operate on 32-bit registers (eax, ebx, etc.) and values instead of their 16-bit (ax, bx, etc.) counterparts. The updated instruction set is grouped according to architecture (i186, i286, i386, i486, i586/i686) and is referred to as (32-bit) x86 and (64-bit) x86-64 (also known as AMD64).
Original 8086/8088 instructions
This is the original instruction set. In the 'Notes' column, r means register, m means memory address and imm means immediate (i.e. a value).
Added in specific processors
Added with 80186/80188
Added with 80286
The new instructions added in 80286 add support for x86 protected mode. Some but not all of the instructions are available in real mode as well.
- The descriptors used by the LGDT, LIDT, SGDT and SIDT instructions consist of a 2-part data structure. The first part is a 16-bit value, specifying table size in bytes minus 1. The second part is a 32-bit value (64-bit value in 64-bit mode), specifying the linear start address of the table.
  For LGDT and LIDT with a 16-bit operand size, the address is ANDed with 00FFFFFFh. On Intel (but not AMD) CPUs, the SGDT and SIDT instructions with a 16-bit operand size are – as of Intel SDM revision 079 – documented to write a descriptor to memory with the last byte being set to 0. However, observed behavior is that bits 31:24 of the descriptor table address are written instead.[4]
- The LGDT, LIDT, LLDT and LTR instructions are serializing on Pentium and later processors.
- The LMSW instruction is serializing on Intel processors from Pentium onwards, but not on AMD processors.
- On 80386 and later, the "Machine Status Word" is the same as the CR0 control register – however, the LMSW instruction can only modify the bottom 4 bits of this register and cannot clear bit 0. The inability to clear bit 0 means that LMSW can be used to enter but not leave x86 Protected Mode.
  On 80286, it is not possible to leave Protected Mode at all (neither with LMSW nor with LOADALL[5]) without a CPU reset – on 80386 and later, it is possible to leave Protected Mode, but this requires the use of the 80386-and-later MOV to CR0 instruction.
- If CR4.UMIP=1 is set, then the SGDT, SIDT, SLDT, SMSW and STR instructions can only run in Ring 0.
  These instructions were unprivileged on all x86 CPUs from 80286 onwards until the introduction of UMIP in 2017.[6] This has been a significant security problem for software-based virtualization, since it enables these instructions to be used by a VM guest to detect that it is running inside a VM.[7]
- The SMSW, SLDT and STR instructions always use an operand size of 16 bits when used with a memory argument. With a register argument on 80386 or later processors, wider destination operand sizes are available and behave as follows:
  - SMSW: Stores full CR0 in x86-64 long mode, undefined otherwise.
  - SLDT: Zero-extends 16-bit argument on Pentium Pro and later processors, undefined on earlier processors.
  - STR: Zero-extends 16-bit argument.
- In 64-bit long mode, the ARPL instruction is not available – the 63 /r opcode has been reassigned to the 64-bit-mode-only MOVSXD instruction.
- The ARPL instruction causes #UD in Real mode and Virtual 8086 Mode – Windows 95 and OS/2 2.x are known to make extensive use of this #UD to use the 63 opcode as a one-byte breakpoint to transition from Virtual 8086 Mode to kernel mode.[8][9]
- Bits 19:16 of this mask are documented as "undefined" on Intel CPUs.[10] On AMD CPUs, the mask is documented as 0x00FFFF00.
- On some Intel CPU/microcode combinations from 2019 onwards, the VERW instruction also flushes microarchitectural data buffers. This enables it to be used as part of workarounds for Microarchitectural Data Sampling security vulnerabilities.[11][12] Some of the microarchitectural buffer-flushing functions that have been added to VERW may require the instruction to be executed with a memory operand.[13]
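The 2-part descriptor-table operand described in the notes above (a 16-bit "size minus 1" field followed by a 32-bit or 64-bit linear base address) can be sketched in Python. This is only a byte-layout illustration: the function names `pack_pseudo_descriptor` and `base_seen_by_16bit_lgdt` are hypothetical helpers invented here, and little-endian packing is assumed as the usual x86 memory order.

```python
import struct

def pack_pseudo_descriptor(base, table_bytes, long_mode=False):
    """Return the bytes of an LGDT/LIDT-style memory operand (illustrative helper).

    First field: 16-bit table size in bytes minus 1.
    Second field: 32-bit linear base address (64-bit in 64-bit mode).
    """
    if not 1 <= table_bytes <= 0x10000:
        raise ValueError("table size must fit a 16-bit 'size minus 1' field")
    limit = table_bytes - 1
    fmt = "<HQ" if long_mode else "<HI"   # little-endian, no padding
    return struct.pack(fmt, limit, base)

def base_seen_by_16bit_lgdt(base):
    """Model the 00FFFFFFh mask applied by LGDT/LIDT with a 16-bit operand size."""
    return base & 0x00FFFFFF

if __name__ == "__main__":
    # A hypothetical 3-entry table (3 * 8 bytes) placed at linear address 0x00100000.
    print(pack_pseudo_descriptor(0x00100000, 24).hex())
    print(hex(base_seen_by_16bit_lgdt(0x12345678)))
```

The operand is 6 bytes long outside 64-bit mode and 10 bytes long in 64-bit mode.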
Added with 80386
The 80386 added support for 32-bit operation to the x86 instruction set. This was done by widening the general-purpose registers to 32 bits and introducing the concepts of OperandSize and AddressSize – most instruction forms that would previously take 16-bit data arguments were given the ability to take 32-bit arguments by setting their OperandSize to 32 bits, and instructions that could take 16-bit address arguments were given the ability to take 32-bit address arguments by setting their AddressSize to 32 bits. (Instruction forms that work on 8-bit data continue to be 8-bit regardless of OperandSize. Using a data size of 16 bits will cause only the bottom 16 bits of the 32-bit general-purpose registers to be modified – the top 16 bits are left unchanged.)
The default OperandSize and AddressSize to use for each instruction is given by the D bit of the segment descriptor of the current code segment - D=0 makes both 16-bit, D=1 makes both 32-bit. Additionally, they can be overridden on a per-instruction basis with two new instruction prefixes that were introduced in the 80386:
- 66h: OperandSize override. Will change OperandSize from 16-bit to 32-bit if CS.D=0, or from 32-bit to 16-bit if CS.D=1.
- 67h: AddressSize override. Will change AddressSize from 16-bit to 32-bit if CS.D=0, or from 32-bit to 16-bit if CS.D=1.
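As a rough illustration of the rule above, the following Python sketch computes the effective OperandSize and AddressSize from the code segment's D bit and the prefixes present on an instruction. The function name `effective_sizes` is made up for this example, and the sketch covers only 16-bit and 32-bit code segments, not 64-bit mode.

```python
def effective_sizes(cs_d, prefixes=()):
    """Return (OperandSize, AddressSize) in bits for one instruction.

    cs_d     -- the D bit of the current code segment descriptor (0 or 1)
    prefixes -- iterable of prefix bytes found on the instruction
    """
    default = 32 if cs_d else 16
    toggled = 16 if cs_d else 32
    operand_size = toggled if 0x66 in prefixes else default   # 66h override
    address_size = toggled if 0x67 in prefixes else default   # 67h override
    return operand_size, address_size

if __name__ == "__main__":
    for cs_d in (0, 1):
        for prefixes in ((), (0x66,), (0x67,), (0x66, 0x67)):
            shown = [f"{p:02X}h" for p in prefixes]
            print(f"CS.D={cs_d} prefixes={shown} -> {effective_sizes(cs_d, prefixes)}")
```

Each prefix flips only its own size away from the segment default, so the two overrides are independent of each other.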
The 80386 also introduced the two new segment registers FS and GS as well as the x86 control, debug and test registers.
The new instructions introduced in the 80386 can broadly be subdivided into two classes:
- Pre-existing opcodes that needed new mnemonics for their 32-bit OperandSize variants (e.g. CWDE, LODSD)
- New opcodes that introduced new functionality (e.g. SHLD, SETcc)
For instruction forms where the operand size can be inferred from the instruction's arguments (e.g. ADD EAX,EBX can be inferred to have a 32-bit OperandSize due to its use of EAX as an argument), new instruction mnemonics are not needed and not provided.
- For the 32-bit string instructions, the ±± notation is used to indicate that the indicated register is post-decremented by 4 if EFLAGS.DF=1 and post-incremented by 4 otherwise.
  For the operands where the DS segment is indicated, the DS segment can be overridden by a segment-override prefix – where the ES segment is indicated, the segment is always ES and cannot be overridden.
  The choice of whether to use the 16-bit SI/DI registers or the 32-bit ESI/EDI registers as the address registers is made by AddressSize, overridable with the 67 prefix.
- The 32-bit string instructions accept repeat-prefixes in the same way as older 8/16-bit string instructions.
  For LODSD, STOSD, MOVSD, INSD and OUTSD, the REP prefix (F3) will repeat the instruction the number of times specified in rCX (CX or ECX, decided by AddressSize), decrementing rCX for each iteration (with rCX=0 resulting in no-op and proceeding to the next instruction).
  For CMPSD and SCASD, the REPE (F3) and REPNE (F2) prefixes are available, which will repeat the instruction, decrementing rCX for each iteration, but only as long as the flag condition (ZF=1 for REPE, ZF=0 for REPNE) holds true AND rCX ≠ 0.
- For the INSB/W/D instructions, the memory access rights for the ES:[rDI] memory address might not be checked until after the port access has been performed – if this check fails (e.g. page fault or other memory exception), then the data item read from the port is lost. As such, it is not recommended to use this instruction to access an I/O port that performs any kind of side effect upon read.
- I/O port access is only allowed when CPL≤IOPL or the I/O port permission bitmap bits for the port to access are all set to 0.
- For the E3 opcode (JCXZ/JECXZ), the choice of whether the instruction will use CX or ECX for its comparison (and consequently which mnemonic to use) is based on the AddressSize, not OperandSize. (OperandSize instead controls whether the jump destination should be truncated to 16 bits or not.)
  This also applies to the loop instructions LOOP, LOOPE, LOOPNE (opcodes E0, E1, E2); however, unlike JCXZ/JECXZ, these instructions have not been given new mnemonics for their ECX-using variants.
- The PUSHFD and POPFD instructions will cause a #GP exception if executed in virtual 8086 mode if IOPL is not 3.
  The PUSHF, POPF, IRET and IRETD instructions will cause a #GP exception if executed in Virtual-8086 mode if IOPL is not 3 and VME is not enabled.
- If IRETD is used to return from kernel mode to user mode (which will entail a CPL change) and the user-mode stack segment indicated by SS is a 16-bit segment, then the IRETD instruction will only restore the low 16 bits of the stack pointer (ESP/RSP), with the remaining bits keeping whatever value they had in kernel code before the IRETD. This has necessitated complex workarounds on both Linux ("ESPFIX")[16] and Windows.[17] This issue also affects the later 64-bit IRETQ instruction.
- For the BT, BTS, BTR and BTC instructions:
  - If the first argument to the instruction is a register operand and/or the second argument is an immediate, then the bit-index in the second argument is taken modulo operand size (16/32/64, in effect using only the bottom 4, 5 or 6 bits of the index).
  - If the first argument is a memory operand and the second argument is a register operand, then the bit-index in the second argument is used in full – it is interpreted as a signed bit-index that is used to offset the memory address to use for the bit test.
- If the F3 prefix is used with the 0F BC /r opcode, then the instruction will execute as TZCNT on systems that support the BMI1 extension. TZCNT differs from BSF in that TZCNT but not BSF is defined to return operand size if the source operand is zero – for other source operand values, they produce the same result (except for flags).
- For SHLD and SHRD, the shift-amount is masked – the bottom 5 bits are used for 16/32-bit operand size and 6 bits for 64-bit operand size. SHLD and SHRD with 16-bit arguments and a shift-amount greater than 16 produce undefined results. (Actual results differ between different Intel CPUs, with at least three different behaviors known.[18])
- For SETcc, while the opcode is commonly specified as /0 – implying that bits 5:3 of the instruction's ModR/M byte should be 000 – modern x86 processors (Pentium and later) ignore bits 5:3 and will execute the instruction as SETcc regardless of the contents of these bits.
- For LFS, LGS and LSS, the size of the offset part of the far pointer is given by operand size – the size of the segment part is always 16 bits. In 64-bit mode, using the REX.W prefix with these instructions will cause them to load a far pointer with a 64-bit offset on Intel but not AMD processors.
- For MOV to/from the CRx, DRx and TRx registers, the reg part of the ModR/M byte is used to indicate the CRx/DRx/TRx register and the r/m part the general-purpose register. Uniquely for the MOV CRx/DRx/TRx opcodes, the top two bits of the ModR/M byte are ignored – these opcodes are decoded and executed as if the top two bits of the ModR/M byte are 11b.
- On processors that support global pages (Pentium and later), global page table entries will not be flushed by a MOV to CR3 – instead, these entries can be flushed by toggling the CR4.PGE bit.
  On processors that support PCIDs, writing to CR3 while PCIDs are enabled will only flush TLB entries belonging to the PCID specified in bits 11:0 of the value written to CR3 (this flush can be suppressed by setting bit 63 of the written value to 1). Flushing pages belonging to other PCIDs can instead be done by toggling the CR4.PGE bit, clearing the CR4.PCIDE bit, or using the INVPCID instruction.
- On processors prior to Pentium, moves to CR0 would not serialize the instruction stream – in part for this reason, it is usually required to perform a far jump[19] immediately after a MOV to CR0 if such a MOV is used to enable/disable protected mode and/or memory paging.
  MOV to CR2 is architecturally listed as serializing, but has been reported to be non-serializing on at least some Intel Core-i7 processors.[20]
  MOV to CR8 (introduced with x86-64) is serializing on AMD but not Intel processors.
- The INT1/ICEBP (F1) instruction is present on all known Intel x86 processors from the 80386 onwards,[21] but only fully documented for Intel processors from the May 2018 release of the Intel SDM (rev 067) onwards.[22] Before this release, mention of the instruction in Intel material was sporadic, e.g. AP-526 rev 001.[23]
  For AMD processors, the instruction has been documented since 2002.[24]
- The operation of the F1 (ICEBP) opcode differs from the operation of the regular software interrupt opcode CD 01 in several ways:
  - In protected mode, CD 01 will check CPL against the interrupt descriptor's DPL field as an access-rights check, while F1 will not.
  - In virtual-8086 mode, CD 01 will also check CPL against IOPL as an access-rights check, while F1 will not.
  - In virtual-8086 mode with VME enabled, interrupt redirection is supported for CD 01 but not F1.
- The UMOV instruction is present on 386 and 486 processors only.[21]
- The XBTS and IBTS instructions were discontinued with the B1 stepping of 80386.
  They have been used by software mainly for detection of the buggy[25] B0 stepping of the 80386. Microsoft Windows (v2.01 and later) will attempt to run the XBTS instruction as part of its CPU detection if CPUID is not present, and will refuse to boot if XBTS is found to be working.[26]
- For XBTS and IBTS, the r/m argument represents the data to extract/insert a bitfield from/to, the reg argument the bitfield to be inserted/extracted, AX/EAX a bit-offset and CL a bitfield length.[27]
- Undocumented, 80386 only.[28]
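The two bit-index rules given above for BT, BTS, BTR and BTC can be modeled in a few lines of Python. This is a sketch of which bit gets selected, not of how wide the actual memory access is; the helper names `bt_register` and `bt_memory` and the flat `bytearray` standing in for memory are assumptions made for illustration.

```python
def bt_register(value, index, size):
    """Register destination (or immediate index): index is taken modulo operand size."""
    return (value >> (index % size)) & 1

def bt_memory(memory, base_addr, index):
    """Memory destination with register index: the signed index offsets the address.

    Python's >> floors for negative integers, so index -1 selects bit 7 of the
    byte just below base_addr.
    """
    byte_addr = base_addr + (index >> 3)
    return (memory[byte_addr] >> (index & 7)) & 1

if __name__ == "__main__":
    print(bt_register(0b1, 32, 32))   # index 32 wraps to bit 0 at 32-bit operand size
    print(bt_register(0b1, 32, 64))   # index 32 is a real bit position at 64-bit size

    mem = bytearray(8)
    mem[3] = 0x80                     # top bit of the byte below address 4
    mem[5] = 0x01                     # bottom bit of the byte above address 4
    print(bt_memory(mem, 4, -1), bt_memory(mem, 4, 8))
```

The memory form can therefore reach bits far outside the addressed operand, in both directions, which the register form never does.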
Added with 80486
- Under Intel VT-x virtualization, the INVD instruction will cause a mandatory #VMEXIT. Also, on processors that support Intel SGX, if the PRM (Processor Reserved Memory) has been set up by using the PRMRRs (PRM range registers), then the INVD instruction is not permitted and will cause a #GP(0) exception.[34]
Added in P5/P6-class processors
Integer/system instructions that were not present in the basic 80486 instruction set, but were added in various x86 processors prior to the introduction of SSE. (Discontinued instructions are not included.)
- On Intel and AMD CPUs, the WRMSR instruction is also used to update the CPU microcode. This is done by writing the virtual address of the new microcode to upload to MSR 79h on Intel CPUs and MSR C001_0020h[36] on AMD CPUs.
- Writes to the following MSRs are not serializing:[37][38]
- WRMSR to the x2APIC ICR (Interrupt Command Register; MSR 830h) is commonly used to produce an IPI (Inter-processor interrupt) - on Intel[40] but not AMD[41] CPUs, such an IPI can be reordered before an older memory store.
- System Management Mode and the RSM instruction were made available on non-SL variants of the Intel 486 only after the initial release of the Intel Pentium in 1993.
- On some older 32-bit processors, executing CPUID with a leaf index (EAX) greater than 0 may leave EBX and ECX unmodified, keeping their old values. For this reason, it is recommended to zero out EBX and ECX before executing CPUID.
  Processors noted to exhibit this behavior include Cyrix MII[46] and IDT WinChip 2.[47]
  In 64-bit mode, CPUID will set the top 32 bits of RAX, RBX, RCX and RDX to zero.
- On some Intel processors starting from Ivy Bridge, there exist MSRs that can be used to restrict CPUID to ring 0. Such MSRs are documented for at least Ivy Bridge[48] and Denverton.[49]
  The ability to restrict CPUID to ring 0 also exists on AMD processors supporting the "CpuidUserDis" feature (Zen 4 "Raphael" and later).[50]
- On NexGen CPUs, CPUID is only supported with some system BIOSes. On some NexGen CPUs that do support CPUID, EFLAGS.ID is not supported but EFLAGS.AC is, complicating CPU detection.[51]
- Unlike the older CMPXCHG instruction, the CMPXCHG8B instruction does not modify any EFLAGS bits other than ZF.
- LOCK CMPXCHG8B with a register operand (which is an invalid encoding) will, on some Intel Pentium CPUs, cause a hang rather than the expected #UD exception - this is known as the Pentium F00F bug.
- On IDT WinChip, Transmeta Crusoe and Rise mP6 processors, the CMPXCHG8B instruction is always supported; however, its CPUID bit may be missing. This is a workaround for a bug in Windows NT.[52]
- The RDTSC and RDPMC instructions are not ordered with respect to other instructions, and may sample their respective counters before earlier instructions are executed or after later instructions have executed. Invocations of RDPMC (but not RDTSC) may be reordered relative to each other even for reads of the same counter.
  In order to impose ordering with respect to other instructions, LFENCE or serializing instructions (e.g. CPUID) are needed.[53]
- Fixed-rate TSC was introduced in two stages:
  - Constant TSC: TSC running at a fixed rate as long as the processor core is not in a deep-sleep (C2 or deeper) mode, but not synchronized between CPU cores. Introduced in Intel Prescott, Yonah and Bonnell. Also present in all Transmeta and VIA Nano[54] CPUs. Does not have a CPUID bit.
  - Invariant TSC: TSC running at a fixed rate, and remaining synchronized between CPU cores in all P-, C- and T-states (but not necessarily S-states).
    Present in AMD K10 and later; Intel Nehalem/Saltwell[55] and later; Zhaoxin WuDaoKou[56] and later. Indicated with a CPUID bit (leaf 8000_0007:EDX[8]).
- RDPMC can be run outside Ring 0 only if CR4.PCE=1.
- In 64-bit mode, CMOVcc with a 32-bit operand size will clear the upper 32 bits of the destination register even if the condition is false.
  For CMOVcc with a memory source operand, the CPU will always read the operand from memory – potentially causing memory exceptions and cache line-fills – even if the condition for the move is not satisfied. (The Intel APX extension defines a set of new EVEX-encoded variants of CMOVcc that will suppress memory exceptions if the condition is false.)
- On pre-Nehemiah VIA C3 variants ("Samuel"/"Ezra"), the reg,reg but not reg,[mem] forms of the CMOVcc instructions have been reported to be present as undocumented instructions.[58]
- Intel's recommended byte encodings for multi-byte NOPs of lengths 2 to 9 bytes in 32/64-bit mode are (in hex):[59] For cases where there is a need to use more than 9 bytes of NOP padding, it is recommended to use multiple NOPs.
- Unlike other instructions added in Pentium Pro, long NOP does not have a CPUID feature bit.
- 0F 1F /0 as long-NOP was introduced in the Pentium Pro, but remained undocumented until 2006.[61] The whole 0F 18..1F opcode range was NOP in Pentium Pro. However, except for 0F 1F /0, Intel does not guarantee that these opcodes will remain NOP in future processors, and has indeed assigned some of these opcodes to other instructions in at least some processors.[62]
- Documented for AMD x86-64 since 2002.[63]
- While the 0F 0B opcode was officially reserved as an invalid opcode from Pentium onwards, it only got assigned the mnemonic UD2 from Pentium Pro onwards.[65]
- GNU Binutils have used the UD2A and UD2B mnemonics for the 0F 0B and 0F B9 opcodes since version 2.7.[66]
  Neither UD2A nor UD2B originally took any arguments - UD2B was later modified to accept a ModR/M byte, in Binutils version 2.30.[67]
- The UD2 (0F 0B) instruction will additionally stop subsequent bytes from being decoded as instructions, even speculatively. For this reason, if an indirect branch instruction is followed by something that is not code, it is recommended to place a UD2 instruction after the indirect branch.[68]
- The UD0/1/2 opcodes - 0F 0B, 0F B9 and 0F FF - will cause an #UD exception on all x86 processors from the 80186 onwards (except NEC V-series processors), but did not get explicitly reserved for this purpose until P5-class processors.
- For the 0F FF opcode, the OIO mnemonic was introduced by Cyrix,[75] while the UD0 mnemonic (without arguments) was introduced by AMD and Intel at the same time as the UD1 mnemonic for 0F B9.[70][71] Later Intel (but not AMD) documentation modified its description of UD0 to add a ModR/M byte and take two arguments.[76]
- On K6, the SYSCALL/SYSRET instructions were available on Model 7 (250nm "Little Foot") and later, not on the earlier Model 6.[78]
- The exact semantics of SYSRET differ slightly between AMD and Intel processors: non-canonical return addresses cause a #GP exception to be thrown in Ring 3 on AMD CPUs but Ring 0 on Intel CPUs. This has been known to cause security issues.[79]
- For the SYSRET and SYSEXIT instructions under x86-64, it is necessary to add the REX.W prefix for variants that will return to 64-bit user-mode code.
  Encodings of these instructions without the REX.W prefix are used to return to 32-bit user-mode code. (Neither of these instructions can be used to return to 16-bit user-mode code – for return to 16-bit code, IRET/IRETD/IRETQ should be used.)
- The SYSRET, SYSENTER and SYSEXIT instructions are unavailable in Real mode. (SYSENTER is, however, available in Virtual 8086 mode.)
- On AMD CPUs, the SYSENTER and SYSEXIT instructions are not available in x86-64 long mode (#UD).
- On Transmeta CPUs, the SYSENTER and SYSEXIT instructions are only available with version 4.2 or higher of the Transmeta Code Morphing software.[83]
- On Nehemiah, SYSENTER and SYSEXIT are available only on stepping 8 and later.[84]
Added as instruction set extensions
Added with x86-64
These instructions can only be encoded in 64 bit mode. They fall in four groups:
- original instructions that reuse existing opcodes for a different purpose (MOVSXD replacing ARPL)
- original instructions with new opcodes (SWAPGS)
- existing instructions extended to a 64 bit address size (JRCXZ)
- existing instructions extended to a 64 bit operand size (remaining instructions)
Most instructions with a 64 bit operand size encode this using a REX.W prefix; in the absence of the REX.W prefix, the corresponding instruction with 32 bit operand size is encoded. This mechanism also applies to most other instructions with 32 bit operand size. These are not listed here as they do not gain a new mnemonic in Intel syntax when used with a 64 bit operand size.
- The CMPXCHG16B instruction was absent from a few of the earliest Intel/AMD x86-64 processors. On Intel processors, the instruction was missing from Xeon "Nocona" stepping D,[85] but added in stepping E.[86] On AMD K8 family processors, it was added in stepping F, at the same time as DDR2 support was introduced.[87]
  For this reason, CMPXCHG16B has its own CPUID flag, separate from the rest of x86-64.
- Encodings of MOVSXD without REX.W prefix are permitted but discouraged[88] – such encodings behave identically to 16/32-bit MOV (8B /r).
Bit manipulation extensions
Bit manipulation instructions. For all of the VEX-encoded instructions defined by BMI1 and BMI2, the operand size may be 32 or 64 bits, controlled by the VEX.W bit – none of these instructions are available in 16-bit variants. The VEX-encoded instructions are not available in Real Mode and Virtual-8086 mode - other than that, the bit manipulation instructions are available in all operating modes on supported CPUs.
- On AMD CPUs, the "ABM" extension provides both POPCNT and LZCNT. On Intel CPUs, however, the CPUID bit for "ABM" is only documented to indicate the presence of the LZCNT instruction and is listed as "LZCNT", while POPCNT has its own separate CPUID feature bit.
  However, all known processors that implement the "ABM"/"LZCNT" extensions also implement POPCNT and set the CPUID feature bit for POPCNT, so the distinction is theoretical only.
  (The converse is not true – there exist processors that support POPCNT but not ABM, such as Intel Nehalem and VIA Nano 3000.)
- The TZCNT instruction will execute as BSF on systems that do not support the BMI1 extension. BSF produces the same result as TZCNT for all input operand values except zero – for which TZCNT returns input operand size, but BSF produces undefined behavior (leaves destination unmodified on most modern CPUs).
- On AMD processors before Zen 3, the PEXT and PDEP instructions are quite slow[89] and exhibit data-dependent timing due to the use of a microcoded implementation (about 18 to 300 cycles, depending on the number of bits set in the mask argument). As a result, it is often faster to use other instruction sequences on these processors.[90][91]
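The difference between TZCNT and BSF described above only shows up for a zero source operand. A small Python model makes this concrete; the function names are illustrative, flags are not modeled, and the `bsf` model hard-codes the "destination left unmodified" behavior that the text attributes to most modern CPUs (architecturally the result is undefined).

```python
def tzcnt(src, size):
    """Count trailing zero bits; defined to return the operand size for src == 0."""
    src &= (1 << size) - 1
    if src == 0:
        return size
    return (src & -src).bit_length() - 1   # position of the lowest set bit

def bsf(src, size, old_dest):
    """Bit scan forward; for src == 0 this model keeps the old destination value."""
    src &= (1 << size) - 1
    if src == 0:
        return old_dest   # assumption: undefined architecturally, unmodified in practice
    return (src & -src).bit_length() - 1

if __name__ == "__main__":
    for value in (0, 1, 8, 0x8000_0000):
        print(hex(value), tzcnt(value, 32), bsf(value, 32, old_dest=0xDEAD))
```

Code that may run on processors without BMI1 therefore cannot rely on the F3-prefixed encoding returning the operand size for a zero input.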
Added with Intel TSX
Added with Intel CET
Intel CET (Control-Flow Enforcement Technology) adds two distinct features to help protect against security exploits such as return-oriented programming: a shadow stack (CET_SS), and indirect branch tracking (CET_IBT).
- This prefix has the same encoding as the DS: segment override prefix – as of April 2022, Intel documentation does not appear to specify whether this prefix also retains its old segment-override function when used as a no-track prefix, nor does it provide an official mnemonic for this prefix.[92][93] (GNU binutils use "notrack"[94])
Added with XSAVE
The XSAVE instruction set extensions are designed to save/restore CPU extended state (typically for the purpose of context switching) in a manner that can be extended to cover new instruction set extensions without the OS context-switching code needing to understand the specifics of the new extensions. This is done by defining a series of state-components, each with a size and offset within a given save area, and each corresponding to a subset of the state needed for one CPU extension or another. The EAX=0Dh CPUID leaf is used to provide information about which state-components the CPU supports and what their sizes/offsets are, so that the OS can reserve the proper amount of space and set the associated enable-bits.
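To illustrate how an OS might turn per-component sizes and offsets into a save-area size, here is a minimal Python sketch. It assumes the standard (non-compacted) XSAVE layout, in which a 512-byte legacy region holding components 0 and 1 is followed by a 64-byte header; those two constants, the function name and the example component table are assumptions of this sketch rather than anything stated in the text above. On real hardware the (offset, size) pairs would come from the CPUID leaf rather than from a hard-coded dictionary.

```python
LEGACY_REGION_BYTES = 512   # assumed: legacy region holding components 0 and 1
XSAVE_HEADER_BYTES = 64     # assumed: header that follows the legacy region

def standard_save_area_size(components, enabled_mask):
    """Return the bytes needed for the enabled state-components.

    components   -- dict: component index -> (offset, size), as CPUID would report
    enabled_mask -- bitmap of enabled state-components
    """
    size = LEGACY_REGION_BYTES + XSAVE_HEADER_BYTES
    for index, (offset, length) in components.items():
        if index >= 2 and enabled_mask & (1 << index):
            size = max(size, offset + length)
    return size

if __name__ == "__main__":
    # Illustrative table only: one extended component (index 2) of 256 bytes at offset 576.
    example = {2: (576, 256)}
    print(standard_save_area_size(example, enabled_mask=0b011))
    print(standard_save_area_size(example, enabled_mask=0b111))
```

Because the calculation only uses offsets and sizes, it does not need to know what any component actually contains, which is the extensibility property the paragraph above describes.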
- On some processors (starting with Skylake, Goldmont and Zen 1), executing XGETBV with ECX=1 is permitted – this will not return XCR1 (no such register exists) but instead return XCR0 bitwise-ANDed with the current value of the "XINUSE" state-component bitmap (a bitmap of XSAVE state-components that are not known to be in their initial state).
  The presence of this functionality of XGETBV is indicated by CPUID.(EAX=0Dh,ECX=1):EAX[bit 2].
- The XSETBV instruction will cause a mandatory #VMEXIT if executed under Intel VT-x virtualization.
Added with other cross-vendor extensions
- AMD Athlon processors prior to the Athlon XP did not support full SSE, but did introduce the non-SIMD instructions of SSE as part of "MMX Extensions".[95] These extensions (without full SSE) are also present on Geode GX2 and later Geode processors.
- The SFENCE instruction ensures that all memory stores after the SFENCE instruction are made globally observable after all memory stores before the SFENCE. This imposes ordering on stores that can otherwise be reordered, such as non-temporal stores and stores to WC (Write-Combining) memory regions.[96]
  On Intel CPUs, as well as AMD CPUs from Zen1 onwards (but not older AMD CPUs), SFENCE also acts as a reordering barrier on cache flushes/writebacks performed with the CLFLUSH, CLFLUSHOPT and CLWB instructions. (Older AMD CPUs require MFENCE to order CLFLUSH.)
  SFENCE is not ordered with respect to LFENCE, and an SFENCE+LFENCE sequence is not sufficient to prevent a load from being reordered past a previous store.[97] To prevent such reordering, it is necessary to execute an MFENCE, LOCK or a serializing instruction.
- The LFENCE instruction ensures that all memory loads after the LFENCE instruction are made globally observable after all memory loads before the LFENCE.
  On all Intel CPUs that support SSE2, the LFENCE instruction provides a stronger ordering guarantee:[98] it is dispatch-serializing, meaning that instructions after the LFENCE instruction are allowed to start executing only after all instructions before it have retired (which will ensure that all preceding loads but not necessarily stores have completed). The effect of dispatch-serialization is that LFENCE also acts as a speculation barrier and a reordering barrier for accesses to non-memory resources such as performance counters (accessed through e.g. RDTSC or RDPMC) and x2apic MSRs.
  On AMD CPUs, LFENCE is not necessarily dispatch-serializing by default – however, on all AMD CPUs that support any form of non-dispatch-serializing LFENCE, it can be made dispatch-serializing by setting bit 1 of MSR C001_1029.[99]
- The MFENCE instruction ensures that all memory loads, stores and cacheline-flushes after the MFENCE instruction are made globally observable after all memory loads, stores and cacheline-flushes before the MFENCE.
  On Intel CPUs, MFENCE is not dispatch-serializing, and therefore cannot be used on its own to enforce ordering on accesses to non-memory resources such as performance counters and x2apic MSRs. MFENCE is still ordered with respect to LFENCE, so if there is a need to enforce ordering between memory stores and subsequent non-memory accesses, then such an ordering can be obtained by issuing an MFENCE followed by an LFENCE.[53][100]
  On AMD CPUs, MFENCE is serializing.
- The operation of the PAUSE instruction in 64-bit mode is, unlike NOP, unaffected by the presence of the REX.B prefix bit. Neither NOP nor PAUSE is affected by the other bits of the REX prefix. A few examples of how opcode 90 interacts with various prefixes in 64-bit mode are:
  - 90 is NOP
  - 41 90 is XCHG R8D,EAX
  - 4E 90 is NOP
  - 49 90 is XCHG R8,RAX
  - F3 90 is PAUSE
  - F3 41 90 is PAUSE
  - F3 4F 90 is PAUSE
- While the
CLFLUSH
instruction was introduced together with SSE2, it has its own CPUID flag and may be present on processors not otherwise implementing SSE2 and/or absent from processors that otherwise implement SSE2. (E.g. AMD Geode LX supports CLFLUSH but not SSE2.)
- While the MONITOR and MWAIT instructions were introduced at the same time as SSE3, they have their own CPUID flag that needs to be checked separately from the SSE3 CPUID flag (e.g. Athlon 64 X2 and VIA C7 supported SSE3 but not MONITOR).
- For MONITOR, the DS: segment can be overridden with a segment prefix.
The memory area that will be monitored is not just the single byte specified by DS:rAX, but a linear memory region containing that byte – the size and alignment of this memory region are implementation-dependent and can be queried through CPUID.
The memory location to monitor should have memory type WB (write-back cacheable), or else monitoring may fail.
- The wait performed by MWAIT may be ended by system events other than a memory write (e.g. cacheline evictions, interrupts) – the exact set of events that can cause the wait to end is implementation-specific.
Regardless of whether the wait was ended by a memory write or some other event, monitoring will have ended and it will be necessary to set up monitoring again with MONITOR before using MWAIT to wait for memory writes again.
- The hint flags available for MWAIT in the EAX register are:
The C-states are processor-specific power states, which do not necessarily correspond 1:1 to ACPI C-states.
- RDTSCP can be run outside Ring 0 only if CR4.TSD=0.
- For the MOVBE instruction, encodings that use both the 66h prefix and the REX.W prefix will cause #UD on some processors (e.g. Haswell[108]) and should therefore be avoided.
- Unlike the older INVLPG instruction, INVPCID will cause a #GP exception if the provided memory address is non-canonical. This discrepancy has been known to cause security issues.[109]
- The PREFETCH and PREFETCHW instructions are mandatory parts of the 3DNow! instruction set extension, but are also available as a standalone extension on systems that do not support 3DNow!
- The opcodes for PREFETCH and PREFETCHW (0F 0D /r) execute as NOPs on Intel CPUs from Cedar Mill (65nm Pentium 4) onwards, with PREFETCHW gaining prefetch functionality from Broadwell onwards.
- The PREFETCH (0F 0D /0) instruction is a 3DNow! instruction, present on all processors with 3DNow! but not necessarily on processors with the PREFETCHW extension.
On AMD CPUs with PREFETCHW, opcode 0F 0D /0 as well as opcodes 0F 0D /2../7 are all documented to be performing prefetch.
On Intel processors with PREFETCHW, these opcodes are documented as performing reserved-NOPs[110] (except 0F 0D /2 being PREFETCHWT1 m8 on Xeon Phi only) – third party testing[111] indicates that some or all of these opcodes may be performing prefetch on at least some Intel Core CPUs.
- Unlike the older RDTSCP instruction, which can also be used to read the processor ID, user-mode RDPID is not disabled by CR4.TSD=1.
- For MOVDIR64B, the destination address given by ES:reg must be 64-byte aligned.
The operand size for the register argument is given by the address size, which may be overridden by the 67h prefix.
The 64-byte memory source argument does not need to be 64-byte aligned, and is not guaranteed to be read atomically.
- In initial implementations, the PREFETCHIT0 and PREFETCHIT1 instructions will perform code prefetch only when using the RIP-relative addressing mode and act as NOPs otherwise.
The PREFETCHI instructions are hint instructions only - if an attempt is made to prefetch an invalid address, the instructions will act as NOPs with no exceptions generated. On processors that support Long-NOP but do not support the PREFETCHI instructions, these instructions will always act as NOPs.
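The point about separate CPUID flags in the notes above can be sketched in a few lines of Python. This is a minimal illustration, not a detection routine: it assumes the flags sit in CPUID leaf 1 at EDX[19] (CLFLUSH), EDX[26] (SSE2), ECX[0] (SSE3) and ECX[3] (MONITOR), as given in the vendor CPUID documentation. The function name and the sample register values are hypothetical and are not dumps from real processors.

```python
# Bit positions in CPUID leaf 1 (EAX=1); assumed from the vendor CPUID documentation.
CLFSH_BIT = 19     # EDX: CLFLUSH
SSE2_BIT = 26      # EDX: SSE2
SSE3_BIT = 0       # ECX: SSE3
MONITOR_BIT = 3    # ECX: MONITOR/MWAIT


def leaf1_features(ecx: int, edx: int) -> dict:
    """Decode the four feature flags independently of each other."""
    return {
        "clflush": bool((edx >> CLFSH_BIT) & 1),
        "sse2": bool((edx >> SSE2_BIT) & 1),
        "sse3": bool((ecx >> SSE3_BIT) & 1),
        "monitor": bool((ecx >> MONITOR_BIT) & 1),
    }


# Hypothetical register values, chosen only to illustrate the two mismatches.
clflush_without_sse2 = leaf1_features(ecx=0, edx=1 << CLFSH_BIT)
sse3_without_monitor = leaf1_features(ecx=1 << SSE3_BIT, edx=1 << SSE2_BIT)
```

Because each flag is tested on its own, code that wants CLFLUSH or MONITOR never infers their presence from SSE2 or SSE3.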
Added with other Intel-specific extensions
- The branch hint mnemonics HWNT and HST are listed in early Willamette documentation only[113] - later Intel documentation lists the branch hint prefixes without assigning them a mnemonic.[114]
Intel XED uses the mnemonics hint-taken and hint-not-taken for these branch hints.[115]
- The 2E and 3E prefixes are interpreted as branch hints only when used with the Jcc conditional branch instructions (opcodes 70..7F and 0F 80..8F) - when used with other opcodes, they may take other meanings (e.g. for instructions with memory operands outside 64-bit mode, they will work as segment-override prefixes CS: and DS:, respectively). On processors that don't support branch hints, these prefixes are accepted but ignored when used with Jcc.
- Branch hints are supported on all NetBurst (Pentium 4 family) processors - but not supported on any other known processor prior to their re-introduction in "Redwood Cove" CPUs, starting with "Meteor Lake" in 2023.
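The opcode-dependent meaning of the 2E and 3E prefixes described above can be sketched as a small classifier. This is an illustrative Python sketch under stated assumptions: the function name is hypothetical, the mapping of 2E to "not taken" and 3E to "taken" is taken from Intel's documentation rather than from the notes above, and the sketch does not model CPU mode or whether the processor honours the hint at all.

```python
def prefix_role(prefix: int, opcode: bytes) -> str:
    """Classify a 2E/3E prefix by the opcode bytes that follow it."""
    if prefix not in (0x2E, 0x3E):
        raise ValueError("only the 2E and 3E prefixes are modelled")
    # Short Jcc: one-byte opcodes 70..7F.
    short_jcc = len(opcode) >= 1 and 0x70 <= opcode[0] <= 0x7F
    # Near Jcc: two-byte opcodes 0F 80..8F.
    near_jcc = len(opcode) >= 2 and opcode[0] == 0x0F and 0x80 <= opcode[1] <= 0x8F
    if short_jcc or near_jcc:
        # Assumed mapping: 2E hints "not taken", 3E hints "taken".
        return "branch hint: not taken" if prefix == 0x2E else "branch hint: taken"
    return "other meaning"
```

For example, `prefix_role(0x3E, bytes([0x74, 0x05]))` classifies the prefix on a short conditional jump as a branch hint, while the same prefix in front of opcode 8B falls into the "other meaning" case.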
- SGX is deprecated on desktop/laptop processors from 11th generation (Rocket Lake, Tiger Lake) onwards,[119] but continues to be available on Xeon-branded server parts.
- For PTWRITE, the write to the Processor Trace Packet will only happen if a set of enable-bits (the "TriggerEn", "ContextEn", "FilterEn" bits of the RTIT_STATUS MSR and the "PTWEn" bit of the RTIT_CTL MSR) are all set to 1.
The PTWRITE instruction is indicated in the SDM to cause an #UD exception if the 66h instruction prefix is used, regardless of other prefixes.
- For CLDEMOTE, the cache level that it will demote a cache line to is implementation-dependent.
Since the instruction is considered a hint, it will execute as a NOP without any exceptions if the provided memory address is invalid or not in the L1 cache. It may also execute as a NOP under other implementation-dependent circumstances.
On systems that do not support the CLDEMOTE extension, it executes as a NOP.
- Intel documentation lists Tremont and Alder Lake as the processors in which CLDEMOTE was introduced. However, as of May 2022, no Tremont or Alder Lake models have been observed to have the CPUID feature bit for CLDEMOTE set, while several of them have the CPUID bit cleared.[124]
As of April 2023, the CPUID feature bit for CLDEMOTE has been observed to be set for Sapphire Rapids.[125]
- For the UMWAIT and TPAUSE instructions, the operating system can use the IA32_UMWAIT_CONTROL MSR to limit the maximum amount of time that a single UMWAIT/TPAUSE invocation is permitted to wait. The UMWAIT and TPAUSE instructions will set RFLAGS.CF to 1 if they reached the IA32_UMWAIT_CONTROL-defined time limit and 0 otherwise.
- TPAUSE and UMWAIT can be run outside Ring 0 only if CR4.TSD=0.
- While serialization can be performed with older instructions such as CPUID and IRET, these instructions perform additional functions, causing side-effects and reduced performance when stand-alone instruction serialization is needed. (CPUID additionally has the issue that it causes a mandatory #VMEXIT when executed under virtualization, which causes a very large overhead.) The SERIALIZE instruction performs serialization only, avoiding these added costs.
- The register argument to SENDUIPI is an index to pick an entry from the UITT (User-Interrupt Target Table, a table specified by the new UINTR_TT and UINTR_MISC MSRs).
- On Sapphire Rapids processors, the UIRET instruction always sets UIF (User Interrupt Flag) to 1. On Sierra Forest and later processors, UIRET will set UIF to the value of bit 1 of the value popped off the stack for RFLAGS - this functionality is indicated by CPUID.(EAX=7,ECX=1):EDX[17].
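The CR4.TSD rules stated in the notes for RDTSCP, RDPID, TPAUSE and UMWAIT can be collected into one small predicate. This is a minimal Python sketch with a hypothetical function name. It models only the CR4.TSD check and ignores every other condition (such as whether the processor supports the instruction at all).

```python
# Instructions that the notes describe as gated by CR4.TSD outside ring 0.
TSD_GATED = {"RDTSCP", "TPAUSE", "UMWAIT"}
# Instructions that the notes describe as unaffected by CR4.TSD.
TSD_EXEMPT = {"RDPID"}


def tsd_permits(instruction: str, cpl: int, cr4_tsd: int) -> bool:
    """Return True if the CR4.TSD check alone would let the instruction run."""
    name = instruction.upper()
    if name in TSD_EXEMPT:
        return True
    if name in TSD_GATED:
        # Ring 0 is always allowed; other rings need CR4.TSD=0.
        return cpl == 0 or cr4_tsd == 0
    raise ValueError(f"{instruction} is not covered by this sketch")
```

With `cr4_tsd=1`, a ring-3 caller is refused RDTSCP, TPAUSE and UMWAIT but can still use RDPID, which is the asymmetry the notes point out.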
Added with other AMD-specific extensions
- The standard way to access the CR8 register is to use an encoding that makes use of the REX.R prefix, e.g. 44 0F 20 07 (MOV RDI,CR8). However, the REX.R prefix is only available in 64-bit mode.
The AltMovCr8 extension adds an additional method to access CR8, using the F0 (LOCK) prefix instead of REX.R – this provides access to CR8 outside 64-bit mode.
- Like other variants of MOV to/from the CRx registers, the AltMovCr8 encodings ignore the top 2 bits of the instruction's ModR/M byte, and always execute as if these two bits are set to 11b.
The AltMovCr8 encodings are available in 64-bit mode. However, combining the LOCK prefix with the REX.R prefix is not permitted and will cause an #UD exception.
- For CLZERO, the address size and 67h prefix control whether to use AX, EAX or RAX as address. The default segment DS: can be overridden by a segment-override prefix. The provided address does not need to be aligned – hardware will align it as necessary.
The CLZERO instruction is intended for recovery from otherwise-fatal Machine Check errors. It is non-cacheable, cannot be used to allocate a cache line without a memory access, and should not be used for fast memory clears.[127]
- If CR4.TSD=1, then the RDPRU instruction can only run in ring 0.
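The two ways of reaching CR8 described above can be illustrated with a small decoder. This is a hedged Python sketch, not a full instruction decoder: the function name is hypothetical, any 40..4F byte is treated as a REX prefix (CPU mode is not modelled), and the LOCK form is only modelled for a ModR/M reg field of 0, since that is the only case the notes describe.

```python
def decode_mov_cr(code: bytes) -> dict:
    """Decode the register fields of a MOV to/from CRn encoding.

    Handles only: optional F0 (LOCK), optional REX (40..4F),
    opcode 0F 20 or 0F 22, and one ModR/M byte.
    """
    i = 0
    lock = False
    rex = 0
    if code[i] == 0xF0:
        lock = True
        i += 1
    if 0x40 <= code[i] <= 0x4F:
        rex = code[i]
        i += 1
    if code[i] != 0x0F or code[i + 1] not in (0x20, 0x22):
        raise ValueError("not a MOV to/from CRn encoding")
    to_cr = code[i + 1] == 0x22
    modrm = code[i + 2]
    rex_r = bool(rex & 0x04)
    rex_b = bool(rex & 0x01)
    if lock and rex_r:
        raise ValueError("#UD: LOCK combined with REX.R")
    reg = (modrm >> 3) & 7   # control-register field
    rm = modrm & 7           # general-purpose register; the mod bits are ignored
    if lock and reg != 0:
        raise ValueError("LOCK form is only modelled for reg field 0")
    return {
        "cr": reg + (8 if (rex_r or lock) else 0),
        "gpr": rm + (8 if rex_b else 0),
        "to_cr": to_cr,
    }
```

Decoding `44 0F 20 07` yields control register 8 and general-purpose register 7 (RDI), matching the MOV RDI,CR8 example above. The LOCK-prefixed `F0 0F 20 C0` also yields control register 8 without needing REX.R.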
x87 floating-point instructions
The x87 coprocessor, if present, provides support for floating-point arithmetic. The coprocessor provides eight data registers, each holding one 80-bit floating-point value (1 sign bit, 15 exponent bits, 64 mantissa bits) – these registers are organized as a stack, with the top-of-stack register referred to as "st" or "st(0)", and the other registers referred to as st(1), st(2), ...st(7). It additionally provides a number of control and status registers, including "PC" (precision control, to control whether floating-point operations should be rounded to 24, 53 or 64 mantissa bits), "RC" (rounding control, to pick rounding-mode: round-to-zero, round-to-positive-infinity, round-to-negative-infinity, round-to-nearest-even) and a 4-bit condition code register "CC" (whose four bits are individually referred to as C0, C1, C2 and C3). Not all of the arithmetic instructions provided by x87 obey PC and RC.
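The 80-bit register format described above (1 sign bit, 15 exponent bits, 64 mantissa bits) can be decoded with a short Python sketch. It makes several assumptions that the paragraph does not state: the 10 bytes are in little-endian memory order, the exponent bias is 16383, and the mantissa carries an explicit integer bit at bit 63. The function name is hypothetical, and infinities and NaNs are deliberately left out.

```python
from fractions import Fraction

EXPONENT_BIAS = 16383   # assumed bias of the 15-bit exponent field
MANTISSA_SHIFT = 63     # assumed: explicit integer bit sits at mantissa bit 63


def decode_x87_extended(raw: bytes) -> Fraction:
    """Decode a finite 80-bit x87 value from its 10-byte little-endian image."""
    if len(raw) != 10:
        raise ValueError("an x87 extended value is exactly 10 bytes")
    mantissa = int.from_bytes(raw[:8], "little")    # 64 mantissa bits
    sign_exp = int.from_bytes(raw[8:], "little")    # 1 sign bit + 15 exponent bits
    sign = sign_exp >> 15
    exponent = sign_exp & 0x7FFF
    if exponent == 0x7FFF:
        raise ValueError("infinities and NaNs are not modelled")
    if exponent == 0:
        exponent = 1    # zero and denormals use the minimum exponent
    value = Fraction(mantissa) * Fraction(2) ** (exponent - EXPONENT_BIAS - MANTISSA_SHIFT)
    return -value if sign else value
```

Returning a `Fraction` keeps the result exact, since a 64-bit mantissa does not fit in a Python `float`.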
Original 8087 instructions
- x87 coprocessors (other than the 8087) handle exceptions in a fairly unusual way. When an x87 instruction generates an unmasked arithmetic exception, it will still complete without causing a CPU fault – instead of causing a fault, it will record within the coprocessor information needed to handle the exception (instruction pointer, opcode, data pointer if the instruction had a memory operand) and set an FPU status-word flag to indicate that a pending exception is present. This pending exception will then cause a CPU fault when the next x87, MMX or WAIT instruction is executed.
The exception to this is x87's "Non-Waiting" instructions, which will execute without causing such a fault even if a pending exception is present (with some caveats, see application note AP-578[128]). These instructions are mostly control instructions that can inspect and/or modify the pending-exception state of the x87 FPU.
- For each non-waiting x87 instruction whose mnemonic begins with FN, there exists a pseudo-instruction that has the same mnemonic except without the N. These pseudo-instructions consist of a WAIT instruction (opcode 9B) followed by the corresponding non-waiting x87 instruction. For example:
  - FNCLEX is an instruction with the opcode DB E2. The corresponding pseudo-instruction FCLEX is then encoded as 9B DB E2.
  - FNSAVE ES:[BX+6] is an instruction with the opcode 26 DD 77 06. The corresponding pseudo-instruction FSAVE ES:[BX+6] is then encoded as 9B 26 DD 77 06.
- On 80387 and later x87 FPUs, FLDENV, F(N)STENV, FRSTOR and F(N)SAVE exist in 16-bit and 32-bit variants. The 16-bit variants will load/store a 14-byte floating-point environment data structure to/from memory – the 32-bit variants will load/store a 28-byte data structure instead. (F(N)SAVE/FRSTOR will additionally load/store 80 bytes of FPU data register content after the FPU environment, for a total of 94 or 108 bytes.) The choice between the 16-bit and 32-bit variants is based on the CS.D bit and the presence of the 66h instruction prefix. On 8087 and 80287, only the 16-bit variants are available.
64-bit variants of these instructions do not exist – using REX.W under x86-64 will cause the 32-bit variants to be used. Since these can only load/store the bottom 32 bits of FIP and FDP, it is recommended to use FXSAVE64/FXRSTOR64 instead if 64-bit operation is desired.
- In the case of an x87 instruction producing an unmasked FPU exception, the 8087 FPU will signal an IRQ some indeterminate time after the instruction was issued. This may not always be possible to handle,[129] and so the FPU offers the F(N)DISI and F(N)ENI instructions to set/clear the Interrupt Mask bit (bit 7) of the x87 Control Word,[130] to control the interrupt.
Later x87 FPUs, from 80287 onwards, changed the FPU exception mechanism to instead produce a CPU exception on the next x87 instruction. This made the Interrupt Mask bit unnecessary, so it was removed.[131] In later Intel x87 FPUs, the F(N)ENI and F(N)DISI instructions were kept for backwards compatibility, executing as NOPs that do not modify any x87 state.
- Intel x87 alias opcode. Use of this opcode is not recommended.
On the Intel 8087 coprocessor, several reserved opcodes would perform operations behaving similarly to existing defined x87 instructions. These opcodes were documented for the 8087[132] and 80287,[133] but then omitted from later manuals until the October 2017 update of the Intel SDM.[134]
They are present on all known Intel x87 FPUs but unavailable on some older non-Intel FPUs, such as AMD Geode GX/LX, DM&P Vortex86[135] and NexGen 586PF.[136]
- On Intel Pentium and later processors, FXCH is implemented as a register renaming rather than a true data move. This has no semantic effect, but enables zero-cycle-latency operation. It also allows the instruction to break data dependencies for the x87 top-of-stack value, improving attainable performance for code optimized for these processors.
- On early Intel Pentium processors, floating-point divide was subject to the Pentium FDIV bug. This also affected instructions that perform divide as part of their operations, such as FPREM and FPATAN.[137]
- For FXTRACT, the behavior that results from st(0) being zero or ±∞ differs between 8087 and 80387:
  - If st(0) is ±0, then on 8087/80287, E and M are both set equal to st(0) with no exception reported; on 80387 and later, M is set equal to st(0), E is set to -∞, and a zero-divide exception is raised.
  - If st(0) is ±∞, then on 8087/80287, an invalid-operation exception is raised and both M and E are set to NaN; on 80387 and later, M is set equal to st(0) and E is set to +∞ with no exception reported.[138]
- For FPREM, if the quotient Q is too large for a single pass, then the remainder calculation may have been done only partially – in this case, the FPREM instruction will need to be run again in order to complete the remainder calculation. This is indicated by the instruction setting C2 to 1.
If the instruction did complete the remainder calculation, it will set C2 to 0 and set the three bits {C0,C3,C1} to the bottom three bits of the quotient Q.
On 80387 and later, if the instruction didn't complete the remainder calculation, then the partial quotient used for argument reduction will have been rounded to a multiple of 8 (or larger power-of-2), so that the bottom 3 bits of the quotient can still be correctly retrieved in a later pass that does complete the remainder calculation.
- The x87 transcendental instructions do not obey PC or RC, but instead compute full 80-bit results. These results are not necessarily correctly rounded (see Table-maker's dilemma) – they may have an error of up to ±1 ulp on Pentium or later, or up to ±1.5 ulps on earlier x87 coprocessors.
- For the FYL2X and FYL2XP1 instructions, the maximum error bound of ±1 ulp only holds for st(1)=1.0 – for other values of st(1), the error bound is increased to ±1.35 ulps.
FYL2X can produce a #Z (divide-by-zero exception) if st(0)=0 and st(1) is a finite nonzero value. FYL2XP1, however, cannot produce #Z.
- For FPATAN, the following adjustments are done as compared to just computing a one-argument arctangent of the ratio $\mathrm{st}(1)/\mathrm{st}(0)$:
  - If both st(0) and st(1) are ±∞, then the arctangent is computed as if each of st(0) and st(1) had been replaced with ±1 of the same sign. This produces a result that is an odd multiple of $\pi/4$.
  - If both st(0) and st(1) are ±0, then the arctangent is computed as if st(0) but not st(1) had been replaced with ±1 of the same sign, producing a result of ±0 or $\pm\pi$.
  - If st(0) is negative (has sign bit set), then an addend of $\pi$ with the same sign as st(1) is added to the result.
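The three FPATAN adjustments listed above can be tried out in a small Python model built on a one-argument arctangent. This is only a sketch of the rules as written, using double precision rather than 80-bit arithmetic. The function name is hypothetical and NaN inputs are not handled. Python's `math.atan2` serves as a reference for the conventional two-argument arctangent, and the sketch makes no claim about the bit-exact results of real hardware.

```python
import math


def fpatan_model(st1: float, st0: float) -> float:
    """One-argument arctangent of st1/st0 plus the three listed adjustments."""
    y, x = st1, st0
    if math.isinf(x) and math.isinf(y):
        # Both infinite: replace each with +/-1 of the same sign.
        x, y = math.copysign(1.0, x), math.copysign(1.0, y)
    elif x == 0.0 and y == 0.0:
        # Both zero: replace st(0) only with +/-1 of the same sign.
        x = math.copysign(1.0, x)
    if x == 0.0:
        # y is nonzero here, so the ratio is infinite with the combined sign.
        ratio = math.copysign(math.inf, y) * math.copysign(1.0, x)
    else:
        ratio = y / x
    result = math.atan(ratio)
    if math.copysign(1.0, st0) < 0:
        # st(0) has its sign bit set: add pi with the sign of st(1).
        result += math.copysign(math.pi, st1)
    return result
```

For instance, `fpatan_model(math.inf, -math.inf)` lands on an odd multiple of π/4, and `fpatan_model(0.0, -0.0)` gives π, in line with the first two rules.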
x87 instructions added in later processors
- The x87 FPU needs to know whether it is operating in Real Mode or Protected Mode because the floating-point environment accessed by the F(N)SAVE, FRSTOR, FLDENV and F(N)STENV instructions has different formats in Real Mode and Protected Mode. On 80287, the F(N)SETPM instruction is required to communicate the real-to-protected mode transition to the FPU. On 80387 and later x87 FPUs, real↔protected mode transitions are handled automatically between the CPU and the FPU without the need for any dedicated instructions – therefore, on these FPUs, FNSETPM executes as a NOP that does not modify any FPU state.
- The 80387 FPREM1 instruction differs from the older FPREM (D9 F8) instruction in that the quotient Q is rounded to integer with round-to-nearest-even rounding rather than the round-to-zero rounding used by FPREM. Like FPREM, FPREM1 always computes an exact result with no roundoff errors. Like FPREM, it may also perform a partial computation if the quotient is too large, in which case it must be run again.
- If st(0) is finite and its absolute value is $2^{63}$ or greater, then the top-of-stack value st(0) is left unmodified and C2 is set, with no exception raised. This applies to the FSIN, FCOS and FSINCOS instructions, as well as FPTAN on 80387 and later.
In this case, the FSINCOS and FPTAN instructions will also abstain from pushing a value onto the x87 register-stack.
- The FXSAVE and FXRSTOR instructions will save/restore SSE state only on processors that support SSE. Otherwise, they will only save/restore x87 and MMX state.
The x87 section of the state saved/restored by FXSAVE(64)/FXRSTOR(64) has a completely different layout than the data structure of the older F(N)SAVE/FRSTOR instructions, enabling faster save/restore by avoiding misaligned loads and stores.
FXSAVE and FXRSTOR require their memory argument to be 16-byte aligned.
- When floating-point emulation is enabled with CR0.EM=1, FXSAVE(64) and FXRSTOR(64) are considered to be x87 instructions and will accordingly produce an #NM (device-not-available) exception. Other than WAIT, these are the only opcodes outside the D8..DF ESC opcode space that exhibit this behavior.
Except on NetBurst (Pentium 4 family) CPUs, all opcodes in D8..DF will produce #NM if CR0.EM=1, even for undefined opcodes that would produce #UD otherwise.
- The FXSAVE64/FXRSTOR64 instructions differ from the FXSAVE/FXRSTOR instructions in that:
  - FXSAVE/FXRSTOR will save/restore FIP and FDP as 32-bit items, and will also save/restore FCS and FDS as 16-bit items.
  - FXSAVE64/FXRSTOR64 will save/restore FIP and FDP as 64-bit items while not saving/restoring FCS and FDS.
The same difference applies to the XSAVE/XRSTOR vs XSAVE64/XRSTOR64 instructions.
As a result, saving both FCS/FDS and the top 32 bits of 64-bit FIP/FDP cannot be accomplished with 1 instruction, but instead requires running both (F)XSAVE and (F)XSAVE64. This has been known to cause problems, especially for 64-bit hypervisors running 16/32-bit guests.[141][142]
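The difference in quotient rounding between FPREM and FPREM1 noted in this list can be illustrated with two Python standard-library functions that use the same two conventions. This is an analogy in double precision, not an emulation: `math.fmod` truncates the quotient toward zero, and `math.remainder` rounds it to nearest with ties to even. The wrapper names are hypothetical, and the sketch models only a completed remainder, not partial passes or condition codes.

```python
import math


def fprem_like(x: float, y: float) -> float:
    """Remainder with the quotient truncated toward zero (the FPREM convention)."""
    return math.fmod(x, y)


def fprem1_like(x: float, y: float) -> float:
    """Remainder with the quotient rounded to nearest-even (the FPREM1 convention)."""
    return math.remainder(x, y)


# 5/3 = 1.67: truncation gives Q=1, round-to-nearest gives Q=2.
truncated = fprem_like(5.0, 3.0)
nearest = fprem1_like(5.0, 3.0)
# 7/2 = 3.5 is a tie: round-to-nearest-even picks Q=4.
tie_case = fprem1_like(7.0, 2.0)
```

Both functions return exact results, but the second can be negative for positive operands because its quotient may round up.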
SIMD instructions
Cryptographic instructions
Virtualization instructions
Other instructions
x86 also includes discontinued instruction sets which are no longer supported by Intel and AMD, and undocumented instructions which execute but are not officially documented.
Undocumented x86 instructions
The x86 CPUs contain undocumented instructions which are implemented on the chips but not listed in some official documents. They can be found in various sources across the Internet, such as Ralf Brown's Interrupt List and sandpile.org.
Some of these instructions are widely available across many/most x86 CPUs, while others are specific to a narrow range of CPUs.
Undocumented instructions that are widely available across many x86 CPUs include
Undocumented instructions that appear only in a limited subset of x86 CPUs include
Undocumented x87 instructions
See also
References
External links