Top Qs
Timeline
Chat
Perspective
Bit manipulation instructions
Hardware-level Bit manipulation instructions From Wikipedia, the free encyclopedia
Remove ads
Bit manipulation instructions are instructions that perform bit manipulation operations in hardware, rather than requiring several instructions for those operations as illustrated with examples in software.[1] Several leading as well as historic architectures have bit manipulation instructions including ARM, WDC 65C02, the TX-2 and the Power ISA.[2]
Bit manipulation is usually divided into subsets as individual instructions can be costly to implement in hardware when the target application has no justification. Conversely, if there is a justification then performance may suffer if the instruction is excluded. Carrying out the cost-benefit analysis is a complex task: one of the most comprehensive efforts in bit manipulation was a collaboration headed by Clare Wolfe, providing justifications, use-cases, c code, proofs and Verilog for each proposed RISC-V instruction.[3][4]
Particular practical examples include Bit banging of GPIO using a low-cost Embedded controller such as the WDC 65C02, 8051 and Atmel PIC. At the slow clock rate of these CPUs, if bit-set/clear/test bit manipulation were not available the use of that low-cost CPU would, self-evidently, not be viable for the target application.
Remove ads
Hardware bit manipulation
Summarize
Perspective
All the architectures below have instruction subsets and groups where the bit manipulation is provided in hardware. From the list it can be seen that DSPs and Embedded Microcontrollers have at least test/set/clear bit, however there are much more comprehensive instructions such as Count leading zeros, Popcount, Galois field arithmetic, Binary-coded decimal, bit-matrix multiply and transpose, byte-permute, bit permute including bit-reversal, specialised cryptographic instructions and many more.
Intel and AMD (x86)
- The x86 instruction core set contains:
BSR
Bit Scan Reverse - a quirky backwards count leading zerosBSF
Bit Scan Forward - a quirky backwards count trailing zeros
- SSE4 and the BMI instruction set extensions contains instructions for:
- Count leading zeros -
lzcnt
- Count trailing zeros -
tzcnt
- Population count -
popcnt
- Bit extract/bit deposit -
pext
/pdep
- Bit test -
ptest
andvptest
, given two inputs, do both anAND
operation and anANDN
operation between them, and set the ZF and CF EFLAGS bits on whether the results of the AND and ANDN, respectively, are 0. This can be used to test if all masked bits are zero, all masked bits are set, or a mix.
- Count leading zeros -
- The AVX-512 ternary extension includes a Bitwise ternary logic instruction,
vpternlog
. Also noteworthy is a conflict detection instruction.VPCONFLICTD
- Also present in the AVX/AVX-512 GFNI subset is bit-matrix affine transformation and its inverse:
GF2P8AFFINEQB
is effectively an 8x8 bit-matrix multiply in the Galois field GF(2^8).[5] - An Intel GNFI technology guide on that AVX/AVX512 GNFI Extension also lists numerous uses including parallel byte-wise set/clear/invert bitmanipulation, 5-bit sign-extension and points out the potential is much greater.[6]
- Intel BCD opcodes
Power ISA
Power ISA has a large range of bit manipulation instructions,[7] largely due to its history and relationship with IBM mainframes and the z/Architecture:
- Count leading zeros and trailing, and masked versions of the same.[8] There is a mixture of Popcount[8] parity[9] and SWAR-style instructions, but not a full set of each:
popcntb
is SWAR byte-level 8x8-bit but there is no 4x16-bitpopcnth
yet there is 2x32-bitpopcntw
and 64-bit scalarpopcntd
. Likewise,prtyw
is SWAR half-word 4x16-bit but there is noprtyb
- masked bit-extract
pextd
and bit-depositpdepd
these drop and distribute bits in place according to a mask instead of the more usual technique of a offset and a length.[10]; An unusual centrifuge instruction which moves masked-bits to the left and unmasked bits to the right, preserving their relative order in both instances. Most ISAs would have an operand expressing the number of sequential bits to extract, plus the length:cfuged
combines these into one general-purpose bitmask.[10] - 8x8-bit transpose
vgbbd
[11] which treats a 64-bit quantity as an 8x8 2D matrix, and performs a matrix transpose operation. Each bit 0 of each byte therefore becomes the first byte, each bit 1 of each byte becomes the second and so on. - a strange but very useful indexing instruction, (
bpermd
)[12] which allows selection of up to eight individual bits from a 64-bit source, by treating each byte of a second 64-bit register as bit-indices into the first. - Ternary 8-bit Bitwise ternary logic instruction
xxeval
[13] similar to AVX-512 - strategic instructions for accelerating Packed BCD.[14]
- Power v3.1 also introduced a number of additional bit manipulation instructions including swapping the order of bytes within half-words, words, and the whole 64-bit register.
Cray Supercomputers
Cray patented BMM (Bit matrix multiply) in 1990 which could cope with up to 64x64-bit operands.[15] The closest equivalent today is the 8x8 GF(2) Affine Transform instruction of AVX512
IBM System/360 through z/Architecture
IBM S/370, S/370-XA, ESA/370, and ESA/390 vector operations
The IBM 3090 introduced an optional vector facility[16] to the System/370-XA and Enterprise Systems Architecture/370 instruction sets. In addition to integer and floating-point vector arithmetic and logical operations on multiple integer and floating-point values, it introduced vector bit manipulation operations count leading zeros vczvm
and population count vcovm
.[17]
z/Architecture scalar
z/Architecture did not support the previous vector facility.[18] However, starting with the 11th edition of the z/Architecture Principles of Operation:[19] it supported the following instructions:
- Vector count leading zeros
vclz
, count trailing zerosvctz
[20][21] and vector population countvpopct
[22] - Vector test under mask
vtm
[23] - sets a Condition Code based on comparing all elements of one register against a second vector as a mask: if all masked-comparisons are all-zero, if all are all-ones or a mix of both. - Vector GF(2) multiply and multiply-accumulate,
vgfm
,[24] known as carryless multiply - And-complement and others,
- bit-extract and deposit,[25]
- a range of bit byte and masked insert instructions,[26]
- comprehensive rotate and insert instructions including masked rotate-and-OR,[27] and shift,[28]
- comprehensive packed BCD.[29]
- memory-based test-and-set and various masked-test set/clear bit operations, which move or copy a single bit into Condition Codes.[30]
DEC PDP-10
The DEC PDP-6 and PDP-10 had logical operations covering the full suite of 2-operand hardware lookup table (LUT2) Boolean functions[31] (rather than the 3-operand functions that AVX512 and Power ISA have).
Later models of the PDP-10 had instructions to convert between packed BCD and binary.[32]
Also present is unusual (variable-bit-length) byte load and store instructions that use byte pointers for memory operands: in modern terminology these are bit-field insert and extract. In addition to a word address, the bit length (S) and the bit offset (P) of the byte from which to load or into which to store are specified. These instructions can specify a byte size of 0-36, but a byte may not straddle a word boundary.[33] The string manipulation,[34] BCD/binary conversion,[35] and string editing[36] instructions in later models use byte pointers and have the same restrictions.
GE-600 series
The GE-600 series and its successors had Gray-to-binary conversion; without such an instruction, converting from Gray code requires multiple steps. Binary-to-Gray is simply x^(x>>1)
and does not justify a dedicated instruction. Gray coding has significant practical applications.
ARM
- ARM11 has bitwise test-ANDed (a bitmasked test) and test-XOR, standard logical bitwise operations including OR-complement; byte halfword and bit-reversing, and conditional byte-selection/merging. Shift and rotate are available on Operand2.[37]
- ARM Cortex-A has bit-field set, clear, extract and reverse.[38]
- ARM A64 has SWAR-style half-word byte-swapping, bit-field insert and extract, and bit-reversing.[39]
RISC-V
In the standard extensions RISC-V has scalar bitwise operations including shift and arithmetic shift, but no rotate. The omissions are compensated for with additional extensions.
- RISC-V Zb* extensions contain a significant number of bit manipulation instructions.[40] The four groups are broken down into useful categories (the integer subset has min/max, rotate and Popcount for example), and have very good researched justifications for their inclusion and the improvements they bring.[41]
- The RISC-V Vector Extension (RVV) has instructions that qualify as hardware-level bit manipulation, but on Vector masks rather than Scalar registers as is normally the case. For example, a Vector-mask Popcount is available.[42] RVV also has per-element bitwise operations.[43]
Embedded Microcontrollers
Intel
Zilog Z80
- The Zilog Z80 includes
BIT
,RES
, andSET
instructions. These test, reset, and set individual bits in registers or memory pointed to by HL.
MOS 6502
- The WDC 65C02 added bit-manipulation: set, reset and test on individual bits.
- Rockwell added similar extensions (RMB, SMB, BBR and BBS) to the R65C00 series[47]
Microchip PICs
- The Microchip Technology PIC range also has bitwise operations and set, clear and test bit, listed in the instructions.
others
- Texas Instruments DSPs such as the TMS320C6000 series have set, clear, invert, test, extract and insert bit (or bit-field) instructions.[48]
- The TX-2 from 1958 had "skip on bit" predication, as well as set, clear, invert and permute bits, and shift and other bitwise operations.[49][50]
- SuperH has comprehensive memory-based bit manipulation including And-complement and Or-complement, but also has standard register-based test/set/clear and an unusual instruction that replaces bit N (in the range 0 to 7) and copies the replaced bit into the Test register.[51]
- The Signetics 8X300 is a microprocessor produced starting 1976. The processor normally manipulates 8-bit data bytes, but the mask and rotate units makes it possible to manipulate single or multiple bits, making this a variable data-length processor.
Remove ads
Notes
See also
- Find first set – Family of related bitwise operations on machine words
- Bitwise operation – Computer science topic
- Popcount – Number of nonzero symbols in a string
- Count leading zeros – Family of related bitwise operations on machine words
- Mask (computing) – Data used for bitwise operations
- Binary-coded decimal – System of digitally encoding numbers
- CLMUL instruction set – Extension to the x86 instruction set
- Bitwise ternary logic instruction – Bitwise ternary logic (3-way boolean function)
- GFNI instruction set – Intel AVX Galois-Field instructions
References
Wikiwand - on
Seamless Wikipedia browsing. On steroids.
Remove ads