Bit manipulation instructions
Hardware-level bit manipulation instructions. From Wikipedia, the free encyclopedia
Bit manipulation instruction sets perform bit manipulation in hardware as single instructions, rather than as the sequences of several instructions that a software implementation requires.[1] Several leading and historic architectures have bit manipulation instructions, including ARM, the WDC 65C02, the TX-2 and the Power ISA.[2]
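To make the contrast concrete, here is a minimal sketch of two operations done in software, each needing a loop of several instructions where a CPU with hardware support would use one. The function names popcount_sw and clz32_sw are illustrative and not taken from any particular library.

```c
#include <stdint.h>

/* Software population count (Kernighan's method): one loop
   iteration per set bit. A hardware popcount instruction
   produces the same result in a single instruction. */
unsigned popcount_sw(uint64_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Software count-leading-zeros for a 32-bit word, testing the
   top bit and shifting left until a set bit is found. */
unsigned clz32_sw(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;    /* convention assumed here: all 32 bits are zero */
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}
```

Many C compilers will recognise patterns like these, or offer builtins, and emit the single hardware instruction when the target has one.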
Bit manipulation is usually divided into subsets, as individual instructions can be costly to implement in hardware when the target application has no use for them. Conversely, if an application does need an instruction, performance may suffer when it is excluded. Carrying out the cost-benefit analysis is a complex task: one of the most comprehensive efforts in bit manipulation was a collaboration headed by Claire Wolf, providing justifications, use-cases, C code, proofs and Verilog for each proposed instruction.[3][4]
Practical examples include bit banging of GPIO on a low-cost embedded controller such as the WDC 65C02, 8051 or Microchip PIC. At the slow clock rates of these CPUs, if bit set, clear and test instructions were not available, the low-cost CPU would not be viable for the target application.
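The three operations involved can be sketched in C as follows. This is only an illustration: gpio_port is a hypothetical stand-in for a memory-mapped 8-bit port, and the helper names are assumptions. On a CPU with dedicated bit instructions each helper would typically compile to a single instruction; without them, each becomes a load, a mask operation and a store.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-in for a memory-mapped 8-bit GPIO port.
   On real hardware this would be a fixed address. */
volatile uint8_t gpio_port = 0;

/* Set one bit of the port, leaving the others unchanged. */
void bit_set(volatile uint8_t *reg, unsigned bit)
{
    *reg |= (uint8_t)(1u << bit);
}

/* Clear one bit of the port, leaving the others unchanged. */
void bit_clear(volatile uint8_t *reg, unsigned bit)
{
    *reg &= (uint8_t)~(1u << bit);
}

/* Test one bit of the port. */
bool bit_test(const volatile uint8_t *reg, unsigned bit)
{
    return ((*reg >> bit) & 1u) != 0;
}
```

A bit-banged protocol would call bit_set and bit_clear on a clock and a data pin in a tight loop, which is why the per-operation cost matters so much at low clock rates.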
Note:
In something of a Wikipedia fourth wall breakage: GPUs and highly specialised tasks such as cryptography tend to result in extremely specialised instructions, without which performance would suck. Examples include AES instruction set extensions that cannot in any way be used for any other purpose. GPUs such as Larrabee[5] and Nyuzi attempted to "dial back" this practice to some extent, only to discover why it is done (performance sucks otherwise... seeing a trend, here?).
This page is not about such specialised instructions, nor even about their functionality. It categorises the general-purpose bit manipulation instructions present in CPUs and CPU families: instructions that happen to greatly improve the performance or power consumption of specific algorithms. An example is rotate, which cryptography uses heavily but which has many other practical uses elsewhere.
If you encounter any type of bit manipulation instruction, or any CPU that has them, feel free to add it below, bearing in mind that the page's primary purpose is categorisation, not explicit functional description per se. A helpful task for future readers would be to add pages describing the functionality to the "See also" section. Enjoy the end of the fourth wall...
Hardware bit manipulation
All the architectures below have instruction subsets and groups where the bit manipulation is provided in hardware.
Intel and AMD (x86)
- For Intel and AMD bit manipulation instructions see X86 Bit manipulation instruction set
- The AVX-512 extension includes a bitwise ternary logic instruction, vpternlog. Also noteworthy is a conflict detection instruction, VPCONFLICTD.
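As a rough illustration of what a bitwise ternary logic instruction computes, the sketch below emulates it in software on 32-bit words. For every bit position, the three input bits form a 3-bit index into an 8-bit truth table. The function name ternlog32 and the choice of a as the most significant index bit are assumptions made for this example.

```c
#include <stdint.h>

/* Software emulation of a bitwise ternary logic operation.
   For each bit position i, the bits (a[i], b[i], c[i]) form an
   index 0..7 (a is the most significant index bit, an ordering
   assumed for this sketch) into the truth table imm8. */
uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
    uint32_t result = 0;
    for (unsigned i = 0; i < 32; i++) {
        unsigned idx = (((a >> i) & 1u) << 2)
                     | (((b >> i) & 1u) << 1)
                     |  ((c >> i) & 1u);
        result |= (uint32_t)((imm8 >> idx) & 1u) << i;
    }
    return result;
}
```

With this ordering, the table 0x96 gives a three-way XOR, 0xE8 gives a bitwise majority, and 0xCA gives a bitwise select (b where a is set, c elsewhere). The hardware instruction does the same for every lane of a vector register in one instruction, replacing what would otherwise be two or more AND, OR, XOR and NOT instructions.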
Power ISA
Power ISA has a large range of bit manipulation instructions,[6] largely due to its history and relationship with IBM mainframes and the z/Architecture:
- Popcount
- Count leading zeros
- bit-extract and bit-deposit
- 8x8 bit permute (bpermd)
- Ternary 8-bit bitwise ternary logic instruction xxeval,[7] similar to AVX-512
- SWAR-style 8, 16 and 32-bit parity instructions
- bit-matrix multiply and transpose, which are computationally otherwise very expensive.
- strategic instructions for accelerating Packed BCD.
- Power v3.1 also introduced a number of additional bit manipulation instructions including swapping the order of bytes within half-words, words, and the whole 64-bit register.
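Bit-extract and bit-deposit are among the harder entries in this list to do efficiently in software. The sketch below shows one bit-at-a-time formulation, assuming the common convention in which extracted bits are packed toward the least significant end; the names bit_extract and bit_deposit are illustrative, and the exact operand conventions of any given ISA should be checked against its manual.

```c
#include <stdint.h>

/* Gather the bits of src selected by mask and pack them,
   in order, into the low bits of the result. */
uint64_t bit_extract(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    unsigned out = 0;                           /* next output bit position */
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);   /* isolate lowest set mask bit */
        if (src & lowest)
            result |= (uint64_t)1 << out;
        out++;
        mask &= mask - 1;                       /* clear that mask bit */
    }
    return result;
}

/* Scatter the low bits of src, in order, to the bit positions
   selected by mask. This is the inverse of bit_extract. */
uint64_t bit_deposit(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    unsigned in = 0;                            /* next input bit position */
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);
        if ((src >> in) & 1u)
            result |= lowest;
        in++;
        mask &= mask - 1;
    }
    return result;
}
```

Each loop runs once per set bit in the mask, so a 64-bit operation can cost dozens of instructions in software, against one in hardware.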
IBM System/360 and successors
Some clarification is needed for this section: IBM z/Architecture has scalar bit manipulation instructions and, up to the 11th edition, also had vector processor bit manipulation instructions. Modern z/Architecture machines, starting at the z13, have the same scalar bit manipulation instructions as the IBM 3090 and ESA/390 (only the IBM/370 Vector facility was dropped by the 11th revision of z/Architecture).
Previous revisions of z/Architecture had an optional Vector processor facility that came from the IBM 3090. However, it was dropped as of the z13, by the 15th edition, which has Packed SIMD instructions instead.[8] The IBM 370 also had the same vector instructions.
z/Architecture Scalar
These instructions are part of the 11th edition[9] z/Architecture, and are also present in IBM 370 and ESA/390:
- And-complement and others,
- bit-extract and deposit,[10]
- popcount,[11]
- count leading and trailing zeros,[12]
- a range of bit, byte and masked insert instructions,[13]
- comprehensive rotate and insert instructions including masked rotate-and-OR,[14] and shift,[15]
- comprehensive Packed BCD.[16]
- memory-based test-and-set and various masked-test set/clear bit operations, which move or copy a single bit into Condition Codes.[17]
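The masked-test idea in the last item can be sketched as a function that classifies the masked bits into three outcomes, in the spirit of a test-under-mask instruction. The enum and function names are hypothetical, and the enum values are not meant to match any architecture's actual condition code numbering.

```c
#include <stdint.h>

/* Three-way outcome of a masked test. The names and values are
   illustrative, not an architecture's condition code encoding. */
enum tm_result { TM_ALL_ZEROS, TM_MIXED, TM_ALL_ONES };

/* Classify the bits of value selected by mask. An empty mask is
   treated as "all zeros" (an assumption of this sketch). */
enum tm_result test_under_mask(uint8_t value, uint8_t mask)
{
    uint8_t selected = value & mask;
    if (selected == 0)
        return TM_ALL_ZEROS;
    if (selected == mask)
        return TM_ALL_ONES;
    return TM_MIXED;
}
```

In hardware the outcome lands directly in the condition code, so a conditional branch can follow the test with no intervening compare.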
IBM 3090 Vector
- Vector count-leading zeros vclz, trailing vctz[18] and vpopct[19]
- Vector test under mask vtm[20] - sets a Condition Code based on comparing all elements of one register against a second vector as a mask: if all masked-comparisons are all-zero, if all are all-ones or a mix of both.
- Vector GF(2) multiply and multiply-accumulate, vgfm,[21] known as Carryless multiply
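Carry-less (GF(2)) multiplication is ordinary shift-and-add multiplication with the additions replaced by XOR, so no carries propagate between bit positions. A minimal scalar sketch, with the illustrative name clmul32:

```c
#include <stdint.h>

/* Carry-less multiply of two 32-bit polynomials over GF(2).
   For every set bit i in b, XOR a shifted copy of a into the
   result. The product of two 32-bit inputs fits in 63 bits. */
uint64_t clmul32(uint32_t a, uint32_t b)
{
    uint64_t result = 0;
    for (unsigned i = 0; i < 32; i++) {
        if ((b >> i) & 1u)
            result ^= (uint64_t)a << i;
    }
    return result;
}
```

For example, clmul32(3, 3) is 5 rather than 9, because (x + 1)(x + 1) = x^2 + 1 when coefficients are taken modulo 2. A multiply-accumulate variant would XOR this product into an existing accumulator.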
ARM
- ARM11 has bitwise test-AND (a bitmasked test) and test-XOR; standard logical bitwise operations including OR-complement; byte, halfword and bit reversing; and conditional byte selection and merging. Shift and rotate are available on Operand2.[22]
- ARM Cortex-A has bit-field set, clear, extract and reverse.[23]
- ARM A64 has SWAR-style half-word byte-swapping, bit-field insert and extract, and bit-reversing.[24]
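The bit-reverse and bit-field operations named above can be expressed in portable C as follows. This is a sketch with made-up function names, assuming lsb is below 32 and lsb + width does not exceed 32; on a core with these instructions each function corresponds to roughly one instruction.

```c
#include <stdint.h>

/* Reverse the order of all 32 bits by swapping progressively
   larger groups: single bits, pairs, nibbles, bytes, halfwords. */
uint32_t bit_reverse32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

/* Mask with the low `width` bits set; width may be 0..32. */
uint32_t field_mask(unsigned width)
{
    return width >= 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* Unsigned bit-field extract: bits [lsb, lsb+width) of x. */
uint32_t bitfield_extract(uint32_t x, unsigned lsb, unsigned width)
{
    return (x >> lsb) & field_mask(width);
}

/* Bit-field insert: replace bits [lsb, lsb+width) of dst with
   the low `width` bits of src. */
uint32_t bitfield_insert(uint32_t dst, uint32_t src, unsigned lsb, unsigned width)
{
    uint32_t m = field_mask(width) << lsb;
    return (dst & ~m) | ((src << lsb) & m);
}
```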
RISC-V
In the standard extensions RISC-V has scalar bitwise operations including shift and arithmetic shift, but no rotate. The omissions are compensated for with additional extensions.
- The RISC-V Zb* extensions contain a significant number of bit manipulation instructions.[25] They are broken down into four groups by category (the integer subset has min/max, rotate and Popcount, for example), and each has well-researched justifications for its inclusion and the improvements it brings.[26]
- The RISC-V Vector Extension (RVV) has instructions that qualify as hardware-level bit manipulation, but on Vector masks rather than Scalar registers as is normally the case. For example, a Vector-mask Popcount is available.[27] RVV also has per-element bitwise operations.[28]
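Without a rotate instruction, a rotate has to be synthesised from two shifts and an OR, which is the sequence a dedicated rotate replaces. A minimal sketch in C, with illustrative function names:

```c
#include <stdint.h>

/* Rotate left by n bits using only shifts and OR. Masking both
   shift amounts with 31 keeps them in range, so n == 0 never
   triggers an out-of-range shift by 32. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Rotate right by n bits, built the same way. */
uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}
```

Compilers commonly recognise this idiom and emit a single rotate instruction when the target provides one.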
Embedded Microcontrollers
Intel
MOS 6502
- The WDC 65C02 added bit-manipulation: set, reset and test on individual bits.
- Rockwell added similar extensions (RMB, SMB, BBR and BBS) to the R65C00 series[32]
Microchip PICs
- The Microchip PIC range also has bitwise operations and set, clear and test bit, listed in the instructions.
Others
- Texas Instruments DSPs such as the TMS320C6000 series have set, clear, invert, test, extract and insert bit (or bit-field) instructions.[33]
- The TX-2 from 1958 had "skip on bit" predication, as well as set, clear, invert and permute bits, and shift and other bitwise operations.[34][35]
See also
- Find first set – Family of related bitwise operations on machine words
- Bitwise operation – Computer science topic
- Popcount – Number of nonzero symbols in a string
- Count leading zeros – Family of related bitwise operations on machine words
- Mask (computing) – Data used for bitwise operations
- Binary-coded decimal – System of digitally encoding numbers
- CLMUL instruction set – Extension to the x86 instruction set
- Bitwise ternary logic instruction – Bitwise ternary logic (3-way boolean function)