Minifloat
From Wikipedia, the free encyclopedia
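Minifloats are floating-point values represented with very few bits. They are not well suited to general-purpose numerical computation. They are typically used for special purposes such as computer graphics, where iterations are small and precision has aesthetic effects.[1] Machine learning also uses similar formats, such as bfloat16.
Minifloats are designed following the principles of the IEEE 754 standard. They must obey the (not explicitly written) rules for the boundary between subnormal and normal numbers, and they have special patterns for infinity and NaN. Normalized numbers are stored with a biased exponent. The IEEE 754-2008 revision of the standard includes a 16-bit binary minifloat.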
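Notation
A minifloat is usually described using a tuple of four numbers, (S, E, M, B):
- S is the length of the sign field. It is usually either 0 or 1.
- E is the length of the exponent field.
- M is the length of the mantissa (significand) field.
- B is the exponent bias.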
A minifloat format denoted by (S, E, M, B) is therefore S + E + M bits long. The (S, E, M, B) notation can be converted to a (B, P, L, U) format as $(2, M + 1, B + 1, 2^S - B)$ (with the IEEE use of exponents).
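The (S, E, M, B) description maps naturally onto a small record type. The following is a minimal sketch, not a reference implementation; the class name `MinifloatFormat` and its method `split` are hypothetical names chosen for illustration, and the field order (sign, then exponent, then mantissa, from the most significant bit down) is assumed to match the examples in this article.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MinifloatFormat:
    """Hypothetical container for the (S, E, M, B) tuple."""
    sign_bits: int      # S: usually 0 or 1
    exponent_bits: int  # E
    mantissa_bits: int  # M
    bias: int           # B

    @property
    def total_bits(self) -> int:
        # The format is S + E + M bits long; the bias takes no storage.
        return self.sign_bits + self.exponent_bits + self.mantissa_bits

    def split(self, pattern: int) -> Tuple[int, int, int]:
        """Split a bit pattern into its (sign, exponent, mantissa) fields."""
        mantissa = pattern & ((1 << self.mantissa_bits) - 1)
        exponent = (pattern >> self.mantissa_bits) & ((1 << self.exponent_bits) - 1)
        sign = (pattern >> (self.mantissa_bits + self.exponent_bits)) & (
            (1 << self.sign_bits) - 1
        )
        return sign, exponent, mantissa


fmt = MinifloatFormat(sign_bits=1, exponent_bits=4, mantissa_bits=3, bias=7)
print(fmt.total_bits)           # 8
print(fmt.split(0b1_0111_001))  # (1, 7, 1)
```

With `sign_bits=0` the sign field is always reported as 0, which covers unsigned formats.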
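Example
This example uses a minifloat with 1 sign bit, 4 exponent bits and 3 significand bits.[2][3] The exponent bias is 7, so for most values a stored exponent x denotes the scale factor $2^{x-7}$. All IEEE 754 principles should be valid.[4]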
0 0000 000 = 0
1 0000 000 = −0
The significand is extended with "0.":
0 0000 001 = $0.001_2 \times 2^{1-7} = 0.125 \times 2^{-6} = 0.001953125$ (smallest subnormal number)
...
0 0000 111 = $0.111_2 \times 2^{1-7} = 0.875 \times 2^{-6} = 0.013671875$ (largest subnormal number)
The significand is extended with "1.":
0 0001 000 = $1.000_2 \times 2^{1-7} = 1 \times 2^{-6} = 0.015625$ (smallest normalized number)
0 0001 001 = $1.001_2 \times 2^{1-7} = 1.125 \times 2^{-6} = 0.017578125$
...
0 0111 000 = $1.000_2 \times 2^{7-7} = 1 \times 2^{0} = 1$
0 0111 001 = $1.001_2 \times 2^{7-7} = 1.125 \times 2^{0} = 1.125$ (smallest value above 1)
...
0 1110 000 = $1.000_2 \times 2^{14-7} = 1.000 \times 2^{7} = 128$
0 1110 001 = $1.001_2 \times 2^{14-7} = 1.125 \times 2^{7} = 144$
...
0 1110 110 = $1.110_2 \times 2^{14-7} = 1.750 \times 2^{7} = 224$
0 1110 111 = $1.111_2 \times 2^{14-7} = 1.875 \times 2^{7} = 240$ (largest normalized number)
0 1111 000 = +∞
1 1111 000 = −∞
s 1111 mmm = NaN (if mmm ≠ 000)
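The decoding rules listed above (zero and subnormals for an all-zero exponent, infinity and NaN for an all-ones exponent, an implicit leading 1 otherwise) can be sketched in a few lines. This is an illustrative sketch only; the function name `decode_minifloat` and its parameter names are hypothetical, and the defaults correspond to the 1.4.3 layout with bias 7 used in this example.

```python
import math


def decode_minifloat(pattern: int, exp_bits: int = 4, man_bits: int = 3,
                     bias: int = 7) -> float:
    """Decode a minifloat bit pattern with one sign bit, IEEE 754 style."""
    man = pattern & ((1 << man_bits) - 1)
    exp = (pattern >> man_bits) & ((1 << exp_bits) - 1)
    sign = -1.0 if (pattern >> (man_bits + exp_bits)) & 1 else 1.0
    if exp == (1 << exp_bits) - 1:
        # All-ones exponent: infinity if the mantissa is zero, NaN otherwise.
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:
        # Zero and subnormals: significand 0.mmm, fixed exponent 1 - bias.
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    # Normalized numbers: significand 1.mmm, exponent exp - bias.
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


print(decode_minifloat(0b0_0000_001))  # 0.001953125
print(decode_minifloat(0b0_0111_001))  # 1.125
print(decode_minifloat(0b0_1110_111))  # 240.0
print(decode_minifloat(0b0_1111_000))  # inf
```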
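[Chart: all possible values of this example 8-bit float.]
There are only 242 distinct non-NaN values (if +0 and −0 are counted as distinct), because 14 bit patterns represent NaN.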
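The count can be checked by enumerating all 256 bit patterns. The sketch below assumes the 1.4.3 layout of this example; the helper name `is_nan_pattern` is hypothetical.

```python
EXP_BITS, MAN_BITS = 4, 3
TOTAL_BITS = 1 + EXP_BITS + MAN_BITS


def is_nan_pattern(pattern: int) -> bool:
    """A pattern is NaN when the exponent is all ones and the mantissa is nonzero."""
    man = pattern & ((1 << MAN_BITS) - 1)
    exp = (pattern >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    return exp == (1 << EXP_BITS) - 1 and man != 0


nan_count = sum(is_nan_pattern(p) for p in range(1 << TOTAL_BITS))
non_nan_count = (1 << TOTAL_BITS) - nan_count
print(nan_count, non_nan_count)  # 14 242
```

If +0 and −0 are counted as a single value, the total drops by one, to 241.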
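At these small sizes, other bias values may be interesting. For instance, a bias of −2 gives the numbers 0 to 16 the same bit representation as the integers 0 to 16, at the cost that no non-integer values can be represented.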
0 0000 000 = $0.000_2 \times 2^{1-(-2)} = 0.0 \times 2^{3} = 0$ (subnormal number)
0 0000 001 = $0.001_2 \times 2^{1-(-2)} = 0.125 \times 2^{3} = 1$ (subnormal number)
0 0000 111 = $0.111_2 \times 2^{1-(-2)} = 0.875 \times 2^{3} = 7$ (subnormal number)
0 0001 000 = $1.000_2 \times 2^{1-(-2)} = 1.000 \times 2^{3} = 8$ (normalized number)
0 0001 111 = $1.111_2 \times 2^{1-(-2)} = 1.875 \times 2^{3} = 15$ (normalized number)
0 0010 000 = $1.000_2 \times 2^{2-(-2)} = 1.000 \times 2^{4} = 16$ (normalized number)
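The effect of a bias of −2 can be checked by decoding the first few bit patterns and comparing each decoded value with the pattern read as a plain integer. This is a sketch under the same 1.4.3 layout; `decode_with_bias` is a hypothetical helper name.

```python
import math


def decode_with_bias(pattern: int, bias: int, exp_bits: int = 4,
                     man_bits: int = 3) -> float:
    """Decode a 1.4.3-style minifloat pattern for an arbitrary exponent bias."""
    man = pattern & ((1 << man_bits) - 1)
    exp = (pattern >> man_bits) & ((1 << exp_bits) - 1)
    sign = -1.0 if (pattern >> (man_bits + exp_bits)) & 1 else 1.0
    if exp == (1 << exp_bits) - 1:
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


# Patterns 0..16 decode to the integers 0..16 when the bias is -2.
matches = [decode_with_bias(p, bias=-2) == p for p in range(17)]
print(all(matches))                 # True
# From pattern 17 onward the spacing doubles, so the match ends.
print(decode_with_bias(17, bias=-2))  # 18.0
```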
Arithmetic

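[Figure: addition of even smaller (1.3.2.3) minifloats with 6 bits.]
This floating-point system follows the rules of IEEE 754 exactly.
NaN as an operand always produces a NaN result.
∞ − ∞ and (−∞) + ∞ also produce NaN (green). ∞ can be increased or decreased by finite values without changing.
Sums of finite operands can give an infinite result (e.g. 14.0 + 3.0 = +∞; results of +∞ are shown in cyan and −∞ in red).
Other arithmetic operations can be illustrated similarly: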
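The addition rules above can be reproduced with a small simulation of the (1.3.2.3) format. The sketch below is an assumption-laden illustration, not the method behind the figure: it computes the exact sum in ordinary floating point and rounds it to the nearest representable minifloat with ties to even, which is the IEEE 754 default rounding mode. All function names are hypothetical.

```python
import math

EXP_BITS, MAN_BITS, BIAS = 3, 2, 3


def decode(pattern: int) -> float:
    """Decode a non-negative (sign bit clear) finite pattern of the 1.3.2.3 format."""
    man = pattern & ((1 << MAN_BITS) - 1)
    exp = pattern >> MAN_BITS
    if exp == 0:
        return (man / (1 << MAN_BITS)) * 2.0 ** (1 - BIAS)
    return (1 + man / (1 << MAN_BITS)) * 2.0 ** (exp - BIAS)


# All finite non-negative values in increasing order; list index == bit pattern.
FINITE = [decode(p) for p in range(((1 << EXP_BITS) - 1) << MAN_BITS)]


def round_to_minifloat(x: float) -> float:
    """Round x to the nearest representable value, ties to even, overflow to infinity."""
    if math.isnan(x) or math.isinf(x):
        return x
    mag = abs(x)
    top_ulp = FINITE[-1] - FINITE[-2]
    if mag >= FINITE[-1] + top_ulp / 2:
        return math.copysign(math.inf, x)
    # Prefer the smallest distance; on a tie prefer the pattern with an even last bit.
    best = min(range(len(FINITE)), key=lambda i: (abs(FINITE[i] - mag), i & 1))
    return math.copysign(FINITE[best], x)


def add(a: float, b: float) -> float:
    """Add two minifloat values and round the result back into the format."""
    return round_to_minifloat(a + b)


print(FINITE[-1])                    # 14.0
print(add(14.0, 3.0))                # inf
print(add(math.inf, -math.inf))      # nan
print(add(math.inf, -14.0))          # inf
```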
- Subtraction
- Multiplication
- Division
Other sizes
The Radeon R300 and R420 GPUs used an "fp24" floating-point format with 7 bits of exponent and 16 bits (+1 implicit) of mantissa.[5] "Full Precision" in Direct3D 9.0 is a proprietary 24-bit floating-point format. Microsoft's D3D9 (Shader Model 2.0) graphics API initially supported both FP24 (as in ATI's R300 chip) and FP32 (as in Nvidia's NV30 chip) as "Full Precision", as well as FP16 as "Partial Precision" for vertex and pixel shader calculations performed by the graphics hardware.
Khronos defines 10-bit and 11-bit float formats for use with Vulkan. Both formats have no sign bit and a 5-bit exponent. The 10-bit format has a 5-bit mantissa, and the 11-bit format has a 6-bit mantissa.[6][7]
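As an illustration of these unsigned layouts, the sketch below decodes 10-bit and 11-bit patterns. The exponent bias of 15 is an assumption that does not come from the text above (it is the bias of the 16-bit half-precision format, which has the same 5-bit exponent), and `decode_unsigned_float` is a hypothetical helper name.

```python
import math


def decode_unsigned_float(pattern: int, man_bits: int, exp_bits: int = 5,
                          bias: int = 15) -> float:
    """Decode an unsigned small float; bias=15 is an assumption, see the lead-in."""
    man = pattern & ((1 << man_bits) - 1)
    exp = (pattern >> man_bits) & ((1 << exp_bits) - 1)
    if exp == (1 << exp_bits) - 1:
        # No sign bit, so only +infinity and NaN exist.
        return math.inf if man == 0 else math.nan
    if exp == 0:
        return (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    return (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


largest_11 = decode_unsigned_float(0b11110_111111, man_bits=6)
largest_10 = decode_unsigned_float(0b11110_11111, man_bits=5)
print(largest_11, largest_10)  # 65024.0 64512.0
```

Under that assumed bias, both formats cover roughly the same range as half precision but with fewer mantissa bits and no negative values.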
4 bits and fewer
The smallest possible float size that follows all IEEE principles, including normalized numbers, subnormal numbers, signed zero, signed infinity, and multiple NaN values, is a 4-bit float with 1-bit sign, 2-bit exponent, and 1-bit mantissa.[8] In the table below, the columns have different values for the sign and mantissa bits, and the rows are different values for the exponent bits.
If normalized numbers are not required, the size can be reduced to 3-bit by reducing the exponent down to 1.
In situations where the sign bit can be excluded, each of the above examples can be reduced by 1 bit further, keeping only the left half of the above tables. A 2-bit float with 1-bit exponent and 1-bit mantissa would only have 0, 1, Inf, NaN values.
If the mantissa is allowed to be 0-bit, a 1-bit float format would have a 1-bit exponent, and the only two values would be 0 and Inf. The exponent must be at least 1 bit or else it no longer makes sense as a float (it would just be a signed number).
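The value sets of these very small formats can be enumerated directly. In the sketch below the biases are assumptions: a bias of 1 is assumed for the 4-bit format, and a bias of 0 is chosen for the 2-bit format because it reproduces the values 0, 1, Inf, NaN stated above. The function name `decode` and its parameters are hypothetical.

```python
import math


def decode(pattern: int, sign_bits: int, exp_bits: int, man_bits: int,
           bias: int) -> float:
    """Decode a tiny float pattern; works for 0 or 1 sign bits and 0+ mantissa bits."""
    man = pattern & ((1 << man_bits) - 1)
    exp = (pattern >> man_bits) & ((1 << exp_bits) - 1)
    negative = sign_bits == 1 and (pattern >> (man_bits + exp_bits)) & 1
    sign = -1.0 if negative else 1.0
    frac = man / (1 << man_bits)
    if exp == (1 << exp_bits) - 1:
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:
        return sign * frac * 2.0 ** (1 - bias)
    return sign * (1 + frac) * 2.0 ** (exp - bias)


# 4-bit float (1 sign, 2 exponent, 1 mantissa), assumed bias 1: non-negative half.
four_bit = [decode(p, 1, 2, 1, bias=1) for p in range(8)]
print(four_bit)  # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, inf, nan]

# 2-bit float (no sign, 1 exponent, 1 mantissa), bias 0.
two_bit = [decode(p, 0, 1, 1, bias=0) for p in range(4)]
print(two_bit)   # [0.0, 1.0, inf, nan]

# 1-bit float (no sign, 1 exponent, 0 mantissa): only 0 and Inf.
one_bit = [decode(p, 0, 1, 0, bias=0) for p in range(2)]
print(one_bit)   # [0.0, inf]
```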
In embedded devices
Minifloats are also commonly used in embedded devices,[citation needed] especially on microcontrollers where floating-point will need to be emulated in software. To speed up the computation, the mantissa typically occupies exactly half of the bits, so the register boundary automatically addresses the parts without shifting.
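The idea of splitting on a register boundary can be mimicked with byte access. The sketch below assumes a hypothetical 16-bit layout with 1 sign bit, 7 exponent bits and 8 mantissa bits, so that the mantissa is exactly the low byte; the layout and the name `split_halves` are illustrative only and do not describe any particular device.

```python
def split_halves(pattern: int):
    """Return (sign_and_exponent_byte, mantissa_byte) of a 16-bit pattern.

    On an 8-bit microcontroller the two bytes would live in separate
    registers, so the mantissa is available without any shift.
    """
    high, low = pattern.to_bytes(2, byteorder="big")
    return high, low


high, low = split_halves(0x3F80)
sign = high >> 7         # top bit of the high byte
exponent = high & 0x7F   # remaining 7 bits of the high byte
mantissa = low           # the whole low byte
print(sign, exponent, mantissa)  # 0 63 128
```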