1.58-bit large language model
Large language model with ternary weights
A 1.58-bit large language model (1.58-bit LLM, also ternary LLM) is a version of a transformer large language model whose weights take only three values: -1, 0, and +1. This restriction theoretically allows the model to replace costly multiplications with additions and to reduce the memory needed to store the weights. Since the end-task performance and perplexity of 1.58-bit LLMs, at least for smaller model sizes (up to 3-4B parameters), are close to those of their "full precision" (16-bit FP16 or BF16) counterparts, this design allows the same artificial intelligence goals to be reached with much lower hardware requirements, latency, and training effort.[1][2][3]
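As an illustration of the multiplication-free claim, a matrix-vector product with ternary weights reduces to additions and subtractions of activations. The following is a minimal sketch, not code from the cited papers:

```python
import numpy as np

def ternary_matvec(W, x):
    """Matrix-vector product where W contains only -1, 0, +1.

    Each output element is a sum of (possibly negated) activations,
    so no weight-activation multiplications are needed.
    """
    out = np.zeros(W.shape[0], dtype=x.dtype)
    for i in range(W.shape[0]):
        plus = x[W[i] == 1].sum()    # weights equal to +1: add the activation
        minus = x[W[i] == -1].sum()  # weights equal to -1: subtract the activation
        out[i] = plus - minus        # weights equal to 0 contribute nothing
    return out

# Example: a small ternary weight matrix and a random activation vector
rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))           # values in {-1, 0, +1}
x = rng.standard_normal(8).astype(np.float32)
assert np.allclose(ternary_matvec(W, x), W @ x, atol=1e-5)
```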
The name comes from the fact that a single trit, the ternary equivalent of a bit that can take the values {-1, 0, 1}, carries log2 3 ≈ 1.58 bits of information. 1.58-bit LLMs are also called 1-bit LLMs[1][4] (although true 1-bit models, with only two possible weight values, also exist).
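The figure 1.58 is the rounded information content of one trit; the worked step below is added for clarity and is not quoted from the cited sources:

```latex
% Information carried by a single trit (three equally likely values)
I = \log_2 3 = \frac{\ln 3}{\ln 2} \approx 1.585 \ \text{bits}
```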
BitNet
In 2024, Ma et al., researchers at Microsoft, reported that their 1.58-bit model, BitNet b1.58, is comparable in performance to the 16-bit Llama 2 and claimed that it opens the era of 1-bit LLMs.[5] The BitNet creators did not apply post-training quantization of the weights; instead they relied on a new BitLinear transform that replaces the nn.Linear layer of the traditional transformer design.[6]
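A minimal PyTorch-style sketch of the idea behind BitLinear is shown below. The absmean weight quantization follows the published BitNet b1.58 description, while the per-token 8-bit activation quantization and the straight-through estimator are common implementation choices and should be read as assumptions rather than a faithful reproduction of Microsoft's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinearSketch(nn.Module):
    """Illustrative drop-in replacement for nn.Linear with ternary weights."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)

    def forward(self, x):
        w = self.weight
        # Absmean quantization: scale by the mean absolute weight,
        # then round and clip to the ternary set {-1, 0, +1}.
        scale = w.abs().mean().clamp(min=1e-5)
        w_q = (w / scale).round().clamp(-1, 1)
        # Straight-through estimator: use quantized values in the forward
        # pass but let gradients flow to the full-precision weights.
        w_q = w + (w_q - w).detach()
        # Per-token 8-bit absmax quantization of activations (assumption).
        a_scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-5) / 127.0
        x_q = (x / a_scale).round().clamp(-128, 127)
        x_q = x + (x_q * a_scale - x).detach()
        # The matmul is kept for clarity; specialized kernels would replace
        # it with additions/subtractions and rescale the result by `scale`.
        return F.linear(x_q, w_q) * scale
```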
In 2025, Microsoft researchers released BitNet b1.58 2B4T, a model with open weights and open inference code, demonstrating performance competitive with full-precision models at 2B parameters and 4T training tokens.[7]
Post-training quantization
BitNet derives its performance from being trained natively in 1.58 bits instead of being quantized from a full-precision model after training. Still, training is an expensive process, and it would be desirable to convert an existing model to 1.58 bits without retraining from scratch. In 2024, Hugging Face reported a way to gradually ramp up the 1.58-bit quantization while fine-tuning an existing model down to 1.58 bits.[8]
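The gradual ramp can be pictured as blending full-precision and ternary weights with a schedule that moves from 0 to 1 over the course of fine-tuning. The sketch below is an interpretation of that idea, not Hugging Face's actual code, and the linear warmup_steps schedule is an assumption:

```python
import torch

def blended_weight(w, step, warmup_steps):
    """Blend full-precision and ternary weights during fine-tuning.

    lam ramps linearly from 0 (pure full precision) to 1 (pure ternary),
    so the model is eased into the 1.58-bit regime instead of being
    quantized all at once.
    """
    lam = min(step / warmup_steps, 1.0)
    scale = w.abs().mean().clamp(min=1e-5)
    w_ternary = (w / scale).round().clamp(-1, 1) * scale
    return (1.0 - lam) * w + lam * w_ternary
```

Once lam reaches 1, the layer uses fully ternary weights (up to the shared scale), matching the natively trained 1.58-bit regime.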
Critique
Some researchers[9] point out that the scaling laws[10] of large language models favor low-bit weights only in the case of undertrained models. As the number of training tokens increases, the deficiencies of low-bit quantization surface.
References
Sources