1.58-bit large language model
Version of a transformer large language model
A 1.58-bit large language model (1.58-bit LLM, also ternary LLM) is a version of a transformer large language model whose weights use only three values: -1, 0, and +1. This restriction theoretically allows the model to replace costly multiplications with additions and reduces the memory needed to store the weights. Since the end-task performance and perplexity of 1.58-bit LLMs, at least for smaller model sizes (up to 3-4B parameters), are close to those of their "full precision" (16-bit FP16 or BF16) counterparts, this design makes it possible to reach the same artificial intelligence goals with much lower hardware requirements, latency, and training effort.[1][2][3]
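For illustration, the following minimal sketch (plain Python, not an optimized kernel or any published implementation) shows why a matrix-vector product with ternary weights needs no multiplications: each weight of +1 adds an input, each -1 subtracts it, and each 0 is skipped.

```python
def ternary_matvec(weights, x):
    """Apply a ternary weight matrix to a vector using only additions.

    `weights` is a list of rows whose entries are -1, 0, or +1; `x` is a
    list of numbers. Each output element is a sum/difference of selected
    inputs, so no multiplications are performed.
    """
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0 contributes nothing
        out.append(acc)
    return out

# Example: a 2x3 ternary weight matrix applied to a 3-vector.
print(ternary_matvec([[1, 0, -1], [-1, 1, 1]], [0.5, 2.0, -1.0]))
# [1.5, 0.5]
```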
The name comes from the fact that a single trit, the ternary equivalent of a bit that can take the values {-1, 0, 1}, carries log₂ 3 ≈ 1.58 bits of information. 1.58-bit LLMs are also called 1-bit LLMs[1][4] (although true 1-bit models also exist).
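The figure follows from the information content of a symbol with three equally likely states:

$$
\log_2 3 \;=\; \frac{\ln 3}{\ln 2} \;\approx\; 1.585,
$$

which is commonly rounded to 1.58.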
BitNet
In 2024, Ma et al., researchers at Microsoft, claimed that their 1.58-bit model, BitNet b1.58, is comparable in performance to the 16-bit Llama 2 and opens the era of 1-bit LLMs.[5] The BitNet creators did not use post-training quantization of weights, but instead relied on a new BitLinear transform that replaces the nn.Linear layer of the traditional transformer design.[6]
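A minimal PyTorch sketch of the idea is given below. It assumes the "absmean" scheme described in the BitNet b1.58 report, in which weights are scaled by their mean absolute value, then rounded and clipped to {-1, 0, +1}; the class name `BitLinearSketch` is hypothetical, and the real BitLinear also normalizes and quantizes activations and uses a straight-through estimator during training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinearSketch(nn.Module):
    """Illustrative ternary-weight linear layer (absmean quantization).

    Not the official BitNet implementation: activation quantization,
    normalization, and training-time tricks are omitted for clarity.
    """

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def quantize_weights(self, w, eps=1e-5):
        # Scale by the mean absolute value, then round and clip to {-1, 0, +1}.
        scale = w.abs().mean().clamp(min=eps)
        w_ternary = (w / scale).round().clamp(-1, 1)
        return w_ternary, scale

    def forward(self, x):
        w_q, scale = self.quantize_weights(self.weight)
        # With ternary weights, the matmul reduces to additions/subtractions
        # in a specialized kernel; a regular matmul is used here for clarity.
        return F.linear(x, w_q) * scale
```

In a transformer built this way, such a layer would stand in for each nn.Linear projection, which is where the memory and compute savings claimed for 1.58-bit models arise.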
In 2025, Microsoft researchers released BitNet b1.58 2B4T, a model with open weights and open inference code, demonstrating performance competitive with full-precision models at 2 billion parameters and 4 trillion training tokens.[7]
Critique
Some researchers[8] point out that the scaling laws[9] of large language models favor low-bit weights only in the case of undertrained models; as the number of training tokens grows, the deficiencies of low-bit quantization surface.
References
Sources