BLOOM (language model)
Multilingual open-access large language model
The BigScience Large Open-science Open-access Multilingual Language Model (BLOOM) is an open-access large language model (LLM).[1] It was created by a volunteer-driven research effort to provide a transparently created alternative to proprietary AI models.[2]
With 176 billion parameters, BLOOM is a transformer-based autoregressive model designed to generate text in 46 natural languages and 13 programming languages. The model, source code, and the data used to train it are all distributed under free licences, allowing for public research and use.[3][4]
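To give a sense of scale, the sketch below estimates the storage needed to hold 176 billion parameters at a few common numeric precisions. The list of precisions and the helper name `weights_size_gb` are illustrative assumptions rather than details from the cited sources, and the estimate covers the weights only, not activations, optimizer state, or runtime overhead.

```python
# Rough estimate of the storage needed for BLOOM's weights alone.
PARAMETER_COUNT = 176_000_000_000  # 176 billion, as stated in the article

# Bytes per parameter for common precisions (illustrative assumption;
# the article does not say which precision the released weights use).
BYTES_PER_PARAMETER = {"float32": 4, "float16/bfloat16": 2, "int8": 1}


def weights_size_gb(parameter_count: int, bytes_per_parameter: int) -> float:
    """Return the weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return parameter_count * bytes_per_parameter / 1e9


if __name__ == "__main__":
    for precision, size in BYTES_PER_PARAMETER.items():
        print(f"{precision}: {weights_size_gb(PARAMETER_COUNT, size):.0f} GB")
```

Under these assumptions, the weights alone occupy hundreds of gigabytes at 16-bit precision, more than a single commodity accelerator typically holds.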
Development
BLOOM is the main outcome of the BigScience initiative, a year-long research workshop that ran from May 2021 to May 2022.[5] The project was led by Hugging Face and involved several hundred volunteer researchers and engineers from academia and the private sector. The model was trained between March and July 2022 on the Jean Zay public supercomputer in France, managed by GENCI and IDRIS (CNRS).[6]
BLOOM's training corpus, named ROOTS, combines data extracted from the then-latest version of the web-based OSCAR corpus (38% of ROOTS) with newly collected data drawn from a manually selected and documented list of language data sources. In total, the model was trained on approximately 366 billion tokens (about 1.6 TB of text).[7][8]
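The corpus figures above can be combined into two back-of-the-envelope quantities: the average number of bytes per token, and the approximate size of the OSCAR-derived portion. This is a minimal sketch that assumes the 1.6 TB figure is in decimal terabytes, that it corresponds to the same text as the 366 billion tokens, and that the 38% share is measured by size; none of these assumptions is confirmed by the text, and the function names are hypothetical.

```python
# Back-of-the-envelope arithmetic on the ROOTS figures quoted in the article.
ROOTS_SIZE_BYTES = 1.6e12   # 1.6 TB, assuming decimal terabytes
TRAINING_TOKENS = 366e9     # approximately 366 billion tokens
OSCAR_SHARE = 0.38          # OSCAR-derived share, assumed to be by size


def bytes_per_token(corpus_bytes: float, token_count: float) -> float:
    """Average number of bytes of text represented by one token."""
    return corpus_bytes / token_count


def oscar_size_bytes(corpus_bytes: float, share: float) -> float:
    """Approximate size of the OSCAR-derived portion of the corpus."""
    return corpus_bytes * share


if __name__ == "__main__":
    ratio = bytes_per_token(ROOTS_SIZE_BYTES, TRAINING_TOKENS)
    oscar_gb = oscar_size_bytes(ROOTS_SIZE_BYTES, OSCAR_SHARE) / 1e9
    print(f"Average bytes per token: {ratio:.2f}")
    print(f"OSCAR-derived portion: about {oscar_gb:.0f} GB")
```

If the assumptions hold, each token stands for a little over four bytes of text on average, and the OSCAR-derived portion amounts to roughly 0.6 TB.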
External links
References