PaLM
Large language model developed by Google
From Wikipedia, the free encyclopedia
PaLM (Pathways Language Model) is a 540 billion-parameter dense decoder-only transformer-based large language model (LLM) developed by Google AI.[1] Researchers also trained smaller versions of PaLM (with 8 and 62 billion parameters) to test the effects of model scale.[2]
Model
PaLM is capable of a wide range of tasks, including commonsense reasoning, arithmetic reasoning, joke explanation, code generation, and translation.[2][3][4][5] When combined with chain-of-thought prompting, PaLM achieved significantly better performance on datasets requiring multi-step reasoning, such as word problems and logic-based questions.[1][2]
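As a rough illustration of chain-of-thought prompting, the sketch below assembles a few-shot prompt in which each exemplar shows its intermediate reasoning before the answer. The `Exemplar` class, the `build_cot_prompt` function, the "Q:/A:" layout, and the sample questions are all hypothetical choices made for this example; they are not taken from the PaLM evaluation setup, and no model is called.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """One worked example shown to the model (hypothetical structure)."""
    question: str
    reasoning: str  # intermediate steps written out in natural language
    answer: str


def build_cot_prompt(exemplars: list[Exemplar], question: str) -> str:
    """Assemble a few-shot prompt whose exemplars include their reasoning."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}."
        )
    # The final question is left open so the model can continue with its own steps.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Illustrative exemplar, invented for this sketch.
EXEMPLARS = [
    Exemplar(
        question="A box holds 3 red pens and 4 blue pens. How many pens are in 2 boxes?",
        reasoning="One box holds 3 + 4 = 7 pens. Two boxes hold 2 * 7 = 14 pens.",
        answer="14",
    )
]

prompt = build_cot_prompt(
    EXEMPLARS, "A shelf has 5 rows of 6 books. How many books are on 3 shelves?"
)
print(prompt)
```

The resulting string would be sent to a language model as-is; the worked exemplar is what encourages the model to write out its own steps before giving a final answer.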
The model was first announced in April 2022 and remained private until March 2023, when Google launched an API for PaLM and several other technologies.[6] The API was initially available to a limited number of developers who joined a waitlist before it was released to the public.[7]
Google and DeepMind developed Med-PaLM, a version of PaLM 540B (the 540-billion-parameter model) that is fine-tuned on medical data and outperforms previous models on medical question answering benchmarks.[8][9] Med-PaLM was the first model to obtain a passing score on U.S. medical licensing questions. In addition to answering both multiple-choice and open-ended questions accurately, it provides reasoning and is able to evaluate its own responses.[10]
Google also extended PaLM using a vision transformer to create PaLM-E, a state-of-the-art vision-language model that can be used for robotic manipulation.[11][12] The model can perform tasks in robotics competitively without the need for retraining or fine-tuning.[13]
In May 2023, Google announced PaLM 2 at the annual Google I/O keynote.[14] PaLM 2 is reported to be a 340 billion-parameter model trained on 3.6 trillion tokens.[15]
In June 2023, Google announced AudioPaLM for speech-to-speech translation, which uses the PaLM 2 architecture and initialization.[16]
Training
PaLM is pre-trained on a high-quality corpus of 780 billion tokens that comprise various natural language tasks and use cases. This dataset includes filtered webpages, books, Wikipedia articles, news articles, source code obtained from open source repositories on GitHub, and social media conversations.[1][2] It is based on the dataset used to train Google's LaMDA model.[2] The social media conversation portion of the dataset makes up 50% of the corpus, which aids the model in its conversational capabilities.[2]
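To make the stated proportion concrete, the short calculation below derives the approximate number of social media conversation tokens from the two figures given above (780 billion tokens, 50% share). The variable names are chosen for this sketch only, and the result is a back-of-the-envelope figure that assumes the percentage is measured in tokens.

```python
# Figures taken from the paragraph above.
TOTAL_TOKENS = 780_000_000_000        # size of the pre-training corpus
SOCIAL_MEDIA_SHARE_PERCENT = 50       # share attributed to social media conversations

# Integer arithmetic avoids floating-point rounding on large counts.
social_media_tokens = TOTAL_TOKENS * SOCIAL_MEDIA_SHARE_PERCENT // 100
remaining_tokens = TOTAL_TOKENS - social_media_tokens

print(f"Social media conversations: {social_media_tokens:,} tokens")
print(f"All other sources combined: {remaining_tokens:,} tokens")
```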
PaLM 540B was trained across two TPU v4 Pods, each with 3,072 TPU v4 chips attached to 768 hosts, using a combination of model and data parallelism. This was the largest TPU configuration at the time.[2][17] The setup used 6,144 chips in total and marked a record for the highest training efficiency achieved for LLMs at this scale: a hardware FLOPs utilization of 57.8%.[3]
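The hardware figures above can be checked with a few lines of arithmetic. The sketch below recomputes the total chip count and the chips per host, and includes a small helper that expresses hardware FLOPs utilization as the ratio of achieved to theoretical peak throughput. The helper name `flops_utilization` and that definition are assumptions made for this example; no per-chip peak figure is given in the text, so the helper is exercised only with normalized placeholder values.

```python
# Figures taken from the paragraph above.
PODS = 2
CHIPS_PER_POD = 3072
HOSTS_PER_POD = 768

total_chips = PODS * CHIPS_PER_POD               # 6144
chips_per_host = CHIPS_PER_POD // HOSTS_PER_POD  # 4


def flops_utilization(achieved_flops: float, peak_flops: float) -> float:
    """Return achieved throughput as a fraction of theoretical peak.

    Assumed definition for this sketch: utilization = achieved / peak.
    """
    if peak_flops <= 0:
        raise ValueError("peak_flops must be positive")
    return achieved_flops / peak_flops


print(f"Total chips: {total_chips}")
print(f"Chips per host: {chips_per_host}")

# Placeholder values: peak normalized to 1.0, achieved set to the reported 57.8%.
print(f"Utilization: {flops_utilization(0.578, 1.0):.1%}")
```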
See also
- LaMDA, PaLM's predecessor
- Gemini, PaLM's successor
- Chinchilla
References

