Gemma (language model)

Family of lightweight open models by Google


Gemma is a series of open-source large language models developed by Google DeepMind, based on technology similar to that of Gemini. The first version was released in February 2024, followed by Gemma 2 in June 2024 and Gemma 3 in March 2025. Variants of Gemma have also been developed, such as the vision-language model PaliGemma and DolphinGemma, a model for analyzing dolphin communication.


History

In February 2024, Google debuted Gemma, a family of free and open-source LLMs that serve as a lightweight version of Gemini. The initial release came in two sizes, with two billion and seven billion parameters, respectively. Multiple publications viewed this as a response to Meta and others open-sourcing their AI models, and as a stark reversal of Google's longstanding practice of keeping its AI proprietary.[3][4][5]

Gemma 2 was released on June 27, 2024,[6] and Gemma 3 was released on March 12, 2025.[2][7]


Overview


Based on technology similar to the Gemini series of models, Gemma is described by Google as helping support its mission of "making AI helpful for everyone."[8] Google offers official Gemma variants optimized for specific use cases, such as MedGemma for medical analysis and DolphinGemma for studying dolphin communication.[9]

Since its release, Gemma models have had over 150 million downloads, with 70,000 variants available on Hugging Face.[10]

The latest generation of models is Gemma 3, offered in 1, 4, 12, and 27 billion parameter sizes, with support for over 140 languages. As multimodal models, they accept both text and image input.[11] Google also offers Gemma 3n, a family of smaller models optimized to run on consumer devices such as phones, laptops, and tablets.[12]
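Like other open-weight checkpoints, Gemma 3 models are distributed through Hugging Face and can be run with common inference libraries. The following is a minimal sketch using the transformers library and the text-only 1B instruction-tuned checkpoint; the model identifier and the gated-access requirement are assumptions based on general Hugging Face conventions, not details stated in this article.

  # Minimal sketch: running a small Gemma 3 checkpoint with Hugging Face
  # transformers. The model ID is an assumption; Gemma weights are gated and
  # require accepting Google's usage license before download.
  from transformers import pipeline

  generator = pipeline(
      "text-generation",
      model="google/gemma-3-1b-it",  # instruction-tuned 1B text-only variant (assumed ID)
      device_map="auto",             # place weights on an accelerator if available
  )

  result = generator("Explain why the sky is blue.", max_new_tokens=64)
  print(result[0]["generated_text"])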

Architecture

The latest version of Gemma, Gemma 3, is based on a decoder-only transformer architecture with grouped-query attention (GQA) and the SigLIP vision encoder. Every model has a context length of 128K tokens, with the exception of Gemma 3 1B, which has a context length of 32K tokens.[13]
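In grouped-query attention, several query heads share a single key/value head, which shrinks the key/value cache relative to standard multi-head attention. The sketch below illustrates the idea in PyTorch; the head counts and dimensions are illustrative placeholders and do not correspond to any actual Gemma configuration.

  # Illustrative grouped-query attention: 8 query heads share 2 key/value heads.
  # All dimensions are made up for the example and do not match any Gemma model.
  import torch
  import torch.nn.functional as F

  batch, seq, d_model = 1, 16, 256
  n_q_heads, n_kv_heads, head_dim = 8, 2, 32
  group = n_q_heads // n_kv_heads  # query heads per shared KV head

  x = torch.randn(batch, seq, d_model)
  w_q = torch.nn.Linear(d_model, n_q_heads * head_dim, bias=False)
  w_k = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)
  w_v = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)

  q = w_q(x).view(batch, seq, n_q_heads, head_dim).transpose(1, 2)   # (b, Hq, s, d)
  k = w_k(x).view(batch, seq, n_kv_heads, head_dim).transpose(1, 2)  # (b, Hkv, s, d)
  v = w_v(x).view(batch, seq, n_kv_heads, head_dim).transpose(1, 2)

  # Each KV head is repeated so that every group of query heads attends to it.
  k = k.repeat_interleave(group, dim=1)  # (b, Hq, s, d)
  v = v.repeat_interleave(group, dim=1)

  out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
  out = out.transpose(1, 2).reshape(batch, seq, n_q_heads * head_dim)
  print(out.shape)  # torch.Size([1, 16, 256])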

Quantized versions fine-tuned using quantization-aware training (QAT) are also available,[13] substantially reducing memory usage at some cost in accuracy and precision.[14]
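The memory saving from quantization can be approximated with simple arithmetic: weight memory is roughly the parameter count multiplied by the bytes used per parameter. The estimate below applies this to the 27-billion-parameter size mentioned above; ignoring activations, the KV cache, and quantization metadata is a simplifying assumption.

  # Rough weight-memory estimate for a 27B-parameter model at different precisions.
  # Ignores activations, the KV cache, and per-block quantization overhead.
  PARAMS = 27e9

  for name, bits in [("bfloat16", 16), ("int8", 8), ("int4", 4)]:
      gib = PARAMS * bits / 8 / 2**30
      print(f"{name:>9}: ~{gib:.0f} GiB of weights")
  # bfloat16: ~50 GiB, int8: ~25 GiB, int4: ~13 GiB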

Variants

Google develops official variants of Gemma models designed for specific purposes, like medical analysis or programming. These include:

  • ShieldGemma 2 (4B): Based on the Gemma 3 family, ShieldGemma is designed to identify and filter violent, dangerous, and sexually explicit images.[15]
  • MedGemma (4B and 27B): Also based on Gemma 3, MedGemma is designed for medical applications like image analysis. However, Google also notes that MedGemma "isn't yet clinical grade."[16]
  • DolphinGemma (roughly 400M): Developed in collaboration with researchers at Georgia Tech and the Wild Dolphin Project, DolphinGemma aims to better understand dolphin communication through audio analysis.[17][18]
  • CodeGemma (2B and 7B): CodeGemma is a group of models designed for code completion as well as general coding use.[19] It supports multiple programming languages, including Python, Java, C++, and others;[20] a prompt-format sketch follows this list.
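As referenced in the list above, CodeGemma's completion-oriented checkpoints can be prompted in a fill-in-the-middle style. The sketch below follows the control-token format described in the published CodeGemma model card; the token names and checkpoint identifier are assumptions drawn from that documentation rather than from this article.

  # Fill-in-the-middle prompt for a CodeGemma completion model (sketch).
  # Control tokens and model ID follow the published model card and are
  # assumptions here, not details stated in this article.
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "google/codegemma-2b"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

  # The model is asked to fill in the code between the prefix and the suffix.
  prompt = "<|fim_prefix|>def mean(xs):\n    return <|fim_suffix|>\n<|fim_middle|>"
  inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
  output = model.generate(**inputs, max_new_tokens=16)
  print(tokenizer.decode(output[0], skip_special_tokens=False))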

Note: open-weight models can have their context length rescaled at inference time. With Gemma 1, Gemma 2, PaliGemma, and PaliGemma 2, the cost is a linear increase in KV-cache size relative to the context window. Gemma 3 has an improved growth curve due to its separation of local and global attention. With RecurrentGemma, memory use is unchanged beyond 2,048 tokens.
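The linear KV-cache growth described in the note can be estimated directly: each cached token stores one key and one value vector per layer per key/value head. The figures below use illustrative placeholder dimensions, not the configuration of any particular Gemma model.

  # Rough KV-cache size estimate; all model dimensions are illustrative placeholders.
  def kv_cache_gib(context_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per=2):
      # 2 tensors (key and value) per layer, each n_kv_heads * head_dim wide per token.
      per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per
      return context_len * per_token / 2**30

  for ctx in (8_192, 32_768, 131_072):
      print(f"{ctx:>7} tokens -> ~{kv_cache_gib(ctx):.1f} GiB")
  #   8192 tokens -> ~1.0 GiB
  #  32768 tokens -> ~4.0 GiB
  # 131072 tokens -> ~16.0 GiB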


References
