Summary
The Gemma 3 models range from 1B to 27B parameters, support a context window of up to 128k tokens, accept both image and text input, and cover 140+ languages (a short loading sketch follows the lists below).
Compared to Gemma 2, Gemma 3:
- has a longer context length
- is multimodal
- is multilingual
The 1B version is limited to:
- 32k tokens
- text only
- English only
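To make the capability split concrete, here is a minimal sketch of calling a multimodal Gemma 3 checkpoint through the Hugging Face transformers image-text-to-text pipeline. The checkpoint name google/gemma-3-4b-it, the example image URL, and the generation settings are illustrative assumptions, not details from this post.

```python
# Minimal sketch, assuming a recent `transformers` release with Gemma 3
# support; the checkpoint name and image URL are assumptions.
from transformers import pipeline

# The 4B/12B/27B instruction-tuned variants accept interleaved image + text;
# the 1B variant is text only, so it would not fit this pipeline.
pipe = pipeline("image-text-to-text", model="google/gemma-3-4b-it")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # hypothetical URL
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

# Chat-style input goes in via `text=`; the pipeline applies the chat template.
print(pipe(text=messages, max_new_tokens=64))
```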
Longer Context Length
The models are pre-trained with a 32k-token sequence length; the larger variants are then scaled to 128k tokens by adjusting the RoPE scale factor.
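To see what adjusting the RoPE scale factor does mechanically, here is a generic positional-interpolation-style sketch, not Gemma 3's published recipe: the head dimension, base frequency, and 4x factor are assumptions chosen so the 32k-to-128k arithmetic is visible.

```python
import numpy as np

def rope_inv_freq(head_dim: int, base: float = 10_000.0, scale: float = 1.0) -> np.ndarray:
    """Inverse frequencies for rotary position embeddings (RoPE).

    A scale > 1 slows every rotation down, so a longer position range
    sweeps the same angles the model saw in pre-training
    (positional-interpolation-style scaling; illustrative values,
    not Gemma 3's exact configuration).
    """
    exponents = np.arange(0, head_dim, 2) / head_dim
    inv_freq = 1.0 / (base ** exponents)
    return inv_freq / scale

def rope_angles(num_positions: int, inv_freq: np.ndarray) -> np.ndarray:
    # One rotation angle per (position, frequency) pair.
    return np.outer(np.arange(num_positions), inv_freq)

# Pre-training window: 32k positions, unscaled frequencies.
pre = rope_angles(32_768, rope_inv_freq(head_dim=128))

# Extended window: 128k positions with an assumed 4x scale factor
# produces the same maximum rotation angle as the 32k window above.
ext = rope_angles(131_072, rope_inv_freq(head_dim=128, scale=4.0))

assert np.isclose(pre.max(), ext.max(), rtol=1e-3)
```

In an actual model the scaled frequencies would replace the unscaled ones inside every attention layer, typically combined with some additional training on long sequences.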