Summary

The Gemma 3 models range from 1B to 27B parameters, support a context window of up to 128k tokens, accept both images and text as input, and cover 140+ languages.

Compared to Gemma 2, Gemma 3:

  • has a longer context length
  • is multimodal
  • is multilingual

The 1B version is limited to:

  • 32k tokens
  • text only
  • English only

Longer Context Length

The models are pre-trained with a 32k-token sequence length; the larger variants are then scaled to 128k tokens by adjusting the RoPE scale factor.
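The idea behind scaling RoPE is position interpolation: dividing position indices by a scale factor keeps the rotary angles of long sequences within the range the model saw during 32k-token pre-training. A minimal sketch (the `base` and `scale` values here are illustrative assumptions, not Gemma 3's exact configuration):

```python
import math

def rope_angles(pos, dim, base=10_000.0, scale=1.0):
    """Rotary embedding angles for one position.

    Dividing the position by `scale` (position interpolation)
    stretches a model trained on N tokens to cover N * scale
    positions. Values are illustrative, not Gemma 3's actual config.
    """
    return [
        (pos / scale) / base ** (2 * i / dim)
        for i in range(dim // 2)
    ]

# A position 8x beyond the original window maps back onto the
# same angles the model was trained on once scale=8 is applied:
orig = rope_angles(4096, dim=64)
stretched = rope_angles(4096 * 8, dim=64, scale=8.0)
assert all(abs(a - b) < 1e-9 for a, b in zip(orig, stretched))
```

The assertion shows why this extends the usable context: token 32,768 under `scale=8` produces the same rotary angles as token 4,096 did in pre-training, so attention behaves as if the sequence were still inside the original window.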

References