Grouped-Query Attention (GQA)

14 Mar 2025

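Grouped-query attention (GQA) interpolates between multi-query attention (MQA) and multi-head attention (MHA). The query heads are divided into groups, and each group shares a single key head and value head. With one group GQA reduces to MQA, and with as many groups as query heads it reduces to MHA. An intermediate number of groups achieves quality close to MHA at a speed comparable to MQA.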
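The interpolation can be made concrete with a small sketch of how query heads map to shared key/value heads, and how the number of cached key/value elements shrinks as a result. This is an illustration, not code from the paper: the function names `kv_head_for_query_head` and `kv_cache_elements`, the contiguous grouping of heads, and the example sizes (8 query heads, head dimension 64, sequence length 1024) are all assumptions chosen for the example.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Return the index of the key/value head that query head `q_head` reads from.

    Assumes contiguous grouping: the first `group_size` query heads share
    K/V head 0, the next `group_size` share K/V head 1, and so on.
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be divisible by n_kv_heads")
    group_size = n_q_heads // n_kv_heads  # query heads per shared K/V head
    return q_head // group_size


def kv_cache_elements(n_kv_heads: int, head_dim: int, seq_len: int) -> int:
    """Count cached K and V elements for one layer and one sequence."""
    return 2 * n_kv_heads * head_dim * seq_len  # factor 2: keys and values


n_q_heads = 8  # hypothetical model size, for illustration only
for name, n_kv_heads in [("MHA", 8), ("GQA", 2), ("MQA", 1)]:
    mapping = [
        kv_head_for_query_head(h, n_q_heads, n_kv_heads)
        for h in range(n_q_heads)
    ]
    cache = kv_cache_elements(n_kv_heads, head_dim=64, seq_len=1024)
    print(f"{name}: query head -> K/V head {mapping}, cache elements {cache}")
```

Under these assumptions, MHA gives every query head its own K/V head, MQA sends all query heads to K/V head 0, and GQA with two groups sits in between, with a cache one quarter the size of the MHA cache.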

References

  • Papers With Code
  • GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
