Multi-Query Attention (MQA)

14 Mar 2025 · 1 min read

Multi-head attention (MHA) consists of multiple attention heads run in parallel, each applying its own linear projections to the queries, keys, values, and outputs. Multi-query attention (MQA) is identical except that all heads share a single set of keys and values; only the query projections remain per-head.
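
Below is a minimal PyTorch sketch of this key/value sharing. The class and names (MultiQueryAttention, q_proj, k_proj, v_proj) are illustrative assumptions, not taken from the referenced paper.

```python
import torch
import torch.nn.functional as F
from torch import nn


class MultiQueryAttention(nn.Module):
    """Minimal multi-query attention: many query heads, one shared K/V head."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Queries keep one projection per head, as in MHA ...
        self.q_proj = nn.Linear(d_model, d_model)
        # ... but keys and values are projected to a single head shared by all query heads.
        self.k_proj = nn.Linear(d_model, self.d_head)
        self.v_proj = nn.Linear(d_model, self.d_head)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        # Per-head queries: (b, n_heads, t, d_head)
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        # Single K/V head: (b, 1, t, d_head), broadcast across all query heads
        k = self.k_proj(x).unsqueeze(1)
        v = self.v_proj(x).unsqueeze(1)
        scores = q @ k.transpose(-2, -1) / self.d_head**0.5  # (b, n_heads, t, t)
        attn = F.softmax(scores, dim=-1)
        out = attn @ v  # (b, n_heads, t, d_head)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out)


if __name__ == "__main__":
    x = torch.randn(2, 16, 64)
    mqa = MultiQueryAttention(d_model=64, n_heads=8)
    print(mqa(x).shape)  # torch.Size([2, 16, 64])
```

Because only a single key/value head has to be cached during autoregressive decoding, the KV cache is roughly n_heads times smaller than in standard MHA, which is the memory-bandwidth saving motivating the paper.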

References

  • Papers With Code
  • Fast Transformer Decoding: One Write-Head is All You Need

Backlinks

  • DeepSeek-V2
  • Llama 2
  • Grouped-Query Attention (GQA)
