Grouped-query attention (GQA) interpolates between multi-query attention (MQA) and multi-head attention (MHA): the query heads are divided into groups, and each group shares a single key/value head. This achieves quality close to MHA at speed comparable to MQA (Ainslie et al., 2023, "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints").
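
To make the interpolation concrete, here is a minimal PyTorch sketch of a GQA layer. It is an illustrative implementation, not the paper's code: the class name `GroupedQueryAttention`, the constructor parameters (`d_model`, `num_heads`, `num_kv_heads`), and the use of `repeat_interleave` to broadcast shared key/value heads are all choices made for this example. Setting `num_kv_heads == num_heads` recovers MHA, and `num_kv_heads == 1` recovers MQA.

```python
import math
import torch
import torch.nn as nn


class GroupedQueryAttention(nn.Module):
    """Illustrative GQA sketch: num_kv_heads key/value heads are shared
    across num_heads query heads. num_kv_heads == num_heads gives MHA;
    num_kv_heads == 1 gives MQA. Names/params are this example's choices."""

    def __init__(self, d_model: int, num_heads: int, num_kv_heads: int):
        super().__init__()
        assert num_heads % num_kv_heads == 0, "query heads must split evenly into groups"
        self.num_heads = num_heads
        self.num_kv_heads = num_kv_heads
        self.head_dim = d_model // num_heads
        # Full-width query projection, but narrower K/V projections:
        # only num_kv_heads heads' worth of keys and values are computed.
        self.q_proj = nn.Linear(d_model, num_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(d_model, num_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(d_model, num_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(num_heads * self.head_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        # Project and split into heads: (batch, heads, seq, head_dim).
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_kv_heads, self.head_dim).transpose(1, 2)
        # Each group of query heads attends over one shared K/V head;
        # repeat K/V along the head axis so shapes match the queries.
        group_size = self.num_heads // self.num_kv_heads
        k = k.repeat_interleave(group_size, dim=1)
        v = v.repeat_interleave(group_size, dim=1)
        # Standard scaled dot-product attention per head.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        out = scores.softmax(dim=-1) @ v          # (batch, heads, seq, head_dim)
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out)


# Usage: 8 query heads sharing 2 K/V heads (4 query heads per group).
x = torch.randn(2, 16, 512)
gqa = GroupedQueryAttention(d_model=512, num_heads=8, num_kv_heads=2)
y = gqa(x)  # shape: (2, 16, 512)
```

The speed benefit comes from the smaller key/value tensors: with 2 K/V heads instead of 8, the KV cache shrinks by 4x during autoregressive decoding, while retaining more query-head diversity than MQA's single shared K/V head.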