FlashAttention optimises the execution of scaled dot-product attention (SDPA) by fusing its component operations (the QKᵀ matrix multiply, the softmax, and the weighted sum over V) into a single GPU kernel, avoiding repeated reads and writes of the large intermediate attention matrix to high-bandwidth memory.
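To make the fused steps concrete, here is a minimal, unfused SDPA sketch in NumPy (illustrative only; shapes and the single-head setting are assumptions, and real implementations operate on batched, multi-head tensors). FlashAttention computes the same result without ever materialising the full `(N, N)` score matrix:

```python
import numpy as np

def sdpa(q, k, v):
    # Scaled dot-product attention written as separate steps.
    # FlashAttention fuses these steps into one kernel so the
    # full (N, N) score matrix never hits off-chip memory.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                   # (N, N) materialised here
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                              # (N, d)

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
out = sdpa(q, k, v)  # shape (8, 4)
```

The `(N, N)` intermediate is exactly the memory traffic FlashAttention eliminates by computing the softmax in tiles with a running normaliser.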

FlashAttention makes longer context lengths practical during training and longer sequence lengths practical during generative inference, since its memory footprint grows linearly with sequence length rather than quadratically.
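A back-of-envelope calculation shows why the quadratic score matrix becomes the bottleneck at long context (fp16 storage and a single head are assumed here for illustration):

```python
# The attention score matrix has N*N entries for sequence length N.
# At fp16 (2 bytes per entry), one head's score matrix costs:
for n in (2048, 8192, 32768):
    gib = n * n * 2 / 2**30
    print(f"N={n:>6}: {gib:.3f} GiB")
# At N=32768 a single head's score matrix alone is 2 GiB,
# before multiplying by heads, layers, and batch size.
```

FlashAttention never allocates this matrix, which is what lets sequence length scale without the quadratic memory cost.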

FlashAttention-2 iterates on FlashAttention, improving its performance through better parallelism and work partitioning across thread blocks and warps, and by reducing the number of non-matmul FLOPs.
