FlashAttention optimises the execution of SDPA by fusing its component operations (the QKᵀ matmul, the softmax, and the attention-weighted matmul with V) into a single GPU kernel, so the full attention matrix is never materialised in GPU memory.
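To make the fusion concrete, here is a minimal PyTorch sketch contrasting an unfused SDPA, which writes the full attention matrix out as an intermediate, with the fused `torch.nn.functional.scaled_dot_product_attention` call. The tensor shapes, CUDA device, and fp16 dtype are illustrative assumptions, not requirements of the API.

```python
import torch
import torch.nn.functional as F

def naive_sdpa(q, k, v):
    # Unfused SDPA: each step materialises a (seq_len x seq_len)
    # intermediate in GPU memory.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)  # QK^T / sqrt(d)
    weights = torch.softmax(scores, dim=-1)                 # full attention matrix
    return weights @ v                                      # weighted sum over V

# Illustrative shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Fused SDPA: PyTorch dispatches to a fused kernel (FlashAttention
# on supported hardware) and never writes out the attention matrix.
out_fused = F.scaled_dot_product_attention(q, k, v)
out_naive = naive_sdpa(q, k, v)

# Results agree up to fp16 numerical tolerance.
print(torch.allclose(out_fused, out_naive, atol=1e-2))
```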
Because its memory footprint grows linearly rather than quadratically with sequence length, FlashAttention can be used to increase the context length during training and to extend the sequence length during generative inference.
FlashAttention-2 is an iteration on FlashAttention that improves its performance through better work partitioning and additional parallelism across the sequence dimension.
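As an illustration of opting into the fused kernel explicitly, the sketch below pins PyTorch's SDPA dispatch to its FlashAttention backend via `torch.nn.attention.sdpa_kernel` (available from around PyTorch 2.3; recent builds implement this backend with FlashAttention-2, though the exact kernel depends on the build). It assumes a supported NVIDIA GPU and fp16 inputs.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Illustrative long-sequence shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(2, 8, 4096, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Restrict SDPA dispatch to the FlashAttention backend; if the inputs
# or hardware are unsupported, this raises an error instead of silently
# falling back to the unfused math backend.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```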