This paper builds on the tensor-parallel scheme introduced by Megatron-LM, adding two techniques:
- Sequence parallelism
- Selective activation recomputation
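
To make the first technique concrete, here is a minimal pure-Python sketch of the idea behind sequence parallelism. It assumes the usual reading that per-token operations such as layer normalization can be split along the sequence dimension, because each token is normalized independently of the others. The function names (`layer_norm`, `shard_sequence`, `sequence_parallel_layer_norm`) and the toy inputs are hypothetical, and the list concatenation only stands in for the all-gather that a real multi-device implementation would use. This is an illustration, not the paper's implementation.

```python
import math


def layer_norm(token, eps=1e-5):
    """Normalize one token's hidden vector to zero mean and unit variance."""
    mean = sum(token) / len(token)
    var = sum((x - mean) ** 2 for x in token) / len(token)
    return [(x - mean) / math.sqrt(var + eps) for x in token]


def layer_norm_sequence(tokens):
    """Apply layer norm to every token; no token depends on any other."""
    return [layer_norm(t) for t in tokens]


def shard_sequence(tokens, num_shards):
    """Split the sequence dimension into equal contiguous shards (hypothetical helper)."""
    assert len(tokens) % num_shards == 0, "sequence length must divide evenly"
    size = len(tokens) // num_shards
    return [tokens[i * size:(i + 1) * size] for i in range(num_shards)]


def sequence_parallel_layer_norm(tokens, num_shards):
    """Run layer norm shard by shard, then reassemble the full sequence."""
    shards = shard_sequence(tokens, num_shards)
    # In a real system each shard would live on a different device, so each
    # device stores activations for only len(tokens) / num_shards tokens.
    outputs = [layer_norm_sequence(shard) for shard in shards]
    # Stand-in for an all-gather along the sequence dimension.
    return [token for out in outputs for token in out]


# Toy input: a sequence of 4 tokens with hidden size 4.
tokens = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, -0.5, 1.5, 2.5],
    [2.0, 2.0, 0.0, 4.0],
    [-1.0, 3.0, 0.0, 1.0],
]

full = layer_norm_sequence(tokens)
sharded = sequence_parallel_layer_norm(tokens, num_shards=2)
assert sharded == full  # sharding along the sequence does not change the result
```

Because the sharded and unsharded results match, the only thing that changes is where the activations are stored: in this sketch each shard holds half of the tokens.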
References