Post-training takes the form of either supervised fine-tuning (SFT) or reinforcement learning (RL).
Misc
- Tulu 3: frontier-level post-training recipe
In conversational fine-tuning:
- Adding new tokens for [user] and [assistant] (https://arxiv.org/abs/2305.11206) improves task performance
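
As a rough illustration of the role-token idea above, here is a minimal sketch in plain Python. The token strings `[user]` and `[assistant]`, the toy vocabulary, and the helper names `extend_vocab` and `render_conversation` are all assumptions made for this example; they are not taken from the linked paper or from any particular library. In a real setup the new tokens would also need rows in the model's embedding matrix, which are then learned during fine-tuning.

```python
# Hypothetical role markers; real models each define their own special tokens.
USER_TOKEN = "[user]"
ASSISTANT_TOKEN = "[assistant]"
ROLE_TOKENS = {"user": USER_TOKEN, "assistant": ASSISTANT_TOKEN}


def extend_vocab(vocab, new_tokens):
    """Return a copy of vocab with unseen tokens appended.

    Assumes existing ids are contiguous from 0, so len() is the next free id.
    """
    extended = dict(vocab)
    for token in new_tokens:
        if token not in extended:
            extended[token] = len(extended)
    return extended


def render_conversation(turns):
    """Prefix every (role, text) turn with its role token, one turn per line."""
    return "\n".join(f"{ROLE_TOKENS[role]} {text}" for role, text in turns)


# Toy vocabulary standing in for a pretrained tokenizer's vocabulary.
vocab = extend_vocab({"hello": 0, "world": 1}, [USER_TOKEN, ASSISTANT_TOKEN])

example = render_conversation([
    ("user", "What is SFT?"),
    ("assistant", "Supervised fine-tuning on demonstrations."),
])
print(example)
```

Treating each role marker as a single dedicated token, rather than letting it split into ordinary subwords, gives the model an unambiguous signal for where one speaker stops and the next begins.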