Post-training takes the form of either supervised fine-tuning (SFT) or reinforcement learning (RL).

Misc

  • Tulu 3 - Frontier-level post-training recipe

In conversational fine-tuning: