ML Notes

Misc

13 Mar 2025

Post-training takes the form of either supervised fine-tuning (SFT) or reinforcement learning (RL).

Misc

Tulu 3: a frontier-level post-training recipe

In conversational fine-tuning:

Adding new tokens for [user] and [assistant] improves task performance (https://arxiv.org/abs/2305.11206)
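
As a rough illustration of what adding role tokens involves, here is a minimal pure-Python sketch. It is not taken from the linked paper: the toy vocabulary, the whitespace tokenization, and the helper names `add_role_tokens`, `render`, and `encode` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Role markers from the note; everything else below is hypothetical.
ROLE_TOKENS = ("[user]", "[assistant]")


def add_role_tokens(vocab: Dict[str, int]) -> Dict[str, int]:
    """Return a copy of the vocab with each role token given a fresh id."""
    extended = dict(vocab)
    for token in ROLE_TOKENS:
        if token not in extended:
            # Next unused id: one past the current maximum.
            extended[token] = max(extended.values(), default=-1) + 1
    return extended


def render(turns: List[Tuple[str, str]]) -> List[str]:
    """Flatten (role, text) turns into tokens, prefixing each turn with its role token."""
    tokens: List[str] = []
    for role, text in turns:
        marker = f"[{role}]"
        if marker not in ROLE_TOKENS:
            raise ValueError(f"unknown role: {role}")
        tokens.append(marker)
        # Toy whitespace tokenization, assumed only for this sketch.
        tokens.extend(text.split())
    return tokens


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids; raises KeyError for out-of-vocabulary tokens."""
    return [vocab[t] for t in tokens]


toy_vocab = {"hello": 0, "hi": 1, "there": 2}
vocab = add_role_tokens(toy_vocab)
tokens = render([("user", "hello"), ("assistant", "hi there")])
ids = encode(tokens, vocab)
```

With a real model, the embedding matrix also has to grow to cover the new ids. In Hugging Face Transformers this is typically done with `tokenizer.add_special_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`.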
