ML Notes
Folder: training/post-training/reinforcement-learning
4 items under this folder.
07 Apr 2025: Direct Preference Optimisation (DPO)
07 Apr 2025: Group Relative Policy Optimisation (GRPO)
07 Apr 2025: Proximal Policy Optimisation (PPO)
07 Apr 2025: Trust Region Policy Optimisation (TRPO)