ML Notes

Folder: training/post-training/reinforcement-learning

4 items under this folder.

  • 07 Apr 2025: Direct Preference Optimisation (DPO)
  • 07 Apr 2025: Group Relative Policy Optimisation (GRPO)
  • 07 Apr 2025: Proximal Policy Optimisation (PPO)
  • 07 Apr 2025: Trust Region Policy Optimisation (TRPO)


          Created with Quartz v4.5.0 © 2025