ML Notes

Group Relative Policy Optimisation (GRPO)

07 Apr 2025

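A variant of Proximal Policy Optimisation (PPO) developed by DeepSeek and introduced in DeepSeekMath. Instead of training a separate value model (critic) to supply the baseline, GRPO samples a group of outputs for each prompt and uses the group's average reward as the baseline, which reduces the memory and compute needed for RL training.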
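A minimal sketch of the core arithmetic, assuming the outcome-supervision setup described in DeepSeekMath. Each output's reward is normalised against its group, (r_i - mean(r)) / std(r). That advantage then goes into a PPO-style clipped per-token objective, together with a KL penalty towards a reference policy. The function names, the use of the population standard deviation, and the defaults `clip_eps=0.2` and `beta=0.04` are illustrative assumptions, not a reference implementation. Plain floats stand in for tensors, so the sketch shows the loss value only, not gradients.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: (logp_new, logp_old, logp_ref) for one generated token
TokenLogps = Tuple[float, float, float]


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalise each output's reward against the group sampled for the same prompt.

    Uses the population standard deviation (an assumption); eps avoids division
    by zero when every output in the group receives the same reward.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def grpo_token_loss(
    logp_new: float,
    logp_old: float,
    logp_ref: float,
    advantage: float,
    clip_eps: float = 0.2,
    beta: float = 0.04,
) -> float:
    """Negated per-token objective: clipped surrogate minus a KL penalty."""
    # Importance ratio pi_theta / pi_theta_old, as in PPO.
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    surrogate = min(ratio * advantage, clipped * advantage)
    # KL estimator pi_ref/pi_theta - log(pi_ref/pi_theta) - 1 (never negative),
    # added to the loss directly rather than folded into the reward.
    log_ref_ratio = logp_ref - logp_new
    kl = math.exp(log_ref_ratio) - log_ref_ratio - 1.0
    return -(surrogate - beta * kl)


def grpo_loss(
    group: Sequence[Sequence[TokenLogps]],
    rewards: Sequence[float],
    clip_eps: float = 0.2,
    beta: float = 0.04,
) -> float:
    """Average the per-token loss within each output, then across the group."""
    advantages = group_relative_advantages(rewards)
    total = 0.0
    for tokens, adv in zip(group, advantages):
        # Under outcome supervision every token of an output shares its advantage.
        per_token = [grpo_token_loss(n, o, r, adv, clip_eps, beta) for n, o, r in tokens]
        total += sum(per_token) / len(per_token)
    return total / len(group)


if __name__ == "__main__":
    # Four sampled answers to one prompt: two correct (reward 1), two wrong (reward 0).
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

With rewards `[1, 0, 1, 0]` the group mean is 0.5 and the standard deviation is 0.5, so the advantages come out very close to `[1, -1, 1, -1]`. Correct answers are pushed up and incorrect ones pushed down relative to the group average, with no critic involved. If every output in a group gets the same reward, all advantages are zero and that group contributes only the KL term.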

References

  • DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models