ML Notes

Group Relative Policy Optimisation (GRPO)

07 Apr 2025

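A variant of Proximal Policy Optimization (PPO) developed by DeepSeek. It removes PPO's learned value function (critic) and instead derives the baseline from the rewards of a group of outputs sampled for the same prompt.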
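
GRPO's name comes from how it scores outputs: with outcome rewards, the DeepSeekMath paper normalises each sampled output's reward by the mean and standard deviation of the rewards in its group, roughly `A_i = (r_i - mean(r)) / std(r)`. The Python below is a minimal sketch of that one step, not a reference implementation. The function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` guard against a zero-variance group are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise each reward against its own group's mean and spread.

    `rewards` holds one scalar reward per output sampled for the SAME prompt.
    `eps` is an illustrative guard (an assumption) for groups where every
    reward is identical and the standard deviation is zero.
    """
    if not rewards:
        return []
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; this choice is an assumption
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one prompt, scored 1.0 if correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct answers end up with positive advantages and incorrect ones with negative advantages, without any critic network. In this sketch, a group whose rewards are all equal yields advantages of zero, so it contributes no learning signal.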

References

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
