A variant of Proximal Policy Optimization (PPO) developed by DeepSeek. Reference: "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
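In DeepSeekMath this PPO variant is Group Relative Policy Optimization (GRPO). GRPO drops PPO's learned value function. For each prompt it samples a group of outputs, normalizes each output's reward against that group's mean and standard deviation, and uses the result as the advantage inside PPO's clipped surrogate. It also adds a KL penalty toward a reference policy using the estimator pi_ref/pi - log(pi_ref/pi) - 1.

Below is a minimal sketch of the outcome-supervision objective. It assumes per-token log-probabilities arrive as plain Python floats. The function names, the use of the population standard deviation, the `eps` guard, and the default `clip_eps` and `beta` values are illustrative assumptions, not DeepSeek's code.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled output relative to the other outputs for the same prompt.

    Assumption: population std (pstdev); eps avoids division by zero
    when every reward in the group is identical (advantages become 0).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def kl_estimate(logp_new, logp_ref):
    """Per-token estimator pi_ref/pi - log(pi_ref/pi) - 1, which is always >= 0."""
    log_ratio = logp_ref - logp_new
    return math.exp(log_ratio) - log_ratio - 1.0


def grpo_objective(group, rewards, clip_eps=0.2, beta=0.04):
    """Objective to maximize (use its negative as a loss).

    group: one entry per sampled output; each entry is a list of
           (logp_new, logp_old, logp_ref) tuples, one per generated token.
    rewards: one scalar reward per output (outcome supervision, so every
             token of output i shares the advantage A_i).
    clip_eps, beta: hypothetical defaults chosen for illustration.
    """
    advantages = group_relative_advantages(rewards)
    per_output = []
    for tokens, adv in zip(group, advantages):
        token_terms = []
        for logp_new, logp_old, logp_ref in tokens:
            ratio = math.exp(logp_new - logp_old)
            clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
            # PPO-style pessimistic surrogate, minus the KL penalty.
            surrogate = min(ratio * adv, clipped * adv)
            token_terms.append(surrogate - beta * kl_estimate(logp_new, logp_ref))
        # Average over the tokens of this output ...
        per_output.append(sum(token_terms) / len(token_terms))
    # ... then over the outputs in the group.
    return sum(per_output) / len(per_output)
```

For example, with binary rewards `[1.0, 0.0, 1.0, 0.0]` the advantages are roughly `[1, -1, 1, -1]`. Correct answers are pushed up and incorrect ones pushed down without any critic network. Because the baseline is the group mean, a group in which every output earns the same reward contributes no policy-gradient signal.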