DeepSeekMath

DeepSeekMath is a mathematical reasoning model that continues pre-training DeepSeek-Coder-Base-v1.5 7B on 120B math-related tokens sourced from Common Crawl, together with natural language and code data.

The mathematical reasoning capability of DeepSeekMath is attributed to two key factors:

Harnessing the significant potential of publicly available web data through a meticulously engineered data selection pipeline.
The introduction of Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO) that enhances mathematical reasoning abilities while reducing the memory usage of PPO.
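
The memory saving comes from GRPO dropping PPO's separate value (critic) model: the baseline is estimated from a group of answers sampled for the same question. A minimal sketch of that group-relative advantage under outcome supervision follows. The function name, the reward values, the `eps` term, and the use of the population standard deviation are assumptions made for illustration, not details taken from the paper's code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per sampled answer to the same question.
    `eps` is an assumed guard against division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one question (1.0 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that score above the group mean receive a positive advantage and the rest a negative one, so no learned value function is needed to provide the baseline.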

References

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
