DeepSeekMath is a mathematical reasoning model that continues pre-training DeepSeek-Coder-Base-v1.5 7B on 120B math-related tokens sourced from Common Crawl, together with natural language and code data.
The mathematical reasoning capability of DeepSeekMath is attributed to two key factors:
- Harnessing the significant potential of publicly available web data through a meticulously engineered data selection pipeline.
- The introduction of Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO) that enhances mathematical reasoning abilities while also reducing PPO's memory usage.
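GRPO saves memory mainly by dropping PPO's learned critic (value model). Instead, it samples a group of outputs for the same question and uses the mean and standard deviation of their rewards as the baseline. The sketch below shows that group-relative advantage and the PPO-style clipped term it feeds into. It assumes outcome-level rewards (one scalar per sampled output) and uses the population standard deviation. The function names are hypothetical, and the KL penalty against the reference model is omitted for brevity.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Hypothetical helper: normalise each reward against its own group.

    All rewards belong to outputs sampled for the SAME question, so
    A_i = (r_i - mean(r)) / std(r) acts as the baseline that a critic
    model would otherwise have to predict. `eps` avoids division by
    zero when every output in the group receives the same reward.
    """
    if not rewards:
        raise ValueError("need at least one reward in the group")
    mean = sum(rewards) / len(rewards)
    # Population standard deviation (an assumption; some code uses the sample std).
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Hypothetical helper: PPO-style clipped objective for one token.

    `ratio` is pi_theta(token) / pi_theta_old(token). The result is the
    term to maximise (negate it to obtain a loss).
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Four sampled answers to one question; reward 1.0 if judged correct
    # (a hypothetical binary reward).
    advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
    print(advantages)  # correct answers get about +1, incorrect ones about -1
    print(clipped_surrogate(ratio=1.5, advantage=advantages[0]))
```

Because the baseline comes from the sampled group itself, a question where every output earns the same reward contributes zero advantage. This means the policy is pushed only by differences between its own answers to the same prompt.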
References