DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
- Published Feb 4, 2025
- Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO) that enhances mathematical reasoning abilities while concurrently optimizing the memory usage of PPO.
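For readers skimming the abstract, the key idea behind GRPO is that it drops PPO's learned value (critic) network and instead normalizes each sampled answer's reward against the other answers in the same group. A minimal sketch of that group-relative advantage step, under my own reading of the abstract (function name and example rewards are illustrative, not the authors' code):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantage estimate: normalize each reward by the
    mean and std of the rewards within its sampling group, so no
    separate critic network (and its memory cost) is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All samples scored equally; no preference signal in this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Example: four answers to one question, scored 1.0 (correct) or 0.0 (wrong).
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Correct answers get a positive advantage and wrong ones a negative advantage, purely relative to the group; these advantages then plug into a clipped PPO-style policy update.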
arxiv.org/abs/...
NotebookLM podcast conversation. Great analysis. It's always the deep dive. And the pattern is always the same. With some nice soundbites. Nice voices.
DeepSeek performs better on math questions than ChatGPT did about a year ago. I did some comparative testing, but it went wrong on the last question: the last answer was "The server is busy. Please try again later." That counts as a wrong answer, and on that question ChatGPT performed better.
Makes you wonder what the hell we're doing... we're building something that can replace us in every way. Why are we doing this? And we're leaving these decisions to CEOs?
Wait is this a conversation between actual people or AI bots?
AI
The windows sounds in the background hehe
You can use chat.deepseek.com/ to explain the final unified formulation presented in the paper's snapshot.