rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

  • Published 17 Jan 2025
  • arxiv: arxiv.org/pdf/...
    GitHub: github.com/mic... (not yet available)
    Overview:
    rStar-Math is a novel approach demonstrating that small language models (SLMs), through self-evolution and deep thinking, can achieve or even surpass the mathematical reasoning capabilities of larger models like OpenAI's o1. This is achieved without relying on distillation from superior models.
    Key Innovations:
    1. Code-Augmented CoT Data Synthesis: This method uses Monte Carlo Tree Search (MCTS) to generate step-by-step verified reasoning trajectories, ensuring high-quality data for training the policy SLM.
    2. Process Preference Model (PPM): A novel training method that avoids naïve step-level score annotation, instead constructing step-level preference pairs to yield a more reliable evaluator of intermediate reasoning steps.
    3. Self-Evolution Process: The policy SLM and PPM are iteratively evolved from scratch, leading to improved reasoning capabilities over successive rounds.
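    The interplay of these ideas can be illustrated with a toy sketch. The code below is a hypothetical simplification, not the paper's implementation: a policy proposes candidate next steps, random rollouts estimate each step's Q-value (standing in for MCTS backup), and high/low-Q step pairs are collected as preference data of the kind a PPM would be trained on. All names, the toy arithmetic "problem", and the reward logic are illustrative assumptions.

```python
import random

# Toy "math problem": reach exactly TARGET by repeatedly adding digits 1-9.
# This stands in for a multi-step reasoning trajectory. (Illustrative only.)
TARGET = 24

def propose_steps(state, rng, k=3):
    """Toy policy: propose k candidate next steps (digits to add)."""
    return [rng.randint(1, 9) for _ in range(k)]

def rollout_value(state, step, rng, n_rollouts=16, max_depth=8):
    """Estimate Q(state, step) by random rollouts to a terminal check,
    a crude stand-in for MCTS simulation and value backup."""
    wins = 0
    for _ in range(n_rollouts):
        s = state + step
        for _ in range(max_depth):
            if s == TARGET:
                wins += 1
                break
            if s > TARGET:
                break
            s += rng.randint(1, 9)
    return wins / n_rollouts

def search_trajectory(rng, depth=8):
    """Greedy step-level search guided by rollout Q-values. Also collects
    (preferred, rejected) step pairs: the raw material for PPM-style
    preference training, instead of naive per-step score labels."""
    state, trajectory, pref_pairs = 0, [], []
    for _ in range(depth):
        candidates = propose_steps(state, rng)
        scored = sorted(((rollout_value(state, c, rng), c) for c in candidates),
                        reverse=True)
        best_q, best = scored[0]
        worst_q, worst = scored[-1]
        if best_q > worst_q:  # keep only informative contrastive pairs
            pref_pairs.append((best, worst))
        trajectory.append(best)
        state += best
        if state >= TARGET:
            break
    return state == TARGET, trajectory, pref_pairs

if __name__ == "__main__":
    rng = random.Random(0)
    solved, traj, pairs = search_trajectory(rng)
    print("solved:", solved, "trajectory:", traj, "preference pairs:", pairs)
```

    In the actual system, the self-evolution loop would retrain the policy on verified trajectories and the PPM on the preference pairs, then repeat the search with the improved models over successive rounds.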
    Capabilities and Performance:
    rStar-Math significantly boosts math reasoning in SLMs to state-of-the-art levels. For instance, it improves Qwen2.5-Math-7B from 58.8% to 90.0% on the MATH benchmark, surpassing OpenAI o1-preview by 4.5%.
    On the USA Math Olympiad (AIME), rStar-Math solves an average of 53.3% of problems, ranking among the top 20% of high school math students.
    Applications:
    Education: Enhancing educational tools for math learning by providing more accurate and reliable problem-solving capabilities.
    Research: Facilitating advanced mathematical research where complex problem-solving is required.
    Business: Improving decision-making algorithms in finance and engineering that rely on complex mathematical computations.
    Findings and Discussions:
    rStar-Math exhibits intrinsic self-reflection capabilities, allowing it to identify and correct errors during problem-solving, a feature that has been challenging to achieve in open-source LLMs.
    The PPM effectively identifies critical theorem-application steps, guiding the policy model towards correct solutions.
    The approach shows potential for generalization to other domains like code and commonsense reasoning, given the appropriate feedback mechanisms.
    Conclusion:
    rStar-Math represents a significant advancement in the capabilities of small language models for mathematical reasoning. By leveraging self-evolution and deep thinking, it sets a new standard for what can be achieved without relying on larger, more resource-intensive models. The findings suggest promising directions for further research and application in fields requiring sophisticated reasoning capabilities.
    Created with o1, gpt-4o and tts-hd #azureopenai
