rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

  • Published 17 Jan 2025
  • arxiv: arxiv.org/pdf/...
    GitHub: github.com/mic... (not yet available)
    Overview:
    rStar-Math is a novel approach demonstrating that small language models (SLMs), through self-evolution and deep thinking, can achieve or even surpass the mathematical reasoning capabilities of larger models like OpenAI's o1. This is achieved without relying on distillation from superior models.
    Key Innovations:
    1. Code-Augmented CoT Data Synthesis: This method uses Monte Carlo Tree Search (MCTS) to generate step-by-step verified reasoning trajectories, ensuring high-quality data for training the policy SLM.
    2. Process Preference Model (PPM): A novel training method that avoids naïve step-level score annotation, instead constructing step-level preference pairs to yield a more reliable evaluator of intermediate reasoning steps.
    3. Self-Evolution Process: The policy SLM and PPM are iteratively evolved from scratch, leading to improved reasoning capabilities over successive rounds.
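    The interplay of these ideas can be illustrated with a toy sketch. The code below is a hypothetical simplification, not the paper's implementation: a policy proposes candidate next steps, random rollouts estimate each step's Q-value (standing in for MCTS backup), and high/low-Q step pairs are collected as preference data of the kind a PPM would be trained on. All names, the toy arithmetic "problem", and the reward logic are illustrative assumptions.

```python
import random

# Toy "math problem": reach exactly TARGET by repeatedly adding digits 1-9.
# This stands in for a multi-step reasoning trajectory. (Illustrative only.)
TARGET = 24

def propose_steps(state, rng, k=3):
    """Toy policy: propose k candidate next steps (digits to add)."""
    return [rng.randint(1, 9) for _ in range(k)]

def rollout_value(state, step, rng, n_rollouts=16, max_depth=8):
    """Estimate Q(state, step) by random rollouts to a terminal check,
    a crude stand-in for MCTS simulation and value backup."""
    wins = 0
    for _ in range(n_rollouts):
        s = state + step
        for _ in range(max_depth):
            if s == TARGET:
                wins += 1
                break
            if s > TARGET:
                break
            s += rng.randint(1, 9)
    return wins / n_rollouts

def search_trajectory(rng, depth=8):
    """Greedy step-level search guided by rollout Q-values. Also collects
    (preferred, rejected) step pairs: the raw material for PPM-style
    preference training, instead of naive per-step score labels."""
    state, trajectory, pref_pairs = 0, [], []
    for _ in range(depth):
        candidates = propose_steps(state, rng)
        scored = sorted(((rollout_value(state, c, rng), c) for c in candidates),
                        reverse=True)
        best_q, best = scored[0]
        worst_q, worst = scored[-1]
        if best_q > worst_q:  # keep only informative contrastive pairs
            pref_pairs.append((best, worst))
        trajectory.append(best)
        state += best
        if state >= TARGET:
            break
    return state == TARGET, trajectory, pref_pairs

if __name__ == "__main__":
    rng = random.Random(0)
    solved, traj, pairs = search_trajectory(rng)
    print("solved:", solved, "trajectory:", traj, "preference pairs:", pairs)
```

    In the actual system, the self-evolution loop would retrain the policy on verified trajectories and the PPM on the preference pairs, then repeat the search with the improved models over successive rounds.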
    Capabilities and Performance:
    rStar-Math significantly boosts math reasoning in SLMs to state-of-the-art levels. For instance, it improves Qwen2.5-Math-7B from 58.8% to 90.0% on the MATH benchmark, surpassing OpenAI o1-preview by 4.5%.
    On the USA Math Olympiad (AIME), rStar-Math solves an average of 53.3% of problems, ranking among the top 20% of high school math students.
    Applications:
    Education: Enhancing educational tools for math learning by providing more accurate and reliable problem-solving capabilities.
    Research: Facilitating advanced mathematical research where complex problem-solving is required.
    Business: Improving decision-making algorithms in finance and engineering that rely on complex mathematical computations.
    Findings and Discussions:
    rStar-Math exhibits intrinsic self-reflection capabilities, allowing it to identify and correct errors during problem-solving, a feature that has been challenging to achieve in open-source LLMs.
    The PPM effectively identifies critical theorem-application steps, guiding the policy model towards correct solutions.
    The approach shows potential for generalization to other domains like code and commonsense reasoning, given the appropriate feedback mechanisms.
    Conclusion:
    rStar-Math represents a significant advancement in the capabilities of small language models for mathematical reasoning. By leveraging self-evolution and deep thinking, it sets a new standard for what can be achieved without relying on larger, more resource-intensive models. The findings suggest promising directions for further research and application in fields requiring sophisticated reasoning capabilities.
    Created with o1, gpt-4o and tts-hd #azureopenai
