Teaching Large Language Models to Reason with Reinforcement Learning with Alex Havrilla - 680

  • Published Jun 14, 2024
  • Today we're joined by Alex Havrilla, a PhD student at Georgia Tech, to discuss "Teaching Large Language Models to Reason with Reinforcement Learning." Alex discusses the role of creativity and exploration in problem solving and explores the opportunities presented by applying reinforcement learning algorithms to the challenge of improving reasoning in large language models. Alex also shares his research on the effect of noise on language model training, highlighting the robustness of the LLM architecture. Finally, we delve into the future of RL and the potential of combining language models with traditional methods to achieve more robust AI reasoning.
    🗣️ CONNECT WITH US!
    ===============================
    Subscribe to the TWIML AI Podcast: twimlai.com/podcast/twimlai/
    Join our Slack Community: twimlai.com/community/
    Subscribe to our newsletter: twimlai.com/newsletter/
    Want to get in touch? Send us a message: twimlai.com/contact/
    Follow us on Twitter: / twimlai
    Follow us on LinkedIn: / twimlai
    📖 CHAPTERS
    ===============================
    00:00 - Introduction
    02:19 - RL vs RLHF
    06:22 - The state of RL
    07:31 - Path to online learning
    11:04 - Teaching LLMs to reason with RL
    31:10 - ARB
    34:45 - The importance of storing information
    35:15 - Static and dynamic noise
    45:06 - Conclusion
    🔗 LINKS & RESOURCES
    ===============================
    Teaching Large Language Models to Reason with Reinforcement Learning - arxiv.org/abs/2403.04642
    ARB: Advanced Reasoning Benchmark for Large Language Models - arxiv.org/pdf/2307.13692.pdf
    Proximal Policy Optimization Algorithms - arxiv.org/abs/1707.06347
    Prioritized Level Replay - arxiv.org/pdf/2010.03934.pdf
    Direct Preference Optimization: Your Language Model is Secretly a Reward Model - arxiv.org/pdf/2305.18290.pdf
    trlX documentation - trlx.readthedocs.io/en/latest/
  • Science & Technology
