Teaching Large Language Models to Reason with Reinforcement Learning with Alex Havrilla - 680

The TWIML AI Podcast with Sam Charrington

Published 14 Jun 2024
Today we're joined by Alex Havrilla, a PhD student at Georgia Tech, to discuss "Teaching Large Language Models to Reason with Reinforcement Learning." Alex discusses the role of creativity and exploration in problem solving and explores the opportunities that reinforcement learning algorithms offer for improving reasoning in large language models. Alex also shares his research on the effect of noise on language model training, highlighting the robustness of LLM architectures. Finally, we discuss the future of RL and the potential of combining language models with traditional methods to achieve more robust AI reasoning.
🔔 Subscribe to our channel for more great content just like this: ua-cam.com/users/twimlai?sub_confi...
🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: twimlai.com/podcast/twimlai/
Join our Slack Community: twimlai.com/community/
Subscribe to our newsletter: twimlai.com/newsletter/
Want to get in touch? Send us a message: twimlai.com/contact/
Follow us on Twitter: / twimlai
Follow us on LinkedIn: / twimlai
📖 CHAPTERS
===============================
00:00 - Introduction
02:19 - RL vs RLHF
06:22 - The state of RL
07:31 - Path to online learning
11:04 - Teaching LLMs to reason with RL
31:10 - ARB
34:45 - The importance of storing information
35:15 - Static and dynamic noise
45:06 - Conclusion
🔗 LINKS & RESOURCES
===============================
Teaching Large Language Models to Reason with Reinforcement Learning - arxiv.org/abs/2403.04642
ARB: Advanced Reasoning Benchmark for Large Language Models - arxiv.org/pdf/2307.13692.pdf
Proximal Policy Optimization Algorithms - arxiv.org/abs/1707.06347
Prioritized Level Replay - arxiv.org/pdf/2010.03934.pdf
Direct Preference Optimization: Your Language Model is Secretly a Reward Model - arxiv.org/pdf/2305.18290.pdf
trlX documentation - trlx.readthedocs.io/en/latest/
📸 Camera: amzn.to/3TQ3zsg
🎙️Microphone: amzn.to/3t5zXeV
🚦Lights: amzn.to/3TQlX49
🎛️ Audio Interface: amzn.to/3TVFAIq
🎚️ Stream Deck: amzn.to/3zzm7F5
Science & Technology