AI Trends 2024: Reinforcement Learning in the Age of LLMs with Kamyar Azizzadenesheli - 670

  • Published May 30, 2024
  • Today we’re joined by Kamyar Azizzadenesheli, a staff researcher at Nvidia, to continue our AI Trends 2024 series. In our conversation, Kamyar updates us on the latest developments in reinforcement learning (RL), and how the RL community is taking advantage of the abstract reasoning abilities of large language models (LLMs). Kamyar shares his insights on how LLMs are pushing RL performance forward in a variety of applications, such as ALOHA, a robot that can learn to fold clothes, and Voyager, an RL agent that uses GPT-4 to outperform prior systems at playing Minecraft. We also explore the progress being made in assessing and addressing the risks of RL-based decision-making in domains such as finance, healthcare, and agriculture. Finally, we discuss the future of deep reinforcement learning, Kamyar’s top predictions for the field, and how greater compute capabilities will be critical in achieving general intelligence.
    📖 CHAPTERS
    ===============================
    00:00 - Introduction
    02:24 - How LLMs have changed RL
    18:36 - Voyager paper & Minecraft
    22:08 - World models
    25:27 - LLMs in robotics
    28:16 - RL vs explicit control algorithms
    35:19 - ALOHA and RLHF robots
    41:51 - Assessing the risks in RL agents
    51:22 - The future of RL & AI
    01:04:39 - Solving generality & narrow AI
    01:19:16 - Is hardware ready for AGI?
    01:23:36 - Conclusion
    🔗 LINKS & RESOURCES
    ===============================
    Neural Lander: Stable Drone Landing Control Using Learned Dynamics - arxiv.org/pdf/1811.08027
    Mobile ALOHA: Your Housekeeping Robot - • Mobile ALOHA: Your Hou...
    Voyager: An Open-Ended Embodied Agent with Large Language Models - arxiv.org/abs/2305.16291
    Mastering Diverse Domains through World Models - arxiv.org/abs/2301.04104
    AI Rewind 2021: Trends in Reinforcement Learning with Kamyar Azizzadenesheli - 560 - twimlai.com/podcast/twimlai/a...
  • Science & Technology

COMMENTS • 5

  • @mansurZ01
    @mansurZ01 18 days ago

    Thank you for the video, it is a great source for catching up with research progress!

  • @kiryllshynharow9058
    @kiryllshynharow9058 3 months ago

    One more question.
    Here you discuss the role in RL tasks of LLM-based intelligent agents for goal setting and evaluation functions.
    But the underlying concept of LLMs themselves is to select the most likely next token. Moreover, effective video generation using a similar approach has recently been demonstrated. The strategy is also convenient because it does not require a huge amount of labeled data: it is enough to slide a window along the text (or data of another modality).
    Why not use a transformer to generate the next action natively, instead of a word or a video frame, training on logs or video recordings of possible behavior (as in simulation training)? This approach looks natural (and is no more expensive than video generation).
    What is currently known about research in this direction?

  • @kiryllshynharow9058
    @kiryllshynharow9058 3 months ago

    23:44 Isn't it redundant to generate a goal/state description explicitly, when it would be sufficient to operate with comparisons in the embedding space?
    Or is this just a popular science explanation for a wider audience? Or am I overlooking some technical reasons?
