Stanford CS25: V3 I Low-level Embodied Intelligence w/ Foundation Models

  • Published Dec 7, 2023
  • October 10, 2023
    Low-level Embodied Intelligence with Foundation Models
    Fei Xia, Google DeepMind
    This talk introduces two approaches to low-level embodied intelligence that integrate large language models (LLMs) with robotics: "Language to Reward" and "Robotics Transformer-2" (RT-2). The former uses LLMs to generate reward code, which bridges high-level language instructions and low-level robotic actions. This method supports real-time user interaction, controls robotic arms efficiently across various tasks, and outperforms baseline methods. "Robotics Transformer-2" integrates advanced vision-language models with robotic control by co-fine-tuning on robotic trajectory data and extensive web-based vision-language tasks; the resulting RT-2 model exhibits strong generalization. This approach allows robots to execute commands not seen during training and to perform multi-stage semantic reasoning, a significant advance in contextual understanding and response to user commands. These projects demonstrate that language models can extend beyond their conventional domain of high-level reasoning, playing a crucial role not only in interpreting and generating instructions but also in generating low-level robotic actions.
    More about the course can be found here: web.stanford.edu/class/cs25/
    View the entire CS25 Transformers United playlist: • Stanford CS25 - Transf...
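The "Language to Reward" idea in the description (an LLM writes reward code, and something downstream turns that reward into low-level actions) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the system shown in the talk: `fake_llm_reward_code` is a hard-coded stand-in for a language model, the "robot" is a single height value in [0, 1], and random search stands in for whatever optimizer or controller a real system would use.

```python
import random


def fake_llm_reward_code(instruction: str) -> str:
    """Hypothetical stand-in for an LLM: returns reward code as text."""
    # A real system would prompt a language model; here the mapping is hard-coded.
    target = 0.5 if "raise" in instruction.lower() else 0.0
    return f"def reward(height):\n    return -abs(height - {target})\n"


def compile_reward(code: str):
    """Turn generated reward code into a callable (toy only: exec is unsafe on untrusted text)."""
    namespace = {}
    exec(code, namespace)
    return namespace["reward"]


def optimize_height(reward, samples: int = 2000, seed: int = 0) -> float:
    """Random search over candidate heights, standing in for a low-level optimizer."""
    rng = random.Random(seed)
    candidates = [rng.uniform(0.0, 1.0) for _ in range(samples)]
    return max(candidates, key=reward)


if __name__ == "__main__":
    reward_fn = compile_reward(fake_llm_reward_code("Raise the arm halfway"))
    print(optimize_height(reward_fn))
```

The point of the split is that the language model only has to express what counts as success; finding the action that achieves it is left to the optimizer.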

COMMENTS • 7

  • @SamBhattacharyya 5 months ago +7

    Thank you for bringing together all of these amazing guest lectures in one place and making them freely available online. I've been watching dozens of lectures, and this one is particularly interesting for me: I was doing a PhD in robotics in the early 2010s, whereas I've spent the last few years in industry on computer vision and natural language processing. Seeing robotics with reasoning powered by large language models trained on Internet data is incredibly fascinating and makes me interested in going back to robotics.

    • @stanfordonline 5 months ago +1

      We are glad you enjoyed them. You might also be interested in this interview with Professor Chelsea Finn. ua-cam.com/video/IT734HriiHQ/v-deo.html

    • @SamBhattacharyya 5 months ago

      @stanfordonline Thank you so much! That was a great interview as well!

  • @erniea5843 2 months ago +1

    Amazing to see where this field is heading

  • @chrisweeks8789 5 months ago +2

    The transformer is nuts. Can't wait for the breakthrough to generate low-level actions for most devices.

  • @bobsmithy3103 5 months ago +1

    Amazing presentation! It was extremely digestible, with plenty of takeaways even for someone not knowledgeable in the ML field.

  • @ginogarcia8730 5 months ago

    man the robot knocking the sink open