WAD at CVPR
  • 45
  • 296 838
[CVPR'24 WAD] Keynote - Sanja Fidler, NVIDIA
Talk given at the CVPR Workshop on Autonomous Driving 2024: cvpr2024.wad.vision/.
00:00 Introduction
01:43 AV Stack Evolution
03:43 NVIDIA's Generative AI & Drive Thor
05:11 Today's Talk
05:41 Foundation Models in Data Tools
06:47 Visual Retrieval
07:55 Semantic Scenario Clustering
08:54 Foundation Model powered AV Stack (AVFM)
10:40 PARA-Drive
11:10 Hydra-MDP
11:41 Trajeglish
12:21 Video Generation
13:53 Video Modeling
16:31 Video Tokenization
17:48 Open-Loop Trajectory Prediction
18:54 Simulation
20:15 Physics, Dynamics and Editing
22:00 Closed-Loop Testing
23:34 Object Insertion and Handling Large Scenes
25:05 Scene Generation
25:56 fVDB Release
27:28 Conclusion
Views: 258

Videos

[CVPR'24 WAD] Challenge - Waymo Open Dataset
Views: 241 • 14 days ago
Talk given at the CVPR Workshop on Autonomous Driving 2024: cvpr2024.wad.vision/. 00:00 Introduction 02:30 Perception Dataset 04:30 Motion Dataset 06:22 Benchmarks 07:25 Waymax Simulator Release 09:18 2024 Challenge Overview 10:47 Motion Prediction Challenge 15:07 1st Place: MTR v3 20:12 Occupancy and Flow Challenge 23:20 1st Place: DOPP 25:11 2nd Place: STNet 27:45 3rd Place: HGNET 29:57 Sim A...
[CVPR'24 WAD] Keynote - Congcong Li, Waymo
Views: 267 • 21 days ago
Talk given at the CVPR Workshop on Autonomous Driving 2024: cvpr2024.wad.vision/. 00:00 Introduction 01:52 Waymo's Experience and Service Areas 03:26 Driving Examples 05:30 Construction, Emergency Vehicles and Freeways 09:25 Waymo's Safety Record 10:34 Rare Event Examples 11:21 Long-Tail Handling 12:48 LLM/VLM Reasoning 14:13 Open-Vocabulary Perception using VL Distillation 19:20 MotionLM: Mode...
[CVPR'24 WAD] Challenge - Argoverse
Views: 91 • 21 days ago
Talk given at the CVPR Workshop on Autonomous Driving 2024: cvpr2024.wad.vision/. 00:00 Introduction 00:49 What is Argoverse (2) 01:15 Sensor Dataset 01:55 LiDAR Dataset 02:24 Motion Forecasting Dataset 02:55 Map Change Dataset 03:57 Argoverse post-Argo 05:07 (Un)supervised Scene Flow Challenge 18:27 End-to-End Forecasting Challenge 32:22 Multi-Agent Motion Forecasting Challenge 41:56 4D Occupa...
[CVPR'24 WAD] Keynote - Nick Roy, Zoox
Views: 276 • 21 days ago
Talk given at the CVPR Workshop on Autonomous Driving 2024: cvpr2024.wad.vision/. 00:00 Introduction 00:59 Autonomy Architectures 03:25 Structured Optimization vs E2E Learned Policies 06:38 Hybrid Architecture 08:34 Early Sensor Fusion 11:03 Learned Depth Completion 12:45 Handling Debris 15:33 The IID Assumption is Brittle 16:37 OOD Detection 20:06 OOD Detection using Foundation Model Embedding...
[CVPR'24 WAD] Keynote - Alex Kendall, Wayve
Views: 219 • 21 days ago
Talk given at the CVPR Workshop on Autonomous Driving 2024: cvpr2024.wad.vision/. 00:00 Introduction 01:07 The Road to Embodied AI 01:57 From Vision to Action 03:43 AV2.0: An End-to-End System 04:57 Promises of End-to-End Systems 06:18 Focus Areas to Create Embodied AI 06:53 Focus Area: Simulation 08:36 Ghost Gym: Neural Simulator 09:28 Introducing PRISM-1 12:09 Launching WayveScenes101 14:07 G...
[CVPR'24 WAD] Keynote - Bolei Zhou, UCLA
Views: 120 • 21 days ago
Talk given at the CVPR Workshop on Autonomous Driving 2024: cvpr2024.wad.vision/. 00:00 Introduction 00:52 The Gap between Industry and Academia 02:02 Driving Datasets 02:51 Driving Simulators 04:41 The MetaDrive Simulator 07:26 ScenarioNet 11:29 TrafficGen 13:30 Closed-Loop Adversarial Training (CAT) 16:18 SimGen Driving Scene Generation 21:04 Mobility Anywhere 22:55 MetaUrban Simulator 25:35 ...
[CVPR'24 WAD] Keynote - Kashyap Chitta, University of Tübingen
Views: 180 • 21 days ago
Talk given at the CVPR Workshop on Autonomous Driving 2024: cvpr2024.wad.vision/. 00:00 Introduction 00:30 Benchmarking AVs is hard 02:36 Flaws in Open Loop Benchmarks 05:18 What about Simulation 07:18 Non-reactive Simulation 08:11 No Sensor Simulation 11:54 No Traffic Simulation 13:33 NAVSIM Metrics 16:00 The Predictive Driver Model (PDM) Score 18:10 Comparing PDMS vs Closed-loop Metrics 20:09...
[CVPR'24 WAD] Keynote - Georgios Pavlakos
Views: 85 • 21 days ago
Talk given at the CVPR Workshop on Autonomous Driving 2024: cvpr2024.wad.vision/. 00:00 Introduction 00:58 Perceiving Humans in 4D 03:17 Improving Robustness in Human Motion Estimation 04:49 Human Mesh Recovery (HMR) 07:46 HMR Examples 11:28 Application for Tracking 13:47 SLAHMR: Decoupling Human and Camera Motion in the Wild 18:36 SLAHMR Examples 21:30 Reconstructing Humans and Scenes 22:24 Re...
[CVPR'23 WAD] Keynote - Hang Zhao, Tsinghua University
Views: 4.3K • 1 year ago
Talk given at the CVPR Workshop on Autonomous Driving 2023: cvpr2023.wad.vision/. 00:00 Introduction 00:27 A Simplified Self-Driving Stack 01:05 ViP3D: End-to-End Visual Prediction 02:25 Scalability 04:56 3D Occupancy Prediction 07:28 Auto-Labeling Occupancy Datasets 12:25 The Occ3D and SSCBench Benchmarks 13:47 Handling New Geo-Locations 16:31 VectorMapNet 18:59 Neural Map Priors 22:10 Map Pri...
[CVPR'23 WAD] Challenge - Argoverse
Views: 2.2K • 1 year ago
Talk given at the CVPR Workshop on Autonomous Driving 2023: cvpr2023.wad.vision/. 00:00 Introduction 01:50 What is Argoverse (2) 7:17 Multi-Agent Motion Forecasting Challenge 20:57 End-to-End Forecasting Challenge 36:26 4D Occupancy Forecasting Challenge 44:38 Self-Supervised Scene Flow Challenge
[CVPR'23 WAD] Challenge - BDD100K
Views: 556 • 1 year ago
Talk given at the CVPR Workshop on Autonomous Driving 2023: cvpr2023.wad.vision/. 00:00 Introduction 00:35 First Winner 12:35 Second Winner
[CVPR'23 WAD] Keynote - Andreas Wendel, Kodiak Robotics
Views: 1.7K • 1 year ago
Talk given at the CVPR Workshop on Autonomous Driving 2023: cvpr2023.wad.vision/.
[CVPR'23 WAD] Challenge - Waymo Open Dataset
Views: 4.9K • 1 year ago
Talk given at the CVPR Workshop on Autonomous Driving 2023: cvpr2023.wad.vision/. 00:00 Introduction 06:15 What's New 08:04 Waymax & Perception Object Assets 11:13 New Dataset Format 14:43 Polling 21:47 Pose Estimation Challenge 31:57 Motion Prediction Challenge 42:30 Sim Agents Challenge 54:42 2D Video Panoptic Segmentation
[CVPR'23 WAD] Keynote - Alexandre Alahi, EPFL
Views: 1.4K • 1 year ago
Talk given at the CVPR Workshop on Autonomous Driving 2023: cvpr2023.wad.vision/. 00:00 Introducing Social Forecasting 03:05 Robot Experiment 04:34 Representation Learning: Perception 08:52 Representation Learning: Social Forecasting 16:57 Representation Learning: Planning 19:13 7 Foundational Principles for Autonomous Mobility 19:22 P1: Predictive Coding 21:18 P2: Opposites 23:30 P3: Multimoda...
[CVPR'23 WAD] Keynote - Ashok Elluswamy, Tesla
Views: 36K • 1 year ago
[CVPR'23 WAD] Keynote - Jiyang Gao, Momenta
Views: 2.8K • 1 year ago
[CVPR'23 WAD] Keynote - Chelsea Finn, Stanford University/Google
Views: 1.6K • 1 year ago
[CVPR'23 WAD] Keynote - Andreas Geiger, Universität Tübingen
Views: 4.3K • 1 year ago
[CVPR'23 WAD] Keynote - Chen Wu, Waymo
Views: 5K • 1 year ago
[CVPR'22 WAD] Keynote - Alex Kendall, Wayve
Views: 5K • 2 years ago
[CVPR'22 WAD] Keynote - Sergey Levine, UC Berkeley
Views: 3.3K • 2 years ago
[CVPR'22 WAD] Keynote - Yuning Chai, Cruise
Views: 6K • 2 years ago
[CVPR'22 WAD] Challenge - Argoverse
Views: 2K • 2 years ago
[CVPR'22 WAD] Challenge - nuPlan by Motional
Views: 1.3K • 2 years ago
[CVPR'22 WAD] Challenge - BDD100K
Views: 915 • 2 years ago
[CVPR'22 WAD] Challenge - Waymo Open Dataset
Views: 4.1K • 2 years ago
[CVPR'22 WAD] Keynote - Matthew Johnson-Roberson, CMU
Views: 1.1K • 2 years ago
[CVPR'22 WAD] Keynote - Raquel Urtasun, Waabi/UofT
Views: 2.7K • 2 years ago
[CVPR'22 WAD] Keynote - Sanja Fidler, Nvidia/UofT
Views: 3.8K • 2 years ago

COMMENTS

  • @diodin8587
    @diodin8587 20 days ago

    18:02 Planning and Control

  • @lala-ru1jj
    @lala-ru1jj 1 month ago

    Most questions are from BYD, Xpeng employees, I guess 😬

  • @carlrossum
    @carlrossum 3 months ago

    Thank you Dear Waymo for making this available to the world. This is really helpful for both fundamental researchers and engineering specialists. Thanks. Keep up the great work.

  • @jaysrinivasan8205
    @jaysrinivasan8205 9 months ago

    Ashok is amazing

  • @karunald
    @karunald 1 year ago

    I will never understand why Tesla continues to put intersection cameras 9 feet behind the nose of the car and behind our backs. When there's no room to creep to see - it doesn't work! Such an obvious massive flaw. Maybe if I lived in CA with bike lanes & road buffers it would work. FIX IT

  • @TheFutureThoughtExchange
    @TheFutureThoughtExchange 1 year ago

    Keynote speech presented by Ashok Elluswamy at CVPR'23 WAD (Workshop on Autonomous Driving), where he discusses Tesla's self-driving technology, including advancements in machine learning techniques, robotics, real-world implementations, and the underlying concepts of Tesla's technology. Let's break down the themes and deeper insights of this conversation.

    1. The Foundation of Autonomy and Robotics
    - Machine Learning at the Core: Elluswamy emphasizes the use of machine learning in building the foundation for autonomy in vehicles. The explanation of neural networks, occupancy, voxel prediction, transformer models, and 3D scene construction reflects the cutting-edge practice of AI and machine learning.
    - Generalization and Robustness: He emphasizes the stack's robustness to errors and its ability to adapt to different situations and environments. This ties to the broader theme of machines being adaptable, like humans, and being able to operate in the real world.
    - Integration with Robotics: The technology is not limited to cars but is extendable to other robotic platforms, highlighting the unifying concepts within robotics and machine autonomy.

    2. Full Self-Driving (FSD) Technology
    - Real-World Implementation: Elluswamy discusses the actual deployment of Tesla's FSD system in the U.S. and Canada. He doesn't merely focus on theoretical concepts; instead, he gives insights into the real-world challenges and accomplishments.
    - Camera and Sensing Technology: The 360-degree coverage and the way cameras are used to interpret the world resonate with how human senses work. It reflects an attempt to create machines that perceive the world much like humans do.
    - Lane and Object Prediction: The real-time prediction and analysis of lanes, objects, and traffic signals is a sophisticated task that mimics human cognition. The comparison of modeling lanes to modeling language (mentioning GPT) connects two complex realms of machine learning: natural language processing and computer vision.

    3. Simulation and Prediction of Different Futures
    - Dynamic World Modeling: There's a profound concept of creating a "neural network simulator" that can simulate different futures based on different actions. It implies a move from static to dynamic models, mirroring the human ability to predict potential outcomes.
    - Generative Models and Action-Conditioning: The ability to generate and condition models on the past to predict the future aligns with human imagination and intentionality, extending AI into realms previously exclusive to human consciousness.

    4. Hybrid Approaches and Configurability
    - Hybrid Neural Radiance Field (NeRF) Approach: The discussion of 3D reconstruction with a hybrid NeRF approach represents the synthesis of different AI techniques to create something novel and effective.
    - Customizable System: The configurable nature of the models, allowing for different applications and needs, reflects a move towards personalized and adaptable technology.

    5. Tesla as a Leader in Compute
    - Dojo and Scalability: Mention of Dojo, Tesla's training hardware, and the company's aspiration to become a world leader in compute emphasizes the role of technology not only in driving but in broader societal change.

    6. Questions and Further Insights
    - Engaging with Complexity: The subsequent questions and responses delve into complexities like auto-regressive models, voxel sizes, occupancy, inference times, and map components. These details allow a nuanced understanding of the technology.

    Conclusion: This transcript is more than just a description of Tesla's autonomous driving technology. It's a glimpse into the current and future state of AI and machine learning, offering insights into how these technologies are shaping our interaction with the world. It reflects the ongoing journey to bridge the gap between machines and human-like understanding, adaptability, and intuition. Moreover, the real-world application of these complex technologies represents a crucial step in the transition from theoretical research to tangible, everyday experiences, contributing to the evolution of our transportation systems and potentially our broader interaction with machines.
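
    The action-conditioned future prediction described in the comment above can be illustrated with a toy rollout loop. Everything below (the world_model_step placeholder, the action strings, the frame shapes) is a hypothetical sketch for illustration only, not Tesla's design:

        import numpy as np

        def world_model_step(past_frames, action):
            """Stand-in for a learned video world model: given past camera frames and an
            action token ("straight", "lane_change_left", ...), return a predicted next
            frame. Here it is a trivial placeholder seeded by the action string."""
            rng = np.random.default_rng(abs(hash(action)) % (2**32))
            return past_frames[-1] + rng.normal(scale=0.01, size=past_frames[-1].shape)

        def rollout(past_frames, action, horizon=8):
            """Autoregressively roll the model forward under one chosen action."""
            frames = list(past_frames)
            for _ in range(horizon):
                frames.append(world_model_step(frames, action))
            return frames[len(past_frames):]

        # Same past context, two different predicted futures depending on the action prompt.
        past = [np.zeros((64, 64, 3)) for _ in range(4)]
        future_straight = rollout(past, "straight")
        future_lane_change = rollout(past, "lane_change_left")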

  • @simsonyee
    @simsonyee 1 year ago

    How does the network account for different calibrations on different cars? The relative poses of the 8 cameras will be a little different from car to car, and will also drift over time, influenced by heat/cold between day and night.

  • @simsonyee
    @simsonyee 1 year ago

    Autonomous shipping? Autonomous trains?

    • @Nolan.Gurule
      @Nolan.Gurule 11 months ago

      Trains would be awesome!

  • @fredharris929
    @fredharris929 1 year ago

    They are hackable.

  • @ThomasButryn
    @ThomasButryn 1 year ago

    I think the most interesting thing is that Tesla's approach is really based on experimenting with ideas

  • @gregchristie2763
    @gregchristie2763 1 year ago

    Imagine this applied to military robots with guns or worse... so Elon now has both vehicle robots and humanoid robots already, and if this AI is applied to them they can learn by themselves with no restraints whatsoever... it really is quite scary, this needs stopping now... even Elon himself has said it needs legislating... and this is just the beginning... people joke about Skynet and the Terminator, but this is a real threat. Tesla also has the comms via Starlink... very scary. 😢😢 Elon himself, in the Tucker Carlson interview available on UA-cam about the dangers of hyper-intelligent AI, said this sort of AI unchecked could result in the annihilation of the human race, or "civilisational destruction" as he called it... just think if a madman like Putin got hold of this tech. Very, very scary.

    • @meamzcs
      @meamzcs 1 year ago

      Lol... Elon has access to literal ICBMs...

  • @joeysipos
    @joeysipos 1 year ago

    12:05 bro that was totally the Tesla’s fault. It’s the one that blew through the stop sign…

    • @galileo3431
      @galileo3431 1 year ago

      That's exactly what Ashok also says. In this case, FSD wasn't enabled; the human was driving and made the error. The visual AEB system was still able to perform emergency braking.

    • @SkradaczTENZNANY
      @SkradaczTENZNANY 1 year ago

      yes, that's what he said. "the ego driver" means the system controlling the Tesla

    • @joeysipos
      @joeysipos 1 year ago

      @@galileo3431 Ah ok, yeah now that I rewatched it. I thought he said the red car blew through the stop sign...

    • @meamzcs
      @meamzcs 1 year ago

      @@SkradaczTENZNANY LOL... The title of the slide literally says MANUAL DRIVING...

  • @doublesman0
    @doublesman0 1 year ago

    For the love of God please add radar as a data point.

    • @tiro0oO5
      @tiro0oO5 1 year ago

      Imaging radar to be precise

  • @berlusmafia
    @berlusmafia 1 year ago

    Thanks to people like you and the engineers who work at Tesla, there's hope for humanity for a decent future.

  • @marcusaurelius6607
    @marcusaurelius6607 1 year ago

    5 more lidars will be enough

  • @seojimjames
    @seojimjames 1 year ago

    Brilliant all the way, thanks for the great informative video; Tesla investors appreciate knowing just how great your work is. Also, consider how many drivers are visually challenged and still driving, or impaired drivers, or distracted drivers: FSD is consistent and a much better option than borderline bad drivers.

  • @pranjal86able
    @pranjal86able 1 year ago

    Here are the key points from the video titled "[CVPR'23 WAD] Keynote - Ashok Elluswamy, Tesla":
    - The speaker, Ashok Elluswamy, is a member of the Autopilot team at Tesla.
    - He presents their work on what they believe will be the foundation model for autonomy and robotics.
    - Tesla has shipped the full self-driving beta software to all purchasers in the United States and Canada, with roughly 400,000 vehicles having driven up to 250 million miles on the full self-driving beta program.
    - The self-driving stack is scalable and can navigate to any destination within the US, handling intersections, stopping at traffic lights, and interacting with other objects.
    - The system is driven primarily by eight cameras on the car that provide full 360-degree coverage.
    - The self-driving stack is based on modern machine learning, with many components folded into neural networks. This is different from the traditional approach to self-driving, which uses localization maps and various sensors.
    - The system works primarily with cameras, and it performs quite well.
    - The speaker discusses the importance of occupancy networks in their stack, which predict whether a voxel in 3D space is occupied or not. This model task is general and robust to ontology errors.
    - The occupancy networks also predict the flow of voxels in the future, providing arbitrary motion. Everything runs in real time.
    - The architecture of the system may look complicated, but it's quite straightforward. Videos from multiple cameras stream in, and a large Transformer block builds up features and does temporal attention with some geometry thrown in.
    - The same architecture can be used for other tasks needed for driving, such as predicting lanes and roads.
    - Lanes are crucial for driving tasks but are challenging to predict due to their high-dimensional nature, graph structure, and large uncertainty. They can span the entire road, fork, merge, and sometimes even humans cannot agree on their structure.
    - The team uses state-of-the-art generative modeling techniques, such as autoregressive transformers, to predict lanes. This approach is similar to GPT and predicts lanes one token at a time, considering the full graph structure.
    - Moving objects like vehicles, trucks, and pedestrians need to be detected with their full kinematic state. The models used are multi-modal, taking in not just camera video streams but also other inputs like the vehicle's own kinematics and navigation instructions.
    - The entire motion planning can also be done using a network, making the system a modern machine learning stack where everything is done end-to-end.
    - The success of this system is attributed to the sophisticated auto-labeling pipeline that provides data from the entire fleet. This allows for multi-trip reconstruction, where multiple Tesla vehicles driving through the same location provide their video clips and kinematic data to construct the entire 3D scene.
    - The team uses multi-trip reconstruction to gather data from the entire fleet, enabling them to reconstruct lanes, road lines, and other elements from anywhere on Earth.
    - They use a hybrid approach to Neural Radiance Fields (NeRF) and general 3D reconstruction, which results in accurate and clear reconstructions of the scene, including vehicles, barriers, and trucks.
    - Additional neural networks are run offline to produce labels for lanes, roads, and traffic lights, creating a vector representation that can be used as labels for the online stack.
    - The system can auto-label traffic lights, predicting their shape, color, and relevancy, and these predictions are multi-view consistent.
    - These predictions provide a superhuman understanding of the world from cameras, creating a foundation model that can be used in various places.
    - The system helps with both autonomous and manual driving, providing emergency braking for crossing vehicles. This is a new feature, as crossing objects are harder to predict than vehicles in your own lane.
    - The team is working on learning a more general world model that can represent arbitrary things, using recent advances in generative models like Transformers and diffusion.
    - The neural network can predict future video sequences given past videos. It predicts for all eight cameras around the car jointly, understanding depth and motion on its own without any 3D priors.
    - The model can be action-conditioned. For example, given the same past context, when asked for different futures (like keep driving straight or change lanes), the model can produce different outcomes.
    - This creates a neural network simulator that can simulate different futures based on different actions, representing things that are hard to describe in an explicit system.
    - Future prediction tasks can also be done in semantic segmentation or reprojected to 3D spaces, predicting future 3D scenes based on the past and action prompting.
    - The team is working on solving various nuances of driving to build a general driving stack that can drive anywhere in the world and be human-like, fast, efficient, and safe.
    - Training these models requires a lot of compute power. Tesla is aiming to become a world leader in compute with their custom-built training hardware, Dojo, which is starting production soon.
    - The models are not just being built for the car but also for the robot, with several networks shared between the car and the robot.
    - The foundational models for vision that the team is building are designed to understand everything and generalize across cars and robots. They can be trained on diverse data from the fleet and require a lot of compute power.
    - The team is excited about the progress they expect to make in the next 12 to 18 months.
    - In the Q&A session, the speaker explains that they can track moving objects in the 3D reconstruction with their hybrid NeRF approach, using various cues and signals in the data.
    - The world model for future prediction tasks is a work in progress, but it's starting to work now, providing a simulator where they can roll out different outcomes and learn representations.
    - The use of autoregressive models for predicting lanes is due to the graph structure of lanes and the need to model a distribution in high-dimensional space. This approach provides clear, non-blurry predictions that are useful downstream.
    - The voxel size in the occupancy network output is a trade-off between memory and compute and can be configured based on the needs of the application.
    - The same principles of the world model should apply to humanoid robots. The model should be able to imagine what actions like picking up a cup or walking to a door would look like.
    - The occupancy network is used for collision avoidance in the full self-driving (FSD) system. It's particularly useful for dealing with unusual vehicles or objects that are hard to model using other methods.
    - The general world model is still being optimized and hasn't been shipped to customers yet. It might be ready later in the year.
    - The system doesn't use high-definition maps, so alignment isn't super critical. The maps used are low-definition, providing enough information to guide the network on which roads and lanes to take.
    This concludes the summary of the video "[CVPR'23 WAD] Keynote - Ashok Elluswamy, Tesla". The speaker, Ashok Elluswamy, discusses the development of Tesla's self-driving technology, focusing on the use of machine learning and neural networks. He also answers questions about the technical details of the system.
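
    The "lanes as language" point in the summary above (decoding a lane graph one token at a time, GPT-style) can be sketched as a greedy decoding loop. The vocabulary, the lane_model placeholder, and the token names below are assumptions made purely for illustration and do not come from the talk:

        import numpy as np

        # Hypothetical lane-token vocabulary: discretized control points plus
        # structural tokens for graph events such as forks and merges.
        VOCAB = ["<bos>", "<eos>", "fork", "merge"] + [f"pt_{i}" for i in range(256)]
        TOKEN_ID = {t: i for i, t in enumerate(VOCAB)}

        def lane_model(scene_features, token_ids):
            """Placeholder for an autoregressive transformer: returns a probability
            distribution over the next lane token given scene features and the tokens
            emitted so far. Random here, purely for illustration."""
            rng = np.random.default_rng(len(token_ids))
            logits = rng.normal(size=len(VOCAB))
            return np.exp(logits) / np.exp(logits).sum()

        def decode_lane_graph(scene_features, max_tokens=64):
            """Greedy, one-token-at-a-time decoding of a lane graph."""
            tokens = [TOKEN_ID["<bos>"]]
            while len(tokens) < max_tokens:
                probs = lane_model(scene_features, tokens)
                next_id = int(np.argmax(probs))
                tokens.append(next_id)
                if next_id == TOKEN_ID["<eos>"]:
                    break
            return [VOCAB[i] for i in tokens]

        print(decode_lane_graph(scene_features=None))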

    • @pascalg.8772
      @pascalg.8772 1 year ago

      Thanks for your work. Ashok is actually the head of Tesla's Autopilot team; he took over when Andrej Karpathy left.

    • @jacolantern1
      @jacolantern1 1 year ago

      If you’re going to use GPT to summarize the transcript, then at least say that’s what you’re doing. Neglecting to mention that is dishonest and misleading

  • @nioncao
    @nioncao 1 year ago

    Too little progress compared to AI Day

    • @Jsmith32t
      @Jsmith32t 1 year ago

      It’s pretty great progress actually. With the rate of change in the ML world you have to re-evaluate your approach every 3-6 months now. What they showed at AI day is what they are shipping now, but they have already hit a wall with the rare corner cases. The World model will take advantage of their auto label system and will provide clean data for the new approach. Probably in 2 years from now the World model will be at a stable release and metrics for regulators will start accumulating towards proper level 5

    • @SyntheticSpy
      @SyntheticSpy 1 year ago

      @@Jsmith32t if their amount of compute scales like they are planning, it will likely be sooner than 2 years

  • @falconxlc
    @falconxlc 1 year ago

    12:10 the tesla missed the stop sign but ashok says the red car blew a stop sign??

    • @f2yd
      @f2yd 1 year ago

      "In this case, on the left side the *ego* driver for some reason blew past the stop sign" The ego driver the one driving the Tesla

    • @falconxlc
      @falconxlc 1 year ago

      @@f2yd I stand corrected, he did say ego.

    • @FinanceNinja
      @FinanceNinja 1 year ago

      @@f2yd It was a human driving the Tesla who blew the stop sign, not FSD. He was explaining how FSD saw the path of the perpendicular car and intervened to stop the Tesla from hitting it.

    • @f2yd
      @f2yd 1 year ago

      @@FinanceNinja I agree, that's what I was saying too. Ego driver = the human driving the car from which we see the video

  • @pervezbhan1708
    @pervezbhan1708 1 year ago

    3D representations can provide valuable information for self-driving systems, but they are typically used in conjunction with other sensor modalities and data sources to create a comprehensive understanding of the environment. While 3D representations, such as point clouds or voxel grids, can capture geometric details of the surroundings, they have certain limitations that make additional information necessary. Here are a few reasons why 3D representations alone may not be sufficient for self-driving:
    - Limited Field of View: 3D representations typically rely on sensors like lidar to capture the surrounding environment. However, lidar sensors have a limited field of view and can struggle with detecting objects beyond their range or behind obstacles. Therefore, additional sensor modalities like cameras and radar are used to provide complementary information.
    - Perception Challenges: Understanding the environment goes beyond geometry. Perception tasks like object detection, classification, and semantic segmentation require information beyond 3D representations. Cameras, for example, provide valuable color and texture information that aids in object recognition and scene understanding.
    - Dynamic Objects: Self-driving systems need to account for the motion and behavior of dynamic objects like pedestrians, cyclists, and other vehicles. 3D representations alone may not capture the temporal aspect of these objects. Additional information, such as object tracking and prediction algorithms, is necessary to anticipate and plan for the future movements of these dynamic entities.
    - Environmental Context: Environmental context, such as traffic signs, road markings, and traffic signals, plays a critical role in self-driving. While some aspects of this information can be captured in 3D representations, other details may require explicit detection and recognition algorithms designed for specific objects or structures.
    To build robust and reliable self-driving systems, it's common to integrate data from various sensors, including cameras, lidars, radars, and other sources, to create a holistic understanding of the environment. Combining multiple sensor modalities and data sources allows for redundancy, robustness in different environmental conditions, and a richer representation of the surroundings.

    User: 3D occupancy prediction

    ChatGPT: 3D occupancy prediction is a technique used in self-driving and robotics to estimate the occupancy or occupancy probability of the environment in three-dimensional space. It involves predicting which regions of the 3D space are occupied by obstacles or objects and which regions are free or unoccupied. This information is crucial for path planning, obstacle avoidance, and collision prevention. There are various approaches and algorithms used for 3D occupancy prediction; here are a few common techniques:
    - Voxel Grids: Voxel grids are a popular representation for 3D occupancy prediction. The 3D space is discretized into small volumetric cells called voxels. Each voxel can be marked as occupied, free, or unknown based on sensor measurements. Algorithms like occupancy grid mapping or probabilistic occupancy mapping can update and maintain the occupancy probabilities of the voxels over time.
    - Point Cloud Processing: Point clouds obtained from lidar sensors are often used for 3D occupancy prediction. Various techniques, such as point cloud segmentation, clustering, and surface reconstruction, can be applied to identify and estimate the occupancy of objects in the point cloud data.
    - Deep Learning: Deep learning methods, particularly convolutional neural networks (CNNs) and their variants, can be employed for 3D occupancy prediction. These networks can learn to directly predict occupancy probabilities or voxel-wise occupancy labels from 3D input data, such as point clouds or voxel grids. Training these models requires labeled data that indicates the occupancy status of each voxel or point.
    - Sensor Fusion: 3D occupancy prediction can also benefit from sensor fusion, where data from multiple sensors, such as lidar, cameras, and radars, are combined to create a more comprehensive representation. Sensor fusion techniques, like Kalman filters or particle filters, can be used to integrate the data from different sensors and refine the occupancy predictions.
    The accuracy and reliability of 3D occupancy prediction depend on factors such as sensor quality, sensor placement, calibration, and the algorithms used. It's common to combine 3D occupancy prediction with other perception tasks, such as object detection and semantic segmentation, to create a more detailed understanding of the environment for autonomous systems.
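
    The occupancy grid mapping mentioned under "Voxel Grids" above can be sketched as a simple log-odds update over a voxel grid. The grid extent, voxel size, log-odds increments, and function names below are illustrative assumptions, not values from any particular system:

        import numpy as np

        # Illustrative grid: 100 x 100 x 20 voxels of 0.5 m, stored as log-odds of occupancy.
        GRID_SHAPE = (100, 100, 20)
        VOXEL_SIZE = 0.5
        L_OCC, L_FREE = 0.85, -0.4   # log-odds increments for a hit / a pass-through
        log_odds = np.zeros(GRID_SHAPE)

        def world_to_voxel(point):
            """Map a metric (x, y, z) point to an integer voxel index."""
            return tuple((np.asarray(point) / VOXEL_SIZE).astype(int))

        def update_occupancy(hit_points, free_points):
            """Bayesian log-odds update: raise occupancy where returns land,
            lower it in voxels the rays passed through."""
            for p in hit_points:
                idx = world_to_voxel(p)
                if all(0 <= i < s for i, s in zip(idx, GRID_SHAPE)):
                    log_odds[idx] += L_OCC
            for p in free_points:
                idx = world_to_voxel(p)
                if all(0 <= i < s for i, s in zip(idx, GRID_SHAPE)):
                    log_odds[idx] += L_FREE

        def occupancy_probability():
            """Convert log-odds back to per-voxel occupancy probability."""
            return 1.0 / (1.0 + np.exp(-log_odds))

        # Example update: one lidar return at (10.0, 12.5, 1.0), one free-space sample.
        update_occupancy([(10.0, 12.5, 1.0)], [(5.0, 6.0, 0.5)])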

  • @MrMolledm
    @MrMolledm 1 year ago

    Really enjoyed it. Can't wait for autonomous mass transit buses.

  • @jascfdrac
    @jascfdrac 1 year ago

    Nice work!

  • @孟志彬
    @孟志彬 1 year ago

    Is there a paper name for this work on occupancy networks?

  • @pavelt9391
    @pavelt9391 1 year ago

    Simple. Genius. Amazing.

  • @johanngerberding5956
    @johanngerberding5956 1 year ago

    sad that this doesn't get the hype like gpts get, this is much cooler to me

  • @tenzinpassang4812
    @tenzinpassang4812 1 year ago

    #RemindMeIn2030: How much is TSLA worth? 🤑🤑🤑🤑🤑🤑

  • @changtimwu
    @changtimwu 1 year ago

    the dataset is great but the reader lib strictly depends on tensorflow 2.6.

  • @brewedicedcode1341
    @brewedicedcode1341 1 year ago

    Great presentation Ashok! So good to get an in-depth understanding of how FSD works. It gives you a sense of how complex the collision avoidance problem is. Thanks for the great work from you and your team.

  • @foodmaker5771
    @foodmaker5771 1 year ago

    Wow, we need more lectures like this!

  • @foodmaker5771
    @foodmaker5771 1 year ago

    Thanks for your valuable info

  • @Tommm73
    @Tommm73 1 year ago

    Thank you Ashok. Great work, great presentation, and as one of those 100,000 FSD Beta drivers it’s exciting to see what’s under the hood.

  • @explor794
    @explor794 1 year ago

    Brilliant, I hope you got a lot of stock options for your work.

  • @frangalarza
    @frangalarza 1 year ago

    I like the approach. It reminds me a lot of Tesla and Comma. I wonder how you compete with these two.

  • @gaussian3750
    @gaussian3750 1 year ago

    very good keynote!

  • @RealRusty
    @RealRusty 2 years ago

    Very insightful, thanks!

  • @zshn25
    @zshn25 2 years ago

    How is the occupancy network trained? Where is the supervision coming from? As was mentioned, NeRF could be an additional way to supervise it but what is the primary supervision?

  • @yy-xv9vw
    @yy-xv9vw 2 years ago

    Fascinating! Regarding the fake vs. real pedestrian issue, I actually encountered it today myself! I made a video about it, at ua-cam.com/video/Dvr9qbs1PYE/v-deo.html

  • @suryatamilan576
    @suryatamilan576 2 years ago

    Tamilian ❤️🔥

  • @frangalarza
    @frangalarza 2 years ago

    An interesting exercise is to see how many views the videos on this channel get. They're all around ~1K except the Tesla ones :D

  • 2 years ago

    It would be interesting to find an intersection like ua-cam.com/video/jPCV4GKX9Dw/v-deo.html where the left lane is the end of a ramp/bridge. So that looking flat and straight left would not actually indicate the correct road surface.