How many of you checked the playback speed?
Looks like Abigail Doolittle from Bloomberg at 1.5x speed
came here to say that. might be the only video ever worth watching at .75
Dead serious, many videos I watch on YouTube are at 1.25 or often 1.5 speed. But for this guy, everything is 0.75, just to make sure I don't miss anything
usually 1.5, but for him 1.25
yessss
Some people have great empathy, Andrej has great carpathy.
*self-driving carpathy. (Some people are self-driven by empathy. Andrej is a self-driving carpathy.)
it's karpathy actually
@@ashwithabanoth3520 Dude ever heard of puns?
Karpathy, very fitting name
Agent Office LOL true. Never realized. 🤣🤣🤣
might explain pls?
Julio Chao he has ‘kar’ in his name and works for a ‘car’ company. :)
@@gridcoregilry666 Kar-Pathy -> Car + Path -> self driving cars.
@@gridcoregilry666 car path
*Me who’s struggling even with my basic calculus class:
Fascinating
All you need is curiosity and resilience! You got this dude!
That's the mark of a true teacher, they can make the most difficult of concepts seem easy and fascinating
Google "children's wooden educational toys, Montessori math". Did you know we all played with those? It didn't sink in then, but these structures are practically the basic shapes you also find in calculus. If you can't rotate them in your head, get them and visualize the image space.
These shapes also hold structures found in programming. It might seem simplistic after all these years, but if breaking your head to grasp something involves these shapes, why not?
Circle -> loop, iteration, segmentation, etc. Try to fill in the blanks with the square, triangle, and pentagon shapes, and some cubes. We are all well endowed with knowledge; use it.
There's very little calculus in neural networks besides differentials (gradients).
You don't need to know calculus for deep learning. Like you don't need to learn assembly language for building web apps.
11:02: "Thank you"
11:03: Exit to work
😂😂😂😂
He had to train neural networks at scale, he was busy
Always great to hear Andrej's talks. He's left an indelible impression on my research career in deep learning through CS231N
🙏
CS231n is IMO the BEST online CS course on the internet.
Can I take the course if I'm not a Stanford student?
@@aishahsofea3128 yes
@@aishahsofea3128 ua-cam.com/play/PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv.html&feature=share
There you go.
That was awesome. Lots of new insights beyond what was presented at Autonomy Day. I wish he had 1 hour to talk.
there should be a "game" in model 3 where people can tag things like traffic lights and other unsolved obstacles manually so the AI is learning from as many humans as possible. Maybe you reward them with free supercharging or something. 🤘😜
I think we all do that manually every time we take over and correct the autopilot when driving a Tesla. I bet Tesla uploads the data and uses the action taking as the training feedback
This is a great idea, you should somehow share this
Cleber Zarate
Disengaging autopilot does not label objects.
If a deer runs across the road and the driver swerves to avoid hitting it, autopilot will disengage and the driver’s actions can be used to teach the neural net how to react in that situation.
But driver input (steering, braking) does not label the object.
You can teach a neural net how to properly recognize objects but it requires labeled data.
So the OP is correct that this could be helpful.
Whether Tesla needs help labeling objects, 🤷‍♂️.
@@dougdstecklein you're right, it won't label it but it will vastly reduce the amount of data to go through, as you could, for instance, know all those pictures had a red light since the car had to step on the brake coming to an intersection when no objects were in front of it. See what I'm talking about?
Many captchas on websites allow humans to label such data.
Amazing stuff. I love seeing this stuff improve every few weeks first hand.
I still think they are currently holding back a huge step forward due to inconsistency. Probably navigating roundabouts and automatically driving straight through intersections with yield signs, stop signs, and traffic lights.
@@RubenKelevra I totally agree. We've seen Green use stop sign and traffic recognition on his S months ago and it was pretty damn good. Not flawless tho, and you can't mess around with a red light.
I wish we could opt into some extreme beta program :)
@@DirtyTesla Mapillary shows a bit more in depth what they are capable of in terms of detection.
ua-cam.com/video/3IIlc0HzES0/v-deo.html
You can even go on their website and look at pictures other people have provided, so completely different angles, cameras, countries, and climates, and their detections are pretty much spot on when it comes to "Where are cars? Where is the road? Where are obstacles?"
And the sign detection is able to identify the most important signs about as well as a human would be able to, but with an average smartphone as the camera.
no LIDAR? just wowww
As always, Andrej is the best in explaining Computer Vision :)
High levels of intelligence and passion always make for an astounding presentation. Andrej is the f*ing man.
Wow, Karpathy's a fast talker, it's like the video was sped up.
I slowed the video down to 0.75 and it was finally intelligible. Speaking as fast as he does is not a good trait. If you want people to understand and remember what you said, you need to give them time to process it.
Quite the irony for someone working in data processing. Does he believe we are all machines?
It's interesting to hear your opinions - I've been living in the US for 14 years and have no problem understanding him, as he's still relatively slower than how people generally speak (perhaps not in presentations). He has an accent which makes him slightly harder to understand than a native speaker, but it's still not as fast as it appears. But yeah, I definitely would not have kept up 14 years ago; the 0.75 is a nice trick you guys got there, I wish I had that back then
Yeah I was like, "Did I already speed this up? Oh, it's just him."
Talking fast lets him make complex points while still keeping the listener's attention. I have no trouble following along, and I'm not a native speaker.
@@joelodlund6979 I'm thinking the issue for the native speakers complaining here must be that most of the tech vocabulary he uses consists of unfamiliar terms to them. Complainers, please enlighten me. I'm used to most if not all of those terms given my professional background. How about you? My brain's processing will certainly slow down when I encounter a completely new subject, regardless of the languages I'm fluent in
Great talk.
Try to include a photo of the speaker in the thumbnail.
Thanks
Hi Samin. Thanks for the feedback, we'll be sure to pass it along to our team!
I saw the name and I had to click this, Andrej Karpathy is one of the greats.
I did..lol
I've used Andrej's RNN techniques to do significant work in medicinal chemistry - great stuff!!!
Hey there. What have you used it for?
Humble brag
@@rangv733 So far, I've used RNNs for de novo drug generation, i.e. creating new potential drugs from scratch. You can read more here - www.wildcardconsulting.dk/teaching-computers-molecular-creativity/
Another team, which uses a very similar technique provided a good visualization-
github.com/MarcusOlivecrona/REINVENT/blob/master/images/celecoxib_analogues.gif
Andrej Karpathy's exquisite technical explanation of Tesla Autopilot 👍👍👍
The 6 people who disliked the video are from Waymo :)
Tesla is winning autonomy, folks. Watch the stock price over the next 2 years. It should look like a Falcon Heavy launch accelerating into the atmosphere.
Yup. Bought in June at $199. Thought I was late. Now in November it's $350.
Falcon heavy? I think it'll look more like Starship 😎
@@CreativeBuilds Damnit you're right! It's gonna be epic bro
I dunno, waymo isn't sleeping either
Bump
No need to set the speed to 1.25x when Andrej is doing a presentation
This guy should really be appreciated... putting state-of-the-art algorithms directly into production is a big risk but also a big achievement... plus Tesla's approach is safer for human eyes, as certain LiDARs can cause blindness.
The number of knowledgeable folks in the comments section is just overwhelming.
In my opinion, this roll-out without having injuries or fatalities was one of the greatest engineering accomplishments of the last decade. However, there have been some dangerous near-misses with recent versions, and I left a comment on my other account under this video about one of them. The comment was removed. I think suppressing critical comments is dangerous and is an abuse of YouTube's moderation system.
The current generation of autopilot relies on the owner paying attention and being ready to intervene, so I would like to know more about how these near-misses were so terrible.
@@cleberz8072 ua-cam.com/video/fKyUqZDYwrU/v-deo.html It's not that they are "so terrible". It's that it's a bad idea to pretend they don't happen and hide comments about them. What I was asking was for Karpathy to address that particular near miss. One big thing is actually that the owner in the video says he believes Tesla will automatically receive a bug report. But I have a feeling there is actually no automatic way for Tesla to know that this disengagement was a bug rather than a normal disengagement. So at the very least, there needs to be an easy way to report these "life on the line" bugs, and all Tesla owners need to be properly informed about it if/when that exists.
@@runvnc208 According to Elon, during autopilot all driver inputs are considered errors, which is an automatic bug report.
Losing YouTube comments on random videos is not a conspiracy to silence you...
@@RichOrElse There is nothing to distinguish driver input in a minor situation, or from a driver who likes to give unnecessary input, from input in a life-threatening situation.
He's breaking my neural net with his speech speed...
Anytime I think I've gotten the hang of how to use deep learning, I get to see a video like this.
Brings back 231N memories!
this would be a dream job, damn working on AI at Tesla..
The pay and work hours are crap though
@@gregh5061 how?
@@gametony947 I've had some developers talk to me about it: mechanical engineers and a few computer science professionals.
@@gregh5061 what did they say
@@pauljnellissery7096 That they're terrified of Musk showing up at the office because he fires people on a whim, the work hours are way too long, and the pay is way less than their counterparts at other companies like Google, Apple, Microsoft, etc., which have more flexible work hours and better pay (and pretty much the same hiring standards). Although working at Tesla would look really good on my resume, so I'd probably take it if I had the chance lol
PyTorch configuration in a Linux environment can take up to 3 hours, especially if you're building it fresh from source. However, it's one of my favorite deep learning frameworks, besides Keras and TensorFlow. Great presentation, please keep them coming.
Omg, he speaks accurately and so fast, he's smart
This was very insightful, would love to get a follow up!
He is speaking like he's at 1.5x. So I reduced the playback speed to 0.75 and it makes more sense now.
I love PyTorch and Tesla
i am proud to be a python machine learner prodigy after seeing this video
It's crazy how the human brain can perform the task so easily, yet state of the art computers and algorithms find it very difficult.
That is kind of obvious. Technology is nothing compared to nature. This is why AI is so hyped up right now even though it is good at only a few very specific tasks.
Just remember that vision had 543 million years to evolve. Computer vision algorithms have been around for only 55 years. It's also extremely impressive how far we have come in such a short time.
This statement is true yet misleading: we can perform driving, but we don't do it well. We screw up in many ways every time we drive a car; we all get some form of road rage at some point, we all have attention issues when it comes to driving, and we're terrible at staying centered in our lanes and following the rules of the road. Driving is easy, but we're relatively bad at it. Autopilot, however, has none of these issues: it never gets tired, needs a break, or has road rage or attention issues. It is trained and drives the car while having vastly better vision and situational awareness of the car's environment. All it takes is improving the software and convolutional neural networks as described in this video as well as a few others from Andrej
@@TheZeeray And it can use the turn signal! Way ahead of my fellow drivers
They used to say the same about adding big numbers. That day has long passed. Soon it will be the same with driving a car.
I learned about machine learning from Andrej in a video 4 years ago whoa
144 TOPS is 144 trillion operations per second! It is an astronomical figure that even Nvidia doesn't reach at that wattage! It deserves the title "insane". Imagine when you get a 300 TOPS chip in a phone, laptop, watch, or iPad! That is godly power.
to check out instagram :D
Yep, that is impressive
I would like to see how they achieved that, given NVidia's best accelerators right now are 47 TOPS while consuming 2.5 times more power. It's either a breakthrough or a lie.
Edit: ah, never mind, I was looking at 5-year-old GPU accelerators. NVidia doesn't say how modern cards do in int8 TOPS, but they have around 130 TFLOPS tensor-wise
MAGICAL!!!!! Elon, Andrej, Pete Bannon, etc. etc. etc., OUR ONE WORLD LOVES YOU!
See you on Mars: 'One World 2.0'
Is he talking really fast or is there a playback speed bug in YouTube?
*My takeaways:*
1. They use shared backbone network because if each task has its own neural network, the computation is not affordable 3:00
2. Their inference hardware 9:00
1. Implementation of 브스스 + 브오스 + depth
2. Implementation of OTA (over-the-air updates)
3. Implementation of shadow mode, validated with driver data
Indeed the right kind of business path and beyond. Kudos to ALL
Brilliant session, thanks for the info.
Great Presentation
wow well done !
PyTorch is really cool. 😎 😎 😎
Another reason why Tesla will completely dominate, glad he is on our side.
to understand this guy, i had to put the vid on 0.5 speed :P
PyTorch is quite good.
I wish my Tesla had a "learn/train" button when I'm driving around and I KNOW that the car won't be able to handle the upcoming traffic circle, for example. I would hit "train" in advance and gather data for the next 5 minutes to be sent to Tesla for their database to watch and learn how I drove the car around the circle, dodged the incoming cars from the left and then from the right, and maneuvered the car over to take the correct exit. I was wondering if Tesla remembers or builds up a view from my car and other Teslas driving around that particular traffic circle? If not, why not?
The cars are "learning" whether autopilot is engaged or not; every mile you drive is being recorded by the cameras so the company can fetch thousands of different specific events and study how the car behaves in them
I thought there was a button exactly for this. You press it when your car didn't drive ideally.
The intro music is lit.. does anyone know the music?
Is it sped up 3 times? ;)
Does Tesla train using their own hardware or in the cloud? And if so, how long does training take with these methods?
They use GPUs (it was in the presentation). And it takes a lot of time: 70,000+ GPU-hours for the full stack (1 node with 8 GPUs would take more than a year). My guess is they have many nodes with lots of GPUs, but I'm not sure how many. If they had 70,000 GPUs, they could train the full stack in 1 hour (70,000 GPUs x 1 hour = 70,000 GPU-hours), but that would be a huge supercomputer. You can put around 20-ish nodes in one server rack (42U), so one rack would have around 160 GPU cards. To train this network relatively fast, let's say you have 20 racks: that gives you 20 racks x 160 GPUs = 3,200 GPUs x 24 hours per day = 76,800 GPU-hours. So every 24 hours they can train the network again; each time they want to train or upgrade the network, they would need to wait 24 hours to see if the new network is better than the old one. In short, they use a lot of resources to make this work. He also talked about the Dojo project. Tesla Dojo is a super powerful training computer that would replace GPUs (this is my best guess). It's dedicated hardware, and Dojo could possibly improve performance by a factor of 10-20x, which would mean that if they now need 24 hours to train the full network, it would only take 2.4 hours. This will speed things up and let them test more variations and whatnot.
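For anyone who wants to play with the back-of-the-envelope numbers above, here is the same arithmetic as a small Python sketch (all figures are the comment's rough guesses, not official Tesla numbers):

# Back-of-the-envelope training math; every input is a rough guess.
full_stack_gpu_hours = 70_000          # estimated cost of one full-stack training

gpus_per_node = 8
nodes_per_rack = 20                    # ~20 nodes in a 42U rack
racks = 20

total_gpus = racks * nodes_per_rack * gpus_per_node       # 3,200 GPUs
gpu_hours_per_day = total_gpus * 24                       # 76,800 GPU-hours/day
days_per_training = full_stack_gpu_hours / gpu_hours_per_day

print(f"{total_gpus} GPUs -> {gpu_hours_per_day} GPU-hours per day")
print(f"Full stack retrains roughly every {days_per_training:.2f} days")

# Hypothetical Dojo speedup of 10-20x over GPUs:
for speedup in (10, 20):
    print(f"{speedup}x speedup: ~{24 / speedup:.1f} hours per full training")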
In the past Tesla has used AWS to host a lot of their backend services. I know at one point it was reported that some AWS instances were mentioned being used for Autopilot training, but that was a few years ago and I haven't heard anything new since then. I suspect they are still using cloud services for now - with the plan being to move things to their new Dojo hardware once it's up and ready. They showed the development timeline for the FSD computer in their Investor Autonomy event, and if Dojo is following a similar timeline it will probably be up and ready in the next year which will line up nicely with their plans for ramping up autopilot capabilities (like the Taxi network).
It seems they need about one nuclear plant's hourly production to train the network once... That is quite the electricity cost. (250W*70000 hours)
@@rkan2 3,200 GPU cards would be around 1 MW (give or take a few, since you need to power the servers as well, plus network hardware and everything in between, not just the GPUs). 1 MW peak usage is a lot, but nuclear plants can do anywhere from 550-ish megawatts (MW) up to 4 GW-ish (that's 4,000 MW). Even so, 1 MW is a huge amount of power usage, and it makes sense for them to try to find a better way to do the processing, ergo the Dojo project.
@@ipconfigrenew I honestly don't know if they're using AWS or dedicated clusters; I was just doing math based on some simple numbers. I don't have any inside Tesla information :) I work in the industry (servers, cloud, etc.) so I just ran the basic numbers for fun! But for sure the Dojo project could make a lot of difference for them if they can build it cost-effectively.
Amazing design and engineering!
Super interesting. If you follow his speech by mimicking it in your own inner voice, it's actually easy to keep up with him while taking all of this information in. Great technique
Tell me you are smart without telling me you are smart: I watch Andrej at 1.25x speed
Are the slides available anywhere?
Where can I find a written version of this?
Nice tutorial, *Car pathy*
I love Andrej so fucking much
Watch it at 0.75x
Is int8 good enough?
Yes. While high accuracy is required during training, for prediction you can round the calculations to 16 bits or even 8.
@@Lord2225 I wonder if there is a point to the idea that if a classifier net relies on finer precision than 8-bit, it's too fragile. Maybe sigmoid invites fine balances, and some things need a threshold.
@@DanFrederiksen It makes sense. The average activation of neurons or layers is close to zero and there is no large standard deviation, even when using better functions than sigmoid (ELU, ReLU). In general, you can do 8-bit multiplication and 16-bit sums if someone is worried about the problem.
heartbeat.fritz.ai/8-bit-quantization-and-tensorflow-lite-speeding-up-mobile-inference-with-low-precision-a882dfcafbbd ~ timing comparisons showing the gains you can get
petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/ ~ tricks for removing problems with low precision.
Only in bad models do weights explode to huge numbers.
@@Lord2225 Also, there are other ways to compress learned data: twitter.com/NENENENENE10/status/1151530562844332033
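For anyone curious what the int8 rounding discussed in this thread looks like in code, here is a minimal PyTorch sketch using post-training dynamic quantization (the toy model is purely illustrative, nothing to do with Tesla's actual networks):

import torch
import torch.nn as nn

# Toy float32 model standing in for a trained network.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

# Post-training dynamic quantization: weights are stored as int8,
# activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 64)
with torch.no_grad():
    drift = (model(x) - quantized(x)).abs().max()
print(f"max output drift after int8 quantization: {drift:.6f}")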
At 8:26, when he says predictions can't regress, what does he mean?
Any explanations/links?
Basically, when adding new functionality to the autopilot, it should be tested to ensure that the existing functionality doesn't break/get worse.
very smart man
This is probably the first time i've had to slow a video down to 0.75 instead of speed it up
Fascinating
Tesla paygrade is based on the number of times you can drop "order of magnitude" into your presentations. The paygrade escalation rate is, of course, an order of magnitude greater than order-of-magnitude-isms / hour * base-pay-rate
Runs away after finishing his talk
These are probably lightning talks.
It is just to keep up with the talking speed.
Doesn't want to answer questions
Is that Hwy 403 on the way to Hamilton?
*Switch speed from normal to 0.75*
I usually watch at x1.25 or even x1.5 but I had to watch x0.75 for this
I thought it was just me hehehe
There must be something wrong with the editing of this video. Is the speed set higher? I had to set the video speed to 0.75 to watch and understand.
I heard him in other videos, this is how he talks
The nerdiest presentation ever! Love it
Who's here after LeCun roasted Elon?
Look where our badmephisto is now
Wait... This is badmephisto?
@@petko4733 Yeah, he is the one that taught us cubing; you can watch him on his old channel 'badmephisto'
8:27 what does he mean by "make sure that none of this 1000 predictions that we make, can regress"?
There are 1000 things the full network is trying to do (such as label curbs, lights, other cars, is the car going to cut me off, etc.)
Each of these 1000 things will have its own accuracy across its test data (labeling a light a light and not labeling a stop sign a traffic light, etc.).
When you regress, you are losing accuracy. So they might be at 99% accuracy in labeling stoplights, but as they train to recognize curbs for the Smart Summon feature, the network might forget something about recognizing stoplights, and that 1 prediction now has regressed to 98.5% accuracy.
They want to make sure they gather more information into the network without losing anything they previously learned.
@@jacobholloway7653 Thanks Jacob! That makes sense. In this context, regress = losing accuracy.
I looked up the definition just to add: return to a former or less developed state.
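A toy sketch of what such a "no prediction may regress" gate could look like in code (task names, accuracies, and the gating logic here are hypothetical illustrations, not Tesla's actual test suite):

# Hypothetical per-task regression gate: reject a new network if any
# task's accuracy drops more than a small tolerance below the current one.
TOLERANCE = 0.001  # allow 0.1% noise per task

def passes_regression_gate(old_acc: dict, new_acc: dict) -> bool:
    """old_acc/new_acc map task name -> accuracy on that task's test set."""
    ok = True
    for task, old in old_acc.items():
        new = new_acc.get(task, 0.0)
        if new < old - TOLERANCE:
            print(f"REGRESSION on {task}: {old:.4f} -> {new:.4f}")
            ok = False
    return ok

# Toy example: training for curbs improved curbs but hurt stoplights.
old = {"stoplights": 0.990, "curbs": 0.970}
new = {"stoplights": 0.985, "curbs": 0.992}
print(passes_regression_gate(old, new))  # False: stoplights regressed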
So pytorch >> tensorflow?
I guess yes
I am humbled and beyond thankful to Andrej Karpathy, the AI team, and Elon Musk for providing the service of uploading human consciousness into electronics such as Teslas. #ForeverGrateful
???
Wow.
Whoa, epic !!!!
And what if you use TensorFlow?
It would probably run too hot on nvidia.
A visualization of recurrent networks can be found here: vision.stanford.edu/pdf/KarpathyICLR2016.pdf
Wondering how many cars got smashed in the process of actually getting this to work initially.
He speaks so fast, I thought my video speed was 1.5x
me too..
The end game: Operation Vacation
The fact that Elon Musk's companies manufacture their own components is how they can price their products a little bit lower while also keeping control and ensuring quality. E.g., SpaceX.
FUCKING AWESOME
Amazing technology. Apple also uses Python a lot for their ML projects.
Never knew PyTorch is used at Tesla
I'm from twitter
When your brain is faster than normal...you speak faster than normal
Why is he speaking so fast?
@schräg take a look at this as an Autopilot tester ;)
I thought Andrej Karpathy was a 70+ year old person!
WOW.
And people still think Waymo is ahead lol
How did you arrive at this conclusion? Does Waymo have a tech talk about the stack they use?
Why would this video suggest that waymo is not ahead?
Waymo uses a completely different stack. Not comparable. But I would guess that Tesla is significantly ahead on SLAM with vision only, while Waymo circumvents that by using LIDAR for the SLAM and vision to complement the data from their LIDAR.
@@BosonCollider Hmm, is there a video or page that gives an intro to how Waymo built their autonomous driving system? I would hold off on the thought that "Tesla is much better". Tesla's autopilot was officially ranked 2 levels below Waymo's. Using LIDAR is better than not using it. The only obvious advantage of Tesla's autopilot is that they have much, much more real-world data and they dared to put an unfinished product into test/production (the smart summon).
@@Dan-xl8jv Waymo's implementation is not general purpose, but rather based on defining geo-fenced areas and then training their system for each individual pocket they want to use. It will do a good job for its intended niche, for example serving a campus perhaps.
But Waymo's approach is not scalable on its own, it's limited to those geofenced areas and the training tailored for each of them they add. It's not meant to tackle the larger more "general purpose" challenge of a car being able to drive itself across multiple states, for example.
That requires a slightly different kind of skillset on behalf of the AI, a much more complex problem to tackle, so in that regard Tesla is way ahead in the field, but the question is going to be "when" they'll be able to achieve that holy grail of automation.
Nobody else has all those miles of training data needed to accomplish the task, by many orders of magnitude, so if they can't do it, I don't know who else can really.
Now do one called _Rent-Seeking at Tesla._
Any thoughts about using extra data, like from V2X /V2M sources? It would be like cheating, I know, BUT why not use what is available to train the NN even faster? I would imagine even adding V2X /V2M hardware in large cities like New York, LA, San Francisco might be cost effective.
Here is a job I can never get ... head of AI at Tesla
Please make Sentry Mode run in the CLOUD. Tesla can make money and owners can subscribe
9:01 FSD computer discussion
Does anyone have any papers or examples of “hydra nets” like this? I want to implement a system with a few hydra heads.
I guess they came up with that term. Look for Feature Pyramid Networks; the concept seems to be the same.
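While you look for papers, here is a minimal PyTorch sketch of the shared-backbone, multi-head idea ("hydra net") the talk describes; the layer sizes and task names are made-up illustrations, not Tesla's architecture:

import torch
import torch.nn as nn

class HydraNet(nn.Module):
    """Shared backbone with one lightweight head per task ("hydra heads")."""
    def __init__(self, task_outputs):
        super().__init__()
        # Shared backbone: its computation is amortized across all tasks.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One head per task, each trainable on task-specific labels.
        self.heads = nn.ModuleDict(
            {task: nn.Linear(64, n_out) for task, n_out in task_outputs.items()}
        )

    def forward(self, x):
        features = self.backbone(x)  # computed once per image
        return {task: head(features) for task, head in self.heads.items()}

model = HydraNet({"traffic_lights": 4, "lane_lines": 2})
out = model(torch.randn(1, 3, 128, 128))
print({task: logits.shape for task, logits in out.items()})

The real thing obviously has far more tasks and a much heavier backbone; this only shows the wiring that makes the shared computation affordable.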
It seems they need about one nuclear plant's hourly production to train the network once... That is quite the electricity cost. (250W*70000 hours)
If that was done in a day, the math says it would require 17 MWh. According to the EIA website, the smallest American nuclear plant (R.E. Ginna), with a 584 MW capacity, would generate over 13 GWh in 24 hours, so it seems like you're off by a few zeros. 17 MWh is the equivalent of 170 Model S P100D batteries, and the nuclear power plant would be able to supercharge them all in 4 h using only 4.5 MW, which is less than 1% of such a power plant's capacity.
That said, they probably use way less than that, thanks to the fact that the multitasking he refers to in the 48-GPU system likely parallelizes tasks heavily, so the max power they'll need at a given time is 250 W x 48 = 12 kW, which is the equivalent of a basic Tesla Solar Roof.
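For anyone wanting to sanity-check the arithmetic in this thread, here is the same math in a few lines of Python (inputs are the commenters' rough estimates, not measured figures):

# Sanity-checking the power math above; all inputs are rough estimates.
gpu_watts = 250
gpu_hours = 70_000

training_energy_mwh = gpu_watts * gpu_hours / 1e6   # 17.5 MWh total
plant_mw = 584                                      # R.E. Ginna capacity
plant_daily_gwh = plant_mw * 24 / 1e3               # ~14 GWh per day

print(f"Training energy: {training_energy_mwh:.1f} MWh")
print(f"Plant output per day: {plant_daily_gwh:.1f} GWh "
      f"({plant_daily_gwh * 1000 / training_energy_mwh:.0f}x the training run)")

# Peak draw if the job runs on 48 GPUs at once:
print(f"Peak draw: {gpu_watts * 48 / 1000:.1f} kW")  # 12 kW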