I haven't commented on a YouTube video since 2017. But I have to, on the slim chance that you actually read this comment, Andrej! Please keep doing what you are doing! You are an absolute gem of an educator, and the millions of minds you are enlightening with each video will do great things that will compound and make the world a better place.
@AndrejKarpathy you sure made the YT comments section a better place lol. Excellent videos, please keep them coming, or shall I say make more! Thank you!!
Update: I added some suggested exercises to the description of the video. imo learning requires in-person tinkering and work, watching a video is not enough. If you complete the exercises please feel free to link your work here. (+Feel free to suggest other good exercises!)
I don't know if this is a good place for Q&A, but there is something I need to ask that I can't wrap my head around. I was training the trigram language model, and the loss was lower than for the bigram language model, but the model was worse. I tried to generate a few names and realised I had made a huge error in data preparation. The question I have is: how good an indicator is loss? Is loss the only thing that matters for model performance? I understand there are other metrics of model performance. I have actually faced something similar in my work. I am stabilizing video using an IMU sensor, and I am training a NN for camera pose estimation. Across different architectures, the lower-loss models do not necessarily perform better. When our team looks at the stabilized video, many times the model with higher loss generates a visually better stabilized result. I don't quite understand this. That's why I am asking how indicative loss is of model performance. I don't expect you to answer this here, but maybe you could talk about it in your future lectures or somewhere else.
My loss for trigrams with the count model, using all data for training, was around 2.0931, and I was able to get close with the NN approach. I'm not sure if the resulting names were better, but I wasn't able to generate exactly the same names with the count and NN approaches anymore (even using the same generator). Also, I'm not sure how best to share/link my solution (I have the notebook on my local drive).
I built the trigram model by concatenating the one-hot encoded vectors for the first two letters and feeding them through a neuron; the rest is the same. I think that is a fine way to train a trigram model. Any views on that? I did attain a lower loss compared to the bigram model, although the results are not significantly better.
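Editor's sketch of the concatenated one-hot idea from the comment above, in plain Python with no PyTorch. All names here (`one_hot`, `trigram_input`, `matvec`, the stand-in weights `W`) are hypothetical and only illustrate the shapes involved; the commenter's actual notebook was not shared. It shows that a single linear layer on two concatenated one-hot vectors just adds one row of the weight matrix per context character, which may be one reason such a model improves only modestly over the bigram model.

```python
# Sketch: a trigram input as two concatenated one-hot vectors (hypothetical names).
VOCAB = 27  # 26 letters plus one start/end token (assumed vocabulary size)

def one_hot(index, size=VOCAB):
    """Return a list of zeros with a single 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def trigram_input(first, second):
    """Concatenate the one-hot encodings of the two context characters."""
    return one_hot(first) + one_hot(second)

def matvec(x, W):
    """Compute x @ W for a vector x and a matrix W with len(x) rows."""
    cols = len(W[0])
    return [sum(x[r] * W[r][c] for r in range(len(x))) for c in range(cols)]

# Deterministic stand-in weights of shape (54, 27); real weights would be learned.
W = [[(r * 31 + c * 17) % 7 - 3.0 for c in range(VOCAB)] for r in range(2 * VOCAB)]

first, second = 5, 13  # e.g. 'e' then 'm' under an a=1..z=26 mapping
logits = matvec(trigram_input(first, second), W)

# The input has exactly two ones, so the product is the sum of two rows of W.
row_sum = [W[first][c] + W[VOCAB + second][c] for c in range(VOCAB)]
```

Because the logits are a sum of one term per character, this layer cannot assign an arbitrary score to each specific pair of characters the way a full 27x27-row lookup table could.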
@ibadrather Not an expert myself, but here’s how I would explain it: Coming up with a loss function is like coming up with a target you optimise for. Apparently your perception of how good a result is (your human brain loss function) differs from what you optimise your network toward. In that case you should come up with a better equation to match your gut feeling. Practical example. Let’s say you want to train a network that produces ice cream. Your loss function is the amount of sugar in the ice cream. The best network you train crushes the loss, but produces simple 100% sugar syrup. It does not have the texture and consistency of real ice cream. A different network may make great ice cream texturewise, but put less sugar in it, thus having worse loss. So, adjust your loss function to score for texture as well.
I cannot imagine a better -- or kinder -- teacher. He is feeding his audience knowledge and understanding, in small delicious bites, without worrying about their level of prior knowledge. And he is smiling irrepressibly all the time! Such a good person.
The reason why this is such excellent teaching is because it's constructed bottom-up. It builds more abstract concepts using more concrete ones; generalizations follow concrete examples. At no point is there a situation in which the learner has to "just assume" something, which will "become clear later on" (in most instances when a teacher says it, it doesn't; it just causes people to desperately try to guess the required knowledge on their own to fill in the gaps, distracting from anything that follows, and producing mental discomfort). The bottom-up approach produces pleasure from a series of little "a-ha" and "I agree" moments and a general trust in the teacher. I contrast this to the much worse fastai courses - in which there are lots of gaps and hand waving because of their top-down approach.
so in your opinion would taking the fastai course after this series be a better way to approach it? or are they overlapping/similar in terms of the topics?
@MihikChaudhari I don't think you need a fastai course, you can just follow Andrej's series, maybe also watch some lectures on neural nets (he also has some) for the theoretical underpinning.
I am pretty confident that the impact of his CS231n course is bigger than even his work at Tesla. I know too many people working in machine learning who were introduced to the field by CS231n. It changed my life. Makes you wonder if he should just spend all his efforts on teaching. The impact is truly exponential.
Too many people are working on AI performance and not enough people are working on AI alignment. If this trend continues, the impact might be enormously negative.
@samot1808 AI alignment and the AI control problem are aspects of how to build AI systems such that they will aid rather than harm their creators. It basically means that we are like kids playing with plutonium and it won't take much for someone to turn it into a bomb (on purpose or by mistake) and make everyone's life a living hell. All that leads to a need for more regulation and oversight of the really advanced AI models because otherwise we may end up with AI generators that can take a photo of you and create a video showing you killing babies, or worse, an AI that self-replicates and takes over entire systems leading to collapsed economies and countries (or, maybe, even something like the Terminator).
Absolutely insane levels of detail you are going into. This series is invaluable for beginners in the field as well as for people like me, who are building their own models all the time but want to go back to basics from time to time so as not to get stuck in wrong assumptions learned from fast success with Keras :D I really hope you will continue this series for quite a while! Thanks a lot, AI Grand Master Andrej!
OMG, I feel soooo grateful for the internet! I would have never met a teacher this clear and this suited to my needs in real life. I have watched the famous Stanford courses before; they have set a standard in ML courses. It is always the Stanford courses and the rest. Likewise, this course is setting a new standard for hands-on courses. I'm only half an hour into the video. I'm already amazed by the sensitivity, clarity and organization of the course. Many many thanks for your generosity in stepping out and sharing your knowledge with numerous strangers in the world. Much much indebted! Thank you!
Andrej, your videos are the clearest explanations of these topics I have ever seen. My hat's off to you. I wish you had taught my ML and NLP classes in college. There's a huge difference between your ground-up, code-first approach and the usual dry, academic presentation of these topics. It also demonstrates the power of YouTube as an educational tool. Thank you for your efforts!
I've been learning ML for a year now. I've read dozens of books, watched courses, and practiced myself. It always feels like I have 10x more theoretical knowledge than I use in practice. When I have to build something like this, I stare at the screen and can't decide what/where/how to use it. Andrej's courses are the ones that glue everything together. They show that it is not some abstract math you must memorize but math you build logically, one piece on top of another. Watch how well he explains maximum likelihood and why it is used as a loss function. Brilliant. Thanks a lot, Andrej. I truly feel you enjoy and love to teach!
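Editor's sketch of the maximum-likelihood point raised in the comment above: the average negative log likelihood of the data under a count-based bigram model, which is the quantity the video uses as a loss. This is a minimal plain-Python illustration on a tiny stand-in word list (not the video's dataset), and the helper names `prob` and `average_nll` are hypothetical.

```python
import math
from collections import Counter

words = ["emma", "ava", "mia"]  # tiny stand-in dataset, not the video's names file

# Count bigrams, using "." as a start/end marker.
counts = Counter()
for w in words:
    chars = ["."] + list(w) + ["."]
    for a, b in zip(chars, chars[1:]):
        counts[(a, b)] += 1

# Total number of bigrams that start with each character.
row_totals = Counter()
for (a, _), n in counts.items():
    row_totals[a] += n

def prob(a, b):
    """Maximum-likelihood estimate of P(b | a) from the counts."""
    return counts[(a, b)] / row_totals[a]

def average_nll(word_list):
    """Average negative log likelihood per bigram; lower means a better fit."""
    total, n = 0.0, 0
    for w in word_list:
        chars = ["."] + list(w) + ["."]
        for a, b in zip(chars, chars[1:]):
            total += -math.log(prob(a, b))
            n += 1
    return total / n

loss = average_nll(words)
```

Maximizing the likelihood of the data is the same as minimizing this average, since the log turns a product of probabilities into a sum and the minus sign flips the direction. Note that this sketch would fail on a bigram it has never seen (probability zero), which is the problem smoothing addresses.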
I definitely agree with you. As a software developer, I noticed this too. Great professors from the best universities in the world spend hours and hours explaining theory and mathematics, drawing and writing. But they don't even write a single line of code. If I had the chance, I would shout at them: "Sir, should I make for example a speech recognition from the strange graphs you have drawn during these 30 hours?" Examine quickly the worst software development course and you'll see that the instructor writes code on the screen for at least 70% of the course. No one should tell me that I studied ML and DL at this great university. We see how they educate😁😁😁
I love how you make the connections between the counting and gradient-based approaches. Seeing that the predictions from the gradient descent method were identical to the predictions from statistical probabilities from the counts was, for me, a big aha moment. Thank you so much for these videos Andrej. Seeing how you build things from the ground up to transformers will be fascinating!
Thank you Andrej. I can't stress enough how much I have benefited from and felt inspired by this series. I'm a 40 yo father with a young kid. Work and being a parent have consumed lots of my time - I have always wanted to learn ML/neural networks from the ground up, but a lot of the material out there is just thick and dense and full of jargon. Coming from a math and actuarial background, I had kind of expected myself to be able to pick up this knowledge without too much stumbling, but seriously, not until your videos did I finally feel so strongly interested and motivated in this subject. It's really fun learning from you and coding along with you - I'm leaving your lectures each time more energized than when I started. You're such a great educator, as many have said.
Lex guided me here. I loved your micrograd tutorial. It brought back my A level calculus and reminded me of my Python skills from years back - all whilst teaching me the basics of neural networks. This tutorial is now putting things into practise with a real-world example. Please do more of these, as you're sure to get more people into the world of AI and ML. Python is such a powerful language for manipulating data and you explain it really well by building things up from a basic foundation into something that ends up being fairly complex.
The video series featured on your channel undoubtedly stands as the most consequential and intuitive material I have encountered. The depth of knowledge gained from these instructional materials is significant, and the way you've presented complex topics with such clarity is commendable. I find myself consistently recommending them to both friends and colleagues, as I truly believe the value they offer to any learner is unparalleled. The gratitude I feel for the work you've put into these videos is immense, as the concepts I've absorbed from your content have undoubtedly expanded my understanding and competence. This invaluable contribution to the field has benefited me tremendously, and I am certain it has equally enriched others' learning experiences.
You are the best teacher I've seen in my life. Thank you so much for doing these videos and materials for free. It is so clear to demonstrate how all the stuff is actually working.
10 mins into the video, I'm amazed, smiling and feeling as if I've cracked the code to life itself (with the dictionary of bigram counts). of course, I can't build anything with the level of knowledge I have currently but I sure can appreciate how it works in a much better manner. I always knew that things are predicted based on their occurrence in data but somehow seeing those counts(for eg. for first 10 words, of `('a', ''): 7`) makes it so glaringly obvious which no amount of imagination could've done for me. You are a scientist, researcher, high paid exec, knowledgeable, innovator but more than anything, you are the best teacher who can elucidate complex things in simple terms which then all make sense and seem obvious. And that requires not just mastery but a passion for the subject.
This video is the goldmine. It's so intuitive and easy to understand. Even my grad classes could not squeeze this information over a semester-long course. Hats off and it's a privilege to be learning from the accomplished AI master and the best. Thank you for the efforts Andrej :).
Andrej thanks for doing this. You can have a larger impact bringing ML to the masses and directly inspiring a new generation of engineers and developers than you could have managing work at Tesla.
No one on youtube is producing such granular explanations of neural network operations. You have an incredible talent for teaching! Please keep producing this series, it is so refreshing to get such clear, first-principles content on something I'm passionate about from someone with a towering knowledge of the subject.
Pure fundamentals...how frequentist and probabilistic views are connected and presented in such an elegant manner for building a mini-llm with actual code in practice is simply awesome. Thank you so much @AndrejKarpathy!
You are an embodiment of humbleness, selflessness and kindness. For a person of your caliber, having led the Tesla Autopilot program as a director and being one of the founders of OpenAI, to come and teach the world for free is unheard of. You are a true inspiration for the whole world. I am truly blessed to learn from you. May GOD give you and your family a long, healthy, happy, prosperous and peaceful life.
Hi Andrej, I heard two days ago (from a Lex Fridman podcast) that you were thinking of pursuing something related to education. I was surprised and very excited, wishing that it was true and accessible. Today, I ran into your YouTube channel and I couldn't be happier. Thanks a lot for doing this and for sharing your valuable knowledge! The lectures are incredibly detailed and interesting. It's also very nice to see how you enjoy talking about these topics; that's very inspiring. Again, thank you!
Your intro to neural networks video is amazing, especially how you focused on the raw mathematical fundamentals rather than just implementation. I can tell this is going to be another banger.
What a pleasure to watch! love the fact there is no shortcut, even for what may seem easy. Everything is well explained and easy to follow. It is very nice to show us the little things to watch for.
Andrej, i'm from Brazil and love ML and to code. I have tried several different classes with various teachers, but yours was by far the best. Simplicity with quality. Congratulations! I loved the class! Looking forward to taking the next ones. The best!.
Mastery is the ability to stay with fundamentals. Andrej derives the neural architecture FROM the counts-based model! So the log counts, counts and probs are wrapped around the idea of how to get to probs similar to the counts model. Thus he explains why you need to log, why you need to normalize, then introduces the name for it: softmax! What a way to teach. The brilliant master stroke is when he shows that the samples from the neural model exactly match the samples from the counts model. Wow, I would not have guessed it, and many teachers might not have checked it. The connection between 'smoothing' and 'regularization' was also a nice touch. Teaching the new concepts in terms of the known, so that there is always a way to think about new ideas rather than taking them as given. For instance, the expected optimal loss of the neural model is what one would see in the counts model. Thanks Andrej! By the way, one way to interpret the loss is perplexity, the exponential of the loss. Since exp(2.47) is about 11.8, the number 2.47 says that the model is, on average, about as uncertain as if it were choosing among roughly 12 equally likely next characters.
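Editor's sketch of the two ideas in the comment above: softmax applied to log counts gives back the count-based probabilities, and perplexity is the exponential of the average negative log likelihood (assuming natural logs, as PyTorch uses). The counts and the helper names `softmax` and `perplexity` are hypothetical, chosen only for illustration.

```python
import math

def softmax(logits):
    """Exponentiate and normalize so the outputs are positive and sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

counts = [6.0, 3.0, 1.0]  # hypothetical counts for one row of the bigram table
probs_from_counts = [c / sum(counts) for c in counts]

# If the network's logits equal the log counts, softmax recovers the count-based probabilities.
log_counts = [math.log(c) for c in counts]
probs_from_softmax = softmax(log_counts)

def perplexity(avg_nll):
    """Perplexity is exp of the average negative log likelihood (natural log)."""
    return math.exp(avg_nll)

# A uniform guess over 27 characters has loss log(27) and perplexity 27.
uniform_loss = math.log(27)
```

Under this definition a loss of 2.47 corresponds to a perplexity of roughly 11.8, compared with 27 for a model that guesses uniformly over 27 characters.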
You're such a good teacher. Nice and steady pace of gradual buildup to get to the end result. Very aware of points where student might get lost. Also respectful of viewers time, always on topic. Even if I paid for this, I wouldn't expect this quality, can't believe I get to watch it for free. Thank you.
I just want to say, thank you Dr Karpathy, the way you explain concepts is just brilliant, you are making me fall in love with neural nets all over again
I think minimal, simple-as-possible code implementations, talked through, are just about the best possible way to learn new concepts. All power to you Andrej, and long live these videos.
This is just a wonderful series. Thank you so much! For everyone who wants to visualize this "network", here is the code. Run !pip install torchviz pillow in your notebook for this to work (torchviz also needs the Graphviz binaries installed on your system):
from torchviz import make_dot
from PIL import Image

def visualize(tensor):
    # Plain tensors have no named_parameters, so params stays empty for them
    params = {name: p for name, p in tensor.named_parameters()} if hasattr(tensor, 'named_parameters') else {}
    # Writes computation_graph.png to the current directory
    make_dot(tensor, params=params).render("computation_graph", format="png")
    img = Image.open("computation_graph.png")
    img.show()
pure gold content!!! this is how it should be taught! too many tutorials just jump into the "practical" examples without explaining the underlying fundamentals. great lesson, thank you for posting this! 🙏
What an incredible resource - thank you Andrej. I especially enjoyed the intuitive explanation of regularization, what a smooth way of relating it to the simple count-matrix
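Editor's sketch of the smoothing side of the analogy mentioned in the comment above. Adding fake counts to every entry of a row keeps unseen bigrams from getting zero probability, and the more you add, the closer the row gets to uniform. In the neural version, a penalty that pulls the weights toward zero has a comparable effect, since all-zero logits give a uniform softmax. The function name `smoothed_probs` and the example counts are hypothetical.

```python
def smoothed_probs(counts, k):
    """Add k fake counts to every entry, then normalize the row."""
    total = sum(counts) + k * len(counts)
    return [(c + k) / total for c in counts]

row = [8, 2, 0, 0]  # hypothetical bigram counts for one character

no_smoothing = smoothed_probs(row, 0)  # zero counts get zero probability
light = smoothed_probs(row, 1)         # add-one smoothing
heavy = smoothed_probs(row, 1000)      # a very large k pushes the row toward uniform
```

With `k = 0` the unseen entries have probability 0 (and an infinite negative log likelihood if they ever occur); with a very large `k` every entry approaches 1/4 regardless of the data.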
Another absolutely fantastic tutorial. The detail is incredible and the explanations are so clear. For anyone watching this after me, I feel that the micrograd tutorial is absolutely essential to watch first if you want to really understand things from the ground up. Here, for example, when Andrej runs the loss.backward() function, you'll know exactly what's happening, because you do it manually in the first lesson. I feel that the transition from micrograd (where everything is built from first principles) to makemore (relying on the power of PyTorch) leaves you with a surprisingly deep understanding of the fundamentals of language modeling. Really superb.
Besides having the best explanation of LLMs from this great teacher, you get a free hands-on Python course, which also has better explanations than lots of others. Thx a lot Andrej!
There's so much packed in there. I spent the whole day on this and got to the 20 minute mark, haha. Great teacher, thank you for this logical and practical approach.
This is amazing, thank you so much for all the hard work you've been doing Andrej! For anyone as confused as I first was as to why both P[0].sum() and P[:, 1].sum() would add to 1, note that this happens only because of the way N was constructed in the first place. Each row of N sums to the same amount that the corresponding column of N does, so N[0].sum() == N[:, 0].sum(), N[1].sum() == N[:, 1].sum(), etc. This is because each row has all the bigrams starting with a letter and each corresponding column has all the bigrams ending with that same letter. Another way to think about it is that each occurrence of a letter is counted twice: once as the starting letter of a bigram (adding 1 to the sum of its row) and once as the ending letter of a bigram (adding 1 to the sum of its column). So the sums end up being equal, which means a bug like dividing each column by the sum of the corresponding row would result in a matrix with columns summing to 1!
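Editor's sketch checking the observation in the comment above on a tiny stand-in word list, in plain Python. It builds the count matrix with "." as the start/end marker, confirms that each row sum equals the matching column sum, and then reproduces the normalization bug by hand. The variable names (`P_good`, `P_bad`, and so on) are hypothetical.

```python
words = ["emma", "ava", "mia"]  # tiny stand-in dataset
chars = ["."] + sorted(set("".join(words)))
stoi = {ch: i for i, ch in enumerate(chars)}
size = len(chars)

# N[i][j] counts how often character j follows character i.
N = [[0] * size for _ in range(size)]
for w in words:
    seq = ["."] + list(w) + ["."]
    for a, b in zip(seq, seq[1:]):
        N[stoi[a]][stoi[b]] += 1

row_sums = [sum(N[i]) for i in range(size)]
col_sums = [sum(N[i][j] for i in range(size)) for j in range(size)]

# Correct normalization: divide each entry by the sum of its own row.
P_good = [[N[i][j] / row_sums[i] for j in range(size)] for i in range(size)]

# Buggy normalization: divide entry (i, j) by the sum of row j instead,
# which is what happens when the row sums broadcast along the wrong axis.
P_bad = [[N[i][j] / row_sums[j] for j in range(size)] for i in range(size)]

bad_col_sums = [sum(P_bad[i][j] for i in range(size)) for j in range(size)]
```

The buggy matrix has columns that sum to 1 only because the row and column sums of `N` happen to coincide, so checking a single sum is not enough to catch the mistake; its rows generally do not sum to 1.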
The way he explained zip method even a beginner can understand. From very basic python to an entire language model. I can't thank this man enough for teaching us.
Andrej teaches a very complex subject in a pedagogical and simple way, and also gives away many invaluable tips on programming, Python, Torch, and how to approach solving a problem. I have been learning ML for several years, almost literally from scratch (I am not an engineer, a statistician, or a programmer), and these lessons put many things in order in my head (aha moments, as someone in the thread put it); I understood much better concepts and processes that I could barely intuit before. It really is like opening up AI and seeing what it looks like inside. I recommend watching the videos in order; before this one I watched the micrograd video and found it incredible to understand everything. Truly, a thousand thanks for this contribution, Andrej.
36:20 The way he showed us how an untrained model and a trained model differ in their output, oh man.... Really, I don't comment at all, but this has to be said: you are the best in this field. Really looking forward to learning more from you, sir.
This is a great overview of your work! It's impressive how you break down complex topics into manageable parts. The resources you've shared will surely help many people learn and practice. Looking forward to seeing more videos!
Recently I've been facing tough times, lost focus on my goal, regretted my decisions, and I'm now working on self-improvement. I watched this video by Andrej (thank you so much for sharing your valuable ideas and time), which inspired me. Leaving this comment as a reminder; I'll return when I reach my goal 😊
new sub here. i started w/ "let's build gpt: from scratch, in code, spelled out". i learned lot, enjoyed coding along, appreciated the thoughtful explanations. i'm hooked & will be watching the makemore series. thank you very much sir for sharing your knowledge.
I’m an old-school software developer, revising machine learning for the first time since my undergrad studies. Back in the day we called them Markov Chains instead of Bigram Models. Thanks for the fantastic refresher!
You're lifting the lid on the black box and it feels like I'm sitting on a perceptron and watching the algos make their changes, forward, and back. It has provided such a deeper understanding of the topics in the video. I have recommended it to my cohort of AI students, of which I am one, as supplementary learning. But to be honest, this is the way it should be taught. Excellent job, Andrej.
I love how you take a NN and explain it to us, not through the built-in functions in PyTorch, but through how things work, and then give us the equivalent of it in PyTorch.
I’ve just finished the whole playlist and, for some reason, I started from the last one (GPT Tokenizer), went through the ‘makemore’ ones, and finally watched this one. Each one is better than the other. I couldn’t appreciate more what you’re doing for the community of ‘homeless scientists’ (those who want to become better at their crafts but are not associated with an academic institution) out there, Andrej. The way you teach says a lot about how you learn and how you think others should learn. I hope to find more videos like yours and more people like you. Cheers!! 👏👏👏
Wow Andrej this is really good stuff! I have 7 years SWE exp at a FANG company and the way you explain your process is so nice to learn from! thank you for the great content!
I'm blown away! I've never thought I'd say this but I actually like ML for the first time, and I'm a simple SWE with little knowledge of ML and this is your first video I've seen. Really interesting and easy to follow! thank you for being such a good teacher!
I went through building micro grad 3~4 times, It took me a week to understand a good portion of that and now started with this. I am really looking forward to going through this series. Thanks for doing this Andrej, you are amazing.
Thank you for posting these! It's extremely valuable. The end result of the neural net wasn't all that anticlimactic; at least the longer "names" did differ slightly, so it wasn't 100% the same weights as in the first case :)
The level of pedagogy is so so good here; I love that you start small and build up and I particularly love that you pointed out common pitfalls as you went. I am actually teaching a course where I was going to have to explain broadcasting this term, but I think I am just going to link my students to this video instead. Really excellent stuff! One small suggestion is to consider using Desmos instead of WolframAlpha if you just want to show a simple function.
Amazing delivery as always. The fact that he spent time explaining broadcasting rules and some of the quirks of keepdim shows how knowledgeable he is, and that he knows most people struggle with little things like that on the way to what they need to do.
It is an excellent practical class. Thanks a million for your work. Now, I understand the concept of how prediction works for bigram in gradient descent. I have an aha moment of the current Deep Learning trend. It will definitely help people like me to jump start Generative AI.
Doing this again to really solidify the fundamentals and github copilot is hilarious. It's seen this code so many times that if it is enabled you can't actually type everything out for yourself! It all comes rolling out (pretty much perfect, tweaked to my style and ready to run) after the first character or two. Amazing times. Got to say, whoa. This is so good. So sick of fumbling about with tensors and this is a masterclass for sure. Thank you thank you thank you.
Andrej is the MAN! Such a level of detail and explanation that I've yet to find anywhere. Thanks for these incredible videos! I am SO close to being done with this. Took a long time to get to the end (I used to do programming as a hobby in C++, Visual Basic, mIRC (and other IRC platforms) but that was years ago) so getting back in the mix has been a process. I am getting a "ValueError: only one element tensors can be converted to Python scalars" on the ix variable at the very end. I've commented out the old methods and typed the code as shown at 1:56:08 and can't seem to figure out the issue. The first rendition of ix works just fine, and commenting that out made no difference to this last section. Aye aye aye, so close!!
@adamderose9468 thanks for the tip! I found a single typo that was breaking things and managed to fix it a little while back. at this point I don't remember what it was, but it wasn't exactly where the error was showing, and again was just a measly typo LOL. gotta love it
Thank you so much. Your videos are artistic, using math as your palette to create these beautiful outcomes. They are mesmerizing.. Really appreciate it for making these videos, you are truly making the world a better place.
So cool to see the equivalence between the manually calculated model and the neural network model optimised with gradient descent. It's not quite the same output either. The regularization loss is required to get the two super close too. Pretty neat.
Andrej, you are simply amazing for doing this makemore series. I do not usually comment on videos, have not commented in a very long time, I just want to say thanks for your work and that the AI world is probably crazy now, it is videos like these that help even trained engineers get a proper understanding of how the models are made and the thoughts behind it, and not just implement and run or spend hours debugging because of a bug like broadcasting...
You are incredible! This makes learning about ML so fun, your passion and knowledge really shine here. I’m a college student studying CS, and you lecture better than many professors. Not a knock on them though, props to you.
Thank you for the unique opportunity to learn how to write code from the most advanced developer, Andrej! An almost priceless and irreplaceable opportunity! Extremely useful and efficient!
Note if you are following this in torch 2.0, the multinomial function might behave differently in getting the idx (3 instead of 13). Just downgrade to torch==1.13.1 if this bothers you.
Andrej thank you. As a newcomer this is perfect introduction, i want to practice this approach with alternative languages and iterate what you have shown. Thanks again.
I love you Andrej! Learning from this series has been therapeutic for me. I’m glad you have decided to be full time educator. All the best with Eureka AI! It’s gonna make a positive impact on a lot of our lives. You da best!!❤
Thank you Andrej, this is absolutely the best hands-on coding neural nets & PyTorch tutorial. Special thanks for decoding cryptic PyTorch docs. Very, very useful!
Thanks for explaining things at such an incredible level of detail. (1:08:49) A couple of days ago I too was searching for the difference between torch.Tensor and torch.tensor. Passing dtype explicitly seems like good practice to me. Also, the caution on the broadcasting rules is very valid. Often the bugs in my PyTorch/TensorFlow code are due to unexpected broadcasting going undetected because of, say, an averaging or reduction operation. Putting a lot of asserts on the shapes helps me.
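Editor's sketch of the shape-assert habit described in the comment above. It uses NumPy rather than PyTorch as a stand-in (an assumption made so the example runs without PyTorch installed); NumPy's `keepdims=True` follows the same broadcasting rule as `keepdim=True` in `torch.sum`. The matrix values and the helper name `normalize_rows` are hypothetical.

```python
import numpy as np

# Hypothetical 3x3 count matrix; dtype is passed explicitly, as the comment recommends.
N = np.array([[1, 2, 1],
              [0, 3, 1],
              [2, 2, 4]], dtype=np.float64)

row_sums_flat = N.sum(axis=1)                 # shape (3,)
row_sums_kept = N.sum(axis=1, keepdims=True)  # shape (3, 1)

# Both divisions run without error, but only the second one normalizes rows.
P_wrong = N / row_sums_flat  # (3,) is treated as (1, 3): column j is divided by row sum j
P_right = N / row_sums_kept  # (3, 1) stretches across columns: row i is divided by row sum i

def normalize_rows(counts):
    """Row-normalize a 2D count matrix, asserting the shapes that broadcasting relies on."""
    assert counts.ndim == 2, counts.shape
    sums = counts.sum(axis=1, keepdims=True)
    assert sums.shape == (counts.shape[0], 1), sums.shape
    return counts / sums
```

The asserts cost almost nothing and turn a silent wrong-axis division into an immediate, readable failure if the reduction is ever changed.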
this video (and the whole tutorial series) is extremely good. Breaking down a complex topic into its components and explaining them in a structured way is definitely not easy and requires a lot of work. Thank you for doing this, I'm learning a lot. One thing I noticed (just to be clear, this is not really critique but just something interesting I noticed): at 1:56:05 the results are not exactly equal - the third and fifth example generated are different. The last few characters are equal again, since the models just use the previous character to determine the next, so when both models reach the same character again they are likely to continue the same way.
reminded of ua-cam.com/video/B8C5sjjhsso/v-deo.html :D
@AndrejKarpathy 🤣🤣🤣
Thanks for writing this comment for all of us! Please keep up these videos. As Minjune said, you're a gem of an educator!
@AndrejKarpathy So lo no mo. Classic!
What a privilege to be learning from someone as accomplished as Andrej, all for free. The internet at its best🙏
Just what this is -- a privilege indeed!
We don't even have to pay tuition, or travel to Stanford.
I am not lucky; I am blessed!
absolutely!!!
So true!
Hopefully YouTube will be free FOREVER AND EVER, not like Medium or Towardsdatascience...
@ibadrather oh. Please go to Discord then, linked in the description. Sorry :\
I cannot imagine a better -- or kinder -- teacher.
He is feeding his audience knowledge and understanding, in small delicious bites, without worrying about their level of prior knowledge.
And he is smiling irrepressibly all the time!
Such a good person.
❤
Never in my life have I found an educator like this. This is free gold.
The reason this is such excellent teaching is that it's constructed bottom-up. It builds more abstract concepts from more concrete ones; generalizations follow concrete examples. At no point is there a situation in which the learner has to "just assume" something which will "become clear later on" (in most instances when a teacher says that, it doesn't; it just causes people to desperately try to guess the required knowledge on their own to fill in the gaps, distracting from anything that follows and producing mental discomfort). The bottom-up approach produces pleasure from a series of little "a-ha" and "I agree" moments and a general trust in the teacher.
I contrast this to the much worse fastai courses - in which there are lots of gaps and hand waving because of their top-down approach.
This is exactly my experience as well. Well said.
so in your opinion would taking the fastai course after this series be a better way to approach it? or are they overlapping/similar in terms of the topics?
@@MihikChaudhari I don't think you need a fastai course, you can just follow Andrej's series, maybe also watch some lectures on neural nets (he also has some) for the theoretical underpinning.
You're literally the best, man. These lessons are brilliant, hope you keep doing them. Thank u so much
Seeing him back to education is great. Hope to see some computer vision lessons 👍👌
@@talhakaraca If you search on this very site you will find truly clear lessons on computer vision from Andrej (from like 2016 or so)!
@@HarishNarayanan thanks a lot. i found it 🙏
@@talhakaraca You are very welcome. It was that lecture series that got me first informed and interested in deep learning.
The scale of impact these lectures will have is going to be enormous.
Please keep doing them and thanks a lot Andrej.
I am pretty confident that the impact of his CS231n course is bigger than even his work at Tesla. I know too many people working in machine learning who were introduced to the field by CS231n. It changed my life. Makes you wonder if he should just devote all his efforts to teaching. The impact is truly exponential.
Too many people are working on AI performance and not enough people are working on AI alignment. If this trend continues, the impact might be enormously negative.
@@XorAlex please explain
@@samot1808 AI alignment and the AI control problem are aspects of how to build AI systems such that they will aid rather than harm their creators. it basically means that we are like kids playing with plutonium and it won't take much for someone to turn it into a bomb (on purpose or by mistake) and make everyone's life a living hell. All that leads to a need for more regulation and oversight of the really advanced AI models because otherwise we may end up with AI generators that can take a photo of you and create a video showing you killing babies, or worse, an AI that self-replicates and takes over entire systems leading to collapsed economies and countries (or, maybe, even something like the Terminator).
@@samot1808 Hi, is there a way I can find his CS231n course online? I'm all the way in Africa.
Absolutely insane levels of detail you are going into. This Series is invaluable for beginners in the field as well as for people like me, who are building own models all the time, but want to go back to basics from time to time to not get stuck in wrong assumptions learned from fast success with Keras :D I really hope you will continue this Series for quite a while! Thanks a lot, AI Grand Master Andrej!
OMG, I feel soooo grateful for the internet! I would have never met a teacher this clear and to my needs in real life.
I have watched the famous Stanford courses before; they have set a standard in ML courses. It is always the Stanford courses and the rest. Likewise, this course is setting a new standard for hands-on courses. I'm only half an hour into the video, and I'm already amazed by the sensitivity, clarity and organization of the course. Many, many thanks for your generosity in stepping out and sharing your knowledge with numerous strangers around the world. Much indebted! Thank you!
Andrej, your videos are the clearest explanations of these topics I have ever seen. My hat is off to you. I wish you had taught my ML and NLP classes in college. There's a huge difference between your ground-up, code-first approach and the usual dry, academic presentation of these topics. It also demonstrates the power of YouTube as an educational tool. Thank you for your efforts!
I've been learning ML for a year now. I've read dozens of books, watched courses, and practiced myself. It always feels that I have 10x more theory knowledge than I use in practice. When I have to build something like this, I stare at the screen and can't decide what/where/how to use it. Andrej's courses are the ones that glue everything. It shows that it is not some abstract math you must memorize but the math you built logically on top of each other. Watch how well he explains the maximum likelihood and why it is used as a loss function. Brilliant. Thanks a lot, Andrej. I truly feel, you enjoy and love to teach!
I definitely agree with you. As a software developer, I noticed this too. Great professors from the best universities in the world spend hours and hours explaining theory and mathematics, drawing and writing. But they don't even write a single line of code. If I had the chance, I would shout at them: "Sir, should I make for example a speech recognition from the strange graphs you have drawn during these 30 hours?"
Examine quickly the worst software development course and you'll see that the instructor writes code on the screen for at least 70% of the course. No one should tell me that I studied ML and DL at this great university. We see how they educate😁😁😁
I love how you make the connections between the counting and gradient based approaches. Seeing the predictions from the gradient descent method were identical to the predictions from statistical probabilities from the counts was, for me, a big aha moment. Thank you so much for these videos Andrej. Seeing how you build things from the ground up to transformers will be fascinating!
I’ve literally never heard the "logits are log-counts, softmax turns them into probs" way of thinking before. Worth the ticket price alone!
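For anyone who wants to check that idea numerically, here is a small sketch in plain Python. The counts are made up for illustration. It treats the log of each count as a logit and confirms that softmax recovers the same probabilities as simply normalizing the counts:

```python
import math


def softmax(logits):
    """Exponentiate the logits and normalize so they sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


counts = [3, 1, 6]  # hypothetical bigram counts for one row of the table
probs_from_counts = [c / sum(counts) for c in counts]

logits = [math.log(c) for c in counts]  # log-counts play the role of logits
probs_from_softmax = softmax(logits)

print(probs_from_counts)
print(probs_from_softmax)
```

The two printed lists should agree up to floating-point rounding, which is the sense in which exponentiated logits behave like counts.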
Thank you Andrej. I can't stress enough how much I have benefited from and felt inspired by this series. I'm a 40 yo father with a young kid. Work and being a parent have consumed lots of my time. I have always wanted to learn ML/neural networks from the ground up, but a lot of the materials out there are just thick and dense and full of jargon. Coming from a math and actuarial background, I had kind of expected to be able to pick up this knowledge without too much stumbling, but seriously, not until your videos did I finally feel so strongly interested and motivated in this subject. It's really fun learning from you and coding along with you. I leave each of your lectures more energized than when it started. You're such a great educator, as many have said.
Lex guided me here. I loved your micrograd tutorial. It brought back my A level calculus and reminded me of my Python skills from years back - all whilst teaching me the basics of neural networks. This tutorial is now putting things into practise with a real-world example. Please do more of these, as you're sure to get more people into the world of AI and ML. Python is such a powerful language for manipulating data and you explain it really well by building things up from a basic foundation into something that ends up being fairly complex.
Another weekend watch! Epic to see these videos coming out! Thank you for all your efforts Andrej!
The video series featured on your channel undoubtedly stands as the most consequential and intuitive material I have encountered. The depth of knowledge gained from these instructional materials is significant, and the way you've presented complex topics with such clarity is commendable. I find myself consistently recommending them to both friends and colleagues, as I truly believe the value they offer to any learner is unparalleled.
The gratitude I feel for the work you've put into these videos is immense, as the concepts I've absorbed from your content have undoubtedly expanded my understanding and competence. This invaluable contribution to the field has benefited me tremendously, and I am certain it has equally enriched others' learning experiences.
You are the best teacher I've seen in my life. Thank you so much for doing these videos and materials for free. It is so clear to demonstrate how all the stuff is actually working.
A teacher that explains complex concepts both clearly and accurately. I must be dreaming.
Thank you Mr. Karpathy.
10 mins into the video, I'm amazed, smiling and feeling as if I've cracked the code to life itself (with the dictionary of bigram counts). of course, I can't build anything with the level of knowledge I have currently but I sure can appreciate how it works in a much better manner. I always knew that things are predicted based on their occurrence in data but somehow seeing those counts(for eg. for first 10 words, of `('a', ''): 7`) makes it so glaringly obvious which no amount of imagination could've done for me.
You are a scientist, researcher, high paid exec, knowledgeable, innovator but more than anything, you are the best teacher who can elucidate complex things in simple terms which then all make sense and seem obvious. And that requires not just mastery but a passion for the subject.
This video is the goldmine. It's so intuitive and easy to understand. Even my grad classes could not squeeze this information over a semester-long course. Hats off and it's a privilege to be learning from the accomplished AI master and the best. Thank you for the efforts Andrej :).
I love how you explain every step and function so your tutorial is accessible for non-python programmers as well. Thank you.
What a sense of "knowledge satisfaction" I have after watching your video and working out the details as taught. THANK YOU Andrej.
Andrej thanks for doing this. You can have a larger impact bringing ML to the masses and directly inspiring a new generation of engineers and developers than you could have managing work at Tesla.
No one on youtube is producing such granular explanations of neural network operations. You have an incredible talent for teaching! Please keep producing this series, it is so refreshing to get such clear, first-principles content on something I'm passionate about from someone with a towering knowledge of the subject.
The clarity from this video of all the fundamental concepts and how they connect blew my mind. Thank you!
Pure fundamentals...how frequentist and probabilistic views are connected and presented in such an elegant manner for building a mini-llm with actual code in practice is simply awesome. Thank you so much @AndrejKarpathy!
I love your clear and practical way of explaining stuff, the code is so helpful in understanding the concepts. Thanks a lot Andrej!
I love that he understood what a normal person wouldn’t understand and explained those parts.
You are an embodiment of humbleness, selflessness and kindness. For a person of your caliber, having led the Tesla Autopilot program as a director and being one of the founders of OpenAI, to come and teach the world for free is unheard of. You are a true inspiration for the whole world. I am truly blessed to learn from you. May GOD give you and your family a long, healthy, happy, prosperous and peaceful life.
Hi Andrej, I heard two days ago (from a Lex Fridman podcast) that you were thinking of pursuing something related to education. I was surprised and very excited, wishing that it was true and accessible. Today, I ran into your YouTube channel and I couldn't be happier. Thanks a lot for doing this and for sharing your valuable knowledge! The lectures are incredibly detailed and interesting. It's also very nice to see how much you enjoy talking about these topics; that's very inspiring. Again, thank you!
Your intro to neural networks video is amazing, especially how you focused on the raw mathematical fundamentals rather than just implementation. I can tell this is going to be another banger.
What a pleasure to watch! love the fact there is no shortcut, even for what may seem easy. Everything is well explained and easy to follow. It is very nice to show us the little things to watch for.
Andrej, i'm from Brazil and love ML and to code. I have tried several different classes with various teachers, but yours was by far the best. Simplicity with quality. Congratulations! I loved the class! Looking forward to taking the next ones. The best!.
Mastery is the ability to stay with fundamentals. Andrej derives the neural architecture FROM the counts-based model! So the log-counts, counts and probs are wrapped around the idea of how to get to probs similar to the counts model. Thus he explains why you need to log, why you need to normalize, and then introduces the name for it: softmax! What a way to teach.
Brilliant master stroke is when he shows that the samples from the neural model exactly match the samples from the counts model. Wow, I would not have guessed it and many teachers might not have checked it. The connection between 'smoothing' and 'regularization' was also a nice touch. Teaching the new concepts in terms of the known so that there is always a way to think about new ideas rather than taking them as given. For instance the expected optimal loss of the neural model is what one would see in the counts model. Thanks Andrej!
By the way, one way to interpret the loss is as perplexity, which is exp(loss). A loss of 2.47 gives exp(2.47) ≈ 11.8, so on average the model is about as uncertain about the next character as if it were choosing uniformly among roughly 12 characters.
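A quick way to check the perplexity reading of the loss, as a hedged sketch: it assumes the loss is the average negative log-likelihood measured in nats, and that the vocabulary has 27 tokens (26 letters plus one boundary token), neither of which is stated in this comment.

```python
import math

loss = 2.47  # average negative log-likelihood in nats, as quoted above
perplexity = math.exp(loss)

vocab_size = 27  # assumption: 26 letters plus one boundary token
uniform_loss = math.log(vocab_size)  # loss of a model that guesses uniformly

print(f"perplexity   = {perplexity:.2f}")
print(f"uniform loss = {uniform_loss:.2f}")
```

Under those assumptions the perplexity comes out near 11.8, against 27 for a model that guesses uniformly, so the bigram model roughly halves the effective number of choices per character.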
You're such a good teacher. Nice and steady pace of gradual buildup to get to the end result. Very aware of points where student might get lost. Also respectful of viewers time, always on topic. Even if I paid for this, I wouldn't expect this quality, can't believe I get to watch it for free. Thank you.
I just want to say, thank you Dr Karpathy, the way you explain concepts is just brilliant, you are making me fall in love with neural nets all over again
I think minimal, simple-as-possible code implementations, talked through, are just about the best possible way to learn new concepts. All power to you Andrej, and long live these videos.
This is just a wonderful series. Thank you so much!
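For everyone who wants to visualize this "network", here is the code. Run !pip install torchviz pillow in your notebook for this to work (rendering also needs the Graphviz system package to be installed):
    from torchviz import make_dot
    from PIL import Image

    def visualize(tensor):
        # Plain tensors have no named_parameters, so params is empty for them;
        # the check only matters if an object exposing named_parameters is passed.
        params = {name: p for name, p in tensor.named_parameters()} if hasattr(tensor, 'named_parameters') else {}
        # Writes computation_graph.png next to the notebook.
        make_dot(tensor, params=params).render("computation_graph", format="png")
        img = Image.open("computation_graph.png")
        img.show()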
pure gold content!!! this is how it should be taught! too many tutorials just jump into the "practical" examples without explaining the underlying fundamentals. great lesson, thank you for posting this! 🙏
This is one of the best educational series I have stumbled upon on YT in years!
Thank you so much Andrej
What an incredible resource - thank you Andrej. I especially enjoyed the intuitive explanation of regularization, what a smooth way of relating it to the simple count-matrix
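To make the count-matrix side of that analogy concrete, here is a small sketch in plain Python. The counts and the values of k are made up for illustration. It adds k fake counts to every entry before normalizing and shows the probabilities drifting toward uniform as k grows, which is the behaviour a stronger regularization penalty on the weights is meant to mimic:

```python
def smoothed_probs(counts, k):
    """Add k fake counts to every entry, then normalize to probabilities."""
    total = sum(counts) + k * len(counts)
    return [(c + k) / total for c in counts]


counts = [0, 2, 8]  # hypothetical counts for one row of the bigram table

for k in (0, 1, 100, 10_000):
    probs = smoothed_probs(counts, k)
    print(k, [round(p, 3) for p in probs])
```

With k = 0 the zero count stays at probability 0 (and an infinite loss if that bigram ever shows up); any positive k removes the zero, and a very large k washes out the data almost entirely.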
Another absolutely fantastic tutorial. The detail is incredible and the explanations are so clear. For anyone watching this after me, I feel that the micrograd tutorial is absolutely essential to watch first if you want to really understand things from the ground up. Here, for example, when Andrej runs the loss.backward() function, you'll know exactly what's happening, because you do it manually in the first lesson. I feel that the transition from micrograd (where everything is built from first principles) to makemore (relying on the power of PyTorch) leaves you with a surprisingly deep understanding of the fundamentals of language modeling. Really superb.
Taught me how to do the Rubik’s cube, now teaching me neural networks, truly an amazing teacher!
Besides having the best explanation of LLMs from this great teacher, you get a free hands-on Python course, which also has a better explanation than lots of others. Thx a lot Andrej!
There's so much packed in there. I spent the whole day on this and got to the 20 minute mark, haha. Great teacher, thank you for this logical and practical approach.
This is amazing, thank you so much for all the hard work you've been doing Andrej!
For anyone as confused as I first was about why both P[0].sum() and P[:, 1].sum() would add up to 1, note that this happens only because of the way N was constructed in the first place. Each row of N sums to the same amount as the corresponding column of N, so N[0].sum() == N[:, 0].sum(), N[1].sum() == N[:, 1].sum(), etc.
This is because each row has all the bigrams starting with a letter and each corresponding column has all the bigrams ending with that same letter. Another way to think about it is that each letter will be counted twice: once as the starting letter (adding 1 to the sum of its row) and once as the ending letter (adding 1 to the sum of its column). So the sums end up being equal, which means a bug like dividing each column by the sum of the corresponding row would result in a matrix with columns summing to 1!
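The argument above can be checked with a tiny sketch in plain Python. The three-word list is a made-up stand-in for the names dataset, and the nested lists stand in for the tensors N and P. It builds the count table with a '.' boundary token, compares row and column sums, and then reproduces the buggy normalization where entry (i, j) is divided by the sum of row j:

```python
words = ["emma", "olivia", "ava"]  # hypothetical stand-in for the names dataset

chars = sorted(set("".join(words)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0  # boundary token marks both the start and the end of a word
n = len(stoi)

# N[i][j] counts how often character j follows character i.
N = [[0] * n for _ in range(n)]
for w in words:
    seq = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(seq, seq[1:]):
        N[stoi[ch1]][stoi[ch2]] += 1

row_sums = [sum(row) for row in N]
col_sums = [sum(N[i][j] for i in range(n)) for j in range(n)]

# Buggy normalization: entry (i, j) is divided by the sum of row j, not row i.
P_bug = [[N[i][j] / row_sums[j] for j in range(n)] for i in range(n)]
bug_col_sums = [sum(P_bug[i][j] for i in range(n)) for j in range(n)]
bug_row_sums = [sum(row) for row in P_bug]

print(row_sums == col_sums)
print(bug_col_sums)
print(bug_row_sums)
```

In this toy case the columns of the buggy matrix sum to 1 while its rows do not, so checking only a column sum would not reveal the bug; checking a row sum would.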
The way he explained zip method even a beginner can understand. From very basic python to an entire language model. I can't thank this man enough for teaching us.
It is truly amazing how clearly you can explain things and how enjoyable your lesson is
Andrej, the elegance and simplicity of your code is beautiful and an example of the right way to write python
There are so many fantastic nuggets in these videos even for those already with substantial pytorch experience!
Andrej teaches a very complex topic in a simple, pedagogical way, and also gives away many invaluable tips on programming, Python, Torch, and how to approach solving a problem. I have been learning ML for several years, almost literally from scratch (I'm not an engineer, a statistician, or a programmer), and these lessons put many things in order in my head (aha moments, as someone in the thread put it); I understood much better the concepts and processes that I could barely intuit before. It really is like opening up the AI and seeing what it looks like inside. I recommend watching the videos in order; before this one I watched the micrograd video, and it felt incredible to understand everything. Truly, a thousand thanks for this contribution, Andrej.
Wow. Just WOW. Andrej you are simply too good! Thank you for sharing such valuable content on YouTube, hands down the best one around.
36:20 The way he showed us how an untrained model and a trained model differ in their output, oh man.... I really don't comment at all, but this has to be said: you are the best in this field. Really looking forward to learning more from you, sir.
This is a great overview of your work! It's impressive how you break down complex topics into manageable parts. The resources you've shared will surely help many people learn and practice. Looking forward to seeing more videos!
Recently I've been facing tough times, lost focus on my goal, and regretted my decisions, and I'm now working on self-improvement. I watched this video by Andrej (thank you so much for sharing your valuable ideas and time), which inspired me. Leaving this comment as a reminder; I'll return when I reach my goal 😊
new sub here.
i started w/ "let's build gpt: from scratch, in code, spelled out". i learned lot, enjoyed coding along, appreciated the thoughtful explanations.
i'm hooked & will be watching the makemore series.
thank you very much sir for sharing your knowledge.
Really incredible how you can explain clearly a complex subject only with raw material. Thanks a lot for the valuable knowledge
Thank you for the clarity, simplicity and quality. This is the best NN course I have ever seen
I’m an old-school software developer, revising machine learning for the first time since my undergrad studies. Back in the day we called them Markov Chains instead of Bigram Models. Thanks for the fantastic refresher!
Brilliant, simple, complete, accurate. I have only compliments. Thank you very much for one of the best classes I have had in my life!
You're lifting the lid on the black box and it feels like I'm sitting on a perceptron and watching the algos make their changes, forward and back. It has provided such a deeper understanding of the topics in the video. I have recommended it to my cohort of AI students, of which I am one, as supplementary learning. But to be honest, this is the way it should be taught. Excellent job, Andrej.
I love how you take NNs and explain them to us, not through the already built-in functions in PyTorch, but through how things work, and then give us the equivalent of it in PyTorch.
This is the first of your lessons I have watched and you really are one of the best teachers I've ever seen. Thank you for your efforts.
The density of information in these tutorials is hugeeeee.
I’ve just finished the whole playlist and, for some reason, I started from the last one (GPT Tokenizer), went through the ‘makemore’ ones, and finally watched this one. Each one is better than the other. I couldn’t appreciate more what you’re doing for the community of ‘homeless scientists’ (those who want to become better at their crafts but are not associated with an academic institution) out there, Andrej. The way you teach says a lot about how you learn and how you think others should learn. I hope to find more videos like yours and more people like you. Cheers!! 👏👏👏
Wow Andrej this is really good stuff! I have 7 years SWE exp at a FANG company and the way you explain your process is so nice to learn from! thank you for the great content!
I'm blown away! I've never thought I'd say this but I actually like ML for the first time, and I'm a simple SWE with little knowledge of ML and this is your first video I've seen.
Really interesting and easy to follow! thank you for being such a good teacher!
I went through building micro grad 3~4 times, It took me a week to understand a good portion of that and now started with this. I am really looking forward to going through this series. Thanks for doing this Andrej, you are amazing.
Thank you for posting these! It's extremely valuable. The end result of the neural net wasn't all that anticlimactic; at least the longer "names" did differ slightly, so it wasn't 100% the same weights as in the first case :)
Andrej’s way of explaining is exactly how I want things to be explained to me. It’s actually insane these high quality videos are free.
The level of pedagogy is so so good here; I love that you start small and build up and I particularly love that you pointed out common pitfalls as you went. I am actually teaching a course where I was going to have to explain broadcasting this term, but I think I am just going to link my students to this video instead. Really excellent stuff!
One small suggestion is to consider using Desmos instead of WolframAlpha if you just want to show a simple function.
Amazing delivery as always. The fact that he spent time explaining the broadcasting rules and some of the quirks of keepdim shows how knowledgeable he is, and that he knows most people struggle with little things like that on the way to what they need to do.
It is an excellent practical class. Thanks a million for your work. Now, I understand the concept of how prediction works for bigram in gradient descent. I have an aha moment of the current Deep Learning trend. It will definitely help people like me to jump start Generative AI.
Doing this again to really solidify the fundamentals and github copilot is hilarious. It's seen this code so many times that if it is enabled you can't actually type everything out for yourself! It all comes rolling out (pretty much perfect, tweaked to my style and ready to run) after the first character or two.
Amazing times.
Got to say, whoa. This is so good. So sick of fumbling about with tensors and this is a masterclass for sure. Thank you thank you thank you.
It's most inspiring that there are always such genuine people in the world.
Andrej is the MAN! Such a level of detail and explanation that I've yet to find anywhere. Thanks for these incredible videos! I am SO close to being done with this. Took a long time to get to the end (I used to do programming as a hobby in C++, Visual Basic, mIRC (and other IRC platforms) but that was years ago) so getting back in the mix has been a process. I am getting a "ValueError: only one element tensors can be converted to Python scalars" on the ix variable at the very end. I've commented out the old methods and typed the code as shown at 1:56:08 and can't seem to figure out the issue. The first rendition of ix works just fine, and commenting that out made no difference to this last section. Aye aye aye, so close!!
num_samples has to be 1 for the multinomial fn
@@adamderose9468 thanks for the tip! I found a single typo that was breaking things and managed to fix it a little while back. at this point I don't remember what it was, but it wasn't exactly where the error was showing, and again was just a measly typo LOL. gotta love it
Not exactly revolutionary but so damn well explained and resolved in PyTorch. This is ML pedagogy for the masses. I praise your efforts.
However, getting people to understand the nitty-gritty of a Transformer Language Model (like GPT), that will prove truly revolutionary!
Thank you so much. Your videos are artistic, using math as your palette to create these beautiful outcomes. They are mesmerizing.. Really appreciate it for making these videos, you are truly making the world a better place.
This lecture is well paced and introduces concepts one by one where later complex ones built on top of previous ones.
So cool to see the equivalence between the manually calculated model and the neural network model optimised with gradient descent. It's not quite the same output either. The regularization loss is required to get the two super close too. Pretty neat.
Andrej, you are simply amazing for doing this makemore series.
I do not usually comment on videos, have not commented in a very long time, I just want to say thanks for your work and that the AI world is probably crazy now, it is videos like these that help even trained engineers get a proper understanding of how the models are made and the thoughts behind it, and not just implement and run or spend hours debugging because of a bug like broadcasting...
The explanation of the bug that can happen is just phenomenal! Damn, this is good stuff.
Thank you for taking the time out of your busy days to teach others. This is a very kind act.
The music in this video is perfect. It really sets the mood and creates a powerful emotional connection.
I absolutely love this entire episode: high-quality content and very educational. Thanks a lot for doing this for the good of the general public.
You are incredible! This makes learning about ML so fun, your passion and knowledge really shine here. I’m a college student studying CS, and you lecture better than many professors. Not a knock on them though, props to you.
This tutorial is really great, very professional, and very vivid. I really learned a lot. Thank you teacher and educator!!!
This is like listening to a lecture by Richard Feynman. Super clear!
Thank you for the unique opportunity to learn how to write code from the most advanced developer, Andrej! An almost priceless and irreplaceable opportunity! Extremely useful and efficient!
Please DON'T STOP doing this! The world is so lucky to have you sharing this knowledge!
Note if you are following this in torch 2.0, the multinomial function might behave differently in getting the idx (3 instead of 13). Just downgrade to torch==1.13.1 if this bothers you.
Thank you! Was scratching my head a bit...
Edit: Actually I still can't get it to reproduce 13 (getting 3)....
Wow thank you. I've been getting a 10 on torch 2.1.0+cu118
@@RounakJain91 same, which is funny because when I left num_samples = 20, the first sample was 13, but 1 sample gives me 10...
Let me say it,
THE best educational series.
Sir, I don't have enough words to thank you.
Thanks for taking the time to explain broadcasting, the rules, and such a gentle introduction to Torch
This is how I'm spending my time off from Thanksgiving break. Watching this whole series 🍿
Andrej thank you. As a newcomer this is perfect introduction, i want to practice this approach with alternative languages and iterate what you have shown. Thanks again.
I love you Andrej! Learning from this series has been therapeutic for me. I’m glad you have decided to be full time educator. All the best with Eureka AI! It’s gonna make a positive impact on a lot of our lives. You da best!!❤
Thank you Andrej, this is absolutely the best hands-on coding neural nets & PyTorch tutorial. Special thanks for decoding cryptic PyTorch docs. Very, very useful!
Thanks for explaining things at such an incredible level of detail. (1:08:49) A couple of days ago I too was searching for the difference between torch.Tensor and torch.tensor. Passing dtype explicitly seems like good practice to me. Also, the caution on the broadcasting rules is very valid. Often the bugs in my PyTorch/TensorFlow code are due to unexpected broadcasting going undetected because of, say, an averaging or reduction operation. Putting a lot of asserts on the shapes helps me.
this video (and the whole tutorial series) is extremely good. Breaking down a complex topic into its components and explaining them in a structured way is definitely not easy and requires a lot of work. Thank you for doing this, I'm learning a lot.
One thing I noticed (just to be clear, this is not really critique but just something interesting I noticed): at 1:56:05 the results are not exactly equal - the third and fifth example generated are different. The last few characters are equal again, since the models just use the previous character to determine the next, so when both models reach the same character again they are likely to continue the same way.
Absolutely love these videos, Andrej! I wish i stumbled upon these sooner but glad i'm here now.