Sourish Kundu
United States
Joined Aug 14, 2023
Hi all! I produce content about computer science and technology in general, along with some of my thoughts on life. I believe that in order to become the best version of yourself, one must find true passion and joy in what they work on and this channel is me sharing that with others.
I recently graduated from the University of Wisconsin-Madison, majoring in Computer Science, Data Science, & Economics. One thing that was really hard for me during this period was preparing for internships and getting offers. I also want to share some of my best tips and advice for current college students so hopefully their experiences are a little bit smoother than mine!
All content on this channel is produced by and is the intellectual property of Sourish Kundu LLC.
Day in the Life of a Machine Learning Systems Engineer @ TikTok Bay Area!
Join me on a typical day as a college new grad balancing my work as a machine learning systems engineer at TikTok with my YouTube channel. Life after college has converged onto a much more predictable routine for me and I'm really excited to share that with you guys!
*Disclaimer: All opinions are my own, and do not represent the position or opinions of the Company.*
Resources:
Galvatron Paper: arxiv.org/abs/2211.13878
MegaBlocks Paper: arxiv.org/abs/2211.15841
Timestamps:
0:00 - Good Morning!
0:46 - Morning Routine & YouTube
1:26 - Get Ready for Work
2:02 - Driving to Work
2:37 - Workout
3:10 - Starting the Work Day
4:10 - What I do at TikTok
6:09 - Lunch
6:33 - After Lunch
6:56 - Dinner
7:27 - Driving Home
7:44 - YouTube
8:45 - Nighttime Routine
9:46 - The Almanack of Naval Ravikant
11:52 - Good Night!
Views: 744
Videos
Who's Adam and What's He Optimizing? | Deep Dive into Optimizers for Machine Learning!
41K views · 1 month ago
Welcome to our deep dive into the world of optimizers! In this video, we'll explore the crucial role that optimizers play in machine learning and deep learning. From Stochastic Gradient Descent to Adam, we cover the most popular algorithms, how they work, and when to use them. 🔍 What You'll Learn: Basics of Optimization - Understand the fundamentals of how optimizers work to minimize loss funct...
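To make the optimizer basics in this description concrete, here is a minimal sketch of a single Adam step in plain Python. This is not the video's code: the toy loss L(w) = w², the starting point, and the loop length are illustrative choices; the hyperparameters are the common defaults from the Adam paper.

```python
def grad(w):
    # Gradient of the toy loss L(w) = w^2
    return 2.0 * w

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g          # first moment (momentum)
    v = b2 * v + (1 - b2) * g * g      # second moment (RMS scaling)
    m_hat = m / (1 - b1 ** t)          # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    return w, m, v

w, m, v = 2.0, 0.0, 0.0
for t in range(1, 5001):
    w, m, v = adam_step(w, grad(w), m, v, t)
# w ends up near the minimum at 0
```

The adaptive scaling means the step size is roughly the learning rate regardless of the raw gradient magnitude, which is why Adam is often a robust default.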
Building My Ultimate Machine Learning Rig from Scratch! | 2024 ML Software Setup Guide
9K views · 1 month ago
Join me on an exhilarating journey as I dive into the world of Machine Learning by building my very own ML rig from scratch! This video is your all-in-one guide to assembling a powerful machine learning computer, explaining every component's role in ML tasks, and setting up the essential software to get you up and running in the world of artificial intelligence. 🔧 What's Inside: Component Selec...
Welcome to NVIDIA GTC 2024! | Let's Go to the World's Premiere AI Conference
217 views · 2 months ago
Join me on an incredible journey to NVIDIA GTC 2024, the groundbreaking tech conference where the future of AI, gaming, and graphics comes alive! In this vlog, I dive into the heart of innovation, bringing you exclusive insights, interviews, and a firsthand look at the latest breakthroughs in technology. 🚀 Highlights Include: Keynote Speeches: Get the lowdown on the cutting-edge announcements f...
Transform Any Room into an Art Gallery with AR! || Intro to Augmented Reality with SwiftUI - Part 2
154 views · 3 months ago
Welcome back to Part 2 of our exciting journey into Augmented Reality app development! Building on the basics covered in Part 1, this tutorial takes your AR skills further by introducing dynamic picture frames and interactive features that bring your wall art to life. Using Apple's advanced RealityKit and ARKit libraries, we'll enhance our app to make your memories not just visible but interact...
Transform Any Room into an Art Gallery with AR! || Intro to Augmented Reality with SwiftUI - Part 1
232 views · 3 months ago
In this tutorial, we dive into the fascinating world of Augmented Reality (AR) by creating an app that brings your memories to life right on your walls! Using Apple's powerful RealityKit and ARKit libraries, I'll guide you step-by-step on how to develop an AR application that allows you to place pictures from your camera roll onto any wall in your home, office, or anywhere you like. Each image ...
Cascadia Code Calls to Coders! || Why and How to Customize Your IDE's Font
644 views · 4 months ago
👨💻🔍 In this quick chat, we explore Cascadia Code, the innovative font designed specifically for developers and its impact on the coding experience. 🌟 What's Inside: 0:00 - Intro 0:45 - Why Cascadia Code 1:48 - Ligatures 2:27 - Setup & Installation 3:51 - Conclusion 🖥️ Discover the ergonomic benefits of Cascadia Code's design, see how its unique features like ligatures and spacing enhance reada...
Let's Create a NAS! || An Automated Backup Solution with the Zimaboard, TrueNAS, & Proxmox
701 views · 4 months ago
Today, we'll be creating a NAS, or a Network Attached Storage to store all of my files. We'll talk about how to set one up using TrueNAS inside of a Proxmox virtual machine. After discussing redundancy and backups, we'll also proceed to create an automated backup solution that backs up our NAS to BackBlaze's servers. 📘 Chapters: 0:00 - Intro 0:49 - Parts Used for the Build 3:06 - Zimaboard BIOS...
Uncovering Meaning Amidst Randomness! || A Beginner's Guide to Monte Carlo Integration
747 views · 5 months ago
🎲 Welcome to our deep dive into Monte Carlo Integration! 🎲 In this video, we're unraveling the mysteries of one of the most intriguing and powerful mathematical techniques used in various fields from finance to physics: Monte Carlo Integration. Perfect for students, professionals, and anyone with a curiosity for mathematics and computational methods! 🔍 What You'll Learn: 1. Monte Carlo Integrat...
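The basic technique the description refers to can be sketched in a few lines of Python: average the function at uniform random points and scale by the interval width. The integrand x² on [0, 1] (true value 1/3) and the sample count are illustrative choices, not taken from the video.

```python
import random

def mc_integrate(f, lo, hi, n=100_000, seed=0):
    # Estimate the integral of f over [lo, hi] by averaging f at
    # n uniform random points, then scaling by the interval width.
    rng = random.Random(seed)
    total = sum(f(lo + (hi - lo) * rng.random()) for _ in range(n))
    return (hi - lo) * total / n

est = mc_integrate(lambda x: x * x, 0.0, 1.0)
print(est)  # close to the true value 1/3
```

The estimate's error shrinks like 1/sqrt(n) regardless of dimension, which is what makes the method attractive in finance and physics.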
Understanding Bloom Filters || How to Save Space at the Cost of Certainty!
458 views · 5 months ago
Welcome to our in-depth exploration of Bloom Filters! In this video, we demystify this advanced data structure, making it accessible and understandable for beginners. We'll be inserting our favorite fruits into the bloom filter and learning what the catch is when we go to retrieve them! 🔍 What You'll Learn: - Conceptual Overview: Get a clear understanding of what Bloom Filters are and their uni...
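The fruit-insertion idea from the description can be sketched with a tiny Bloom filter in Python. The bit-array size `m`, the hash count `k`, and the salted-SHA-256 hashing are arbitrary illustrative choices, not the video's implementation.

```python
import hashlib

class BloomFilter:
    # m bits and k salted hash functions; kept deliberately small,
    # so false positives are plausible but false negatives never occur.
    def __init__(self, m=64, k=3):
        self.m, self.k = m, k
        self.bits = 0

    def _positions(self, item):
        # Derive k bit positions by hashing the item with k salts
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # True may be a false positive; False is always correct
        return all(self.bits >> p & 1 for p in self._positions(item))

bf = BloomFilter()
for fruit in ["apple", "banana", "cherry"]:
    bf.add(fruit)
print(bf.might_contain("apple"))  # True: inserted items are never missed
```

That asymmetry is "the catch" the description mentions: you save space by storing only bits, at the cost of occasionally being told an item is present when it is not.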
How to Find the Best Apartment with Optimal Stopping Theory || The Secretary Problem Explained
593 views · 6 months ago
🔍 Unraveling the Mysteries of the Secretary Problem! 🧠 Welcome to our deep dive into the fascinating world of the Secretary Problem, also known as the Marriage Problem or the Best Choice Algorithm! This mathematical puzzle has perplexed and intrigued researchers and enthusiasts alike. In this video, we'll explore the intricacies of this classic problem, delving into its history, mathematical fo...
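The classic 1/e stopping rule from the video's topic can be simulated in a few lines of Python: skip the first n/e candidates, then take the first one better than everything seen so far. The apartment scores, n = 50, and trial count are illustrative assumptions.

```python
import math
import random

def best_choice(candidates, rule=1 / math.e):
    # Observe-then-commit: reject the first n/e candidates, then
    # accept the first one that beats that benchmark.
    n = len(candidates)
    cutoff = int(n * rule)
    benchmark = max(candidates[:cutoff], default=float("-inf"))
    for c in candidates[cutoff:]:
        if c > benchmark:
            return c
    return candidates[-1]  # forced to take the last one

rng = random.Random(42)
trials = 10_000
wins = 0
for _ in range(trials):
    apartments = [rng.random() for _ in range(50)]
    if best_choice(apartments) == max(apartments):
        wins += 1
print(wins / trials)  # close to the theoretical ~37% success rate
```

The striking result is that this success probability (~1/e) does not decay as n grows.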
Landing that First Tech Internship and Beyond! || Advice I Wish I Had as a CS Freshman
400 views · 7 months ago
👨💻 Are you a computer science student struggling to land that all-important first internship? Look no further! In this video, I share my personal journey and the strategies that helped me secure my first internship in the competitive field of computer science. 🔍 What You'll Learn: Resume Must-Haves: Discover the key elements every computer science resume should include to catch a recruiter's e...
The Ultimate Guide to UW-Madison || Why You Should Apply!
4.6K views · 7 months ago
Welcome to the world of the Wisconsin Badgers! 🦡 If you're a high school senior contemplating your college choices, this guide is specially crafted for you. Dive into the heart of UW-Madison with me as I share my personal journey, lessons learned, and the myriad reasons why this iconic institution might just be your dream college destination. In this video, we'll explore: - The unparalleled aca...
Train AI to Beat Super Mario Bros! || Reinforcement Learning Completely from Scratch
13K views · 8 months ago
Today we'll be implementing a Reinforcement Learning algorithm named the Double Deep Q Network algorithm. A lot of other videos will use a library like Stable Baselines, however, today we'll be building this completely from scratch. It'll be used to train the computer to play Super Mario Bros on the NES! This is a tutorial aimed at people that have a base level understanding of ML, but not nece...
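The video builds the deep (neural-network) version; the double-estimator idea at its core can be shown in tabular form. This toy two-state chain, its rewards, and all constants are made up for illustration and are not the video's Mario setup.

```python
import random

def double_q_update(qa, qb, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Double Q-learning: select the greedy next action with one table...
    best_next = max(range(len(qa[s_next])), key=lambda i: qa[s_next][i])
    # ...but evaluate it with the other, which reduces the
    # overestimation bias of plain Q-learning.
    target = r + gamma * qb[s_next][best_next]
    qa[s][a] += alpha * (target - qa[s][a])

# Two states, two actions; state 1 is absorbing with zero value.
qa = [[0.0, 0.0], [0.0, 0.0]]
qb = [[0.0, 0.0], [0.0, 0.0]]
rng = random.Random(0)
for _ in range(2000):
    a = rng.randrange(2)
    r = 1.0 if a == 0 else 0.0   # action 0 is the rewarding one
    if rng.random() < 0.5:       # randomly pick which table to update
        double_q_update(qa, qb, 0, a, r, 1)
    else:
        double_q_update(qb, qa, 0, a, r, 1)
print(qa[0])  # the value of action 0 approaches 1
```

In the deep version, the two tables become the online and target networks, but the select-with-one, evaluate-with-the-other structure is the same.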
10:19 What a weird formula for NAG! It's much easier to remember a formulation where you always take the antigradient. You want to *add* the velocity and take the gradient with a *minus*. The formula just changes to:
V_{t+1} = b * V_t - a * grad(W_t + b * V_t)
W_{t+1} = W_t + V_{t+1}
It's more intuitive and more similar to standard GD. Why would anyone want to change these signs? How often do you subtract velocity to update the position? Do you want to *add* the gradient to update V right after you explained we want to subtract the gradient in general to minimize the loss function? It makes everything twice as hard and just... wtf...
Hi! Thanks for bringing this up! I've seen the equation written in both forms, but I probably should've gone with the one you suggested! This is what I was referring to for the equation: www.arxiv.org/abs/1609.04747
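The commenter's preferred formulation can be sketched in plain Python. The toy loss L(w) = w² and the values of `a` (learning rate) and `b` (momentum) are illustrative choices, not from the video.

```python
# Nesterov accelerated gradient (NAG) in the "add velocity,
# subtract gradient" form suggested in the comment:
#   V_{t+1} = b * V_t - a * grad(W_t + b * V_t)
#   W_{t+1} = W_t + V_{t+1}

def grad(w):
    # Gradient of the toy loss L(w) = w^2
    return 2.0 * w

def nag_step(w, v, a=0.1, b=0.9):
    # Evaluate the gradient at the "look-ahead" point w + b*v
    v_new = b * v - a * grad(w + b * v)
    return w + v_new, v_new

w, v = 5.0, 0.0
for _ in range(100):
    w, v = nag_step(w, v)
# w spirals in toward the minimum at 0
```

Written this way, setting b = 0 recovers vanilla gradient descent, which is what makes the sign convention easy to remember.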
Great, clear, and thorough content. I look forward to seeing more! 🤓
Awesome, thank you!
I echo other comments; this is such a great video and you can see the effort put in, and you present your knowledge really well. Keep it up :)
Wow, thank you!
Thanks for the great explanations! The graphics and benchmark were particularly useful.
I'm really glad to hear that!
I tried using momentum for a 3SAT optimizer I worked on in 2010. It doesn't help with 3SAT since all variables are binary. It's cool that it works with NNs though!
Oh wow that's an interesting experiment to run! Glad you decided to try it out
I'm new to coding in general. How do I make a Python file like you did?
Hi! I would recommend starting by setting up an IDE such as VSCode, along with a local Python environment with Anaconda. Then, you'll need to install and set up PyTorch, the instructions for which are in the repo. There are plenty of resources online about how to get started with programming, so I would treat Google as your best friend!
check out fracm optimizer (Chen et al. - An Adaptive Learning Rate Deep Learning Optimizer Using Long and Short-Term Gradients Based on G-L Fractional-Order Derivative)
i wonder if you can combine nag and fractional gradients to make a generally even better optimizer
very clearly explained - thanks
Glad you liked it
is the code available?
Unfortunately, the code for the animations is not ready for the public haha. It's wayyy too messy. I also didn't include the code for the optimizers, because while the equations are straightforward to implement, how you use the gradients to update weights depends greatly on how the rest of the code is structured.
What is the total Cost of this Setup?
Hi! The total cost was about 2.8k although some parts I probably should’ve gone cheaper on like the motherboard. I have a full list of the parts in the description
@@sourishk07 Thank you! I did not notice the sheet in the description. Very Helpful!
This video is super helpful my god thank you
I’m really glad you think so! Thanks
I might just wanna attend GTC next year, nice video!
Love to hear it! Hopefully I’ll see you there
the implementation has more information than a whole semester of my MSc in AI
Haha love to see that. Thanks for watching!
Sorry, did I misunderstand something, or did you say SGD when it was only GD you talked about? When were stochastic elements discussed?
I guess technically I didn't talk about how the dataset was batched when performing GD, so no stochastic elements were touched upon. However, I just used SGD as a general term for vanilla gradient descent, like how PyTorch and TensorFlow's APIs are structured.
@@sourishk07 I see! It would be interesting to see if/how the stochastic element helps with the landscape l(x, y) = x^2 + a|y| or whatever that example was :)
Am I the only one who doesn't understand the RMSProp math formula? Is the gradient squared per component, or is it the Hessian? How do you divide a vector by another vector? Could someone explain it to me, please?
Hi! Sorry, this is something I should've definitely clarified in the video! I've gotten a couple other comments about this as well. Everything in the formula is component-wise. You square each element in the gradient matrix individually & you perform component-wise division, along with the component-wise square root. Again, I really apologize for the confusion! I'll make sure to make these things clearer next time.
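The component-wise reading described in the reply can be made explicit with a small Python sketch. The toy loss, learning rate, and decay constant are illustrative choices, not the video's code.

```python
# RMSProp with every operation done element-by-element:
# square each gradient component, keep a running average of the
# squares, then divide each component of the update by the
# component-wise square root.
def rmsprop_step(w, g, s, lr=0.01, beta=0.9, eps=1e-8):
    # w: weights, g: gradient, s: running average of squared grads
    s_new = [beta * si + (1 - beta) * gi * gi for si, gi in zip(s, g)]
    w_new = [wi - lr * gi / ((si ** 0.5) + eps)
             for wi, gi, si in zip(w, g, s_new)]
    return w_new, s_new

# Toy loss L(w) = w0^2 + 100 * w1^2, so the two gradient
# components differ wildly in scale.
def grad(w):
    return [2 * w[0], 200 * w[1]]

w, s = [3.0, 3.0], [0.0, 0.0]
for _ in range(800):
    w, s = rmsprop_step(w, grad(w), s)
# both coordinates approach 0 at a similar pace
```

Because each coordinate is normalized by its own gradient history, the steep w1 direction no longer dominates the step, which is exactly the per-parameter adaptation the formula encodes.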
what a title 😂
Appreciate the visit!
Very Clear Explanation! Thank you. I especially appreciate the fact that you included the equations.
Thank you! And I’m glad you enjoyed it
I wonder if we could use the same training loop NVIDIA used in the DrEureka paper to find even better optimizers.
Hi! Using reinforcement learning in the realm of optimizers is a fascinating concept and there's already research being done on it! Here are a couple cool papers that might be worth your time: 1. Learning to Learn by Gradient Descent by Gradient Descent (2016, Andrychowicz et al.) 2. Learning to Optimize (2017, Li and Malik) It would be fascinating to see GPT-4 help write more efficient optimizers though. LLMs helping accelerate the training process for other AI models seems like the gateway into AGI
@@sourishk07 Thanks for the answer!
The intuition behind why the methods help with convergence is a bit misleading imo. The problem is not, in general, slow convergence close to the optimum because of a small gradient; that can easily be fixed by letting the step size depend on the gradient size. The problem they solve is when the iterations zig-zag because of large components in some directions and small components in the direction you actually want to move. By averaging (or similarly using past gradients) you effectively cancel out the components causing the zig-zag.
Hello! Thanks for the comment. Optimizers like RMSProp and Adam do make the step size dependent on the gradient size, which I showcase in the video, so while there are other techniques to deal with slow convergence close to the optimum due to small gradients, these optimizers still help. Maybe I could've made this part clearer though. Also, from my understanding, learning rate decay is a pretty popular technique, so wouldn't that slow down convergence even more as the learning rate decays and the loss approaches the area with smaller gradients? However, I definitely agree with your bigger point about these optimizers preventing the loss from zig-zagging! In my RMSProp example, I do show how the loss is able to take a more direct route from the starting point to the minimum. Maybe I could've showcased an example where SGD zig-zags more prominently to further illustrate the benefit that RMSProp & Adam bring to the table. I really appreciate you taking the time to give me feedback.
@@sourishk07 Yeah, I absolutely think the animations give good insight into the different strategies within "moment"-based optimizers. My point was more that even with "vanilla" gradient descent methods, the step sizes can be handled so they don't vanish as the gradient gets smaller, and that the real benefit of the other methods is altering the _direction_ of descent to deal with situations where the eigenvalues of the (locally approximate) quadratic form differ by orders of magnitude. But I must also admit that (especially in the field of machine learning) the name SGD seems to be more or less _defined_ to include a fixed decay rate of step sizes, rather than just the method of finding a step direction (where finding step sizes would be a separate (sub-)problem), so your interpretation is probably more accurate than mine. Anyway, thanks for replying and I hope you continue making videos on the topic!
Absolutely loved the graphics and intensive paper based proof of working of different optimizers , all in the same video. You just earned a loyal viewer.
Thank you so much! I'm honored to hear that!
The “problem” the Adam algorithm is presented to solve here (the one with local and global minima) is simply wrong. In a small number of dimensions this is in fact a problem, but the condition for the existence of a local minimum grows more and more restrictive with the number of dimensions. So in practice, when you have millions of parameters and therefore dimensions, local minima that aren't the global minimum will simply not exist; the probability of such existence is unfathomably small.
Hi! This is a fascinating point you bring up. I did say at the beginning that the scope of optimizers wasn't just limited to neural networks in high dimensions, but could also be applicable in lower dimensions. However, I probably should've added a section about saddle points to make this part of the video more thorough, so I really appreciate the feedback!
This Server is a dream 😄
Haha stay tuned for a more upgraded one soon!
I used to have networks where the loss was fluctuating in a very periodic manner every 30 or so steps and I never knew why that happened. Now it makes sense! It just takes a number of steps for the direction of Adam weight updates to change. I really should have looked this up earlier.
Hmm while this might be Adam's fault, I would encourage you to see if you can replicate the issue with SGD w/ Momentum or see if another optimizer without momentum solves it. I believe there are a wide array of reasons as to why this periodic behavior might emerge.
why not using a metaheuristic approach?
Hi! There seem to be many interesting papers about using metaheuristic approaches with machine learning, but I haven't seen too many applications of them in industry. However, this is a topic I haven't looked too deeply into! I simply wanted to discuss the strategies commonly used in modern deep learning, and maybe I'll make another video about metaheuristic approaches! Thanks for the idea!
Great video dude!
Thanks so much! I've seen your videos before! I really liked the ones on Policy Gradient methods & Importance Sampling!!!
@@sourishk07 thanks! There was some hard work behind them, so I'm happy to hear they're appreciated. But I don't need to tell you that. This video is a masterpiece!
I really appreciate that coming from you!!
Gemini 1.5 Pro: This video is about optimizers in machine learning. Optimizers are algorithms used to adjust the weights of a machine learning model during training. The goal is to find the optimal set of weights that will minimize the loss function. The video discusses four different optimizers: Stochastic Gradient Descent (SGD), SGD with Momentum, RMSprop, and Adam.
- Stochastic Gradient Descent (SGD) is the simplest optimizer. It takes a step in the direction of the negative gradient of the loss function. The size of the step is determined by the learning rate.
- SGD with Momentum is a variant of SGD that takes into account the history of the gradients. This can help the optimizer converge more quickly.
- RMSprop is another variant of SGD that adapts the learning rate for each parameter of the model. This can help prevent the optimizer from getting stuck in local minima.
- Adam combines the ideas of momentum and adaptive learning rates. It is often considered a very effective optimizer.
The video also discusses the fact that different optimizers can be better suited for different tasks. For example, Adam is often a good choice for training deep neural networks. Here are some of the key points from the video:
- Optimizers are algorithms used to adjust the weights of a machine learning model during training.
- The goal of an optimizer is to find the optimal set of weights that will minimize the loss function.
- There are many different optimizers available, each with its own strengths and weaknesses.
- The choice of optimizer can have a significant impact on the performance of a machine learning model.
Thank you Gemini for watching, although I'm not sure you learned anything from this lol
Very nicely explained. Wish you brought up the relationship between these optimizers and numerical procedures though. Like how vanilla gradient descent is just Euler's method applied to a gradient rather than one derivative.
Thank you so much. And there were so many topics I wanted to cram into this video but couldn't in the interest of time. That is a very interesting topic to cover and I'll add it to my list! Hopefully we can visit it soon :) I appreciate the idea
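The relationship the commenter mentions is easy to demonstrate: applying Euler's method to the gradient-flow ODE dW/dt = -grad L(W) with step size h gives exactly one gradient descent step with learning rate h. The toy loss L(w) = w² below is an illustrative choice.

```python
def grad(w):
    return 2.0 * w  # gradient of the toy loss L(w) = w^2

def euler_step(w, h):
    # Euler's method on dw/dt = f(w) is w + h * f(w);
    # here f(w) = -grad(w), the gradient-flow ODE.
    return w + h * (-grad(w))

def gd_step(w, lr):
    return w - lr * grad(w)  # vanilla gradient descent

w0 = 4.0
print(euler_step(w0, 0.1) == gd_step(w0, 0.1))  # True
```

Seen this way, fancier optimizers correspond to fancier numerical integrators or modified dynamics for the same underlying flow.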
7hrs of work per day, that's pretty sweet. Wondering how many hours your Chinese colleagues do...
It really depends on the team! My work life balance is pretty good, but some nights I do have to work after I get back home!
Gem of a channel!
Thank you so much!
what a good video, I watched it and bookmarked so I can come back to it when I understand more about the topic
Glad it was helpful! What concepts do you feel like you don’t understand yet?
I don't know what I did for YouTube to randomly bless me with this gem of a channel, but keep your work up, man. I love your content; it's nice to see people with similar passions.
I’m really glad to hear that! Thanks for those kind words
This is so cool, I'm definitely gonna try this when I get my hands on some extra hardware. Amazing video. I can also imagine this must be pretty awesome if you're a scientist/student at a university that needs a number-crunching machine, since you're not limited to being at your place or a PC lab.
Yes, I think it's a fun project for everyone to try out! I learned a lot about hardware and the different software involved
Just found out your channel. Instant follow 🙏🏼 Hope we can see more Computer Science content like this. Thank you ;)
Thank you so much for watching! Don't worry, I have many more videos like this planned! Stay tuned :)
I need help. I tried using the code and the trials are being saved somewhere, but I can't find them. Can you tell me where they are getting stored? Edit: I found it. It was stored in the C:\Users\(UserName)\AppData\Local\Temp folder.
If you're simply running main.py, then the checkpoints should be saved in the same directory as main.py under a folder titled 'output.' Let me know if that's what you were looking for!
@@sourishk07 what do I do if I can't find the output folder?
@@simsimhaningan Are there any errors while running main.py? My guess is you're not in the same folder as main.py when you run it. Make sure you're in the root directory of the repository when you run main.py!
Why do you move your head so much
LMAO idk man...
I remember when my teacher gave me an assignment on optimizers. I went through blogs, papers, and videos, but everywhere I saw different formulas and I was so confused. You explained everything in one place very easily.
I'm really glad I was able to help!
love that title haha
Haha thank you!
Nice vid! I'd mention MAS too, to explicitly say that Adam is weaker at the start and can fall into local minima (until it gets enough data), while SGD performs well early with its stochasticity and then becomes slower, so both methods performed nearly like I mentioned in the MAS paper.
Thank you for the feedback! These are great things to include in a part 2!
1.54k subs is crazy low for this quality. Remember me when you make it, my boy <3
Thank you for those kind words! I'm glad you liked the video
Great, great, great!!!
Thanks!!!
Great video!!
Glad you enjoyed it
Thank you so much, sir. But I would like you to create videos on upconvolutions or transposed convolutions. Thank you for understanding
Hi! Thank you for the great video ideas. I'll definitely add those to my list!
Great video! You forgot to add the paper links in the description :)
I'm glad you enjoyed it! And thank you for the catch; the description has been updated!
Is this system good for inference? Will Llama 70b run on this? I wonder whether RAM really compensates for the VRAM
Hello! That's a good question. Unfortunately, 70b models struggle to run. Llama 13b works pretty well. I think for my next server, I definitely want to prioritize more VRAM
Hi, what is your experience with this rig? Is the temperature not a problem given that the case is so tight?
The temperature has not been an issue with the same case size
this is por**graphy
LMAO
I love it! You are also nice to hear and see! :D
Haha thank you very much!
Loved this vid
Thank you Raunak!
a “week” in the life😭😂
Don't expose me 😭😂
U had me for a sec with the vanilla mangoes😭😭😂😂😂😂😭😭
HAHAHA had to keep you on your toes!