Some days the internet makes me sad. Other days it reminds me of all the people with the same niche interests as me and how incredibly talented some of them are. Thanks for putting so much effort into this :)
Thanks a lot. Glad you liked the video 😃
*skilled. not talented. talent is god-given. skill is developed through practice. I think it's a little disrespectful to call someone talented - almost makes it seem like they didn't work for it.
🙂
"MINI" Project? What the heck?! You just munched a lot of hard to grasp technical implementations, coded a working example, shared it on your blog, AND made a fully animated video about it!! You make me mad.
😅
That was clickbait for me, cause it ain't MINI at all
Great stuff Tushar. I have been keen on learning GPU programming so great to see your videos in my feed. Keep it up and all the best.
Great to hear! 😀
Yay, CUDA video. Feel like my timeline has been needing CUDA content
I am reading the CUDA C programming book and your videos are super helpful in visualizing the memory access process! Thank you very much!
Glad it was helpful!
dude this is actually amazing. you’re the cs version of 3b1b… keep up the great work!
Thanks a lot! I appreciate it 😃
Hey, I’ve been going through the Programming Massively Parallel Processors book lately and doing some CUDA and this was a GREAT video!!!
Thanks a lot! Glad the video was helpful!
Good job. The reason workgroups are laid out in 1d/2d/3d grids is that all GPU compute APIs were first designed and implemented on top of existing graphics concepts where calculating outputs as e.g. 2D grids is a natural thing.
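A quick sketch of that graphics heritage in CUDA terms: a 2D grid of 2D blocks assigns each thread one (x, y) output element, exactly the way a fragment shader owns one pixel. The kernel and names below are hypothetical illustrations, not code from the video:

    // One thread per output element of a width x height grid.
    __global__ void fill2D(float* out, int width, int height) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;  // horizontal, like a pixel column
        int y = blockIdx.y * blockDim.y + threadIdx.y;  // vertical, like a pixel row
        if (x < width && y < height)
            out[y * width + x] = (float)(x + y);        // any per-element computation
    }

    // Launch: round the grid up so every element is covered.
    // dim3 block(16, 16);
    // dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    // fill2D<<<grid, block>>>(d_out, width, height);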
I am from an embedded systems background but I love your work. Keep going brother, just don't quit! There's always an audience for great content.
Thanks a lot! I’m in this for the long run 😃
This is not a "mini" project, you made some real content here! Fantastic video, congrats!
Thanks! 😃
Super satisfying to see Manim used to show the algorithm like that.
Glad you enjoy it!
Great video. I love the simplicity, and the great explanation.
Thanks. Glad you found it useful.
Crazy good manim skills! perfect video
Appreciate it!
This is so beautiful and magnificent to see ❤❤❤🎉
Thank you so much!
Beautiful visualization!! i am enjoying watching your videos. Keep up the good work
Thank you! Cheers!
Thanks for the research! Keep going! I would like to see other algorithms being run and optimized on GPUs...
Your content is very helpful and your teaching method is great
Beautiful video as usual. I'm motivated to pick up PMPP after the semester ends just from watching your videos!
Thanks a lot and good luck 😃
High quality content. Subscribed.
Thanks a lot. I really appreciate it 😃
Amazing! 🎉
Thank you!
Gonna enjoy this knowledge
yay i know a little about this now, thank you!!
I'm too dumb to understand this, but I know it's something good. I'll understand it someday
really nice stuff, thanks
Glad you liked it!
i study linear algebra and i'm shocked right now cuz it's important for programming the hardware
Nice Manim work!
Can you talk about scan operations like the Blelloch and Hillis-Steele algorithms? Maybe you've already talked about them; I haven't seen all your videos. It would be nice to know in what contexts they're used, and a cool visualisation of them would be great too.
I can take these topics for some future videos. Thanks a lot for suggesting. 😀
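For anyone curious in the meantime: Hillis-Steele is the simpler of the two, an inclusive scan that does O(n log n) additions (versus Blelloch's work-efficient O(n)) but finishes in log n steps. A minimal single-block sketch, assuming n threads where n is a power of two at most 1024; this is illustrative, not code from the video:

    // Inclusive Hillis-Steele scan of n elements within one block,
    // double-buffered in shared memory to avoid read/write races.
    __global__ void scanHillisSteele(const float* in, float* out, int n) {
        extern __shared__ float temp[];              // 2 * n floats
        int tid = threadIdx.x;
        int pout = 0, pin = 1;
        temp[pout * n + tid] = in[tid];              // load input
        __syncthreads();
        for (int offset = 1; offset < n; offset *= 2) {
            pout = 1 - pout;                         // swap buffers each step
            pin  = 1 - pout;
            if (tid >= offset)
                temp[pout * n + tid] = temp[pin * n + tid] + temp[pin * n + tid - offset];
            else
                temp[pout * n + tid] = temp[pin * n + tid];
            __syncthreads();
        }
        out[tid] = temp[pout * n + tid];
    }

    // Launch: scanHillisSteele<<<1, n, 2 * n * sizeof(float)>>>(d_in, d_out, n);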
Great content!
One thing I have never understood is, for matrix multiplication, what is the sort of threshold in size that makes a GPU implementation faster than a CPU implementation? If I want to do one million multiplications of 4 x 4 matrices, ignoring the overhead required to set up the computations, is the GPU faster than the CPU? Surely not. What about 100 x 100? 1000 x 1000?
GPUs are suited for large datasets. I can't specify a number, as it will depend on the algorithm and the GPU specs.
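The honest answer really is "benchmark it on your own hardware". A rough, self-contained sketch of how one could look for the crossover, timing a naive kernel with cudaEvents (kernel only, matching the question's "ignoring setup overhead") against a naive CPU loop; the kernel, sizes, and fill values here are arbitrary assumptions:

    #include <chrono>
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    // Naive GPU matmul: one thread per output element.
    __global__ void matmul(const float* A, const float* B, float* C, int n) {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < n && col < n) {
            float acc = 0.0f;
            for (int k = 0; k < n; ++k) acc += A[row * n + k] * B[k * n + col];
            C[row * n + col] = acc;
        }
    }

    int main() {
        int n = 1024;  // try 64, 256, 1024, ... to find the crossover on your hardware
        size_t bytes = (size_t)n * n * sizeof(float);
        std::vector<float> hA(n * n, 1.0f), hB(n * n, 2.0f), hC(n * n);

        // CPU timing (naive triple loop).
        auto t0 = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                float acc = 0.0f;
                for (int k = 0; k < n; ++k) acc += hA[i * n + k] * hB[k * n + j];
                hC[i * n + j] = acc;
            }
        auto t1 = std::chrono::high_resolution_clock::now();
        double cpuMs = std::chrono::duration<double, std::milli>(t1 - t0).count();

        // GPU timing with cudaEvents; transfers are excluded on purpose.
        float *dA, *dB, *dC;
        cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
        cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);
        dim3 block(16, 16), grid((n + 15) / 16, (n + 15) / 16);
        cudaEvent_t start, stop;
        cudaEventCreate(&start); cudaEventCreate(&stop);
        cudaEventRecord(start);
        matmul<<<grid, block>>>(dA, dB, dC, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float gpuMs = 0.0f;
        cudaEventElapsedTime(&gpuMs, start, stop);
        printf("n=%d  CPU %.2f ms  GPU %.2f ms\n", n, cpuMs, gpuMs);
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        return 0;
    }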
Thanks for the video, from where did you learn all this stuff? Any book or course?
I've put links to a couple of blog posts in the description. They were very helpful (especially when it came to verifying my code).
But you didn’t take any course right?
Nope
What did you do? Did you develop your own function to handle matrix mul and compare the GFLOPS?
I wrote SGEMM from scratch (that runs on a GPU)
@ got it 🔥great man !!
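For context, SGEMM is the single-precision GEMM contract from BLAS: C = alpha * A * B + beta * C. The actual optimized kernels are in the linked blog post; purely as a hedged illustration, the naive starting point looks something like this:

    // Naive SGEMM: C = alpha * A * B + beta * C, one thread per element of C.
    // A is M x K, B is K x N, C is M x N, all row-major.
    __global__ void sgemmNaive(int M, int N, int K, float alpha,
                               const float* A, const float* B,
                               float beta, float* C) {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < M && col < N) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[row * K + k] * B[k * N + col];
            C[row * N + col] = alpha * acc + beta * C[row * N + col];
        }
    }

Optimizations on top of this baseline (tiling into shared memory, register blocking, vectorized loads) are what close most of the gap to cuBLAS.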
Thanks for your service
Yet another Indian banger video
What was the reason for surprise at the x-horizontal / y-vertical layout? It's the standard convention for image processing, which is what GPUs are designed for
The y-axis is generally vertical (pointing up), and that doesn't bother me too much either. What confused me was (z, y, x). I understand that GPUs weren't designed to work with these kinds of computations, but it always confused me (when I started out).
@0mean1sigma I think you mean i,j vs x,y. i often means the row in matrix operations, but x is always horizontal in Cartesian coordinates.
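The (z, y, x) ordering makes more sense once you see how the built-in indices flatten: dim3 is constructed as (x, y, z), but x varies fastest, so a flattened global index reads like a row-major (z, y, x) layout. A small hypothetical demo kernel:

    // Flatten block and thread indices; x is the fastest-moving dimension,
    // which is also why mapping x to the contiguous (column) dimension of a
    // row-major matrix gives coalesced memory access.
    __global__ void flatIndex(int* out) {
        int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
        int block  = (blockIdx.z * gridDim.y + blockIdx.y) * gridDim.x + blockIdx.x;
        int thread = (threadIdx.z * blockDim.y + threadIdx.y) * blockDim.x + threadIdx.x;
        out[block * threadsPerBlock + thread] = thread;
    }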
I was hoping that someone would simplify GPU's Matrix multiplication to me , so thank you .
Glad you found it useful 😃
W project
So, after trying your benchmarks, I found that cuBLAS is the fastest in comparison to any of your approaches.
Nice video, though
Thanks a lot for your comment. Yes, my implementations are slower than cuBLAS, but my focus was more on understanding the GPU programming concepts.
I have a question... where can I learn these concepts?
I’ve provided some of the links in the video description. There are also some good textbooks. Good luck!
are these floating point or integer matrices ?
Float
@@0mean1sigma thank you for sharing your GPU coding experience
Are you using manim?
Yes he is
Why it feels like 3blue1brown vid... You use manim???
Yes
Where is the CUDA/C/C++ code?
Check out the links in the description
Mini💀
5 minutes in and I’m starting to stroke
wtf did I just watch? 😬
Are you using manim??
Yes