Among all the CUDA videos I've watched, this one made the most sense to me.
true
Amazing lecture. Helped me a loooooot for my final exam. Thank u soooo much. ❤️❤️❤️
This is a very good video explanation of GPU computation.
Amazing info! Love the way the data flow and execution is explained!
It is like an impossible amount of computing power! Beautiful beast!
Really enjoyed watching the vid. I've been learning computer architecture with nand2tetris and Digital Design and Computer Architecture by David Harris and Sarah Harris. I'm so happy to be able to understand the concepts he was talking about in this vid. Anyway, thank you for the excellent, beginner-friendly content.
Great lecture, thanks for sharing! Thanks also for the interesting piece of history on how the "bug" concept came to be.
Great Lecture! Very helpful!
Excellent introduction! Thanks!
Best CUDA explanation ever.
Cheers mate! Always love a good programming lecture. :)
Great tutorial. Thank you!
Hi Tom, at 16:36, on line 19, you should change "float(i);" to "(float) i;". I'm assuming you're trying to cast the integer value to a floating-point data type.
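A side note on the cast, as a minimal sketch: CUDA sources are compiled as C++ by nvcc, and in C++ `float(i)` is a valid function-style cast, so the three spellings below should all give the same value. Only plain C would require `(float) i`. The function name `casts_agree` is made up for illustration and is not from the video.

```cuda
#include <cstdio>

// Host-only illustration; nothing here runs on the GPU.
// Returns true when the three cast spellings produce the same float.
bool casts_agree(int i) {
    float a = float(i);               // function-style cast (C++ only)
    float b = (float) i;              // C-style cast (valid in C and C++)
    float c = static_cast<float>(i);  // named C++ cast
    return a == b && b == c;
}

int main() {
    printf("casts agree: %s\n", casts_agree(7) ? "yes" : "no");
    return 0;
}
```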
Why did you need "float f" at 30:00? Why not combine everything into one line: "d_out[idx] = d_in[threadIdx.x] * d_in[threadIdx.x]"? Is there a penalty for reading the thread index multiple times, or did you do it just for clarity and to explain how the code works?
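To make the comparison concrete, here is a rough sketch of both variants side by side. It assumes a 64-element array and a single block, and the kernel and helper names are hypothetical rather than taken from the video. Both kernels should produce identical results. `threadIdx.x` is a built-in variable that is cheap to read, and an optimizing compiler will likely load `d_in[idx]` only once in either version, so the temporary is mostly there for readability.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Variant with a temporary: the input value is read once into a register.
__global__ void square_with_temp(float *d_out, const float *d_in) {
    int idx = threadIdx.x;
    float f = d_in[idx];
    d_out[idx] = f * f;
}

// One-line variant, as proposed in the comment.
__global__ void square_one_line(float *d_out, const float *d_in) {
    int idx = threadIdx.x;
    d_out[idx] = d_in[threadIdx.x] * d_in[threadIdx.x];
}

// Runs both kernels on the same input and returns how many outputs differ.
int count_mismatches() {
    const int N = 64;  // assumed array size
    float h_in[N], h_a[N], h_b[N];
    for (int i = 0; i < N; ++i) {
        h_in[i] = static_cast<float>(i);
        h_a[i] = -1.0f;  // distinct sentinels, so a failed launch shows up
        h_b[i] = -2.0f;
    }

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, N * sizeof(float));
    cudaMalloc(&d_out, N * sizeof(float));
    cudaMemcpy(d_in, h_in, N * sizeof(float), cudaMemcpyHostToDevice);

    square_with_temp<<<1, N>>>(d_out, d_in);
    cudaMemcpy(h_a, d_out, N * sizeof(float), cudaMemcpyDeviceToHost);

    square_one_line<<<1, N>>>(d_out, d_in);
    cudaMemcpy(h_b, d_out, N * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);

    int mismatches = 0;
    for (int i = 0; i < N; ++i) {
        if (h_a[i] != h_b[i]) ++mismatches;
    }
    return mismatches;
}

int main() {
    printf("mismatches: %d\n", count_mismatches());
    return 0;
}
```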
Thank you so much for the video! Quite helpful. Appreciate it :D
How do you ensure that the thread ID does not go out of bounds of the array? I could have 1000 threads, right, but only 60 elements in the array to square.
You pass the array size as the thread count in the kernel launch, e.g. square<<<1, arraySize>>>, which ensures only 64 threads are created.
Very neat! Thank you!
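Matching the thread count to the array size works for the launch above. When a launch does have more threads than elements (the 1000 threads and 60 elements from the question), a common pattern is to pass the length into the kernel and have each thread check its index first. The sketch below assumes that pattern. The extra parameter `n` and the names `square_guarded` and `count_errors` are illustrative and not from the video.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread squares one element, but only if its index is inside the array.
__global__ void square_guarded(float *d_out, const float *d_in, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {  // threads with idx >= n do nothing
        float f = d_in[idx];
        d_out[idx] = f * f;
    }
}

// Launches 1000 threads over a 60-element array.
// Returns the number of elements whose result is wrong.
int count_errors() {
    const int N = 60;          // elements to square
    const int THREADS = 1000;  // deliberately more threads than elements
    float h_in[N], h_out[N];
    for (int i = 0; i < N; ++i) {
        h_in[i] = static_cast<float>(i);
        h_out[i] = -1.0f;  // sentinel, so a failed launch shows up as errors
    }

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, N * sizeof(float));
    cudaMalloc(&d_out, N * sizeof(float));
    cudaMemcpy(d_in, h_in, N * sizeof(float), cudaMemcpyHostToDevice);

    square_guarded<<<1, THREADS>>>(d_out, d_in, N);
    cudaMemcpy(h_out, d_out, N * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);

    int errors = 0;
    for (int i = 0; i < N; ++i) {
        if (h_out[i] != h_in[i] * h_in[i]) ++errors;
    }
    return errors;
}

int main() {
    printf("errors: %d\n", count_errors());
    return 0;
}
```

Without the `idx < n` check, threads 60 through 999 would read and write past the end of both buffers, which is undefined behavior.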
Could you have squared the d_in array in place? So d_in[idx] = d_in[idx] * d_in[idx]?
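In-place squaring should work here, because each thread reads and writes only its own element, so no thread depends on a value another thread may already have overwritten. A minimal sketch, assuming a 64-element array and a single block (the names `square_in_place` and `count_errors_in_place` are hypothetical):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Squares each element of d_data in place; one thread per element.
__global__ void square_in_place(float *d_data) {
    int idx = threadIdx.x;
    float f = d_data[idx];
    d_data[idx] = f * f;
}

// Squares 0..63 in place on the GPU; returns the number of wrong results.
int count_errors_in_place() {
    const int N = 64;  // assumed array size
    float h_in[N], h_out[N];
    for (int i = 0; i < N; ++i) {
        h_in[i] = static_cast<float>(i);
        h_out[i] = -1.0f;  // sentinel, so a failed launch shows up as errors
    }

    float *d_data = nullptr;
    cudaMalloc(&d_data, N * sizeof(float));
    cudaMemcpy(d_data, h_in, N * sizeof(float), cudaMemcpyHostToDevice);

    square_in_place<<<1, N>>>(d_data);
    cudaMemcpy(h_out, d_data, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);

    int errors = 0;
    for (int i = 0; i < N; ++i) {
        if (h_out[i] != h_in[i] * h_in[i]) ++errors;
    }
    return errors;
}

int main() {
    printf("errors: %d\n", count_errors_in_place());
    return 0;
}
```

The trade-off is that the original input on the device is overwritten, in exchange for one fewer device allocation.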
15:20 Single Instruction Multiple Threads
Can you tell me what threads are? I'm new to the GPU world😁
nice boy
Amazing !!
You could add timestamps
Great explanation! Thy
*thx not thy
Great tutorial! Thank you so much!