Thank you for posting this. My knowledge of CUDA is stuck in 2018, and it is a real eye-opener to see what it can do now. Block clusters, grid sync, matrix multiplication kernels launched directly from the device... It is making me interested in programming GPUs again. Both of the presentations by Stephen Jones were excellent.
So I can have multiple thingies running different instructions, and I can fire and forget between these thingies without any synchronization penalty? Looks pretty cool.
Both of your CUDA videos are among the best I have ever seen on the topic. Very interesting, insightful, and entertaining. Gold.