This is a really good lecture. Learned a lot about the state of the art for video inference. Was super interesting to see how all these ideas built on each other slowly over the (model) generations.
Justin was the reason CS231n was interesting. Thanks for open-sourcing these videos also.
Thank you so much for sharing these lectures Justin, super useful!
Hello, can you please make an implementation video on this, using the encoder-like architecture of a transformer for video classification?
The most useful lectures I've ever seen on YouTube. Thank you.
Why is the receptive field after Pool2D 1x6x6?
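My guess: each Pool2D output pixel looks at 4 neighboring pixels of the previous layer along each spatial dimension, and each of those sees 3 input pixels, but those 3-pixel windows overlap. The first one covers 3 input pixels and each of the other three adds only 1 new pixel, so 6 = 3+1+1+1 rather than 4*3 = 12.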
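To check the 3+1+1+1 arithmetic, here is a minimal sketch of the usual receptive-field recurrence along one spatial dimension. It assumes a 3x3 convolution with stride 1 followed by a 4x4 pooling layer with stride 4; those sizes are inferred from the "4*3" and "3+1+1+1" in the guess above and are not stated in this thread. The function name `receptive_field` is made up for illustration.

```python
def receptive_field(layers):
    """Return the receptive-field size (one spatial dim) after each layer.

    layers: list of (kernel_size, stride) tuples, applied in order.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # distance, in input pixels, between adjacent outputs
    sizes = []
    for kernel, stride in layers:
        # Each extra kernel tap reaches `jump` input pixels further.
        rf += (kernel - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes


# Assumed layer sizes: 3x3 conv (stride 1), then 4x4 pool (stride 4).
layers = [(3, 1), (4, 4)]
sizes = receptive_field(layers)
print(sizes)  # [3, 6]
```

Under these assumptions the pooling step adds (4 - 1) * 1 = 3 pixels to the conv's 3, which is the same 3+1+1+1 = 6 per spatial dimension. The leading 1 in 1x6x6 would be the temporal extent, which 2D layers do not grow.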
Really great lecture!