*DeepMind x UCL | Deep Learning Lectures | 4/12 | Advanced Models for Computer Vision*
*My takeaways:*
*1. Why we need to go beyond image classification **0:15*
*2. Plan for this lecture **3:40*
*3. Tasks beyond classification **5:20*
3.1 Object detection 7:22
-Model: Fast R-CNN, two-stage detector 14:40
---Identify good candidate bounding boxes
---Classify and refine
-Model: RetinaNet, one-stage detector 20:55
3.2 Semantic segmentation 28:12
-Model: U-Net 32:45
3.3 Instance segmentation 36:31
-Reference model: Mask R-CNN
3.4 Metrics and benchmarks 37:48
-Classification: percentage of correct predictions; Top-1: top prediction is the correct class; Top-5: correct class is in the top-5 predictions
-Object detection and segmentation: Intersection-over-union (IoU) (see the short sketch after this outline)
-Object detection and segmentation datasets: Cityscapes, COCO
3.5 Training tricks 42:58
-Transfer learning
*4. Beyond a single input image: motion is an important cue **50:26*
4.1 Pairs of images 59:16
-Model: FlowNet 1:00:35
4.2 Video input 1:03:56
-Apply 2D model to each frame 1:04:12
-3D convolutions 1:05:48
4.3 Applications:
-Action recognition 1:09:30
--Model: SlowFast 1:11:25
4.4 Training tricks 1:14:35
-Transfer learning
4.5 Challenges: difficult to obtain labels; large memory requirements; high latency; high energy consumption 1:15:53
*5. Beyond strong supervision **1:20:23*
5.1 Data labelling is tedious 1:20:36
5.2 Self-supervised learning 1:21:40
-Standard loss: learn mapping between inputs and output distributions/values
-Metric learning: learn to predict distances between inputs given some similarity measure (e.g. same person or not)
-State-of-the-art representation learning vs supervised learning on accuracy and number of parameters 1:29:41
*6. Open questions **1:30:16*
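Since the outline lists IoU as the main metric for detection and segmentation, here is a minimal sketch of how it is computed for a pair of boxes (my own helper, not code from the lecture; boxes are assumed to be (x1, y1, x2, y2) corner coordinates):

```python
# Hypothetical helper (not from the lecture): IoU for two axis-aligned boxes
# given as (x1, y1, x2, y2) corner coordinates.
def iou(box_a, box_b):
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    # Union = area of A + area of B - intersection.
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

In COCO-style evaluation, a detection typically counts as correct only when its IoU with a ground-truth box exceeds a threshold such as 0.5.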
These lectures are interesting; however, it feels like they are way too high-level.
Great course! Thanks DeepMind, thanks Viorica! Just a little remark: it's hard to tell what she is pointing at with the laser pointer on the projection :)
The object detection part was a little confusing: a lot of explanation with only a few images to refer to. I think a more visual explanation would work better 😊
I feel the same
This lecture series is great and bound to spread great knowledge
Oh my goodness. I've fallen so far behind in this field, and I don't know how to catch up. Thank you for the lesson!
Great lectures, thanks! One small request: since the recording shows only the lecturer's face and the computer slides, not the projector screen in the room, some of the more complicated slides get hard to follow, as the lecturer is clearly pointing at parts of those slides while speaking. If it's not inconvenient, please try pointing with a mouse cursor instead, so that it's clearer to the YouTube viewers. Thanks again!
Thank you so much for the kind, simple, and well-explained lecture! The open-questions part was great and gave some insight into the near future :)
Thank you for sharing this. It's a great walkthrough of how computer vision has improved.
Great lecture and big thanks to DeepMind for sharing this great content.
Correct me if I'm wrong, but Faster R-CNN is a one-stage detector and end-to-end differentiable, as opposed to what is said in the lecture.
abhishek yadav It’s a single unit for the forward pass; but during training, the RPN (region proposal network) is trained separately using objectness scores for loss.
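To make this concrete, here is a rough sketch using torchvision's reference Faster R-CNN (my own example, not code from the lecture): in training mode a single forward pass returns separate loss terms for the RPN (objectness and proposal regression) and for the second-stage box head, which is the sense in which the RPN has its own objective even though the whole model runs as one unit.

```python
import torch
import torchvision

# Sketch only: torchvision's reference Faster R-CNN, not the lecture's code.
# (May download ImageNet backbone weights on first use.)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)
model.train()

images = [torch.rand(3, 480, 640)]                         # one dummy image
targets = [{
    "boxes": torch.tensor([[50.0, 60.0, 200.0, 220.0]]),   # (x1, y1, x2, y2)
    "labels": torch.tensor([1]),
}]

# In training mode the single forward pass returns a dict of losses in which
# the RPN terms (objectness + proposal regression) are kept separate from the
# second-stage box-head terms (classification + box regression).
losses = model(images, targets)
print(sorted(losses.keys()))
# e.g. ['loss_box_reg', 'loss_classifier', 'loss_objectness', 'loss_rpn_box_reg']
```

(The exact loss-dict keys may vary slightly between torchvision versions.)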
I really like the series, and try to watch half an episode per day. This lecture was, for me (not a native English speaker), quite hard to follow. I don't want to offend, I just want to give feedback. I think it was great that you gave it a shot; keep at it and you will get better :)
I don't get why seemingly everyone in the CV research community quotes FPS without specifying the hardware. Even children know that FPS is hardware-dependent, because they play games and the same game will get different FPS at the same graphics settings, sometimes even on the same machine depending on cooling. Changing the hardware can change the 'FPS' drastically, from 1 to 1000. The number is totally meaningless without indicating the hardware.
I guess it's my job to guess which card you used and how many? In 2018 I would assume one or two NVIDIA GTX 1080 Tis.
Ben B You raise a very valid point; this is one of my pet peeves about CV literature too!
Numbers like FPS or even running time are too dependent on the underlying hardware to be meaningful without context. A more meaningful measure would perhaps be ‘#computations per forward pass’ or something similar.
In this particular case, the 5 FPS claim comes directly from the Faster R-CNN paper, which came out in 2015. The exact details of the hardware are not mentioned in the paper (or at least I couldn't find them). I assume it will be quite a bit faster on contemporary GPUs.
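If anyone wants to see the hardware dependence for themselves, here is a rough timing sketch (my own, assuming a PyTorch/torchvision setup, and not how the paper measured it):

```python
import time
import torch
import torchvision

# Rough local FPS measurement; the number you get is specific to YOUR hardware,
# which is exactly the point being made above.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torchvision.models.detection.fasterrcnn_resnet50_fpn().eval().to(device)
image = [torch.rand(3, 600, 800, device=device)]

with torch.no_grad():
    model(image)                      # warm-up pass
    if device == "cuda":
        torch.cuda.synchronize()
    n = 20
    start = time.perf_counter()
    for _ in range(n):
        model(image)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

print(f"~{n / elapsed:.1f} FPS on {device} (meaningless without naming the hardware)")
```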
Quite useful
Thank you.
Wow! We blink to reduce activity in the brain. I'm going to close my eyes when I think about things now. :)
thank you!
I hardly think the length of this video is a coincidence?!?!
What do you mean? 😅