DeepMind x UCL | Deep Learning Lectures | 4/12 | Advanced Models for Computer Vision

  • Published 21 Nov 2024

COMMENTS • 23

  • @leixun
    @leixun 4 years ago +38

    *DeepMind x UCL | Deep Learning Lectures | 4/12 | Advanced Models for Computer Vision*
    *My takeaways:*
    *1. Why we need to go beyond image classification 0:15*
    *2. Plan for this lecture 3:40*
    *3. Tasks beyond classification 5:20*
    3.1 Object detection 7:22
    -Model: Fast R-CNN, two-stage detector 14:40
    ---Identify good candidate bounding boxes
    ---Classify and refine
    -Model: RetinaNet, one-stage detector 20:55
    3.2 Semantic segmentation 28:12
    -Model: U-Net 32:45
    3.3 Instance segmentation 36:31
    -Reference model: Mask R-CNN
    3.4 Metrics and benchmarks 37:48
    -Classification: percentage of correct predictions; Top-1: top prediction is the correct class; Top-5: correct class is in the top-5 predictions
    -Object detection and segmentation: Intersection-over-Union (IoU)
    -Object detection and segmentation datasets: Cityscapes, COCO
    3.5 Training tricks 42:58
    -Transfer learning
    *4. Beyond a single input image: motion is an important cue 50:26*
    4.1 Pairs of images 59:16
    -Model: FlowNet 1:00:35
    4.2 Video input 1:03:56
    -Apply a 2D model to each frame 1:04:12
    -3D convolutions 1:05:48
    4.3 Applications:
    -Action recognition 1:09:30
    --Model: SlowFast 1:11:25
    4.4 Training tricks 1:14:35
    -Transfer learning
    4.5 Challenges: difficult to obtain labels; large memory requirements; high latency; high energy consumption 1:15:53
    *5. Beyond strong supervision 1:20:23*
    5.1 Data labelling is tedious 1:20:36
    5.2 Self-supervised learning 1:21:40
    -Standard loss: learn a mapping between inputs and output distributions/values
    -Metric learning: learn to predict distances between inputs given some similarity measure (e.g. same person or not)
    -State-of-the-art representation learning vs supervised learning on accuracy and number of parameters 1:29:41
    *6. Open questions 1:30:16*
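    [Editor's note: the Intersection-over-Union metric listed under 3.4 can be sketched in a few lines; this is a minimal illustration assuming axis-aligned boxes given as (x1, y1, x2, y2), not code from the lecture.]

    ```python
    def iou(box_a, box_b):
        """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
        # Intersection rectangle: the overlap of the two boxes, empty if disjoint.
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    # Identical boxes give 1.0, disjoint boxes give 0.0.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 4+4-1=7 -> ~0.1429
    ```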

  • @TheAero
    @TheAero 1 year ago +1

    These lectures are interesting; however, it feels like they are way too high-level.

  • @gringo6969
    @gringo6969 4 years ago +6

    Great course! Thanks DeepMind, thanks Viorica! Just a little remark: it's hard to tell what she is pointing at with the laser pointer on the projection :)

  • @Marcos10PT
    @Marcos10PT 4 years ago +27

    The object detection part was a little confusing, too much explanation with only a few images to refer to. I think a more visual explanation would work better 😊

    • @wy2528
      @wy2528 4 years ago

      I feel the same

  • @sayakchakrabarty
    @sayakchakrabarty 4 years ago +4

    This lecture series is great and bound to spread great knowledge

  • @ninadesianti9587
    @ninadesianti9587 4 years ago +2

    Oh my goodness. I’m so left behind in this field. I don’t know how to catch up. Thank you for the lesson!

  • @susmitislam1910
    @susmitislam1910 3 years ago +2

    Great lectures, thanks! One small request: since the video shows only the lecturer's face and the computer slides, and not the projector screen in the class, some complicated slides get hard to follow, as the lecturer is obviously pointing at parts of those slides as she speaks. If it's not inconvenient, please try using a mouse pointer instead, so that it's clearer to the YouTube viewers. Thanks again!

  • @dsazz801
    @dsazz801 2 years ago

    Thank you so much for the kind, simple, and well-explained lecture! The open questions part was great and gave some insights into the near future :)

  • @farhanhubble
    @farhanhubble 4 years ago

    Thank you for sharing this. It's a great walkthrough of how computer vision has improved.

  • @lukn4100
    @lukn4100 3 years ago

    Great lecture and big thanks to DeepMind for sharing this great content.

  • @abhishekyadav479
    @abhishekyadav479 4 years ago

    Correct me if I'm wrong, but Faster R-CNN is a one-stage detector and end-to-end differentiable, as opposed to what is given in the lecture

    • @ArshedNabeel
      @ArshedNabeel 4 years ago +1

      abhishek yadav It’s a single unit for the forward pass; but during training, the RPN (region proposal network) is trained separately, using objectness scores for its loss.

  • @danielsoeller
    @danielsoeller 3 years ago

    I really like the series, and try to watch half an episode per day. This lecture was, for me (not a native English speaker), quite hard to follow. I don't want to offend, I just want to give feedback. I think it was great that you gave it a shot; keep at it and you will become better :)

  • @ben6
    @ben6 4 years ago +5

    I don't get why literally everyone in the research (CV) community quotes FPS without the hardware. Even children know that FPS is hardware-dependent, because they play games and the same game will have different FPS on the same graphics settings, sometimes even on the same machine depending on cooling. Changing the performance of the hardware will drastically change the 'FPS', from 1 to 1000. This number is totally meaningless without indicating the hardware.
    I guess it's my job to guess which card you used and how many? In 2018 I would assume 1 or 2 NVIDIA GTX 1080 Tis.

    • @ArshedNabeel
      @ArshedNabeel 4 years ago

      Ben B You raise a very valid point; this is one of my pet peeves about CV literature too!
      Numbers like FPS or even running time are too dependent on the underlying hardware to be meaningful without context. A more meaningful measure would perhaps be ‘#computations per forward pass’ or something similar.
      In this particular case, the 5 fps claim comes directly from the Faster R-CNN paper, which came out in 2015. The exact details of the hardware are not mentioned in the paper (or at least I couldn’t find them). I assume it will be quite a bit faster on contemporary GPUs.
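      [Editor's note: a hardware-independent cost measure like the one suggested above can be sketched as a multiply-accumulate (MAC) count for a single convolutional layer; this is an illustrative back-of-the-envelope formula, not from the lecture or the paper.]

      ```python
      def conv2d_macs(h_out, w_out, c_in, c_out, k):
          """Multiply-accumulates for one 2D conv layer: each of the
          h_out * w_out * c_out outputs sums over a k x k x c_in window."""
          return h_out * w_out * c_out * (k * k * c_in)

      # e.g. a 3x3 conv, 64 -> 64 channels, on a 56x56 feature map:
      print(conv2d_macs(56, 56, 64, 64, 3))  # 115605504 MACs
      ```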

  • @DatascienceConcepts
    @DatascienceConcepts 4 years ago +1

    Quite useful

  • @lizgichora6472
    @lizgichora6472 3 years ago

    Thank you.

  • @ben6
    @ben6 4 years ago

    Wow! We blink to reduce activity in the brain. I'm going to close my eyes when I think about things now. :)

  • @thomasdeniffel2122
    @thomasdeniffel2122 4 years ago

    thank you!

  • @mortenkallese4024
    @mortenkallese4024 4 years ago +2

    I hardly think the length of this video is a coincidence?!?!