DeepMind x UCL | Deep Learning Lectures | 4/12 | Advanced Models for Computer Vision

  • Published 31 May 2020
  • Following on from the previous lecture, DeepMind Research Scientist Viorica Patraucean introduces classic computer vision tasks beyond image classification (object detection, semantic segmentation, optical flow estimation) and describes state-of-the-art models for each, together with standard benchmarks. She discusses similar models for video processing for tasks like action recognition and tracking, and the associated challenges. In particular, she refers to recent work to make video processing more efficient, including using elements of reinforcement learning. Next, she describes various settings for self-supervised learning in uni-modal and multi-modal (vision+audio, vision+language) settings, where large scale is beneficial. Viorica ends with a discussion of open questions in vision and the role of computer vision research within the broader goal of building intelligent agents.
    Download the slides here:
    storage.googleapis.com/deepmi...
    Find out more about how DeepMind increases access to science here:
    deepmind.com/about#access_to_...
    Speaker Bio:
    Viorica is a Research Scientist at DeepMind, working mainly on computer vision problems, with a focus on video processing. She did her PhD in Toulouse, France, on statistical models for image processing, and then focused on 3D shape and video analysis during her postdoctoral work in Paris and Cambridge. Her dream is to contribute to creating a computational model of the human visual system.
    About the lecture series:
    The Deep Learning Lecture Series is a collaboration between DeepMind and the UCL Centre for Artificial Intelligence. Over the past decade, Deep Learning has evolved into the leading artificial intelligence paradigm, providing us with the ability to learn complex functions from raw data at unprecedented accuracy and scale. Deep Learning has been applied to problems in object recognition, speech recognition, speech synthesis, forecasting, scientific computing, control and many more. The resulting applications are touching all of our lives in areas such as healthcare and medical research, human-computer interaction, communication, transport, conservation, manufacturing and many other fields of human endeavour. In recognition of this huge impact, the 2019 Turing Award, the highest honour in computing, was awarded to pioneers of Deep Learning.
    In this lecture series, research scientists from the leading AI research lab DeepMind deliver 12 lectures on an exciting selection of topics in Deep Learning, ranging from the fundamentals of training neural networks, via advanced ideas around memory, attention, and generative modelling, to the important topic of responsible innovation.
  • Science & Technology

COMMENTS • 24

  • @leixun
    @leixun 3 years ago +38

    *DeepMind x UCL | Deep Learning Lectures | 4/12 | Advanced Models for Computer Vision*
    *My takeaways:*
    *1. Why we need to go beyond image classification **0:15*
    *2. Plan for this lecture **3:40*
    *3. Tasks beyond classification **5:20*
    3.1 Object detection 7:22
    -Model: Faster R-CNN, two-stage detector 14:40
    ---Identify good candidate bounding boxes
    ---Classify and refine
    -Model: RetinaNet, one-stage detector 20:55
    3.2 Semantic segmentation 28:12
    -Model: U-Net 32:45
    3.3 Instance segmentation 36:31
    -Reference model: Mask R-CNN
    3.4 Metrics and benchmarks 37:48
    -Classification: percentage of correct predictions; Top-1: the top prediction is the correct class; Top-5: the correct class is among the top 5 predictions
    -Object detection and segmentation: Intersection-over-Union (IoU); a minimal IoU sketch follows this list
    -Object detection and segmentation datasets: Cityscapes, COCO
    3.5 Training tricks 42:58
    -Transfer learning
    *4. Beyond a single input image: motion is an important cue **50:26*
    4.1 Pairs of images 59:16
    -Model: FlowNet 1:00:35
    4.2 Video input 1:03:56
    -Apply 2D model to each frame 1:04:12
    -3D convolutions 1:05:48
    4.3 Applications:
    -Action recognition 1:09:30
    --Model: SlowFast 1:11:25
    4.4 Training tricks 1:14:35
    -Transfer learning
    4.5 Challenges: difficult to obtain labels; large memory requirements; high latency; high energy consumption 1:15:53
    *5. Beyond strong supervision **1:20:23*
    5.1 Data labelling is tedious 1:20:36
    5.2 Self supervised learning 1:21:40
    -Standard loss: learn a mapping between inputs and output distributions/values
    -Metric learning: learn to predict distances between inputs given some similarity measure (e.g. same person or not); see the contrastive-loss sketch after this list
    -State-of-the-art representation learning vs supervised learning on accuracy and number of parameters 1:29:41
    *6. Open questions **1:30:16*
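    The IoU metric from item 3.4 is simple enough to show directly. A minimal sketch in plain Python, assuming boxes in (x1, y1, x2, y2) corner format (the format is my assumption, not the lecture's notation):

      def iou(box_a, box_b):
          """Intersection-over-Union of two axis-aligned boxes in (x1, y1, x2, y2) format."""
          ax1, ay1, ax2, ay2 = box_a
          bx1, by1, bx2, by2 = box_b
          # Width/height of the overlap rectangle (zero if the boxes do not intersect).
          inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
          inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
          inter = inter_w * inter_h
          union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
          return inter / union if union > 0 else 0.0

      print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143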
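    For the metric-learning item in 5.2, here is a hedged sketch of a pairwise contrastive loss in PyTorch; it only illustrates the idea of learning distances from a same/different signal and is not claimed to be the specific loss used in the lecture:

      import torch
      import torch.nn.functional as F

      def contrastive_loss(emb_a, emb_b, same, margin=1.0):
          """Pull 'same' pairs together; push different pairs at least `margin` apart."""
          dist = F.pairwise_distance(emb_a, emb_b)
          pos = same * dist.pow(2)                         # penalise distance for matching pairs
          neg = (1 - same) * F.relu(margin - dist).pow(2)  # penalise closeness for non-matching pairs
          return (pos + neg).mean()

      # Toy usage: 4 pairs of 8-d embeddings; the first two pairs are labelled "same".
      a, b = torch.randn(4, 8), torch.randn(4, 8)
      same = torch.tensor([1.0, 1.0, 0.0, 0.0])
      print(contrastive_loss(a, b, same))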

  • @Marcos10PT
    @Marcos10PT 3 years ago +25

    The object detection part was a little confusing, too much explanation with only a few images to refer to. I think a more visual explanation would work better 😊

    • @wy2528
      @wy2528 3 years ago

      I feel the same

  • @gringo6969
    @gringo6969 3 years ago +6

    Great course! Thanks DeepMind, thanks Viorica! Just a little remark: it's hard to tell what she is showing with the laser pointer on the projection :)

  • @sayakchakrabarty
    @sayakchakrabarty 3 years ago +4

    This lecture series is great and bound to spread great knowledge

  • @farhanhubble
    @farhanhubble 3 years ago

    Thank you for sharing this. It's a great walkthrough of how computer vision has improved.

  • @ninadesianti9587
    @ninadesianti9587 3 years ago +2

    Oh my goodness. I’m so left behind in this field. I don’t know how to catch up. Thank you for the lesson!

  • @lukn4100
    @lukn4100 3 years ago

    Great lecture and big thanks to DeepMind for sharing this great content.

  • @dsazz801
    @dsazz801 1 year ago

    Thank you so much for the kind, simple, and well-explained lecture! The open questions part was great and gives some insight into the near future :)

  • @susmitislam1910
    @susmitislam1910 3 years ago +2

    Great lectures, thanks! One small request I'd like to make: because only the lecturer's face and the computer slides are shown, and not the projector screen in the class, some complicated slides get hard to follow, as the teacher is obviously pointing at parts of those slides as she is speaking. If it's not inconvenient, please try to do that with a mouse pointer instead, so that it's clearer to the YouTube viewers. Thanks again!

  • @DatascienceConcepts
    @DatascienceConcepts 3 years ago +1

    Quite useful

  • @thomasdeniffel2122
    @thomasdeniffel2122 3 years ago

    thank you!

  • @abhishekyadav479
    @abhishekyadav479 3 years ago

    Correct me if I'm wrong, but Faster R-CNN is a one-stage detector and end-to-end differentiable, as opposed to what is given in the lecture

    • @ArshedNabeel
      @ArshedNabeel 3 years ago +1

      abhishek yadav It’s a single unit for the forward pass; but during training, the RPN (region proposal network) is trained separately using objectness scores for loss.
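      To see this packaging concretely, here is a minimal sketch using torchvision's Faster R-CNN implementation (my choice of library for illustration, not the exact model from the lecture): in training mode one forward call returns a loss dictionary that includes the RPN objectness and box-regression terms alongside the detection-head losses, and in eval mode the same call returns boxes, labels, and scores.

        import torch
        import torchvision

        # A dummy image and one ground-truth object (box in (x1, y1, x2, y2) format).
        image = torch.rand(3, 480, 640)
        target = {"boxes": torch.tensor([[50.0, 60.0, 200.0, 220.0]]),
                  "labels": torch.tensor([1])}

        # Note: older torchvision versions take pretrained=True instead of weights=.
        model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

        model.train()
        losses = model([image], [target])
        print(sorted(losses.keys()))  # RPN terms (objectness, box regression) plus detection-head terms

        model.eval()
        with torch.no_grad():
            detections = model([image])
        print(detections[0]["boxes"].shape)  # per-image dict of boxes, labels, scores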

  • @lizgichora6472
    @lizgichora6472 3 years ago

    Thank you.

  • @danielsoeller
    @danielsoeller 3 years ago

    I really like the series, and try to watch half an episode per day. This lecture was, for me (not a native English speaker), quite hard to follow. I don't want to offend, I just want to give feedback. I think it was great that you gave it a shot, keep at it and you will become better :)

  • @TheAero
    @TheAero 7 months ago +1

    These lectures are interesting; however, it feels like they are way too high-level.

  • @mortenkallese4024
    @mortenkallese4024 3 years ago +2

    I hardly think the length of this video is a coincidence?!?!

  • @ben6
    @ben6 3 years ago +5

    I don't get why literally everyone in the CV research community quotes FPS without the hardware. Even children know that FPS is hardware dependent, because they play games and the same game will have different FPS on the same graphics settings, sometimes even on the same machine depending on cooling. Changing the performance of the hardware will drastically change the 'FPS', from 1 to 1000. This number is totally meaningless without indicating the hardware.
    I guess it's my job to guess which card you used and how many? In 2018 I would assume 1 or 2 NVIDIA GTX 1080 Tis.

    • @ArshedNabeel
      @ArshedNabeel 3 years ago

      Ben B You raise a very valid point, this is one of my pet peeves about CV literature too!
      Numbers like FPS or even running time are too dependent on the underlying hardware to be meaningful without context. A more meaningful measure would perhaps be ‘#computations per forward pass’ or something similar (see the sketch below).
      In this particular case, the 5fps claim comes directly from the Faster R-CNN paper, which came out in 2015. The exact details of the hardware are not mentioned in the paper (or at least I couldn't find them). I assume it will be quite a bit faster on contemporary GPUs.
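      As a rough illustration of such a hardware-independent cost measure, here is a back-of-the-envelope multiply-accumulate (MAC) count for a single 2D convolution layer; the layer sizes below are hypothetical, chosen only to show the arithmetic.

        def conv2d_macs(c_in, c_out, k_h, k_w, h_out, w_out):
            """Multiply-accumulate operations for one conv layer (bias ignored)."""
            return c_in * c_out * k_h * k_w * h_out * w_out

        # Example: a 3x3 convolution mapping 64 -> 128 channels on a 56x56 output feature map.
        macs = conv2d_macs(64, 128, 3, 3, 56, 56)
        print(f"{macs / 1e9:.2f} GMACs")  # ~0.23 GMACs for this single layer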

  • @ben6
    @ben6 3 years ago

    Wow! We blink to reduce activity in the brain. I'm going to close my eyes when I think about things now. :)

  • @seleldjdfmn221
    @seleldjdfmn221 3 years ago

    Great vid. Keep growing! Also, let's build each other up :D