DeepMind x UCL | Deep Learning Lectures | 3/12 | Convolutional Neural Networks for Image Recognition

  • Published 21 Nov 2024

COMMENTS • 30

  • @leixun
    @leixun 4 years ago +51

    *DeepMind x UCL | Deep Learning Lectures | 3/12 | Convolutional Neural Networks for Image Recognition*
    *My takeaways:*
    *1. Plan for this lecture **0:10*
    *2. Background **1:30*
    2.1 How can we feed images to a neural network? 2:55
    2.2 Feedforward networks can't learn from images well, because they learn correlations between individual pixels rather than features in images 4:50
    2.3 Locality and translation invariance in inputs: images, sounds, text, molecules, etc. 7:12
    2.4 How to take advantage of topological structure in inputs from 2.3: weight sharing and hierarchy 10:48
    2.5 History: data-driven research, ImageNet challenge 12:17:
    -2012: AlexNet
    -2013: Improvement on AlexNet
    -2014: New architecture breakthroughs like VGG and GoogLeNet are fundamentally different from AlexNet
    -2015: New architecture breakthrough: ResNet
    -2015+: Saturated performance. Combining the predictions of many models (network ensembles), new building blocks, etc. have been tried
    *3. Building blocks **19:46*
    3.1 From fully connected to locally connected 19:53
    3.2 Receptive field, feature map, kernel/filter, channel 24:50
    3.3 Valid convolution: output size = input size - kernel size + 1 28:05
    3.4 Full convolution: output size = input size + kernel size - 1 28:50
    3.5 Same convolution (desirable to reduce computation and create feature hierarchies): output size = input size 29:35
    3.6 Strided convolution: kernel slides along the image with a step > 1 31:14
    3.7 Dilated convolution: kernel is spread out, with a step > 1 between kernel elements (the output-size rules for 3.3-3.7 are sketched in code after this summary) 32:27
    3.8 Depthwise convolution 34:00
    3.9 Pooling: compute mean or max over small windows to reduce resolution 34:48
    *4. Convolutional neural networks (CNN) **35:32*
    4.1 Computational graphs: recap from lecture 2 36:20
    *5. Going deeper: case studies **38:10*
    5.1 LeNet-5 (1998) for handwritten digits 38:10
    5.2 AlexNet (2012) for ImageNet dataset 39:35
    -Very few connections between groups to reduce GPU communication
    -It is very difficult to train deep neural networks (e.g. 4-5 layers) with saturating activation functions such as sigmoid and tanh, so AlexNet uses ReLU.
    -Regularization: dropout, weight decay
    5.3 What limits the number of layers in a CNN: computational complexity, optimization difficulties 46:44
    5.4 VGG (2014) 48:03
    -Stack many convolution layers before pooling
    -Use "same" convolutions to avoid resolution reduction
    -Stack 3x3 kernels to get the same receptive field as a 5x5 kernel with fewer parameters (see the parameter-count sketch after this summary)
    -Use data parallelism during training, whereas AlexNet used model parallelism
    -Error plateaus after 16 layers (VGG has several versions: VGG-11, VGG-13, VGG-16, VGG-19); this is due to optimization difficulties 51:18
    5.5 Challenges of depth: computational complexity, optimization difficulties 52:07
    5.6 Techniques for improving optimization 52:45
    -Careful initialization
    -Sophisticated optimizer: e.g. variants of gradient descent
    -Normalization layers
    -Network design: e.g. ResNet
    5.7 GoogLeNet/Inception (2014) 55:27
    -Inception block: multiple convolutions with different kernel sizes run in parallel with each other, or with pooling
    -Inception v2 introduces Batch Normalization (BN): it reduces sensitivity to initialization and makes the network more robust to large learning rates. Moreover, because BN is applied over a batch of data, it introduces stochasticity/noise into the network and acts as a regularizer. The downside is that BN introduces a dependency between the different images in a batch, which can be a source of bugs at test time (see the BN sketch after this summary) 56:39
    5.8 ResNet (2015) 1:00:08
    -Residual connections help train deeper networks: e.g. 152 layers
    -Bottleneck block: reduces the number of parameters 1:02:00
    -ResNet v2 avoids all nonlinearities in the residual pathway, which helps train even deeper networks, e.g. 1000 layers 1:03:05
    -The bottleneck block makes ResNet cheap to compute (a toy residual-block sketch follows this summary) 1:03:55
    5.9 DenseNet (2016) 1:05:03
    -Connect layers to all previous layers
    5.10 Squeeze-and-Excitation Networks (SENet, 2017) 1:05:36
    -Incorporate global context
    5.11 AmoebaNet (2018) 1:06:33
    -Neural architecture search
    -Search acyclic graphs composed of predefined layers
    5.12 Other techniques for reducing complexity (see the separable-convolution sketch after this summary) 1:07:50
    -Depthwise convolution
    -Separable convolution
    -Inverted bottleneck (MobileNet v2, MnasNet, EfficientNet)
    *6. Advanced topics **1:09:40*
    6.1 Data augmentation 1:09:58
    6.2 Visualizing CNN 1:11:44
    6.3 Other topics to explore (not in this lecture): pre-training and fine-tuning; group equivariant CNN; recurrence and attention 1:14:36
    *7. Beyond image recognition **1:16:32*
    7.1 What else can we do with CNN 1:16:45
    -Object detection; semantic segmentation; instance segmentation
    -Generative adversarial networks (GANs), autoencoders, autoregressive models
    -Representation learning, self-supervised learning
    -Video, audio, text, graphs
    -Reinforcement learning agents
    *Ending **1:18:55*
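
    A minimal sketch of the output-size rules from items 3.3-3.7 above (plain Python; the function name and the example sizes are illustrative, not from the lecture):

    ```python
    def conv_output_size(input_size, kernel_size, stride=1, dilation=1, padding="valid"):
        """Output length along one spatial dimension (items 3.3-3.7)."""
        # Dilation spreads the kernel taps apart (3.7), enlarging its effective extent.
        effective_kernel = dilation * (kernel_size - 1) + 1
        if padding == "valid":    # no padding: the output shrinks (3.3)
            pad = 0
        elif padding == "full":   # pad so every partial overlap is kept: the output grows (3.4)
            pad = effective_kernel - 1
        elif padding == "same":   # pad to preserve resolution at stride 1, odd kernel sizes (3.5)
            pad = (effective_kernel - 1) // 2
        else:
            raise ValueError(padding)
        return (input_size + 2 * pad - effective_kernel) // stride + 1

    # The formulas quoted in the notes, for a 32-pixel input and a 5x5 kernel:
    assert conv_output_size(32, 5, padding="valid") == 32 - 5 + 1   # 28
    assert conv_output_size(32, 5, padding="full") == 32 + 5 - 1    # 36
    assert conv_output_size(32, 5, padding="same") == 32            # 32
    assert conv_output_size(32, 5, stride=2) == 14                  # strided convolution (3.6)
    ```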
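
    A rough check of the VGG point in 5.4 that two stacked 3x3 convolutions cover the same 5x5 receptive field with fewer weights (plain Python; the channel count is an arbitrary example):

    ```python
    def conv_params(c_in, c_out, k):
        # Weights in a standard convolution: in_channels * out_channels * k * k (biases ignored).
        return c_in * c_out * k * k

    c = 256  # arbitrary channel count, kept equal for input and output
    one_5x5 = conv_params(c, c, 5)                         # one 5x5 layer: 25 * c^2 weights
    two_3x3 = conv_params(c, c, 3) + conv_params(c, c, 3)  # two 3x3 layers: 18 * c^2 weights
    print(one_5x5, two_3x3)  # same receptive field, ~28% fewer weights for the 3x3 stack
    ```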
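
    A sketch of the batch-statistics dependency behind the Batch Normalization caveat in 5.7 (NumPy; the function names are mine, and running statistics are assumed to be tracked elsewhere):

    ```python
    import numpy as np

    def batch_norm_train(x, gamma, beta, eps=1e-5):
        # x has shape (batch, features). Statistics are taken over the batch axis,
        # so each example's output depends on the other examples in the batch --
        # the source of the stochasticity/regularization effect mentioned above.
        mean, var = x.mean(axis=0), x.var(axis=0)
        return gamma * (x - mean) / np.sqrt(var + eps) + beta

    def batch_norm_test(x, gamma, beta, running_mean, running_var, eps=1e-5):
        # At test time fixed running statistics must be used instead; accidentally
        # normalizing with batch statistics here is the classic source of bugs.
        return gamma * (x - running_mean) / np.sqrt(running_var + eps) + beta
    ```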
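
    A toy illustration of the residual connection from 5.8 (NumPy; the stand-in function f is hypothetical and merely represents the block's convolutions):

    ```python
    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    def residual_block(x, f):
        # The skip connection adds the input back onto the block's output, giving
        # gradients a direct path through very deep stacks of layers.
        return relu(x + f(x))

    # In the bottleneck variant, f would be a 1x1 -> 3x3 -> 1x1 stack of convolutions
    # that shrinks and then restores the channel count to keep the parameter count low.
    f = lambda x: 0.1 * x
    print(residual_block(np.ones(4), f))  # [1.1 1.1 1.1 1.1]
    ```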
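
    A back-of-the-envelope comparison for the depthwise-separable convolutions mentioned in 5.12 (plain Python; the channel and kernel sizes are arbitrary examples):

    ```python
    def standard_conv_params(c_in, c_out, k):
        return c_in * c_out * k * k

    def separable_conv_params(c_in, c_out, k):
        depthwise = c_in * k * k   # one k x k filter per input channel, no channel mixing
        pointwise = c_in * c_out   # 1x1 convolution that mixes the channels
        return depthwise + pointwise

    print(standard_conv_params(128, 128, 3))   # 147456
    print(separable_conv_params(128, 128, 3))  # 17536, roughly 8x fewer weights
    ```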

    • @Amy_Yu2023
      @Amy_Yu2023 4 years ago +2

      Thanks for sharing. Very helpful.

    • @simondemeule3934
      @simondemeule3934 3 years ago +1

      Thank you for taking the time to type this out

    • @leixun
      @leixun 3 years ago +2

      @@simondemeule3934 You are welcome!

    • @bazsnell3178
      @bazsnell3178 3 years ago

      You didn't need to do all this, but could have downloaded the slides from the link that appears when you click on the SHOW MORE text at the top of the page. All of the lectures have the accompanying slides.

  • @ans1975
    @ans1975 3 years ago +2

    It's impossible not to love this guy:
    besides clarity, kindness shines through these words!

  • @djeurissen111
    @djeurissen111 4 years ago

    Increasing the depth in combination with skip-connections feels like flattening an iteration.
    In the previous lecture it was already mentioned that neural networks cannot do multiplication, which is essentially applying addition repeatedly, a certain number of times.
    What if we made the ConvNets recursive: take an image => compute some features and a probability that we are done => take the image and features => compute features => repeat until done...

  • @lukn4100
    @lukn4100 3 years ago +2

    Great lecture and big thanks to DeepMind for sharing this great content.

  • @bartekbinda1114
    @bartekbinda1114 3 years ago +1

    Thank you for this great informative lecture, really helped me to understand the topic better and prepare for my bachelor thesis :)

    • @DanA-st2ed
      @DanA-st2ed 3 years ago

      I'm doing CAD of TB using CXR for my bachelor's also, good luck!

  • @ninadesianti9587
    @ninadesianti9587 4 years ago +1

    Thank you for sharing the lecture. Great presentation!

  • @franciswebb7522
    @franciswebb7522 4 years ago +4

    Really enjoyed this. Can anyone explain how we achieve 96 channels at 43:58? Would this layer contain 32 different 11x11 kernels that are applied to each input channel (RGB)?

    • @dkorzhenkov
      @dkorzhenkov 4 years ago +1

      No, this layer consists of 96 filters with a kernel size of 11x11. Each of these filters takes all three channels (RGB) as input and returns a single channel. Therefore, the total number of learnable parameters for this layer equals 3*96*11*11 = input_dim*output_dim*kernel_size*kernel_size
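
      A quick check of the arithmetic above (plain Python; biases are left out):

      ```python
      # 96 filters, each 11x11 and spanning all 3 RGB input channels.
      in_channels, out_channels, kernel = 3, 96, 11
      print(in_channels * out_channels * kernel * kernel)  # 34848 learnable weights
      ```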

    • @pervezbhan1708
      @pervezbhan1708 2 years ago

      ua-cam.com/video/r_Q12UIfMlE/v-deo.html

  • @_rkk
    @_rkk 3 years ago

    Excellent lecture. Thank you, Sander.

  • @lizgichora6472
    @lizgichora6472 3 years ago

    Very interesting architectural structure and infrastructure, thank you very much.

  • @esaprakasa
    @esaprakasa 4 years ago +2

    Clear and detailed presentation!

  • @luisg.camara2525
    @luisg.camara2525 4 years ago +5

    Really good and engaging lecture. As a note, and being a misophonia sufferer, I had trouble with the constant mouth smacking sounds.

    • @whatever_mate
      @whatever_mate 2 years ago

      Came in the comments just to see if I was alone in this :(

  • @1chimaruGin0_0
    @1chimaruGin0_0 4 years ago

    Are there any other lectures offered by this man?

  • @shahrozkhan4683
    @shahrozkhan4683 4 years ago

    Great presentation!

  • @Iamine1981
    @Iamine1981 3 years ago

    ImageNet was a classification problem involving 1000 distinct classes. Why did research stop at 1000 classes only, and not expand the set of classes to many more? Was K = 1000 chosen as an optimal number based on the availability of data, or were there other concerns? Thank you. Amine

    • @paulcurry8383
      @paulcurry8383 3 years ago

      AFAIK 1000 classes is an arbitrary choice, so most likely availability of data, and 1000 being a sufficiently large number of classes to make the dataset "challenging"

  • @jaredbeckwith
    @jaredbeckwith 4 years ago

    Just finished a Cats vs Dogs CNN

  • @郭绍彤
    @郭绍彤 3 years ago

    Repeat and repeat again we can rule the Canada !

  • @basheeralwaely9658
    @basheeralwaely9658 4 years ago

    Nice!

  • @kyalvidigi1398
    @kyalvidigi1398 4 years ago

    Second

  • @mawkuri5496
    @mawkuri5496 3 years ago +1

    the thumbnail looks like a girl with a beard