DeepMind x UCL | Deep Learning Lectures | 3/12 | Convolutional Neural Networks for Image Recognition

  • Published 30 Apr 2024
  • In the past decade, convolutional neural networks have revolutionised computer vision. In this lecture, DeepMind Research Scientist Sander Dieleman takes a closer look at convolutional network architectures through several case studies, ranging from the early '90s to the current state of the art. He also reviews some of the building blocks in common use today, discusses the challenges of training deep models, and outlines strategies for finding effective architectures, with a focus on image recognition.
    Download the slides here:
    storage.googleapis.com/deepmi...
    Find out more about how DeepMind increases access to science here:
    deepmind.com/about#access_to_...
    Speaker Bio:
    Sander Dieleman is a Research Scientist at DeepMind in London, UK, where he has worked on the development of AlphaGo and WaveNet. He was previously a PhD student at Ghent University, where he conducted research on feature learning and deep learning techniques for learning hierarchical representations of musical audio signals. During his PhD he also developed the deep learning library Lasagne and won solo and team gold medals respectively in Kaggle's "Galaxy Zoo" competition and the first National Data Science Bowl. In the summer of 2014, he interned at Spotify in New York, where he worked on implementing audio-based music recommendation using deep learning on an industrial scale.
    About the lecture series:
    The Deep Learning Lecture Series is a collaboration between DeepMind and the UCL Centre for Artificial Intelligence. Over the past decade, Deep Learning has evolved into the leading artificial intelligence paradigm, providing us with the ability to learn complex functions from raw data at unprecedented accuracy and scale. Deep Learning has been applied to problems in object recognition, speech recognition, speech synthesis, forecasting, scientific computing, control and many more. The resulting applications are touching all of our lives in areas such as healthcare and medical research, human-computer interaction, communication, transport, conservation, manufacturing and many other fields of human endeavour. In recognition of this huge impact, the 2019 Turing Award, the highest honour in computing, was awarded to pioneers of Deep Learning.
    In this lecture series, research scientists from DeepMind, a leading AI research lab, deliver 12 lectures on an exciting selection of topics in Deep Learning, ranging from the fundamentals of training neural networks via advanced ideas around memory, attention, and generative modelling to the important topic of responsible innovation.
  • Science & Technology

COMMENTS • 30

  • @leixun
    @leixun 3 years ago +51

    *DeepMind x UCL | Deep Learning Lectures | 3/12 | Convolutional Neural Networks for Image Recognition*
    *My takeaways:*
    *1. Plan for this lecture **0:10*
    *2. Background **1:30*
    2.1 How can we feed images to a neural network? 2:55
    2.2 Feedforward (fully connected) networks can't learn from images well, because they treat pixels as independent inputs and ignore the spatial structure, rather than learning local features in images 4:50
    2.3 Locality and translation invariance in inputs: images, sounds, text, molecules, etc. 7:12
    2.4 How to take advantage of topological structure in inputs from 2.3: weight sharing and hierarchy 10:48
    2.5 History: data-driven research, the ImageNet challenge 12:17
    -2012: AlexNet
    -2013: Improvements on AlexNet
    -2014: New architectures such as VGG and GoogLeNet, fundamentally different from AlexNet
    -2015: Another architectural breakthrough: ResNet
    -2015+: Performance saturates; combining the predictions of many models (ensembling), new building blocks, etc. have been tried
    *3. Building blocks **19:46*
    3.1 From fully connected to locally connected 19:53
    3.2 Receptive field, feature map, kernel/filter, channel 24:50
    3.3 Valid convolution: output size = input size - kernel size + 1 28:05
    3.4 Full convolution: output size = input size + kernel size - 1 28:50
    3.5 Same convolution: output size = input size 29:35
    3.6 Strided convolution: the kernel slides over the image with a step > 1, reducing resolution (desirable to reduce computation and create feature hierarchies) 31:14
    3.7 Dilated convolution: the kernel is spread out, with a step > 1 between kernel elements 32:27
    3.8 Depthwise convolution 34:00
    3.9 Pooling: compute the mean or max over small windows to reduce resolution 34:48
    (the output-size formulas above are worked through in a short sketch at the end of this summary)
    *4. Convolutional neural networks (CNN) **35:32*
    4.1 Computational graphs: recap from lecture 2 36:20
    *5. Going deeper: case studies **38:10*
    5.1 LeNet-5 (1998) for handwritten digits 38:10
    5.2 AlexNet (2012) for ImageNet dataset 39:35
    -Very few connections between groups to reduce GPU communication
    -It is very difficult to train deep neural networks (e.g. 4-5 layers) with saturating activation functions such as sigmoid and tanh; AlexNet uses ReLU
    -Regularization: dropout, weight decay
    5.3 What limits the number of layers in a CNN: computational complexity, optimization difficulties 46:44
    5.4 VGG (2014) 48:03
    -Stack many convolution layers before each pooling layer
    -Use 'same' convolutions to avoid resolution reduction
    -Stack 3x3 kernels to get the same receptive field as a 5x5 kernel with fewer parameters (see the worked comparison in the sketch at the end of this summary)
    -Use data parallelism during training, whereas AlexNet used model parallelism
    -Error plateaus after 16 layers (VGG comes in several versions: VGG-11, VGG-13, VGG-16, VGG-19); this is due to optimization difficulties 51:18
    5.5 Challenges of depth: computational complexity, optimization difficulties 52:07
    5.6 Techniques for improving optimization 52:45
    -Careful initialization
    -Sophisticated optimizers: e.g. variants of gradient descent
    -Normalization layers
    -Network design: e.g. ResNet
    5.7 GoogLeNet/Inception (2014) 55:27
    -Inception block: multiple convolutions with different kernel sizes, plus pooling, applied in parallel
    -Inception v2 introduces Batch Normalization (BN): it reduces sensitivity to initialization and makes the network more robust to large learning rates. Moreover, because BN is applied over a batch of data, it introduces stochasticity/noise into the network and acts as a regularizer. The downside is that BN introduces a dependency between the different images in a batch, which has to be handled differently at test time and can be a source of bugs 56:39
    5.8 ResNet (2015) 1:00:08
    -Residual connections help train much deeper networks, e.g. 152 layers (a minimal residual-block sketch follows this summary)
    -Bottleneck block: reduces the number of parameters 1:02:00
    -ResNet v2 removes all nonlinearities from the residual pathway, which helps train even deeper networks, e.g. 1000 layers 1:03:05
    -The bottleneck block makes ResNet cheap to compute 1:03:55
    5.9 DenseNet (2016) 1:05:03
    -Connect layers to all previous layers
    5.10 Squeeze-and-Excitation Networks (SENet, 2017) 1:05:36
    -Incorporate global context
    5.11 AmoebaNet (2018) 1:06:33
    -Neural architecture search
    -Search acyclic graphs composed of predefined layers
    5.12 Other techniques for reducing complexity 1:07:50
    -Depthwise convolution
    -Separable convolution
    -Inverted bottleneck (MobileNet v2, MNasNet, EfficientNet)
    *6. Advanced topics **1:09:40*
    6.1 Data augmentation 1:09:58
    6.2 Visualizing CNNs 1:11:44
    6.3 Other topics to explore (not in this lecture): pre-training and fine-tuning; group equivariant CNN; recurrence and attention 1:14:36
    *7. Beyond image recognition **1:16:32*
    7.1 What else can we do with CNNs 1:16:45
    -Object detection; semantic segmentation; instance segmentation
    -Generative adversarial networks (GANs), autoencoders, autoregressive models
    -Representation learning, self-supervised learning
    -Video, audio, text, graphs
    -Reinforcement learning agents
    *Ending **1:18:55*
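
    A minimal Python sketch (my own illustration, not code from the lecture) working through the output-size formulas from section 3 and the 3x3-stacking comparison from section 5.4:

        def conv_output_size(input_size, kernel_size, stride=1, dilation=1, padding=0):
            # The effective kernel grows when the kernel is dilated (step > 1 between elements).
            effective_kernel = dilation * (kernel_size - 1) + 1
            return (input_size + 2 * padding - effective_kernel) // stride + 1

        # Valid convolution: no padding -> output = input - kernel + 1
        assert conv_output_size(32, 5) == 28
        # Full convolution: pad by kernel - 1 on each side -> output = input + kernel - 1
        assert conv_output_size(32, 5, padding=4) == 36
        # Same convolution (odd kernel): pad by (kernel - 1) // 2 -> output = input
        assert conv_output_size(32, 5, padding=2) == 32
        # Strided convolution: step > 1 reduces the resolution
        assert conv_output_size(32, 5, stride=2, padding=2) == 16
        # Dilated convolution: a 3x3 kernel with dilation 2 covers a 5x5 area
        assert conv_output_size(32, 3, dilation=2, padding=2) == 32

        # VGG-style stacking: two 3x3 layers see a 5x5 receptive field, but with C channels
        # in and out they use 2 * (3*3*C*C) = 18*C^2 weights versus 25*C^2 for one 5x5 layer.
        C = 64
        assert 2 * 3 * 3 * C * C < 5 * 5 * C * C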
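
    And a minimal sketch of a bottleneck residual block with batch normalization, in the spirit of the ResNet blocks from section 5.8 (a PyTorch sketch; the pre-activation ordering and the channel sizes in the example are my own assumptions, not taken from the lecture):

        import torch
        from torch import nn

        class BottleneckBlock(nn.Module):
            """1x1 -> 3x3 -> 1x1 bottleneck with an identity skip connection."""
            def __init__(self, channels: int, bottleneck: int):
                super().__init__()
                self.body = nn.Sequential(
                    nn.BatchNorm2d(channels), nn.ReLU(),
                    nn.Conv2d(channels, bottleneck, kernel_size=1),               # reduce channels
                    nn.BatchNorm2d(bottleneck), nn.ReLU(),
                    nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1),  # 'same' 3x3 conv
                    nn.BatchNorm2d(bottleneck), nn.ReLU(),
                    nn.Conv2d(bottleneck, channels, kernel_size=1),               # restore channels
                )

            def forward(self, x):
                # The residual pathway itself stays free of nonlinearities: output = x + F(x).
                return x + self.body(x)

        x = torch.randn(1, 256, 14, 14)
        block = BottleneckBlock(channels=256, bottleneck=64)
        print(block(x).shape)  # torch.Size([1, 256, 14, 14])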

    • @Amy_Yu2023
      @Amy_Yu2023 3 years ago +2

      Thanks for sharing. Very helpful.

    • @simondemeule3934
      @simondemeule3934 3 years ago +1

      Thank you for taking the time to type this out

    • @leixun
      @leixun 3 years ago +2

      @@simondemeule3934 You are welcome!

    • @bazsnell3178
      @bazsnell3178 3 years ago

      You didn't need to do all this; you could have downloaded the slides via the link that appears when you click the SHOW MORE text at the top of the page. All of the lectures have the accompanying slides.

  • @ans1975
    @ans1975 3 years ago +2

    It's impossible not to love this guy:
    Besides clarity, kindness shines through these words!

  • @esaprakasa
    @esaprakasa 3 years ago +2

    Clear and detailed presentation!

  • @lukn4100
    @lukn4100 3 years ago +2

    Great lecture and big thanks to DeepMind for sharing this great content.

  • @ninadesianti9587
    @ninadesianti9587 3 years ago +1

    Thank you for sharing the lecture. Great presentation!

  • @djeurissen111
    @djeurissen111 3 years ago

    Increasing the depth in combination with skip-connections feels like unrolling an iteration.
    In the previous lecture it was already mentioned that neural networks cannot do multiplication, which is essentially applying addition repeatedly, a certain number of times.
    What if we make the ConvNets recursive: take the image => compute some features and a probability that we are done => take the image and features => compute features => repeat until done...

  • @shahrozkhan4683
    @shahrozkhan4683 3 years ago

    Great presentation!

  • @lizgichora6472
    @lizgichora6472 3 years ago

    Very interesting, Architectural Structure and infrastructure, thank you very much.

  • @_rkk
    @_rkk 2 years ago

    Excellent lecture. Thank you, Sander.

  • @bartekbinda1114
    @bartekbinda1114 2 years ago +1

    Thank you for this great informative lecture; it really helped me to understand the topic better and prepare for my bachelor's thesis :)

    • @DanA-st2ed
      @DanA-st2ed 2 years ago

      I'm doing CAD of TB using CXR for my bachelor's too, good luck!

  • @1chimaruGin0_0
    @1chimaruGin0_0 3 years ago

    Are there any other lectures offered by this man?

  • @basheeralwaely9658
    @basheeralwaely9658 3 years ago

    Nice!

  • @franciswebb7522
    @franciswebb7522 3 years ago +4

    Really enjoyed this. Can anyone explain how we get 96 channels at 43:58? Would this layer contain 32 different 11x11 kernels that are applied to each input channel (RGB)?

    • @dkorzhenkov
      @dkorzhenkov 3 years ago +1

      No, this layer consists of 96 filters with a kernel size of 11x11. Each of these filters takes all three channels (RGB) as input and returns a single channel. Therefore, the total number of learnable parameters for this layer equals 3*96*11*11 = input_dim*output_dim*kernel_size*kernel_size
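
      A quick way to sanity-check that count (a PyTorch sketch of my own; the stride and bias settings are assumptions, only the 3 input channels, 96 filters and 11x11 kernel come from the discussion above):

          import torch
          from torch import nn

          # AlexNet-style first layer: 3 input channels -> 96 feature maps, 11x11 kernels.
          conv1 = nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4, bias=True)

          print(conv1.weight.numel())  # 96 * 3 * 11 * 11 = 34848 weights
          print(conv1.bias.numel())    # one bias per output channel = 96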

    • @pervezbhan1708
      @pervezbhan1708 2 years ago

      ua-cam.com/video/r_Q12UIfMlE/v-deo.html

  • @Iamine1981
    @Iamine1981 3 years ago

    ImageNet was a classification problem involving 1000 distinct classes. Why did research stop at 1000 classes only, and not expand the set of classes to many more than that? Was K = 1000 chosen as an optimal number based on the availability of data, or were there other concerns? Thank you. Amine

    • @paulcurry8383
      @paulcurry8383 2 years ago

      AFAIK 1000 classes is an arbitrary choice, so most likely availability of data, and 1000 being a sufficiently large number of classes to make the dataset "challenging"

  • @luisg.camara2525
    @luisg.camara2525 3 years ago +5

    Really good and engaging lecture. As a note, and being a misophonia sufferer, I had trouble with the constant mouth smacking sounds.

    • @whatever_mate
      @whatever_mate 1 year ago

      Came to the comments just to see if I was alone in this :(

  • @jaredbeckwith
    @jaredbeckwith 3 years ago

    Just finished a Cats vs Dogs CNN

  • @user-ls7mr4rq5x
    @user-ls7mr4rq5x 3 years ago

    Repeat and repeat again, we can rule Canada!

  • @kyalvidigi1398
    @kyalvidigi1398 3 years ago

    Second

  • @mawkuri5496
    @mawkuri5496 3 years ago +1

    the thumbnail looks like a girl with a beard