DeepMind x UCL | Deep Learning Lectures | 5/12 | Optimization for Machine Learning

  • Published 6 May 2024
  • Optimization methods are the engines underlying neural networks that enable them to learn from data. In this lecture, DeepMind Research Scientist James Martens covers the fundamentals of gradient-based optimization methods and their application to training neural networks. Major topics include gradient descent, momentum methods, 2nd-order methods, and stochastic methods. James analyzes these methods through the interpretive framework of local 2nd-order approximations (see the illustrative sketch after this description).
    Download the slides here:
    storage.googleapis.com/deepmi...
    Find out more about how DeepMind increases access to science here:
    deepmind.com/about#access_to_...
    Speaker Bio:
    James Martens is a Research Scientist at DeepMind working on the fundamentals of deep learning, including optimization, initialization, and regularization. Before that he received his BMath from the University of Waterloo, and did his Masters and PhD at the University of Toronto, co-advised by Geoff Hinton and Rich Zemel. During his PhD he helped revive interest in deep neural network training by showing how deep networks could be effectively trained using pure optimization methods (which has now become the standard approach).
    About the lecture series:
    The Deep Learning Lecture Series is a collaboration between DeepMind and the UCL Centre for Artificial Intelligence. Over the past decade, Deep Learning has emerged as the leading artificial intelligence paradigm, providing us with the ability to learn complex functions from raw data at unprecedented accuracy and scale. Deep Learning has been applied to problems in object recognition, speech recognition, speech synthesis, forecasting, scientific computing, control and many more. The resulting applications are touching all of our lives in areas such as healthcare and medical research, human-computer interaction, communication, transport, conservation, manufacturing and many other fields of human endeavour. In recognition of this huge impact, the 2019 Turing Award, the highest honour in computing, was awarded to pioneers of Deep Learning.
    In this lecture series, research scientists from DeepMind, a leading AI research lab, deliver 12 lectures on an exciting selection of topics in Deep Learning, ranging from the fundamentals of training neural networks, via advanced ideas around memory, attention, and generative modelling, to the important topic of responsible innovation.
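    A minimal NumPy sketch of the two updates the description refers to: plain gradient descent, and the Newton step obtained by minimizing a local 2nd-order (quadratic) approximation of the objective. The quadratic loss, step size, and names below are illustrative choices, not examples from the lecture.
      # Illustrative only: gradient descent vs. the Newton step implied by a
      # local 2nd-order approximation, on a hand-picked ill-conditioned quadratic.
      import numpy as np

      A = np.array([[10.0, 0.0], [0.0, 1.0]])  # symmetric positive-definite curvature (Hessian)
      b = np.array([1.0, 1.0])

      def grad(theta):
          # Gradient of f(theta) = 0.5 * theta^T A theta - b^T theta
          return A @ theta - b

      def gradient_descent_step(theta, lr=0.05):
          # 1st-order update: fixed step along the negative gradient.
          return theta - lr * grad(theta)

      def newton_step(theta):
          # Minimize the local quadratic model f + g^T d + 0.5 * d^T H d,
          # which gives d = -H^{-1} g; here the Hessian H is exactly A.
          return theta - np.linalg.solve(A, grad(theta))

      theta = np.zeros(2)
      print(gradient_descent_step(theta))  # small move toward the minimizer
      print(newton_step(theta))            # lands on A^{-1} b in a single step
    On this quadratic, gradient descent must keep its learning rate small enough for the steep (curvature 10) direction and therefore crawls along the shallow one, which is the ill-conditioning problem the lecture's convergence discussion revolves around; the Newton step rescales each direction by its curvature.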
  • Science & Technology

COMMENTS • 24

  • @leixun
    @leixun 3 years ago +35

    DeepMind x UCL | Deep Learning Lectures | 5/12 | Optimization for Machine Learning
    My takeaways:
    Quick Conclusions 1:09:23
    1. Plan for this lecture 0:18
    2. Introduction and motivation 0:42
    3. Gradient descent 4:12
    3.1 Gradient descent intuition 5:15
    3.2 The problem with gradient descent 8:26
    3.3 Convergence theory 10:56
    4. Momentum methods 17:34
    - A modification to gradient descent
    4.1 Momentum equations 19:47 (see the sketch after this list)
    4.2 Visualization: gradient descent vs momentum 20:40
    4.3 Convergence theory 23:18
    5. 2nd-order methods 26:48
    5.1 The problem with 1st-order methods 27:09
    5.2 Derivation of Newton's method 28:52
    5.3 Visualization: gradient descent vs momentum vs 2nd-order method 30:44
    5.4 Comparison with gradient descent 31:38
    5.5 More on 2nd-order methods
    5.6 Barriers to applying 2nd-order methods to neural networks 43:03
    5.7 Diagonal approximation: used in RMSprop and Adam 44:38
    5.8 Block-diagonal approximation: TONGA 47:25
    5.9 Kronecker-product approximation: K-FAC 49:39
    6. Stochastic optimization 52:54
    6.1 Motivation for stochastic methods 53:36
    6.2 Mini-batching 54:34
    6.3 Stochastic gradient descent (SGD) 56:12
    6.4 Convergence theory 59:39
    6.5 Stochastic 2nd-order and momentum methods 1:03:10
    6.6 Example experiments: Adam vs K-FAC+momentum vs momentum 1:06:00
    7. Conclusions 1:09:23
    8. Q&A 1:11:30
    - Mini-batch size and learning rate 1:26:05
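    A small NumPy sketch of two of the updates listed above: classical momentum (item 4.1) and a diagonal rescaling of the kind RMSprop and Adam use (item 5.7). Function names, hyperparameter values, and the toy loss are illustrative, not taken from the lecture.
      # Illustrative only: heavy-ball momentum and an RMSprop-style diagonal update.
      import numpy as np

      def momentum_step(theta, velocity, grad, lr=0.01, beta=0.9):
          # Classical momentum: accumulate an exponentially decaying sum of gradients.
          velocity = beta * velocity - lr * grad
          return theta + velocity, velocity

      def rmsprop_step(theta, sq_avg, grad, lr=0.001, decay=0.99, eps=1e-8):
          # Diagonal-curvature flavour: rescale each coordinate by a running
          # estimate of the squared gradient (a cheap diagonal preconditioner).
          sq_avg = decay * sq_avg + (1 - decay) * grad ** 2
          return theta - lr * grad / (np.sqrt(sq_avg) + eps), sq_avg

      # Toy quadratic loss f(theta) = 0.5 * ||theta||^2, whose gradient is theta.
      theta_m, v = np.ones(3), np.zeros(3)
      theta_r, s = np.ones(3), np.zeros(3)
      for _ in range(200):
          theta_m, v = momentum_step(theta_m, v, grad=theta_m)
          theta_r, s = rmsprop_step(theta_r, s, grad=theta_r)
      print(theta_m, theta_r)  # both shrink toward the minimizer at zero
    Methods such as K-FAC (item 5.9) replace this elementwise rescaling with richer structured curvature approximations; the sketch above only covers the diagonal case.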

  • @jasonlin1316
    @jasonlin1316 3 years ago +3

    One of the best lectures on ML optimization; it explains things from first principles where most others don't.

  • @luisbanegassaybe6685
    @luisbanegassaybe6685 3 years ago

    This clears up a lot of questions I have that are usually only answered by internet searches. Thanks for putting up free content.

  • @Iamine1981
    @Iamine1981 3 years ago +1

    A great set of lectures from an incredibly competent team of scientists! Please keep them coming, and thank you for the continued education :-)

  • @ChuanChihChou
    @ChuanChihChou 3 years ago +7

    Best demo of the momentum method ever

  • @maharshidhada1827
    @maharshidhada1827 3 years ago +4

    Hi, the lectures don't show the pointers on the slides, so whenever the lecturers say something like "...this term here...", it takes a while to figure out which term they are referring to. For example, in this video it is very difficult to follow from 13:30 to 13:50.

  • @lukn4100
    @lukn4100 3 years ago

    Great lecture and big thanks to DeepMind for sharing this great content.

  • @bryanbosire
    @bryanbosire 2 years ago

    Great lecture... I enjoyed it after going through Hinton's slides on optimization.

  • @Dicklesberg
    @Dicklesberg 2 years ago

    This guy is an absolute legend. I've been following his great work since his Hessian-free method many years ago.

  • @muhammadharris4470
    @muhammadharris4470 3 years ago +6

    11:00 Anyone who wants to learn more about "curvature", head over here: www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/curvature/v/curvature-intuition

  • @Kaslor1000
    @Kaslor1000 3 years ago

    This lecture is amazing.

  • @actuaryquant5480
    @actuaryquant5480 3 years ago

    Question: can anyone please tell me why, at 1:02:50, this term is as good as you can get given the intrinsic uncertainty of the data?

  • @FabianRoling
    @FabianRoling 3 years ago +11

    The volume is pretty low. It's not a problem for me personally, because I can just set my system volume to 200% or more, but that should definitely not be required and not everybody can do that.

    • @user-oy9kf8jh7r
      @user-oy9kf8jh7r 2 years ago +1

      Maybe you can change your speakers.

    • @FabianRoling
      @FabianRoling 2 years ago

      @@user-oy9kf8jh7r That would not change the fact that this video is much quieter than most others.

  • @muhammadharris4470
    @muhammadharris4470 3 years ago

    Isn't h(theta) for hypothesis?

  • @cherilshah1281
    @cherilshah1281 2 years ago

    I didn't understand a lot of things in this video. Can anyone help me out with the prerequisites that would help me understand these concepts better (especially 2nd-order methods)?

    • @mateusdeassissilva8009
      @mateusdeassissilva8009 1 year ago

      I think you must understand:
      - calculus, in general (pay attention to vector calculus)
      - algebra
      - mathematical optimization

  • @TheAero
    @TheAero 7 months ago

    37K views in 3 years... people don't want to learn from the best...

  • @davidajaba
    @davidajaba 3 years ago +4

    I'm here first... 😅 Just gotta put that out there...