DeepMind x UCL | Deep Learning Lectures | 2/12 | Neural Networks Foundations

  • Published 21 Nov 2024

COMMENTS • 89

  • @leixun
    @leixun 4 years ago +74

    *DeepMind x UCL | Deep Learning Lectures | 2/12 | Neural Networks Foundations*
    *My takeaways:*
    *1. Plan for this lecture **0:40*
    *2. What is not covered in this lecture **1:59*
    *3. Overview*
    3.1 Neural network applications 3:59
    3.2 What started the deep learning revolution 5:10
    *4. Neural networks **9:17*
    *5. Single-layer neural networks **17:14*
    5.1 Activation function: sigmoid 17:53
    5.2 Loss function: cross-entropy for binary classification 19:50
    5.3 Final activation function for multi-class classification: softmax 23:15
    5.4 Uses 26:01
    5.5 Limitations 27:28
    *6. Two-layer neural networks **28:15*
    *7. Tensorflow playground **32:34*
    *8. Universal Approximation Theorem **33:55*
    *9. Deep neural networks **40:29*
    9.1 Activation function: ReLU 41:05
    9.2 Intuition behind network depth 44:20
    9.3 Computational graphs 49:00
    *10. Learning/training **52:27*
    10.1 Optimizer: Gradient descent 53:24
    10.2 Optimizers that are built on gradient descent: Adam, RMSProp 54:39
    10.3 Computational graphs for training 55:45
    10.4 Backpropagation, chain rule 57:15
    10.5 Linear layers as computational graph 1:00:48
    10.6 ReLU layers as computational graph 1:02:30
    10.7 Softmax as computational graph 1:03:09
    10.8 Cross-entropy as computational graph 1:04:04
    10.9 "Cross-entropy Jungles" 1:06:00
    10.10 Computational graph example: 3-layer MLP with ReLU 1:06:58
    *11. Pieces of the puzzle: max, conditional execution **1:08:20*
    *12. Practical issues **1:10:44*
    12.1 Overfitting and regularization 1:10:51
    12.2 Lp regularization 1:12:44
    12.3 Dropout 1:13:14
    12.4 As models grow, their learning dynamics changes: double descent 1:13:32
    12.5 Diagnosing and debugging 1:16:15
    *13. Bonus: Multiplicative interactions **1:19:48*

  • @intuitivej9327
    @intuitivej9327 3 years ago +4

    Fantastic~!! Holding my breath, I rewound and rewatched several times.. such a good lecture.. I'm a humanities graduate, a mom of two with a 10-year career break, and I've fallen for the fun of getting to know AI.... the more you learn math, the more stories it turns out to hold....
    math is not about numbers but logic and... storytelling as well.
    Thank you so much from South Korea.

  • @lukn4100
    @lukn4100 3 years ago +3

    Great lecture and big thanks to DeepMind for sharing this great content. - Wojciech, keep it up. Super!

  • @Alex-ms1yd
    @Alex-ms1yd 4 years ago +6

    oh, these are just projections!.. It's such a great intuition, it finally clicked now.. thanks!

  • @raghavram6419
    @raghavram6419 4 years ago +4

    Wow! Great lecture covering the fundamentals. I liked the focus on computational graphs and on reasoning WHY certain components work or don't work.

    • @SudhirPratapYadav
      @SudhirPratapYadav 3 years ago

      Same, the engineering side of things is usually missing from deep learning lectures.

  • @annercamping
    @annercamping 3 years ago +10

    My autoplay was on and I just woke up to this.

  • @bingeltube
    @bingeltube 4 years ago +15

    Highly recommended! However, the tiny inserts providing further reading recommendations are hard to read. I suggest these recommended readings be included in the text section underneath the video link.

    • @synaesthesis
      @synaesthesis 4 years ago +2

      The slides are in the video description; you can copy the titles and authors of the readings from there. :)

    • @bingeltube
      @bingeltube 4 years ago +1

      @@synaesthesis thank you! Pardon my oversight! :-)

  • @JY-pf7bc
    @JY-pf7bc 4 years ago +5

    Intuitive, fun, woven with valuable experience and new research results. Excellent!

  • @nguyenngocly1484
    @nguyenngocly1484 4 years ago +1

    You can turn artificial neural networks inside-out by using fixed dot products (weighted sums) and adjustable (parametric) activation functions. The fixed dot products can be computed very quickly using fast transforms like the FFT. Also, the number of overall parameters required is vastly reduced. The dot products of the transform act as statistical summary measures, ensuring good behaviour. See Fast Transform (fixed filter bank) neural networks.

  • @eduardoriossanchez3393
    @eduardoriossanchez3393 4 years ago +7

    53:17 I think there is a mistake in the Jacobian's formula, in the bottom-left corner.

  • @sumanthnandamuri2168
    @sumanthnandamuri2168 4 years ago +2

    It would be even better if you could release the course assignments to the public for practice.

  • @neurophilosophers994
    @neurophilosophers994 3 years ago +2

    If an NN can't compute distances without multiplicative dot products, how did AlphaFold calculate the evolution of protein folding states entirely based on distances?

  • @iinarrab19
    @iinarrab19 4 years ago +4

    Intuitive and explains in detail.

    • @pervezbhan1708
      @pervezbhan1708 2 years ago

      ua-cam.com/video/r_Q12UIfMlE/v-deo.html

  • @danielpark6010
    @danielpark6010 4 years ago +2

    20:50
    May I have a brief explanation about what "logarithm of probability of correct/entirely correct classification" in these two slides means?
    What is the significance of it and why is it helpful to negate it?

    • @luisleal4169
      @luisleal4169 3 years ago +1

      There are many reasons and statistical considerations, but as an intuitive argument (not a proof): negating it helps interpret small values as good and big values as bad, which is exactly how a loss function in ML is interpreted.

    • @Theoneandonly_Justahandle
      @Theoneandonly_Justahandle 1 year ago +1

      You might also look at Shannon's information entropy and associated measures: en.wikipedia.org/wiki/Quantities_of_information . In short, -log(p) is a measure of the amount of information an event e with prob. p carries. Intuitively, it tells you how often you'd have to divide your space of possibilities in half in order to locate/find the event e with absolute certainty among all other events. (see also ua-cam.com/video/v68zYyaEmEA/v-deo.html&ab_channel=3Blue1Brown for a great explanation)
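
The -log(p) intuition discussed in this thread is easy to check numerically. A minimal Python sketch (the probability values are illustrative, not from the lecture):

```python
import math

# Negative log-likelihood: -log(p) is near 0 when the model assigns
# high probability p to the correct class, and grows without bound as
# p -> 0, so minimizing it rewards confident correct predictions.
for p in (0.99, 0.5, 0.01):
    print(f"p={p}: loss={-math.log(p):.3f}")
# p=0.99: loss=0.010
# p=0.5: loss=0.693
# p=0.01: loss=4.605
```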

  • @davidolushola3419
    @davidolushola3419 4 years ago +3

    Wow, it's awesome. Please, can I get the slides for this lecture? Thanks 😊

  • @luksdoc
    @luksdoc 4 years ago +1

    A very nice lecture.

  • @JakobGille2
    @JakobGille2 4 years ago +6

    Me: being happy about a deep learning lecture.
    Also me: seeing complicated formulas and closing the tab.

    • @speedfastman
      @speedfastman 4 years ago +6

      Don't let it discourage you!

    • @jingtao1181
      @jingtao1181 4 years ago +3

      just focus on the concept. There are major APIs that can do the math for you.

    • @yifanyang806
      @yifanyang806 4 years ago +1

      It’s not that complicated, don’t give up.

  • @xmtiaz
    @xmtiaz 4 years ago +2

    33:29
    "Play is the highest form of research" - Albert Einstein

  • @lizgichora6472
    @lizgichora6472 3 years ago +1

    Thank you.

  • @jingtao1181
    @jingtao1181 4 years ago +4

    Thank you for the lecture! In 3e-5, what does e stand for?

    • @AndreyButenko
      @AndreyButenko 4 years ago +8

      That is scientific notation: 3e-5 is the same as 3 * 10 ^ -5 which is 0.00003 😊

    • @jingtao1181
      @jingtao1181 4 years ago

      @@AndreyButenko Thank you so much. So this means that setting the learning rate at 0.00003 is really helpful?

    • @bingeltube
      @bingeltube 4 years ago +2

      e stands for Euler's constant and it refers in this case to the exponential function see e.g. en.wikipedia.org/wiki/E_(mathematical_constant)

    • @jingtao1181
      @jingtao1181 4 years ago +1

      @@bingeltube Hi, I was wondering whether e is a constant before, but it does not make sense to me either. If e were a constant approximately 2.7, then 3e - 5 = 3.1. However, I think an effective learning rate is somewhere between 0.001-0.1.

    • @jingtao1181
      @jingtao1181 4 years ago +5

      @@bingeltube I think Andrey's answer makes more sense.

  • @christopherparsonson7119
    @christopherparsonson7119 4 years ago +20

    Are the slides for this series available?

    • @vmikeyboi323
      @vmikeyboi323 4 years ago

      this

    • @jingtao1181
      @jingtao1181 4 years ago

      @@vmikeyboi323 where?

    • @mateusdeassissilva8009
      @mateusdeassissilva8009 4 years ago

      @@vmikeyboi323 where?

    •  4 years ago

      @@jingtao1181 When people just comment "this", it usually means something like "I agree". Don't ask me why the word "this", I don't know either.

    • @jingtao1181
      @jingtao1181 4 years ago

      @ thanks for the reminder

  • @marcospereira6034
    @marcospereira6034 4 years ago +3

    The explanations are a little too handwavy in this lecture. Wojciech seems to assume some intuitions are obvious when they aren't for someone without a lot of experience in the field.
    For example, when showing how sigmoids can emulate an arbitrary function, he said we can just "average" the sigmoid and the reversed sigmoid to form a bump, without mentioning that this averaging comes from the softmax (or am I wrong and it is something else?).

    • @marcospereira6034
      @marcospereira6034 4 years ago +1

      Still, I appreciate the effort and think the lecture is great overall. Just need to complement it with other sources.
      Speaking of which, I would recommend Chris Olah's post on understanding neural networks through topology: colah.github.io/posts/2014-03-NN-Manifolds-Topology/
      Thank you guys at DeepMind for this course!

    • @marcospereira6034
      @marcospereira6034 4 years ago

      Also at 1:22:06, it's hard to interpret the graphs - it's not immediately obvious what blue and green represent. Why would one assume the labels are obvious? Graphs should always be labelled 🙂

    • @AeroGDrive
      @AeroGDrive 4 years ago

      I think the averaging with the reversed sigmoid comes from a second neuron. In fact, he says there are 6 neurons, and each pair of neurons is a sigmoid + reversed sigmoid; we then get 3 resulting bells (as in the graph), which are then weight-averaged in the next layer.
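
The pair-of-sigmoids construction discussed in this thread can be sketched in a few lines of Python. The `sharpness` and edge positions below are illustrative choices, not values from the lecture:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bump(x, left=-1.0, right=1.0, sharpness=20.0):
    # A rising sigmoid at `left` minus a rising sigmoid at `right`
    # (equivalently, plus a "reversed" sigmoid) leaves a bump between
    # the two edges: ~1 inside the interval, ~0 far outside it.
    return sigmoid(sharpness * (x - left)) - sigmoid(sharpness * (x - right))

print(round(bump(0.0), 3))  # inside the bump, close to 1.0
print(round(bump(5.0), 3))  # far outside, close to 0.0
```

Summing several such bumps with learned weights is the standard intuition behind the Universal Approximation Theorem segment of the lecture.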

    • @eyeofhorus1301
      @eyeofhorus1301 4 years ago +1

      @@marcospereira6034 And if you know nothing, like me, there's so, so much more that he's hand-wavy about and expects you to grasp incredibly quickly... lol

    • @heyna88
      @heyna88 1 year ago

      Idk. I felt the entire explanation was quite superficial. I’m not sure whether this depends on the audience though

  • @wy2528
    @wy2528 4 years ago +2

    It is super clear

  • @taimurzahid7877
    @taimurzahid7877 4 years ago +2

    Can we please get the slides for this series?

    • @Aeradill
      @Aeradill 4 years ago +1

      They are in the video description Taimur

  • @havelozo
    @havelozo 2 years ago +1

    What software was used to make these slides?

  • @TheNotoriousPhD
    @TheNotoriousPhD 4 years ago +4

    Thanks for this interesting series. Any place I can get the related math knowledge?

    • @123456wei
      @123456wei 4 years ago

      same, would love to have some links to related math content

    • @SudhirPratapYadav
      @SudhirPratapYadav 3 years ago

      Go for engineering mathematics (rather than pure; note you will find the same topics in pure mathematics too, but learn them from an engineering perspective, so select courses/books named 'Engineering Mathematics').
      Topics -> linear algebra, multivariate calculus, optimisation, graph theory, discrete mathematics &, most important in this case, Numerical Methods for _________ (fill in whatever you like)

    • @SudhirPratapYadav
      @SudhirPratapYadav 3 years ago

      I forgot to mention: finite-precision arithmetic.

  • @jonathan-._.-
    @jonathan-._.- 4 years ago +3

    :o there is a typo in "neural networks as computational graphs": "sotfmax" instead of "softmax"

  • @ThePentanol
    @ThePentanol 1 year ago +1

    I really like this explanation, but it is not the best yet; there are a lot of blind spots in this work. This course is not for beginners.

  • @mahsaabtahi6633
    @mahsaabtahi6633 4 years ago

    I have to try hard to hear and understand the lecturer; I wish he spoke more clearly.

  • @aromax504
    @aromax504 4 years ago +2

    Can someone explain the term "numerically stable" in a simpler way?

    • @aromax504
      @aromax504 4 years ago +1

      @Prasad Seemakurthi Thank you, Prasad. It was really helpful.

    • @marcospereira6034
      @marcospereira6034 4 years ago

      Numerical stability refers to the accumulation of errors. An algorithm that is numerically unstable is sensitive to errors and allows them to accumulate, causing the final result to diverge from the correct value.

    • @SudhirPratapYadav
      @SudhirPratapYadav 3 years ago

      One more clarification: here 'error' really comes from converting things from continuous to discrete. You basically can't implement continuous computations in computers (you can with analytical/symbolic maths, but leaving that aside). So we discretize (convert to floating-point numbers in this case), and thus there are some 'errors' which accumulate, and the computation doesn't converge (it usually blows up to infinity or collapses to 0). So even a function that converges theoretically (in continuous space) will not do so in actual computation on computers; that is, it is numerically unstable.
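
A classic example of the stability issue discussed in this thread is softmax itself: exponentiating large logits overflows, while the mathematically equivalent max-shifted form stays stable. A minimal Python sketch (not the lecture's code):

```python
import math

def softmax(logits):
    # Subtracting the max logit leaves softmax unchanged
    # (exp(x - m) / sum_i exp(x_i - m) == exp(x) / sum_i exp(x_i))
    # but keeps every exponent <= 0, so exp() cannot overflow.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Naive exp() on large logits overflows...
try:
    math.exp(1000)
except OverflowError:
    print("naive exp overflows")

# ...but the shifted version is fine, and equals softmax([0.0, 1.0]):
print(softmax([1000.0, 1001.0]))
```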

  •  4 years ago

    1:04:30 Hold on… Are you telling me that, given a neural network and a set of weights, I could generate a picture of the most doggish dog ever? :D
    Edit: The answer is "yes"! ua-cam.com/video/shVKhOmT0HE/v-deo.html

  • @s3zine342
    @s3zine342 3 years ago

    35:57

  • @s3zine342
    @s3zine342 3 years ago

    9:18
