Machine Learning Lecture 35 "Neural Networks / Deep Learning" -Cornell CS4780 SP17

  • Published 4 Jun 2024

COMMENTS • 59

  • @assafna
    @assafna 3 years ago +18

    This is one of the best lectures you gave in this series, super clear, very helpful and even enjoyable, thanks to this wonderful demo. Seriously, well done!

  • @janmichaelaustria620
    @janmichaelaustria620 4 years ago +13

    Good god! I've watched the whole series and this by far was the best one! Thank you Prof Weinberger for making these available!

  • @vatsan16
    @vatsan16 3 years ago +6

    I had goosebumps at 23:03 when he gives the other perspective on neural networks. HOW DID I NOT THINK OF THAT?

  • @SureshKumar-yc5lc
    @SureshKumar-yc5lc 3 years ago +6

    Best Teacher ever in my opinion!!!

  • @ugurkap
    @ugurkap 4 years ago +21

    This was one of the best explanations I have heard on neural networks and really cool demo at the end. I suspect you could have chosen a better picture for yourself though.

  • @pranavsawant1439
    @pranavsawant1439 4 years ago +7

    You are the best professor I have ever come across. Thanks a lot! This world needs more people like you!

  • @minhtamnguyen4842
    @minhtamnguyen4842 4 years ago +8

    I cannot thank you enough, professor. This is extremely helpful to me. I idolize you.

  • @tostupidforname
    @tostupidforname 4 years ago +7

    Everything I have seen from this lecture is absolutely fantastic! Thank you very much for uploading this. Your enthusiasm while teaching makes it so fun to learn, and I'm very glad that I don't completely rely on my professor's lectures now.

  • @tarunluthrabk
    @tarunluthrabk 3 years ago +2

    I hadn't heard such a beautiful explanation of deep learning before. A Prof. Kilian is worth a thousand blogs!!

  • @yuehu7037
    @yuehu7037 3 years ago +5

    I have taken three graduate courses in ML and data analytics; this one still inspires me a lot!

  • @CKPSchoolOfPhysics
    @CKPSchoolOfPhysics 2 years ago +1

    Just as Gilbert Strang's 18.06 sets the intuition right for vector spaces, you have been doing exactly the same for ML algos. We are more than privileged to go through your lectures. This is a great service to mankind. Namaste from India!

  • @TheSetegn
    @TheSetegn 5 months ago

    He is undeniably one of the best instructors. He effortlessly simplifies complex topics, presenting them in an engaging and entertaining manner. His ability to use humor not only makes the learning experience enjoyable but also serves as a powerful teaching tool for machine learning concepts. Additionally, I was impressed by his humble and excellent personality, which greatly enhances the overall learning environment. His passion for the subject is palpable, and it genuinely enriches the course

  • @smallstone626
    @smallstone626 4 years ago +5

    One of the best machine learning lectures. Thank you very much, professor.

  • @j.adrianriosa.4163
    @j.adrianriosa.4163 3 years ago +2

    - So... do I study OR do I have some fun? - That's what I was asking myself an hour ago while thinking about watching The Expanse.
    Then I watched this lecture. An OR became an AND. I learned AND I laughed a lot.
    You are so good it even defies logic. Du bist der Beste. Vielen Dank! (You are the best. Thank you very much!)

  • @Shkencetari
    @Shkencetari 5 years ago +23

    This was very helpful. Your understanding and the way you teach are amazing. Thank you very much!!!

  • @michaelmellinger2324
    @michaelmellinger2324 2 years ago +2

    4:35 Why we use ReLU instead of sigmoid
    6:00 Sigmoid is flat at ends and has no gradient
    9:45 Deep learning scales linearly with more data. Kernels are quadratic
    15:00 Discussion begins
    17:00 What’s going on inside a neural network
    23:50 Piecewise linear approximation
    28:00 With a neural network you can approximate any smooth function arbitrarily close
    29:45 Discuss layers
    33:00 There is no function that you can learn with a deep network that you can’t also learn with a shallow network with just one layer
    35:30 The benefit of multiple layers
    37:00 The exponential effect of multiple layers
    38:00 A few small matrices have the same expressive power
    40:00 Why do deep networks not overfit…SGD
    41:10 Demo
    45:30 Second demo. Hot or not facial dataset
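The 4:35, 6:00, and 23:50 entries above are the technical core of the lecture: ReLU keeps a usable gradient where a sigmoid saturates, and a hidden layer of ReLU units builds a piecewise-linear approximation of the target function. Below is a minimal NumPy sketch of that second point; the weights, biases, and kink locations (W1, b1, w2, b2) are made-up illustrations, not values from the lecture demo.

```python
# A minimal sketch (illustrative parameters, not the lecture's demo):
# a one-hidden-layer ReLU network evaluated by hand, showing that its
# output is a piecewise-linear function of the input (cf. 23:50).
import numpy as np

def relu(z):
    # unlike a saturated sigmoid (4:35, 6:00), ReLU has gradient 1 wherever z > 0
    return np.maximum(0.0, z)

# hypothetical hand-picked parameters: each hidden unit adds one "kink" at x = -b/w
W1 = np.array([[1.0, 1.0, 1.0]])       # hidden weights, shape (1, 3)
b1 = np.array([0.0, -1.0, -2.0])       # kinks at x = 0, 1, 2
w2 = np.array([1.0, -2.0, 2.0])        # output weights combine the three ramps
b2 = 0.0

def f(x):
    # x: 1-D array of inputs; returns the piecewise-linear network output
    h = relu(x[:, None] @ W1 + b1)     # hidden activations, shape (n, 3)
    return h @ w2 + b2

x = np.linspace(-1.0, 3.0, 9)
print(np.round(f(x), 2))               # the slope changes at x = 0, 1, and 2
```

Summing more ramps adds more kinks, which is how the network can approximate a smooth curve arbitrarily well (28:00).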

  • @benoyeremita1359
    @benoyeremita1359 1 year ago

    Wow, the way you explained the concept of layers, and the demo at the end. What a JOY it must be to be present physically in your class.
    😍🤩

  • @chaowang3093
    @chaowang3093 3 years ago +2

    Really great analogy for the deep neural network.

  • @rohit2761
    @rohit2761 2 years ago

    Watched the entire semester course in 22 days. Got every single explanation. Super Clear, Extremely awesome

  • @raider7204
    @raider7204 1 year ago

    This is the best lecture on deep learning I have ever seen.

  • @LennyDKs
    @LennyDKs 2 years ago

    Simply amazing lecture! Loved every bit of it

  • @amarshahchowdry9727
    @amarshahchowdry9727 3 years ago +2

    Thank you so much Kilian for this beautiful series of lectures on Machine Learning.
    I had a doubt regarding the intuition you gave for neural networks. From what I have understood, when we train a NN, we learn a mapping from a vector space where our data is highly non-linear to a vector space where our data is much less 'complexly' arranged (linearly separable for classification, linear for regression). Subsequently, our learned weights represent the nonlinear approximation of the hyperplane in the original vector space that was required in the first place to accomplish whatever task we have to do.
    So, I just wanted to confirm that both these intuitions go hand in hand, right? As in, the learned weights give us a mapping and can also be seen as representing a non-linear hyperplane in our original vector space, as shown by you. Furthermore, for models like the decoder in an autoencoder, is there an intuition of a hyperplane? The mapping intuition seems to be the only one that fits such a case, since we map our latent representation into a vector space with a greater number of dimensions, and the idea of a hyperplane being fit doesn't seem appropriate. Is it that we consider every output pixel as an individual classification or regression (depending on what loss we choose)?
    It would be a great help if you could clarify this.

  • @jiviteshsharma1021
    @jiviteshsharma1021 4 years ago +2

    This was an insanely good explanation.

  • @khonghengo
    @khonghengo 3 years ago +2

    I bookmarked your course link www.cs.cornell.edu/courses/cs4780/2018fa/lectures/ and did not see any videos there, so I was worried that I could not watch your lectures. And here they are; thank you very much. I don't have enough words to say how much I appreciate your teaching. 3 lectures left to go...

  • @mehmetaliozer2403
    @mehmetaliozer2403 1 year ago

    After all this time I still come back here and try to comprehend again what is going on. Best lecture ever 🤖

  • @Biesterable
    @Biesterable 5 years ago +5

    amazing!

  • @aramnasser268
    @aramnasser268 3 years ago

    Fun to learn :)
    Amazing lecture as usual....
    Thanks a lot

  • @yuvaaditya
    @yuvaaditya 4 years ago +3

    best demo in the world

  • @trantandat2699
    @trantandat2699 4 years ago +2

    so amazing!!!

  • @user-kc1xf6hq1b
    @user-kc1xf6hq1b 4 months ago

    legendary!

  • @vladimirsheven9826
    @vladimirsheven9826 4 years ago +4

    They are called Matryoshka ))

  • @Cyberpsycow000
    @Cyberpsycow000 2 years ago

    Hi Prof. Weinberger,
    I love these lectures so much, and I'm so glad I learned my first machine learning class from you, thanks a lot!! One more thing: I'm very interested in the projects you mentioned in class. I found the homeworks posted online, but I just can't find the information about the projects. Could you please post these online so I have a chance to try the stuff I learned from the classes?

  • @user-qh6pn6hq3d
    @user-qh6pn6hq3d 2 years ago

    Thank you, professor!

  • @aletivardhan6679
    @aletivardhan6679 3 years ago +1

    Hello Prof Weinberger,
    I'm unable to understand how the regression model (which can fit nonlinear data using multiple lines) can be extended to a classification problem. Using a combination of multiple lines with ReLU activations would essentially approximate a non-linear function (please correct me if I'm wrong). For a classification problem with 2 classes, can this nonlinear function (a combination of multiple lines with activations) itself be treated as the decision boundary? If not, how exactly is nonlinearity achieved in a decision boundary?
    I'm sorry if I'm mixing two different concepts up.
    Thank you.

    • @kilianweinberger698
      @kilianweinberger698  3 years ago +1

      If you have a classification problem, you typically let the neural network have one output per class (which you normalize with a softmax function). Each output predicts the probability that the input belongs to that particular class.
      Actually, this is similar to multi-class logistic regression. If you have k classes, then for logistic regression people just train k logistic regression classifiers, each one deciding whether the input belongs to a particular class or not. Here, too, the outputs are normalized with a softmax function. Essentially you use that classifier as the last layer of the neural network. Hope this helps ...
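A minimal sketch of the reply above, under assumed shapes and random weights (the helper softmax, the sizes k and d, and the arrays W, b, h are all illustrative, not course code): the last layer produces one score per class, and a softmax turns the k scores into class probabilities, much like k logistic-regression heads sharing the same hidden representation.

```python
# Illustrative sketch: a k-way softmax output layer on top of hidden features.
import numpy as np

def softmax(scores):
    # subtract the row-wise max for numerical stability before exponentiating
    z = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
k, d = 3, 5                                   # k classes, d hidden features
W, b = rng.normal(size=(d, k)), np.zeros(k)   # k "logistic-regression" heads

h = rng.normal(size=(2, d))       # hidden representations for 2 example inputs
probs = softmax(h @ W + b)        # shape (2, k); each row sums to 1
print(probs.round(3), probs.sum(axis=1))
```

For two classes this reduces to the familiar logistic (sigmoid) output on the score difference.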

    • @aletivardhan6679
      @aletivardhan6679 3 years ago

      @@kilianweinberger698 Thanks a lot professor!

  • @MrWincenzo
    @MrWincenzo 3 years ago

    Hello Prof Weinberger,
    your insight on piecewise approximation really shed light on my understanding of NNs; thanks for sharing your amazing lessons.
    Actually I keep wondering what happens with DNNs and how that piecewise approximation propagates through the subsequent layers.
    Is it reasonable to say that in the next layers each piecewise function of the previous layer is approximated with local piecewise approximations, getting closer and closer to the real function's curvature?

    • @kilianweinberger698
      @kilianweinberger698  3 years ago +1

      In the subsequent layers you are again building piecewise linear functions - however, out of the piecewise linear functions from the earlier layers. In fact, the number of “kinks” (i.e. nonlinearities) in your piecewise linear function grows exponentially with depth, which is why deep neural networks are so powerful (and why you would need exponentially many hidden nodes with a shallow one-hidden-layer network).
      Imagine the first layer has 10 nodes, the second layer 20 nodes, and the third layer is the output (one node). Each node in the second layer is then a piecewise linear function consisting of 10 non-linearities. The final function is a linear combination of 20 functions, each consisting of 10 non-linearities, so you end up with 200 linear parts.
      Hope this helps ...
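A rough numerical illustration of this counting argument: the 10-node and 20-node layer sizes come from the reply, while the random weights, the evaluation grid, and the slope tolerance are assumptions. The sketch evaluates a scalar 1 -> 10 -> 20 -> 1 ReLU network on a fine grid and counts where its slope changes.

```python
# Rough illustration (random weights, so exact counts vary): estimate how many
# linear pieces a 1 -> 10 -> 20 -> 1 ReLU network realizes in 1-D.
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(0.0, z)

W1, b1 = rng.normal(size=(1, 10)), rng.normal(size=10)
W2, b2 = rng.normal(size=(10, 20)), rng.normal(size=20)
w3, b3 = rng.normal(size=20), rng.normal()

def f(x):
    h1 = relu(x[:, None] @ W1 + b1)   # first hidden layer: up to 10 kinks
    h2 = relu(h1 @ W2 + b2)           # second layer: kinks built out of kinks
    return h2 @ w3 + b3               # scalar piecewise-linear output

x = np.linspace(-10, 10, 200001)
slopes = np.diff(f(x)) / np.diff(x)
pieces = 1 + np.count_nonzero(np.abs(np.diff(slopes)) > 1e-6)
print("approximate number of linear pieces:", pieces)
```

The printed count is whatever this particular random network happens to realize; the 200 figure in the reply describes the capacity of the 10-then-20 architecture, not something every setting of the weights reaches.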

    • @MrWincenzo
      @MrWincenzo 3 years ago

      @@kilianweinberger698 thank you, Prof., for your kind reply.
      I got it; again, a great explanation. Simply put, the 2nd layer's input is no longer linear: the 2nd layer's functions are evaluated on values from a piecewise curve instead of from a straight line.

  • @hussain5755
    @hussain5755 4 years ago

    11:00 what are the kernels you are talking about?

    • @ugurkap
      @ugurkap 4 years ago

      Basically, they are functions that map input vectors into another space. Then you use this mapped version as the input to the linear classifier. The linear classifier effectively works in a higher dimension, and since it is easier to separate data points when they are far away from each other, which tends to be the case in higher dimensions, you usually separate them successfully. There is more information in the previous videos and lecture notes. So no, it is not GPU kernel functions, if that is what you were asking.
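A deliberately simplified sketch of the "map to another space, then use a linear classifier" idea in the reply above. The feature map phi, the XOR-style points, and the weight vector w are hypothetical; real kernel methods avoid building the map explicitly and work with inner products via the kernel trick, so this is only the geometric picture.

```python
# Illustrative sketch: an explicit feature map makes XOR-like data linearly separable.
import numpy as np

def phi(X):
    # explicit degree-2 polynomial features: (x1, x2) -> (x1, x2, x1*x2)
    return np.column_stack([X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])

# XOR-style data: not linearly separable in the original 2-D space
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([1, -1, -1, 1])

# In the mapped space a single hyperplane through the origin works: sign(x1 * x2)
Z = phi(X)
w = np.array([0.0, 0.0, 1.0])
print(np.sign(Z @ w) == y)    # [ True  True  True  True ]
```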

  • @wise_math
    @wise_math 1 year ago

    Which of your previous lectures are important prerequisites for understanding this one?

  • @wugriffin7272
    @wugriffin7272 5 years ago +14

    And they told me Germans can't make jokes.

    • @Saganist420
      @Saganist420 4 years ago +5

      Kilian Weinberger single-handedly raising the average German comedy coefficient by 1 standard deviation.

  • @silent_traveller7
    @silent_traveller7 3 years ago +1

    ohhhh my god. This was funny!!

  • @utkarshtrehan9128
    @utkarshtrehan9128 3 years ago +1

    🤯 35:00

  • @gauravsinghtanwar4415
    @gauravsinghtanwar4415 3 years ago +1

    Damn, I want to give 100 thumbs up for the demo, but YouTube allows me to give just one :-(
    Amazing stuff

  • @shrishtrivedi2652
    @shrishtrivedi2652 2 years ago

    2:42 start

  • @giraffaelll
    @giraffaelll 3 years ago

    Why do they have a piano in their class?

  • @jiahao2709
    @jiahao2709 4 years ago +1

    kernel matrix quadratic, DL linear

  • @subhasdh2446
    @subhasdh2446 1 year ago

    I mean, why don't you try stand-up? Or have you tried it before?

  • @itachi4alltime
    @itachi4alltime 2 years ago

    Anyone else accidentally raise their hands?! Just me, cool

  • @kc1299
    @kc1299 3 years ago

    No nesting dolls allowed (禁止套娃)

  • @gregmakov2680
    @gregmakov2680 1 year ago

    hahah, the guy who made up that one-layer theory is no different from Marxist-Leninist philosophy :D:D it has fooled who knows how many generations :D:D:D:D

  • @arasuper1935
    @arasuper1935 1 year ago

    Younkniwnwhy
    Ears

  • @gregmakov2680
    @gregmakov2680 1 year ago

    hahaha, lazy learning is good :D:DD:D lazy is a good characteristic, not active :D:D:D active is so bad :D the new era, the new characteristics, new elites :D:D:D