Neural Networks Pt. 3: ReLU In Action!!!

  • Published Dec 1, 2024

COMMENTS • 308

  • @statquest
    @statquest  3 роки тому +13

    The full Neural Networks playlist, from the basics to deep learning, is here: ua-cam.com/video/CqOfi41LfDw/v-deo.html
    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @DThorn619
    @DThorn619 4 роки тому +127

    Just to help with the promotion: the study guides he posts on his site only cost $3.00, and they are immensely helpful to refer back to. It's like having his entire video condensed into a handy step-by-step guide as a PDF. Yes, you could just watch the video over and over, but this way you help Josh continue making great content for us, at the cost of a cup of coffee.

    • @statquest
      @statquest  4 роки тому +31

      TRIPLE BAM!!! Thanks for the promotion! I'm glad you like the study guides. It takes a lot of work to condense everything down to just a few pages.

  • @MrAlb3rtazzo
    @MrAlb3rtazzo 4 роки тому +85

    Every time I go to the bathroom and use "soft plus" I think about neural nets again, and this accelerates my learning process :)

  • @iiilllii140
    @iiilllii140 Рік тому +8

    This is such a nice and clear visualization of how activation functions work inside a neural network, and a perfect way to remember the inner workings. This is a masterpiece!
    Before this, I just knew: apply an input, activation functions, etc., and you will receive an output with a magic value. But now I have a much deeper understanding of WHY we apply these activation functions / different activation functions.

  • @naf7540
    @naf7540 3 роки тому +96

    This is just so crystal clear and must have taken you some time to really deconstruct in order to explain it, really fantastic, thank you Josh!

    • @statquest
      @statquest  3 роки тому +41

      Thanks! It took a few years to figure out how to create this whole series.

    • @wassuphomies263
      @wassuphomies263 3 роки тому +3

      @@statquest Thank you for the videos! This helps a lot :)

    • @AdityaSingh-qk4qe
      @AdityaSingh-qk4qe 2 роки тому +4

      @@statquest That's a big BAM!!! StatQuest is by far one of the best resources for statistics and ML - thanks a lot, you helped me understand so many concepts I never got, such as PCA, and even how activation functions like ReLU actually bring in non-linearity through slicing, flipping, etc.!

  • @discotecc
    @discotecc 9 місяців тому +3

    The theoretical simplicity of deep learning is a beautiful thing

  • @maliknauman3566
    @maliknauman3566 3 роки тому +4

    Google should give you an award for spreading knowledge to us all...

  • @wojpaw5362
    @wojpaw5362 3 роки тому +2

    OMG - CLEAREST EXPLANATION OF RELU ON THE PLANET!!! PLEASE TEACH ME EVERYTHING YOU KNOW

    • @statquest
      @statquest  3 роки тому +1

      bam! Here's a list of all of my videos: statquest.org/video-index/

    • @wojpaw5362
      @wojpaw5362 3 роки тому +1

      @@statquest Thank you Mister :)

  • @RomaineGangaram
    @RomaineGangaram 7 місяців тому +3

    Bro, you are a genius. Much love from South Africa. Soon I will be able to buy your stuff. You deserve it.

  • @추벌레배
    @추벌레배 4 роки тому +5

    How do you make difficult Machine Learning content so easy? Incredible!

  • @ashutoshPY
    @ashutoshPY 4 роки тому +7

    The way you explain is on another level, sir. Thanks🙏

    • @statquest
      @statquest  4 роки тому +1

      Thank you very much! :)

  • @hozaifas4811
    @hozaifas4811 Рік тому +1

    Your explanation deserves a huge bam! That's great, man.

  • @raul825able
    @raul825able Рік тому +2

    Thanks Josh!!! It's such fun to learn machine learning from your videos.

  • @alimehrabifard1830
    @alimehrabifard1830 4 роки тому +25

    Awesome guy, Awesome channel, Awesome video, TRIPLE BAM!!!

  • @speedtent
    @speedtent Рік тому +1

    You saved my life. Thank you from Korea!

  • @mainakray6452
    @mainakray6452 4 роки тому +2

    your explanation makes things so simple ...

  • @DED_Search
    @DED_Search 3 роки тому +4

    I like how you explain that the affine transformation rotates, scales, and flips the activation function, with vivid illustrations. Now I can relate it to LeCun's deep learning class, which talks about this in abstract matrix form. Thanks.

  • @ifargantech
    @ifargantech Рік тому +1

    I always look forward to your intro music... I like it. Your content is also satisfying. Thank you!

  • @heplaysguitar1090
    @heplaysguitar1090 3 роки тому +8

    I come here every time I learn some new concept to understand it clearly. Thanks a ton!!
    Would really love to jam with you someday for the intros, and maybe we can call it a BAMMING session.

    • @statquest
      @statquest  3 роки тому +1

      That would be awesome!

  • @aswink112
    @aswink112 3 роки тому +9

    My mind is blown. Triple Bam. Josh Starmer - a great thanks to you for making such amazing videos and educating others free of cost.

  • @RubenMartinezCuella
    @RubenMartinezCuella 3 роки тому +3

    Hey Josh, here is a topic you may be interested in making a video about. It is very relevant and I feel like not many videos on the web explain it:
    - Which hyperparameters affect a NN, and what is the intuition behind each of them? Most packages (e.g. caret) run a grid of models with all the combinations of parameters you have specified, but that gets computationally expensive pretty quickly. It would be great to learn some of the intuition behind them in order to feed that grid something better than random guesses.
    Let me know what you think about this topic, and thanks again for your great work.

    • @statquest
      @statquest  3 роки тому +2

      I'll keep that in mind. In the next few months I want to do a webinar on how to do neural networks in python (with sklearn) and maybe that will help you.

    • @RubenMartinezCuella
      @RubenMartinezCuella 3 роки тому +1

      @@statquest thank you

  • @QuranKarreem
    @QuranKarreem Рік тому +1

    Very good explanation, especially when you talked about the ReLU function, which is not differentiable at zero.
    Keep up the great work, brother!

  • @_1jay
    @_1jay Рік тому +3

    Another banger

  • @jennycotan7080
    @jennycotan7080 Рік тому +1

    You said that ReLU sounds like a robot. My personification of this function is actually a robot who is simple-minded when solving problems! Coincidence!

  • @lisun7158
    @lisun7158 2 роки тому +3

    [Notes excerpt from this video]
    7:10 Why ReLU works. -- Like other activation functions, the weights and biases on the connections slice, flip, and stretch the function into a new shape.
    7:40 How to handle the derivative of the ReLU function not being defined at the bend at (0,0). -- Manually define the derivative there to be 0 or 1.
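    A minimal Python sketch of those two notes (picking 0 as the derivative at x = 0 is just a convention; 1 works equally well):
    ```python
    def relu(x):
        # keep positive values, turn negative values into 0
        return max(0.0, x)

    def relu_derivative(x):
        # the true derivative is 0 for x < 0 and 1 for x > 0, and undefined at
        # exactly x = 0, so we simply pick a value there (0 in this sketch)
        return 1.0 if x > 0 else 0.0
    ```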

  • @hemantrawat1576
    @hemantrawat1576 2 роки тому +1

    I really like the intro of the video, StatQuest...

  • @williambertolasi1055
    @williambertolasi1055 4 роки тому +4

    Good explanation. It is interesting to see how the ReLU is used to gradually refine the function that defines the probabilistic output.
    Looking at how the ReLU is used reminds me of the use of diodes (with an approximate characteristic curve) in electronic circuits.

    • @statquest
      @statquest  4 роки тому +3

      Same here - the ReLU reminds me of a circuit.

  • @edmalynpacanor7601
    @edmalynpacanor7601 2 роки тому +1

    Not skipping ads for my guy Josh

    • @statquest
      @statquest  2 роки тому

      BAM! :) thanks for your support!

  • @TheRandomPerson49
    @TheRandomPerson49 26 днів тому +1

    I like the video first, then start learning 😊😊🔥🔥🔥🔥

  • @FuugaMo
    @FuugaMo 4 дні тому +1

    This is so helpful! Thank you!!

  • @maryamsajid8400
    @maryamsajid8400 2 роки тому +1

    Amazing job... understood clearly... now I don't have to search any more for ReLU :D

  • @epistemophilicmetalhead9454
    @epistemophilicmetalhead9454 Рік тому +1

    Note regarding ReLU: the derivative at 0 is not defined, so we define the derivative at 0 to be either 0 or 1.

  • @Xayuap
    @Xayuap Рік тому +2

    a tiny bam is just a declaration of humility

  • @Red_Toucan
    @Red_Toucan Рік тому +3

    I was struggling (well, I still am) with understanding activation functions and these videos are helping me a lot. And man, you answer every single comment, even in old videos. Thanks a lot and many bams from Argentina :D .
    There's one thing that I didn't quite get, and maybe this is covered in the next episodes.
    I understand activation functions are used to introduce "non-linearity" to predictions, but to me they still seem very arbitrary. I mean, why would I, for example, with ReLU in mind, keep positive values and change negative values to 0? Am I not losing a lot of information there?
    I know when it comes to deep learning sometimes the answer is along the lines of "because some dude tried it 10 years ago and it worked. Here's a paper discussing it" but I'd still like to ask.

    • @statquest
      @statquest  Рік тому +1

      It might help to see the ReLU in action. Here's an example that shows how it can help us fit any shape to the data: ua-cam.com/video/83LYR-1IcjA/v-deo.html

    • @SonkMc
      @SonkMc 11 місяців тому

      You are not losing information, because what travels through once you start manipulating the algorithm is no longer the original information but a "shadow" of it, and what you are trying to do is adjust the parameters so that everything fits that shadow.
      The goal of the activation functions is to add non-linearity, but remember that the magic is also in the neurons and the hidden layers of the network. Each hidden layer adds an intersection point, and that further fits the information being observed.

  • @r0cketRacoon
    @r0cketRacoon 8 місяців тому +1

    Could you do another video on backpropagation with 2 hidden layers combined with ReLU functions?
    I really love your visualization.

    • @r0cketRacoon
      @r0cketRacoon 8 місяців тому

      According to the formula for the derivative of the loss function with respect to the weights/bias in the previous video, if we replace the softplus function with the ReLU function, then (e**x / (1+e**x)) is replaced with 0 or 1.
      If it is 0, then the derivative of the loss function with respect to the weights/bias is 0, so the step size = 0, and there is no tweak to the weights and bias.
      Can you make a video about that? Or am I wrong?

    • @statquest
      @statquest  8 місяців тому

      Yes, that's correct. So, for very simple neural networks, it can be hard to train with ReLU. However, for bigger, more complicated networks, the value is rarely 0 since there are so many possible inputs.

    • @r0cketRacoon
      @r0cketRacoon 8 місяців тому +1

      @@statquest Thanks, I really appreciate your dedication to making a video like this. It has helped so much with the concepts.

  • @ramyagoka9693
    @ramyagoka9693 3 роки тому +1

    Thank you so much, sir, for such a clear explanation.

  • @firattamur1682
    @firattamur1682 3 роки тому +4

    Hi, I was really excited when I saw you start on neural networks after your great machine learning videos. Can you create a playlist for neural networks like you did for machine learning? It is easier to follow with playlists. Thanks!

    • @statquest
      @statquest  3 роки тому +2

      Yes, soon! As soon as I finish all the videos in this series. There are at least 2, maybe 3 or 4 more to go.

  • @drzl
    @drzl 3 роки тому +1

    Thank you, this helped me with an assignment

  • @harishbattula2672
    @harishbattula2672 3 роки тому +1

    Thank you for the explanation.

  • @user-et8es9vg5z
    @user-et8es9vg5z 8 місяців тому +1

    As always, everything is very clear, but I still don't understand why the ReLU function is currently the most effective activation function in machine learning. I mean, the shape seems so much less natural than the softplus function, for instance.

    • @statquest
      @statquest  8 місяців тому +1

      It's simple - super easy to calculate - so it doesn't take any time, which means we can use a lot more of them to fit more complicated shapes to the data.

  • @nandakumar8936
    @nandakumar8936 7 місяців тому +1

    'at least it's ok with me' - all we need for peace of mind

  • @matthewlee2405
    @matthewlee2405 4 роки тому +2

    Thank you very much Starmer, very clear and great video! Thank you!

  • @FloraSora
    @FloraSora 3 роки тому +2

    I love the toilet paper image for softplus... didn't catch it on the first watch but it became more and more suspicious as I went through this a few times... LOL.

  • @tagoreji2143
    @tagoreji2143 2 роки тому +1

    thank you Professor

  • @adityams1659
    @adityams1659 3 роки тому +1

    *this video/ most of his videos have less than 100K views!!??*
    *ppl are missing out on a gold mine!*

  • @ML-jx5zo
    @ML-jx5zo 4 роки тому +2

    Again, appreciation for you.

  • @darshdesai2754
    @darshdesai2754 3 роки тому +1

    Hey Josh! Amazing content - as always. I have always found your videos to be very useful in understanding the fundamental ideas, rather than just accepting the 'theoretical definitions'. I just wanted to throw out a suggestion that it would be great if you could collaborate with other open-source/free-for-all learning mediums like Khan Academy. This would not only increase the viewer base for all open-source platforms but would also fill in the gaps where the content on your channel or their channel has not been created yet.

    • @statquest
      @statquest  3 роки тому +1

      I'll keep that in mind. However, I have no idea how to collaborate with Khan Academy. If you have suggestions, let me know.

  • @jamasica5839
    @jamasica5839 3 роки тому

    With ReLU life is easier, you don't have to compute the complicated CHAIN RULE :D
    Great series!!! I finally get it because of you Josh!

  • @nonalcoho
    @nonalcoho 4 роки тому +2

    Learning math is becoming sooooo easy and fun with your effort!
    Thank you so MUCH! BAM~~~~~~
    If possible, can you make a video about the "gradient vanishing problem" in the future~?

    • @statquest
      @statquest  4 роки тому +1

      I'll keep that in mind.

  • @Luxcium
    @Luxcium Рік тому

    *I am already familiar with Neural Networks Part One* 😂😂😂 So this is where my quest will start, so it's time to _Start Quest_. This time my quest is leading me to *ReLU in Action*, then I will unwind and backpropagate 🎉 through the *Recurrent Neural Networks (RNNs)…* I will then learn what « *Seq2Seq* » is, but first I must go watch *Long Short-Term Memory*. I think I will also have to check out the quest *Word Embedding and Word2Vec…* and then I will be happy to come back to learn with Josh 😅 I am impatient to learn *Attention for Neural Networks* _Clearly Explained_

    • @statquest
      @statquest  Рік тому

      Please just watch the videos in order: ua-cam.com/play/PLblh5JKOoLUIxGDQs4LFFD--41Vzf-ME1.html

  • @dengzhonghan5125
    @dengzhonghan5125 3 роки тому

    Can you also talk about CNNs and RNNs? You are my favorite teacher.

    • @statquest
      @statquest  3 роки тому +1

      CNNs are coming up soon.

  • @hongyichen8369
    @hongyichen8369 3 роки тому +2

    Hi, there are a lot of activation functions like ReLU, tanh, etc. Can you make a video about the usage of different activation functions?

    • @statquest
      @statquest  3 роки тому +2

      I'll keep that in mind.

  • @anishtadev2678
    @anishtadev2678 3 роки тому +2

    Thank you Sir

  • @DED_Search
    @DED_Search 3 роки тому +1

    Could you shed some light on the advantages and disadvantages of ReLU vs SoftPlus, please? Thank you. I didn't know there was a SoftPlus until this video. lol

    • @statquest
      @statquest  3 роки тому

      There seems to be a raging debate as to whether or not ReLU is better or worse than Soft Plus and it could be domain specific. So I don't really know the answer - maybe just try them both and see what works better.

    • @DED_Search
      @DED_Search 3 роки тому

      @@statquest thank you!

  • @Vanadium404
    @Vanadium404 Рік тому +2

    That SoftPlus toilet paper and the Chain Rule sound effect in every video lol

  • @Mak007-h5s
    @Mak007-h5s 2 роки тому +1

    So good!

  • @adamoja4295
    @adamoja4295 2 роки тому +1

    That was very satisfying

  • @BowlingBowlingParkin
    @BowlingBowlingParkin 2 роки тому +1

    AMAZING!

  • @user-rt6wc9vt1p
    @user-rt6wc9vt1p 3 роки тому +2

    How would we deal with the derivative of this function? I've read that the derivative is 0 for x < 0 and 1 for x > 0, but I'm having issues in that when weights are initialized below zero, (something like -0.5), the derivative of the activation function is 0. The chain rule would then make the entire gradient for that weight 0, and the weight would just never change.

    • @statquest
      @statquest  3 роки тому +2

      Yes, that's true, so you may have to try a few different sets of random numbers. However, in practice, we usually have much larger networks with lots of inputs and lots of connections to every node and a lot of data for training. In this case, because we sum all of the connections, having a few negative weights may, for some subset of the training data, result in 0 for the derivative, but not for all data.

    • @user-rt6wc9vt1p
      @user-rt6wc9vt1p 3 роки тому +1

      @@statquest Ah, I see. My network had 1 neuron per layer, makes sense now
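      A tiny sketch of that 1-neuron-per-layer scenario with made-up numbers (the weight, bias, and dosages below are hypothetical, not from the video), showing how a negative pre-activation zeroes the gradient for every training sample:
      ```python
      w, b = -0.5, 0.0                  # hypothetical weight and bias
      dosages = [0.1, 0.4, 0.7, 1.0]    # all inputs are positive

      for x in dosages:
          pre_activation = w * x + b                          # always negative here
          relu_slope = 1.0 if pre_activation > 0 else 0.0     # 0 for every sample
          # the chain rule multiplies the gradient for w by relu_slope,
          # so the whole gradient is 0 and w never gets updated
          print(x, pre_activation, relu_slope)
      ```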

  • @sohambasu660
    @sohambasu660 2 роки тому

    I really like the great content you make that helps us understand such difficult topics.
    Also, if you could kindly include the formulas generally used for the concept and break them down in the video, it would be immensely useful.
    Thanks anyway.

    • @statquest
      @statquest  2 роки тому

      I'm not sure I understand your question. This video discusses the formula for ReLU and breaks it down.

  • @ProEray
    @ProEray 4 роки тому +6

    I desperately need a recurrent neural networks video :'(

    • @ProEray
      @ProEray 4 роки тому +1

      Good Job btw, liked and subscribed as always

    • @statquest
      @statquest  4 роки тому +5

      Thanks! I'm working on convolutional neural networks right now.

  • @chaoukimachreki6422
    @chaoukimachreki6422 2 роки тому +1

    Just awesome...

  • @6866yash
    @6866yash 3 роки тому +1

    You are a godsend :')

  • @arkobanerjee009
    @arkobanerjee009 4 роки тому +3

    Brilliant as usual. Is there an SQ on softmax activation function in the pipeline?

    • @statquest
      @statquest  4 роки тому +4

      Thank you and yes. Right now I'm working on Convolutional Neural Networks and image recognition, but after that (and perhaps part of that) we'll cover softmax.

    • @masteronepiece6559
      @masteronepiece6559 4 роки тому +3

      @@statquest It would be great if you could visualize what happens to the data, to show us how those methods work. Best regards,

  • @adriangabriel3219
    @adriangabriel3219 2 роки тому +1

    Why does it make sense to use a ReLU at the end? Is it to reduce the complexity of the green squiggle from a curvy to a pointy squiggle?

    • @statquest
      @statquest  2 роки тому +1

      It restricts the final output to be between 0 and 1.

  • @omerutkuerzengin3061
    @omerutkuerzengin3061 10 місяців тому +1

    You are great!

  • @HarryKeightley
    @HarryKeightley 8 місяців тому

    Thank you for the very clear explanations - this video series is wonderful in its capacity to communicate complex topics in a very clear and understandable manner. I have a question: does the use of the ReLU function for this network result in a less accurate model overall, due to its piecewise linearity? It seems the more elegant curve has been replaced by a more simplistic linear-looking triangle. Would this mean a larger network would be needed in order for the model to be more accurate - so that the non-linear relationship between dosage and effectiveness can be modelled more accurately through a more complete/complex interaction of nodes?

    • @statquest
      @statquest  8 місяців тому +1

      That's a good question and I'm not 100% sure what the answer is, other than "it probably depends on the data". That said, ReLU, because of its simplicity, allows for much deeper neural networks (more hidden layers and more nodes per layer) than the sigmoid shape and, as a result, allows for more complicated shapes to be fit to the data. ReLU's introduction to neural networks a little over 20 years ago made a huge impact on AI because "deep learning" wasn't possible with the sigmoid shape. In contrast, with its super simple derivative, ReLU allowed neural networks to model much more complicated datasets than ever before.

    • @HarryKeightley
      @HarryKeightley 8 місяців тому +1

      @@statquest Thanks for taking the time to answer my question, much appreciated :). I'm very much looking forward to watching the rest of your videos related to NNs and related concepts.

  • @flockenlp1
    @flockenlp1 Рік тому

    Hi, are you planning on making a video on Radial Basis Function Networks and self-organizing maps?
    Especially with self-organizing maps it's very hard to find good resources; at least so far I have found nothing that could really help me wrap my head around the topic. I figured, since this seems like your kind of topic and you have a knack for explaining these things in an easy-to-understand way, there'd be no harm in asking :)

    • @statquest
      @statquest  Рік тому

      I have a video on the radial basis function with respect to Support Vector Machines, if you are interested in that. SVMs: ua-cam.com/video/efR1C6CvhmE/v-deo.html Polynomial Kernel: ua-cam.com/video/Toet3EiSFcM/v-deo.html and Radial Basis Function Kernel: ua-cam.com/video/Qc5IyLW_hns/v-deo.html

  • @DED_Search
    @DED_Search 3 роки тому +1

    At 8:17, could you please explain in detail why it does not matter that ReLU is bent? How does it relate to gradient vanishing/exploding? The reason I am asking is that if, like you said, we can get around the non-differentiability at the bend by setting the gradient to 0, then that leads to gradient vanishing during backpropagation, right? Thanks.

    • @statquest
      @statquest  3 роки тому

      The gradient for ReLU is either 1 or 0. Thus, the gradient can not vanish unless every single value that goes through it is less than 0. If this happens, the node goes dark and becomes unusable, but this is rare if your data is relatively large.

    • @DED_Search
      @DED_Search 3 роки тому

      @@statquest Thanks. That makes sense. I guess what I didn't understand was that you said in the video that "we can simply define the gradient at bent point to be 0 or 1". So the choice really does not make a difference? Thanks.

    • @statquest
      @statquest  3 роки тому +2

      @@DED_Search It doesn't make a difference because the probability of having an input value that is *exactly* 0 (and not 0.000001 or -0.00001) is pretty much 0. So it doesn't matter what value we give the derivative at *exactly* 0.

    • @DED_Search
      @DED_Search 3 роки тому +1

      @@statquest thank you!

  • @RumayzaNorova
    @RumayzaNorova 2 місяці тому +1

    Day 4 of leaving the comment and studying with statquest :)

  • @ThePanagiotisvm
    @ThePanagiotisvm 3 роки тому +1

    Is it only me who didn't understand where the values of the weights and biases come from? Why, for example, is the first weight w1 = 1.70?

    • @statquest
      @statquest  3 роки тому +1

      The weights and biases come from backpropagation, which I talk about in Part 1 of this series and then show in a simple example in Part 2 ua-cam.com/video/IN2XmBhILt4/v-deo.html and then go into more detail in these videos: ua-cam.com/video/iyn2zdALii8/v-deo.html ua-cam.com/video/GKZoOHXGcLo/v-deo.html

    • @ThePanagiotisvm
      @ThePanagiotisvm 3 роки тому +1

      @@statquest thank you!! Back propagation video made it clear.

    • @statquest
      @statquest  3 роки тому +1

      @@ThePanagiotisvm bam!

  • @slkslk7841
    @slkslk7841 2 роки тому

    At 7:45, could you please explain why gradient descent wouldn't work for a bent line? The gradient descent videos didn't help clear up this doubt.
    Amazing video btw! Thanks

    • @statquest
      @statquest  2 роки тому +1

      Gradient Descent needs to have a derivative defined at all points. Technically, the bent line does not have a derivative when x = 0 (at the bend). However, at 8:03 I say that we just define the derivative at x = 0 to be either 0 or 1, and when we do that, gradient descent works just fine with the ReLU.

  • @onemanshow3274
    @onemanshow3274 3 роки тому +1

    Hey Josh, can you please, please make videos on Recurrent Neural Networks and Transformers?

    • @statquest
      @statquest  3 роки тому +1

      I'll keep that in mind.

  • @vishaltyagi5000
    @vishaltyagi5000 4 роки тому +1

    Hi, really love your work. Any plans on covering Recurrent Neural Networks, including the modern RNN units (LSTM, GRU)?

    • @statquest
      @statquest  4 роки тому +3

      I'll keep that in mind. I'm working on convolutional neural networks right now.

    • @vishaltyagi5000
      @vishaltyagi5000 4 роки тому +4

      @@statquest Thanks. Appreciate your efforts 🙂

  • @Intaberna986
    @Intaberna986 Рік тому +1

    God bless you, I mean it.

  • @mischievousmaster
    @mischievousmaster 4 роки тому +1

    Josh, could you please do a video on NLP and its implementation in Python? Would really love that.
    And about the video, it is awesome as always!

    • @statquest
      @statquest  4 роки тому +3

      I'll keep that in mind.

  • @Odiskis1
    @Odiskis1 3 роки тому +1

    How do we know the values won't become really high above 0? I thought the activation function contained the values in both the negative and positive directions so that they wouldn't explode. Is that not a problem?

    • @statquest
      @statquest  3 роки тому

      A much bigger problem is something called a "vanishing gradient", which is where the gradient gets very small and moves very slowly - too slowly to learn. ReLU helps eliminate that problem by having a gradient that is either 0 or 1.
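      A rough sketch of that contrast, assuming the gradient flowing through a stack of layers picks up one activation-derivative factor per layer (0.25 is the maximum slope of the sigmoid, so this is a best-case comparison; the 10-layer depth is hypothetical):
      ```python
      layers = 10

      # the sigmoid's derivative is at most 0.25, so even in the best case
      # the product of 10 of them is tiny - the gradient vanishes
      sigmoid_factor = 0.25 ** layers   # ~9.5e-07

      # the ReLU's derivative is 1 on every active node, so the same
      # product stays 1 and the gradient passes through unchanged
      relu_factor = 1.0 ** layers       # 1.0

      print(sigmoid_factor, relu_factor)
      ```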

  • @miriza2
    @miriza2 4 роки тому +2

    Triple BAM!!! 💥 💥 💥

    • @statquest
      @statquest  4 роки тому

      :)

    • @zeetech0123
      @zeetech0123 3 роки тому

      Fourple BAM!!!!
      .
      .
      .
      .
      .
      .
      .
      (ik its not a word :p lol)

  •  3 роки тому +1

    I gave "like" before watched it...

  • @abbastailor3501
    @abbastailor3501 2 роки тому +1

    Take me to your leader Josh 🤲

  • @Abhi-qi6wm
    @Abhi-qi6wm 2 роки тому

    There is an error at 3:15 because you change the dosage from 0.2 to 0.1 in the equation.

    • @statquest
      @statquest  2 роки тому +1

      Oops!

    • @Abhi-qi6wm
      @Abhi-qi6wm 2 роки тому +1

      @@statquest amazing video nonetheless

  • @edwinokwaro9944
    @edwinokwaro9944 5 місяців тому

    You did not show how the rest of the weights are updated. I need to understand how the derivative of the activation function affects the weight updates.

    • @statquest
      @statquest  5 місяців тому

      See this video for details and just replace the derivative of the softplus with 0 or 1, depending on the value of x: ua-cam.com/video/GKZoOHXGcLo/v-deo.html

  • @anirudhsingh9025
    @anirudhsingh9025 2 роки тому

    I have been watching your quest for NNs for the past few days, and the way you explain is good, but I didn't get one thing that you said about adding 2 lines on a graph. How do we add 2 lines on a graph and find a third curve?

    • @statquest
      @statquest  2 роки тому

      What time point, minutes and seconds, are you asking about?

  • @nickmishkin4162
    @nickmishkin4162 6 місяців тому

    Nice video. If ReLU always outputs a positive number, how can the neural network produce a negative sloping curve?

    • @statquest
      @statquest  6 місяців тому +1

      A weight that comes after the ReLU can be negative, and flip it over.

    • @nickmishkin4162
      @nickmishkin4162 5 місяців тому

      @@statquest So is it inefficient to end a neural network with a ReLU function? Because then we never allow the network to generate a negative slope.
      Correct me if I'm wrong:
      Input -> ReLU(X1) -> only positive outputs -> negative weights -> ReLU(X2) -> only positive final outputs.
      I guess my real question is this: can negative weights followed by a ReLU function produce a negative slope?
      Thanks!

    • @statquest
      @statquest  5 місяців тому +1

      ​@@nickmishkin4162 This video pretty much illustrates everything you want to know about the ReLU. Look at the shape of the function that comes out of the final ReLU at 5:35

    • @nickmishkin4162
      @nickmishkin4162 5 місяців тому +1

      @@statquest Yes! Didn't realize your nn ended with a ReLU. Thank you
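      A small numeric sketch of that exchange (the weight and inputs are made up, not from the video): a negative weight applied after a ReLU produces a downward-sloping piece, and a final ReLU then clips those negative values back to 0.
      ```python
      def relu(x):
          return max(0.0, x)

      w_after = -2.0                    # hypothetical negative weight AFTER a ReLU
      for x in [0.0, 0.5, 1.0]:
          hidden = relu(x)              # non-negative: 0.0, 0.5, 1.0
          scaled = w_after * hidden     # negative slope: 0.0, -1.0, -2.0
          final = relu(scaled)          # a final ReLU clips these to: 0.0, 0.0, 0.0
          print(x, hidden, scaled, final)
      ```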

  • @nirajpattnaik6294
    @nirajpattnaik6294 4 роки тому +2

    Awesome.. Awaiting CNN from SQ ..

    • @statquest
      @statquest  4 роки тому +1

      It should come out in the next few weeks.

  • @TrungNguyen-ib9mz
    @TrungNguyen-ib9mz 3 роки тому

    Great video!! But could you explain more about how to estimate w1, w2, b1, b2, ...? Thank you!

    • @statquest
      @statquest  3 роки тому +2

      I explain how to estimate parameters (w1, w2, b1, b2 etc.) in Part 2 in this series (this is Part 3): ua-cam.com/video/IN2XmBhILt4/v-deo.html and in these videos: ua-cam.com/video/iyn2zdALii8/v-deo.html and ua-cam.com/video/GKZoOHXGcLo/v-deo.html

    • @TrungNguyen-ib9mz
      @TrungNguyen-ib9mz 3 роки тому +1

      @@statquest Thank you so much!

  • @muzammelmokhtar6498
    @muzammelmokhtar6498 2 роки тому

    Great video, but I don't really understand the part about the curvy and bent ReLU at the end of the video.

    • @statquest
      @statquest  2 роки тому

      What time point, minute and seconds, are you asking about?

    • @muzammelmokhtar6498
      @muzammelmokhtar6498 2 роки тому

      @@statquest 7:42-8:15

    • @statquest
      @statquest  2 роки тому

      @@muzammelmokhtar6498 Bent lines don't have derivatives where they are bent. This, in theory, is a problem for backpropagation, which is what we use to optimize the weights and biases. To get around this, we simply define a value for the derivative at the bend. For details on back propagation, see: ua-cam.com/video/IN2XmBhILt4/v-deo.html ua-cam.com/video/iyn2zdALii8/v-deo.html and ua-cam.com/video/GKZoOHXGcLo/v-deo.html

    • @muzammelmokhtar6498
      @muzammelmokhtar6498 2 роки тому +1

      @@statquest ouh okay, thank you👍

  • @alfadhelboudaia1935
    @alfadhelboudaia1935 4 роки тому +2

    Hi, you are really awesome. Would you consider doing a video on Maximum A Posteriori (MAP) estimation?

    • @statquest
      @statquest  4 роки тому

      I'll keep that in mind.

  • @pelocku1234
    @pelocku1234 3 роки тому

    Is the derivative of SSR with respect to w1
    -2(obs - pred) * ifelse(y1 * w3 + y2 * w4 + b3

    • @statquest
      @statquest  3 роки тому

      I do the full backpropagation for all the parameters, including w1, in this video (just sub in the ReLU derivative for the SoftPlus derivative): ua-cam.com/video/GKZoOHXGcLo/v-deo.html

    • @pelocku1234
      @pelocku1234 3 роки тому

      @@statquest I was trying to use this video and sub in the ReLU as you suggested. I think that the ReLU at the end might be what is tripping me up. I haven't figured it out yet, but I will keep trying.
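      For anyone following along, here is a hedged sketch of that gradient, assuming the network in this video (dosage -> two hidden ReLU nodes -> weighted sum plus b3 -> final ReLU); the relu_slope factors are where the ifelse logic goes, including one for the ReLU at the end:
      ```python
      def relu(x):
          return max(0.0, x)

      def relu_slope(x):
          # the convention from the video: 0 below the bend, 1 above it
          return 1.0 if x > 0 else 0.0

      def d_ssr_d_w1(data, w1, b1, w2, b2, w3, w4, b3):
          # data is a list of (dosage, observed) pairs
          total = 0.0
          for x, obs in data:
              z1 = w1 * x + b1              # hidden node 1, before its ReLU
              z2 = w2 * x + b2              # hidden node 2, before its ReLU
              y1, y2 = relu(z1), relu(z2)
              z3 = y1 * w3 + y2 * w4 + b3   # input to the final ReLU
              pred = relu(z3)
              # chain rule: SSR -> final ReLU -> w3 -> hidden ReLU -> x
              total += -2.0 * (obs - pred) * relu_slope(z3) * w3 * relu_slope(z1) * x
          return total
      ```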

  • @yacinerouizi844
    @yacinerouizi844 3 роки тому +1

    Thank you!

  • @口加口非
    @口加口非 4 роки тому

    Can you make a video about Recursive Feature Elimination ? I like your video style.

    • @statquest
      @statquest  4 роки тому

      I'll keep that in mind.

  • @rafibasha4145
    @rafibasha4145 3 роки тому

    Hi Josh, thank you for the excellent videos... How is the input data split across the 2 hidden neurons?

    • @statquest
      @statquest  3 роки тому +1

      I'm not sure I understand your question. However, I will say that input data simply follows the connections from the input node to the nodes in the hidden layers. For details, see: ua-cam.com/video/CqOfi41LfDw/v-deo.html

    • @rafibasha1840
      @rafibasha1840 3 роки тому +1

      @@statquest ,Thanks Josh

  • @ofek8280
    @ofek8280 3 роки тому +2

    Shameless Self Promotion is the funniest thing I've seen on YouTube!

  • @prasadphatak1503
    @prasadphatak1503 3 роки тому +4

    Tiny bam 😂 omfg I couldn't stop laughing. It's like Ryan Reynolds is explaining Neural Networks 😂

  • @pravinverma9668
    @pravinverma9668 2 місяці тому

    Why are you using 2 nodes in the hidden layer? I mean, what is the purpose of solving it with 2 nodes when we could construct it using 1 node as well? Please clarify.

    • @statquest
      @statquest  2 місяці тому

      The more nodes you have in a hidden layer, and the more hidden layers you have, the more complicated a dataset you can fit your neural network to. In this case, we have a simple dataset, and I found, with trial and error, that we could fit it with just 2 nodes.

  • @Anujkumar-my1wi
    @Anujkumar-my1wi 4 роки тому +1

    I want to know whether the weights in a neural network represent a linear relation between the input and the nonlinear output, or whether they are something else?

    • @statquest
      @statquest  4 роки тому +2

      The weights and biases are linear transformations. Remember, the equation for a line is y = slope * x + intercept, and we can replace the "slope" with the "weight" and the "intercept" with the "bias". So y = weight * x + bias = a linear transformation. All of the non-linearity comes from the non-linear activation functions.
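      A quick sketch of that point with hypothetical weights: stacking two linear steps without an activation function collapses into a single straight line, while putting a ReLU between them lets the output bend.
      ```python
      def relu(x):
          return max(0.0, x)

      w1, b1 = 2.0, -1.0    # hypothetical first linear step
      w2, b2 = -3.0, 0.5    # hypothetical second linear step

      def linear_only(x):
          # two linear steps in a row are still just one line:
          # w2*(w1*x + b1) + b2 = (w2*w1)*x + (w2*b1 + b2)
          return w2 * (w1 * x + b1) + b2

      def with_relu(x):
          # the ReLU in the middle is what adds the bend
          return w2 * relu(w1 * x + b1) + b2

      for x in [-1.0, 0.0, 0.5, 1.0]:
          print(x, linear_only(x), with_relu(x))
      ```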

    • @Anujkumar-my1wi
      @Anujkumar-my1wi 4 роки тому

      @@statquest So are we using the weights in a neural network as a linear relation between the input and the output, and introducing nonlinearity because the weighted sum of inputs and weights is only a linear combination? In other words, since the linear combination only captures a linear relation between input and output, do we apply a nonlinear activation function to convert that linear relation into a nonlinear result?

    • @nguyenngocly1484
      @nguyenngocly1484 4 роки тому +1

      f(x)=x is connected. f(x)=0 is disconnected. You can view ReLU as a switch. What is being connected and disconnected? Dot products (weighted sums). The funny thing is, if all the switch states are known, you can simplify many connected dot products into a single dot product.

    • @Anujkumar-my1wi
      @Anujkumar-my1wi 4 роки тому +1

      @@nguyenngocly1484 Hey, thanks. Can you tell me why we use the weighted sum as the activation function's input? Can't we use the neuron's raw input as the activation function's input in a neural network?

    • @statquest
      @statquest  4 роки тому +1

      @@Anujkumar-my1wi The "weighted sum" is simply the weights times the input values. If you have multiple connections to a neuron, then each one has its own weight - just like the connections to the final ReLU function in this video.

  • @carzetonao
    @carzetonao Рік тому +1

    Really like your video, and your shirt is nice.

  • @anshulbisht4130
    @anshulbisht4130 2 роки тому

    Hey Josh, how did you add the blue and orange lines @5:24? I mean, how did the blue line on the -y axis end up on the +y axis (do we need to multiply it by something that is not shown in the video)? Hopefully you reply soon :)

    • @statquest
      @statquest  2 роки тому

      We added the y-axis coordinates of the two lines. However, those lines are not exactly to scale, so that might be confusing you.

  • @chenzeping9603
    @chenzeping9603 Рік тому

    If you add an activation function to the final layer, doesn't it restrict the possible output values? (i.e., if you add ReLU to the last layer before the output, doesn't it restrict the outputs to non-negative values? Similarly, if you use sigmoid, doesn't it restrict them to [0, 1]?)

    • @statquest
      @statquest  Рік тому

      I'm a little confused by your question. Are you asking about what happens at 5:35 ?

  • @seanlynch4354
    @seanlynch4354 Рік тому

    I have a question: will the y output of a ReLU function always be the same as the x input if the x input is greater than 0?