Softmax Layer from Scratch | Mathematics & Python Code

  • Published Feb 5, 2025

COMMENTS •

  • @evertonsantosdeandradejuni3787
    @evertonsantosdeandradejuni3787 3 years ago +25

    Exactly what I need, a channel that focuses on the mathematics as well as the computer science side of things.
    Please don't ever be afraid to get into the maths, keep it like this.
    There are a lot of superficial tutorial channels out there; there's a void to be filled in the deep mathematical sense.

  • @woaw
    @woaw 3 years ago +1

    Your videos are a godsend. I had so much trouble trying to build a CNN from scratch for my uni course, and nothing on the internet even comes close to how well you explain every single step. Thank you so much!

    • @independentcode
      @independentcode  3 years ago +1

      Thank you for the kind words. I'm really glad my videos helped :)

  • @AB51002
    @AB51002 1 year ago

    This helped me a lot!
    Lots of projects out there just use the PyTorch API for the most common functionality, which makes things harder for people who want to change the code in order to implement a research idea or just experiment in general.
    Thank you so much for your effort, and for taking the time to make these videos!

  • @erron7682
    @erron7682 3 years ago +8

    Your channel is one of the best on all of YouTube.
    I am very pleased to see you and revisit the basics :)
    If I decide to make a channel about DL, I will take you as an example.

  • @davidfernandezfernandez8368

    Literally, I've been looking for this exact video for months. It's exactly what I was looking for: a high quality video that really goes in depth about all the maths and coding around neural networks. I feel like a lot of the videos out there either explain all this at a high level of abstraction or only go in depth about the maths or the coding part, but this one does all of it well.
    Honestly I can't believe this channel is below 100k subs.
    Keep it up man!!

  • @denismerigold486
    @denismerigold486 3 years ago +4

    Happy New Year!!!
    Thanks for creating this channel!
    I already know all this, but your presentation is amazing.♥♥♥

  • @dumbcalamitychild
    @dumbcalamitychild 2 years ago +2

    This is one gem of a channel!

  • @denismerigold486
    @denismerigold486 2 years ago +6

    I miss you!

  • @willlowtree
    @willlowtree 3 years ago

    How do you have less than a thousand subscribers? This is better than most of the popular stuff out there. Please keep making videos, I love them.

  • @rushikparmar6355
    @rushikparmar6355 3 years ago

    Great work! Your videos are among the best at explaining the most complex topics!

  • @jamieabw4517
    @jamieabw4517 8 months ago

    Thank you so much for all of these videos - the explanations are genuinely incredible and make everything so simple to understand. Are you ever gonna return and continue making these videos?

  • @cicADA001
    @cicADA001 3 years ago +1

    Alright, this deserves a sub ^^

  • @AngeloKrs878
    @AngeloKrs878 2 years ago +3

    What happens if we are using batch gradient descent instead of stochastic GD? In the video we have M as an n x n matrix, and the output of softmax is n x 1. Now, in the batch GD case, the output of softmax is n x m, but what about the M matrix?
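
    (A minimal sketch of how the batched case could work, assuming the column-per-sample convention from the video; the function names are illustrative, not from the video's code. Each sample gets its own n x n Jacobian, so a batch yields m of them rather than a single matrix.)

        import numpy as np

        def softmax_batch(x):
            # x: (n, m), one column per sample; shift for numerical stability
            e = np.exp(x - np.max(x, axis=0, keepdims=True))
            return e / np.sum(e, axis=0, keepdims=True)

        def softmax_backward_batch(y, output_gradient):
            # y, output_gradient: (n, m). For each sample j the Jacobian is
            # diag(y_j) - y_j y_j^T, an (n, n) matrix, applied column-wise.
            # A fully vectorized equivalent that never materializes a Jacobian:
            # y * (output_gradient - np.sum(y * output_gradient, axis=0, keepdims=True))
            grad = np.empty_like(y)
            for j in range(y.shape[1]):
                yj = y[:, j:j+1]                    # (n, 1)
                J = np.diagflat(yj) - yj @ yj.T     # (n, n)
                grad[:, j:j+1] = J @ output_gradient[:, j:j+1]
            return grad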

  • @black-sci
    @black-sci 8 months ago

    Can you also make a video on cross-entropy loss?

  • @joaopaulocosta1062
    @joaopaulocosta1062 1 year ago +2

    Please do the LSTM, I beg you

  • @deadlygamer7960
    @deadlygamer7960 2 years ago +3

    Hello there mister, love your videos. If possible, please make a video on recurrent layers (from scratch).

  • @Xphy
    @Xphy 3 years ago

    Liked before watching, cuz I trust the content ❤️

  • @vargas4762
    @vargas4762 3 years ago

    Excellent video, thx for the content

  • @kaustabhchakraborty4721
    @kaustabhchakraborty4721 3 years ago

    I am your 754th subscriber.

  • @ManyAliens
    @ManyAliens 3 years ago

    Excellent video; I liked the previous one as well. I was wondering if it was necessary to have the analytic formula for the gradient. Couldn't we just use a small variation of the input and use the output variation to get the local gradient, using the forward function when backpropagating?

    • @independentcode
      @independentcode  3 years ago +1

      It is an interesting question :) I've never tried it, but I think you will have to run the network as many times as you have parameters. If you have a function f(x,y) and you want the derivative with respect to x, then you have to evaluate (f(x+dx,y) - f(x,y))/dx, but then you will have to compute (f(x,y+dy) - f(x,y))/dy to get the derivative with respect to y. Unless I missed something, this seems unfeasible for a neural network given that it has thousands of parameters. I think this might also lead to bigger and bigger imprecision as you go back in the network.
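
      (A small sketch of the finite-difference check described in the reply above, using central differences; the function f and the 3-dimensional input are made up for illustration. Note the cost: two forward passes per parameter.)

          import numpy as np

          def numerical_gradient(f, params, eps=1e-6):
              # Central differences: perturb one parameter at a time,
              # so the cost grows linearly with the number of parameters.
              grad = np.zeros_like(params)
              for i in range(params.size):
                  old = params.flat[i]
                  params.flat[i] = old + eps
                  f_plus = f(params)
                  params.flat[i] = old - eps
                  f_minus = f(params)
                  params.flat[i] = old  # restore the original value
                  grad.flat[i] = (f_plus - f_minus) / (2 * eps)
              return grad

          # e.g. checking the first softmax output against the analytic
          # derivative dy_0/dx_k = y_0 * (delta_0k - y_k):
          x = np.array([0.1, 0.2, 0.3])
          f = lambda v: np.exp(v)[0] / np.sum(np.exp(v))
          print(numerical_gradient(f, x.copy()))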

  • @bossgd100
    @bossgd100 1 year ago +1

    Why did you stop making videos :/ ?

  • @shivamsinghcuchd
    @shivamsinghcuchd 2 years ago

    I was wondering if the backward pass code will work for a batch of inputs.

  • @anisnehmar725
    @anisnehmar725 3 years ago

    Thank you sir for this great explanation, but I have one question! What about the loss? Do we need to define a new loss function? Must we use cross-entropy loss rather than binary cross-entropy?

    • @independentcode
      @independentcode  3 years ago +1

      Hi Anis. The loss that you will be using depends mainly on the use case of the neural network. We defined Mean Square Error in the first video, and Binary Cross Entropy in the second one.

    • @anisnehmar725
      @anisnehmar725 3 years ago

      @@independentcode I mean we can use any loss function we want with softmax? And for binary cross-entropy, I thought it's only for binary classification?
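
      (For reference, the categorical cross-entropy that is usually paired with softmax; a sketch in the style of the loss functions from the earlier videos, not code shown in this one. With two classes it reduces to binary cross-entropy, which is indeed the binary special case.)

          import numpy as np

          def cross_entropy(y_true, y_pred, eps=1e-12):
              # y_true one-hot, y_pred a softmax output: -sum_i y_i * log(p_i)
              return -np.sum(y_true * np.log(y_pred + eps))

          def cross_entropy_prime(y_true, y_pred, eps=1e-12):
              # Gradient w.r.t. y_pred, to be fed into the softmax backward pass
              return -y_true / (y_pred + eps)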

  • @TheAstralftw
    @TheAstralftw 3 years ago

    At 3:08, "if k = i then ...": what is k, and what is i?

    • @independentcode
      @independentcode  3 years ago +1

      We were computing the derivative of E (the error) with respect to x_k (the k-th element of the input vector). That's the k. The formula is a sum over i.
      Try expanding the sum and taking the derivative with respect to one of the input variables. You'll understand.
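
      (Writing that step out: with y_i = e^(x_i) / sum_j e^(x_j), the softmax derivative at 3:08 is

          dy_i/dx_k = y_i * (1 - y_i)   if i = k
          dy_i/dx_k = -y_i * y_k        if i != k

      and the chain rule gives dE/dx_k = sum_i (dE/dy_i) * (dy_i/dx_k). So k fixes which input you differentiate with respect to, while i runs over the outputs in the sum.)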

  • @LoongBerries
    @LoongBerries 11 months ago

    Why didn't you inherit from the Activation class?

    • @independentcode
      @independentcode  11 months ago

      That's because the Activation class takes in a function that will be applied to each input element individually: y_i=f(x_i). In the case of Softmax, each output depends on all the inputs, so the backpropagation works out differently.
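
      (A sketch of what such a standalone layer can look like, using the Jacobian formula the author gives further down in this thread, (np.identity(n) - y.T) * y; the column-vector convention is assumed from the earlier videos.)

          import numpy as np

          class Softmax:
              def forward(self, input):
                  # input: column vector of shape (n, 1)
                  tmp = np.exp(input)
                  self.output = tmp / np.sum(tmp)
                  return self.output

              def backward(self, output_gradient, learning_rate=None):
                  # The Jacobian of softmax is diag(y) - y y^T; with
                  # broadcasting it can be written as (I - y^T) * y, then
                  # applied to the incoming gradient. There is no elementwise
                  # f(x_i) here, which is why Activation does not fit.
                  n = np.size(self.output)
                  return np.dot((np.identity(n) - self.output.T) * self.output,
                                output_gradient)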

  • @tangomuzi
    @tangomuzi 3 years ago

    Excellent explanation again. However, the implementation is inefficient. It is good to simplify the formulas for the explanation, but this leads to inefficiency. Normally you do not need to explicitly form the identity matrix, M, or the transpose of M. Form a vector y = (y1, y2, ...); then the Kronecker product of y with itself gives you all the elements contained in M ⊙ M.T, and using numpy broadcasting you can do the subtraction. Not a full implementation, but it will look like reshape(kron(y, y)....) - y

    • @independentcode
      @independentcode  3 years ago +1

      Hey! I didn't know about the Kronecker product, thank you for mentioning it! I looked at it, and indeed we can compute the same formula as:
      np.identity(n) * y - np.reshape(np.kron(y, y), (n, -1))
      However, I feel like we still have to create the identity matrix since we're subtracting elements in the diagonal.

    • @tangomuzi
      @tangomuzi 3 years ago +1

      @@independentcode Look at your last formula: M ⊙ (I - M.T) = M - M ⊙ M.T. You can obtain H = M ⊙ M.T by reshaping the Kronecker product, and M - H is then computed by broadcasting.

    • @tangomuzi
      @tangomuzi 3 years ago

      @@independentcode Another way (less elegant): before reshaping, do the subtraction (without using the identity), then reshape.

    • @tangomuzi
      @tangomuzi 3 years ago

      @@independentcode Moreover, guess what? You do not even need to form M, so you do not need a matrix-vector multiplication. Hint: you only need elementwise vector multiplication. Try it. If you can't come up with a solution, I will explain.

    • @independentcode
      @independentcode  3 years ago

      @@tangomuzi I came up with this, using the identity: (np.identity(n) - y.T) * y
      Is this what you were thinking of?
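
      (A quick numerical check that the three formulations from this thread agree; illustrative only:)

          import numpy as np

          n = 4
          y = np.random.rand(n, 1)
          y /= np.sum(y)  # a valid softmax output, as a column vector

          jac_explicit  = np.diagflat(y) - y @ y.T   # diag(y) - y y^T
          jac_kron      = np.identity(n) * y - np.reshape(np.kron(y, y), (n, -1))
          jac_broadcast = (np.identity(n) - y.T) * y

          print(np.allclose(jac_explicit, jac_kron),
                np.allclose(jac_explicit, jac_broadcast))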

  • @bossgd100
    @bossgd100 1 year ago

    When transformer tutorials?

  • @swapnilmasurekar5431
    @swapnilmasurekar5431 2 years ago +1

    Please make a video on RNN implementation from scratch

  • @MohammadAhmadi-hw5rk
    @MohammadAhmadi-hw5rk 1 year ago

    -1 and +10

  • @The_Quaalude
    @The_Quaalude 7 months ago

    Bro stopped making videos at the worst time possible ‼️😭

  • @daniyalmasood4649
    @daniyalmasood4649 3 years ago +1

    First to comment

  • @chetanchoudhary485
    @chetanchoudhary485 10 months ago

    Why did you stop uploading videos? Isn't it your responsibility to complete what you've started? 🥲
    Your content is what many of us are looking for, so keep going bro 🤜🤛