Normalizing Activations in a Network (C2W3L04)

  • Published 17 Jan 2025

COMMENTS • 38

  • @IgorAherne
    @IgorAherne 7 years ago +89

    The professor is very sneaky and uses the TV-series approach. Every time I tell myself "well, one more video and that's it", but at the end there is a "John kills Bob" moment and a big "to be continued" sign. So here I am, sitting for 2 hours straight, watching yet another important technique mentioned at the end of each video :D Thanks!

    • @md.rijoanrabbi99
      @md.rijoanrabbi99 6 years ago +1

      Ha ha, I like your comment... how many ways he has!!

  • @sudhakar3115
    @sudhakar3115 5 years ago +2

    Thanks for uploading this concise, core-idea-centric explanation, along with the implementation details of the core elements needed to build an efficient neural network.

  • @alakbarvalizada5425
    @alakbarvalizada5425 3 years ago +5

    For input normalization, x is divided by the variance. Shouldn't it be the standard deviation instead of the variance, i.e. not sigma^2 but sqrt(sigma^2)? At 0:58 in the video.
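
A minimal numpy sketch of the input normalization being discussed (my own illustration, not from the video): the centered inputs are divided by the standard deviation, i.e. sqrt(sigma^2), rather than by the variance itself.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(3.0, 2.0, size=(100, 4))  # 100 training examples, 4 input features

    mu = X.mean(axis=0)                      # per-feature mean
    sigma2 = X.var(axis=0)                   # per-feature variance
    X_norm = (X - mu) / np.sqrt(sigma2)      # divide by the standard deviation, sqrt(sigma^2)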

  • @abhishekrao2120
    @abhishekrao2120 7 years ago +15

    What is lowercase m here? Is it the number of hidden units in layer l or the number of samples in the mini-batch?

    • @realTCG2
      @realTCG2 7 years ago +5

      It is the batch size

    • @kyungtaelim4412
      @kyungtaelim4412 7 years ago +5

      Normally it is the number of training examples; here it is the mini-batch size.

    • @abhishekrao2120
      @abhishekrao2120 7 years ago +1

      Thank you

    • @eason_longleilei
      @eason_longleilei 7 years ago +1

      It's the number of samples.

    • @claudiocimarelli
      @claudiocimarelli 6 years ago +7

      It is confusing how he explains this detail, because here he seems to mean that (i) is the i-th neuron out of n in layer [l]. Clearly (and this is a rare case) he made some sort of mistake; in fact, in a subsequent video he talks about (i) as an example in a batch and not as a neuron.

  • @AbhinavSingh-oq7dk
    @AbhinavSingh-oq7dk 2 years ago +1

    7:28, it is said we might want a larger variance for z, but why? Wouldn't that lead to slow learning / vanishing gradients in the case of a sigmoid?

    • @Komisar95
      @Komisar95 1 year ago

      If the variance is small, the inputs you produce are all near zero, where the sigmoid behaves like a linear function. That would make the whole layer a useless linear transformation.
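
A quick numerical check of this point (my own sketch, not from the video): near zero the sigmoid is well approximated by its tangent line 0.5 + 0.25*z, so pre-activations with very small variance keep the layer in an almost linear regime.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    z_small = np.linspace(-0.1, 0.1, 5)  # small-variance pre-activations stay near zero
    z_large = np.linspace(-4.0, 4.0, 5)  # larger variance reaches the curved, saturating region

    # maximum deviation from the linear approximation 0.5 + 0.25*z
    print(np.max(np.abs(sigmoid(z_small) - (0.5 + 0.25 * z_small))))  # ~2e-5: essentially linear
    print(np.max(np.abs(sigmoid(z_large) - (0.5 + 0.25 * z_large))))  # ~0.5: clearly nonlinear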

  • @TheBjjninja
    @TheBjjninja 5 years ago +2

    I thought the purpose of batch normalization was regularization, with the objective of reducing overfitting. Perhaps it both improves training performance and addresses overfitting?

  • @abdulmukit4420
    @abdulmukit4420 3 years ago +1

    Beautiful explanation. This makes so much more sense now.

  • @depizixuri58
    @depizixuri58 4 years ago

    What if the layer is not fully connected? How is the batch normalization done?

  • @adityavaikunt5498
    @adityavaikunt5498 5 years ago +1

    If we set gamma and beta to get a different mean and variance each time for a different layer, what is the purpose of batch normalization? Or is the effect of batch normalization restricted to each layer individually?

    • @walidmaly3
      @walidmaly3 4 years ago

      They are trainable, so you do not set them by hand.

  • @clivefernandes5435
    @clivefernandes5435 4 years ago

    So gamma and beta are learning the true variance and mean of the dataset, right?

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago +2

    Great explanation. Need to make notes.

  • @jamesearle6932
    @jamesearle6932 6 years ago +1

    What is z? 2:30

    • @MaXXiVeEP
      @MaXXiVeEP 5 years ago +8

      It is the value of the neuron before applying the activation function. So a = h(z) for some activation function h. z itself is the dot product of the weights w with the inputs to the neuron.
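
A minimal numpy sketch of that definition (my own illustration; the layer sizes are made up): z is the pre-activation value of a unit, and a = h(z) is what the unit outputs after the activation function.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)       # inputs coming into the layer
    W = rng.normal(size=(4, 3))  # weights of a layer with 4 hidden units
    b = np.zeros(4)              # biases

    z = W @ x + b                # pre-activation: the "z" that batch norm operates on
    a = np.tanh(z)               # activation a = h(z), here with h = tanh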

  • @gorgolyt
    @gorgolyt 5 years ago +2

    I found it a bit unclear what the axis of normalisation was (I believe each individual activation is normalised using the mean and standard deviation of that activation over the batch?), and how many learnable parameters there are -- is there a gamma and beta for each activation? It's not clear whether the zs, gammas, betas, etc. are scalars for single activations or vectors for the whole layer.

    • @kayicomert7933
      @kayicomert7933 5 years ago +1

      As I understood it, Z is the vector of the whole output of one layer, and you find the variance and the mean over this vector as well. You have probably found your answer a while ago, but I still wanted to answer.

    • @beizhou2488
      @beizhou2488 5 years ago +2

      They are vectors over the whole layer. Each z has four parameters attached to it: two learnable parameters, gamma and beta, and two non-learnable statistics, the mean and the standard deviation. All the calculations are element-wise, but the representations are all vectors (see the sketch after this thread). Hopefully my answer clears up your doubts; if it is still not clear, leave your question here.

    • @vamsikrishnabhadragiri402
      @vamsikrishnabhadragiri402 4 years ago

      @@beizhou2488 Could you please explain why we don't want the mean and standard deviation to come from the same distribution all the time? I mean, why did we add beta and gamma to the normalized equation?
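
A minimal numpy sketch of the version being discussed above (my own illustration, with made-up sizes): each hidden unit's z is normalized using that unit's mean and variance over the mini-batch, then scaled and shifted by its own learnable gamma and beta, so there is one gamma and one beta per hidden unit.

    import numpy as np

    rng = np.random.default_rng(1)
    m, n_units = 64, 5                           # mini-batch size, hidden units in layer l
    Z = rng.normal(2.0, 3.0, size=(m, n_units))  # pre-activations z for the whole mini-batch

    eps = 1e-8
    mu = Z.mean(axis=0)                          # per-unit mean over the batch, shape (n_units,)
    var = Z.var(axis=0)                          # per-unit variance over the batch
    Z_norm = (Z - mu) / np.sqrt(var + eps)       # zero mean, unit variance for every unit

    gamma = np.ones(n_units)                     # learnable scale, one per hidden unit
    beta = np.zeros(n_units)                     # learnable shift, one per hidden unit
    Z_tilde = gamma * Z_norm + beta              # this is what gets passed to the activation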

  • @vikankshnath8068
    @vikankshnath8068 5 years ago

    Amazing thanks

  • @KapilSharma-co8xq
    @KapilSharma-co8xq 4 years ago +1

    Should it not be
    sigma^2 = (1/m) * sum_i (z_i^2 - mu^2)?

    • @apurvsingh2575
      @apurvsingh2575 4 years ago

      Since he has already subtracted the mean from X, i.e. X = X - mu, the new mean becomes 0, so variance = sum(X^2) / N.
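
A quick numerical check of that argument (my own sketch): once the mean has been subtracted, averaging the squared values gives exactly the usual variance.

    import numpy as np

    rng = np.random.default_rng(2)
    z = rng.normal(5.0, 2.0, size=1000)

    mu = z.mean()
    var_direct = np.mean((z - mu) ** 2)          # usual definition of the variance
    z_centered = z - mu                          # the mean has already been subtracted
    var_from_squares = np.mean(z_centered ** 2)  # (1/m) * sum(z_i^2) on the centered values

    print(np.isclose(var_direct, var_from_squares))  # True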

  • @Jirayu.Kaewprateep
    @Jirayu.Kaewprateep 1 year ago

    📺💬 We can use Z-hat (i) instead of Z (i).
    🧸💬 Does it mean that we do not have to look at the values of all nodes, since they start from the same distribution and the update is linear in the beta and gamma parameters⁉

  • @yuchenzhao6411
    @yuchenzhao6411 4 years ago

    How to initialize gamma and beta? (gamma=1, beta=0)?
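
For what it's worth, the usual convention (and, as far as I know, the default in common frameworks) is exactly that: gamma starts at 1 and beta at 0, so batch norm initially leaves the normalized values unchanged. A tiny sketch:

    import numpy as np

    n_units = 5
    gamma = np.ones(n_units)   # scale initialized to 1
    beta = np.zeros(n_units)   # shift initialized to 0
    # With this initialization, gamma * z_norm + beta == z_norm at the start of training.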

  • @machinelearning3518
    @machinelearning3518 3 years ago

    It should be x = x / sigma, not x / sigma^2, shouldn't it?

  • @keqiaoli4617
    @keqiaoli4617 4 years ago

    Thanks for the video. I have a dumb question: what do you mean by the hidden unit value? Or, what does z refer to? It confused me a lot.

    • @alikhansmt
      @alikhansmt 4 years ago +3

      The hidden unit value refers to the output vector of multiplying the previous layer's activations by the weights, before applying the nonlinearity to it:
      z = WX + b
      Here z = the hidden unit values, i.e. the pre-activation values of some hidden layer;
      W = the weight matrix of that layer (the weight values are what the network learns);
      X = the input to that layer -> these are the values that are normalized;
      b = the bias.

    • @FULLCOUNSEL
      @FULLCOUNSEL 4 years ago

      The z value comes from normal distribution statistical tables... hidden units are the variables in the hidden layers.

  • @banipreetsinghraheja8529
    @banipreetsinghraheja8529 6 years ago +1

    "The effect of gamma and beta is to set the mean to whatever you want it to be", you forgot to mention variance. Should've been that the effect of gamma and beta is to set the mean and variance to whatever you want it to be.

  • @jandresjn
    @jandresjn 4 years ago

    Excellent presentation, terrible handwriting :v