Random Initialization (C1W3L11)

  • Published 29 Dec 2024
  • Take the Deep Learning Specialization: bit.ly/2IavakT
    Check out all our courses: www.deeplearni...
    Subscribe to The Batch, our weekly newsletter: www.deeplearni...
    Follow us:
    Twitter: / deeplearningai_
    Facebook: / deeplearninghq
    Linkedin: / deeplearningai

COMMENTS • 28

  • @swfsql
    @swfsql 1 year ago +1

    I noticed that my models would not converge nicely (last assignment from C1W4, 3 ReLU + 1 sigmoid layers) compared to a reference notebook I was following.
    If I just initialized my weights from a normal distribution, the cost would get stuck at a high value. I tried scaling the weights, changing to a uniform distribution, and changing the learning rate to various values; nothing worked.
    Then, following your code, I saw that if I divided the weights of each layer by the sqrt of the number of input features to that layer, it would start converging beautifully. Would be interesting to know why!
    Thanks for your lessons!
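
    A minimal NumPy sketch of the scaling described in this comment, with invented layer sizes (the 1/sqrt(n) factor keeps each layer's pre-activations at roughly unit scale, so activations neither saturate nor explode):

        import numpy as np

        def init_weights(layer_dims, scaled=True):
            """Random init; optionally scale each layer's weights by 1/sqrt(fan_in)."""
            params = {}
            for l in range(1, len(layer_dims)):
                W = np.random.randn(layer_dims[l], layer_dims[l - 1])
                if scaled:
                    # divide by sqrt(number of input features to this layer), as described above
                    W = W * np.sqrt(1.0 / layer_dims[l - 1])
                params["W" + str(l)] = W
                params["b" + str(l)] = np.zeros((layer_dims[l], 1))
            return params

        # hypothetical layer sizes for a 3-ReLU + 1-sigmoid network
        params = init_weights([12288, 20, 7, 5, 1])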

  • @sangwoohan1177
    @sangwoohan1177 5 years ago +5

    That random Korean subtitle tho...

  • @RealMcDudu
    @RealMcDudu 5 years ago

    If you use the tanh activation function you have an even bigger problem: the gradients will always be exactly zero, and no learning is feasible (not even the degraded kind of learning where all weights move in the same direction).
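
    A quick NumPy check of this point for an all-zero tanh network (the sizes are arbitrary): since tanh(0) = 0, the hidden activations are zero, every weight gradient comes out exactly zero, and only the output bias could ever move.

        import numpy as np

        np.random.seed(0)
        X = np.random.randn(3, 5)                     # 3 features, 5 examples (made up)
        Y = (np.random.rand(1, 5) > 0.5) * 1.0

        W1, b1 = np.zeros((4, 3)), np.zeros((4, 1))   # 4 hidden units, all-zero init
        W2, b2 = np.zeros((1, 4)), np.zeros((1, 1))

        A1 = np.tanh(W1 @ X + b1)                     # tanh(0) = 0 everywhere
        A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))        # sigmoid output = 0.5

        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / 5                          # zero, because A1 == 0
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)            # zero, because W2 == 0
        dW1 = dZ1 @ X.T / 5                           # zero as well

        print(np.abs(dW1).max(), np.abs(dW2).max())   # 0.0 0.0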

  • @arthurkalb1817
    @arthurkalb1817 2 years ago

    It seems like the most general statement of the solution is that the coefficients must form full rank matrices.

  • @X_platform
    @X_platform 7 years ago +5

    Since we are using leaky ReLU in most cases now, should we initialize the weights as extreme as possible, so that when backpropagation takes place they have a higher chance of landing in different local extrema?

    • @wolfisraging
      @wolfisraging 7 years ago

      kiryu nil, what do you mean by 'as extreme'?

    • @X_platform
      @X_platform 7 years ago

      Using tf.random_normal to set a high standard deviation*

    • @wolfisraging
      @wolfisraging 7 years ago +1

      kiryu nil, well I think the best way to initialize the weights is the Xavier initializer; from my observations it's the best way, and I think that's why it's the default initializer in TensorFlow.

    • @dhirajupadhyay01
      @dhirajupadhyay01 5 years ago

      @@wolfisraging But why? (if you could explain)

    • @wolfisraging
      @wolfisraging 5 years ago

      @@dhirajupadhyay01 , In short, it helps signals reach deep into the network.
      If the weights in a network start too small, then the signal shrinks as it passes through each layer until it’s too tiny to be useful.
      If the weights in a network start too large, then the signal grows as it passes through each layer until it’s too massive to be useful.
      Xavier initialization makes sure the weights are ‘just right’, keeping the signal in a reasonable range of values through many layers.
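
      A small NumPy experiment along these lines (the layer width and depth are invented): pushing a random input through a stack of tanh layers, the signal collapses with too-small weights, saturates with too-large ones, and keeps a reasonable scale with Xavier-style 1/sqrt(n) scaling.

        import numpy as np

        def signal_scale(weight_scale, n=500, depth=10, seed=1):
            """Std of the activations after `depth` tanh layers with weights ~ N(0, weight_scale^2)."""
            rng = np.random.default_rng(seed)
            a = rng.standard_normal((n, 1))
            for _ in range(depth):
                W = rng.standard_normal((n, n)) * weight_scale
                a = np.tanh(W @ a)
            return a.std()

        print("too small:", signal_scale(0.001))            # signal shrinks towards 0
        print("too large:", signal_scale(1.0))               # tanh saturates near +/-1
        print("Xavier   :", signal_scale(np.sqrt(1 / 500)))  # stays in a reasonable range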

  • @sakshipathak1855
    @sakshipathak1855 3 years ago

    Where can we access the practice questions?

  • @jagadeeshkumarm3333
    @jagadeeshkumarm3333 6 years ago +1

    What is the best choice for the learning rate (alpha)?

    • @byteofcnn3519
      @byteofcnn3519 4 years ago

      @Amey Paranjape Can the learning rate be learned? Or is it meaningful to do so?

    • @acidtears
      @acidtears 4 years ago

      @@byteofcnn3519 how would you learn what the perfect learning rate is? It makes sense to initialize it at 0.01 as that rate is similar to the pace of learning in humans (tiny changes over time).

  • @danielchin1259
    @danielchin1259 5 years ago +1

    **UNSTABLE EQUILIBRIUM**

  • @jessicajiang3781
    @jessicajiang3781 5 years ago

    Can anyone explain why gradient descent learns slowly when the slope is 0 (flat)? Aren't we trying to find the max and min of this function? Thanks

    • @AllmohtarifeBlogspot
      @AllmohtarifeBlogspot 4 years ago

      As you can see here: www.desmos.com/calculator/hzsiwhfmdw
      When x is too large, sigmoid(x) is ~flat, thus the derivative ≈ 0.
      And when we have a very small gradient/derivative, we make very small steps towards the minimum, which means slow learning.
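
      A quick check of this (the x values are just examples): the sigmoid derivative sigmoid(x) * (1 - sigmoid(x)) peaks at x = 0 and is nearly zero for large |x|, which is why steps taken in the flat region are tiny.

        import numpy as np

        sigmoid = lambda x: 1 / (1 + np.exp(-x))

        for x in [0.0, 2.0, 10.0]:                   # 10 is deep in the flat region
            grad = sigmoid(x) * (1 - sigmoid(x))     # derivative of the sigmoid
            print(f"x = {x:5.1f}   sigmoid = {sigmoid(x):.5f}   slope = {grad:.2e}")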

  • @shubhamsaha7887
    @shubhamsaha7887 4 years ago

    If W = 0 and B = 0, then A = 0. Similarly, all the vectors should be zero, shouldn't they?

    • @anarbay24
      @anarbay24 4 years ago +1

      I also think all nodes should be equal to zero. Interestingly, though, Andrew never mentions that property.

    • @zql7351
      @zql7351 4 years ago

      Yes. But the symmetry argument also explains why we cannot initialize all weights to the same nonzero value.

    • @benjaminbraun9371
      @benjaminbraun9371 2 years ago +1

      I would say that it depends on the chosen activation function.
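
      A minimal NumPy sketch of the symmetry argument in this thread (sizes are arbitrary, sigmoid hidden units assumed): initializing every weight to the same constant, zero or not, makes all hidden units compute identical activations and receive identical gradients, so they remain identical after every update.

        import numpy as np

        np.random.seed(2)
        X = np.random.randn(3, 8)                         # 3 features, 8 examples (made up)
        Y = (np.random.rand(1, 8) > 0.5) * 1.0

        W1, b1 = np.full((4, 3), 0.5), np.zeros((4, 1))   # same constant for every weight
        W2, b2 = np.full((1, 4), 0.5), np.zeros((1, 1))

        A1 = 1 / (1 + np.exp(-(W1 @ X + b1)))             # all 4 hidden rows are identical
        A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))

        dZ2 = A2 - Y
        dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)
        dW1 = dZ1 @ X.T / 8                               # every row of dW1 is identical too

        print(np.allclose(A1, A1[0]), np.allclose(dW1, dW1[0]))   # True True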

  • @saanvisharma2081
    @saanvisharma2081 6 years ago

    Best activation function???

  • @shahbazquraishy143
    @shahbazquraishy143 5 years ago

    Could have been a shorter video....