Softmax Regression (C2W3L08)

  • Published 21 Dec 2024
  • Take the Deep Learning Specialization: bit.ly/2xdG0Et
    Check out all our courses: www.deeplearni...
    Subscribe to The Batch, our weekly newsletter: www.deeplearni...
    Follow us:
    Twitter: / deeplearningai_
    Facebook: / deeplearninghq
    Linkedin: / deeplearningai

COMMENTS • 48

  • @Jabrils 7 years ago +72

    Thank you, master Andrew; this was super to the point.

    • @roniquinonez9715 6 years ago +9

      Spotted a wild Jabril in his natural environment!
      *Attempts to say Hello*

  • @HZLTV 1 year ago +7

    I don't think I'll ever understand the maths behind this properly, but the fact that I even understood it *sort of* just proves how good he is at teaching... the visual example given helped a tonne

  • @mariusz2313 6 years ago +111

    "If you can't explain it simply, you don't understand it well enough. "
    Albert Einstein
    You definitely know the topic perfectly well! Thanks!

    • @Jonn123save 6 years ago +1

      so true

    • @roboticsresources9680 6 years ago +1

      Actually, it was Richard Feynman who said it.

    • @bornslippy9109 6 years ago +1

      No, Einstein said it; Feynman applied it perfectly.

    • @est9949 4 years ago +1

      This is a common logical fallacy: A implies B is not equivalent to B implies A.

  • @manuel783 3 years ago +2

    Clarifications about Softmax Regression
    Please note that at 4:30, the text for the softmax formulas mixes subscripts "j" and "i", when the subscript should just be the same (just "i") throughout the formula.
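
    For reference, a corrected version of the formula (a sketch in the video's notation, with "i" indexing the element and "j" used only as the summation index over the C = 4 classes):

    ```latex
    t_i = e^{z_i^{[L]}}, \qquad
    a_i^{[L]} = \frac{t_i}{\sum_{j=1}^{4} t_j}
              = \frac{e^{z_i^{[L]}}}{\sum_{j=1}^{4} e^{z_j^{[L]}}},
    \qquad i = 1, \dots, 4.
    ```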

  • @84xyzabc 4 years ago +5

    I think in the denominator it's t_j at 4:58.

  • @camilaonofri2624 4 years ago +1

    If you use the sigmoid function for a multiclass problem, you have to fit a decision boundary for each class against the others (the one-vs-all algorithm), and then you get independent probabilities. How were the decision boundaries for the examples at 10:33 calculated, considering you had no hidden layers? (How were the line equations found?)

  • @mortezaabdipour5584 6 years ago +2

    Thank you, Mr. Andrew, for always sharing your knowledge.

  • @stefandimeski8569 6 years ago +5

    How were the decision boundaries for the examples at 10:33 calculated? (How were the line equations found?)

    • @grez911 6 years ago +2

      Make a grid, let's say 100 by 100 points, and at each point calculate the activation. That's actually how he does it in the programming exercises (see the sketch after this thread).

    • @stefandimeski8569 6 years ago

      Thanks, man! Then I suppose it's impossible to get a closed-form equation for the decision boundary?
      Can you please provide a link to the video where he uses the method you described here?

    • @camilaonofri2624 4 years ago

      please let me know if you got it
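
    A minimal sketch of the grid approach described in this thread (the 3-class weights and biases are made-up illustrative values, not taken from the course exercises):

    ```python
    import numpy as np
    import matplotlib.pyplot as plt

    def softmax(z):
        t = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for numerical stability
        return t / t.sum(axis=-1, keepdims=True)

    # Toy softmax-regression parameters: 3 classes, 2-D inputs (illustrative values).
    W = np.array([[2.0, 0.5], [-1.0, 1.5], [0.0, -2.0]])  # shape (3, 2)
    b = np.array([0.0, 0.3, -0.2])                        # shape (3,)

    # Evaluate the classifier on a 100-by-100 grid, as suggested above.
    xs, ys = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1)     # shape (10000, 2)
    labels = softmax(grid @ W.T + b).argmax(axis=1).reshape(xs.shape)

    # Colouring each grid point by its predicted class reveals the (linear) boundaries.
    plt.contourf(xs, ys, labels, alpha=0.4)
    plt.show()
    ```

    As for a closed form: with no hidden layers the boundary between any two classes is a straight line (see the derivation a few comments below), so the grid is just a convenient way to plot it.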

  • @wifi-YT 4 years ago

    At 9:32 onward, why are the decision boundaries all linear when the softmax function is NOT itself a linear function? After all, softmax uses e to the z in its numerator, which is certainly not linear! So, similarly, why does Andrew say at 11:16 that you only end up with non-linear decision boundaries once you add hidden layers?
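
    One way to see it (a short derivation, not from the video itself): the predicted class depends only on which z_i is largest, and exponentiation is strictly increasing, so softmax never changes which class wins. With no hidden layers, z = Wx + b is linear in x, so the boundary between classes i and j is a hyperplane:

    ```latex
    a_i = a_j
    \;\Longleftrightarrow\;
    e^{z_i} = e^{z_j}
    \;\Longleftrightarrow\;
    (w_i - w_j)^{\top} x + (b_i - b_j) = 0.
    ```

    Hidden layers make z a non-linear function of x, which is why the boundaries can then curve.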

  • @jjjj_111 6 years ago +2

    Fantastic, I love how easy it was to understand the material that was presented. If you have a donation page, please let me know!

  • @fjficm 3 years ago

    with "t" when you say normailsed, the probabilities would be different, wouldnt it? You would have to use 1 / sqr (t.t) in front of the 4x1 matrix and convert it into a unit matrix. Then use the dot products of the element to work out the probablities of each which will still work out to 1. Or is this wrong

  • @boratsagdiev6486 5 years ago +4

    It should be t_j, not t_i, at 4:33, right?

  • @AdarshMahabubnagar 2 years ago

    Is it possible to show the softmax activation function graphically? If so, please provide an example.

  • @benw4361 5 years ago +1

    It seems like the largest number is still selected as the predicted solution, i.e. 5, so I'm confused about the purpose of softmax when you could just select the largest value instead. Wouldn't that effectively translate to the class with the largest probability anyway?

    • @michael3698bear 3 years ago +2

      Late reply, but for anyone wondering: yes, you are correct that for prediction purposes it makes no difference (though you might still like to see the "probabilities" generated); the max will be chosen as the "predicted solution" either way. This is not true for training, however. When training, you need to be able to measure "how wrong" you were. This is where the softmax function comes in: it gives probabilities from which you can calculate a loss, and also a derivative to update the weights.
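
    A small sketch of the point above (illustrative code, not from the course): argmax gives the same prediction, but softmax plus cross-entropy yields a loss and the smooth gradient probs - y that training needs, while argmax alone has no useful derivative.

    ```python
    import numpy as np

    def softmax(z):
        t = np.exp(z - z.max())  # shift for numerical stability
        return t / t.sum()

    z = np.array([5.0, 2.0, -1.0, 3.0])  # logits from the video's example
    y = np.array([0.0, 1.0, 0.0, 0.0])   # one-hot true label (class index 1, illustrative)

    probs = softmax(z)
    prediction = probs.argmax()          # same class as z.argmax(): index 0

    loss = -np.sum(y * np.log(probs))    # cross-entropy: measures "how wrong" we were
    dz = probs - y                       # gradient of the loss w.r.t. z, used to update weights

    print(prediction, loss, dz)
    ```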

  • @scl144 5 years ago +7

    I don't know who made the translation, so I bowed in all four directions. Thank you so much.. thanks to you, I got to watch an excellent lecture.

  • @leagueofotters2774 3 years ago

    Soft and soothing....kind of like the Bob Ross of machine learning.

  • @usf5914 3 years ago

    At 4:03, should it be (4, 1) or (1, 4)?

  • @yilmazbingol6953 6 years ago +2

    I think at 4:30 the sum indices should be j=0 to j=3. If I'm wrong, please correct me.

    • @mathias8137 6 years ago

      This definitely makes more sense to me. The same applies to the sum written at 5:00.

  • @Jack-dx7qb 7 years ago +2

    So clear!

  • @ziku8910 3 years ago

    That was very helpful, thank you!

  • @gavin8535 4 years ago

    What is W^L at 3:56?

    • @aayushpaudel2379 4 years ago

      The weight matrix for layer L, i.e. the last layer.

  • @mariabardas2568 4 years ago

    Great lesson!!!! Very useful.

  • @derrik-bosse 7 years ago +1

    Where do the numbers in the Z vector come from?

    • @kunhongyu5053 7 years ago +1

      Just an assumption.

    • @derrik-bosse 7 years ago

      Kunhong YU, sure, but I mean intuitively, what do they represent?

    • @kunhongyu5053 7 years ago

      Similar to simple logistic regression, softmax just adds more output units rather than one. For logistic regression, the output unit is a 1-dimensional "score" computed linearly from the input X; if it's larger than zero, the label is 1, and vice versa. Softmax is like training multiple binary classifiers simultaneously: for a sample, each element in Z is also a "score", and the largest score indicates which label that sample most likely has.

    • @DouglasDuhaime 7 years ago +1

      @derrikbosse, the Z vector identified here is built from three quantities: W{L}, A{L-1}, and B{L}. W{L} is the matrix of weights in the last layer of the network. A{L-1} is the vector of outputs from the penultimate layer. B{L} is the bias vector of the last layer. If none of this makes sense, check out Professor Ng's earlier discussion of logistic regression, which is the simplest kind of neural network; it helped me make sense of this presentation: ua-cam.com/video/hjrYrynGWGA/v-deo.html
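
    Pulling this thread together, a minimal numpy sketch of where the Z vector comes from (the shapes follow the video's notation; the 3-unit previous layer and the random values are illustrative):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    a_prev = rng.standard_normal((3, 1))  # A^{[L-1]}: output of the penultimate layer
    W = rng.standard_normal((4, 3))       # W^{[L]}: weight matrix of the last layer
    b = rng.standard_normal((4, 1))       # b^{[L]}: bias vector of the last layer

    z = W @ a_prev + b                    # Z^{[L]} = W^{[L]} A^{[L-1]} + b^{[L]}, shape (4, 1)
    t = np.exp(z)                         # element-wise exponentiation
    a = t / t.sum()                       # softmax: four "scores" turned into probabilities

    print(z.ravel(), a.ravel(), a.sum())  # a sums to 1
    ```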

  • @marcostavarez2702 6 years ago

    Could someone please explain what the 'blocks-of-color' and the 'colored-in circles' represent?

    • @IrfanAhmad-od2sn 5 years ago

      The colored-in circles are the actual training data/values. After the model is trained on the training data, it predicts a class everywhere, so the colors of the blocks are the regions predicted by the model.

  • @grez911 6 years ago

    How do you calculate so quickly? I don't see a calculator on your table.

  • @wolfisraging 7 years ago

    You're the best.

  • @sandipansarkar9211 4 years ago

    Great explanation. Need to watch again.

  • @fatimahmath4819 5 years ago

    ⚘⚘⚘⚘⚘