Softmax Regression (C2W3L08)
- Published 21 Dec 2024
- Take the Deep Learning Specialization: bit.ly/2xdG0Et
Check out all our courses: www.deeplearni...
Subscribe to The Batch, our weekly newsletter: www.deeplearni...
Follow us:
Twitter: / deeplearningai_
Facebook: / deeplearninghq
Linkedin: / deeplearningai
thank you master andrew, this was super to the point.
Spotted a wild Jabril in his natural environment!
*Attempts to say Hello*
I don't think I'll ever understand the maths behind this properly, but the fact that I even understood it *sort of* just proves how good he is at teaching... the visual example given helped a tonne
"If you can't explain it simply, you don't understand it well enough. "
Albert Einstein
You definitely know the topic perfectly well! Thanks!
so true
Actually, it was Richard Feynman who said it.
No, Einstein said it; Feynman applied it perfectly.
This is a common logical fallacy: A implies B is not equivalent to B implies A.
Clarifications about Softmax Regression
Please note that at 4:30, the text for the softmax formulas mixes subscripts "j" and "i", when the subscript should just be the same (just "i") throughout the formula.
I think in the denominator it's t_j at 4:58.
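For anyone comparing notes, here is the formula as usually written (a minimal LaTeX sketch, assuming the 4-class example from the video): the outer element index is i, and the sum in the denominator runs over a separate index j.

```latex
% Softmax activation of the output layer (4 classes, as in the video)
t_i = e^{z_i^{[L]}}, \qquad
a_i^{[L]} = \frac{t_i}{\sum_{j=1}^{4} t_j}
          = \frac{e^{z_i^{[L]}}}{\sum_{j=1}^{4} e^{z_j^{[L]}}},
\qquad i = 1, \dots, 4
```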
If you use the sigmoid function for a multiclass problem you have to fit a decision boundary for each class against the others (the one-vs-all approach), and then you get independent probabilities. How were the decision boundaries for the examples at 10:33 calculated, considering there were no hidden layers? (How were the line equations found?)
Thank you, M.r Andrew, for always sharing your knowledge
How were the decision boundaries for the examples at 10:33 calculated? (How were the line equations found?)
Make a grid, say 100 by 100 points, and at each point calculate the activation. That's actually how he does it in the programming exercises (see the sketch after this thread).
Thanks man! Then I suppose it's impossible to get a closed-form equation for the decision boundary?
Can you please provide link to the video where he uses the method you described here?
please let me know if you got it
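Not a link to the exact video, but here is a minimal numpy/matplotlib sketch of the grid idea described above, assuming a hypothetical trained weight matrix W and bias b for 2-D inputs with no hidden layers; the parameter values are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical trained parameters for 3 classes on 2-D inputs (no hidden layer)
W = np.array([[ 2.0,  0.5],
              [-1.0,  1.5],
              [ 0.0, -2.0]])    # shape (3, 2)
b = np.array([0.0, 0.5, -0.5])  # shape (3,)

# Build a 100 x 100 grid over the input space
xs = np.linspace(-3, 3, 100)
ys = np.linspace(-3, 3, 100)
xx, yy = np.meshgrid(xs, ys)
grid = np.stack([xx.ravel(), yy.ravel()], axis=1)  # shape (10000, 2)

# Evaluate z = W x + b at every grid point; the argmax gives the predicted class
z = grid @ W.T + b                                 # shape (10000, 3)
pred = np.argmax(z, axis=1).reshape(xx.shape)

# The edges between colored regions are the decision boundaries
plt.contourf(xx, yy, pred, alpha=0.3)
plt.xlabel("x1"); plt.ylabel("x2")
plt.title("Softmax regression decision regions (grid evaluation)")
plt.show()
```

Since the predicted class depends only on which z_i is largest, in this no-hidden-layer case you could also solve z_i = z_j directly for a closed-form line equation; the grid trick, however, works for any model, hidden layers or not.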
At 9:32 forward, what’s the reason the decision boundaries are all linear, when the softmax function is NOT itself a linear function? The softmax function uses, after all, an e to the z function in its numerator, which function is certainly not linear! So, similarly, why is it that Andrew says at 11:16 that only when you add hidden layers do you end up with non-linear decision boundaries?
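Not from the video, just my own working, but a short derivation may help (assuming no hidden layers, so z = Wx + b): the predicted class depends only on which z_i is largest, and exponentials are monotonic, so softmax never changes which score wins; the boundary between any two classes is where their scores tie, which is a linear equation in x.

```latex
% Boundary between classes i and j when z = W x + b (no hidden layers):
a_i = a_j
\;\Longleftrightarrow\;
e^{z_i} = e^{z_j}
\;\Longleftrightarrow\;
z_i = z_j
\;\Longleftrightarrow\;
(w_i - w_j)^{\top} x + (b_i - b_j) = 0
% i.e. a straight line (a hyperplane), which is why hidden layers are needed
% for non-linear decision boundaries.
```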
Fantastic, I love how easy it was to understand the material that was presented. If you have a donation page, please let me know!
with "t" when you say normailsed, the probabilities would be different, wouldnt it? You would have to use 1 / sqr (t.t) in front of the 4x1 matrix and convert it into a unit matrix. Then use the dot products of the element to work out the probablities of each which will still work out to 1. Or is this wrong
It should be t_j, not t_i, at 4:33, right?
yes :)
I believe that under the sigma function it should be i=4
Is it possible to show the softmax activation function graphically? If so, please provide an example.
It seems like the largest number is still selected as the predicted solution i.e. 5, so I'm confused by what the purpose is of softmax when you could select the largest value instead? Wouldn't that effectively translate to the class with the largest probability anyway?
Late reply, but for anyone wondering: yes, for prediction purposes it makes no difference (though you might still want to see the "probabilities" generated); the max will be chosen as the "predicted solution" either way. This is not true for training, however. When training you need to be able to measure how wrong you were. This is where the softmax function comes in: it gives probabilities from which you can calculate a loss, and also a derivative to update the weights.
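To make that concrete, here is a minimal numpy sketch (my own illustration, not from the video) of how the softmax "probabilities" feed into the cross-entropy loss and its gradient during training; the logits match the video's example, and the one-hot label is made up.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    t = np.exp(z - np.max(z))
    return t / np.sum(t)

z = np.array([5.0, 2.0, -1.0, 3.0])   # logits; argmax (index 0) is the predicted class
y = np.array([0.0, 0.0, 0.0, 1.0])    # hypothetical one-hot label: true class is the last one

a = softmax(z)                        # "probabilities" summing to 1
loss = -np.sum(y * np.log(a))         # cross-entropy: measures "how wrong" the prediction was
dz = a - y                            # gradient of the loss w.r.t. the logits, used to update weights

print(a, loss, dz)
```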
I don't know who made the translation, so I bowed in all four directions to thank them. Thank you so much.. thanks to you I got to watch a wonderful lecture.
Soft and soothing....kind of like the Bob Ross of machine learning.
4:03 (4, 1) or (1, 4)?
I think at time 4:30 the sum indices should be j=0 to j=3. If I'm wrong, please correct me.
This definitely makes more sense to me. The same applies to the sum written at 5:00.
So clear!
That was very helpful, thank you!
What is W^L at 3:56?
Weight matrix for layer L, i.e. the last layer.
Great lesson !!!! Very useful.
Where do the numbers in the Z vector come from?
They're just assumed values for the example.
Kunhong YU sure, but I mean intuitively, what do they represent?
Similar to simple logistic regression, softmax just uses more output units rather than one. For logistic regression, the output unit is a single "score" computed linearly from the input X; if it's larger than zero, the predicted label is 1, and vice versa. Softmax is like training multiple binary classifiers simultaneously: for a sample, each element of Z is also a "score", and the largest score indicates the label that sample most likely has.
@derrikbosse, the Z vector identified here has 3 arguments: W{L}, A{L-1}, and B{L}. W{L} is the matrix of weights within the last layer of the network. A{L-1} is the vector of outputs from the penultimate layer of the network. B{L} is the bias vector from the last layer of the network. If none of this makes any sense, check out Professor Ng's earlier discussion of logistic regression, which is the simplest kind of neural network. That helped me make sense of this presentation: ua-cam.com/video/hjrYrynGWGA/v-deo.html
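If it helps, here is a tiny numpy sketch of that last-layer computation (the shapes and values are made up for illustration): Z^[L] = W^[L] A^[L-1] + b^[L], and softmax then turns Z into the class probabilities.

```python
import numpy as np

np.random.seed(0)
a_prev = np.random.randn(3, 1)   # A^[L-1]: activations from the penultimate layer (3 units)
W_L    = np.random.randn(4, 3)   # W^[L]: weight matrix of the last layer (4 output classes)
b_L    = np.random.randn(4, 1)   # b^[L]: bias vector of the last layer

z_L = W_L @ a_prev + b_L         # the (4, 1) Z vector discussed above
t   = np.exp(z_L)
a_L = t / np.sum(t)              # softmax output: four probabilities summing to 1

print(z_L.ravel(), a_L.ravel(), a_L.sum())
```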
Could someone please explain what the "blocks of color" and the "colored-in circles" represent?
The colored-in circles are the actual training data/values. After training the model on that data, the model predicts a class for every point, so the colored blocks show the decision regions predicted by the model.
How do you calculate so quickly? I don't see a calculator on your table.
Actually I calculated it for Mr. Ng, haha.
U r best
Great explanation. Need to watch again.
⚘⚘⚘⚘⚘