So for anyone wondering: there is an issue in the derivative in this one! It is not as simple as I stated. I was a bit younger when I did this, so please have mercy with me :) The rest should still be correct.
The best explanation I've heard.
Thanks for the most concise and forthright explanation of the softmax activation function I've ever seen!
What are you talking about? I can barely understand what he's saying.
Amazing explanation, I loved the fact you took some example numbers, did the calculations, and showed how the values are modified by the function. Really got the point home. Keep it up!
Good explanation 10/10
Damn, why couldn't everyone explain it like this? I am dumb and I need an explanation like I am a 5-year-old but most of the explanations on the internet assume that we are all smart as fuck. Thank you!
Thank you so much for the explanation!
Great, found the difference between softmax and sigmoid. Thanks
GREAT VIDEO, thank you!
At 5:30 the light bulb went on. THANK YOU! :)
Excellent explanation.
Thanks, easy to follow👏🏾👏🏾
I Love it :) thanks
Perfect ✌🏻✌🏻
Excellent explanation
I think what you mean in the last part about the difference between using a Sigmoid or a Softmax for classification is that for a binary classification problem you only need the probability of the two outcomes and a threshold: say, if I predict A with over 50% probability then my prediction is A, otherwise my prediction is B. For a multi-class classification task, however, you want to normalize over all possible outcomes to obtain a prediction probability for each class.
Yeah, exactly. I might not have pointed that out well enough.
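Not sure if this is exactly how the video frames it, but here is a minimal Java sketch of that difference (the 0.5 threshold, the class labels and the example scores are all made up for illustration):

```java
public class SigmoidVsSoftmax {
    // Sigmoid squashes a single score into (0, 1); a threshold turns it into a binary decision.
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Softmax normalizes a whole vector of scores so the outputs sum to 1.
    static double[] softmax(double[] a) {
        double sum = 0.0;
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = Math.exp(a[i]);
            sum += out[i];
        }
        for (int i = 0; i < a.length; i++) out[i] /= sum;
        return out;
    }

    public static void main(String[] args) {
        // Binary case: one score, one threshold.
        double score = 0.8;                                   // assumed example score
        String prediction = sigmoid(score) > 0.5 ? "A" : "B";
        System.out.println("binary prediction: " + prediction);

        // Multi-class case: normalize over all possible outcomes.
        double[] scores = {1.0, 2.0, 3.0};                    // assumed example scores
        double[] probs = softmax(scores);
        for (int i = 0; i < probs.length; i++)
            System.out.printf("class %d: %.3f%n", i, probs[i]);
    }
}
```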
This (3:20) little thing says a lot about you and is the way to reach more subs. Thanks!
You just got a new like and subscriber
Super excellent
thanks a lot
I love your videos!!!!! They helped me create my very first AI ever! Ur tutorials are so concise!
I was wondering, if you know how to do it, could you make a tutorial on Q-learning in Java, and then deep Q-learning in Java? The deep Q-learning is something I have been struggling to implement.
What part are you struggling with? I have implemented it.
Basically you should first implement a version with a table without neural networks.
After that, you replace the table with a neural network and add a replay buffer.
I have code which works.
You can also add me on Discord (Luecx@0540) and we can talk about it in detail.
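Not my actual code, just a rough Java sketch of the tabular step to get started (the state/action counts, alpha, gamma, epsilon and the placeholder environment values are all assumptions for the example):

```java
import java.util.Random;

public class TabularQLearning {
    public static void main(String[] args) {
        int numStates = 16, numActions = 4;           // assumed sizes for the example
        double alpha = 0.1, gamma = 0.9, epsilon = 0.1;
        double[][] q = new double[numStates][numActions];
        Random rng = new Random();

        // One Q-learning update; in a real loop the state, next state and reward
        // come from the environment instead of these placeholders.
        int state = 0;
        int action = rng.nextDouble() < epsilon ? rng.nextInt(numActions) : argmax(q[state]);
        int nextState = 1;      // placeholder: environment transition
        double reward = 0.0;    // placeholder: reward / punishment from the environment

        // Bellman update: move Q(s, a) towards reward + gamma * max_a' Q(s', a').
        double target = reward + gamma * q[nextState][argmax(q[nextState])];
        q[state][action] += alpha * (target - q[state][action]);
    }

    static int argmax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++)
            if (values[i] > values[best]) best = i;
        return best;
    }
}
```

Once this works, the "deep" version replaces the table lookup with a neural network prediction and trains that network on samples from a replay buffer.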
@@finneggers6612 I have the concept down for basic Q-learning; however, I cannot figure out how to even begin to train the AI, like what inputs to give it and how to train it with the reward and punishment. I tried to send a friend request to chat with you on Discord about it, but it didn't work. I can give you my Discord real quick: KRYSTOS THE OVERLORD#4864
There's no problem with sigmoid; all activation functions have their uses.
At @2:06, there seems to be a typo. x should be [0, 1, 2, 3, 4, 5] instead of [1, 2, 3, 4, 5, 6]: f(0) = 1/(1 + e^(-0)) = 0.5, while f(1) != 0.5.
Yeah you are right. My bad. Thanks for noticing!
I wonder in which cases it's advantageous to use softmax over using percentages of the total sum? Numerically it seems softmax is good for separating big values from smaller ones.
EDIT: **googles** Apparently it is exactly that: to make high values more evident.
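A quick numeric check of that in Java (the scores are made up for the example):

```java
public class SoftmaxVsSumNormalization {
    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 4.0};   // example scores

        // Normalize by the plain sum.
        double sum = 0.0;
        for (double v : a) sum += v;
        // Normalize by the sum of exponentials (softmax).
        double expSum = 0.0;
        for (double v : a) expSum += Math.exp(v);

        for (double v : a)
            System.out.printf("value %.1f -> sum-normalized %.3f, softmax %.3f%n",
                    v, v / sum, Math.exp(v) / expSum);
        // The largest score gets roughly 0.57 under sum-normalization,
        // but about 0.84 under softmax: high values become more evident.
    }
}
```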
respect
Thanks, Finn, this was helpful! I don't think you mention why you need the exponential functions in the Softmax definition. If you showed some negative example values as components of your a-vector (totally legitimate outputs of, e.g., a layer with a tanh activation function), it would be easier to see that without them, you wouldn't be guaranteed probabilities bounded by zero and one.
You are correct! I did not think about that when I made the video, but your critique is 100% correct. Thank you for pointing this one out.
The exponential function has a nice derivative behaviour and its output value is always > 0; everything else would not make sense in this context.
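To make the negative-values point concrete, a tiny sketch (the example vector is made up, roughly tanh-like outputs):

```java
public class SoftmaxNegativeInputs {
    public static void main(String[] args) {
        double[] a = {-0.8, 0.1, 0.9};   // example outputs, some negative

        // Naive normalization by the plain sum can give values outside [0, 1].
        double sum = 0.0;
        for (double v : a) sum += v;
        // exp(x) > 0 for every x, so the softmax outputs always land in (0, 1).
        double expSum = 0.0;
        for (double v : a) expSum += Math.exp(v);

        for (double v : a)
            System.out.printf("value %+.1f -> naive %.3f, softmax %.3f%n",
                    v, v / sum, Math.exp(v) / expSum);
        // Naive: -4.000, 0.500, 4.500 (not valid probabilities).
        // Softmax: roughly 0.112, 0.275, 0.613 (valid probabilities).
    }
}
```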
thanks for the explanation.
Thanks for the explanation, Finn!! I have a question: whenever I google "derivative of softmax function" I always find something like this: eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/. I am working on a project that is a pure Java implementation of a multi-layered NN. Can you help me with how I can use the derivative of the softmax function?
The problem with softmax is the derivative.
From here: math.stackexchange.com/questions/945871/derivative-of-softmax-loss-function?rq=1
you must consider two cases:
1. i not equal to k, and
2. i equal to k.
I have calculated it myself, but I'm not sure if it is right. Can you go through it once?
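In case it helps, here is roughly what the two cases boil down to in plain Java (just a sketch, the method and variable names are my own): dS_i/da_k = s_i * (1 - s_k) when i = k, and -s_i * s_k otherwise, where s = softmax(a).

```java
public class SoftmaxDerivative {
    // Softmax output for the input vector a.
    static double[] softmax(double[] a) {
        double expSum = 0.0;
        double[] s = new double[a.length];
        for (int i = 0; i < a.length; i++) { s[i] = Math.exp(a[i]); expSum += s[i]; }
        for (int i = 0; i < a.length; i++) s[i] /= expSum;
        return s;
    }

    // Jacobian of softmax: entry [i][k] is dS_i/da_k, covering both cases.
    static double[][] softmaxJacobian(double[] a) {
        double[] s = softmax(a);
        double[][] jac = new double[s.length][s.length];
        for (int i = 0; i < s.length; i++)
            for (int k = 0; k < s.length; k++)
                jac[i][k] = (i == k) ? s[i] * (1.0 - s[k]) : -s[i] * s[k];
        return jac;
    }

    public static void main(String[] args) {
        double[][] jac = softmaxJacobian(new double[]{1.0, 2.0, 3.0}); // example inputs
        for (double[] row : jac) {
            for (double v : row) System.out.printf("%8.4f", v);
            System.out.println();
        }
    }
}
```

During backpropagation you multiply the incoming gradient by this Jacobian (or, if softmax is combined with cross-entropy, the combined gradient simplifies further).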
Why do we use e?
My understanding is that we use e because it doesn't change the probabilities by much, as opposed to multiplying by a constant such as 100; it's a form of normalizing data.
Check out ua-cam.com/video/8ps_JEW42xs/v-deo.html - he goes deeper into the actual use of Euler's constant e.
Congrats on making this so simple to understand; you actually know what the function does. I sometimes wonder if people actually even understand the content they reproduce or are just too lazy to try to put things in a way others can understand. Einstein did famously say: "If you can't explain it simply, you don't understand it well enough."
Well, I hope I explained it right. I was a lot younger and I feel like the derivative might not be that simple... Hope it's still remotely correct :)
why do we use "e"?
I am not 100% sure, but maybe it's a combination of "the derivative is pretty simple" and "we need something exponential", the latter so that the probabilities make a little bit more sense.
what is "e" ? how i can get the "e" value ?..
in 4.24 Minute
@privacy private It's probably Euler's number. It's about 2.7, but in every programming language it should be defined somewhere.
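In Java, for example, it is just Math.E / Math.exp (a tiny illustration of the previous point):

```java
public class EulerConstant {
    public static void main(String[] args) {
        System.out.println(Math.E);          // Euler's number, about 2.718281828...
        System.out.println(Math.exp(1.0));   // e^1, the same value computed via exp
        System.out.println(Math.exp(2.0));   // e^2, what softmax computes per entry
    }
}
```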
@@finneggers6612 Thank you