Thanks for the video! Great overview & refresher 🤗
I appreciate the calm, slow and clear voice.
Thanks for the comment. That’s very encouraging to hear!
Extremely helpful thank you very much
Thanks Tom for your nice words, that’s very encouraging!
Great video! Some points about English to help you improve:
You should say "how it looks" instead of "how does it look like". If you want to include the "like", you can say "what it looks like" instead.
Thanks, that’s a good point! I’ll try to remember that for my next video.
Great video. I would recommend it to plus-two level students, who can see how basic calculus is used in AI at a later stage. Besides, these would serve as good exercises.
Thank you for this video ❤
I want to practice all the optimizers with different activation functions, with some maths problems and in Python. Could you please suggest a good book?
ReLU, Leaky ReLU, and Swish seem to be an evolution.
The issue with ReLU was that it leaves dead weights. Then the issue with Leaky ReLU was its discontinuity. And Swish finally fixed all of them.
Are ReLU and Leaky ReLU still useful for anything?
Also, why was GELU used for language models? Why does GELU work better there than other activation functions?
Thank you for your question. Indeed, ReLU, LeakyReLU and Swish are an evolution. And it is true that ReLU suffers from dead neurons, but still, ReLU and its variants such as LeakyReLU are used in ANNs, especially in computer vision tasks like image segmentation. One advantage of ReLU is its simple and efficient computation.
As for GELU, some of its properties make it suitable for more complex tasks like NLP. For example, its non-monotonic behavior allows the network to capture more complex patterns in text data.
But having said that, the choice of activation function heavily depends on the data and the task, and one should experiment with different activation functions to find the best one for a given task.
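To make this concrete, here is a minimal NumPy sketch of the functions mentioned above, with GELU written via its common tanh approximation (just an illustration, not the exact code from the video):

```python
import numpy as np

def relu(x):
    # Zero for x < 0, which is where "dead" units can get stuck
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # A small negative slope keeps a gradient flowing for x < 0
    return np.where(x > 0, x, slope * x)

def swish(x, beta=1.0):
    # Swish / SiLU: x * sigmoid(beta * x), smooth and non-monotonic near 0
    return x / (1.0 + np.exp(-beta * x))

def gelu(x):
    # GELU via the tanh approximation commonly used in transformer models
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

xs = np.linspace(-4.0, 4.0, 9)
for name, f in [("ReLU", relu), ("LeakyReLU", leaky_relu), ("Swish", swish), ("GELU", gelu)]:
    print(f"{name:>9}: {np.round(f(xs), 3)}")
```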
Animations are cool! Have you used ManimCE or ManimGL?
Thanks for the comment 😊 I have used ManimCE.
So far I haven't played with ManimGL, but I will check it out and see if it's worth switching.
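For anyone curious about the workflow, a minimal ManimCE scene that plots an activation curve could look roughly like this (an illustrative sketch assuming a recent ManimCE release, not the actual code behind the video):

```python
# Render with: manim -pql relu_scene.py ReLUScene
import numpy as np
from manim import BLUE, Axes, Create, Scene

class ReLUScene(Scene):
    def construct(self):
        # Axes around the origin
        axes = Axes(x_range=[-3, 3, 1], y_range=[-1, 3, 1])
        # Plot ReLU(x) = max(0, x) on those axes
        graph = axes.plot(lambda x: np.maximum(0.0, x), color=BLUE)
        self.play(Create(axes), Create(graph))
        self.wait()
```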
May I know how you make these videos?
I have my own activation function that I use; it's a Softplus-like function.
It's the integral of (1+tanh(x))/2, which looks like Sigmoid except it's faster in training.
That integral is this equation, which I call "Rectified Integral Tangent Hyperbolic", RITH for short.
It's mostly linear for x ≥ 1, which makes it fast in training:
(x+ln(cosh(x)))/2. I added the term 1/e to center it between 0 and positive infinity.
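A quick NumPy sketch of RITH as described above (assuming the 1/e mentioned is simply a constant offset added to the antiderivative):

```python
import numpy as np

def rith(x):
    # Antiderivative of (1 + tanh(x)) / 2, plus the 1/e offset from the comment
    return (x + np.log(np.cosh(x))) / 2.0 + 1.0 / np.e

xs = np.linspace(-3.0, 3.0, 7)
print(np.round(rith(xs), 3))

# Sanity check: the numerical derivative of RITH matches (1 + tanh(x)) / 2
eps = 1e-5
numeric = (rith(xs + eps) - rith(xs - eps)) / (2.0 * eps)
print(np.allclose(numeric, (1.0 + np.tanh(xs)) / 2.0))
```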
Love from India
nope
Cool
❤
Wrong!
The derivative of the ELU is a perfectly continuous function everywhere, even at 0.
ua-cam.com/video/56ZxEmGRt2k/v-deo.html
Thanks for the comment, but that depends on the value of alpha.
As I mentioned in the video, if alpha = 1, the derivative of ELU is continuous (see the plotted curve, which corresponds to alpha = 1).
But if alpha != 1, the derivative will be a discontinuous function.
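To make that concrete: for x > 0 the derivative of ELU is 1, and for x < 0 it is alpha * exp(x), which tends to alpha as x approaches 0, so the two one-sided limits only agree when alpha = 1. A quick numerical check (just a sketch, not code from the video):

```python
import math

def elu_derivative(x, alpha):
    # d/dx ELU(x): 1 for x > 0, alpha * exp(x) for x <= 0
    return 1.0 if x > 0 else alpha * math.exp(x)

eps = 1e-8
for alpha in (1.0, 0.5):
    left = elu_derivative(-eps, alpha)   # limit from the left  -> alpha
    right = elu_derivative(eps, alpha)   # limit from the right -> 1
    print(f"alpha={alpha}: left={left:.6f}, right={right:.6f}, "
          f"continuous at 0: {math.isclose(left, right, rel_tol=1e-6)}")
```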