**Note: starting at 1:09, the denominator of the sigmoid function should be e^x + 1 rather than e^(x+1).**
Machine Learning / Deep Learning Tutorials for Programmers playlist: ua-cam.com/play/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU.html
Keras Machine Learning / Deep Learning Tutorial playlist: ua-cam.com/play/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL.html
Honestly I was staring at that for the last 3 minutes wondering how that function could possibly be correct. The e^x would cancel out and you would be left with 1/e (a constant). I was seriously scratching my head until I decided to look down hahaha
Thank you for the informative video, but I am a bit confused about why negative values are ignored. Do they not hold any useful information? If so, why? Thank you in advance
@@MadJDMTurboBoost same haha
I love that we are learning the lines of code immediately after learning the concept, rather than learning the code way later; this is so helpful and makes the code less overwhelming
You're my favorite Machine Learning teacher, please keep making such videos.
I just found the best teacher for Machine Learning!! Subscribed!! Thanks Sensei!!
Without a doubt, you are one of the best instructors I have ever seen. I absolutely love your channel. Well done! And thank you so much for your great help. You saved my life
This is the most intuitively understandable explanation I have seen
Thank you!
You are seriously the best, and the way you explain things through code helped me understand the SSD code I took from GitHub. Thank you so much, ma'am
Wow. Amazing video! You make it easy to understand unlike so many other UA-camrs who (for some reason) enjoy complicating things just to hear themselves talk.
This video really helped me to understand activation functions. Thank you so much for your work. Your videos are, by far, the best resource I've come across regarding these concepts.
I think you just saved me from failing my exam, so for this I thank you!!
I've just started ML, and thanks to you I'm able to digest the Coursera lectures practically. And finally, you've earned a subscriber 👍
Thank you, Saanvi!
Yeah right, I honestly can't understand what that Chinese guy is saying 😟
You always come with a fine solution to our doubts. Super explanation. I appreciate it.
I think this topic is so fascinating because it seems odd to me that *so much* inspiration can be drawn from the brain. In any case, your videos are exceptional, and I am so glad that I came across your channel and playlist!
I love how you explain things!! I love your voice too!
Amazing series, very clearly explained! Congratulations! Looking forward to completing the whole series!
That's literally the best explanation for activation functions I found after a lot of googling! Well done, keep going ♥ 👏
Your explanation seems to have a super close sigmoid=1 on my personal neurons! Awesome explanation and timing of illustration, thanks
Learning from a book, I didn't understand neural networks. Thanks to you, now I know what a neural network is and how it works
Awesome! I've always just taken activation functions as a necessary part of ANNs, but now I understand what they are there for! :)
Thank you for creating these videos! They really help me. I do have a book on Artificial Neural Networks, but videos help me get the intuition behind all the mathematical formulae I see! Thank you very much!
Thank you Tymothy! Glad you're finding such value in the content!
I still don't understand why we use activation function. I don't think this was addressed properly in the video. What's wrong with just letting the output be the weighted sum? If it was about binarizing the output to imitate the activation and deactivation of a biological neuron then it would make sense but we don't really do that necessarily, do we? Sigmoid function and ReLU both will give granular values so what exactly was the point of recalling the activation of biological neurons? Disappointingly poor explanation.
excellent and concise. Well done!
Watched this and immediately subscribed, great explanation! Thank you so much for these
Very well explained! Thank you so much
Nicely explained, Thank you :)
Awesome videos. These concepts are so nicely explained.
Can you explain the tanh activation function and why tanh is better than sigmoid?
Hey Krish,
tanh actually depends on the sigmoid function. Specifically,
tanh(x) = 2sigmoid(2x) - 1
tanh's output is symmetric around 0, as its range is between -1 and 1.
sigmoid, on the other hand, has asymmetric output, with its range between 0 and 1. During training, this asymmetry can cause the activation output to "saturate" at 0 or 1. If this happens, the net becomes harder to train/converge.
I may do a video in the future that discusses the different activation functions in more detail and perhaps compares a few against each other by showing training results of the same net with the same data but using different activation functions, to show how each performs relative to the others.
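(A minimal Python sketch, not from the video, that checks this identity numerically -- the helper name `sigmoid` is just for illustration:)

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-3, 3, 7)
# tanh(x) = 2*sigmoid(2x) - 1 should hold for every x
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))  # True
```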
deeplizard thank you, I hope you do such a video
Amazing channel
Great videos! I'm working through the Coursera course to get my TensorFlow cert, and your videos are filling in the gaps.
Yeah I think that her videos are exceptional! By the way, how did the rest of your course go? Did you get your certification?
This is a great explanation, I finally got it thanks to your video!
Glad to hear, khoantum! Thanks for letting me know!
Great and concise explanation of the fundamentals of nn. Thanks so much.
You're very welcome, AI. Thank you!
This work is awesome, keep up the good work!
4:10 I don't get why adding another layer of relu makes it equivalent to the code above. Doesn't the "relu" layer take the weighted sum of the nodes in the previous layer as its input?
That was super helpful. Thank you.
This is a great explanation
Thanks teacher
The explanation is ❤️🩹
Thank you for these videos! Its so easy to follow your explanations.
One of the most effective activation functions is ReLU. However, f(x)=x is "connect" and f(x)=0 is "disconnect". ReLU is a switch. A ReLU net is a switched system of dot products. For a specific input, all the switch states become known. If you follow the linear algebra, the net actually becomes equivalent to a simple matrix, which you can examine with various metrics. For noise sensitivity, the variance equation for linear combinations of random variables applies to dot products.
Anyway, that can lead you to fast-transform, fixed-filter-bank nets where, rather than adjustable dot products (weighted sums) and a fixed activation function, you have fixed dot products (enacted with fast transforms) and adjustable, parametric activation functions. The Walsh Hadamard transform is good.
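(To make the "switched system of dot products" idea concrete, here is a small NumPy sketch -- toy layer sizes, no biases, all values made up: for one particular input, the ReLU switch states are fixed, and the whole net collapses to a single matrix.)

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))          # toy 3-4-2 ReLU net, no biases
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)

h = W1 @ x
switches = (h > 0).astype(float)      # ReLU switch states for this specific input
y = W2 @ (switches * h)               # ordinary forward pass

M = W2 @ np.diag(switches) @ W1       # the single matrix the net is equivalent to for this x
print(np.allclose(y, M @ x))          # True
```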
really good explanation, thank you!
Question: at 4:24, is it the addition of an activation function or the addition of an activation layer? 🤔
Dear team, excellent explanation with simplicity. The effective 5 minutes to know it all. Could the team also explain the following activation functions: Identity, Binary Step, Tanh, ArcTan, Leaky ReLU, and Softmax?
Hello, thank you for a great explanation! How do we decide which activation function to be used in a layer?
Amazing video
Great video! I have a question: what’s the benefit of adding the activation as an object instead of just part of the dense object?
Thanks For the video
4:21 You haven't passed any parameters for the output layer. How do we do that?
thank you a lot for your help
great explanation
1:11 where does the six come from? is it a trainable value like a bias, or is it the number of nodes in the active layer, or is it always six? In other words, why is the sigmoid's input limited to six?
Hey Dour - The 6 was just where the graph illustration cut off the x-axis. Sigmoid's input isn't limited to +/-6. Sigmoid can accept all positive and negative numbers. In other words, sigmoid has a domain of all real numbers.
I'm sorry, let me clarify. The point at which your sigmoid begins to return only 1 or 0 is when the input is positive or negative 6. Why is 6 the chosen number for this?
No problem. With the way the graph illustrates the function, it does make it look like an input of +/-6 to sigmoid would return 1 or 0, but it is only a visual representation. Sigmoid is asymptotic to 1 and 0. In other words, as x approaches positive infinity, sigmoid(x) approaches 1. As x approaches negative infinity, sigmoid(x) approaches 0.
For example, plug in x=6. We'll see that sigmoid(6) is not quite equal to 1.
We have:
sigmoid(x) = e^x / [(e^x) +1]
sigmoid(6) = e^6 / [(e^6) +1]
sigmoid(6) = 0.99753
Let me know if this helps clarify.
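(A quick Python check of the numbers above -- a minimal sketch, with the helper `sigmoid` defined just for illustration:)

```python
import math

def sigmoid(x):
    return math.exp(x) / (math.exp(x) + 1)   # equivalently 1 / (1 + e^(-x))

for x in (6, 10, 15):
    print(x, sigmoid(x))   # 0.99752..., 0.99995..., 0.9999996... -- approaches 1 but never reaches it
```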
{
"question": "Activation functions define neuron _______________ based on _______________",
"choices": [
"output, input",
"weights, reward signals",
"connectivity, prior neurons",
"biases, weights"
],
"answer": "output, input",
"creator": "Chris",
"creationDate": "2019-12-07T02:56:48.362Z"
}
Added to the site! Thank you :)
Can the same network have different activation functions?
(If yes - 4:19 Here the activation function is applied to the layer just defined above, right?)
yes and yes
@@deeplizard thanks lizard!
this video was really good
What's the difference between the activation function and a separate activation layer? at 4:20
No difference in functionality, just two different options to specify an activation function following a layer. You can either include it as a parameter for a given layer, or you can add it as a "layer" afterwards. This is just for flexibility in building the network, however, note that the activation function isn't technically a "layer."
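(Here is a minimal Keras sketch of the two equivalent options -- the layer sizes and input_shape are made up for illustration, and the import path assumes tf.keras rather than whichever Keras version the video uses:)

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation

# Option 1: activation specified as a parameter of the layer
model_a = Sequential([
    Dense(32, input_shape=(10,), activation='relu'),
    Dense(2, activation='softmax'),
])

# Option 2: activation added afterwards as its own object
model_b = Sequential([
    Dense(32, input_shape=(10,)),
    Activation('relu'),
    Dense(2),
    Activation('softmax'),
])
```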
Excerpt from "Activation Functions in a neural networks explained" section -- deeplizard blog:
"...This means that even in very deep neural networks, if we only had linear transformations of our data values during a forward pass, the learned mapping in our network from input to output would also be linear."
I get it that if we are using linear transformations on every node (in all the layers), then their combined effect would result in a linear transformation at the output
f(a+b) = f(a) + f(b)
... but can you kindly elaborate on this point with an example that illustrates why this would be undesirable?
Also, what is meant by the term "learned mapping"?
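(Not an official reply, but a tiny NumPy sketch of why an all-linear network is undesirable: the whole stack collapses into one matrix, so extra depth adds no expressive power, and the "learned mapping" -- the input-to-output function defined by the weights -- can never be anything but linear.)

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))   # three "layers" with identity (linear) activations
W2 = rng.normal(size=(8, 8))
W3 = rng.normal(size=(1, 8))
x = rng.normal(size=4)

deep_output = W3 @ (W2 @ (W1 @ x))   # forward pass through the "deep" linear net
W_single = W3 @ W2 @ W1              # one equivalent matrix
print(np.allclose(deep_output, W_single @ x))   # True -- depth bought nothing
```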
Thanks a lot mam! This helped a lot
I'm glad to hear that, Md.Yasin Arafat Yen! You're welcome!
Very helpful, thanks!
Simple, but it helped me out a lot!
Hello, deeplizard,
I have a question. This is rather not meant for such public places as the youtube comment section. That's why I tried to find out one of your email addresses or contact you directly. Unfortunately I didn't find anything. Is it possible to contact you directly?
With best regards
Gabriel
Perfect explanation.
thank you
Thank you so much :) Keep making such amazing videos please.
thank you! I love you!
Hi, I would like to know which of the independent variables is most significant or has the highest impact on the dependent variable. My model is a 9-5-1 MLP, from which I have extracted the weights and biases. However, my concern now is how to use those weights to rank the inputs from most relevant to least relevant. Thank you.
Can you describe what Softmax function does, when and why to use it?
I tried looking through the blog but couldn't find much on softmax there.
The softmax activation function outputs a probability distribution over all possible classes. If you have two classes, for example, softmax may output [0.3, 0.7], giving a 30% probability that your sample belongs to class 1 and a 70% probability that your sample belongs to class 2.
Sigmoid or ReLU -- which one should we use?
thank you very much
Nice explanation. What is softmax and why is it widely used in deep learning instead of general activation functions?
We use softmax on the output layer of a classifier network. This will give us a probability distribution over each class. For example, if we had a network that classified images as cat, dog, or lizard, then that would mean we would have three output nodes. When we softmax the output, we will get a probability assigned to each node, and all of these probabilities must sum to 1. So we may have 85% assigned to lizard, 10% to cat, and 5% to dog, for example.
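(A minimal NumPy sketch of that computation -- the logits are made-up values chosen to reproduce the 85/10/5 example:)

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))         # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.83, 0.69, 0.0])  # raw output-node values for lizard, cat, dog
probs = softmax(logits)
print(probs, probs.sum())             # approximately [0.85, 0.10, 0.05], summing to 1
```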
@@deeplizard Thanks for the detailed reply. I agree.
1. But could you please tell me: even normal activation functions like sigmoid give a probability for the class label, so what makes softmax so special?
2. Moreover, can we use deep learning if we want to optimize an objective function? (We have an objective function but want to optimize it using deep learning; is that possible?)
Thanks
Finally understand sigmoid
Extremely wonderful explanation!
Is there any industry benchmark showing that sigmoid is better than ReLU, or vice versa?
Hi, I found this from somewhere "Because of the horizontal line in ReLu for when x
Can the activation be just the part of a sine curve in the range [-pi, pi]? If x is less than -pi or more than pi, then y = 0.
Or y = 2 * sin(x * pi/2) in the range [-2, 2]?
Awesome lecture as always.
Also, I have something to ask. Does the activation layer work only for the hidden layer specified above it, or does it make all the hidden layers use the same activation function?
The activation function is specified per layer.
The activation function happens in the second half of a node, so it might be misleading to show all the weights pointing to the next hidden layer.
First and foremost,
thank you for the amazing explanation.
Hey Omar - You're welcome! For classification problems, the activation output on the output layer is going to determine what the model classifies the input as.
Amazing tutorials!
Great tutorial, but more importantly, without activation functions the neural network won't be able to learn a non-linear separation plane.
Thank you so much
You're welcome, muneeb!
great video Tnx!
Which programming language are you using?
Hey Srijal - The language is Python, and the API is Keras.
This video is part of the following deep learning playlist:
ua-cam.com/play/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU.html
And if you're interested in exploring Keras further, I have another playlist that gives tutorials on how to build neural networks, train them, etc. with Keras here:
ua-cam.com/play/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL.html
thanks a lot @deeplizard
what do you mean by an activated neuron?
In ReLU, the more positive it is, the more activated the node is. For 1 it'll be activated, and for 3 it'll be activated too! Then what's the difference between different positive values? The node is going to be activated anyway
Yes, that's right-- the more positive the *more* activated the node. There are varying levels of activation. You can think of it as a spectrum of activation levels rather than a binary "on" or "off." You can also change what it means to be activated or not activated. For example, we could choose that a node is meaningfully activated only if the activation output is greater than 10, rather than 0. We can do this by varying the bias. More on bias and how this is done in this video: ua-cam.com/video/HetFihsXSys/v-deo.html
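(A tiny NumPy sketch of both points -- ReLU gives graded activation levels, and a more negative bias raises how large the weighted sum must be before the node meaningfully activates. All numbers are made up.)

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

weighted_sums = np.array([-2.0, 1.0, 3.0, 12.0])
print(relu(weighted_sums))           # [ 0.  1.  3. 12.] -- varying levels of activation, not just on/off

bias = -10.0                         # a strongly negative bias shifts the activation threshold
print(relu(weighted_sums + bias))    # [0. 0. 0. 2.] -- now only weighted sums above 10 activate the node
```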
but what does "more activated" actually mean?
It only means that its activation value (the number that is passed to the following nodes in the network) is higher. Thus it has a greater impact on the activation of the following nodes.
@@deeplizard is the level of activation actually the value of the weight of a connection?
Hey Dragana - The activation value is determined by the weighted sum of the output values from nodes in the previous layer. The weights in the weighted sum are determined by the connection values.
For example, suppose we have a layer called layer1 with 2 nodes and a following layer called layer2 with 3 nodes. Then there will be 6 total connections connecting layer1 to layer2 (assuming these layers are dense/fully connected).
To find the activation output of a single node in layer2, we must take the weighted sum of the two outputs from the nodes in layer1 (with the weights being the values of the corresponding two connections connecting layer1 nodes to the single node we're focusing on in layer2). This weighted sum is then passed to whatever activation function you're using to give you the activation value.
Let me know if this helps clarify.
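(A minimal sketch of that calculation for a single layer2 node, with made-up outputs and connection weights, and ReLU chosen as the activation function:)

```python
import numpy as np

layer1_outputs = np.array([0.4, 0.9])      # outputs of the 2 nodes in layer1
weights_to_node = np.array([0.6, -1.2])    # the 2 connection weights into one layer2 node

weighted_sum = np.dot(layer1_outputs, weights_to_node)   # 0.4*0.6 + 0.9*(-1.2) = -0.84
activation_value = max(0.0, weighted_sum)                # ReLU -> 0.0 for this node
print(weighted_sum, activation_value)
```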
Love your voice
Thanks!
very nice
Note that the first sigmoid function is wrong. The denominator should be e^x+1 rather than e^(x+1).
Nice catch! Thanks for letting me know. I've pinned a comment to this video stating the correct representation of the denominator.
What is the meaning of input_shape?
You saved tomorrow's ML exam for me.
Whoop! Let me know how it goes. Good luck!
@@deeplizard Great!!! And guess what, I got the same question. Thanks a lot.
Hello, I can't activate English subtitles. Please provide the subtitles.
Greetings Jobson! We have enabled auto-generated English subtitles for all videos. I'm not sure why the three videos you commented on are not showing English subtitles. We will look into this. In the meantime, you can visit the blog pages on deeplizard.com for any of our UA-cam videos. There are fully written lecture notes in English for each video there.
Can I access this Jupyter notebook?
Hey niyatha - Download access to code files and notebooks is available as a perk for the deeplizard hivemind. Check out the details regarding deeplizard perks and rewards at: deeplizard.com/hivemind
If you choose to join, you will gain access to download the notebook from this video at the link below:
www.patreon.com/posts/code-for-deep-19266563
@1:09 Sigmoid function (e^x)/(1 + e^x) -- video should be edited
Yes, good catch Dr S. The pinned comment on this video states the correction, and the corresponding blog for the video below has the correction as well.
deeplizard.com/learn/video/m0pIlLfpXWE
@@deeplizard I caught this after the video, after seeing your video description--you might consider adding a pop-up 'note' on your video window that you made a mistake so everyone catches it during the video. Other than that--really great explanation. Excited to try more of your videos. Subscribed.
Does deep learning with Keras require no skill?
One of my friends said that it's irrelevant to code with Keras :'-(
By that reasoning, simulating soft body physics also takes no skill. Yes, you can use the library without knowing what you are doing, but you will never reach a higher level of thinking. Keras is important for project creation and much else, where writing your own neural network and training algorithm can actually be a disadvantage rather than an advantage. In the case of specific development, Keras becomes obsolete, as you may need to dig much deeper than what the library has to offer. So yes, you can create wonderful projects without using Keras or TensorFlow, and you can use it as a no-brainer, but it's kind of like writing functions to save yourself steps and repetitiveness. Hope that helps.
@@LemonOrange_ sometimes it's better and even faster to learn things backwards.
Why do the subtitles auto-generate Korean instead of English? Hmmm
Hey red - It was set to be that way on the UA-cam backend. I think this was a mistake. I've changed it. It looks like it takes some time on UA-cam's side to update. Note that there is also a blog post for each video in this series here: deeplizard.com/learn/video/m0pIlLfpXWE
@@deeplizard Thank you so much for your help
I was expecting to understand what is non-linearity, because it is said "The purpose of the activation function is to introduce non-linearity into the output of a neuron."
I think you should zoom in when you explain the code.
Agreed, and I started zooming in in later episodes.
In the meantime, you can check the corresponding written blogs for each video at deeplizard.com to get a better view of the text contents.
Please zoom in, I can't see it clearly.
The sigmoid function is wrong in the video.
Yes, you're right. The corresponding blog for the video below has the correction.
deeplizard.com/learn/video/m0pIlLfpXWE
Why do the auto generated subtitles think you're speaking Korean???
I'm not sure :( They are auto-generated by UA-cam, but it will not auto-generate English for this vid. We'll need to open a support ticket with UA-cam to sort it out. In the meantime, you can visit the written portion of this lesson here:
deeplizard.com/learn/video/m0pIlLfpXWE
@@deeplizard I guess they need a better language detection AI lol
ECE 449 UofA
I can't understand why everyone explains neural networks with sigmoid, yet later says to never use sigmoid. Just skip the sigmoid, then.
The answer lies in linearity and non-linearity.