Very crisp, simple explanation of neural networks - you made a great foundation for me to build the skyscraper on. Thanks a lot.
Incredibly clear explanation of the concepts! Thanks a lot
Best content that I could find on the internet regarding MLPs.
I think it's worth mentioning that the magic of backpropagation is the "chain rule".
True, I totally agree. The gradient descent approach as a whole is indeed very fascinating :)
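As a concrete illustration of the chain-rule point above (the notation here is generic and not taken from the video): for a weight w_ij feeding neuron j, with pre-activation z_j and activation a_j = σ(z_j), the gradient factorizes as

```latex
\frac{\partial E}{\partial w_{ij}}
  = \frac{\partial E}{\partial a_j}\,
    \frac{\partial a_j}{\partial z_j}\,
    \frac{\partial z_j}{\partial w_{ij}},
\qquad
z_j = \sum_i w_{ij}\, a_i
\;\Rightarrow\;
\frac{\partial z_j}{\partial w_{ij}} = a_i .
```

The first factor is exactly the quantity that gets passed backward from layer to layer.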
21:59 this is the intuitive explanation for backprop I was looking for! If you know the weight change that would reduce the error, you can instead apply that change to the output of the predecessor rather than to the weight; the result is the same. And because you know the predecessor's calculation from the forward propagation, you can pass the change through to the predecessor's inputs. It's hard to explain, but I hope I got it.
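A minimal sketch of that intuition, assuming a toy 1-input, 1-hidden, 1-output sigmoid network with made-up numbers (none of these values come from the video): the gradient that would change the weight w2 is also pushed back onto the predecessor's output h, and from there through to w1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# assumed toy values, not taken from the video
x, target = 1.0, 0.0
w1, w2 = 0.5, -0.3           # input->hidden and hidden->output weights
lr = 0.1

# forward pass
h = sigmoid(w1 * x)          # predecessor's output
y = sigmoid(w2 * h)          # network output

# backward pass for squared error E = 0.5 * (y - target)**2
delta_out = (y - target) * y * (1 - y)   # dE/dz at the output
grad_w2 = delta_out * h                  # change for the weight w2
grad_h  = delta_out * w2                 # same signal, pushed to the predecessor's output h
delta_hidden = grad_h * h * (1 - h)      # continue through the predecessor's sigmoid
grad_w1 = delta_hidden * x               # change for the weight w1

w1 -= lr * grad_w1
w2 -= lr * grad_w2
```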
This is the best video on MLP
Excellent video, very well explained
Finally I found it, after 2 days of searching.
How did you solve for h1 and h2? I couldn't get my head around that math. For the x in the sigmoid, which value of x did you use, 0 or 1? And what values did you use for x and y?
Refer to my video on simple perceptrons: ua-cam.com/video/aiDv1NPdXvU/v-deo.html
Can neural networks be used in stock data analysis?
Well explained, in simple English.
I'm happy to be the 10kth subscriber
🎉
Thanks a lot
Self-Organizing Multi-Linear networks... no backprop, no calculus, no bias weights, and in most networks no special activation function. Most of the networks I build use this method.
Come on, backpropagation is not that complex; just try to add it.
genius
You're just like Andrew Ng lite...
Best
22:42 do you not know, or do you truly believe what you are saying?
Because that's the most important part of neural networks... if you don't know how to backprop, you are doomed....
And I'm looking for one explanation that covers how it works when the previous layer has more than one neuron...
Hi Fernando, I understand your anger. But the thing is, you already have a set of labels available to you, since this is a supervised learning algorithm. So you can update the weights as W_new = W_old + a(label - y)x, where y is the obtained output and a is the learning rate. This holds for all the weights on a given node. I believe he didn't mention it because it's quite a universal rule: you always use gradient descent, i.e. the delta learning rule. I hope this helps, cheers :)
If the previous layer has more than one node, you take one output node and work on it like a separate Adaline node. The same can then be done for the nodes of the previous layer too. Since the weight updates follow a backward path, it is called backpropagation :)
@@sahajshukla no you don't... you have to differentiate the calculation of the weighted sum...
True, that's the calculation part. You differentiate the error to update the weights and biases, which is exactly what I just said; it's called delta learning. The point is that the weight update goes from the nth layer to the (n-1)th layer and so on, down to the weights between the input layer and the first hidden layer.
@@FerMJy The error actually follows a parabolic curve for gradient descent: the squared error is Error^2 = (y - x_i)^2, which is a parabolic equation. To minimise it, you take the tangent (the gradient) at the current point and move against it; this process itself is called gradient descent. You do it for all weights separately, with i running from 1 to n.
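To make the update rule discussed in this thread concrete, here is a minimal sketch of the delta-rule / gradient-descent step for a single linear node (the variable names and toy data are illustrative assumptions, not taken from the video):

```python
import numpy as np

def delta_rule_step(w, x, label, lr=0.1):
    """One gradient-descent step on the squared error E = (label - y)^2
    for a linear node y = w . x; the update is w_new = w + lr*(label - y)*x."""
    y = np.dot(w, x)              # node output
    error = label - y             # dE/dy = -2*error; the constant 2 is folded into lr
    return w + lr * error * x     # step against the gradient, one weight vector at a time

# toy usage on two made-up training pairs
w = np.zeros(2)
for x, label in [(np.array([0.0, 1.0]), 1.0), (np.array([1.0, 0.0]), 0.0)]:
    w = delta_rule_step(w, x, label)
```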