If you are watching this in lockdown, you are one of the rare species on Earth. Many students are wasting their time on Facebook, YouTube, Twitter, Netflix, watching movies, playing PUBG, but you are working hard to achieve something. All the best... NITJ student here.
Self-love is important... what's NITJ?
@@aasavravi5919 it's NIT Jodhpur
Superb... 100% true, well said.
@@aasavravi5919 NIT Jalandhar.
@@techtrader8434 Jamshedpur/Jaipur are also options
Ravi, for the first time in this series I felt lost. I loved your board presentation.
You are really lost; he is Krish.
I agree... This format is harder to follow.
@@shubhamsongire6712 lmao
@@shubhamsongire6712 🤣🤣
I have recently been thinking about Data Science and Machine Learning, and Krishna Naik's videos were very helpful in framing my decision. Thank you, Krishna Naik.
That was an awesome journey. Now I have finished all the videos in the deep learning playlist. If you notice, I have written a comment on each of the videos, which was unnecessary. Now I will commence my journey with the ineuron course on Deep Learning with NLP, which commenced on the 18th of April.
Oh Krish, I wonder if I should review all the videos once again before commencing the ineuron journey. Not a bad thought indeed.
Ha! Ha! Bye Krish. Stay blessed. Keep contributing.
I also see your comments on every video, ha ha.
Hello Sir, Is the concept of the video clear to you? If yes, Please help me with the same. Please reply on ritish_m@outlook.com
I was really struggling to understand the core concept of LSTM. This really helped me. Thank you very much. Also, the blog is really awesome.
@Krish Naik great video! the first video that gets to the point and explains concepts in detail.
Sir, I have fallen in love with your teaching. I was trying to understand NLP for the first time because I chose it as my research work in my final year, and sir, your videos helped me a lot. Love you so much, sir.
Thank you, sir! It's great content, and I'm following almost all of your NLP playlist.
This is the best explanation of LSTM. Really, thanks!
Hi, thanks for your wonderful explanation.
In my opinion, this detailed video is more valuable for researchers than for programmers who just want to use LSTM or RNN.
Amazing explanation, you made it very simple and clear
Nice lecture, sir. Please try to solve just one numerical example manually, for at least one epoch. It would help us understand LSTM in depth. Thank you.
At 20:27, when the context is similar, sigmoid(y) is the vector [1 1 1 1]. Why would sigmoid(y)*tanh(y) give me the vector [0 0 0 0]? Looking at the sigmoid and tanh graphs, when sigmoid(y) tends to 1, tanh(y) also tends to 1, so sigmoid(y)*tanh(y) should result in the vector [1 1 1 1] as well.
I have the same doubt, please reply.
same doubt
Same doubt
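A quick numeric check of this doubt (a minimal sketch; the values are illustrative, not from the video). For large positive pre-activations, sigmoid and tanh do both saturate near 1, so their product tends to 1, as the question suggests. The catch is that in the LSTM the two functions act on different quantities: the sigmoid on the output-gate pre-activation, the tanh on the cell state. The product lands near 0 when the cell state is near 0, whatever the gate outputs:

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

# Large positive pre-activations: sigmoid and tanh both saturate near 1,
# so the elementwise product approaches [1 1 1 1], as the question suggests.
y = np.array([5.0, 5.0, 5.0, 5.0])
print(sigmoid(y) * np.tanh(y))    # ~[0.993 0.993 0.993 0.993]

# The product is near 0 when the tanh *input* (the cell state) is near 0,
# regardless of what the sigmoid gate outputs, since tanh(0) = 0.
c = np.zeros(4)
print(sigmoid(y) * np.tanh(c))    # [0. 0. 0. 0.]
```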
The best explanation, as usual. Thank you so much for your effort.
Me watching other YT videos: Watch then like/dislike/do nothing
Me watching Krish sir's videos: First like then watch
Thank you so much for explaining so many things. I learnt complete practical ML/DL from your videos. A big thumbs up from my side. I will definitely share your channel with anyone who wants to dive into ML/DL/DS.
Finest explanation of such a difficult topic, hats off!! 🫡
Thank you for your unconditional service.
Thanks so much, my brother. Great explanation. Allah bless you.
The best explanation I have ever seen.
Wonderful video. Again great explanation. I think I might run out of words after a few more videos.
Watching it in 2024 from Pakistan... he saved me from failing my NLP course... thank you.
Thank you, sir, for such videos. Just please arrange them in a playlist or on your website so they are easy to access. Thank you so much.
@krish naik wonderful explanation
Hi Sir, I have a serious doubt. At 20:31 you say tanh will give the output 0000... if the context has not changed. How does this happen? Please elaborate. I have spent a lot of time thinking about it but still couldn't find the answer.
Did you find an answer to this, bro? Even I came across the same doubt. It would be better if Krish could explain it.
Nice! Simple explanations... much appreciated, sir.
Man, you explain really well. I was confused between GRU and LSTM, and your explanation was wonderful. Your skills gained your channel one more subscriber. Thank you for such videos.
Thank you so much sir, for such a great explanation
amazing explanation sir..many thanks
Excellent sir
Sir, please upload videos on Boltzmann Machines... the math equations behind them feel very complicated to understand... your videos have helped me a lot in learning ML/DL concepts.
Love your videos♥️♥️
Hi, can you please tell me which concepts in ML and DL you feel are mathematically complicated to understand?
Need some help understanding the part at timestamp 12:41. If the current input (Xt) is not similar to the previous input, how does that make the result of the forget gate 0?
Wonderful Explanation!
Excellent..
Sigmoid doesn't inherently convert real values to binary labels, i.e., 0 or 1; instead, the output is a range of real values between 0 and 1. The vectors at the output of the gates need NOT be something like [0 0 1 1] but can be, and most probably will be, something like [0.122, 0.23, 0, 0.983].
Great explanation.
At timestamp 7:00, I think this matrix multiplication is not possible. In matrix multiplication, the number of columns in the first matrix must equal the number of rows in the second matrix for the multiplication to be valid.
I have a feeling the equation mentioned at 10:40 isn't right...
For Ft = sig(Wf * [Ht-1, Xt] + Bf),
Ht-1 should already have its weight associated with it, i.e., Ht-1 = sig(Wt-1 * Xt-1 + Bt-1), correct?
Which means that for Wf we wouldn't factor Wt-1 in again, but would only use the current weight.
Can someone comment on this and correct me if I'm wrong, please?
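On this doubt, a hedged note: in the standard LSTM formulation, Wf is its own learned parameter matrix that multiplies the values of the concatenated vector [Ht-1, Xt]; whatever weights produced Ht-1 are already baked into Ht-1's numbers and are not re-applied. A minimal numpy sketch with assumed toy sizes (hidden size 3, input size 2; all names illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden, inputs = 3, 2                 # toy sizes, purely illustrative
h_prev = np.random.randn(hidden)      # H(t-1): already a computed value
x_t = np.random.randn(inputs)         # current input X(t)

# W_f is a separate learned parameter of shape (hidden, hidden + inputs);
# it multiplies the *values* in [H(t-1), X(t)], not the weights behind them.
W_f = np.random.randn(hidden, hidden + inputs)
b_f = np.zeros(hidden)

f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
print(f_t)   # one gate value in (0, 1) per cell-state component
```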
Finally I saw a detailed explanation. Thank you.
I don't get something. We know that the vanishing gradient problem happens because the derivative of the sigmoid or tanh function is at most 0.25 or 1 respectively, so after many layers the derivative can no longer help update the weights. However, here we are using sigmoid again. Aren't we going to have the same problem?
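A hedged answer with a tiny demo: the sigmoids here act as gates, not as the main path the gradient flows through. Along the cell state, c_t = f_t * c_(t-1) + i_t * c̃_t, so (ignoring the gates' own dependence on the state) the factor carried backward per time step is roughly the forget gate f_t, which the network can learn to keep near 1, unlike a chain of tanh derivatives that shrinks multiplicatively. The numbers below are purely illustrative:

```python
import numpy as np

T = 50  # number of time steps, illustrative

# Plain RNN-style path: repeated tanh derivatives (each <= 1, often much less)
tanh_derivs = np.full(T, 0.5)
print(np.prod(tanh_derivs))      # ~8.9e-16 -> gradient vanishes

# LSTM cell-state path: the per-step factor is the forget gate f_t,
# which can be learned to sit near 1, keeping the product healthy.
forget_gates = np.full(T, 0.98)
print(np.prod(forget_gates))     # ~0.36 -> gradient survives 50 steps
```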
Thank you sir❤
Thanks Krish
I have been waiting for this video so long.
Hey Krish, it was a very informative video on the subject. Thanks for the lovely work. I'm not sure if I can request a topic, but it's one that I and many others would be interested in: since you're on the industrial side of AI, it would be nice to see some future content about ML model encryption and resources for production. Great job on the YouTube playlists.
Hey Krish, could you explain how each of the input features is mapped to the RNN units and how the outputs are then formed? I'm really having a hard time picturing how these input features get mapped at each time step. Could you explain it with this text-sequence example itself, where each word has n features, i.e., is a vector of size n, and show how those features are mapped? Thanks!!!
What happens to the -1 values of the tanh and sigmoid cross product when the information is added to the cell state in an LSTM?
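A hedged note on this question (the values below are made up): tanh lets the gated candidate take values in (-1, 1), so negative entries are not discarded; they simply subtract from the corresponding cell-state components when the update is added:

```python
import numpy as np

c_prev  = np.array([ 0.8, -0.2,  0.5])   # previous cell state (illustrative)
f_t     = np.array([ 0.9,  0.9,  0.9])   # forget gate output
i_t     = np.array([ 0.7,  0.7,  0.7])   # input gate output
c_tilde = np.array([-0.6,  0.4, -1.0])   # tanh candidate, can be negative

# Negative candidate entries push the matching cell-state components down;
# nothing special happens to them.
c_t = f_t * c_prev + i_t * c_tilde
print(c_t)   # [ 0.3   0.1  -0.25]
```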
Bro, because of you I understood deep learning very well. I need a small favour: can you send some resources for learning deep learning with TensorFlow, please?
Hi, actually I don't understand why we do the sigmoid part twice, once for the input gate and once for the forget gate. Isn't it doing the same thing?
Bro, I have the same doubt. The weights may change, but doesn't that impact the model? Please let me know if you found an answer.
Sir, why are we applying the sigmoid function again in the input gate when we have already done so in the forget gate? What is the necessity of calculating i(t) separately? Isn't f(t) = i(t)?
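A hedged sketch of why f(t) ≠ i(t): each gate has its own weight matrix (and bias, omitted below for brevity), so even with the same input [h(t-1), x(t)] and the same sigmoid, the two gates learn different outputs, one deciding what to erase from the cell state, the other what to write into it. Toy sizes and random weights are illustrative only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
hidden, inputs = 3, 2                           # illustrative sizes
concat = rng.standard_normal(hidden + inputs)   # stands in for [h(t-1), x(t)]

# Same input, same activation -- but separate learned parameters per gate.
W_f = rng.standard_normal((hidden, hidden + inputs))  # forget-gate weights
W_i = rng.standard_normal((hidden, hidden + inputs))  # input-gate weights

f_t = sigmoid(W_f @ concat)   # what to erase from the cell state
i_t = sigmoid(W_i @ concat)   # what to write into the cell state
print(f_t, i_t)               # different vectors in general, since W_f != W_i
```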
Can you please make a video on how to combine two deep learning models that were trained on different datasets?
6:41 — this is against the matrix multiplication rule. I was doing the same thing manually for the input layer and was stuck for hours wondering why I couldn't add the output to the memory state; then I found out I was applying the wrong multiplication rule. Anyway, great explanation.
It is a Hadamard product.
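For anyone stuck at the same spot: the pointwise multiplications in the LSTM diagram are elementwise (Hadamard) products of equal-length vectors, not matrix multiplications, so the column/row compatibility rule doesn't apply. A quick illustration with made-up values:

```python
import numpy as np

f_t    = np.array([0.9, 0.1, 0.8])   # gate output (illustrative)
c_prev = np.array([1.0, 2.0, 3.0])   # previous cell state

# Hadamard (elementwise) product: same-shape vectors, multiplied entry by entry.
print(f_t * c_prev)          # [0.9 0.2 2.4]

# A true matrix-style multiplication of these shapes would be a dot product:
print(f_t @ c_prev)          # 3.5 -- a scalar, not what the LSTM diagram means
```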
Hello Krish, can you explain Conv-LSTM with one sample dataset, its difference from LSTM, and the time-distributed concept in LSTM?
so complicated ...
hehe😂
I think you have to clear your basics first.
So is it fair to say the forget gate decides "where the new word fits in the context" and hence the forgetting in the context and the input gate decides how the new word 'changes' the context, thereby altering the influence of the new word on the context?
How does the long-term dependency problem relate to the vanishing gradient problem? Can anyone please explain?
Please upload a video of a real-time deep learning project using, for example, the LSTM algorithm.
Please upload further videos....
Thanks for the video, Krish. One doubt: how do word vectors change to 0s and 1s when we pass them through the sigmoid function? Values greater than 0.5 might be marked as 1, but how is this probability determined? Based on what value?
The sigmoid function is f(x) = 1/(1 + e^(-x)). After calculating the value W·x + b, the result passes through the sigmoid function, which outputs a value between 0 and 1. If the output is greater than 0.5 then it is assigned 1, else 0 is assigned.
There's a mistake: the output of a gate will be a vector of real values between 0 and 1, not binaries, i.e., not just 0 or 1.
The network learns the best way to project, first by a linear transformation (W times something) and then by a non-linear transformation (applying the sigmoid).
To answer your "how": the network "learns" the best way to do this transformation (by learning weights) so as to optimize the objective function.
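A tiny check of this correction (the pre-activation values are made up): sigmoid maps a typical W·x + b vector to graded fractional values, and no 0.5 thresholding is applied inside the LSTM gates:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

pre_activation = np.array([-2.0, -0.3, 0.0, 1.5])  # W.x + b, illustrative
print(sigmoid(pre_activation))
# [0.119 0.426 0.5 0.818] -- graded gate values, never snapped to 0/1
```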
Thanks Sir.
please upload time series analysis using RNN asap...
Yes coming up
Sir, is it possible to classify images and move them into folders accordingly? I'm a data operator with the forest department at Kaziranga National Park. We have many camera-trap photos, and manual segregation is very hard. Please help.
So the input gate contains a sigmoid and a multiplication operation, and the same sigmoid-plus-multiplication structure is involved in the forget gate too. So the forget gate includes that input-gate-like structure, and the output gate includes it as well, but the output gate is somewhat different in that a tanh is applied first. Am I right? Is anything wrong?
Does LSTM accept input of variable size, or is padding required to make all inputs the same size?
is the video for the different types of LSTM skipped ?
A few suggestions: please reduce the frequency of the words "particular" and "over". As you are already talking about something specific, "particular" isn't really needed every time, and the same goes for "over". You are referring to things here, so a simple "here" will sound better than "over here".
Krish sir, how are the weights different at every gate? Since we are sending the same concatenated input to every gate, how can they be different?
Could you please make a video on the seq2seq architecture for conversational modeling?
Thanks. Please upload a practical LSTM video.
Love your video, but I have a question: how do we update the weights, i.e., backpropagate, in the LSTM?
I think the backpropagation process of the LSTM RNN is the same as for a simple RNN (backpropagation through time).
Buddy, even MIT did not go into the depths of this. Understanding the math behind complex deep learning networks is really hard.
I was wondering: as the context changes, how does the sigmoid function make the value 0 or near zero so as to forget the past memory? Because the input is changing, right? Then it must not proceed further... isn't it?
@KrishNaik, could you please tell us the math behind this concatenation operation [ht-1, xt]? What is the ","? Is it addition? Multiplication?
It is actually concatenation. Let's say ht-1 is an m-sized vector and xt is an n-sized vector; then the result would be an (m+n)-sized vector.
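A two-line illustration of that reply (the sizes are arbitrary):

```python
import numpy as np

h_prev = np.random.randn(4)   # h(t-1): m = 4 (arbitrary)
x_t = np.random.randn(3)      # x(t): n = 3 (arbitrary)

# The comma in [h(t-1), x(t)] just means stacking the vectors end to end.
concat = np.concatenate([h_prev, x_t])
print(concat.shape)   # (7,) -- i.e., m + n
```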
Please upload video about autoencoder
A small confusion about Ct-1: how does Ct-1 differ from ht-1 if both are previous outputs?
Hi Krish,
In LSTM, don't we have backpropagation and weight updates? If yes, why?
Can you please make a video on GAN as well?
Great
I'm waiting for your next video.
Hi Krish, I'm done with the LSTM forecasting, but I'm facing a data mismatch: the prediction was made for the test data, but the predicted values are lower than the test data.
LSTM is kind of crappy when it comes to predicting coronavirus cases.
Krish, according to you, which algorithm would be best for predicting the world's COVID-19 cases?
Hello Sir, Is the concept of the video clear to you? If yes, Please help me with the same. Please reply on ritish_m@outlook.com
How can we do extractive summarisation with BERT?
Sir, how can we use time-series data as input to a CNN? Please guide me.
Could you please make a programming tutorial for LSTM and GRU?
Krish, please upload more on LSTM.
21:48 — yes, very confusing.
Please go back to your whiteboard. You're amazing with whiteboard and marker!
Is there anything left in the deep learning tutorial, or is it complete?
Is accuracy meaningless in Keras models...?
Can anyone provide a reference link for learning the word-to-vector conversion topics?
Go to the deep learning playlist.
Please help me with working with time-series data.
Which book are you teaching from, Krish?
The link is given in the references, haha.
Will you be uploading videos on Transfer Learning ?
Transfer learning is a very broad topic, bro; every day a new algorithm comes out that uses transfer learning.
@@RinkiKumari-us4ej Hello Sir, Is the concept of the video clear to you? If yes, Please help me with the same. Please reply on ritish_m@outlook.com
please upload more videos
The day this video was recorded is the day the lockdown started!
Mar-24-2021
Are you 29 years old?
The board is better.
Pasha Tekhnik has really gone downhill: he's become an Indian and taken up neural networks.
confusing
You disappointed us
😂
Too many advertisements😒😔
Hello all my name is Krish Naik...🤣😁😝