“The definition of genius is taking the complex and making it simple.” - A congruence of this quote and this lecture series defines the quality of Instructors. Thanks a ton to Alex and Ava!!! Thank you very much!
Yes. You nailed it.
Yes. It is nice to have such a high quality lecture available to the public. I am also impressed by the instructors. Go Ava!
One of the clearest explanations on RNNs, LSTMs.
I really liked the way the LSTM concept is explained. The attention mechanism is described only briefly, yet explained well. Thank you so much.
I'll be honest, this is probably the best lecture I have ever seen on DL. Most other lectures are so inaccessible and jargon-filled that they fail to drive home the fundamentals. Kudos to Ava and Alexander.
Even though I've already watched previous years' lectures, I'm watching these as if it were the first time. Masterpiece 😭❤️🐐
MIT titles this an introduction to deep learning, but some people will realize it offers quite a deep rationale for what they have been seeking for a long time. Thank you MIT. A great lecture.
Incredible lecture, I had to pause halfway through just to absorb as much information as I could. Please keep these coming, I have a great aptitude for neural networks! This course is right up my alley :)
Going from Attack on Titans to Deep Learning. What a week :)
ngl the lecture was a bit easier to understand
@@liluo7790 😂😂true
Same bro 💪
@@JohnZakaria 😁😁
@@otakudnp3880 🙄
Clear intro to RNNs, building up intuition from the basic principles. Loved the lecture!
Thanks for the amazing class once again! Recurrent Neural Networks are very powerful and important in our society nowadays, and ongoing improvements and studies of them have a huge impact!
Our instructor's flow is super smooth, no cap
This is the best explanation of LSTMs I've seen!
Thanks a lot for making these MIT lectures public... I'm so happy to learn from them... it's all because of you 🤗
Ava Soleimany has really high-level skill at explaining knowledge. Thanks for making these lectures public.
I would like to thank Alex and Ava. Having content of this quality is priceless for someone who is trying to learn ML and DL on their own. Thank you for sharing this incredible class online for free.
Ava really cleared the confusing bits of the internal workings of standard RNNs and LSTMs. Thanks.
Thanks Ava and Alex.
This lecture is perfection! I say that as a pedantic PhD 🙂. I can tell that a crap ton of work went into it.
Just a minor correction at 20:12, which I couldn't find the reason for anywhere: tf.keras.layers.SimpleRNN actually implements an output-to-output recurrence, which is slightly different from the from-scratch model provided. This can be demonstrated by checking their weights and multiplying them manually. The output-to-output recurrence (omitting the bias) would be self.h = self.W_xh*x --> output = tf.math.tanh(self.h + previous_output*w_oo). If anyone knows why Keras implements output-to-output instead of hidden-to-hidden, please let me know; in the literature I usually see hidden-to-hidden.
I find it more intuitive since it can be simplified to output = tanh(W_xh*x + output*w_oo); hence w_oo is a weight that is "memorizing" or "giving importance to" the previous outputs.
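For anyone who wants to poke at this themselves, here is a minimal sketch of the two recurrences (assuming TensorFlow 2.x; the toy dimensions and the names W_xh, W_hh, w_oo are just illustrative, not taken from the lab code):

import tensorflow as tf

# toy dimensions: 3 input features, 2 hidden/output units
W_xh = tf.random.normal([3, 2])   # input-to-hidden weights
W_hh = tf.random.normal([2, 2])   # hidden-to-hidden weights (textbook RNN)
w_oo = tf.random.normal([2, 2])   # output-to-output weights (as described above)

x = tf.random.normal([1, 3])      # one input vector
h_prev = tf.zeros([1, 2])         # previous hidden state
y_prev = tf.zeros([1, 2])         # previous output

# hidden-to-hidden recurrence (the from-scratch model in the lecture):
h = tf.math.tanh(tf.matmul(x, W_xh) + tf.matmul(h_prev, W_hh))

# output-to-output recurrence (the form described in the comment above):
y = tf.math.tanh(tf.matmul(x, W_xh) + tf.matmul(y_prev, w_oo))

You can then compare either of these against what tf.keras.layers.SimpleRNN produces from the same weights.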
I wait all day at the office just to get back home and watch this amazing series of lectures. Thank you, Team @Alexander Amini.
Hello, Doctor. It was excellent. You are a source of pride for us Iranians and for MIT as well. You are exceptional.
Kudos to Alex and Ava for taking the time to make these concepts so simple to consume! I'm definitely going to watch all the lectures in this course.
Grateful to you profs and MIT! 💯
What a wonderful introduction to the intuition behind RNNs. :)
Going from WandaVision to Deep learning. What a weekend :D
Me too! Looking forward to tonight's episode!
Haha cool🔥
WandaVision to ComputerVision :D
with some fav song from The Weeknd xD
Many thanks for taking the time to produce and release this invaluable content. Cheers from Abidjan!
Pretty sure I tried viewing this lecture series at least 2 years ago, and this format is much more understandable and digestible. Thank you MIT and lecturers/producers.
Perfect timing for the weekend ;) can’t wait
I watched all the 2020 lectures during the lockdown. At first I realized that what I learn from a single lecture here is equivalent to what I learned in a week of other courses; it's so cool to cover all these concepts in an hour.
At MIT, this is called learning by "drinking from a firehose". Once you get used to it, other options seem tedious.
That was the best video I watched about RNN. Thank you 😊
Perfect! Loved the previous RNN lecture, watched it over and over, and couldn't wait for this one.
I did two courses and know a lot... but my understanding of this keeps getting clearer and clearer... thanks for these videos.
it really shows - MIT = world class and nothing else
This lecture is clearer than the waters of the Caribbean. Fantastic.
😂
The best lecture, the best weekends!
Thanks for your awesome explanation of LSTMs! can't wait to see the deep-dive into Transformers and how they compare! :)
Love the pseudo-logic representation of an RNN from 14:20 to 15:30. It couldn't be represented any more simply!
Thanks to these two amazing young Iranian MIT professors, Ava and Alexander!
Great lectures, the topics seem so simple to me now. Thank you Ava and Alexander!
Very well presented lecture, condensing a lot of info into a one-hour session. Bravo!
Once again, I really enjoyed this lecture. All the concepts are well explained, I am just about to start the lab session and feel quite excited to apply all I picked up from this lecture. Thank you!
You guys are geniuses, because you make things simple.
Very clear explanations and perfect teaching pace!
"Beautiful" is the only word I can think of at the end of lecture #1
Glad I came across this! perfect way to spend my weekend!
Thanks.
Everything she said went straight into my mind. Amazing lecture! Thank you very much.
Friday Evening.. Perfect 🕯️
Thanks, Ava, for your interesting presentation. I am proud of you as an Iranian girl.
Question: So is h_t a single float being passed from one cell to the next? And what are the dimensions of the weight matrix W_hh? I assume it must learn something of value characterized by the relationship between the input at t-1 and t. So my question is: is the weight matrix W_hh dimensioned by the size of the vocabulary squared? If so, what sort of intuition should I have about what h is encoding?
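In case it helps anyone with the same question, here is a minimal shape sketch of the standard formulation from the lecture (the hidden size of 4 and vocabulary size of 10 are made-up illustrative numbers): h_t is a vector of length hidden_size rather than a single float, and W_hh is hidden_size x hidden_size; the vocabulary size only enters through the input and output weight matrices.

import numpy as np

hidden_size, vocab_size = 4, 10                    # illustrative sizes, not from the lecture
W_xh = np.random.randn(hidden_size, vocab_size)    # input (one-hot word) -> hidden
W_hh = np.random.randn(hidden_size, hidden_size)   # hidden -> hidden (not vocab x vocab)
W_hy = np.random.randn(vocab_size, hidden_size)    # hidden -> output scores over the vocabulary

x_t = np.eye(vocab_size)[3]                        # one-hot vector for some word index
h_prev = np.zeros(hidden_size)                     # hidden state is a length-hidden_size vector

h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev)          # shape (hidden_size,)
y_t = W_hy @ h_t                                   # shape (vocab_size,)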
Well done on the lecturing skills improvement ! i could certainly feel that haha :) Thanks for the Lecture !
Watching this right now, awesome Ava!
Perfect timing !! Thanks for the upload
This is the future of teaching and teaches you the future of technology
Thanks a lot for the tutorials. They were the best I've ever seen on Deep Learning. Best wishes.
16:50 Why does y depend only on h? Doesn't y depend on both h and x? Does it mean that h(t) already captures the non-linear features, so treating y(t) as linear in h(t) is enough?
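For reference, in the update equations used in the lecture, x_t influences y_t only indirectly through h_t: the non-linearity is applied when computing h_t, and y_t is then a linear readout of h_t. A minimal NumPy sketch (all sizes made up for illustration):

import numpy as np

input_size, hidden_size, output_size = 3, 5, 2     # illustrative sizes
W_xh = np.random.randn(hidden_size, input_size)
W_hh = np.random.randn(hidden_size, hidden_size)
W_hy = np.random.randn(output_size, hidden_size)

x_t = np.random.randn(input_size)
h_prev = np.zeros(hidden_size)

h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev)          # x_t enters here, through the non-linearity
y_t = W_hy @ h_t                                   # linear in h_t, but still depends on x_t via h_t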
amazing structure of the class, loved it
Amazing lecture! Can't thank you enough. Thanks :)
Thanks a lot for sharing. I really appreciate the from-scratch code part, which helps me test my understanding of what was described.
Huge respect for such a great lecture. Thanks Ava
Very good presentation, and you can learn many things. Well done.
Amazing lecture!! Thanks for sharing! Congrats to professors and MIT!
Amazing lectures, crystal clear explanation of concepts!
The lecture was very well presented, and easy to understand. I learned a lot.
Maybe because I'm still new to the subject, I found the explanation on Attention a little difficult to understand.
Thank you very much!
Is there anyone trying out the lab? @Alexander, consider setting up a forum where we could discuss issues/insights that we have while working through the lab exercises.
is there any forum now?
Brilliant content, absolutely loving it. I am implementing sample networks on test datasets in R for the topics in the lectures. If anybody wants the files, let me know.
The lecturer's voice trembles a bit (not sure if nervous or excited), so some words of encouragement: you rock!
This is a very complicated topic, but was a great lecture!! Congrats!!
Imagine giving a full lecture without once saying "uhm", or something similar! Really high lecturing skills here
Well done, best YouTube channel I've subscribed to. Please keep it up.
Attention is all we need
Beautifully explained!
Can't we use *ASCII codes* for words rather than *one-hot encoding*? ❓
Great detailed tutorials, thanks! Just a quick typo on the Backpropagation Through Time slide (32 min): shouldn't the loss L_3 be L_t instead, i.e. corresponding to the t-th unit?
Great content, presentation, and clarity! Thank you very much Ava.
Nice tutorial. I love the tutorials on this channel.
Is there an online community/forum for this course? I'm curious to see how others implemented solutions for the lab! (I implemented a solution for the lab but I don't think my implementation is very clean or efficient...)
me too, tell me if you find out
This lecture is a statement of excellence 🙏
Hey, if we encounter problems while solving the labs, how do we approach them? Is there any channel, forum, or other medium where we can ask and resolve our doubts?
In the lecture she says that backprop takes place through the cell state and the original pathway is left undisturbed, but if we don't backpropagate through the original pathway (i.e. x_t and h_{t-1}), how are the weights going to adjust to give a lower cost value?
Don’t have enough superlatives. Ava is amazing
Past years' courses are more readily available on OCW than ever 👍🏻
Thank you for this perfect lecture Alexander and Ava
Thank you to our hero: the YouTube algorithm.
Amazing Class! So enlightening about RNNs! Thank you for sharing all of these amazing classes!
Thanks for providing this great course! Could anyone elaborate on the encoding bottleneck issue at 53:00? Is it only for LSTMs or for all RNNs?
May I ask what the differences are between feed-forward and traditional neural networks? Aren't traditional neural networks feed-forward?
And an RNN is not a feed-forward neural network, right? Because it's a recurrent neural network.
Ava, one question: are you saying that y at each time step is a word predicted from that step's input word and the hidden state?
What I mean is: at the 0th time step the word "I" is fed into the RNN, some word (basically a vector in the space, denoted y0) is predicted, and a hidden state is generated. This hidden state and the word "love" are then fed into the 1st time step, some word (denoted y1) is predicted, another hidden state is generated, and this goes on until the last word is fed into the RNN. Here, time t is determined by the number of words in the input sequence, i.e. 4 in this case, since the sentence "I love recurrent neural" is composed of 4 words.
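To make that flow concrete, here is a minimal next-word-prediction loop (pure NumPy; the 5-word vocabulary and random weights are made up, so the predictions are meaningless, but it shows how h_t and y_t are produced at every step):

import numpy as np

vocab = ["I", "love", "recurrent", "neural", "networks"]   # toy vocabulary for illustration
V, H = len(vocab), 8
W_xh = np.random.randn(H, V)
W_hh = np.random.randn(H, H)
W_hy = np.random.randn(V, H)

h = np.zeros(H)                                       # initial hidden state
for t, word in enumerate(["I", "love", "recurrent", "neural"]):
    x = np.eye(V)[vocab.index(word)]                  # one-hot input at time step t
    h = np.tanh(W_xh @ x + W_hh @ h)                  # hidden state carries memory of earlier words
    y = W_hy @ h                                      # scores over the vocabulary at this step
    print(t, word, "->", vocab[int(np.argmax(y))])    # "predicted" next word (random here)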
Great... awaiting the session
Excellent presentation. Thanks.
Hey guys, where can I get access to the lab exercises?
github.com/aamini/introtodeeplearning
Sorry, for gated RNNs you talk about the cell state c_t. But how is c_0 initialised?
My Friday night vibe. :D
Thanks for uploading such great videos. I have one question about the playlist: are the videos in the right sequence, or are they placed randomly?
Truly amazing, thank you!
Is an RNN really able to handle variable-length input sequences on its own, or do we need to provide a maximum-length parameter so that the shorter inputs get padded? Could anyone please clarify? Thanks.
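For what it's worth, the recurrence itself can step over a sequence of any length, but when you batch sequences of different lengths in Keras you typically pad them and mask the padding so the RNN ignores the padded steps. A minimal sketch (assuming TensorFlow 2.x; the sequences and sizes are made up):

import tensorflow as tf

# toy batch of integer-encoded sequences with different lengths
seqs = [[3, 7, 2], [5, 1], [4, 9, 6, 8]]
padded = tf.keras.preprocessing.sequence.pad_sequences(seqs, padding="post")  # pad with 0 to the batch max length

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10, output_dim=4, mask_zero=True),    # mask_zero makes the RNN skip padded steps
    tf.keras.layers.SimpleRNN(8),
])
out = model(padded)
print(out.shape)    # (3, 8): one final hidden vector per sequence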
Great stuff! Is there a forum where I can post clarifications on the lab?
Is the output equation at 17:00 correct? Doesn't it require an activation there too?
Ava mentioned "labs" a couple of times where an RNN is implemented. Where can I find a video of that?
Can't wait
Why is there no connection from the hidden layer to the output layer on the slide at 6:24 in this video?
Perfect lecture to understand...
Why do we use the same weights for every time step in an RNN? Why can't we use the best possible weights at each time step?
Thank you, that was awesome.