One of the best PyTorch channels I have ever encountered.
Thanks so much!
I came for the RNN implementation from fundamentals and stayed for the training pipeline. Thanks for a great video; it's the clearest explanation of RNNs (in code) I've seen.
Thanks so much! Glad you like it
Awesome to see this as a video - I am re-learning PyTorch after a long break and wondered if anyone had covered this tutorial.
A perfect tutorial to understand RNNs and how they can be implemented. Thank you so much.
Can a job really be done this well? You're the man, man.....
Best video I've ever seen on RNNs thus far, thanks for the great video and keep up the good work 🎉
Exactly what I needed! Thank you so much!
The best RNN video on YouTube! Thx for the efforts!
glad it was helpful!
This is gold. I really hope you can create similar content on transformers.
It seems your RNN structure is different from the one provided in the PyTorch tutorial. I think in a typical RNN model, a .h2o() layer should be defined that takes 'hidden' as input, as opposed to an .i2o() that calculates the output directly from combined.
I noticed it as well. I don't think it's the correct implementation.
Your tutorial does a great job of showcasing RNNs. However, you forgot to mention a train-validate-test split to see if the model actually generalizes (understands the concept) or if it just learns the names by heart.
Your "testing" uses names that the model was trained on. In an example like this, it would be very important to use a train-validate-test split, since without one you will never catch potential overfitting!
Thanks for the feedback, and yes, absolutely correct. The code was heavily inspired by the original PyTorch example, which didn't split the data :( In most of my other tutorials I make sure to do this.
@@patloeber That's great :) I guess since RNNs are a more advanced topic, most viewers have hopefully seen another video of yours before that includes the train-validate-test split :)
I've also made mistakes before, like deciding on hyperparameters or the model architecture based on the test set instead of the validation set.
Wow! This tutorial was really helpful in understanding and coding RNNs. Excellent!
Glad you enjoyed it!
can you believe I failed to hit the notification bell for this lovely channel? You rock!
Nice, thanks!
Your demonstration is easy to understand, thank you!
Thank you so much! This is a great tutorial on RNNs. You break down the hidden states really well!
Thank you:) glad you like it :)
Excellent! this video is so good - amazing explanation and intuition!
I enjoyed this tutorial. I ran the code and added an accuracy calculation; I am getting about 60% accuracy using random data samples to form the training data. Russian names make up a larger share of the list, so it's a little unbalanced there (minor nitpick).
Could you talk about Attention and Transformers? Your tutorials are easy to understand.
I am also interested in these topics!
Have you seen the explanations by Yannic Kilcher? I'm not knowledgeable enough to tell what audience Yannic's teaching is intended for, but it seems like all the comments are very positive!
@@rubenpartono excuse me, who is Yannic? :)))
@@HieuTran-rt1mv He's a recently graduated PhD who posts deep-learning content on his YouTube channel (Yannic Kilcher)! His video "Attention Is All You Need" is his take on discussing the paper. He has a lot of material related to transformers. I think he doesn't do any follow-along code, but he does explain his intuitive understanding of cool deep learning papers.
@@rubenpartono Thank you for your information, I will watch his videos now.
Your channel really boomed after your freeCodeCamp videos, keep going strong 💪
Yes :) I will!
@@patloeber I am starting a YouTube channel myself; can I connect with you on LinkedIn?
I noticed you concatenated the input and the hidden state and then fed that into the linear layer; however, in other implementations I see that they multiply each by its own weight matrix and then add the results before feeding them through an activation layer. Is there any reason you concatenate those two together?
This makes more sense. No idea why he concatenated.
Multiplying is "standard" but the beauty of NNs is you can connect things however you want (or however works best). When making the original tutorial I was experimenting with a bunch of ways to pass the state and in this case concatenating just converged way faster.
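For anyone comparing the two approaches: below is a minimal sketch (tensor sizes are assumptions, not the video's exact values) showing that one linear layer over the concatenated [input, hidden] vector computes the same thing as multiplying input and hidden by separate weight matrices and adding the results; the concatenated layer's weight matrix is just the two stacked side by side.

import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size = 57, 128            # assumed sizes for the character-level example
x = torch.randn(1, input_size)               # a single character encoding
h = torch.randn(1, hidden_size)              # the previous hidden state

# Variant 1: one linear layer over the concatenated vector (as in the video)
i2h = nn.Linear(input_size + hidden_size, hidden_size)
h_cat = i2h(torch.cat((x, h), dim=1))

# Variant 2: split that same weight into an input part and a hidden part ("textbook" form)
W_x = i2h.weight[:, :input_size]             # (hidden_size, input_size)
W_h = i2h.weight[:, input_size:]             # (hidden_size, hidden_size)
h_sep = x @ W_x.T + h @ W_h.T + i2h.bias

print(torch.allclose(h_cat, h_sep, atol=1e-6))   # True: the two variants match

So the difference is mostly about which form converges better in practice, not about what the layer can express.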
This is a great explanation and example as well. Love it!
thank you for this informative video!
This video is so good!!!!
Your tutorials are great, I learnt so much.
This is a high quality video.
Thanks!
Thank you. Can I build a model from this to detect whether a given sentence is a question?
Bro, you are the best. Looking forward to more PyTorch content from you. Also, I would definitely watch if you start a TensorFlow 2 series 👍
Thanks! Yes TF will come soon
Based on your def train() function, init_hidden() would be called every time a new name from the dataset is used as an input. Why is it required every time? What would be the consequence of only calling it once (before the training loop) and never again? I thought the hidden state contains important representations that adjust with every new training example, but here it gets reset on every iteration. My understanding of what the hidden state represents may be flawed.
No, your thinking is good. You can indeed use an RNN that knows about the whole sentence, or even multiple sentences... In this video, to keep it simple, I used an RNN that was trained only word by word, and for each word we looked at one character at a time. So it should learn based on the character sequences, and there is no need to store information from other words here...
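To make that concrete, here is a minimal sketch (the helper and shapes are assumptions based on the tutorial's setup) of how a single name is processed character by character with a freshly zeroed hidden state, so nothing carries over between names:

import torch

def classify_name(rnn, line_tensor):
    # line_tensor: (name_length, 1, n_letters), one one-hot row per character of one name
    hidden = rnn.init_hidden()                   # reset: the model only "remembers" within this name
    for i in range(line_tensor.size(0)):         # step through the characters of this single name
        output, hidden = rnn(line_tensor[i], hidden)
    return output                                 # prediction after the last character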
Can you tell me how I should get the accuracy/confidence of this model for each output?
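One common way to turn the model's LogSoftmax output into a confidence score (a sketch with stand-in values, not the tutorial's code): exponentiate the log-probabilities and take the maximum.

import torch

n_categories = 18                                # assumed number of language classes
output = torch.log_softmax(torch.randn(1, n_categories), dim=1)   # stand-in for the model output

probs = torch.exp(output)                        # log-probabilities -> ordinary probabilities
confidence, category_idx = probs.max(dim=1)      # best class and how sure the model is
print(f"class {category_idx.item()} with confidence {confidence.item():.2f}")

For accuracy, you would compare the predicted index against the true label over many samples and average.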
@4:24, I think video captioning would be a better-suited application for the rightmost architecture here than video classification.
Hello. Great content as always. Could you please explain how the following call works?
output, next_hidden = rnn(input_tensor, hidden_tensor)
It is calling the forward method of the RNN class, but not explicitly.
Using the __call__ method on a PyTorch model will always execute the forward pass
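A tiny illustration (toy module, not the video's RNN) of how calling the module object routes to forward:

import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        print("forward was called")
        return self.linear(x)

toy = Toy()
out = toy(torch.randn(1, 4))    # nn.Module.__call__ runs registered hooks, then calls toy.forward
# Calling toy(x) is preferred over toy.forward(x) because it also triggers those hooks.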
Could you share the font family used in this tutorial?
Hi, this is a really great tutorial on RNNs in PyTorch!!
I have a question regarding the model: why is there no tanh in it? And why pass the concatenation of input_tensor and hidden_tensor to i2o; shouldn't it just pass hidden_tensor to i2o? Could you please clarify this? Thank you!
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.tanh = nn.Tanh()
        self.i2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)  # (1, 57): second dim

    def forward(self, input_tensor, hidden_tensor):
        combined = torch.cat((input_tensor, hidden_tensor), 1)
        hidden = self.tanh(self.i2h(combined))
        output = self.i2o(hidden)
        output = self.softmax(output)
        return output, hidden
It depends on the application. Yes, sometimes we only pass the hidden cell to the next sequence step, but in this case the next step takes the previous hidden cell and a new input, so it needs the combined one. And in this simple example I didn't need tanh, but you are correct that it is the default activation for RNNs.
@@patloeberThanks!
Hi, how would you implement this using the torch.nn.RNN module?
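There is no reply here, but as a rough sketch of how the same many-to-one classifier might look with the built-in module (layer sizes and names are assumptions, not the video's exact code):

import torch
import torch.nn as nn

class CharRNNClassifier(nn.Module):
    def __init__(self, n_letters=57, hidden_size=128, n_categories=18):
        super().__init__()
        self.rnn = nn.RNN(n_letters, hidden_size)       # loops over the characters internally
        self.h2o = nn.Linear(hidden_size, n_categories)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, line_tensor):
        # line_tensor: (seq_len, batch=1, n_letters)
        _, hidden = self.rnn(line_tensor)               # hidden after the last character: (1, 1, hidden_size)
        return self.softmax(self.h2o(hidden[0]))

model = CharRNNClassifier()
dummy_name = torch.randn(6, 1, 57)                      # stand-in for a 6-character encoded name
print(model(dummy_name).shape)                           # torch.Size([1, 18])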
Other than the use of a for loop to train your data instead of vector multiplication, this was an extremely informative and well-made video. Thanks!
Great video! However, I have a doubt. I am wondering: shouldn't you keep the trained weights for the hidden state during inference instead of calling `init_hidden` in the `predict` function? Please clarify.
My bad, I got it. The weights are in fact the trained weights. It's just that the hidden state starts from zero for every new word, while the weights are shared across all `timesteps`. I am keeping the reply in case someone else thinks about it the same way.
Great content bro
@Python Engineer please keep posting more projects
Thanks! I definitely want to do more projects.
Hello, I'd appreciate your advice: I can't really import all_letters from utils.
In the RNN forward method, I think this is not standard. You are computing the output at t, o_t, as a function of the hidden state at t-1 and the input at t. As far as I have seen, the more common way is to compute h_t from h_{t-1} and x_t and then use this h_t to compute o_t. Is this on purpose? Do you have any source to see and understand this implementation better? Thank you, and great tutorials!
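For reference, the more common ("textbook") update the question describes is usually written as follows (standard Elman RNN equations, not quoted from the video):

h_t = \tanh(W_{ih} x_t + W_{hh} h_{t-1} + b_h), \qquad o_t = W_{ho} h_t + b_o

In the video's version, o_t is instead computed directly from the concatenation of x_t and h_{t-1}, without going through h_t for the output.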
Thank you very, very much for these tutorials. It is really helpful for getting further. Sorry for not being active on Patreon, I am still a student. Please keep going and show us more possibilities in deep learning. Maybe a series on NLP or time series forecasting?
I am glad I subscribed. Understanding the concept is one thing; actually being able to teach it is totally another. 😊
Thanks! Glad you like it:)
Please create a series on NLP in detail
I have a question: can't we use transformers for this problem? I'm stuck trying to use BERT.
I have a few questions if you don't mind. Does this work on data it wasn't trained on? Also, how can I implement an RNN to reduce noise in an audio file? Thanks. ☺
So I have some questions about the training procedure if you don't mind!
In the training loop, you give the model a random word from our dictionary 100,000 times. My first question is: how do you know that the model didn't see some words several times? For example, maybe it repeated many Arabic or Chinese names numerous times while never getting access to any Portuguese words. If that happened, wouldn't the low loss curve be misleading, since the model repeatedly saw the same words?
My second question: wouldn't it be better to split the data into train/validation/test sets based on some percentage of our choosing, to make sure all the languages are represented proportionally in all of the sets? Why did you choose this approach with random words? Is it common practice with NLP and RNNs?
Thank you very much!
this helped me do my homework lol thanks!!
Another awesome video. Thank you
Thanks!
From your example I don't understand whether you run the sequences batch-wise, e.g. 50 words in one batch: the 1st letter from all words as one input, then the 2nd letter from all words, etc. That way one would save time. Or is that not possible?
Top-notch video! Just a suggestion - the name of dictionaries category_lines and all_categories could have been more intuitive. Nonetheless, you explained it extremely well.
thanks! yes you are right :)
The interesting thing is that we don't have any nonlinearity between time steps: the new input and the previous hidden state are mapped linearly to the next hidden state, and it's still so powerful!
ReLU or tanh are applied for non-linearity
@@patloeber I meant in this example that you showed, you didn't have a nonlinearity, right?
Hiii sir.
What are the plot steps and print steps?
Can you explain?
Hi, where would I find this program's code?
I was entering names to check which country they belong to, and I found that if a name starts with a capital letter it is assigned a different country than when it starts with a lowercase letter. Is it that the RNN is just memorizing the entire dataset and not generalizing (meaning: overfitting), or maybe because the dataset has no name starting with a capital letter? Your help is appreciated. Thank you.
Maybe all names should be transformed to lowercase first. And yes, overfitting is very likely in this example.
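A small sketch of the kind of normalization that could help (hypothetical helper; the tutorial's own preprocessing may differ): strip accents, drop unknown characters, and lowercase before building the input tensor.

import string
import unicodedata

ALL_LETTERS = string.ascii_letters + " .,;'"

def normalize_name(name):
    # Strip accents, keep only known characters, then lowercase so "Anna" and "anna" match
    ascii_name = "".join(
        c for c in unicodedata.normalize("NFD", name)
        if unicodedata.category(c) != "Mn" and c in ALL_LETTERS
    )
    return ascii_name.lower()

print(normalize_name("Héctor"))    # hector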
Why is super(RNN..) used in the __init__ of RNN itself? Shouldn't it be super(nn.Module)? Please help me understand.
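There's no reply in the thread, but roughly: super(RNN, self) means "start the method lookup in the class after RNN in the MRO", which is nn.Module here; you don't pass the parent class itself. In Python 3 the bare form does the same thing. A tiny sketch:

import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, hidden_size):
        # super(RNN, self).__init__() and super().__init__() are equivalent in Python 3;
        # both call nn.Module.__init__, which sets up parameter and buffer tracking.
        super().__init__()
        self.hidden_size = hidden_size

print(RNN(128).hidden_size)    # 128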
Is the way you combine the input and the hidden state the only way of doing that, or are there other methods for combining them to generate a new hidden state and output? I ask because I saw a method where there are two weight tensors: the first is multiplied by the input and the second by the previous hidden state, then the two are added and an activation function is applied to get the new hidden state, which is then multiplied by another weight tensor with another activation function applied to get the output.
Best videos on PyTorch. Any plans on creating a TensorFlow playlist? @Python Engineer
Thanks! Yes indeed. You can have a look at my community tab; there I just announced that I will make a TensorFlow course soon.
Thank you so much, sir... It's a request to keep making videos on PyTorch... Thank you.
Thanks! Yes I will :)
Is this a many-to-one RNN?
At the end where you were trying guesses, perhaps they're all correct because the model has seen all those names already?
Good point. Although I draw random training samples, it is very likely that with the large number of iterations it has seen all the names already. A better approach would be of course to separate into training and testing sets, but evaluation of the model was not the point of this tutorial.
@@patloeber Thanks for the reply. I'd like to ask: are you planning any tutorials about BERT models?
Hi, thanks for the video. It would be great if you could post a video on a TorchServe implementation of any of these models.
Thanks for the suggestion! Will add it to my list
Sir, can we use triplet loss for image classification if we have 2000 image classes, or is normal cross-entropy with softmax enough?
love your videos
Thanks!
Hi, I would like to know: is it just me, or do the English and French names get poor prediction performance? I tried to play a bit with the learning rate, but still.
Great tutorial, thanks! Could you please tell me which VSCode theme do you use? It looks great!
Nightowl Theme :)
@@patloeber thanks :)
Hello, thanks for your video. I have one question: why are you using this loop in the train function?
At line 66:
for i in range(line_tensor.size()[0]):
    output, hidden = rnn(line_tensor[i], hidden)
Here, output will be overwritten for each letter in line_tensor, yet the loss is computed using only the last output. If I am wrong, please correct me.
Because we look at each character one by one. The loss is computed on the output after the last character, but that output depends on the hidden state built up over all previous characters, so the gradients still flow back through every step.
Very good video! I have the next topic for you: reinforcement learning!
That’s a complex topic, but will definitely come sooner or later :)
@python_engineer
Sir, last week I posted a message in the Python intermediate course. The error below shows up while executing the code; I couldn't reach the developer, who said it would run on Python 2.7. Can you help me?
WindowsError: [Error 193] %1 is not a valid Win32 application
Seems like you are not using batch-learning? Could you explain why?
Only to keep it simple in this tutorial. Also the dataset was not very big...
Hey @pythonEngineer, I recently finished a Python course at my university, so now I'm feeling a little confident about Python. But I watched this video and I'm not understanding most of the functions. What do you suggest? Is it enough if I watch your PyTorch tutorial?
Yes, you should watch my PyTorch beginner course first.
@@patloeber got it.
Thank you
Fantastic !
Thank you :)
Hey, can you make a full Python course on freeCodeCamp? You only did the intermediate video, so could you please upload a full course video?
You mean an expert python course ?
@@patloeber yes
Ok I’ll consider it :)
very helpful
Glad to hear that
Is there any PyTorch certification like the TensorFlow one?
good question! not that I know of...
Thank you a lot for your videos :)
I am trying to run the training and prediction on the GPU, but it seems I forgot to move some tensor to the device.
In particular I am passing to device:
- combined = torch.cat((input_tensor, hidden_tensor), 1).to(device)
- rnn = RNN(N_LETTERS, n_hidden, n_categories).to(device)
but it shows the following error:
All input tensors must be on the same device. Received cpu and cuda:0
so I suppose I'm missing something, but can't figure out what
Yes, all your tensors AND the model must be on the same device, so somewhere you are missing a .to(device) call.
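A sketch of the usual fix (sizes and names are stand-ins, not the tutorial's exact variables): move the model once after constructing it, and move every tensor you create, including the freshly initialized hidden state, before using them together.

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

rnn = nn.Linear(57 + 128, 128).to(device)        # stand-in for the model: .to(device) once
input_tensor = torch.zeros(1, 57).to(device)      # every input tensor needs its own .to(device)
hidden_tensor = torch.zeros(1, 128).to(device)    # the zeroed hidden state is created on the CPU by default

combined = torch.cat((input_tensor, hidden_tensor), 1)   # both pieces now live on the same device
print(rnn(combined).device)                               # cuda:0 if a GPU is available, else cpu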
Very nice tutorial, sir. Could you make a model for signature verification? Please reply.
Can you make a chatbot AI in Python using an RNN?
I already have a chatbot tutorial with PyTorch on my channel, but yes, I will probably make a more advanced one in the future.
Why is video classification many to many?
Only if you want to do a classification for each frame
Got an error trying to run this:
ImportError: cannot import name 'load_data' from 'utils'
Does anyone know how to resolve this?
@Gonzalo Polo
@Patrick Loeber