PyTorch RNN Tutorial - Name Classification Using A Recurrent Neural Net

  • Published 26 Nov 2024

COMMENTS • 136

  • @teetanrobotics5363
    @teetanrobotics5363 4 years ago +32

    One of the best PyTorch channels I have ever encountered.

  • @MartyAckerman310
    @MartyAckerman310 4 years ago +27

    I came for the RNN implementation from fundamentals, and stayed for the training pipeline. Thanks for a great video, it's the clearest explanation of RNN (in code) I've seen.

    • @patloeber
      @patloeber  4 years ago

      Thanks so much! Glad you like it

  • @sprobertson
    @sprobertson 10 days ago

    Awesome to see this as a video - I am re-learning PyTorch after a long break and wondered if anyone had covered this tutorial.

  • @samannaghavi4477
    @samannaghavi4477 A year ago +1

    A perfect tutorial for understanding RNNs and how they can be implemented. Thank you so much.

  • @yildirimakbal6723
    @yildirimakbal6723 A year ago

    Can a job really be done this well? You're the man!

  • @jaxx6712
    @jaxx6712 5 months ago

    Best video I’ve ever seen on RNNs so far. Thanks for the great video and keep up the good work 🎉

  • @Borey567
    @Borey567 5 months ago +1

    Exactly what I needed! Thank you so much!

  • @stancode7228
    @stancode7228 3 years ago

    The best RNN video on YouTube! Thanks for the effort!

  • @mraihanafiandi
    @mraihanafiandi 2 years ago +1

    This is gold. I really hope you can create similar content on transformers.

  • @stonejiang1162
    @stonejiang1162 A year ago +4

    It seems your RNN structure is different from the one provided in the PyTorch tutorial. In a typical RNN model, a .h2o( ) layer would be defined that takes 'hidden' as input, as opposed to an .i2o( ) that calculates the output directly from the combined vector (see the sketch after this thread).

    • @Szymon-vp7tn
      @Szymon-vp7tn A year ago

      I noticed it as well. I don't think it's a correct implementation.
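
    For readers comparing the two wirings, here is a minimal sketch of both variants (assuming one-hot character inputs and a LogSoftmax output, as in the video). Both train fine; they differ only in whether the output layer reads the concatenated [input, hidden] vector or the freshly computed hidden state.

    import torch
    import torch.nn as nn

    class RNNCombined(nn.Module):
        # Wiring used in the video: output computed directly from [input, hidden]
        def __init__(self, input_size, hidden_size, output_size):
            super().__init__()
            self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
            self.i2o = nn.Linear(input_size + hidden_size, output_size)
            self.softmax = nn.LogSoftmax(dim=1)

        def forward(self, x, hidden):
            combined = torch.cat((x, hidden), 1)
            hidden = self.i2h(combined)                # next hidden state
            output = self.softmax(self.i2o(combined))  # output from the combined vector
            return output, hidden

    class RNNWithH2O(nn.Module):
        # Wiring the commenters describe: output computed from the new hidden state
        def __init__(self, input_size, hidden_size, output_size):
            super().__init__()
            self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
            self.h2o = nn.Linear(hidden_size, output_size)
            self.softmax = nn.LogSoftmax(dim=1)

        def forward(self, x, hidden):
            combined = torch.cat((x, hidden), 1)
            hidden = self.i2h(combined)              # next hidden state
            output = self.softmax(self.h2o(hidden))  # output from the new hidden state
            return output, hidden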

  • @floriandonhauser2383
    @floriandonhauser2383 3 years ago +8

    Your tutorial does a great job of showcasing RNNs. However, you forgot to mention a train-validate-test split to check whether the model actually generalizes (understands the concept) or just learns the names by heart.
    Your "testing" uses names that the model was trained on. In an example like this it would be very important to use a train-validate-test split, since without it you will never catch potential overfitting!

    • @patloeber
      @patloeber  3 years ago +3

      Thanks for the feedback, and yes, absolutely correct. The code was heavily inspired by the original PyTorch example, which didn't split the data :( In most of my other tutorials I make sure to do this.

    • @floriandonhauser2383
      @floriandonhauser2383 3 years ago +3

      @@patloeber That's great :) I guess since RNNs are a more advanced topic, most viewers have hopefully seen another video from you before that includes the train-validate-test split :)
      I've also made mistakes before, like deciding on hyperparameters or the model architecture based on the test set instead of the validation set.

  • @navintiwari
    @navintiwari 3 years ago +2

    Wow! This tutorial was really helpful in understanding and coding RNNs. Excellent!

    • @patloeber
      @patloeber  3 years ago +1

      Glad you enjoyed it!

  • @terraflops
    @terraflops 4 years ago

    Can you believe I failed to hit the notification bell for this lovely channel? You rock!

  • @coc2912
    @coc2912 A year ago

    Your demonstration is easy to understand, thank you!

  • @EasyCodesforKids
    @EasyCodesforKids 4 years ago +5

    Thank you so much! This is a great tutorial on RNNs. You break down the hidden states really well!

    • @patloeber
      @patloeber  4 years ago

      Thank you :) Glad you like it :)

  • @NehaJoshi-x5n
    @NehaJoshi-x5n 7 months ago

    Excellent! This video is so good - amazing explanation and intuition!

  • @felix9x
    @felix9x A year ago

    I enjoyed this tutorial. I ran the code and added an accuracy calculation; I am getting about 60% accuracy using random data samples to form the training data. Russian names are overrepresented in the list, so it's a little unbalanced there (minor nitpick).

  • @HieuTran-rt1mv
    @HieuTran-rt1mv 4 years ago +5

    Could you talk about Attention and Transformers? Your tutorials are easy to understand.

    • @xinjin871
      @xinjin871 3 years ago

      I am also interested in these topics!

    • @rubenpartono
      @rubenpartono 3 years ago

      Have you seen the explanations by Yannic Kilcher? I'm not knowledgeable enough to tell what audience Yannic's teaching is intended for, but the comments all seem very positive!

    • @HieuTran-rt1mv
      @HieuTran-rt1mv 3 years ago

      @@rubenpartono excuse me, who is Yannic? :)))

    • @rubenpartono
      @rubenpartono 3 years ago +1

      @@HieuTran-rt1mv He's a recent PhD graduate who posts deep learning content on his YouTube channel (Yannic Kilcher)! His video "Attention Is All You Need" is his take on discussing the paper. He has a lot of material related to transformers. I don't think he does any follow-along code, but he does explain his intuitive understanding of cool deep learning papers.

    • @HieuTran-rt1mv
      @HieuTran-rt1mv 3 years ago

      @@rubenpartono Thank you for the information, I will watch his videos now.

  • @IAmTheMainCharacter
    @IAmTheMainCharacter 4 years ago

    Your channel really boomed after your freeCodeCamp videos, keep going strong💪

    • @patloeber
      @patloeber  4 years ago

      Yes :) I will!

    • @IAmTheMainCharacter
      @IAmTheMainCharacter 4 years ago

      @@patloeber I am starting a YouTube channel myself; may I connect with you on LinkedIn?

  • @Jfantab
    @Jfantab A year ago +3

    I noticed you concatenated the input and the hidden state and fed that into the linear layer; in other implementations they multiply each by its own weight matrix and add the results before feeding them through an activation. Is there any reason you concatenate the two? (See the sketch after this thread.)

    • @aakashdusane
      @aakashdusane A year ago

      This makes more sense. No idea why he concatenated.

    • @sprobertson
      @sprobertson 10 days ago +1

      Multiplying is "standard", but the beauty of NNs is that you can connect things however you want (or however works best). When making the original tutorial I was experimenting with a bunch of ways to pass the state, and in this case concatenating just converged way faster.
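
    A quick numeric sketch of why the two formulations coincide: a single linear layer applied to the concatenation [x, h] computes the same map as two weight matrices applied separately and summed, since W[x; h] = W_x x + W_h h, where W_x and W_h are column slices of W.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    input_size, hidden_size = 4, 3
    lin = nn.Linear(input_size + hidden_size, hidden_size)
    x = torch.randn(1, input_size)
    h = torch.randn(1, hidden_size)

    # One linear layer on the concatenation...
    out_concat = lin(torch.cat((x, h), 1))

    # ...equals separate input/hidden weight matrices, summed:
    W_x = lin.weight[:, :input_size]
    W_h = lin.weight[:, input_size:]
    out_split = x @ W_x.T + h @ W_h.T + lin.bias

    print(torch.allclose(out_concat, out_split, atol=1e-6))  # True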

  • @affahrizain
    @affahrizain 2 years ago +1

    This is a great explanation and example as well. Love it!

  • @pangqiu4854
    @pangqiu4854 3 years ago +1

    Thank you for this informative video!

  • @alioraqsa
    @alioraqsa A year ago +1

    This video is so good!!!!

  • @lifeislarge
    @lifeislarge 2 years ago

    Your tutorials are great, I learned so much.

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 4 years ago +1

    This is a high quality video.

  • @habeang304
    @habeang304 3 years ago +1

    Thank you! Can I build a model from this to detect whether a given sentence is a question?

  • @ashwinprasad5180
    @ashwinprasad5180 4 years ago +3

    Bro, you are the best. Looking forward to more PyTorch content from you. I would also definitely watch if you started a TensorFlow 2 series 👍

    • @patloeber
      @patloeber  4 years ago

      Thanks! Yes, TF will come soon

  • @TechnGizmos
    @TechnGizmos 4 years ago +2

    Based on your train() function, init_hidden() is called every time a new name from the dataset is used as input. Why is it required every time? What would be the consequence of calling it only once (before the training loop) and never again? I thought the hidden state contains important representations that adjust with every new training example, but here it gets reset on every iteration. My understanding of what the hidden state represents may be flawed.

    • @patloeber
      @patloeber  4 years ago +3

      No, your thinking is good. You can indeed use an RNN that knows about the whole sentence, or even multiple sentences... In this video, to keep it simple, I used an RNN that was trained word by word, and for each word we looked at one character at a time. So it should learn from the character sequences alone; there is no need to carry information over from other words here (see the sketch below).
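
    To make the per-word training concrete, here is a minimal sketch of one training step as described in this thread (rnn, criterion, and optimizer are assumed to be set up as in the video):

    def train_step(rnn, criterion, optimizer, line_tensor, category_tensor):
        hidden = rnn.init_hidden()            # fresh state: each name is independent
        for i in range(line_tensor.size(0)):  # one character at a time
            output, hidden = rnn(line_tensor[i], hidden)
        loss = criterion(output, category_tensor)  # loss on the last output only
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return output, loss.item()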

  • @PUNERI_TONY
    @PUNERI_TONY A year ago

    Can you tell me how I should get the accuracy/confidence of this model after each output?

  • @INGLERAJKAMALRAJENDRA
    @INGLERAJKAMALRAJENDRA 3 months ago

    @4:24, I think video captioning would be a better-suited application for the rightmost architecture here than video classification.

  • @sighedsighed8882
    @sighedsighed8882 4 years ago +2

    Hello. Great content as always. Could you please explain how the following call works?
    output, next_hidden = rnn(input_tensor, hidden_tensor)
    It calls the forward method of the RNN class, but not explicitly.

    • @patloeber
      @patloeber  4 years ago +3

      Using the __call__ method on a PyTorch model will always execute the forward pass
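
    A short illustration of the mechanism: nn.Module defines __call__, which runs any registered hooks and then dispatches to forward(), so calling the model object invokes your forward method implicitly.

    output, next_hidden = rnn(input_tensor, hidden_tensor)  # goes through __call__, which calls forward()
    # rnn.forward(input_tensor, hidden_tensor) would also work, but it skips hooks -- avoid it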

  • @olsay
    @olsay 3 months ago

    Could you share the font family used in this tutorial?

  • @wenqinliu6522
    @wenqinliu6522 4 years ago +1

    Hi, this is a really great PyTorch tutorial on RNNs!!
    I have a question regarding the model: why is there no tanh in the model? And why pass the concatenation of input_tensor and hidden_tensor to i2o; shouldn't just the hidden_tensor be passed to i2o? Could you please clarify this? Thank you!
    class RNN(nn.Module):
        def __init__(self, input_size, hidden_size, output_size):
            super(RNN, self).__init__()
            self.hidden_size = hidden_size
            self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
            self.tanh = nn.Tanh()
            self.i2o = nn.Linear(hidden_size, output_size)
            self.softmax = nn.LogSoftmax(dim=1)  # (1, 57): second dim

        def forward(self, input_tensor, hidden_tensor):
            combined = torch.cat((input_tensor, hidden_tensor), 1)
            hidden = self.tanh(self.i2h(combined))
            output = self.i2o(hidden)
            output = self.softmax(output)
            return output, hidden

    • @patloeber
      @patloeber  4 years ago +2

      It depends on the application. Yes, sometimes we only pass the hidden state to the next sequence step, but in this case the next step takes the previous hidden state and a new input, so it needs the combined one. And in this simple example I didn't need tanh, but you are correct, that is the default activation for RNNs.

    • @wenqinliu6522
      @wenqinliu6522 4 years ago

      @@patloeber Thanks!

  • @ugestacoolie5998
    @ugestacoolie5998 6 months ago

    Hi, how would you implement this using the torch.nn.RNN module? (See the sketch below.)
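
    For comparison, a sketch of the same classifier built on the torch.nn.RNN module (N_LETTERS and n_categories are assumed to come from the tutorial's utils; the hidden size of 128 matches the video):

    import torch
    import torch.nn as nn

    class NameRNN(nn.Module):
        def __init__(self, input_size, hidden_size, output_size):
            super().__init__()
            self.rnn = nn.RNN(input_size, hidden_size)  # tanh nonlinearity by default
            self.fc = nn.Linear(hidden_size, output_size)

        def forward(self, line_tensor):
            # line_tensor: (seq_len, 1, input_size). nn.RNN consumes the whole
            # sequence at once, so no Python loop over characters is needed.
            _, hidden = self.rnn(line_tensor)  # hidden: (1, 1, hidden_size)
            return self.fc(hidden[0])          # logits of shape (1, output_size)

    # Usage sketch:
    # model = NameRNN(N_LETTERS, 128, n_categories)
    # logits = model(line_tensor)
    # loss = nn.CrossEntropyLoss()(logits, category_tensor)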

  • @jacksonrolando403
    @jacksonrolando403 3 years ago +1

    Other than using a for loop over the data instead of vectorized multiplication, this was an extremely informative and well-made video. Thanks!

  • @mohitgupta5000
    @mohitgupta5000 10 months ago

    Great video! However, I have a doubt. Shouldn't you keep the trained weights for the hidden state at inference time instead of calling `init_hidden` in the `predict` function? Please clarify.

    • @mohitgupta5000
      @mohitgupta5000 10 months ago

      My bad, I got it. The weights are in fact the trained weights. It's just that the hidden state starts from zero for every new word, while the weights are shared across all `timesteps`. I am keeping the reply in case someone else makes the same mistake.

  • @donfeto7636
    @donfeto7636 A month ago

    Great content bro

  • @prabaldutta1935
    @prabaldutta1935 4 years ago

    @Python Engineer please keep posting more projects

    • @patloeber
      @patloeber  4 years ago

      Thanks! I definitely want to do more projects.

  • @mmk4732
    @mmk4732 2 years ago

    Hello, I'd appreciate your advice: I can't import all_letters from utils.

  • @gonzalopolo2612
    @gonzalopolo2612 A year ago

    In the RNN forward method, I think this is not standard. You are computing the output at t, O_t, as a function of the hidden state at t-1 and the input. As far as I have seen, the more common way is to compute h_t from h_{t-1} and x_t and then use this h_t to compute O_t (see the sketch below). Is this on purpose? Do you have any source to see and understand this implementation better? Thank you, and great tutorials!
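
    For reference, a sketch of the textbook (Elman) ordering the commenter describes; the weight names here are illustrative, not taken from the video's code:

    import torch

    def elman_step(x_t, h_prev, W_ih, W_hh, b_h, W_ho, b_o):
        # New hidden state first, from the current input and the previous state...
        h_t = torch.tanh(x_t @ W_ih.T + h_prev @ W_hh.T + b_h)
        # ...then the output is read off the *new* state.
        o_t = h_t @ W_ho.T + b_o
        return o_t, h_t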

  • @alteshaus3149
    @alteshaus3149 3 years ago +1

    Thank you very, very much for these tutorials. They are really helpful for getting further. Sorry for not being active on Patreon, I am still a student. Please keep going and show us more possibilities in deep learning. Maybe a series on NLP or time series forecasting?

  • @n000d13s
    @n000d13s 4 years ago +1

    I am glad I subscribed. Understanding the concept is one thing; actually being able to teach is totally another. 😊

    • @patloeber
      @patloeber  4 years ago

      Thanks! Glad you like it :)

  • @pallavisaxena7496
    @pallavisaxena7496 2 years ago

    Please create a series on NLP in detail

  • @amindhahri2542
    @amindhahri2542 A year ago

    I have a question: can't we use transformers for this problem? I'm stuck trying to use BERT.

  • @n000d13s
    @n000d13s 4 years ago +1

    I have a few questions if you don't mind. Does this work on untrained data? Also, how can I implement an RNN to reduce noise in an audio file? Thanks. ☺

  • @bhar92
    @bhar92 3 years ago

    I have some questions about the training procedure, if you don't mind!
    In the training loop, you give the model a random word from our dictionary 100000 times. My first question is: how do you know the model didn't repeat some words several times? For example, maybe it repeated many Arabic or Chinese names numerous times while never getting access to any Portuguese names. If that happened, wouldn't the low loss curve be misleading, since the model repeatedly saw the same words?
    As for my second question: wouldn't it be better to split the data into train-test-val sets based on some percentage of our choosing, to make sure all the languages are represented accordingly in all of the sets? Why did you choose this approach with random words? Is it common practice with NLP and RNNs?
    Thank you very much!

  • @mingzhouzhu4668
    @mingzhouzhu4668 3 years ago

    This helped me do my homework lol, thanks!!

  • @VishnuRadhakrishnaPillai
    @VishnuRadhakrishnaPillai 4 years ago

    Another awesome video. Thank you

  • @stellamn
    @stellamn 3 years ago

    From your example I don't understand whether you run the sequences batch-wise, e.g. 50 words in one batch: the 1st letter from all words as input, then the 2nd from all words, etc. That way one would save time. Or is that not possible? (See the sketch below.)
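
    Batching variable-length names is possible with padding and packing; a hedged sketch (assuming each name is a (seq_len, N_LETTERS) one-hot tensor as in the tutorial, with N_LETTERS = 57):

    import torch
    from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

    names = [torch.randn(5, 57), torch.randn(3, 57), torch.randn(8, 57)]
    lengths = torch.tensor([t.size(0) for t in names])

    padded = pad_sequence(names)  # shape (max_len, batch, 57)
    packed = pack_padded_sequence(padded, lengths, enforce_sorted=False)
    # `packed` can be fed to torch.nn.RNN to process the whole batch at once --
    # effectively "the 1st letter of all words, then the 2nd", as asked above.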

  • @sagnikrana7016
    @sagnikrana7016 3 years ago

    Top-notch video! Just a suggestion - the names of the containers category_lines and all_categories could have been more intuitive. Nonetheless, you explained it extremely well.

    • @patloeber
      @patloeber  3 years ago

      Thanks! Yes, you are right :)

  • @raminessalat9803
    @raminessalat9803 3 years ago

    The interesting thing is that we don't have any nonlinearity between time steps: the new input and previous hidden state are mapped linearly to the next hidden state, and it's still so powerful!

    • @patloeber
      @patloeber  3 years ago +1

      ReLU or tanh are applied for non-linearity

    • @raminessalat9803
      @raminessalat9803 3 years ago

      @@patloeber I meant that in this example you showed, you didn't have a nonlinearity, right?

  • @monishraju4789
    @monishraju4789 A year ago

    Hi sir, what are the plot steps and print steps? Can you explain?

  • @DarkF4lcon
    @DarkF4lcon 3 years ago

    Hi, where would I find this program's code?

  • @anantshukla6092
    @anantshukla6092 3 years ago +1

    I was entering names to check which country they belong to, and I found that if a name starts with a capital letter it is assigned a different country than when it starts with a lowercase letter. Is the RNN just memorizing the entire dataset and not generalizing (i.e., overfitting), or is it because the dataset has no names starting with a capital letter? Your help is appreciated. Thank you.

    • @patloeber
      @patloeber  3 years ago +1

      Maybe all names should be transformed to lowercase first. And yes, overfitting is very likely in this example.

  • @jjjw516
    @jjjw516 10 months ago

    Why is super(RNN, ...) used in the __init__ of RNN itself? Shouldn't it be super(nn.Module)? Please help me understand (see the sketch below).
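
    A short sketch answering this: the first argument of super() is the class you are defining, not the parent; Python looks the parent up from there. In Python 3 the zero-argument form is equivalent:

    import torch.nn as nn

    class RNN(nn.Module):
        def __init__(self):
            super(RNN, self).__init__()  # Python 2-style: (current class, instance)
            # super().__init__()         # Python 3 shorthand, does the same thing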

  • @billy-cg1qq
    @billy-cg1qq A year ago

    Is concatenating the input and the hidden state the only way to combine them, or are there other methods for producing the new hidden state and output? I saw a method with two weight tensors: the first is multiplied by the input and the second by the previous hidden state; the results are added and passed through the activation function to get the new hidden state, which is then multiplied by another weight tensor and passed through another activation function to get the output.

  • @tejamarneni
    @tejamarneni 4 years ago

    Best videos on PyTorch. Any plans on creating a TensorFlow playlist? @Python Engineer

    • @patloeber
      @patloeber  4 years ago

      Thanks! Yes, indeed. Have a look at my community tab; I just announced that I will make a TensorFlow course soon.

  • @muhammadkhattak4819
    @muhammadkhattak4819 4 years ago

    Thank you so much, sir. Please keep making videos on PyTorch. Thank you.

  • @pranilpatil4109
    @pranilpatil4109 6 months ago

    Is this a many-to-one RNN?

  • @gsom2000
    @gsom2000 4 years ago

    At the end where you were trying guesses, perhaps all of them are correct because the model has seen all those names already?

    • @patloeber
      @patloeber  4 years ago +1

      Good point. Although I draw random training samples, it is very likely that with the large number of iterations it has seen all the names already. A better approach would of course be to separate into training and testing sets, but evaluating the model was not the point of this tutorial.

    • @gsom2000
      @gsom2000 4 years ago

      @@patloeber Thanks for the reply. I'd like to ask: are you planning any tutorials about BERT models?

  • @chakra-ai
    @chakra-ai 4 years ago

    Hi, thanks for the video. It would be great if you could post a video on a TorchServe deployment of any of these models.

    • @patloeber
      @patloeber  4 years ago

      Thanks for the suggestion! Will add it to my list

  • @shaikrasool1316
    @shaikrasool1316 4 years ago

    Sir, can we use triplet loss for image classification if you have 2000 image classes, or is normal cross-entropy with softmax enough?

  • @yuriihalychanskyi8764
    @yuriihalychanskyi8764 4 years ago

    Love your videos

  • @genetixx01
    @genetixx01 3 years ago

    Hi, is it just me or do the English and French names get poor prediction performance? I tried playing a bit with the learning rate, but no luck.

  • @pezosanta
    @pezosanta 4 years ago

    Great tutorial, thanks! Could you please tell me which VS Code theme you use? It looks great!

    • @patloeber
      @patloeber  4 years ago +2

      Night Owl theme :)

    • @pezosanta
      @pezosanta 4 years ago

      @@patloeber thanks :)

  • @sof_ai
    @sof_ai 4 years ago

    Hello, thanks for your video. I have one question: why are you using this loop in the train function?
    On line 66:
    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)
    Here output is rewritten for each letter in line_tensor, yet the loss is computed using only the last output. If I am wrong, please correct me.

    • @patloeber
      @patloeber  4 years ago

      Because we look at each character one by one; only the final output, after the whole name has been read, goes into the loss (see the sketch below).
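
    To see which outputs matter: the intermediate outputs are overwritten and only the last one feeds the loss, which is exactly what a many-to-one classifier needs. If you wanted per-step outputs (a many-to-many task), you would collect them instead; a sketch assuming rnn and line_tensor as in the video:

    outputs = []
    hidden = rnn.init_hidden()
    for i in range(line_tensor.size(0)):
        output, hidden = rnn(line_tensor[i], hidden)
        outputs.append(output)  # keep every step's output
    # for name classification, only outputs[-1] goes into the loss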

  • @kitgary
    @kitgary 4 years ago

    Very good video! I have the next topic for you: reinforcement learning!

    • @patloeber
      @patloeber  4 years ago

      That’s a complex topic, but will definitely come sooner or later :)

  • @chandranbose5804
    @chandranbose5804 4 years ago

    @python_engineer
    Sir, last week I messaged in the Python intermediate course. The error below shows while executing the code; the developer who said it would run in Python 2.7 is unreachable. Can you help me?
    WindowsError: [Error 193] %1 is not a valid Win32 application

  • @Небудьбараном-к1м
    @Небудьбараном-к1м 4 years ago

    It seems you are not using batch learning? Could you explain why?

    • @patloeber
      @patloeber  4 years ago +1

      Only to keep it simple in this tutorial. Also the dataset was not very big...

  • @manjunathjayam4895
    @manjunathjayam4895 4 years ago

    Hey @pythonEngineer, I recently finished a Python course at my university, so now I'm feeling a little confident about Python. I watched this video but I'm not understanding most of the functions. What do you suggest? Is it enough if I watch your PyTorch tutorial?

    • @patloeber
      @patloeber  4 years ago +1

      Yes, you should watch my PyTorch beginner course first

    • @manjunathjayam4895
      @manjunathjayam4895 4 years ago

      @@patloeber got it.
      Thank you

  • @sarahjamal86
    @sarahjamal86 4 years ago

    Fantastic!

  • @swarnalathavura1334
    @swarnalathavura1334 4 years ago

    Hey, can you make a full Python course on freeCodeCamp? You only did the intermediate video, so could you please upload a full-course video?

  • @makannikmat5679
    @makannikmat5679 3 years ago

    Very helpful

  • @pra8495
    @pra8495 3 years ago

    Is there a PyTorch certification like TensorFlow's?

    • @patloeber
      @patloeber  3 years ago

      Good question! Not that I know of...

  • @sergiozavota7099
    @sergiozavota7099 4 years ago

    Thanks a lot for your videos :)
    I am trying to run the training and prediction on the GPU, but it seems I forgot to pass some tensor to the device.
    In particular, I am passing to the device:
    - combined = torch.cat((input_tensor, hidden_tensor), 1).to(device)
    - rnn = RNN(N_LETTERS, n_hidden, n_categories).to(device)
    but it shows the following error:
    All input tensors must be on the same device. Received cpu and cuda:0
    so I suppose I'm missing something, but I can't figure out what.

    • @patloeber
      @patloeber  4 years ago

      Yes, all your tensors AND the model must be on the same device, so somewhere you are missing the .to(device) call (see the sketch below).
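
    A likely culprit, sketched below: the tensor returned by init_hidden() is created with torch.zeros and therefore lives on the CPU unless it is moved too (RNN, N_LETTERS, n_hidden, n_categories, line_tensor, and category_tensor are assumed from the video's code). Once the inputs are moved, the extra .to(device) on `combined` inside forward is unnecessary.

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    rnn = RNN(N_LETTERS, n_hidden, n_categories).to(device)

    hidden = rnn.init_hidden().to(device)         # move the initial state too
    line_tensor = line_tensor.to(device)          # ...and every input tensor
    category_tensor = category_tensor.to(device)
    output, hidden = rnn(line_tensor[0], hidden)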

  • @razi_official
    @razi_official 4 years ago

    Very nice tutorial, sir. Could you make a model for signature verification?
    Please reply.

  • @chandradeepsingh.8661
    @chandradeepsingh.8661 4 years ago

    Can you make a chatbot AI in Python using an RNN?

    • @patloeber
      @patloeber  4 years ago

      I already have a chatbot tutorial with PyTorch on my channel, but yes probably I make a more advanced one in the future.

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 4 years ago

    Why is video classification many-to-many?

    • @patloeber
      @patloeber  4 years ago

      Only if you want to do a classification for each frame

  • @GoForwardPs34
    @GoForwardPs34 A year ago

    I got an error trying to run this:
    ImportError: cannot import name 'load_data' from 'utils'
    Does anyone know how to resolve this?
    @Gonzalo Polo
    @Patrick Loeber