LSTM Networks: Explained Step by Step!

  • Published 25 Oct 2024

COMMENTS • 48

  • @rajpulapakura001
    @rajpulapakura001 11 months ago +8

    At first I was inclined to click away from the video because of the unorthodox explanation of LSTM in "steps", which was different to what I had seen in other videos and blog posts which focus on the infamous LSTM diagram. However, I was struggling to fully grasp LSTMs so I decided to give the video a try. And it paid off! I can't believe LSTMs are that simple! This video is absolutely essential for understanding LSTMs at a fundamental level.

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +19

    the great thing about your videos is that I am always guaranteed to learn something and learn it with much better understanding.

  • @LifeKiT-i
    @LifeKiT-i 1 year ago +4

    I listened to a 2-hour lecture in my MSc data science course and still didn't know what was happening. Your video explains it in a succinct way!!! Thank you!!!

  • @juaneshberger9567
    @juaneshberger9567 1 year ago +8

    Your videos (specifically sampling and deep learning videos) helped me a lot during my master's. Thanks for all the videos!

  • @vzinko
    @vzinko 1 year ago +2

    Best explanation of LSTMs on the internet

  • @thankgoodnessitstheweekend2860
    @thankgoodnessitstheweekend2860 8 months ago

    Thank you so much for your videos! They are super informative and much more intuitive than the hundreds of slides I have from my master's class. Keep up the great work!

  • @chaitrab9253
    @chaitrab9253 1 year ago +1

    Please continue the same good work, blending mathematics with simple real-world examples. Fantastic explanation 👍

  • @karanmaniyar9086
    @karanmaniyar9086 1 year ago +3

    I've been trying to understand LSTMs through multiple blogs and videos, but the question of why they need to be this complex was never addressed. You specifically targeted that point of view, which is what makes this one of the best videos: you showed why there was a need for an LSTM and how it fills the gaps, and that made it very easy to understand. Could you please also list the references for the video, so that anyone who wants to go deeper into the concepts can? Thanks a lot for this video!

  • @charleskangai4618
    @charleskangai4618 1 year ago +1

    Extremely good and helpful! A great genuine desire to help learners by explaining difficult ideas in a most self-effacing manner! Many thanks!

  • @carlosenriquehuapayaavalos6297

    Thanks for the video!! Just what I needed for my ML midterm exam. Will be waiting for the Transformers topic, which I believe builds upon this concept.

  • @pushkarparanjpe
    @pushkarparanjpe 1 year ago

    This is an extremely good explanation. Thanks for all the effort and sharing!!

  • @rizkabritania2429
    @rizkabritania2429 8 months ago

    Super helpful. I can't thank you enough for making this explanation.

  • @DarkAtom04
    @DarkAtom04 1 year ago

    Amazing explanation!

  • @jianhuali4080
    @jianhuali4080 9 months ago

    great job on this topic explanation!

  • @golnoushghiasi7698
    @golnoushghiasi7698 1 year ago

    So happy you did this video!!! :D Thank you for all the great work!

  • @romainjouhameau2764
    @romainjouhameau2764 1 year ago

    Really easy to follow! Thanks a lot

  • @santiagolicea3814
    @santiagolicea3814 6 months ago

    I love your videos, keep up the awesome work!!!

  • @KarthikNaga329
    @KarthikNaga329 1 year ago +2

    Great video as always!
    The part that still perplexes me:
    How does the LSTM "know" what is important (like dog) and when to actually use that to predict the next word?

    • @Arjun----
      @Arjun---- 1 year ago

      It learns through lots of training, just like every other aspect of DL
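      As a rough sketch of what "learns through lots of training" means here: the gate weights are ordinary parameters updated by backpropagation, so nothing about "dog" is hand-coded. Below is a minimal PyTorch illustration of such a training loop (all sizes, the random data, and the 100-word vocabulary are made up for this example):

      ```python
      import torch
      import torch.nn as nn

      # Toy setup: batches of 10-step sequences of 16-dim word vectors,
      # predicting one of 100 "next word" candidates. All sizes are arbitrary.
      lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
      head = nn.Linear(32, 100)
      opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), lr=1e-3)

      x = torch.randn(8, 10, 16)         # fake input sequences
      y = torch.randint(0, 100, (8,))    # fake next-word targets

      for step in range(100):
          out, (h, c) = lstm(x)          # h[-1] is the final hidden state
          logits = head(h[-1])
          loss = nn.functional.cross_entropy(logits, y)
          opt.zero_grad()
          loss.backward()                # gradients flow into the gate weights...
          opt.step()                     # ...and the optimizer nudges them
      # Nothing tells the gates that "dog" matters; the forget/input gate weights
      # simply end up wherever they reduce the prediction loss.
      ```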

  • @billdepo1
    @billdepo1 1 year ago

    Just amazing intuition! Thanks so much for the great content.

    • @ritvikmath
      @ritvikmath  1 year ago

      Glad you enjoyed it!

    • @isha2567
      @isha2567 1 year ago

      Thanks! Could you also explain GRUs?

  • @juaneshberger9567
    @juaneshberger9567 1 year ago

    A video about transformers and GANs in this style would be awesome as well.

  • @nasgaroth1
    @nasgaroth1 1 year ago +1

    Awesome, like always - good job ;-)

  • @_Sam_-zh7sw
    @_Sam_-zh7sw 1 year ago +3

    Hi Ritvik,
    It would be really great if you could create videos explaining the maths behind ML models like SVM and PCA. I am also curious about ODEs, PDEs, real analysis, complex analysis, and stochastic calculus. The problem is that I want to explore topics relevant to financial engineering, so that I could read all the quant finance related textbooks. I am a professional and really don't have time to read all the applied maths textbooks 😅.

    • @gordongoodwin6279
      @gordongoodwin6279 11 months ago

      He has covered the ML models you listed in depth. It's illegal to cover ODEs and PDEs though, because nobody likes them.

  • @prateekcaire4193
    @prateekcaire4193 1 year ago

    Thank you very much! I have a few questions:
    1. Could you please explain the reasoning behind using a candidate cell state and why the tanh activation function is necessary?
    2. I have noticed that many implementations, papers, or blogs I have read use a concatenation of h[t-1] and x[t] with a single learnable weight matrix W, instead of the U and V used in this video. Can you clarify why this is the case?
    3. Despite the success of the model in predicting words, I remain somewhat skeptical about how it achieves such accuracy. :)

    • @Arjun----
      @Arjun---- 1 year ago

      1. The candidate cell state is simply the same weighted combination of the input and previous hidden state, just like all the other gates. However, the other gates pass through a sigmoid because we want to essentially binarize them. We want them to be ~0 or ~1, which is what a sigmoid does. This way, these gates function as True/False activations, letting data through with a 1 and stopping data with a 0. With the candidate cell state, we don't want a binarized output. Rather, we want all the information from the input and the hidden state. Applying tanh bounds the output between -1 and 1, which allows it to retain more information (basically we don't want 0s).
      To summarize, we use the candidate cell state to obtain the actual information from the input and prior hidden state. That is why tanh is applied. The candidate cell state is then multiplied by the input gate (which is approximately 0s and 1s), and that determines how much of its information is passed on to the actual cell state.
      2. The concatenation of the weight matrices is the same thing as what he showed; it's just a little more efficient to store all the numbers in a single matrix. Conceptually it's exactly the same.
      3. That's kind of how it is with all of deep learning lol
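      To make points 1 and 2 concrete, here is a minimal NumPy sketch of a single LSTM step written with separate U (input-to-gate) and V (hidden-to-gate) matrices as in the video, followed by a check that stacking them into one W and concatenating [x_t, h_{t-1}] gives the same pre-activation. The dimensions are toy values and biases are omitted for brevity:

      ```python
      import numpy as np

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      rng = np.random.default_rng(0)
      d_x, d_h = 4, 3                                 # toy input / hidden sizes

      # One U (input->gate) and V (hidden->gate) matrix per gate, as in the video.
      U = {g: rng.normal(size=(d_h, d_x)) for g in ("f", "i", "o", "c")}
      V = {g: rng.normal(size=(d_h, d_h)) for g in ("f", "i", "o", "c")}

      x_t = rng.normal(size=d_x)
      h_prev = rng.normal(size=d_h)
      c_prev = rng.normal(size=d_h)

      # Gates are squashed to (0, 1) so they act like soft on/off switches...
      f_t = sigmoid(U["f"] @ x_t + V["f"] @ h_prev)   # forget gate
      i_t = sigmoid(U["i"] @ x_t + V["i"] @ h_prev)   # input gate
      o_t = sigmoid(U["o"] @ x_t + V["o"] @ h_prev)   # output gate
      # ...while the candidate cell state keeps signed information in (-1, 1) via tanh.
      c_tilde = np.tanh(U["c"] @ x_t + V["c"] @ h_prev)

      c_t = f_t * c_prev + i_t * c_tilde              # input gate scales the candidate
      h_t = o_t * np.tanh(c_t)                        # output gate scales the exposed state

      # Point 2: stacking U and V side by side and feeding the concatenation
      # [x_t, h_prev] computes exactly the same pre-activation in one multiply.
      W_f = np.hstack([U["f"], V["f"]])               # shape (d_h, d_x + d_h)
      xh = np.concatenate([x_t, h_prev])
      assert np.allclose(W_f @ xh, U["f"] @ x_t + V["f"] @ h_prev)
      ```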

  • @teetanrobotics5363
    @teetanrobotics5363 1 year ago +1

    Hi Ritvik. It would be amazing if you could better organize the playlists (chronological order, and the right videos in the right playlists).

  • @davigiordano3288
    @davigiordano3288 9 months ago

    Thank you

  • @sophia17965
    @sophia17965 1 year ago

    Isn't h9 also affected by x9?

  • @pauledam2174
    @pauledam2174 1 year ago

    This is probably a not-so-great comment, but something seems "wrong" with the gradient descent method (and the chain rule) if it generates a vanishing gradient problem (or an exploding gradient problem). I mean, it isn't really "matching reality", because in reality we can speak about a character on page 500 who was introduced on page 1. We are forced to apply this "band-aid" of the LSTM because the basic method of gradient descent is not "good enough", or is in some way artificial. Does anyone agree with this? I am not sure what I mean by "matching reality".
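    For anyone who wants to see the mechanism behind that worry: in a plain RNN, the gradient reaching step 1 from step T is a product of T-1 per-step chain-rule factors, so if each factor sits below 1 the product shrinks geometrically. A toy scalar illustration (the recurrence and numbers are invented, but the effect is the standard vanishing-gradient argument that motivates the LSTM's additive cell state):

    ```python
    import numpy as np

    # Scalar "RNN": h_t = tanh(w * h_{t-1}). The gradient of h_T w.r.t. h_0 is
    # the product of the per-step derivatives w * (1 - tanh(...)**2).
    w, h = 0.9, 1.0
    grad = 1.0
    for t in range(500):                    # think: 500 "pages" back to page 1
        h = np.tanh(w * h)
        grad *= w * (1.0 - h ** 2)          # chain-rule factor for this step

    print(grad)  # astronomically small: the signal from step 0 is effectively gone
    ```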

  • @huntcookies5156
    @huntcookies5156 1 year ago +1

    what da dogg doin😄😄

  • @anantsingh75
    @anantsingh75 11 months ago

    Bro literally explained what the dog doing

  • @tamirfri1
    @tamirfri1 1 year ago +1

    who cares? we have transformers

    • @ritvikmath
      @ritvikmath  1 year ago +4

      It’s part of the series building up to more complex things

    • @Gowthamsrinivasan
      @Gowthamsrinivasan 1 year ago +1

      LSTMs are still surprisingly powerful for a lot of applications.