PYTORCH COMMON MISTAKES - How To Save Time 🕒

  • Published 20 Jul 2024
  • In this video I show you 10 common PyTorch mistakes, and by avoiding these you will save a lot of time on debugging models. This was inspired by a tweet by Andrej Karpathy, and that's why I said it was approved by him :)
    Andrej Karpathy Tweet:
    / 1013244313327681536
    ❤️ Support the channel ❤️
    / @aladdinpersson
    Paid Courses I recommend for learning (affiliate links, no extra cost for you):
    ⭐ Machine Learning Specialization bit.ly/3hjTBBt
    ⭐ Deep Learning Specialization bit.ly/3YcUkoI
    📘 MLOps Specialization bit.ly/3wibaWy
    📘 GAN Specialization bit.ly/3FmnZDl
    📘 NLP Specialization bit.ly/3GXoQuP
    ✨ Free Resources that are great:
    NLP: web.stanford.edu/class/cs224n/
    CV: cs231n.stanford.edu/
    Deployment: fullstackdeeplearning.com/
    FastAI: www.fast.ai/
    💻 My Deep Learning Setup and Recording Setup:
    www.amazon.com/shop/aladdinpe...
    GitHub Repository:
    github.com/aladdinpersson/Mac...
    ✅ One-Time Donations:
    Paypal: bit.ly/3buoRYH
    ▶️ You Can Connect with me on:
    Twitter - / aladdinpersson
    LinkedIn - / aladdin-persson-a95384153
    Github - github.com/aladdinpersson
    OUTLINE:
    0:00 - Introduction
    0:21 - 1. Didn't overfit batch
    2:45 - 2. Forgot toggle train/eval
    4:47 - 3. Forgot .zero_grad()
    6:15 - 4. Softmax when using CrossEntropy
    8:09 - 5. Bias term with BatchNorm
    9:54 - 6. Using view as permute
    12:10 - 7. Incorrect Data Augmentation
    14:19 - 8. Not Shuffling Data
    15:28 - 9. Not Normalizing Data
    17:28 - 10. Not Clipping Gradients
    18:40 - Which ones did I miss?

COMMENTS • 108

  • @AladdinPersson · 4 years ago · +25

    Here is the outline for the video (the same as in the description above); let me know which ones you think I missed.

    • @seanbenhur · 3 years ago

      Shape mismatch error!

    • @caoviethainam9363 · 3 years ago · +2

      Save the model so you don't have to rerun the whole shit.

    • @eshtaranyal3011 · 2 years ago

      CUDA OOM error

    • @vijayak7308 · 2 years ago

      I'm getting this error: RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same. I have assigned the input and model to 'cuda'. Could you shed some light on this?
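That error means the input is on the GPU while (some of) the layer weights are still on the CPU. Two frequent causes, offered as guesses rather than a diagnosis of the code above, are worth checking. First, `Tensor.to` returns a new tensor and must be reassigned, while `nn.Module.to` moves the parameters in place. Second, layers kept in a plain Python list are not registered as submodules, so `model.to(device)` never moves them; `nn.ModuleList` fixes that. Below is a minimal sketch; the `Net` class is a hypothetical toy model:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class Net(nn.Module):  # hypothetical toy model
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers the layers, so model.to(device) moves them.
        # A plain Python list [nn.Conv2d(...), ...] would NOT be moved.
        self.convs = nn.ModuleList([nn.Conv2d(1, 8, 3), nn.Conv2d(8, 8, 3)])

    def forward(self, x):
        for conv in self.convs:
            x = torch.relu(conv(x))
        return x

model = Net().to(device)  # Module.to moves all registered parameters
x = torch.randn(4, 1, 28, 28)
x = x.to(device)          # Tensor.to returns a NEW tensor, so reassign it

out = model(x)
print(out.shape, out.device)
```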

  • @Hoxle-87 · 3 years ago · +8

    So much good info here. I've been doing ML for 5 years and it is always good to review the basics every now and then.

  • @sagnikroy6405 · 2 years ago · +2

    This channel doesn't provide the basic tutorials that are already in the documentation, and that's why it's so awesome. Thanks for your genuine content :D

  • @igordemidion9912 · 4 years ago · +17

    Common mistakes for me:
    Getting confused with tensor dimensions (as a new guy you can spend plenty of time before harnessing the power of unsqueeze())
    Forgetting .cuda() or .to(device)
    Getting confused with convnet dimensions after a conv layer is applied
    Not attempting to balance (or deliberately unbalance) the dataset, which can be useful
    etc.
    Love your videos man, they've helped me a lot.
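Two quick checks help with the dimension confusion described above. `unsqueeze(0)` adds a size-1 batch axis to a single image. The spatial size after a convolution follows out = floor((in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1, the formula given in the `nn.Conv2d` documentation. The helper `conv_out_size` below is a hypothetical name, and the layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

img = torch.randn(3, 32, 32)  # a single image: (C, H, W)
batch = img.unsqueeze(0)      # add a batch axis: (1, C, H, W)

conv = nn.Conv2d(3, 16, kernel_size=5, stride=2, padding=1)
out = conv(batch)

expected = conv_out_size(32, kernel_size=5, stride=2, padding=1)
print(batch.shape, out.shape, expected)  # out has shape (1, 16, expected, expected)
```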

    • @AladdinPersson · 4 years ago · +2

      Those are some great things to keep in mind! Thank you, I appreciate you taking the time to comment

  • @FaisalAES · 4 years ago · +3

    I honestly didn’t expect this video to be this professional and informative judging by the thumbnail and title

  • @siyuancheng9575 · 2 years ago

    Many many many thanks for your video! The content is pure gold for newbie PyTorch users, and such a great guide!

  • @moorchini · 3 years ago

    That is a perfect video, really thankful.
    Would you please tell me the best way to get the accuracy of multiclass classification?
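Here is one common way to compute multiclass accuracy. The sketch assumes the model outputs raw logits of shape (N, num_classes) and the loader yields (inputs, integer labels). It takes the argmax over the class dimension and counts matches. `multiclass_accuracy` is a hypothetical helper name, and the random data only exercises the code:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

@torch.no_grad()  # no gradients needed for evaluation
def multiclass_accuracy(model, loader, device="cpu"):
    was_training = model.training
    model.eval()  # e.g. disable dropout, use BatchNorm running stats
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        preds = model(x).argmax(dim=1)  # index of the largest logit per sample
        correct += (preds == y).sum().item()
        total += y.size(0)
    model.train(was_training)  # restore the previous mode
    return correct / total

# Toy usage with random data (accuracy will be around chance level)
torch.manual_seed(0)
data = TensorDataset(torch.randn(100, 20), torch.randint(0, 5, (100,)))
model = nn.Linear(20, 5)
acc = multiclass_accuracy(model, DataLoader(data, batch_size=32))
print(f"accuracy: {acc:.2%}")
```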

  • @gomeincraft · 4 years ago · +3

    Great thanks from Russia. Really love your videos. In a very short time I picked up the PyTorch essentials thanks to you. So many models have been understood and implemented with your help. Keep it going buddy!!!!

    • @AladdinPersson · 4 years ago · +1

      To hear that makes me very happy, thank you :)

  • @tarakapoor5085 · 3 years ago

    Thank you for this super helpful video! Do you have to do transformations and normalize your data (if it is images), or can you just feed in the pixel array without transformation/normalization?

  • @user-or7ji5hv8y · 3 years ago · +2

    These practical tips are really useful.

  • @saminchowdhury7995 · 3 years ago

    I made all these mistakes when I was a newbie at PyTorch, and I still make some of them now sometimes.
    This is a very helpful video

  • @sulavojha8322 · 4 years ago · +2

    Extremely informative as always.
    Thank you !

  • @hadjdaoudmomo9534 · 3 years ago

    Extremely helpful! thanks a lot!!

  • @ogito999 · 3 years ago · +1

    The batch norm and bias evaluation difference is probably due to the randomness inherent in initializing 2 sets of biases instead of just 1.

  • @emanuelhuber4312 · 2 years ago

    The first tip just led me to the solution. Thanks!

  • @Ip_man22 · 3 years ago

    Really helpful! Thank you so much.

  • @floriansommer1094 · 3 years ago

    Thank you so much, you really saved me a lot of time :)

  • @lakeguy65616 · 3 years ago

    great video, very informative! thank you!

  • @MrCmon113 · 9 months ago

    Some of my favourites are breaking the computational graph (e.g. using NumPy functions instead of PyTorch ones) or backpropagating somewhere you shouldn't.
    Or getting dimensionalities wrong and getting screwed over by NumPy's automatic broadcasting.
    Or in general not looking for existing PyTorch functions and reinventing the wheel over and over again.
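PyTorch follows the same broadcasting rules as NumPy, and the classic trap shows up in hand-written losses. Predictions of shape (N, 1) minus targets of shape (N,) silently broadcast to (N, N). `nn.MSELoss` at least emits a UserWarning about mismatched sizes, but a manual loss like the one below does not. The numbers here are toy values:

```python
import torch

preds = torch.tensor([[1.0], [2.0], [3.0]])  # (N, 1), e.g. output of nn.Linear(..., 1)
targets = torch.tensor([1.0, 2.0, 3.0])      # (N,)

diff_wrong = preds - targets                 # broadcasts to (N, N)!
loss_wrong = (diff_wrong ** 2).mean()        # 12 / 9, about 1.333, despite perfect predictions

diff_right = preds.squeeze(1) - targets      # (N,) - (N,) -> (N,)
loss_right = (diff_right ** 2).mean()        # 0.0

print(diff_wrong.shape, loss_wrong.item())
print(diff_right.shape, loss_right.item())
```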

  • @monisprabu1174 · 3 years ago

    Great work bro, do more PyTorch vids, keep it up!!!!!

  • @mihneaandreescu9922 · 2 years ago · +1

    AMAZING VIDEO, THANKS VERY VERY MUCH!!!

  • @pawnagon4874 · 3 years ago

    These videos are always so fire, thank you sir

  • @ceo-s · 10 months ago

    You are the best! Just fixed a few things

  • @lau.m.7698 · 3 years ago

    Hi! Love your channel! I have a question: what if the data that you want to normalize is not an image but a vector (a sequence of numbers)? What do you think would be the best type of normalization? I've thought about min-max normalization, which also puts the data into the [0, 1] range, but would it be necessary to normalize with respect to the mean and std?
    Thanks!!!

    • @bassemkaroui4914 · 2 years ago · +1

      Always normalize using mean and std, because if your inputs are always positive, then for a given sample the gradients of all weights feeding a neuron in the first layer will be either all positive or all negative (depending on the sign of the gradient flowing back into that neuron). This essentially means those weights are all updated in the same direction (all increase or all decrease), which forces the optimizer into zigzag paths and makes training trickier.

  • @bassemkaroui4914 · 2 years ago · +1

    Always normalize inputs using mean and std, because if your inputs are always positive, then for a given sample the gradients of all weights feeding a neuron in the first layer will be either all positive or all negative (depending on the sign of the gradient flowing back into that neuron). This essentially means those weights are all updated in the same direction (all increase or all decrease), which forces the optimizer into zigzag paths and makes training trickier.

    • @vanerk_ · 11 months ago · +2

      Can you please elaborate on why they would all move in the positive or the negative direction? Imagine that your inputs are positive, you pass them through a Linear layer, then apply BatchNorm, then Linear again, then CrossEntropy. It is not obvious why the gradients would all share a sign when the chain rule can change the signs of some directional derivatives.
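This small experiment may help answer the question. For one sample, the gradient with respect to weight W[j, i] of the first linear layer is x[i] * dL/dz[j], where z = W x + b. The chain rule through later layers (BatchNorm included) only changes the shared factor dL/dz[j]. So if every x[i] is positive, all weights feeding output unit j get gradients of the same sign. Different units can still move in different directions, and summing over a batch can mix signs, so the effect holds per unit and per sample. The toy network below has arbitrary sizes, seed and label. It leaves out BatchNorm because BatchNorm needs more than one sample in training mode:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
first = nn.Linear(5, 4)
net = nn.Sequential(first, nn.Tanh(), nn.Linear(4, 3))

x = torch.rand(1, 5) + 0.1  # one sample with strictly positive inputs
loss = nn.CrossEntropyLoss()(net(x), torch.tensor([2]))
loss.backward()

g = first.weight.grad       # shape (4, 5): row j holds dL/dz[j] * x
signs = torch.sign(g)
same_sign_per_row = (signs == signs[:, :1]).all(dim=1)
print(signs)
print(same_sign_per_row)    # every row is uniformly +1 or uniformly -1
```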

  • @shaikrasool1316 · 3 years ago · +1

    One word I can say: the best.
    Thank you so much 😀

  • @sheldonsebastian7232 · 3 years ago

    This video is pure gold

  • @MorisonMs · 3 years ago · +1

    1:52
    Low loss doesn't mean overfitting (I agree it's a good idea to run on a small dataset first, don't get me wrong)

  • @martimchaves9734 · 3 years ago

    Nice one mate

  • @vanerk_ · 11 months ago

    Applying different augmentations to the same batch, for example when training GANs and applying random flip.

  • @mohdkashif7295 · 3 years ago

    Can I use torch.clamp for clipping gradients instead of torch.nn.utils.clip_grad_norm?
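On the question above: `torch.nn.utils.clip_grad_value_` does element-wise clamping, which is essentially clamping every gradient entry. `torch.nn.utils.clip_grad_norm_` instead rescales all gradients together so that their total norm is at most `max_norm`, which preserves the gradient direction. (`clip_grad_norm` without the trailing underscore is the deprecated alias.) Either one is called between `loss.backward()` and `optimizer.step()`. Here is a sketch with a toy model and deliberately large inputs:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(32, 10) * 100, torch.randn(32, 1)  # large inputs -> large gradients
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Option 1: rescale all gradients jointly so their global L2 norm is <= max_norm.
# Returns the total norm measured BEFORE clipping.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# Option 2 (alternative): clamp each gradient element to [-clip_value, clip_value]
# torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)

clipped_norm = torch.norm(torch.stack([p.grad.norm() for p in model.parameters()]))
print(f"norm before: {total_norm:.1f}, after: {clipped_norm:.4f}")
optimizer.step()
```

Value clipping can change the direction of the update, which is one reason norm clipping is often preferred.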

  • @xl0xl0xl0 · 1 year ago

    My fun mistake - added a ReLU in the last layer (before CrossEntropyLoss) - the model trains poorly for a while, then just stops training (once all logits have been driven below zero).

  • @Han-ve8uh · 2 years ago

    Could you clarify at 7:03 how softmax on softmax leads to vanishing gradients?
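Here is a numerical illustration of the point asked about above. `nn.CrossEntropyLoss` already applies log-softmax internally, so applying `softmax` first squashes its inputs into [0, 1]. The second softmax can then never become confident. Also, when the first softmax saturates, its Jacobian is nearly zero, so a confidently wrong prediction receives almost no gradient. The logits below are arbitrary example values:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CrossEntropyLoss()  # applies log_softmax internally
target = torch.tensor([0])

# A confidently WRONG prediction: class 1 has the largest logit, true class is 0
logits = torch.tensor([[0.0, 10.0, 0.0]], requires_grad=True)

# Correct usage: pass raw logits
loss_ok = criterion(logits, target)
loss_ok.backward()
grad_ok = logits.grad.norm().item()

# Mistake: softmax before CrossEntropyLoss (softmax effectively applied twice)
logits.grad = None
loss_bad = criterion(F.softmax(logits, dim=1), target)
loss_bad.backward()
grad_bad = logits.grad.norm().item()

print(f"raw logits:   loss={loss_ok.item():.3f}, grad norm={grad_ok:.2e}")
print(f"softmax + CE: loss={loss_bad.item():.3f}, grad norm={grad_bad:.2e}")
```

With raw logits, the wrong prediction gets a large loss and a gradient of norm around 1. With the extra softmax, the loss is capped at a small value and the gradient is several orders of magnitude smaller.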

  • @ShakirKhan-th7se · 2 years ago

    What is the best way to get input from different folders with different numbers of images each?

  • @ashishbhatnagar8682 · 4 years ago

    Very helpful tips. Thanks a lot.

  • @jim_79 · 1 year ago

    Very useful tips for a novice like me
    Thank you

  • @neelabhmadan6820 · 4 years ago

    Spot on. 🙌

  • @ngawangchoeda3551 · 2 years ago

    Do we really have to normalize the data in the initial data transformation if we use a BatchNorm2d layer in our model architecture, since both would perform the same task?

  • @jonatan01i · 3 years ago

    9:23
    I think it is the gradient of the two layers' biases that are equal. If so, isn't having a bias in the conv layer and another one in the batch layer equivalent to having bias only in one of them but multiplying its gradient by 2?

    • @AladdinPersson · 3 years ago · +1

      From my understanding if we're first running it through a bias (and let's say every node activation gets raised +1) then running it through BatchNorm is going to remove this regardless and therefore it was completely irrelevant of having the bias. So I guess it's not a big deal but it's just an irrelevant parameter. I follow your point that the gradients are equal for the two layers but I don't follow when you say multiplying the gradient by 2
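The reply above can be checked directly. In training mode, BatchNorm subtracts the per-channel batch mean, so any constant per-channel bias added by the preceding convolution is removed, up to floating-point error. Here is a sketch with arbitrary layer sizes and arbitrary bias values:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3, 16, 16)

conv_no_bias = nn.Conv2d(3, 4, kernel_size=3, padding=1, bias=False)
conv_bias = nn.Conv2d(3, 4, kernel_size=3, padding=1, bias=True)
with torch.no_grad():
    conv_bias.weight.copy_(conv_no_bias.weight)               # identical weights
    conv_bias.bias.copy_(torch.tensor([5.0, -3.0, 0.5, 10.0]))  # per-channel shifts

bn = nn.BatchNorm2d(4)  # training mode: normalizes with the batch statistics
out_no_bias = bn(conv_no_bias(x))
out_bias = bn(conv_bias(x))

print(torch.allclose(out_no_bias, out_bias, atol=1e-4))  # the conv bias made no difference
```

This is why convolutions followed by BatchNorm are commonly created with `bias=False`. The BatchNorm's own `beta` already provides a learnable shift.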

    • @jonatan01i · 3 years ago

      @@AladdinPersson Oh, you are right! When I wrote the comment I was not aware that the bias of the convolutional layer is first removed by the batchnorm layer, and only after normalizing do we add the bias of the BN layer. For some reason I thought that we add the BN's bias right after we've added the conv's bias. In that case the gradients of the two bias terms would be the same, which is where the factor of 2 came from. But I was completely wrong about how batchnorm works, so my point doesn't hold.
      Then, the difference in the loss after training with and without the conv's bias term could be due to numerical reasons, couldn't it?

  • @ali_nawaz_khattak · 2 years ago

    Can I do it for a custom dataset?
    If yes, can you share code snippets to help?

  • @sahil-7473 · 3 years ago

    Great vid! One more question.
    What's the exact difference between torch.nn.Conv1d and torch.nn.functional.conv1d? Both seem to be equally present. That's confusing me😅

    • @AladdinPersson · 3 years ago · +1

      For nn.modules you need to initialize them in the init function, for functional they are "stateless" and you need to manually set the weights. Basically functional has things without parameters/weights (and you would need to set weights manually). You can read more on the forum: discuss.pytorch.org/t/what-is-the-difference-between-torch-nn-and-torch-nn-functional/33597/6
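The module and the functional form compute the same operation. The difference is who owns the parameters. `nn.Conv1d` creates and stores its own `weight` and `bias`. `F.conv1d` is stateless, so you pass the tensors in yourself. A small check with arbitrary sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 3, 10)  # (batch, channels, length)

conv = nn.Conv1d(in_channels=3, out_channels=5, kernel_size=3)  # owns weight & bias
out_module = conv(x)

# Functional version: stateless, so the parameters are passed explicitly
out_functional = F.conv1d(x, conv.weight, conv.bias)

print(out_module.shape)                            # torch.Size([2, 5, 8])
print(torch.allclose(out_module, out_functional))  # True
```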

  • @ali_nawaz_khattak · 2 years ago

    @Aladdin Persson how do you do the same thing using Keras?

  • @henrikvendelbo1117 · 3 years ago

    Which of these are taken care of in lightning trainer?

  • @736939 · 2 years ago

    When I deploy the model, should I also use model.eval()?
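For inference you would typically call `model.eval()`, so that layers such as Dropout and BatchNorm switch to their inference behavior. You would also wrap the forward pass in `torch.no_grad()` (or `torch.inference_mode()`) to skip gradient tracking. Here is a sketch with a toy model containing Dropout; the sizes are arbitrary:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(16, 2))
x = torch.randn(4, 16)

model.train()            # training mode: dropout randomly zeroes activations
a, b = model(x), model(x)
print(torch.equal(a, b)) # different dropout masks, so the outputs differ

model.eval()             # inference mode: dropout becomes a no-op
with torch.no_grad():    # also skip building the autograd graph
    c, d = model(x), model(x)
print(torch.equal(c, d)) # deterministic outputs
```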

  • @coolz4ravs457 · 1 year ago

    How do you pad the MNIST dataset by 2?

  • @WhisperingAZ · 4 years ago

    What a great video from you!!!!! !!!! You're awesome!!!!!!

  • @georgepap9510 · 3 months ago

    I think you have an error in the check_accuracy function. You need to pass the scores (since they are just the logits from a Linear layer) through a softmax first and then calculate the argmax. Am I missing something?
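On the question above: within each row, softmax preserves order, because it exponentiates and then divides by the same positive sum. So `argmax` over raw logits and `argmax` over `softmax(logits)` pick the same class, and softmax is only needed when you want actual probabilities. A quick check on random logits:

```python
import torch

torch.manual_seed(0)
logits = torch.randn(100, 10)  # e.g. raw outputs of a final nn.Linear layer

pred_from_logits = logits.argmax(dim=1)
pred_from_probs = torch.softmax(logits, dim=1).argmax(dim=1)

print(torch.equal(pred_from_logits, pred_from_probs))  # True
```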

  • @saurrav3801 · 4 years ago

    Nice video bro....😇😇
    More PyTorch videos please... about layers, activation functions, optimizers, etc.
    I don't really know where to use which layer and activation function...
    1. How do I find the mean and std of RGB images?
    2. Is it possible to use BatchNorm1d with a linear layer?

    • @AladdinPersson · 4 years ago

      Thank you for your comment!
      1. I made a video on it just now :)
      2. I think so, but I haven't used this
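On question 2: `nn.BatchNorm1d` does accept the (N, features) output of an `nn.Linear` layer, and it normalizes each feature over the batch. A minimal sketch with arbitrary sizes:

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Linear(20, 64, bias=False),  # the bias is redundant before BatchNorm (mistake 5)
    nn.BatchNorm1d(64),             # expects input of shape (N, 64) or (N, 64, L)
    nn.ReLU(),
)

x = torch.randn(32, 20)
out = block(x)
print(out.shape)  # torch.Size([32, 64])

# Before the ReLU, each feature has (approximately) zero mean over the batch
h = block[1](block[0](x))
print(h.mean(dim=0).abs().max())
```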

  • @wolfisraging · 3 years ago · +1

    I don't think we need to shuffle the validation or test set right? Cuz there we will only be making the predictions and calculating our metrics like loss and accuracy, which are totally unaffected whether you shuffle or not.
    Plz do correct me if I'm wrong, thanks.

    • @AladdinPersson · 3 years ago · +1

      You're absolutely right, there's no need to shuffle the test set, so that was a mistake on my part. Good catch! :)

  • @doyu_ · 2 years ago

    Should we leave shuffle=False for test_loader, since the order of the test data basically doesn't affect the test result, even for non-time-series data?

    • @user-yd1bj3hn8d · 1 year ago

      Yes, it is unnecessary to shuffle when you test your model, so set shuffle=False when testing
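In code, that usually means setting `shuffle=True` only for the training loader. Here is a sketch with hypothetical random `TensorDataset`s standing in for real data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

train_ds = TensorDataset(torch.randn(100, 8), torch.randint(0, 3, (100,)))
test_ds = TensorDataset(torch.randn(40, 8), torch.randint(0, 3, (40,)))

# Shuffle the training set so each epoch sees the batches in a new order
train_loader = DataLoader(train_ds, batch_size=16, shuffle=True)
# Evaluation metrics don't depend on order, so keep the test set deterministic
test_loader = DataLoader(test_ds, batch_size=16, shuffle=False)

first_test_x, _ = next(iter(test_loader))
print(torch.equal(first_test_x, test_ds.tensors[0][:16]))  # True: original order kept
```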

  • @vslaykovsky · 2 years ago

    So what is the point of normalizing val/test data?

  • @_RMSG_ · 2 years ago

    I'm not understanding how Normalizing data doesn't hurt accuracy
    With time series data, if I take in a set of numbers, and then normalize those numbers, don't I get a normalized output instead of an accurate output?
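On the time-series question above: the model works in the normalized scale, but its outputs can be mapped back with the inverse transform. That transform uses statistics computed on the training data only, and the same statistics are applied to validation and test data. Here is a sketch with made-up numbers standing in for a series:

```python
import torch

train_series = torch.tensor([100.0, 102.0, 101.0, 105.0, 110.0, 108.0])
test_series = torch.tensor([111.0, 115.0])

# Statistics from the TRAINING data only, reused for validation/test data
mean, std = train_series.mean(), train_series.std()

def normalize(x):
    return (x - mean) / std

def denormalize(x_norm):
    return x_norm * std + mean

test_norm = normalize(test_series)  # what the model sees (and predicts)
recovered = denormalize(test_norm)  # back in the original units
print(test_norm, recovered)
```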

  • @DanielWeikert · 4 years ago

    In TensorFlow we often only divide by 255 to normalize. Would that be possible in PyTorch as well? (It would probably save time, since we wouldn't have to figure out the mean and std.) Thanks

    • @AladdinPersson · 4 years ago · +4

      Just doing ToTensor() will divide by 255 so the values are in the range [0, 1], but it generally works better if you also take the additional step of standardizing to zero mean and unit standard deviation, so the values are centered around 0 (mostly near [-1, 1], though not strictly inside it).

    • @DanielWeikert · 4 years ago

      @@AladdinPersson Thanks, and how do you determine the mean and std? Is it like
      torch.mean(mydataset)?

  • @SaiPrabanjan · 3 years ago

    Can you explain sir how to solve "CUDA out of memory" error in fastai package in pytorch. I am a beginner in fastai package and pytorch in general. Thanks for your great content sir.

    • @AladdinPersson · 3 years ago

      Most commonly it's because you don't have enough VRAM on your GPU, i.e. you're using too large a batch_size or too large a model.

  • @1chimaruGin0_0 · 4 years ago · +4

    Great work, as always!
    I used this to get the mean and std for normalizing images. I want to know: is this good?
    import torch
    loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)
    def mean_and_std(loader):
        mean = 0.
        std = 0.
        nb_samples = 0.
        for data, _ in loader:
            batch_samples = data.size(0)
            # flatten the spatial dims: (N, C, H, W) -> (N, C, H*W)
            data = data.view(batch_samples, data.size(1), -1)
            mean += data.mean(2).sum(0)  # sum of per-image channel means
            std += data.std(2).sum(0)    # sum of per-image channel stds
            nb_samples += batch_samples
        mean /= nb_samples
        std /= nb_samples
        return mean, std
    mean, std = mean_and_std(loader)
    print(mean)
    print(std)

    • @AladdinPersson · 4 years ago · +4

      The way you're calculating mean seems good but I think with the standard deviation there's a bit of a mistake. Since standard deviation isn't a linear operation you cannot do std += (batch_std_here).
      Doing in this way you will not obtain the real standard deviation. I made a video on it to show how you would do it which you can check out. Although your way will probably work just fine even with a minor flaw with the std :)
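Following the reply above, one way to get the exact per-channel standard deviation is to accumulate sums and sums of squares over all pixels, then use Var[X] = E[X^2] - (E[X])^2. The sketch assumes a loader that yields `(images, labels)` with images of shape (N, C, H, W). `dataset_mean_std` is a hypothetical helper, and the random dataset stands in for a real one:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def dataset_mean_std(loader):
    channel_sum, channel_sq_sum, num_pixels = 0.0, 0.0, 0
    for images, _ in loader:
        # Sum over batch, height and width -> one value per channel
        channel_sum += images.sum(dim=(0, 2, 3))
        channel_sq_sum += (images ** 2).sum(dim=(0, 2, 3))
        num_pixels += images.size(0) * images.size(2) * images.size(3)
    mean = channel_sum / num_pixels
    # Population std over all pixels: sqrt(E[X^2] - E[X]^2)
    std = (channel_sq_sum / num_pixels - mean ** 2).sqrt()
    return mean, std

torch.manual_seed(0)
images = torch.rand(50, 3, 8, 8)
loader = DataLoader(TensorDataset(images, torch.zeros(50)), batch_size=16)
mean, std = dataset_mean_std(loader)
print(mean, std)
```

For very large datasets, accumulating in float64 (e.g. `images.double()`) reduces rounding error in the sum of squares.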

    • @1chimaruGin0_0 · 4 years ago

      Thank you

  • @Dan-uf2vh · 3 years ago

    Can someone explain to me how I should manage Dropout layers, considering I am using batches of state - action - reward?
    I don't understand how just setting model.train() would work out. In my view, the Dropout layers would have to drop the same way when performing backpropagation on the batch. Am I wrong? Is there something I am missing, and how could I synchronize them, if that is required and possible? Or do they just average out somehow?

    • @TeachAI-UZ · 3 years ago

      You are right that dropout randomly drops particular neurons in a layer, with a probability chosen by the engineer. However, you do not perform backpropagation when validating (testing) your model. Specifically, model.eval() switches your model into testing mode, in which dropout is disabled (gradient tracking itself is turned off separately, with torch.no_grad()).

    • @Dan-uf2vh · 3 years ago

      @@TeachAI-UZ there are training methods which REQUIRE a history in order to organize outputs and rewards and I would assume they need the exact form being used; for example q-learning with epsilon-greedy; if everything changes between the saved input-output-reward state and the moment you do the backpropagation, then I see no way that could work out
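On the Dropout question: within a single forward/backward pass, autograd reuses the mask that was sampled in the forward pass. Backpropagation therefore flows through exactly the units that were kept, and a new mask is only drawn on the next forward call. If an algorithm needs the same mask across separate forward passes (for example, recomputing outputs for stored transitions later), that has to be handled explicitly, and the sketch below does not cover that case. Quick check with an arbitrary tensor size:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)  # modules start in training mode
x = torch.randn(1000, requires_grad=True)

y = dropout(x)      # kept elements are scaled by 1 / (1 - p) = 2
y.sum().backward()

dropped = y == 0
# The gradient is 0 exactly where the forward pass dropped an element, and 2 elsewhere
print(torch.all(x.grad[dropped] == 0).item(), torch.all(x.grad[~dropped] == 2).item())
```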

  • @MorisonMs · 3 years ago · +2

    Why do you shuffle the test dataset?

  • @not_a_human_being · 4 years ago · +1

    `seed=0` and not usual "42" - finally non-total-nerds are getting into the field! 😅Great Vid btw, please keep making more!

    • @AladdinPersson · 4 years ago · +4

      Real Computer Scientists already know the answer to life, the universe and everything. That's trivial so we just use start index as 0. Mathematicians are a bit behind so they use 1.

  • @boquangdong · 2 years ago

    Number 5, bias term with BatchNorm: could you explain it in more detail in this comment?

  • @homataha5626 · 2 years ago

    why did we need a DecoderBlock and a Decoder class? why no block for encoder?

  • @deepshankarjha5344 · 3 years ago

    Aladdin is the best. I was never able to understand PyTorch until this video series.

  • @talha_anwar · 3 years ago

    The mistake I made was not putting my model creation in a function when doing cross-validation, so in each fold it retrained the model from the previous fold.
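Here is a sketch of the fix described above: build a fresh model and a fresh optimizer inside the fold loop, so that no fold starts from weights trained on another fold. `make_model`, the toy data and the contiguous fold split are hypothetical placeholders:

```python
import torch
import torch.nn as nn

def make_model():
    # Hypothetical factory: returns a freshly initialized model on every call
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

torch.manual_seed(0)
X, y = torch.randn(60, 4), torch.randn(60, 1)
folds = torch.arange(60).chunk(3)  # 3 contiguous folds, for illustration only

fold_models = []
for k, val_idx in enumerate(folds):
    train_mask = torch.ones(60, dtype=torch.bool)
    train_mask[val_idx] = False

    model = make_model()  # NEW model per fold (the bug is creating it once, outside the loop)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # fresh optimizer state too

    for _ in range(50):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(X[train_mask]), y[train_mask])
        loss.backward()
        optimizer.step()

    with torch.no_grad():
        val_loss = nn.functional.mse_loss(model(X[val_idx]), y[val_idx]).item()
    print(f"fold {k}: val loss {val_loss:.3f}")
    fold_models.append(model)
```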

  • @diegowang9597 · 3 years ago

    Is the intro made with manim?

  • @dataaholic · 4 years ago

    At 16:06
    Why do we pass the mean and std_dev as tuples? I'm new to deep learning, and today I trained a CNN on MNIST. After watching your video I changed them to tuples and got better accuracy after training. Can you please tell me why this happens? Thanks in advance, and sorry for pasting the logs here in the comments.
    --> with: transforms.Normalize(mean_gray, stddev_gray)
    Epoch: 1/10, Train(loss, accuracy): 1.058, 64.758, Test(loss, accuracy): 0.128, 96.480
    Epoch: 2/10, Train(loss, accuracy): 0.349, 88.157, Test(loss, accuracy): 0.063, 98.310
    Epoch: 3/10, Train(loss, accuracy): 0.160, 95.342, Test(loss, accuracy): 0.049, 98.410
    Epoch: 4/10, Train(loss, accuracy): 0.117, 96.635, Test(loss, accuracy): 0.046, 98.570
    Epoch: 5/10, Train(loss, accuracy): 0.094, 97.348, Test(loss, accuracy): 0.041, 98.660
    ----------------------------------------------------------------------------------------------------------------------------
    --> with: transforms.Normalize((mean_gray, ), (stddev_gray,))
    Epoch: 1/10, Train(loss, accuracy): 0.439, 89.382, Test(loss, accuracy): 0.056, 98.290
    Epoch: 2/10, Train(loss, accuracy): 0.120, 96.565, Test(loss, accuracy): 0.041, 98.780
    Epoch: 3/10, Train(loss, accuracy): 0.087, 97.508, Test(loss, accuracy): 0.036, 98.840
    Epoch: 4/10, Train(loss, accuracy): 0.077, 97.803, Test(loss, accuracy): 0.041, 98.890
    Epoch: 5/10, Train(loss, accuracy): 0.068, 98.065, Test(loss, accuracy): 0.039, 98.930

    • @AladdinPersson · 4 years ago · +1

      I think if it's only for a single channel it shouldn't matter, did you make sure to set the seed etc so that the results are comparable?

    • @dataaholic · 4 years ago

      @@AladdinPersson Yeah, it is for one channel only, and no, I didn't use any seeding. I'll try again with seeding.

  • @TeachAI-UZ · 3 years ago · +1

    As far as I know, it is not advised to shuffle the validation (testing) data. Anyone experimented with this, too?

    • @wosleepy · 2 years ago

      Shuffling affects the learning process. We are not actually learning during the val/test phase, so I guess it won't affect the accuracy. Hope it helps.

  • @sasaglamocak2846 · 2 years ago

    Please tell us, how to learn PyTorch...

  • @pankajshinde475 · 4 years ago

    Your PyTorch skills are just amazing, are you a PhD student? 🤔 BTW, bow down to your PyTorch skills ✌🙇‍♂️

    • @AladdinPersson · 4 years ago

      No, masters student :)

    • @bpmsilva · 3 years ago

      @@AladdinPersson, your videos are awesome! Where did you learn all of that? You should do a video about your learning experience

  • @feravladimirovna1044 · 4 years ago · +1

    when it comes to permute I am out of the space!!! lol!

    • @AladdinPersson · 4 years ago

      Yeah unfortunately my explanation wasn't very good there, just remember if you need to switch some axes of your tensor, use permute not view
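Here is that rule of thumb in code. `view` reinterprets the same memory in a new shape, so the elements keep their row-major order, while `permute` actually swaps axes. For a 2 x 3 tensor:

```python
import torch

x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])  # shape (2, 3)

print(x.view(3, 2))
# tensor([[1, 2],
#         [3, 4],
#         [5, 6]])  <- same element order, just regrouped: NOT a transpose

print(x.permute(1, 0))
# tensor([[1, 4],
#         [2, 5],
#         [3, 6]])  <- axes swapped: this is the transpose
```

The permuted tensor is not contiguous in memory, so calling `.view` on it afterwards (e.g. `x.permute(1, 0).view(6)`) raises an error. Use `.reshape` or `.contiguous().view` instead.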

  • @konstantin6482 · 3 years ago

    What you didn't understand is that by typing in the mean and the variance for your normalization error you introduced a bias and that's why the performance has risen. Read "Learning from data". Awesome video otherwise, thanks 👍