PyTorch Datasets and DataLoaders - Training Set Exploration for Deep Learning and AI

  • Published 5 Sep 2024

COMMENTS • 132

  • @deeplizard
    @deeplizard  5 years ago +7

    Check out the corresponding blog and other resources for this video at: deeplizard.com/learn/video/mUueSPmcOBc

  • @sulemanrasheed1634
    @sulemanrasheed1634 5 years ago +47

    The reason for using plt.imshow(np.transpose(grid, (1,2,0))):
    For a color image, plt.imshow expects the dimensions in the order [height, width, channels], while PyTorch uses [channels, height, width]. For compatibility, we have to reorder the PyTorch dimensions so that the channels come last. The axes of the array are numbered [axis0, axis1, axis2], so we reorder (0,1,2) to (1,2,0) to make the data compatible with imshow.
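
The axis reordering described above can be sketched with a synthetic NumPy array. The shape (3, 32, 302) is an assumed example, not a value taken from the video.

```python
import numpy as np

# Stand-in for an image grid in PyTorch's (channels, height, width) layout.
# The shape is an assumption chosen for illustration.
grid = np.zeros((3, 32, 302), dtype=np.float32)

# (1, 2, 0) means: new axis 0 <- old axis 1 (height),
#                  new axis 1 <- old axis 2 (width),
#                  new axis 2 <- old axis 0 (channels).
hwc = np.transpose(grid, (1, 2, 0))

print(grid.shape)  # (3, 32, 302)
print(hwc.shape)   # (32, 302, 3)
```

The transposed array is in the (height, width, channels) order that plt.imshow expects for color images.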

    • @7WhiteSword
      @7WhiteSword 5 years ago +2

      Thank you for your explanation. I was confused at first and was about to go to the documentation, but then I saw your comment :)

    • @heller4196
      @heller4196 5 years ago +2

      Just use grid.permute(1, 2, 0) instead of np.transpose.

    • @friday1015
      @friday1015 4 years ago

      In plt.imshow(np.transpose(grid, (1,2,0))), why is there a color channel? Didn't we squeeze it out earlier on the single image sample?

    • @fahadmuntasir2336
      @fahadmuntasir2336 4 years ago

      Why does print(next(iter(train_set))) always return an image with label 9? Printing it several times gives the same image.

    • @sulemanrasheed1634
      @sulemanrasheed1634 4 years ago

      @fahadmuntasir2336 Ensure shuffle is true.
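
A minimal sketch of the behavior discussed in this thread, using a toy TensorDataset as a hypothetical stand-in for train_set. It assumes that shuffle is an option of the DataLoader rather than of the dataset, so iterating the dataset directly always starts at index 0.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for train_set: ten "images" whose label equals their index.
data = torch.arange(10, dtype=torch.float32).unsqueeze(1)
labels = torch.arange(10)
toy_set = TensorDataset(data, labels)

# Iterating the dataset directly always starts at index 0.
first_a = next(iter(toy_set))
first_b = next(iter(toy_set))
print(first_a[1].item(), first_b[1].item())  # 0 0

# shuffle is set on the DataLoader, not on the dataset.
ordered = DataLoader(toy_set, batch_size=3, shuffle=False)
shuffled = DataLoader(toy_set, batch_size=3, shuffle=True)

first_ordered = next(iter(ordered))[1]
print(first_ordered)             # tensor([0, 1, 2])
print(next(iter(shuffled))[1])   # three labels in random order
```

If the first sample of the real training set has label 9, then next(iter(train_set)) would keep returning that sample for the same reason.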

  • @sihanchen9099
    @sihanchen9099 5 years ago +10

    This series is awesome so far. I don't know why more people aren't watching it.

    • @brucemurdock5358
      @brucemurdock5358 9 months ago

      YT algorithm I guess. This is by far the best playlist honestly

  • @simoneparvizi775
    @simoneparvizi775 2 years ago +1

    OK ok ok ok. Wait a sec. WHY THE FUCK does this extremely well made video (deep explanations, step by step, great audio, great references to Jeremy Howard and to the paper briefly mentioned) have only 50k views... Jesus Christ, what an amazing video. Ty guys for the awesome work

    • @deeplizard
      @deeplizard  2 years ago

      😆😅 Thank you, Simone! We're glad to hear that you've found value in the content and have appreciation for the style and level of detail for which we cover it.

  • @drevolan
    @drevolan 5 years ago +3

    This video would've been amazingly useful last week when I struggled implementing my own DataLoaders/Sets :D
    There's not much information about this online so it's greatly appreciated.
    But please, normalize your audio, I had to crank my volume to be able to understand what was being said.
    But nonetheless good job, these videos are really appreciated.

    • @deeplizard
      @deeplizard  5 years ago +1

      Hey Kesdan - Thank you. You are welcome!
      I'll double check the audio. Please let me know if you see an issue with it in the future.

  • @JimmyCheng
    @JimmyCheng 5 years ago +2

    The volume could be louder. Anxiously waiting for the next episode! The ending is magnificent!

    • @deeplizard
      @deeplizard  5 years ago

      It's in the works! Thank you for mentioning the volume. 🙏 Very helpful! We are working on it.

  • @brainify6172
    @brainify6172 4 years ago

    Best explanation, bro!!! The video quality and the way you show the typing and then explain it are also just awesome.

  • @zhengguanwang4337
    @zhengguanwang4337 2 years ago +2

    perfect!!!!!

  • @malamals
    @malamals 4 years ago +2

    This video series gave me so much insight into deep learning. Thank you Deeplizard team for this amazing work.
    Can someone share the video of the paper being used here?

  • @dippatel1739
    @dippatel1739 5 years ago +2

    Please keep making videos. Your content is great.

    • @deeplizard
      @deeplizard  5 years ago

      Thank you dip! Will have more coming!

  • @picumtg5631
    @picumtg5631 2 years ago +1

    Note that the new root should be ./data, as it was changed for the FashionMNIST dataset.

  • @panchajanya91
    @panchajanya91 2 years ago

    When I tried this on my own, the label was not a tensor; it was an int object.

  • @yeahorightbro
    @yeahorightbro 5 years ago +3

    Best on YouTube. Well done and thank you. Wondering, though, if you'll do one for custom text datasets?

    • @deeplizard
      @deeplizard  5 years ago +2

      Hey Daniel - You are welcome! I will put custom datasets on the list. We'll likely use a custom dataset in the next project.

  • @chronicfantastic
    @chronicfantastic 5 years ago +3

    That's an interesting point about the effectiveness of oversampling. We had a similar issue with a very unbalanced dataset at work, and the sklearn weight parameter didn't seem to make much difference. It's nice to see some research on it.

  • @zhengguanwang4337
    @zhengguanwang4337 2 years ago

    Do you have a tutorial on hyperparameters for RNNs? That would be great!!!!

  • @xdcedar
    @xdcedar 5 years ago +2

    How can you type so fast and so precisely! Or is the truth that I actually type too slowly?

  • @Vikram-wx4hg
    @Vikram-wx4hg 3 years ago +3

    For label.shape I get the following error:
    'int' object has no attribute 'shape'

    • @deeplizard
      @deeplizard  3 years ago +2

      Hey Vikram - This is due to a change in the api. I've provided details for this update on the blog. See the "Updates" section here:
      deeplizard.com/learn/video/mUueSPmcOBc
      Chris

    • @Vikram-wx4hg
      @Vikram-wx4hg 3 years ago +1

      @deeplizard Thanks a lot for your reply, Chris. I sort of suspected that there might have been an API update.

  • @hardikchawla4966
    @hardikchawla4966 5 years ago +1

    excellent explanation!!

  • @SamerSallam92
    @SamerSallam92 5 years ago +2

    Thank you very much for your great series.
    I guess the blog article is missing the line image, label = sample

    • @deeplizard
      @deeplizard  5 years ago +1

      Hey Samer - You are welcome! Sometimes the blog and the video will differ. Thanks for pointing that out.

    • @vitoroliveira4290
      @vitoroliveira4290 5 years ago +1

      True, and I didn't get how the label's shape would be printed, since it's an int. My IDE returns this error. But it's really not important anyway.

    • @AllenFangs
      @AllenFangs 5 years ago

      @vitoroliveira4290 Any solution for it? Same problem :(

    • @vitoroliveira4290
      @vitoroliveira4290 5 years ago

      @AllenFangs Not really, sorry

    • @AllenFangs
      @AllenFangs 5 years ago

      @vitoroliveira4290 Just check the comments from deeplizard; it's a bug fixed in PyTorch. I think it's not a big problem.

  • @sundarsanthanam6147
    @sundarsanthanam6147 4 years ago +1

    I don't understand the pyplot configuration at 11:16 that displays the grid of images.

    • @deeplizard
      @deeplizard  4 years ago

      Hey Sundar - The torchvision.utils.make_grid function transforms the batch of ten images into a grid of images. The grid is no different from any other image we might plot. To understand the nature of the grid, try inspecting the shapes of each tensor:
      images.shape
      grid.shape
      grid.permute(1,2,0).shape
      Note that the make_grid function pads the original images by 2. See the documentation here (padding):
      pytorch.org/docs/stable/_modules/torchvision/utils.html#make_grid
      I hope this helps!
      Chris

    • @sundarsanthanam6147
      @sundarsanthanam6147 4 years ago

      deeplizard thank you so much

  • @willTryAgainTmrw
    @willTryAgainTmrw 5 years ago +1

    Waiting for next one....

    • @deeplizard
      @deeplizard  5 years ago +1

      Thanks Pratham - Working on it now! Stay tuned!

  • @jayachandra677
    @jayachandra677 4 years ago +1

    Use train_set.targets instead of train_set.train_labels if you get an error.

    • @deeplizard
      @deeplizard  4 years ago

      Hey Jay - Thanks for the information. This one was also posted in the updates section here: deeplizard.com/learn/video/jexkKugTg04

  • @xiangli1133
    @xiangli1133 2 years ago

    Can you do a series like this for custom datasets? Thanks.

  • @SonGoku-lc1sb
    @SonGoku-lc1sb 5 years ago +2

    image, label = sample
    The image is a tensor of size [1, 28, 28], whereas the label is just the integer value 9 and not tensor(9).
    Why is that?

    • @deeplizard
      @deeplizard  5 years ago +2

      Hey Son Goku - This change was introduced in the torchvision version 0.2.2.
      Double check that this is your version like so: torchvision.__version__
      You can see this change listed in the release notes here: github.com/pytorch/vision/releases
      Have a look (search for Cast MNIST) and you'll see it.
      In my opinion, they should not have made this change. They did it because it fixes another issue. Anyway, the dataloader, which is what we mostly work with, still returns a tensor, I think. Can you verify this? Thanks and hope this helps!

    • @deeplizard
      @deeplizard  5 years ago +1

      Two underscores on each side (torchvision.__version__). YouTube is removing one of them.

    • @SonGoku-lc1sb
      @SonGoku-lc1sb 5 years ago +1

      @deeplizard Yes, the DataLoader returns an object that, when iterated, yields a batch, which is a list of tensors.
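
A minimal sketch of the difference discussed in this thread. ToySet is a hypothetical dataset that returns a plain Python int as the label, which is the behavior the commenters report for single samples; the DataLoader's default collation then stacks those ints into one tensor.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToySet(Dataset):
    """Hypothetical dataset that returns (image tensor, int label)."""

    def __len__(self):
        return 20

    def __getitem__(self, index):
        image = torch.zeros(1, 28, 28)
        label = index % 10  # plain Python int, not a tensor
        return image, label


toy_set = ToySet()
image, label = toy_set[9]
print(type(label))   # <class 'int'>
print(image.shape)   # torch.Size([1, 28, 28])

# Wrap the int if a tensor is needed for a single sample.
print(torch.tensor(label).shape)  # torch.Size([])

# Default collation turns the batch of ints into a single tensor.
loader = DataLoader(toy_set, batch_size=10)
images, labels = next(iter(loader))
print(labels.shape)  # torch.Size([10])
```

This is why label.shape fails on a single sample but works on a batch taken from the loader.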

  • @ashutoshshah864
    @ashutoshshah864 3 years ago

    Why is len(batch) equal to 2? There are 10 images with 10 labels in a batch, right? A bit confused here. I am thinking, for some reason, that a batch would be a list of 10 tuples: batch = ([image, label], [image, label],...,[10th image, label])
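
One way to picture the answer is the conceptual sketch below. It is not the library's actual collation code; it only illustrates how ten (image, label) pairs can be regrouped into two tensors, one holding all the images and one holding all the labels.

```python
import torch

# Ten (image, label) pairs, the layout the commenter expected.
samples = [(torch.zeros(1, 28, 28), i) for i in range(10)]

# Conceptually, collation regroups the pairs by field...
image_list, label_list = zip(*samples)

# ...and stacks each field into one tensor.
images = torch.stack(image_list)
labels = torch.tensor(label_list)
batch = [images, labels]

print(len(batch))    # 2
print(images.shape)  # torch.Size([10, 1, 28, 28])
print(labels.shape)  # torch.Size([10])
```

So len(batch) counts the two fields, while the batch size of 10 shows up as the first axis of each tensor.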

  • @vgranjinidevi
    @vgranjinidevi 11 months ago +1

    I have a question. I have just started to learn deep learning, but I see that when I start coding, in some places I have to use NumPy, sometimes pandas, sometimes PyTorch, sometimes Matplotlib or seaborn, and in other places scikit-learn. It is like I start at one place and travel here and there, trying to learn everything. Is the journey like this? Or is there a streamlined way or course that teaches all of these?

    • @deeplizard
      @deeplizard  11 months ago

      Yes. It's completely normal. The journey is like this. Each tool excels at different tasks, almost like the gears on a bike helping you navigate different terrains. You'll find that as you gain experience, you'll get better at knowing when to go deep on a particular tool and when to just get the basics and move on. Remember, there's always another layer to peel back in this field, so you'll never run out of opportunities to go deeper. Hang in there, practice, and it will all click into place. Happy learning! 📚💡

  • @ratkush
    @ratkush 5 years ago +3

    Getting TypeError: object() takes no parameters when running next(iter(train_set))

    • @deeplizard
      @deeplizard  5 years ago +1

      What happens if you try this instead?
      train_set[0]

    • @sytekd00d
      @sytekd00d 5 years ago

      I am getting the same error

    • @sytekd00d
      @sytekd00d 5 years ago +5

      I figured it out....
      Check this line in your code: transform= transforms.Compose([transforms.ToTensor(), ])
      Make sure you add parentheses after 'ToTensor'. It should be 'ToTensor()'

    • @liucosette6091
      @liucosette6091 2 years ago

      @sytekd00d Works for me, thanks! Could you please tell me why this error happened?
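
A likely explanation, sketched in plain Python: a compose step calls each transform on the data, so passing the class instead of an instance turns that call into a constructor call with an unexpected argument. FakeToTensor and apply_all are hypothetical stand-ins written for this illustration, not torchvision code.

```python
class FakeToTensor:
    """Hypothetical stand-in for a transform class with no constructor arguments."""

    def __call__(self, image):
        return ("tensor", image)


def apply_all(transforms, image):
    # Mirrors the general idea of a compose step: call each transform on the data.
    for t in transforms:
        image = t(image)
    return image


# Correct: an instance is in the list, so t(image) invokes __call__.
result = apply_all([FakeToTensor()], "img")
print(result)  # ('tensor', 'img')

# Incorrect: the class itself is in the list, so t(image) tries to construct
# FakeToTensor("img"), and the constructor accepts no arguments.
raised = False
try:
    apply_all([FakeToTensor], "img")
except TypeError as error:
    raised = True
    print("TypeError:", error)
```

The exact wording of the TypeError depends on the Python version, which may be why the message in the thread reads "object() takes no parameters".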

  • @yunhuaji3038
    @yunhuaji3038 5 years ago +3

    Is that how fast and accurately you normally type code, or is it just a sped-up replay?

    • @deeplizard
      @deeplizard  5 years ago +3

      I can type fast but not that fast! 🤣 Yes. Sped-up replay.

  • @SamerSallam92
    @SamerSallam92 5 years ago +1

    Also, I guess this line:
    "With shuffle=True, the first samples in the training set will be returned on the first call to next."
    should read "With shuffle=False ..."

  • @rameshthamizhselvan2458
    @rameshthamizhselvan2458 5 years ago +1

    Why are we transposing the grid in the imshow call? Matplotlib accepts NumPy arrays, right? Why don't we pass grid.numpy() instead of transposing? Correct me if I'm wrong.

    • @deeplizard
      @deeplizard  5 years ago

      Hey Ramesh - The imshow function accepts (H,W,C), and PyTorch tensors are shaped like this: (C,H,W). This is why we rearrange the data. Note that using permute() is more straightforward. The site was updated with this: deeplizard.com/learn/video/mUueSPmcOBc

  • @SaimKhan-xj5um
    @SaimKhan-xj5um 5 years ago +1

    God i love this channel 🤗

  • @CoolDude911
    @CoolDude911 5 years ago +1

    I don't know if someone could clarify but I would worry that over-sampling an uncommon class that is actually uncommon in real samples will create a biased model and probably over-fitted to the smaller range of data in the uncommon class.

    • @deeplizard
      @deeplizard  5 years ago

      Hey Barry - Good question. Let's look to the paper.
      The paper says the following:
      "For classical machine learning models it was shown that oversampling can cause overfitting, especially for minority classes [33]. As we repeat small number of examples multiple times, the trained model fits them too well. Thus, according to this prior knowledge undersampling would be a better choice. The results from our experiments do not confirm this conclusion for convolutional neural networks."
      This can be found in section 4.6: "Generalization of sampling methods"
      Link: arxiv.org/abs/1710.05381

    • @CoolDude911
      @CoolDude911 5 years ago

      @deeplizard Update: I encountered this kind of problem at work. It turns out that with a uniform common class and a variable uncommon class, you can get higher accuracy on real test data by training on data where the uncommon class has been augmented. Obviously, too much augmentation will create a model that is too biased, but 'too biased' may depend on the application. Over-fitting the small class is a separate problem.
      My guess as to why this happens is that a model can get stuck in a local optimum where it makes simple inferences on the common uniform class and has a much harder job learning anything about the variable class.
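
For readers unfamiliar with the term, the sketch below shows random oversampling, one common variant of the idea discussed in this thread. The label counts are made up, and this is not claimed to be the exact procedure used in the paper or at the commenter's workplace.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up imbalanced labels: 90 samples of class 0, 10 of class 1.
labels = np.array([0] * 90 + [1] * 10)

counts = np.bincount(labels)
target = counts.max()

# For each class, draw sample indices with replacement until the class
# is as large as the most common class.
balanced_idx = np.concatenate([
    rng.choice(np.flatnonzero(labels == c), size=target, replace=True)
    for c in range(len(counts))
])

balanced_counts = np.bincount(labels[balanced_idx])
print(counts)           # [90 10]
print(balanced_counts)  # [90 90]
```

The minority class is repeated rather than extended with new information, which is the source of the overfitting concern raised above.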

  • @aryamaansaha2951
    @aryamaansaha2951 4 years ago +1

    In np.transpose(grid, (1,2,0)), what does (1,2,0) represent?

    • @daudasaniabdullahi4225
      @daudasaniabdullahi4225 4 years ago +1

      plt.imshow() requires the image in the format (height, width, channel), but PyTorch uses the format (channel, height, width). So you reorder the axes of the image. Remember the indexing: channel is 0, height is 1, width is 2. That's why you get (1,2,0). Thanks

    • @aryamaansaha2951
      @aryamaansaha2951 4 years ago

      @daudasaniabdullahi4225 thanks!

  • @sinaasadiyan
    @sinaasadiyan 5 years ago +1

    Hi, thanks for your videos.
    Which version of PyTorch are you using in these videos?

    • @deeplizard
      @deeplizard  5 years ago

      Hey Sina - You are welcome! We are using v0.4.1

  • @TheAnubhav27
    @TheAnubhav27 4 years ago

    How do I create batches for custom datasets (non-image data) that are not part of the torchvision package? Is there any resource to learn that?
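
One possible approach, sketched under assumptions: subclass torch.utils.data.Dataset, implement __len__ and __getitem__, and hand the dataset to a DataLoader. TabularSet and the feature values below are hypothetical and exist only for this illustration.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class TabularSet(Dataset):
    """Hypothetical dataset over in-memory feature rows and integer labels."""

    def __init__(self, features, labels):
        self.features = torch.as_tensor(features, dtype=torch.float32)
        self.labels = torch.as_tensor(labels, dtype=torch.long)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        return self.features[index], self.labels[index]


# Made-up data: 8 rows with 3 features each.
features = [[float(i), float(i) * 2, float(i) * 3] for i in range(8)]
labels = [i % 2 for i in range(8)]

loader = DataLoader(TabularSet(features, labels), batch_size=4, shuffle=True)
for x, y in loader:
    print(x.shape, y.shape)  # torch.Size([4, 3]) torch.Size([4])
```

The DataLoader handles batching and shuffling, so the dataset class only needs to know how to return one sample.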

  •  4 years ago

    You should look at PyTorch again. The behaviour has changed a lot.

    • @deeplizard
      @deeplizard  4 years ago

      Hey Selcuk - Please share details about what has changed. All changes are posted to the blog on the website. You should check there. Any changes are most certainly minor.

  • @felipeguimaraes7565
    @felipeguimaraes7565 5 years ago

    What is the purpose of plt.figure?

  • @datarachit
    @datarachit 4 years ago

    What is the logic behind the transpose?

    • @deeplizard
      @deeplizard  4 years ago

      The axis locations inside tensors are not standardized across libraries. Some libraries will switch these around. For example placing the channels at the last axis position. This is the case for the imshow function, so we have to move the axes around. Have a look at the top of the doc here (X : array-like or PIL image): matplotlib.org/api/_as_gen/matplotlib.pyplot.imshow.html

  • @HimothyOHooligan
    @HimothyOHooligan 4 years ago

    Yo, is that sub-60Hz rumble even necessary during the typing sections? It's like 15dB above the speech level. I'm over here listening on headphones and thinking there's an earthquake happening.

    • @deeplizard
      @deeplizard  4 years ago

      I could remove it. However, what if I told you that the sound is a psychological trick that increases information retention by 50%. Would you then be down? 🧠

    • @HimothyOHooligan
      @HimothyOHooligan 4 years ago

      @deeplizard I would read up on that, but I would still prefer it to be gone or much lower in level. Info retention improvements go to zero if I feel like I have to skip through or mute.

    • @deeplizard
      @deeplizard  4 years ago

      Hey Rudy - I'm jk about the improvement. It is possible, but the improvements would likely affect different people in a spectrum of ways. Thank you for your feedback on the matter. I think the effect is used less later in the course. Also, every video has a corresponding blog, so there are other options for learning. 😃

  • @NairodTheBeast
    @NairodTheBeast 3 years ago

    For some reason the audio from typing in this video bothers me more than other videos

  • @SuperLuckyLad
    @SuperLuckyLad 5 years ago +3

    The little lecture at the end highlights a problem with 'our' wonderful technology... what if you are a vegan taxi driver and you regard your computer as your 'friend'? .... not such bright future then is it?

    • @deeplizard
      @deeplizard  5 years ago +2

      Yes. Good point. I take issue with the destination. He said that "the car already knows where your work is". I find it interesting to question whether we'll even have what we now call "work". Whether we currently identify as taxi drivers or programmers. As for the breakfast, the tech should be able to personalize it.

    • @SuperLuckyLad
      @SuperLuckyLad 5 years ago +1

      @deeplizard ... maybe the tech did personalise it... ran a subroutine, did some deep learning of its own and decided it didn't like vegans... lol... and hey presto, "Psycho Chip" is born.

    • @deeplizard
      @deeplizard  5 years ago +1

      lol. 🤣 In all honesty, these are the types of things we'll need to be considering going forward. Thanks for commenting on it with some of your thoughts!

  • @mamoonanisar6774
    @mamoonanisar6774 5 years ago

    Can I use this process to load and train on my three labeled image folders (brain tumor)?

    • @deeplizard
      @deeplizard  5 years ago +1

      Hi mamoona - The answer is yes. Use torchvision.datasets.ImageFolder() to create your dataset.

  • @careymain3036
    @careymain3036 4 years ago

    I'm using my own image data with the PyTorch DataLoader. I'm getting the error "cannot import name 'read_data_sets'". Have you seen this before? All I could find on Stack Overflow was: "if you have your own file named dataloader.py, then Python imports your file instead of the module and can't find read_data_sets in your file", but there was no explanation of how to fix that. Any idea?

    • @deeplizard
      @deeplizard  4 years ago

      Hi Carey - Try changing the name of your dataloader.py file.

    • @careymain3036
      @careymain3036 4 years ago

      @deeplizard I am running this in Jupyter. All three classes, the dataloader (class MRDataset(data.Dataset)), the model, and train, are in the notebook, so I don't have a .py file for this project, just the notebook. The above was the only answer I could find on Stack Overflow.

    • @deeplizard
      @deeplizard  4 years ago

      What code is throwing the error and what is the full error?

    • @careymain3036
      @careymain3036 4 years ago

      @deeplizard github.com/maincarey/ML/blob/master/MRI.ipynb

    • @careymain3036
      @careymain3036 4 years ago

      ImportError Traceback (most recent call last)
      in ()
      16 from tensorboardX import SummaryWriter
      17
      ---> 18 from dataloader import MRIDataset
      19 from dataloader import read_data_sets
      20 import model
      /usr/local/lib/python3.6/dist-packages/dataloader/__init__.py in ()
      ----> 1 from dataloader import read_data_sets
      ImportError: cannot import name 'read_data_sets'
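
The traceback above points at a package named dataloader under dist-packages, which suggests that an installed package, rather than the notebook's own code, is being imported. A minimal way to check where a module name resolves is sketched below; module_origin is a hypothetical helper, and "json" is used only because it is always available.

```python
import importlib.util


def module_origin(name):
    """Return the file a module name would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# "json" stands in for the module under investigation, e.g. "dataloader".
print(module_origin("json"))
print(module_origin("surely_not_installed_module_xyz"))  # None
```

If the reported path is not the file you expected, the name is shadowed, and renaming your own module or removing the conflicting package are two possible ways out.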

  • @lynnliu7520
    @lynnliu7520 5 years ago +1

    Why is my label an int instead of a tensor? D:

    • @deeplizard
      @deeplizard  5 years ago

      Are you using your own dataset?

    • @lynnliu7520
      @lynnliu7520 5 years ago

      @deeplizard No, I followed the steps in the video and used Fashion-MNIST. But the batch label works fine as a tensor.

    • @deeplizard
      @deeplizard  5 years ago

      What version of PyTorch are you running?

    • @lynnliu7520
      @lynnliu7520 5 years ago

      @deeplizard 1.0.1

    • @deeplizard
      @deeplizard  5 years ago +1

      Hey SHU LIU - I finally tracked this down.
      This change was introduced in the torchvision version 0.2.2.
      Double check that this is your version like so: torchvision.__version__
      You can see this change listed in the release notes here: github.com/pytorch/vision/releases
      Have a look (search for Cast MNIST) and you'll see it.
      In my opinion, they should not have made this change. They did it because it fixes another issue. Anyway, the dataloader, which is what we mostly work with, still returns a tensor. Thanks for verifying that. Hope this helps!

  • @sunitakakkar8309
    @sunitakakkar8309 3 years ago

    Sir, I tried to replicate your code, but I am getting stuck when I try to get the shape of the labels. The error is: AttributeError: 'int' object has no attribute 'shape'. From this I understand that while converting the data to tensors, the image was converted to a tensor but the label is still an int. Can you please help? I am sharing the link to the Colab notebook: colab.research.google.com/drive/1WoZHmfr8g9prNOo75mGMGfiKiFXoa6IR?usp=sharing

  • @soulfrench
    @soulfrench 5 years ago

    Why are there so many examples about images and not about something else?

    • @deeplizard
      @deeplizard  5 years ago

      It's the classic example.

  • @champnaman
    @champnaman 5 years ago +1

    Don't want any of the things that the guy mentioned @12:41

  • @ATULYADAV-jz9er
    @ATULYADAV-jz9er 3 years ago

    Hello
    You have done excellent and impressive work. I am new to machine learning and was trying to run some code, but I am facing problems. I would be grateful if you could help me. I am trying to run this code and getting:
    from model import ft_net, ft_net_dense, PCB
    ModuleNotFoundError: No module named 'model'
    Code link
    github.com/Wanggcong/Spatial-Temporal-Re-identification

  • @abijithjkamath
    @abijithjkamath 3 years ago

    This video has 784 likes. Illuminati confirmed.

  • @ianjiang3762
    @ianjiang3762 4 years ago

    Too many useless sound and video effects in this video.

  • @sdc5574
    @sdc5574 4 years ago

    The explanation is so complex. Kindly take up easier examples. Also, make a video in an Indian accent. Your accent is very hard for an Indian to understand.

    • @deeplizard
      @deeplizard  4 years ago

      Hi Souhardya - I will work on my Indian accent! In the meantime, you can try using the website. There is a text version of the content. I hope that will help.

    • @sdc5574
      @sdc5574 4 years ago

      @deeplizard Please make a full project of something in PyCharm. Please.

    • @sdc5574
      @sdc5574 4 years ago +1

      @deeplizard ua-cam.com/video/DFKHh7_zzJc/v-deo.html
      Something like this..

    • @deeplizard
      @deeplizard  4 years ago

      Currently working out the details about which direction we'll go in terms of content. Thank you for the suggestion. Try different IDEs though. It will make you a stronger developer. Maybe do a project straight from the command line. 🤔

    • @sdc5574
      @sdc5574 4 years ago

      @deeplizard Make a detailed project using PyTorch. It's easier to learn from a project than from separate videos.

  • @pabloazevedo
    @pabloazevedo 4 years ago +1

    Does anyone else get mad with these annoying keyboard typing sounds?