PyTorch Tutorial 09 - Dataset and DataLoader - Batch Training

  • Published on Jan 3, 2020
  • New Tutorial series about Deep Learning with PyTorch!
    ⭐ Check out Tabnine, the FREE AI-powered code completion tool I use to help me code faster: www.tabnine.com/?... *
    In this part we see how we can use the built-in Dataset and DataLoader classes and improve our pipeline with batch training. See how we can write our own Dataset class and use available built-in datasets.
    - Dataset and DataLoader
    - Automatic batch calculation
    - Batch optimization in training loop
    Part 09: Dataset and DataLoader
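
    The topics listed above can be sketched roughly as follows. This is a minimal illustration, not the video's code: `ToyDataset` and its synthetic tensors are hypothetical stand-ins for the wine data. It shows a custom Dataset with `__getitem__` and `__len__`, and how the number of batches per epoch follows from the dataset size and the batch size.

```python
import math

import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical stand-in for the wine data: 10 samples, 3 features each."""

    def __init__(self):
        self.x = torch.arange(30, dtype=torch.float32).reshape(10, 3)
        self.y = torch.arange(10, dtype=torch.float32).reshape(10, 1)
        self.n_samples = self.x.shape[0]

    def __getitem__(self, index):
        # Return one (features, label) pair; DataLoader stacks these into batches.
        return self.x[index], self.y[index]

    def __len__(self):
        return self.n_samples


dataset = ToyDataset()
batch_size = 4
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# Batches per epoch: the last batch is smaller when the sizes do not divide evenly.
n_iterations = math.ceil(len(dataset) / batch_size)
first_features, first_labels = next(iter(loader))
```

    With the default `drop_last=False`, `len(loader)` gives the same batch count as the manual calculation.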
    📚 Get my FREE NumPy Handbook:
    www.python-engineer.com/numpy...
    📓 Notebooks available on Patreon:
    / patrickloeber
    ⭐ Join Our Discord : / discord
    If you enjoyed this video, please subscribe to the channel!
    Official website:
    pytorch.org/
    Part 01:
    • PyTorch Tutorial 01 - ...
    Logistic Regression from scratch:
    • Logistic Regression in...
    Code for this tutorial series:
    github.com/patrickloeber/pyto...
    You can find me here:
    Website: www.python-engineer.com
    Twitter: / patloeber
    GitHub: github.com/patrickloeber
    #Python #DeepLearning #Pytorch
    ----------------------------------------------------------------------------------------------------------
    * This is a sponsored link. By clicking on it you will not have any additional costs, instead you will support me and my project. Thank you so much for the support! 🙏

COMMENTS • 151

  • @rafiibnsultan
    @rafiibnsultan 1 year ago +26

    This whole series is a gold mine. Anyone diving into PyTorch for the first time should follow this playlist.

  • @3nlighten358
    @3nlighten358 1 year ago +36

    If anyone has issues with the data iterator object not having an attribute next: instead of data = dataiter.next(), try data = next(dataiter). Worked for me.

    • @CaptainBravo87
      @CaptainBravo87 1 year ago +2

      That helped .. thanks

    • @ezdul2404
      @ezdul2404 10 months ago

      🤝

    • @abdullahisaahmed5366
      @abdullahisaahmed5366 5 months ago

      This also works for me "data = next(iter(dataloader))"

    • @swarnendusekharghosh9539
      @swarnendusekharghosh9539 1 month ago

      I am facing the same issue, and unfortunately neither of these methods resolved it for me. I am using torch 2.3 on a Windows system. I tried this and it worked for me:
      if __name__ == '__main__':
          dataset = WineDataset()
          dataloader = DataLoader(dataset=dataset, batch_size=4, shuffle=True, num_workers=2)
          dataiter = iter(dataloader)
          data = next(dataiter)
          features, labels = data
          print(features, labels)

  • @svennesvensson7530
    @svennesvensson7530 4 years ago +87

    Ended up here after trying to get TensorFlow 2.0 to work properly for 24 hours; I finally gave up and decided to learn PyTorch instead (installed in 2 minutes with no problems). I haven't finished all the tutorials yet, but these are really top-quality tutorials. Most tutorials on YouTube are poorly structured, but this could truly be its own Udemy course that you could charge for, as someone mentioned (especially with the code repository included). I love the implementations from scratch, where you gradually introduce PyTorch classes/methods by replacing code we built from scratch. This is truly a very pedagogical way to learn. Hats off to you, sir!

    • @patloeber
      @patloeber 4 years ago +3

      Thank you for the great feedback!

    • @user-vi5gs6ih6j
      @user-vi5gs6ih6j 3 years ago +3

      Seriously, I wish I started ML with Pytorch instead of TF.
      At first, the heavy Python put me off since I only had less than 200 hours of experience in serious coding but after a while it all worked much better than I expected.

    • @damianwysokinski3285
      @damianwysokinski3285 3 years ago

      If you install tf2.0 with conda it takes 10min...

    • @svennesvensson7530
      @svennesvensson7530 3 years ago

      @@damianwysokinski3285 I am using conda. In theory you are right, that's how it should work. In practice, however, it was a headache, and I literally could not get it to work correctly. I'm sure you CAN get it to work just fine; it's just such a hassle compared to any other library I have ever installed. But maybe this has changed since I tried it a year ago.

    • @damianwysokinski3285
      @damianwysokinski3285 3 years ago

      @@svennesvensson7530 I last installed TF2 in autumn 2020.

  • @n.w.4940
    @n.w.4940 3 years ago +9

    An absolutely perfect tutorial series: one can clearly learn each step from an easy-to-understand yet fun, real-world example, unlike the many other tutorials that always end up with MNIST or something similar. Really appreciate all the work you clearly put in. Thank you, and keep going! 👌

  • @timrorup8738
    @timrorup8738 4 years ago +34

    For all of you who have multiprocessing issues: set num_workers=0 in the DataLoader instead of 2.

    • @patloeber
      @patloeber 4 years ago

      Yes, thanks for the hint!

    • @NNote-zs6eo
      @NNote-zs6eo 3 years ago +12

      If you put your code under if __name__ == '__main__': it fixes the issue, at least it fixed mine.

    • @augurelite
      @augurelite 2 years ago +1

      THANKS

    • @avddva1367
      @avddva1367 1 year ago +1

      It worked! Thanks

    • @Yoyo-sy5kl
      @Yoyo-sy5kl 1 year ago

      @@NNote-zs6eo This helped me as well
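
The two fixes mentioned in this thread can be sketched together. On Windows, worker processes are started by re-importing the main script, so a DataLoader with `num_workers > 0` created at module level tends to fail; placing the entry point behind `if __name__ == "__main__":` avoids that, and `num_workers=0` sidesteps multiprocessing entirely. The function name `first_batch` and the synthetic tensors below are hypothetical, used only for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def first_batch(num_workers=0):
    # Hypothetical toy data standing in for the wine dataset.
    features = torch.arange(24, dtype=torch.float32).reshape(8, 3)
    labels = torch.arange(8, dtype=torch.float32).reshape(8, 1)
    dataset = TensorDataset(features, labels)
    loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=num_workers)
    return next(iter(loader))


if __name__ == "__main__":
    # Worker processes may re-import this module, so anything that starts
    # a DataLoader with workers should only run behind this guard.
    x, y = first_batch(num_workers=2)
    print(x.shape, y.shape)
```

If worker processes still fail, calling `first_batch(num_workers=0)` loads the data in the main process instead.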

  • @HasanKarakus
    @HasanKarakus 8 months ago

    You don't explain it normally. The most explanatory videos I've ever watched. Thanks for your efforts

  • @leromerom
    @leromerom 3 years ago +2

    Very well explained, the logical sequence that you follow is fantastic, without a doubt one the most useful tutorials that I have seen, thank you for your contribution!

  • @shadowchantghallanda2609
    @shadowchantghallanda2609 2 years ago +2

    Hi, Python Engineer, thanks for your intuitive tutorial. I just followed your tutorial 08 and successfully implemented logistic regression on the wine dataset, with an accuracy of 91.67%.

  • @gerben880
    @gerben880 1 year ago

    thanks man! the dataset and dataloader classes always confused me, but you explained it really clearly and now i successfully wrote my own dataset class.

  • @GeorgeA7Xxx
    @GeorgeA7Xxx 3 years ago +3

    This was an incredible video, thank you so so much! Gonna watch this entire series now.

  • @AbdulQayyum-kd3gf
    @AbdulQayyum-kd3gf 4 years ago +1

    wonderful explanation

  • @mohammadrashidi6214
    @mohammadrashidi6214 1 year ago +1

    Very nice tutorials... Thanks a lot. I hope we see more posts from you in the fields of PyTorch and deep learning.

  • @ARPAN7004
    @ARPAN7004 3 years ago +1

    was confused about Dataset and Dataloader, and came across this. Have not finished it yet, but, I know, this is what I needed. Thanks for sharing your knowledge in such a nice way. Please keep them coming. Cannot thank you enough.
    Subscribed right away.

    • @patloeber
      @patloeber 3 years ago

      Glad you like it :) thank you!

  • @fahadrahmanamik1104
    @fahadrahmanamik1104 3 years ago +2

    Just wanna give you a hug for the awesome tutorial. Love you man ❤️

  • @wadewang574
    @wadewang574 2 years ago

    very concise and informative ! I like such style's tutorial !

  • @alexlord5981
    @alexlord5981 1 year ago

    Very useful and clear. Thank you!!

  • @oleholgerson3416
    @oleholgerson3416 4 years ago

    Well done. Precise and to the point.

  • @hichamsabah31
    @hichamsabah31 1 year ago

    Very insightful. Thanks.

  • @kxiong4021
    @kxiong4021 3 years ago +1

    For people having errors with the dataiter.next() line that mention freeze_support(): try moving all code except the WineDataset class into a main function, and call it with if __name__ == "__main__": main(). Not sure why it solved the problem, but it worked for me!

  • @mlguru3089
    @mlguru3089 4 years ago +4

    please keep uploading tutorials on pytorch , amazing tutorials!!!

  • @maithilijoshi796
    @maithilijoshi796 1 year ago

    thankyou for this

  • @connorshorten6311
    @connorshorten6311 3 years ago +1

    Such a helpful reference, thanks again!

    • @patloeber
      @patloeber 3 years ago

      Thanks so much! This means a lot coming from you :)

  • @henrygory
    @henrygory 1 year ago +3

    The parallel workers did not work for me. I am using Win11. I had to set "num_workers=0" when calling DataLoader to get it to run.

  • @aminfadaei4056
    @aminfadaei4056 4 years ago

    Ty it helped alot

  • @GoForwardPs34
    @GoForwardPs34 1 year ago

    demystifying Machine Learning !
    unpacking the hype!
    pulling back the curtains.
    WOW !!!!
    Masterfully cutting through the fluff
    @Patrick Loeber generations to come shall hear of your exploits done here
    Bravo indeed

  • @utkumetin9453
    @utkumetin9453 3 years ago +1

    Thanks for great lessons, you are doing amazing job in here, keep going.

  • @TowerGaming1
    @TowerGaming1 3 years ago

    Thank you!

  • @ridael-mehdawe4681
    @ridael-mehdawe4681 4 years ago +1

    appreciated..

  • @alirezamohseni5045
    @alirezamohseni5045 1 month ago

    excellent

  • @user-fk1wo2ys3b
    @user-fk1wo2ys3b 3 years ago

    Just genius

  • @vikaskapdoskar
    @vikaskapdoskar 4 years ago

    Very simply explained. I was way too confused about how to use the DataLoader and Dataset classes with CSVs, and used to end up simply converting to NumPy floats and handling the whole learning process by hand. It worked fine, but then I could not use batch training. Wonderful tutorial.

    • @patloeber
      @patloeber 4 years ago

      thank you! I'm glad it is helpful

  • @geezer2867
    @geezer2867 2 years ago

    Thanks, It's brilliant!

  • @komorebi0307
    @komorebi0307 1 year ago

    so useful! i've learned a lot!!!

  • @1111boggy
    @1111boggy 3 years ago +1

    The explanation is very good. You don't need to know English very well. I followed the lesson with automatic translation; software is not my field, and I still ran the code on my own dataset.

  • @gabrielavechini7626
    @gabrielavechini7626 3 years ago

    Helped me a lot! Thanks!

    • @patloeber
      @patloeber 3 years ago +1

      glad it was helpful!

  • @killian7486
    @killian7486 3 years ago +1

    This was very helpful. Great work, have a sub

  • @user-hw8bh9vl7h
    @user-hw8bh9vl7h 3 years ago

    Dataloading was always a nightmare to me, until I watched your video. Thank you so much

  • @genetixx01
    @genetixx01 3 years ago +1

    In case you hit a bug like I did, try removing the "num_workers" argument from the DataLoader call. It worked for me.

    • @patloeber
      @patloeber 3 years ago

      yep exactly!

    • @MagufoBoy
      @MagufoBoy 3 years ago

      On Windows you may also need to add this at the beginning:
      from multiprocessing import freeze_support
      if __name__ == '__main__':
          freeze_support()
          # the rest of the code here

  • @valarmorghulisx
    @valarmorghulisx 3 years ago +2

    Thank you so much! You are the best. Please don't leave us alone! :)

  • @FORCP-bq5fo
    @FORCP-bq5fo 22 days ago

    Ten-zaur!!!

  • @user-ks9wt2dm4j
    @user-ks9wt2dm4j 3 years ago

    thanks from China

  • @lakeguy65616
    @lakeguy65616 1 year ago +1

    I've watched, coded and understand this video. How do I incorporate the train/test split with the dataset class and dataloader class? Thank you!
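
One way to approach the train/test split asked about here (not shown in the video, so treat it as a sketch) is `torch.utils.data.random_split`, which divides an existing Dataset into non-overlapping subsets that can each get their own DataLoader. The 80/20 ratio, the seed, and the synthetic tensors are assumptions chosen for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Hypothetical data: 100 samples, 13 features, 3 classes.
features = torch.randn(100, 13)
labels = torch.randint(0, 3, (100,))
dataset = TensorDataset(features, labels)

# Assumed 80/20 split; the seeded generator makes the split reproducible.
n_test = int(0.2 * len(dataset))
n_train = len(dataset) - n_test
generator = torch.Generator().manual_seed(0)
train_set, test_set = random_split(dataset, [n_train, n_test], generator=generator)

# Shuffle only the training batches; evaluation order does not matter.
train_loader = DataLoader(train_set, batch_size=4, shuffle=True)
test_loader = DataLoader(test_set, batch_size=4, shuffle=False)
```

The same call works on a custom Dataset such as the tutorial's WineDataset, since it only relies on `__len__` and `__getitem__`.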

  • @SuiGio
    @SuiGio 3 years ago

    Hi, great tutorial. I was wondering, what if we want to have separate features returned? Would we return x1,x2,y from __getitem__ ? Plus, should we return 2 lengths from __len__ ? Thanks

  • @johnmichael1295
    @johnmichael1295 3 years ago

    Not all heroes wear capes. Thank you very much!

  • @tanmay_ds
    @tanmay_ds 3 years ago

    Can we use two different features as input to the model in parallel?
    And if yes, how do we define them when using the enumerate function?

  • @pauldonzier7230
    @pauldonzier7230 3 years ago +1

    Hello,
    Thank you very much for these tutorials, they're the best on YouTube by far!
    I was just wondering, what's the exact color theme you're using in VS Code ?
    Keep up the great work! :)

    • @patloeber
      @patloeber 3 years ago

      in this video it's the monokai theme, now I use the Nightowl theme

  • @comedyman4896
    @comedyman4896 3 months ago

    If you have low-end hardware and get runtime errors, try setting num_workers=0 in DataLoader()

  • @aliali-tv5ft
    @aliali-tv5ft 3 years ago

    very niceee

  • @EEBADUGANIVANJARIAKANKSH
    @EEBADUGANIVANJARIAKANKSH 3 years ago +3

    I ran into an error while running this code:
    Error:
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
    RuntimeError: DataLoader worker (pid(s) 18988) exited unexpectedly
    Solution:
    set num_workers = 0

  • @stevenrodrig14
    @stevenrodrig14 2 years ago

    Hello! Thanks for your videos :) I've binged all these just as fast as I did some Netflix shows.
    I was wondering if you have any videos on how to create custom datasets and dataLoaders for custom csv files using TensorDataSets? I'm trying to understand what structure a model() needs if I have custom rgb images from a csv file.
    Thanks again for all your videos!

  • @Murmur1131
    @Murmur1131 3 years ago

    Nice work! Greetings from Hamburg!

    • @patloeber
      @patloeber 3 years ago

      Thanks :) Greetings back

  • @ashishintown
    @ashishintown 5 months ago

    If anyone is getting an error implementing this code "data = dataiter.next()"
    Try "data = next(dataiter) "
    This code is tested on pytorch version 2.1.2

  • @lihanou
    @lihanou 2 years ago +1

    You didn't separate the trainset and testset. If you use all data for training, what are you going to test with? What really confused me was how to separate a dataset into train_x, train_y and test_x, test_y

  • @fatemehnaseri397
    @fatemehnaseri397 3 years ago

    how can i load data which is saved in .pkl format in my google drive ,and then performing codes above on that? thank you.

  • @canernm
    @canernm 3 years ago +1

    Hi thanks for the video! Quick question: why in our WineDataset's __init__() we did not use the super? In order to initialize the init of the superclass Dataset. Thanks

    • @patloeber
      @patloeber 3 years ago +2

      nice observation. this is simply because torch.utils.data.Dataset doesn’t define its own __init__ function, otherwise we should have done this

    • @canernm
      @canernm 3 years ago

      Great, thank you =)

  • @monart4210
    @monart4210 3 years ago

    Thanks for the video! I am applying a model that initializes variational parameters with random numbers in my PyTorch nn.Module called model (e.g., self.mu_q_alpha = nn.Parameter(torch.randn(10, 3,2))) . model() is called for every batch in every epoch to get the loss for that batch. I assume that I am mistaken, but my initial thought was that by calling model() in every batch, we also always reinitialize the values for the variational parameters to random numbers every time. Could you maybe give me feedback on that?:)

  • @shubhamkapoor8756
    @shubhamkapoor8756 2 years ago

    Can you suggest how to load leed sports database?The data uploaded on the website is not a csv file but rather two folders named data and visualized .I don't understand how to put them in dataloader

  • @sahhaf1234
    @sahhaf1234 1 year ago

    Extremely helpful. But, @5:53 why we write xy[ :,[0] ] instead of xy[ :,0 ]?
    Thanks a lot for sharing your knowledge!!
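
The difference between the two indexing forms asked about here can be seen on a small array. Indexing with a list, `xy[:, [0]]`, keeps the column dimension and yields shape `(n, 1)`, while `xy[:, 0]` drops it and yields shape `(n,)`. Presumably the video keeps the `(n, 1)` shape so the labels line up with a model output of shape `(n, 1)`; that motivation is an assumption. The values below are made up.

```python
import numpy as np

# Hypothetical rows: first column is the class label, the rest are features.
xy = np.array([[1.0, 14.2, 1.7],
               [2.0, 13.2, 1.8],
               [3.0, 12.4, 2.1]])

column = xy[:, [0]]  # list index keeps the axis: a 2-D column of shape (3, 1)
flat = xy[:, 0]      # integer index drops the axis: a 1-D array of shape (3,)

print(column.shape, flat.shape)
```

Both hold the same values; only the number of dimensions differs.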

  • @popamaji
    @popamaji 4 years ago +1

    12:03 when did we define input or labels so in 13:46 we can use them

    • @patloeber
      @patloeber 4 years ago

      we define it at 12:03 like you pointed to ;) this process is called unpacking, so we unpack the inputs and labels from the dataloader and define it in this line
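
The unpacking described in this reply can be sketched as a small batch loop. Each item the DataLoader yields is a pair, so `for i, (inputs, labels) in enumerate(loader)` both numbers the batches and binds the two names in one step. The dataset, batch size, and epoch count below are assumptions, and the forward/backward pass is left out.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 10 samples with 3 features and 1 target each.
dataset = TensorDataset(torch.randn(10, 3), torch.randn(10, 1))
loader = DataLoader(dataset, batch_size=4, shuffle=True)

num_epochs = 2
seen = []
for epoch in range(num_epochs):
    # Each iteration unpacks one batch into inputs and labels.
    for i, (inputs, labels) in enumerate(loader):
        # The forward pass, loss, backward pass, and optimizer step would go here.
        seen.append((epoch, i, tuple(inputs.shape), tuple(labels.shape)))

print(seen[:3])
```

With 10 samples and a batch size of 4, each epoch produces batches of 4, 4, and 2 samples.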

  • @tsendo
    @tsendo 1 year ago +1

    Thanks for the series. Very good for learning PyTorch. For the DataLoader, line 69 produces the error "AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute 'next'". Is it because this part of the API has been modified?

    • @krosp
      @krosp 8 months ago +2

      use next(dataiter) instead of dataiter.next()
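
A short sketch of the fix suggested in this thread: the DataLoader iterator in recent PyTorch releases apparently no longer has a `.next()` method (commenters report this on 2.x), but Python's built-in `next()` works on any iterator. The toy tensors below are hypothetical.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 6 samples with 2 features each, labels 0..5.
dataset = TensorDataset(torch.arange(12.0).reshape(6, 2), torch.arange(6))
loader = DataLoader(dataset, batch_size=4, shuffle=False)

dataiter = iter(loader)
features, labels = next(dataiter)  # built-in next(), not dataiter.next()
print(features, labels)
```

Calling `next(dataiter)` again returns the remaining two samples, and a third call raises `StopIteration`.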

  • @mariacassie609
    @mariacassie609 3 years ago

    Hi. How can I convert my folder of images into csv file?

  • @schlingelgen
    @schlingelgen 1 year ago

    When I am trying to adapt this Dataset Class for my own use case I run into an error: RuntimeError: expected scalar type Long but found Float -> when I then read the data in with type np.int32 I get another error: RuntimeError: mat1 and mat2 must have the same dtype
    I then tried to load the input with np.int32 and the labels (int class categories) with np.longlong and that finally worked.
    In another tutorial about feed forward net, the labels are also class categories and the data is read with torchvision.datasets but no dtype is provided. Is this information present in the downloaded files and therefore also read in?
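
The dtype errors described here are commonly caused by a mismatch between what the layers and the loss expect. As a hedged sketch: `nn.Linear` weights are float32 by default, so the features should be float32, and `nn.CrossEntropyLoss` with class-index targets expects int64 (Long) labels. The array values and model size below are made up, and this may not match the commenter's exact setup.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical rows: first column is the class label (0, 1 or 2), then 2 features.
raw = np.array([[0, 1.5, 2.0],
                [2, 0.3, 1.1],
                [1, 0.9, 0.4],
                [0, 2.2, 1.9]])

# Features as float32 to match the model weights; labels as int64 class indices.
features = torch.from_numpy(raw[:, 1:].astype(np.float32))
labels = torch.from_numpy(raw[:, 0].astype(np.int64))

model = nn.Linear(2, 3)  # 2 input features, 3 classes
loss = nn.CrossEntropyLoss()(model(features), labels)
print(features.dtype, labels.dtype, loss.item())
```

Note that the labels here have shape `(n,)` rather than `(n, 1)`, which is what the class-index form of this loss expects.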

  • @anas.2k866
    @anas.2k866 2 years ago

    When my data is stored in json file, how can I adapt your method to my case. Thank you

  • @DiegoAndresAlvarezMarin
    @DiegoAndresAlvarezMarin 2 years ago +1

    This video was not clear :(
    However, thank you very much. Your other videos are great!

    • @patloeber
      @patloeber 2 years ago

      thanks for the feedback!

  • @nicolasgabrielsantanaramos291
    @nicolasgabrielsantanaramos291 4 years ago

    What do I do when my dataset is 3D? In your case, you have N rows x M columns. In my case one sample is 2D: I have 2 columns (two features), each with 18 points (it is a time series, 2 curves). I have 180 such samples. I am confused about what to pass to my self.x.

    • @aeklant
      @aeklant 4 years ago +1

      Don't let the x and y confuse you (the names make them sound like they refer to data dimensions, but they don't). In the examples that he is giving, x represents the training inputs and y represents the corresponding targets. This means that in your case, self.x would be your (180, 18, 2) inputs, while your y would be something like (180,) assuming your target is just one scalar value.
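
The shapes suggested in this reply can be sketched as follows. `CurveDataset` and its random tensors are hypothetical; the point is only that `self.x` may have any number of trailing dimensions, because `__getitem__` indexes the first one and the DataLoader stacks samples along a new batch dimension.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class CurveDataset(Dataset):
    """Hypothetical dataset: 180 samples, each a series of 18 points with 2 features."""

    def __init__(self):
        self.x = torch.randn(180, 18, 2)  # inputs: (samples, time steps, features)
        self.y = torch.randn(180)         # one scalar target per sample

    def __getitem__(self, index):
        # One sample is a (18, 2) tensor plus its scalar target.
        return self.x[index], self.y[index]

    def __len__(self):
        return self.x.shape[0]


loader = DataLoader(CurveDataset(), batch_size=4)
inputs, targets = next(iter(loader))
print(inputs.shape, targets.shape)
```

A batch therefore has inputs of shape `(4, 18, 2)` and targets of shape `(4,)`.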

  • @popamaji
    @popamaji 4 years ago

    Hi, I have an 8*9 array in which the last 2 columns are the results. I made a tuple of 2 tensors, one with 8*7 and the other with 8*2 dimensions (I checked the datasets in your videos and they all seemed to be a tuple of 2 tensors), but I got this error: "stack expects each tensor to be equal size, but got [8, 7] at entry 0 and [8, 2] at entry 1"
    Help me please.

    • @patloeber
      @patloeber 3 years ago

      both need to be of same size, so either [8,2] or [8,7]
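
One possible reading of this error, offered as an assumption since the original code is not shown: a plain tuple `(x, y)` was passed to the DataLoader, which then treats the two whole tensors as two samples and tries to stack them. Wrapping the tensors in `TensorDataset` instead makes each sample a pair of one feature row and one target row, so differing column counts are fine as long as the number of rows matches.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical 8 x 9 array: 7 feature columns followed by 2 result columns.
data = torch.arange(72, dtype=torch.float32).reshape(8, 9)
x, y = data[:, :7], data[:, 7:]

# TensorDataset pairs row i of x with row i of y; only the row counts must agree.
dataset = TensorDataset(x, y)
loader = DataLoader(dataset, batch_size=4)

xb, yb = next(iter(loader))
print(xb.shape, yb.shape)
```

Each batch then carries 4 feature rows of width 7 and 4 result rows of width 2.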

  • @sepgorut2492
    @sepgorut2492 4 years ago +3

    The code at 14:00 wouldn't work. I checked it through over and over again. Then after reading the blurb on Dataloader i removed the workers (making a default of zero) and it worked. Otherwise, apart from the file path to wine.csv, the code was the same. Any ideas about this?

    • @patloeber
      @patloeber 4 years ago +3

      Yes this is a known issue with Windows. You made the correct fix by setting the workers to 0. I should have mentioned this...

    • @sepgorut2492
      @sepgorut2492 4 years ago +1

      @@patloeber Thanks for your attention... albeit after hours of swearing at the computer!

    • @patloeber
      @patloeber 4 years ago +2

      @@sepgorut2492 sorry about the trouble! Great that you could figure it out!

    • @malevolence89
      @malevolence89 1 year ago +1

      Had same exact issue, this comment helped to fix it. Thanks.

  • @perronemirko
    @perronemirko 5 months ago

    dataiter = iter(dataloader)
    data = next(dataiter)

  • @j.frostybeats
    @j.frostybeats 1 year ago

    pls help!!!
    i get an error in my loss function... i have the same shape as you.. whats wrong??

  • @palanichamyramasamy825
    @palanichamyramasamy825 4 years ago

    Can you post a video for half precision tensors with pytorch autoencoder

    • @patloeber
      @patloeber 4 years ago +1

      thanks for the suggestion. I will have a look into this...

    • @palanichamyramasamy825
      @palanichamyramasamy825 4 years ago

      @@patloeber Thanks for the response.please post a video for tensorboard graph with gpu memory tracking vs the epoch and loss

  • @balazsgonczy3564
    @balazsgonczy3564 2 years ago

    Hi, I have followed your approach on my dataset and I keep getting this error: "IndexError: Index (427) out of range for (0-99)". Do you have any idea why that is? I am trying to apply it on a Windows 10 system. Is that a problem?
    ???????????????????????

  • @lankanathaekanayake7680
    @lankanathaekanayake7680 2 years ago

    can you pls explain collate_fn in dataloader

  • @foodsscenes5891
    @foodsscenes5891 3 years ago

    Hi Guys,
    I got this error during the execution of the code. Could you please explain to me what is that?
    RuntimeError: DataLoader worker (pid(s) 20384, 24284) exited unexpectedly

    • @patloeber
      @patloeber 3 years ago +1

      maybe set num_workers=0

  • @alirezag6603
    @alirezag6603 3 years ago +1

    what ide are u using?

    • @patloeber
      @patloeber 3 years ago

      VS code. One of my latest tutorials is about my editor setup

  • @Gauravkr0071
    @Gauravkr0071 3 years ago

    At 7:37 he uses dataset = WineDataset()
    and then first_data = dataset[0]. What does this mean? dataset is an object of the WineDataset class, so how does WineDataset return the first sample? Can anybody explain?

    • @horvathbenedek3596
      @horvathbenedek3596 3 years ago

      Basically, you have
      *class WineDataset(Dataset):*
      So WineDataset is a class that inherits from the Dataset class.
      Then you have
      *dataset = WineDataset()*
      So dataset is an instance of the WineDataset class.
      Then you have
      *first_data = dataset[0]*
      Since the dataset INSTANCE inherits from the Dataset CLASS, it has a __getitem__() function. Now, I'm no programmer, but it seems to me that the Dataset class is written in such a way that indexing its instances as seen above (that is, *InstanceOfDataset[some_number]* ) calls the __getitem__() function, which returns the X[some_number] and Y[some_number] values. X and Y are defined in the __init__() method.
      I hope it's not too late :)

    • @Gauravkr0071
      @Gauravkr0071 3 years ago

      @@horvathbenedek3596 its never too late , dear sir
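
To add to this thread: the square-bracket behaviour comes from Python itself rather than from anything special in PyTorch. For any object, `obj[i]` is translated into a call to the `__getitem__` method defined on its class, and `len(obj)` into `__len__`. A minimal sketch with a hypothetical class unrelated to the tutorial:

```python
class Squares:
    """Hypothetical container: item i is i squared."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        # Called whenever an instance is indexed with square brackets.
        if not 0 <= index < self.n:
            raise IndexError(index)
        return index * index

    def __len__(self):
        # Called by the built-in len().
        return self.n


squares = Squares(5)
first = squares[0]   # equivalent to Squares.__getitem__(squares, 0)
third = squares[3]
print(first, third, len(squares))
```

WineDataset works the same way: `dataset[0]` runs the `__getitem__` written in the video, which returns the first features/label pair.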

  • @sambhrambasu726
    @sambhrambasu726 7 months ago

    When I run the code, it shows an error saying the multiprocessing DataLoader has no attribute next.

  • @vigenisayan2343
    @vigenisayan2343 2 years ago

    what IDE are you using

  • @leoyuanluo
    @leoyuanluo 2 years ago +1

    all the singe batches

    • @jerry870118
      @jerry870118 1 year ago

      tensorflow users get no batches

  • @prakharrajput5427
    @prakharrajput5427 1 month ago

    How can we use pandas in dataloader

  • @ahmadsystems3560
    @ahmadsystems3560 3 years ago

    OSError: ./data/wine/wine.csv not found. Where can I download wine.csv? Can anybody help?

    • @patloeber
      @patloeber 3 years ago

      In my pytorch github repo: github.com/python-engineer/pytorchTutorial

    • @ahmadsystems3560
      @ahmadsystems3560 3 years ago

      @@patloeber Thank you very much,

  • @neel5544
    @neel5544 4 years ago

    Do you use Discord, i want to contact you

    • @patloeber
      @patloeber 4 years ago

      Not yet. You can contact me on twitter or email :)

  • @axelanderson2030
    @axelanderson2030 1 year ago

    Now do LARGE dataset on a 3080 hardware

  • @taaaaaaay
    @taaaaaaay 3 years ago

    Can you do a video on using a Dataloader on a very large dataset that will obviously not fit into gpu memory?

    • @patloeber
      @patloeber 3 years ago +1

      I'll have a look at that...

  • @AFreaks
    @AFreaks 2 months ago

    please explain what features and labels are instead of just saying them.

  • @user-wq4hh1lj6g
    @user-wq4hh1lj6g 26 days ago

    series is not beginner friendly, he directly uses the functions and modules. I dont know how these people are able to understand. We cant solve any other problems even after watching his series. So frustrating

  • @robosergTV
    @robosergTV 4 months ago

    watch at 1.5x speed. Feels like he slowed this video down by 2x