PyTorch Tutorial 09 - Dataset and DataLoader - Batch Training
- Published 3 Jan 2020
- New Tutorial series about Deep Learning with PyTorch!
⭐ Check out Tabnine, the FREE AI-powered code completion tool I use to help me code faster: www.tabnine.com/?... *
In this part we see how we can use the built-in Dataset and DataLoader classes and improve our pipeline with batch training. See how we can write our own Dataset class and use available built-in datasets.
- Dataset and DataLoader
- Automatic batch calculation
- Batch optimization in training loop
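The custom Dataset pattern covered in this part can be sketched roughly like this (a minimal sketch: the class mirrors the video's WineDataset, but takes an in-memory array instead of reading wine.csv so it runs standalone; the random data is purely illustrative):

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class WineDataset(Dataset):
    """Wraps a numpy array whose first column is the class label."""
    def __init__(self, xy):
        # in the video this array comes from np.loadtxt('wine.csv', ...)
        self.x = torch.from_numpy(xy[:, 1:])    # features
        self.y = torch.from_numpy(xy[:, [0]])   # labels, shape (n_samples, 1)
        self.n_samples = xy.shape[0]

    def __getitem__(self, index):
        # dataset[i] returns one (features, label) sample
        return self.x[index], self.y[index]

    def __len__(self):
        return self.n_samples

# random stand-in for the wine data: 20 samples, 1 label + 13 features
xy = np.random.rand(20, 14).astype(np.float32)
dataset = WineDataset(xy)
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

features, labels = next(iter(loader))
print(features.shape, labels.shape)  # torch.Size([4, 13]) torch.Size([4, 1])
```

In the training loop you would iterate over the loader once per epoch, taking one optimization step per batch.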
Part 09: Dataset and DataLoader
📚 Get my FREE NumPy Handbook:
www.python-engineer.com/numpy...
📓 Notebooks available on Patreon:
/ patrickloeber
⭐ Join Our Discord : / discord
If you enjoyed this video, please subscribe to the channel!
Official website:
pytorch.org/
Part 01:
• PyTorch Tutorial 01 - ...
Logistic Regression from scratch:
• Logistic Regression in...
Code for this tutorial series:
github.com/patrickloeber/pyto...
You can find me here:
Website: www.python-engineer.com
Twitter: / patloeber
GitHub: github.com/patrickloeber
#Python #DeepLearning #Pytorch
----------------------------------------------------------------------------------------------------------
* This is a sponsored link. By clicking on it you will not have any additional costs, instead you will support me and my project. Thank you so much for the support! 🙏
This whole series is a gold mine; anyone diving into PyTorch for the first time is highly recommended to follow this playlist.
If anyone has issues with the data iterator object not having an attribute next: instead of data = dataiter.next(), try doing data = next(dataiter). Worked for me
That helped .. thanks
🤝
This also works for me "data = next(iter(dataloader))"
I am facing the same issue and unfortunately I was unable to resolve it with either of those methods. I am using torch 2.3 on a Windows system. I tried this and it worked for me:
if __name__ == '__main__':
    dataset = WineDataset()
    dataloader = DataLoader(dataset=dataset, batch_size=4, shuffle=True, num_workers=2)
    dataiter = iter(dataloader)
    data = next(dataiter)
    features, labels = data
    print(features, labels)
Ended up here after trying to get tensorflow 2.0 to work properly for 24h, and finally gave up and decided to learn PyTorch instead (installed in 2 min with no problems). Haven't finished all the tutorials yet, but these are really top quality tutorials. Most tutorials on YouTube are poorly structured, but this could truly be its own Udemy course that you could charge for, as someone mentioned (especially with the code repository included). Love the implementations from scratch where you gradually introduce PyTorch classes/methods by replacing code we built from scratch - this is truly a very pedagogic way to learn. Hats off to you sir!
Thank you for the great feedback!
Seriously, I wish I started ML with Pytorch instead of TF.
At first, the heavy Python put me off since I only had less than 200 hours of experience in serious coding but after a while it all worked much better than I expected.
If you install tf2.0 with conda it takes 10min...
@@damianwysokinski3285 I am using conda. In theory you are right, that's how it should work - in practice however it was a headache, and I literally could not get it to work correctly. I'm sure you CAN get it to work just fine, it's just such a hassle compared to any other library I have ever installed. But maybe this has changed since I tried using it 1 year back.
@@svennesvensson7530 In autumn 2020, I installed TF2 for the last time
Absolutely perfect tutorial series, as one can clearly learn each step on an easy-to-understand yet fun and "real world" example, unlike lots of other tutorials that always end up with MNIST or something. Really appreciate all the work you clearly put in. Thank you and keep going! 👌
For all of you who have multiprocessing issues: change num_workers to 0 instead of 2 in the DataLoader.
Yes, thanks for the hint!
if you put your code under if __name__ == '__main__': it fixes the issue, at least it fixed my issues......
THANKS
It worked! Thanks
@@NNote-zs6eo This helped me as well
You don't explain things like everyone else does. These are the most explanatory videos I've ever watched. Thanks for your efforts
Very well explained, the logical sequence that you follow is fantastic. Without a doubt one of the most useful tutorials that I have seen, thank you for your contribution!
Hi, Python Engineer, thanks for your intuitive tutorial. I just followed your tutorial 08 and successfully implemented the logistic regression on the wine data set and got an accuracy of 91.67%.
thanks man! the dataset and dataloader classes always confused me, but you explained it really clearly and now i successfully wrote my own dataset class.
This was an incredible video, thank you so so much! Gonna watch this entire series now.
Great, thank you!
wonderful explanation
Very nice tutorials... Thanks a lot. I hope we see more posts from you in the fields of PyTorch and deep learning.
was confused about Dataset and Dataloader, and came across this. Have not finished it yet, but, I know, this is what I needed. Thanks for sharing your knowledge in such a nice way. Please keep them coming. Cannot thank you enough.
Subscribed right away.
Glad you like it :) thank you!
Just wanna give you a hug for the awesome tutorial. Love you man ❤️
Thanks so much :)
very concise and informative! I like this style of tutorial!
Very useful and clear. Thank you!!
Well done. Precise and to the point.
Thank you!
Very insightful. Thanks.
For people having errors with the dataiter.next() line, showing something about freeze_support(): try moving all code except the WineDataset class into a main function, and call that function with
if __name__ == "__main__":
    main()
Not sure why it solved the problem, but it worked for me!
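On Windows the guard described above looks roughly like this (a sketch; the function name main and the tiny dataset are illustrative):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # code that creates a DataLoader with num_workers > 0 must only run
    # in the main process; on Windows the worker processes re-import
    # this module, and an unguarded top level would recurse and crash
    dataset = TensorDataset(torch.arange(8.0))
    loader = DataLoader(dataset, batch_size=2, num_workers=2)
    for (batch,) in loader:
        print(batch)

if __name__ == '__main__':
    main()
```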
Thanks, it helps me
@@grigorijdudnik575 how lad?
please keep uploading tutorials on pytorch , amazing tutorials!!!
thank you!
thankyou for this
Such a helpful reference, thanks again!
Thanks so much! This means a lot coming from you :)
The parallel workers did not work for me. I am using Win11. I had to set "num_workers=0" when calling DataLoader to get it to run.
Ty, it helped a lot
demystifying Machine Learning !
unpacking the hype!
pulling back the curtains.
WOW !!!!
Masterfully cutting through the fluff
@Patrick Loeber generations to come shall hear of your exploits done here
Bravo indeed
Thanks for great lessons, you are doing amazing job in here, keep going.
Thanks :)
Thank you!
appreciated..
excellent
Just genius
Very simply explained. I was way too confused about how to use the DataLoader and Dataset classes with csv's and used to end up simply converting to numpy floats and manipulating the whole learning process. It worked fine, but then I failed to use batch training. Wonderful tutorial
thank you! I'm glad it is helpful
Thanks, It's brilliant!
Glad you like it!
so useful! i've learned a lot!!!
The explanation is very good. You don't need to know English very well; I listened to the lesson with automatic translation. My field isn't software, and I still ran the code on my own dataset.
Help me a lot! Thanks!
glad it was helpful!
This was very helpful. Great work, have a sub
Thank you 🙏🏻
Dataloading was always a nightmare to me, until I watched your video. Thank you so much
Glad it helped!
In case you have a bug like I had, try removing the argument "num_workers" in the DataLoader call. It worked for me.
yep exactly!
on Windows you may also need to add at the beginning,
from multiprocessing import freeze_support

if __name__ == '__main__':
    freeze_support()
    # the rest of the code here
thank you so much! you are the best. please don't leave us! :)
glad you like it!
Ten-zaur!!!
thanks from China
thank you!
I've watched, coded and understand this video. How do I incorporate the train/test split with the dataset class and dataloader class? Thank you!
Hi, great tutorial. I was wondering, what if we want to have separate features returned? Would we return x1,x2,y from __getitem__ ? Plus, should we return 2 lengths from __len__ ? Thanks
Not all heroes wear capes. Thank you very much!
glad you like it!
Can we use two different features as input to the model, in parallel?
And if yes, how can we define them while using the enumerate function?
Hello,
Thank you very much for these tutorials, they're the best on YouTube by far!
I was just wondering, what's the exact color theme you're using in VS Code ?
Keep up the great work! :)
in this video it's the monokai theme, now I use the Nightowl theme
If you have low-end hardware and you get runtime errors, try setting num_workers=0 in DataLoader()
very niceee
Thanks 🙏🏻
i ran into an error while running this code:
Error:
RuntimeError: DataLoader worker (pid(s) 18988) exited unexpectedly
(raised by: raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e)
Solution:
set num_workers = 0
what is the explanation?
Hello! Thanks for your videos :) I've binged all these just as fast as I did some Netflix shows.
I was wondering if you have any videos on how to create custom Datasets and DataLoaders for custom csv files using TensorDataset? I'm trying to understand what structure a model() needs if I have custom rgb images from a csv file.
Thanks again for all your videos!
Nice work! Greetings from Hamburg!
Thanks :) Greetings back
If anyone is getting an error implementing this code: data = dataiter.next()
Try data = next(dataiter)
This was tested on PyTorch version 2.1.2
You didn't separate the trainset and testset. If you use all data for training, what are you going to test with? What really confused me was how to separate a dataset into train_x, train_y and test_x, test_y
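For anyone with the same question: torch.utils.data.random_split can divide a Dataset into train and test subsets before building the DataLoaders (a sketch with made-up data and split sizes):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# 100 made-up samples: 13 features and 1 target each
dataset = TensorDataset(torch.randn(100, 13), torch.randn(100, 1))

# 80/20 train/test split; shuffle only the training loader
train_set, test_set = random_split(dataset, [80, 20])
train_loader = DataLoader(train_set, batch_size=4, shuffle=True)
test_loader = DataLoader(test_set, batch_size=4, shuffle=False)

print(len(train_set), len(test_set))  # 80 20
```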
how can i load data which is saved in .pkl format in my google drive ,and then performing codes above on that? thank you.
Hi thanks for the video! Quick question: why in our WineDataset's __init__() we did not use the super? In order to initialize the init of the superclass Dataset. Thanks
nice observation. this is simply because torch.utils.data.Dataset doesn't define its own __init__ function; otherwise we would have had to do this
Great, thank you =)
Thanks for the video! I am applying a model that initializes variational parameters with random numbers in my PyTorch nn.Module called model (e.g., self.mu_q_alpha = nn.Parameter(torch.randn(10, 3,2))) . model() is called for every batch in every epoch to get the loss for that batch. I assume that I am mistaken, but my initial thought was that by calling model() in every batch, we also always reinitialize the values for the variational parameters to random numbers every time. Could you maybe give me feedback on that?:)
Can you suggest how to load the Leeds Sports database? The data uploaded on the website is not a csv file but rather two folders named data and visualized. I don't understand how to put them in a dataloader.
Extremely helpful. But, @5:53 why do we write xy[ :,[0] ] instead of xy[ :,0 ]?
Thanks a lot for sharing your knowledge!!
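Regarding the xy[:, [0]] question above: indexing with a list keeps the column as a 2D array of shape (n_samples, 1), while xy[:, 0] drops the axis and gives shape (n_samples,); many PyTorch loss functions expect the labels in the 2D form. A small numpy sketch:

```python
import numpy as np

xy = np.arange(12.0).reshape(4, 3)

col_2d = xy[:, [0]]  # list index keeps the axis -> column vector
col_1d = xy[:, 0]    # integer index drops the axis -> flat vector

print(col_2d.shape)  # (4, 1)
print(col_1d.shape)  # (4,)
```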
At 12:03, when did we define inputs and labels so that we can use them at 13:46?
we defined them at 12:03, just like you pointed to ;) this process is called unpacking, so we unpack the inputs and labels from the dataloader and define them in this line
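The unpacking described above can be sketched like this (made-up data; each batch comes out of the DataLoader as an (inputs, labels) tuple):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 8 made-up samples: 2 features and 1 target each
dataset = TensorDataset(torch.randn(8, 2), torch.randn(8, 1))
loader = DataLoader(dataset, batch_size=4)

for i, (inputs, labels) in enumerate(loader):
    # each iteration unpacks one batch tuple into inputs and labels
    print(i, inputs.shape, labels.shape)
```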
Thanks for the series. Very good for learning PyTorch. For the DataLoader, line 69 produces an "AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute 'next'" error. Is it because this part of the API has been modified?
use next(dataiter) instead of dataiter.next()
Hi. How can I convert my folder of images into csv file?
When I am trying to adapt this Dataset Class for my own use case I run into an error: RuntimeError: expected scalar type Long but found Float -> when I then read the data in with type np.int32 I get another error: RuntimeError: mat1 and mat2 must have the same dtype
I then tried to load the input with np.int32 and the labels (int class categories) with np.longlong and that finally worked.
In another tutorial about feed forward net, the labels are also class categories and the data is read with torchvision.datasets but no dtype is provided. Is this information present in the downloaded files and therefore also read in?
When my data is stored in a json file, how can I adapt your method to my case? Thank you
This video was not clear :(
However, thank you very much. Your other videos are great!
thanks for the feedback!
What do I do when my dataset is 3D? In your case, you have N lines x M columns. In my case one sample is 2D, so I have 2 columns, two features, each of which has 18 points (it is a temporal series, 2 curves). Then I have 180 samples of these. I am confused about what to pass to my self.x
Don't let the x and y confuse you (the names make them sound like they refer to data dimensions, but they don't). In the examples that he is giving, x represents the training inputs and y represents the corresponding targets. This means that in your case, self.x would be your (180, 18, 2) inputs, while your y would be something like (180,) assuming your target is just one scalar value.
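A sketch of that layout (random stand-in data with the shapes from the question; the class name is illustrative):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SeriesDataset(Dataset):
    # 180 samples, each an 18-step series with 2 features,
    # plus one scalar target per sample
    def __init__(self):
        self.x = torch.randn(180, 18, 2)
        self.y = torch.randn(180)

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return len(self.x)

loader = DataLoader(SeriesDataset(), batch_size=4)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)  # torch.Size([4, 18, 2]) torch.Size([4])
```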
Hi, I have an 8*9 array in which the last 2 columns are results. I made a tuple with 2 tensors, one with 8*7 and the other with 8*2 dimensions (I checked the datasets in your vids and it seemed they were all a tuple with 2 tensors), but I got this error: "stack expects each tensor to be equal size, but got [8, 7] at entry 0 and [8, 2] at entry 1"
help me please
both need to be of same size, so either [8,2] or [8,7]
The code at 14:00 wouldn't work. I checked it through over and over again. Then, after reading the blurb on DataLoader, I removed the workers (making a default of zero) and it worked. Otherwise, apart from the file path to wine.csv, the code was the same. Any ideas about this?
Yes this is a known issue with Windows. You made the correct fix by setting the workers to 0. I should have mentioned this...
@@patloeber Thanks for your attention... albeit after hours of swearing at the computer!
@@sepgorut2492 sorry about the trouble! Great that you could figure it out!
Had same exact issue, this comment helped to fix it. Thanks.
dataiter = iter(dataloader)
data = next(dataiter)
Pls help!!!
I get an error in my loss function... I have the same shape as you. What's wrong??
Can you post a video for half precision tensors with pytorch autoencoder
thanks for the suggestion. I will have a look into this...
@@patloeber Thanks for the response.please post a video for tensorboard graph with gpu memory tracking vs the epoch and loss
Hi, I have followed your approach on my dataset and I keep getting this error: "IndexError: Index (427) out of range for (0-99)". Do you have any idea why that is? I am trying to apply it on a Windows 10 system. Is that a problem?
???????????????????????
can you pls explain collate_fn in dataloader
Hi Guys,
I got this error during the execution of the code. Could you please explain to me what is that?
RuntimeError: DataLoader worker (pid(s) 20384, 24284) exited unexpectedly
maybe set num_workers=0
what ide are u using?
VS code. One of my latest tutorials is about my editor setup
At 7:37, he used dataset = WineDataset(),
then first_data = dataset[0]. What does this mean? dataset is an object of the WineDataset class; how is WineDataset returning the first sample? Can anybody explain?
Basically, you have
*class WineDataset(Dataset):*
So WineDataset is a class that inherits from the Dataset class.
Then you have
*dataset = WineDataset()*
So dataset is an instance of the WineDataset class.
Then you have
*first_data = dataset[0]*
Since the dataset INSTANCE inherits from the Dataset CLASS, it has a __getitem__() function. Now, I'm no programmer, but it seems to me like the Dataset class is written in a way that indexing its instances as seen above (that is, *InstanceOfDataset[some_number]*) calls the __getitem__() function - that is, it returns the X[some_number] and Y[some_number] values. X and Y are defined in the __init__() method.
I hope it's not too late :)
@@horvathbenedek3596 its never too late , dear sir
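The indexing behavior explained above can be demonstrated with a toy Dataset (an illustrative class, nothing to do with the wine data):

```python
from torch.utils.data import Dataset

class Tiny(Dataset):
    def __getitem__(self, index):
        # dataset[index] is syntactic sugar for dataset.__getitem__(index)
        return index * 10

    def __len__(self):
        return 3

d = Tiny()
print(d[0], d[2])  # 0 20
print(len(d))      # 3
```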
When I run the code it shows me an error: MultiProcessingDataLoader has no attribute next
what IDE are you using
VS Code
all the single batches
tensorflow users get no batches
How can we use pandas in dataloader
OSError: ./data/wine/wine.csv not found. Where can I download wine.csv? Can anybody help?
In my pytorch github repo: github.com/python-engineer/pytorchTutorial
@@patloeber Thank you very much,
Do you use Discord, i want to contact you
Not yet. You can contact me on twitter or email :)
Now do a LARGE dataset on 3080 hardware
Can you do a video on using a Dataloader on a very large dataset that will obviously not fit into gpu memory?
I'll have a look at that...
please explain what features and labels are instead of just naming them.
The series is not beginner friendly; he directly uses the functions and modules. I don't know how these people are able to understand. We can't solve any other problems even after watching his series. So frustrating.
watch at 1.5x speed. Feels like he slowed this video down by 2x