So far the most clear and concise explanation
Thanks
Thanks for the great video. It is useful to see how batch size affects the model.
Great video! Thanks for the amazing playlist!
One comment about the batch size analysis:
usually we increase the learning rate at the same rate we increase the batch size! This seems to mitigate the convergence issue shown in your analysis.
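A minimal Keras sketch of that linear scaling idea (the baseline batch size of 32 and learning rate of 0.01 below are illustrative assumptions, not values from the video):

```python
# Illustrative sketch of the linear scaling rule: when the batch size grows,
# scale the learning rate by the same factor.
# The baseline values (batch 32, lr 0.01) are assumptions, not from the video.
from tensorflow import keras

base_batch_size = 32
base_lr = 0.01

batch_size = 128                               # e.g. 4x larger batches
lr = base_lr * (batch_size / base_batch_size)  # -> 0.04, scaled linearly

optimizer = keras.optimizers.SGD(learning_rate=lr)
# model.compile(optimizer=optimizer, loss="binary_crossentropy")
# model.fit(x_train, y_train, batch_size=batch_size, epochs=50)
```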
Great example!
Glad you liked it
This is a great video - I am happy I found your channel. It is amazing!
Thank you for the great content! As always, very helpful and interesting to watch
My pleasure!
Great explanation sir, Thanks for sharing knowledge :)
Most welcome!
Amazing Explanation. Great Work
I am a little confused about how the parameters (weights) are updated after the batch has been processed. If two different observations in the training set go through the same nodes in the network, it would seem that the contribution the first observation made to changes in the weights would be lost when the second observation passes through, since the weights are not changed until the batch is completed. I am obviously missing something; could someone point me in the right direction?
Please search for backpropagation; when you look into the math you may find the answer. In short, the information from the individual samples in a batch gets accumulated during training and then summed/averaged. The same holds true for batches vs. epochs.
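A tiny NumPy sketch (not the video's code, just an illustration) of how per-sample gradients are accumulated and applied as a single update, so no sample's contribution is overwritten:

```python
# Minimal sketch of how a batch update works: per-sample gradients are
# accumulated, then applied as ONE weight update, so the first sample's
# contribution is averaged in rather than lost.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)                  # weights of a tiny linear model
X = rng.normal(size=(4, 3))             # a batch of 4 samples
y = rng.normal(size=4)
lr = 0.1

grad_sum = np.zeros_like(w)
for xi, yi in zip(X, y):                # forward + backward for each sample
    error = xi @ w - yi                 # prediction error (squared-error loss)
    grad_sum += 2 * error * xi          # gradient of (xi.w - yi)^2 w.r.t. w

w -= lr * grad_sum / len(X)             # single update with the AVERAGED gradient
```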
Amazing explanation.... and Amazing demonstration...
Thank you. Good tutorial. Good topic, well prepared and excellently explained.
Glad it was helpful!
I have a dataset with 100 million rows and I want to do NLP preprocessing (tokenization, rearranging, label encoding, etc.). How should I approach this problem? Please help me.
thank you so much for the explanation and the striking demonstration!
It seems to me that the optimal batch size is a function of how large the training dataset is. Using your example, you've chosen 32 as the batch size with a dataset of 3000 rows. That means each batch is approximately 1.1% of the dataset. If your dataset were much larger (for example, 1,000,000 rows), wouldn't that imply that you should choose a batch size of 11,000, assuming that 11,000 rows fit within system RAM and GPU RAM? Am I on the right track here? (Great video!)
The problem is that batch size, as an optimization parameter, also depends on other hyperparameters that need to be tuned. On top of that, it depends on your dataset and its complexity. If you have 20K images you can easily use a batch size of 1024 (provided you have a good GPU); if you have 2000 images, that's too high. If you have a dataset of 400 images, you'd aim lower: a batch size of 4 to 32 would be appropriate.
@@SUNSZED I have done a little more research into this question. The correct batch size introduces an appropriate level of "noise" into the training loop. Too much or too little noise will hinder training. This is based on personal experience.
@@lakeguy65616 It essentially depends on the dataset. For an "easy" dataset, the range of "appropriate" batch sizes is wide and extends toward the lower side. I think there's a hyperparameter optimization automation that can be activated with YOLOv5 for object detection.
Sir, I am still confused about this. If we have 500 images and we set batch size = 20, then 500/20 = 25 batches per epoch, and with 5 epochs each epoch feeds those 25 batches to the model as forward passes and updates the weights, right? My question is: in the next epoch, are the same samples given again, or other samples from the dataset that were not shown to the model? Please answer my question.
Very informative and great video. I was only able to learn this properly after watching these videos. While explaining batch size, you mentioned that in 1 epoch the model covers all the samples in 94 iterations. I understand that in each batch the weights and biases are updated for those samples before moving on to the next batch. If by 94 iterations all the samples have already been visited, then what is the use of 5000 epochs? Could you please explain that too? If someone knows the answer, you are welcome to reply. Thanks once again for such wonderful videos. I am an MSc student and happily learning from this source.
In the next epoch, the model goes over the whole dataset again and updates its parameters further.
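A rough sketch of how epochs and batches relate, with illustrative numbers (3000 samples and a batch size of 32, so ceil(3000/32) = 94 iterations per epoch): every epoch walks through all the batches again, usually in a freshly shuffled order, and keeps refining the same weights.

```python
# Rough sketch of the epoch/batch loop (numbers are illustrative):
# every epoch covers ALL the data again, typically in a new shuffled order,
# and the same weights keep getting updated batch after batch, epoch after epoch.
import math
import numpy as np

n_samples, batch_size, epochs = 3000, 32, 5
steps_per_epoch = math.ceil(n_samples / batch_size)   # 3000/32 -> 94 iterations

indices = np.arange(n_samples)
for epoch in range(epochs):
    np.random.shuffle(indices)                         # same data, new order each epoch
    for step in range(steps_per_epoch):
        batch = indices[step * batch_size:(step + 1) * batch_size]
        # forward pass + backpropagation on this batch, then one weight update

print(steps_per_epoch)                                 # 94
```

Keras does the same thing under the hood: model.fit(..., shuffle=True) (the default) reshuffles the training data at the start of every epoch, so every epoch revisits all samples in a new order.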
You make AI simple...
Very clear explanation 👍 👍
Glad you think so!
Thank you very, very much; this is deliciously useful.
Excellent video!
I have seen this very often, that the batch size is 2 to some power (4, 16, 32, 64, etc.). Is there any reason behind that? If you have, say, 3000 samples, why not use a divisible batch size, such as 50?
There is nothing wrong with using any batch size. Your processor's memory comes in powers of 2, so it makes sense to use a batch size that maximally fits your memory. This is why we choose batch sizes of 2, 4, 8, 16, 32, 64, etc.
@@DigitalSreeni Makes sense. Thanks!
Can I use this code with a GAN?
And what should x_train be? Real or fake images?
love your videos.
And what is the timestep?
How can I construct an h5 file?
After training your model using Keras, you can save the model as h5 (HDF5 format). Please watch my latest video on understanding h5 files: ua-cam.com/video/_a-UYLfF6TE/v-deo.html&lc=UgwIE83hIAjTpcJ_ZT54AaABAg
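For reference, a minimal sketch of saving and reloading a Keras model in HDF5 format (the tiny model and the file name are just placeholders):

```python
# Minimal sketch of saving a trained Keras model in HDF5 (.h5) format.
# The toy model and the file name "my_model.h5" are only placeholders.
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

model.save("my_model.h5")                           # architecture + weights in one h5 file
restored = keras.models.load_model("my_model.h5")   # load it back later for inference
```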
Outstanding explanation!!! I want to know why we need 200 epochs, since in each epoch all 1000 samples pass through the model.
Why is one epoch not enough, given that each epoch uses the whole dataset?
The solution will not converge in one epoch. You need many epochs for the model to minimize the loss function to a stage where your weights and biases are appropriately adjusted for the problem you are trying to solve. If your dataset is humongous, you may have a good solution after one epoch.
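A small illustration of that point (toy synthetic data, illustrative numbers): train for many epochs and compare the loss after the first and last epoch using the History object that fit() returns.

```python
# Toy illustration: the loss keeps dropping over many epochs, which you can
# verify from the History object returned by fit(). Data and numbers are synthetic.
import numpy as np
from tensorflow import keras

x = np.random.rand(1000, 8).astype("float32")
y = (x.sum(axis=1) > 4).astype("float32")            # simple synthetic labels

model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

history = model.fit(x, y, batch_size=32, epochs=200, verbose=0)
print(history.history["loss"][0])    # loss after epoch 1
print(history.history["loss"][-1])   # loss after epoch 200, much lower
```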
@@DigitalSreeni Thank you for your kind reply. Can you please make videos on topics like YOLO, RCNN, and Faster RCNN models? I cannot find any appropriate tutorial where someone teaches them from scratch.
Thanks for this helpful video
It's helpful, sir
Why is your batch size two to the power of n?
Is it because of the pixel size of the images?
Batch size can be any number, not necessarily power of 2. But using batch sizes of power of 2 can help with optimal memory usage. Here is a discussion that may help you: datascience.stackexchange.com/questions/20179/what-is-the-advantage-of-keeping-batch-size-a-power-of-2
@@DigitalSreeni thank you!
You are awesome. Thank you very much.
Thank you so much.
Always welcome
Thank you ! Great tutorial :)
Glad it was helpful!
Thanks 🙏