The Wrong Batch Size Will Ruin Your Model

Underfitted

19,735 views

Published Dec 25, 2024

COMMENTS • 32

@ErlendDavidson 2 years ago ⁺²⁴
If you scale the learning rate with the batch size (i.e. lr=(batch_size/32.)*0.01) then the stochastic gradient descent looks sort of okay here.
@underfitted 2 years ago
Interesting :)
@jasdeepsinghgrover2470 2 years ago ⁺²
I completely agree ... Because the number of updates, and even the size of each update, depends on the batch size. So if the learning rate is scaled linearly with the batch size, the model can perform very well even with much smaller batches.
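A minimal sketch of the scaling rule quoted above, for anyone who wants to try it. The reference values 32 and 0.01 come from the comment; the function names and the 1,000-sample dataset size are made up for illustration.

```python
import math

BASE_BATCH_SIZE = 32  # reference batch size from the comment
BASE_LR = 0.01        # learning rate used at the reference batch size


def scaled_lr(batch_size, base_lr=BASE_LR, base_batch_size=BASE_BATCH_SIZE):
    """Linear scaling: lr = (batch_size / 32) * 0.01."""
    return base_lr * batch_size / base_batch_size


def updates_per_epoch(num_samples, batch_size):
    """Number of weight updates in one pass over the data."""
    return math.ceil(num_samples / batch_size)


# Hypothetical dataset of 1,000 samples: smaller batches mean more,
# but smaller, steps per epoch once the learning rate is scaled.
for bs in (1, 8, 32, 128):
    print(bs, scaled_lr(bs), updates_per_epoch(1000, bs))
```

With batch size 1 this gives a learning rate 32 times smaller than the reference, which is probably why plain SGD stops looking so bad under this rule.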
@Metryk 11 months ago ⁺²
Hi! Maybe you can help me with this one: if I want to test an already pre-trained image classifier, how do I proceed regarding the amount of images used? The set containing test images has 100k images, I guess it wouldn't make any sense to load them all at once, so how do I proceed? Thanks!
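One common way to handle this is to run the test set through the model in chunks and accumulate the number of correct predictions. The sketch below assumes the model is in inference mode, so the batch size only affects memory and speed, not the result. `predict_batch` is a hypothetical stand-in for whatever your classifier exposes, and the toy "images" are just integers.

```python
def evaluate(predict_batch, images, labels, batch_size=256):
    """Return accuracy, feeding the model `batch_size` items at a time."""
    correct = 0
    for start in range(0, len(images), batch_size):
        batch_images = images[start:start + batch_size]
        batch_labels = labels[start:start + batch_size]
        predictions = predict_batch(batch_images)
        # Count matches in this chunk and add them to the running total.
        correct += sum(int(p == y) for p, y in zip(predictions, batch_labels))
    return correct / len(images)


# Toy stand-in data: 100k "images" whose label is their parity.
images = list(range(100_000))
labels = [i % 2 for i in images]


def toy_predict_batch(batch):
    """Hypothetical model call; a real one would run the classifier."""
    return [x % 2 for x in batch]


accuracy = evaluate(toy_predict_batch, images, labels, batch_size=256)
print(accuracy)
```

In practice the images would be loaded from disk one batch at a time instead of being held in a list, but the accumulation logic stays the same.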
@OliverHennhoefer 2 years ago ⁺⁴
Really like the videos. However, I want to warn against the general statement that a batch size of one is not recommended. It really depends on the problem/data. So don't simply dismiss stochastic gradient descent, try it!
@underfitted 2 years ago ⁺¹
I think that’s fair. I’ve never used it in any of the problems I’ve worked on, but you are right.
@edmundfreeman7203 2 years ago ⁺²
This is the kind of thing that I hate about deep learning. A single parameter in the optimization method can completely change the results. Batches should be small but not too small. How small? That's left to heuristics, and it will change across data sets.
@lakeguy65616 2 years ago ⁺³
So, what is the optimal batch size?
@underfitted 2 years ago ⁺¹
It depends. Start with 32 and experiment from there.
@lakeguy65616 2 years ago ⁺¹
@underfitted Does the amount of main memory (RAM) or GPU RAM make a difference? (great videos!)
@underfitted 2 years ago ⁺²
It does! Your batch has to fit in memory, or it won't work. When you are working with images, for example, you'll quickly find that your batch size can't be too large if you want to fit it in the GPU's memory.
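A rough back-of-the-envelope sketch of the point about images. It counts only the raw input tensor, not the weights, activations or gradients that also live in GPU memory, so treat it as a lower bound. The 224x224 RGB float32 image shape is an assumption for illustration, not something from the video.

```python
def batch_bytes(batch_size, height, width, channels, bytes_per_value=4):
    """Bytes needed to hold one batch of images as float32 values."""
    return batch_size * height * width * channels * bytes_per_value


# Assumed image shape: 224x224 RGB stored as 32-bit floats.
for bs in (32, 256, 2048):
    mib = batch_bytes(bs, 224, 224, 3) / 2**20
    print(f"batch_size={bs}: {mib:.1f} MiB for the inputs alone")
```

The input size grows linearly with the batch size, and the activations kept for backpropagation grow with it too, which is usually what runs out first.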
@ErlendDavidson 2 years ago ⁺⁵
What do you think of (artificially) adding noise to the learning rate? I feel like it used to be more popular to do that, but I almost never see it these days.
@underfitted 2 years ago ⁺²
Yeah… never seen that honestly. I’ve used schedules to decrease the learning rate over time, but never read about adding noise to it.
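For reference, a schedule that decreases the learning rate over time can be as simple as the step decay sketched here. This is one common variant, not necessarily the one used in the video; the drop factor, the interval and the function name are illustrative assumptions.

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Multiply the learning rate by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)


# Learning rate at a few epochs, starting from an assumed 0.1.
for epoch in (0, 9, 10, 25):
    print(epoch, step_decay(0.1, epoch))
```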
@johnmoustakas8897 2 years ago ⁺²
Good work, hope your channel gets more attention
@underfitted 2 years ago
Thanks, John! It takes time and work but I’ll make it happen.
@Agrover112 2 years ago ⁺²
Hey, love this video! Was losing touch with the basics!
@underfitted 2 years ago ⁺¹
Glad it was helpful!
@OmarBoukchana 1 year ago
I haven't seen a video as helpful as this one on the entire internet, thank you ♥
@underfitted 1 year ago
Glad it was helpful!
@Darkraak 2 months ago
Great video man 👏
@Levy957 2 years ago ⁺¹
Amazing!!
Do you know why the batch size is always 32, 64, 128?
@underfitted 2 years ago ⁺²
I read somewhere about the ability to fit batches in a GPU... can't remember where exactly. That being said, I've seen experiments that show that it really doesn't matter much (if at all.)
@MrAleksander59 2 years ago ⁺¹
It's better for memory usage. GPUs, CPUs, hard drives, SSDs and other hardware built on binary logic use memory blocks whose sizes are powers of 2: 2^5 = 32, 2^6 = 64, 2^7 = 128, etc. You always want maximum usage of memory. For example, if you have an array of floats, each float takes 32 bits, so the array's size in bits is at least divisible by 32.
@axelanderson2030 1 year ago
If you generate a dummy dataset and set a static learning rate, then smaller batch sizes work better? wtf?
@ziquaftynny9285 2 years ago
I love your presentation style! Very energetic :)
@underfitted 2 years ago
Thanks
@muhammadtalmeez3276 2 years ago
Your videos are amazing. Thank you so much for this great knowledge and beautiful videos.
@underfitted 2 years ago ⁺¹
Glad you like them!
@akshay0072 7 months ago ⁺¹
Good content. Try improving your way of teaching. Learning should be in a relaxed tone.
@underfitted 7 months ago ⁺¹
Thanks! This was an old video. I’ve tried to improve in the latest few.
@michaelsprinzl9045 8 months ago ⁺¹
A new cat video. Cute.
@sarahpeterson2702 1 year ago
The question is whether, if you use batches and reach the global minimum, your model is functionally equivalent to one trained without batching. Are the weights identical? No, they aren't. If your model is generative, you don't have equivalence between batch and non-batch training.
