Thanks for watching! 🙌 If you're new to Machine Learning, I'd love for you to take my FREE 4-hour introductory course: courses.dataschool.io/introduction-to-machine-learning-with-scikit-learn
Stratified splitting is indeed a valuable technique used in machine learning and data analysis to maintain the distribution of categorical variables, such as keywords in your example, in both training and test sets. This ensures that the data split accurately represents the overall distribution of the categorical variable, helping to mitigate potential biases and maintain data integrity.
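For anyone who wants to try this, here's a minimal sketch of a stratified split with scikit-learn (the dataset and column names are made up for illustration):

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset: 90 samples of class 0, 10 of class 1
df = pd.DataFrame({'feature': range(100),
                   'target': [0] * 90 + [1] * 10})
X = df[['feature']]
y = df['target']

# stratify=y keeps the 90/10 class ratio in both train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

print(y_train.value_counts(normalize=True))  # ~90% class 0, ~10% class 1
print(y_test.value_counts(normalize=True))   # same proportions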
As always, brilliant! Your tutorials motivated me to become a data scientist. Your tricks make me more confident every time in handling these types of queries.
Thank you so much Abhishek! 🙏
If we handle the class imbalance by first oversampling the minority class (using the SMOTE library, for example), is there any reason left to do a stratified split? I thought most models need balanced datasets for training, so handling the class imbalance is imperative, which would render stratified sampling moot since the classes are now balanced 50/50. Is this understanding correct or not?
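For context: a common practice is to do the stratified split first and oversample only the training set, so the test set keeps the real-world class ratio and stratification stays relevant. A minimal sketch, assuming the imbalanced-learn library and a synthetic dataset:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Synthetic imbalanced dataset: ~90% class 0, ~10% class 1
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# Stratified split first, so the test set keeps the real-world imbalance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Oversample the minority class in the training set only (avoids leakage)
X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)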
Thank you, Kevin, for your fantastic tips! I have a question: if we have a regression problem where one of the features has 3 unordered categories (every category has 100 samples), can we pass stratify=THAT_FEATURE to train_test_split, i.e. can we stratify on a feature when making the split? If not, how can we preserve the category proportions when splitting into train and test data? Thank you in advance!
I have the same question. Did you figure it out? I have a linear regression model, but I want to stratify on a feature (say, 3 locations). Will writing stratify=<the location feature> work, or what's the correct way?
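It should: the stratify parameter accepts any array-like of the same length as the data, not only the target, so a feature column works. A sketch with a hypothetical 'location' column:

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical regression data with a 3-category 'location' feature
df = pd.DataFrame({'location': ['A', 'B', 'C'] * 100,
                   'sqft': range(300),
                   'price': range(300)})
X = df[['location', 'sqft']]
y = df['price']

# stratify accepts any array-like, so a feature column works here
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=df['location'], random_state=42)

print(X_train['location'].value_counts())  # the 3 categories stay equally represented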
Please do a separate video on how to tackle class imbalance.
Thanks for the suggestion! But it's a far larger topic than a single video. In fact, I have already created two full chapters on this topic for my upcoming ML course... stay tuned!
What if we have a categorical feature (say 0 & 1) that is strongly correlated with the response class, and that feature also has class imbalance (more 1's than 0's)? How would we split it evenly across our training and testing data so that the proportions of 1's and 0's are similar in both?
Thank you! What about data with multi-labels?
What if the dataset is made of images?
Great question! Stratified sampling is concerned with the target values, not the features, thus the type of input data is not important. Hope that helps!
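To make that concrete, here's a sketch with a hypothetical list of image file paths: you stratify on the labels regardless of what X contains:

from sklearn.model_selection import train_test_split

# Hypothetical image dataset: X is just file paths, y holds the class labels
image_paths = ['images/img_%d.png' % i for i in range(100)]
labels = ['cat'] * 70 + ['dog'] * 30

# Stratification only looks at the labels, so X can be anything
train_paths, test_paths, train_labels, test_labels = train_test_split(
    image_paths, labels, test_size=0.2, stratify=labels, random_state=42)

print(test_labels.count('cat'), test_labels.count('dog'))  # 14 and 6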
What if we had 3 not fraud and 5 fraud? How would stratify=y split then?
Great question! There would be 1 "not fraud" in train and 2 in test, or 2 "not fraud" in train and 1 in test. For any dataset with a sufficient number of samples, it doesn't matter if the proportions are identical, just that they are close. Hope that helps!
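You can check this yourself with a tiny sketch (assuming a 50% test size, since 3 samples can't split evenly):

from sklearn.model_selection import train_test_split

# 3 "not fraud" and 5 "fraud" samples
y = ['not fraud'] * 3 + ['fraud'] * 5
X = list(range(8))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=42)

# Prints the train/test counts of "not fraud": 1 and 2, or 2 and 1
print(y_train.count('not fraud'), y_test.count('not fraud'))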
You're a genius!
Thanks!
Helpful! Thanks, Kevin!
You're welcome! Great to hear!
Thank you so much ! 😎
You're welcome!
Thanks Brother
You're welcome!