I ran into the same problem with an imbalanced dataset, and at the time I researched different methods to handle it. Here is the shortlist I put together, which might help you.
Possible Solutions:
1. Make changes to the algorithm
• Adjust the class weights so the model becomes sensitive to the minority class
• Adjust the decision threshold (we can pick it using the precision-recall curve)
• Penalize misclassification of the minority class, e.g. by setting class_weight='balanced'
2. Reframe the problem as anomaly detection
• Treat the majority class as "normal" and the minority class as anomalies, instead of doing plain classification
• For anomaly detection, Isolation Forest tends to give promising results
3. Balance the dataset by sampling
• Undersample
• Oversample (random oversampling or SMOTE)
4. Ensemble learning with downsampling
• Bootstrap different samples, balance each one by undersampling the majority class, and then aggregate the results by voting
5. Use other techniques
• Tomek links (removes majority-class points that form nearest-neighbour pairs with minority points, sharpening the class boundary)
• Focal loss
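Point 1 above can be sketched with scikit-learn; this is only a minimal illustration on made-up synthetic data, assuming a binary problem:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic imbalanced dataset (roughly 5% positives)
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight='balanced' re-weights the loss inversely to class frequency
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_tr, y_tr)

# Tune the decision threshold using the precision-recall curve
probs = clf.predict_proba(X_te)[:, 1]
prec, rec, thr = precision_recall_curve(y_te, probs)
f1 = 2 * prec[:-1] * rec[:-1] / np.clip(prec[:-1] + rec[:-1], 1e-12, None)
best_thr = thr[np.argmax(f1)]  # threshold maximizing F1 on this split
y_pred = (probs >= best_thr).astype(int)
```

In practice the threshold should be chosen on a validation split, not the test set; this sketch folds them together for brevity.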
I have also looked through Kaggle notebooks, where people found that XGBoost slightly outperforms other algorithms, although it still needs appropriate class weights.
This was my cheat sheet of the 5 ways. Share your thoughts!!
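On the anomaly-detection reframing (point 2), a minimal Isolation Forest sketch with scikit-learn, using made-up data where the minority class sits away from the bulk, might look like:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Majority class as "normal" points; minority class as shifted anomalies
normal = rng.normal(0, 1, size=(950, 2))
anomalies = rng.normal(6, 1, size=(50, 2))
X = np.vstack([normal, anomalies])

# contamination is set to the expected minority (anomaly) fraction
iso = IsolationForest(contamination=0.05, random_state=0).fit(X)
pred = iso.predict(X)  # +1 = inlier, -1 = flagged anomaly

n_flagged = int((pred == -1).sum())
```

Note this only works well when the minority class really behaves like an outlier population; overlapping classes break the assumption.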
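SMOTE itself lives in the imbalanced-learn package (imblearn.over_sampling.SMOTE); as a dependency-free sketch of point 3, plain random oversampling of the minority class with NumPy looks like this (toy data, hypothetical 90/10 split):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)  # 90/10 class imbalance

# Randomly resample minority rows (with replacement) up to the majority count
minority_idx = np.where(y == 1)[0]
n_extra = int((y == 0).sum()) - len(minority_idx)
extra = rng.choice(minority_idx, size=n_extra, replace=True)
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
```

Unlike SMOTE, this duplicates existing points rather than interpolating new ones, so it is more prone to overfitting; and either way it must be applied only to the training split.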
Very good explanation and thanks for putting the learning here. I will pin this comment on top for others' benefit.
My view - Data Science is all about trying/experimenting/failing and learning. Then something very good comes up.
@@UnfoldDataScience Won't lie, when I started watching your videos, your explanations made things much simpler. I used to freak out (sorry for the word) on hearing about DBSCAN, Hierarchical Clustering and what not, but when I see those topics explained by you I feel so comfortable that I know I can understand them. You explain things simply but accurately, without missing the important parts.
PS: I was introduced to the assumptions of linear regression by your channel. Before that I only knew the model; your videos taught me that there is something called "assumptions" and how important they are!! Totally missed by the instructors on online courses! Your channel is a huge contribution to the data science community on YT.
You are the best Data Science And Machine Learning Teacher I have ever seen. Thanks a lot!!
You are welcome!
Thank you sir, I was searching for methods for imbalanced data, and finally I found them in your video. Thank you so much once again. Of all the methods, which one is the best?
Learned new things about the balancing of data sets for Imbalanced data sets. Thanks.
Welcome.
Undersampling, oversampling minority class, combo, ensemble random forest, batch selection
An important concept when dealing with classification
Thanks for sharing Aman 👍👍
Thanks Nived.
Such a clear and concise explanation. Thank you, Aman!
Thanks A lot.
Good explanation
Hello Aman, this is the third time I have watched this video, simply to learn your way of explaining things. Is it possible for you to create a video or give us some notes so we can find all the important questions for ensembling techniques?
Thanks Mamata, I do keep sharing on Instagram, please follow "unfolddatascience" on Instagram.
Sure, Aman. Thank you
Awesome explanation. Can you please tell us when to use which technique? I mean with an example dataset and the choice of sampling technique.
Hi Aman,
In most videos we see imbalanced datasets for classification problems, but how do we check for and handle an imbalanced dataset in a regression problem?
Thanks,
Very nice explanation, kudos!
Thanks for liking Bijay
Very nice Aman
Thank you
Can I use oversampling or undersampling before splitting the dataset into training and testing?
Thank you so much! This video helped me a lot.
I have a question: how can we choose which method is the best fit for our imbalanced dataset?
It's always trial and error.
Thanks very much for sharing and explaining. What are your thoughts on logistic regression? Would imbalanced data still be a problem if you build the model as a GLM with logistic regression?
Can we use the SMOTE technique?
Does oversampling the entire dataset cause data leakage issues? And if I apply SMOTE to the train data after the train-test split, it gives poor precision for the minority class even though recall is OK... so what do I do to improve the precision of the minority class?
Hi Aman, thanks for explaining the various methods. One question: when to use which method?
Thanks Naga, there isn't a one-to-one rule for that. There are some pointers, which I can cover in a different video. Thanks for asking.
The video was good, but I request you to add some extra information and explanation for each method.
In my case I have more than 2 classes, and those classes are text labels, e.g. "well being", "business analytics" etc.
How do I balance such classes?
Which method is preferable?
This is discussed towards the end.
Thanks for sharing knowledge 🙂, please also share how to explain PCA in front of an interviewer.
ua-cam.com/video/osgqQy9Hr8s/v-deo.html
hey sir, how about imbalanced image data for deep learning?
Data augmentation is one option.
Hello sir, how do I solve a classification problem in which the training data has only one class? I get the error 'This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1'. Please help me out.
I am doing market basket analysis (MBA) with the Apriori algorithm on Google Colab. The problem is that when I use more than 20 rows of CSV transaction data, it displays an error. If the number of rows is less than 20, the expected result comes.
That's not a number-of-rows problem; there may be some hidden issue with row number 21. I am just guessing.
Please share the source of the notebook. Thanks.
drive.google.com/drive/u/0/folders/13pZrCIqk1XN6W4I95A07bK8YRHBB3btt
Hello
I'm keen to know from you how to deal with unbalanced panel data. How can I transform the data when there is a missing year?
github link