SMOTE (Synthetic Minority Oversampling Technique) for Handling Imbalanced Datasets

  • Published 2 Dec 2024

COMMENTS • 152

  • @bhattbhavesh91
    @bhattbhavesh91  5 years ago +29

    Something went wrong while using pd.crosstab! So the updated confusion matrices are as follows -
    At 7:50
    The correct confusion matrix is
    92303 14
    1535 135
    At 10:30
    The correct confusion matrix is
    93798 41
    40 108
    Sorry for the mistake :)
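
Precision, recall and F1 can be recomputed by hand from the two corrected matrices. The sketch below assumes the crosstab rows are predicted labels and the columns are actual labels. That orientation is an assumption, although both matrices share the column sums 93838 and 149, which is what a fixed test set would produce. The function name is illustrative.

```python
def precision_recall_f1(matrix):
    """Return (precision, recall, f1) for the positive class.

    Assumes matrix[i][j] counts samples predicted as class i whose
    actual class is j (rows = predicted, columns = actual).
    """
    (tn, fn), (fp, tp) = matrix
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Corrected matrices quoted in the comment above.
at_7_50 = [[92303, 14], [1535, 135]]
at_10_30 = [[93798, 41], [40, 108]]

for name, m in [("7:50", at_7_50), ("10:30", at_10_30)]:
    p, r, f1 = precision_recall_f1(m)
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Under that assumption the 7:50 matrix gives high recall with low precision, and the 10:30 matrix gives an F1 of about 0.73.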

    • @sahubiswajit1996
      @sahubiswajit1996 5 years ago +3

      Why are we using "random_state=12"?

    • @chrislam1341
      @chrislam1341 5 years ago

      @sahubiswajit1996 The value itself is just his preference; fixing it makes the random steps repeatable, so you get the same result on every run.

    • @sumitshukla3689
      @sumitshukla3689 4 years ago

      When we apply SMOTE, the number of samples doesn't change. But as you explained, if we are adding synthetic samples, the number of training examples should also increase, right?

    • @KumarHemjeet
      @KumarHemjeet 3 years ago

      @sahubiswajit1996 You can take any number.

    • @elliothank2823
      @elliothank2823 3 years ago

      I guess it's kinda off topic but does anybody know a good site to stream new tv shows online ?

  • @prathameshmohite3008
    @prathameshmohite3008 5 years ago +9

    Hi Bhavesh,
    Very good explanation. I was particularly confused about implementing SMOTE on the main data. But I guess you're correct that we must implement SMOTE on training data.
    Thank You

  • @bhargav7476
    @bhargav7476 4 years ago

    You have no idea how helpful that was

  • @SurajSingh-pw9ew
    @SurajSingh-pw9ew 4 years ago

    Thank you Bhavesh❣️❣️. You taught this without making it boring 👏🏻👏🏻👏🏻 excellent

  • @siddharthkenia9089
    @siddharthkenia9089 3 years ago

    Not only did you explain it really well, the illustrations were perfect for a beginner to understand what oversampling means. Thank you :)

  • @ddxccc
    @ddxccc 3 years ago

    Most helpful and professional video I found on SMOTE. Thanks a lot!

  • @bishalmohari8748
    @bishalmohari8748 3 years ago

    I started watching the undersampling video for a problem and ended up watching the full series because of how well explained they are. Glad I discovered your channel! Wish I had sooner xD

  • @zypeLLas
    @zypeLLas 1 year ago

    I'll come back to this video. Seems helpful!

  • @dhananjaykansal8097
    @dhananjaykansal8097 5 years ago +1

    Your handwriting is pretty. Thanks for the explanation once again. Cheers!

  • @shandou5276
    @shandou5276 3 years ago

    This is very well done :) Nothing overly flashy and yet very clear.

  • @bhuvneshsaini93
    @bhuvneshsaini93 5 years ago +4

    Hi, you used only two targets, 0 and 1. How do we do this with more than two? Suppose target 1 has around 2000 samples, target 2 around 200, target 3 around 11, and so on.

  • @KaushikJasced
    @KaushikJasced 2 years ago +3

    Thank you, sir, for a wonderful lecture. Can you tell me how I can set a sampling ratio of my choice instead of 1:1 using SMOTE?

  • @danielniels22
    @danielniels22 3 years ago

    At 6:20, which library did you import before instantiating the SMOTE() class?

  • @powellmenezes584
    @powellmenezes584 5 years ago +6

    I have this doubt too:
    Hi, you used only two targets, 0 and 1. How do we do this with more than two? Suppose target 1 has around 2000 samples, target 2 around 200, target 3 around 11, and so on.

    • @TheRaviraaja
      @TheRaviraaja 4 years ago

      arxiv.org/pdf/1106.1813.pdf - check out the algorithm; the neighbours do matter.

  • @harshparikh7060
    @harshparikh7060 3 years ago

    Thanks, Bhavesh!

  • @princeok12
    @princeok12 5 years ago +4

    Very well explained, thank you. I especially appreciated the explanation of nearest neighbours.

  • @charmilam920
    @charmilam920 3 years ago +1

    Thank you for this video. I understood SMOTE very well. Please make videos more often. How do you explain things so effortlessly and with such clarity? Where is this clarity coming from? Great job

  • @srikrshnap6036
    @srikrshnap6036 1 year ago

    Lovely Explanation! Thank you!

  • @nesrinehadjamar2197
    @nesrinehadjamar2197 1 year ago

    Thank you ! Simple and clear explanation

  • @sirvachjumani7215
    @sirvachjumani7215 3 years ago

    Hi Bhavesh, very nicely explained. Can you please point me to the literature for these examples? Thanks

  • @7810
    @7810 4 years ago +1

    Quite interesting! Thanks for the lesson.

  • @MarsLanding91
    @MarsLanding91 4 years ago +2

    Thank you for this video! 2 thumbs up! Question - at 4:06 you selected KNN = 3 but I didn't see you applying that concept in the code section. Can you please elaborate on where you set KNN as 3 in the code section? Did I misunderstand something?

    • @IykeDx
      @IykeDx 8 months ago

      When the number of neighbours (k_neighbors) is not specified, SMOTE uses the default of 5.

  • @thomasayele5389
    @thomasayele5389 1 year ago

    Excellent explanation!

  • @MY_PARIDE
    @MY_PARIDE 4 months ago

    Great Explanation....👏

  • @shishirdixit5996
    @shishirdixit5996 4 years ago +1

    After tuning the hyperparameters with GridSearchCV, why did you fit on X_train and y_train rather than on X_train_res and y_train_res?

  • @Nirja3
    @Nirja3 3 years ago +1

    When I try to set the SMOTE ratio, I get an invalid ratio parameter error for SMOTE. Can you help?

  • @AizirekTolonova-od1ks
    @AizirekTolonova-od1ks 6 months ago

    Thank you so much for the great explanation!

  • @ganeshreddypuli3101
    @ganeshreddypuli3101 3 years ago +1

    If we want to normalize the data as well, should we do it before applying SMOTE?

  • @shishirdixit5996
    @shishirdixit5996 4 years ago +1

    I have a categorical dependent variable with 3400 records, in which the counts of 0s and 1s are 2677 and 723 respectively. Will this be considered an imbalanced dataset? Or would it be considered imbalanced only if the 1s were less than 5% of the total records? Kindly clarify.

  • @JT2751257
    @JT2751257 4 years ago

    Cello Pointec - reminded me of my childhood :)

  • @adityaraikwar6069
    @adityaraikwar6069 1 year ago

    Very informative video, simple and to the point. Keep it up!

  • @jampavy6446
    @jampavy6446 2 years ago

    Nice explanation

  • @WordofSpirit
    @WordofSpirit 2 years ago

    Looks like the weights are also not working with SMOTE. Is there an alternative way to test different weights?

  • @bintehawa7712
    @bintehawa7712 1 year ago

    Thanks, explaining with notes helped me a lot.

  • @sparshdutta
    @sparshdutta 5 years ago

    Thanks for teaching new stuff.☺

  • @Asma-cx8uc
    @Asma-cx8uc 3 years ago

    Hello Sir!
    Could you please describe how the SMOTE technique can be used to balance image data?

  • @ankushjamthikar9780
    @ankushjamthikar9780 4 years ago

    Very good explanation. But can we use this method for a multiclass problem? Also, does SMOTE lead to overfitting?

  • @elaf8256
    @elaf8256 3 years ago

    How can we overcome the problem of overlapping when using SMOTE?

  • @karndeepsingh
    @karndeepsingh 4 years ago

    Very well explained sir!!!

  • @MrFcapri
    @MrFcapri 3 years ago

    Kindly tell me: I have an imbalanced dataset with 5 classes. Will SMOTE work for a multi-class dataset?

  • @0SIGMA
    @0SIGMA 3 years ago

    You are some DOPE shit brother and by that i mean youre really good ! explained the important stuffs like only on train set beautifully ! really great !

  • @priyas8871
    @priyas8871 2 years ago

    Can you please explain how SMOTE can be applied to streaming data, in a test-then-train framework?

  • @shwetasharma1996
    @shwetasharma1996 4 years ago

    Nice content! I would like to compare some oversampling techniques. Can you please help me get a from-scratch implementation of SMOTE rather than the packaged one? Thanks
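
A from-scratch version can be quite short. What follows is a simplified sketch of the interpolation idea described in the SMOTE paper, using only the standard library. It is not the imbalanced-learn implementation, and the function name, arguments and toy points are all illustrative assumptions.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric tuples (minority class only).
    k: number of nearest minority neighbours considered for each base sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours, excluding the base itself.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class points in two dimensions.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_synthetic=6, k=3)
print(len(new_points))
```

Every synthetic point falls between two existing minority points, so nothing is generated outside the region the minority class already occupies.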

  • @achyuthvishwamithra
    @achyuthvishwamithra 3 years ago

    When the final ratio came out to be 0.005, doesn't it imply that we are going to generate a very small number (0.005 * majority) of samples for the minority class? How will the number of minority class samples ever equal that of the majority class?
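
The arithmetic behind a float ratio can be checked with a few lines. This sketch assumes the float is read the way imbalanced-learn documents a float sampling_strategy for binary problems: the desired minority-to-majority ratio after resampling. The helper name and the counts are hypothetical and are not taken from the video's dataset.

```python
def synthetic_needed(n_majority, n_minority, ratio):
    """Synthetic minority samples needed so that
    minority / majority is about `ratio` after oversampling."""
    target_minority = round(ratio * n_majority)
    if target_minority < n_minority:
        raise ValueError("ratio is below the current minority/majority ratio")
    return target_minority - n_minority


# Hypothetical class counts.
print(synthetic_needed(n_majority=100_000, n_minority=200, ratio=0.005))  # 300
print(synthetic_needed(n_majority=100_000, n_minority=200, ratio=1.0))    # 99800
```

So a ratio of 0.005 raises the minority class only to 0.5% of the majority class. The two classes end up equal in size only when the ratio is 1.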

  • @UsmanAhmedKhi
    @UsmanAhmedKhi 3 years ago

    Thanks a lot. You make it so simple :) Liked and subscribed, bro.

  • @sadiaafrin7143
    @sadiaafrin7143 4 years ago

    Good work man! Thanks

  • @mirroring_2035
    @mirroring_2035 2 years ago

    In your crosstab function you have y_test[target]. What is that? Why is target used to index the y_test object?

  • @sridhar6358
    @sridhar6358 4 years ago

    So the idea of treating SMOTE's ratio parameter as a hyperparameter is to get better results, is that correct? In general, is it a good option to tune the ratio as a hyperparameter rather than fixing it to 1?

  • @spadbob24
    @spadbob24 3 years ago

    thank you so much - very informative video

  • @helll5894
    @helll5894 4 years ago +1

    What if there are more than 2 classes? In your video, sir, there are only 2 classes. For example, I want to use 3 classes. How can I apply SMOTE to 3 classes in Python? Thank you, sir

  • @hieunguyenvan6590
    @hieunguyenvan6590 2 years ago

    Do you need to remove outliers from the dataset if you use SMOTE?

  • @bhagyashreeln1304
    @bhagyashreeln1304 2 years ago

    Hi, what do we do if we have a balanced dataset but still want to increase the number of rows?

  • @channel-lk6xz
    @channel-lk6xz 11 months ago

    I don't understand what we infer from the AUC-ROC curve. What are we seeing there, and what values are plotted?

  • @abhishekwagh8246
    @abhishekwagh8246 4 years ago

    I have a sample of only 28. Unfortunately I don't have more sample. Will SMOTE work? Secondly, which logistic regression should be used? Sklearn or statsmodels? Both give different results. Please help.

  • @kokl123ify
    @kokl123ify 3 years ago

    Hi Bhavesh, could you please confirm: to ensure that oversampling doesn't reduce the accuracy of the model, should we always use hyperparameter tuning, or is there some other method to undo the damage of oversampling in logistic regression for attrition prediction?

  • @jgubash100
    @jgubash100 3 years ago

    Well explained

  • @AnupKumar-nz2qq
    @AnupKumar-nz2qq 4 years ago

    After generating the synthetic data, in what kind of situation is this data useful? Are there any limitations to this type of data?

  • @syedshaulhameed
    @syedshaulhameed 3 years ago

    How do I split my data into training and testing if my data is imbalanced?
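
One common answer is a stratified split, which keeps the class proportions the same in the training and test sets. The sketch below shows the idea using only the standard library. The function name, the 30% test fraction and the toy labels are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=12):
    """Return (train_idx, test_idx), splitting each class in the same proportion."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within the class, then carve off the test share.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 990 + [1] * 10  # hypothetical 1% minority class
train_idx, test_idx = stratified_split(labels)
print(sum(labels[i] for i in test_idx), "minority samples in the test set")
```

In scikit-learn the same effect comes from passing stratify=y to train_test_split.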

  • @harishbagul1813
    @harishbagul1813 3 months ago

    Can you tell me whether I should do scaling before or after SMOTE?

  • @clintpaul6653
    @clintpaul6653 2 years ago

    Can I apply sampling to the test set too, because it's also very unbalanced? Please reply.

  • @advaitshirvaikar4751
    @advaitshirvaikar4751 4 years ago

    Hey, when I try using make_pipeline(SMOTE(), SVC())
    it gives me an error :
    All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'SMOTE(k_neighbors=5, kind='deprecated', m_neighbors='deprecated', n_jobs=1,
    out_step='deprecated', random_state=None, ratio=None,
    sampling_strategy='auto', svm_estimator='deprecated')' (type ) doesn't
    what's going wrong here

    • @bhattbhavesh91
      @bhattbhavesh91  4 years ago +1

      The SMOTE function has changed after I created this video! Please refer to the documentation!

  • @makhboulame9654
    @makhboulame9654 3 years ago

    Can SMOTE be used for Multi label classification dataset ?
    Thank you

  • @TejaDuggirala
    @TejaDuggirala 5 years ago

    Good work bro.. thank you

  • @hosseinroosta5154
    @hosseinroosta5154 1 year ago

    Really, thanks♥️

  • @alanblitzer744
    @alanblitzer744 4 years ago

    You are great bro

  • @VINODKUMARIYA
    @VINODKUMARIYA 1 year ago

    Thank you sir !

  • @debatradas1597
    @debatradas1597 2 years ago

    Thank you so much Sir

  • @sourishmukherjee2404
    @sourishmukherjee2404 3 years ago

    The final SMOTE ratio for the final model after grid search CV was 0.0005. Does that imply that the ratio (minority class / majority class) = 0.005? Then how is the minority class getting oversampled to the same proportion as the majority class?

  • @anshumanagrahri7816
    @anshumanagrahri7816 4 years ago

    Hi, can you please explain how to use SMOTE on time series and sequential data?

    • @bhattbhavesh91
      @bhattbhavesh91  4 years ago +1

      You are a Google search away from an answer!

  • @dipankarrahuldey6249
    @dipankarrahuldey6249 3 years ago

    With SMOTE, can we achieve higher f1 in practice? I saw that f1 was around 0.72

  • @harshavardhansvlkkb2290
    @harshavardhansvlkkb2290 3 years ago

    Can we apply SMOTE to the target column of a dataset?

  • @akhilthekkedath1850
    @akhilthekkedath1850 5 years ago +1

    Sir, could you please make a video on outlier detection?

    • @bhattbhavesh91
      @bhattbhavesh91  5 years ago

      I have already created a video on outlier detection.
      Link - ua-cam.com/video/2Qrost474lQ/v-deo.html

  • @kavanalipanahi3505
    @kavanalipanahi3505 4 years ago

    True positives are 0 in the confusion matrix (by the formula, precision and recall should both be zero). So how did you get such a high number (over 70%)?

  • @mramesh7085
    @mramesh7085 3 years ago

    Nice explanation

  • @rishisolanki554
    @rishisolanki554 5 months ago

    Really helpful

  • @randomforrest9251
    @randomforrest9251 4 years ago

    How does SMOTE work with categorical data?

  • @akhilyeduresi8145
    @akhilyeduresi8145 3 years ago

    Getting these errors:
    __init__() got an unexpected keyword argument 'ratio'
    AttributeError: 'SMOTE' object has no attribute 'fit_sample'

  • @deeptigupta518
    @deeptigupta518 4 years ago

    Smote can only be used in Logistic Regression or any classification model

  • @dhananjaykansal8097
    @dhananjaykansal8097 5 years ago

    Shouldn't it be generate_auc_roc_curve(pipe, X_test)? If not, Bhaveshbhai, could you or anyone explain, please?

  • @saptarshibhattacharya1253
    @saptarshibhattacharya1253 1 year ago

    Can you elaborate with a random forest algorithm in Google Colab?

  • @ashishraj5882
    @ashishraj5882 4 years ago

    again ROC auc curve is used ??

  • @OriginalBernieBro
    @OriginalBernieBro 4 years ago

    The SMOTE ratio parameter is deprecated. For my imbalanced dataset, the support column of sklearn's classification_report is still imbalanced even after applying SMOTE.

    • @bhattbhavesh91
      @bhattbhavesh91  4 years ago

      The SMOTE function has changed after I created this video! Please refer to the official documentation!

  • @DanielWeikert
    @DanielWeikert 5 years ago +1

    If we use SMOTE in the pipeline, does it upsample only during training, or also on the test data when we call predict? Thanks

  • @bintehawa7712
    @bintehawa7712 1 year ago

    Please start a playlist for beginners to learn AI and ML.

  • @Eny11111
    @Eny11111 3 years ago

    Thanks 👍

  • @hamzaraouia8975
    @hamzaraouia8975 4 years ago

    I got this error when trying to run SMOTE:
    __init__() got an unexpected keyword argument 'ratio'
    Any clues?
    Thank you

    • @GurunathHari
      @GurunathHari 4 years ago +1

      You must have figured it out by now. I'm only a student. The ratio parameter has been deprecated since the video was made a year ago.
      Try this instead: sm = SMOTE(random_state=42, sampling_strategy='minority')

    • @bhattbhavesh91
      @bhattbhavesh91  4 years ago

      Thanks Gurunath for sharing this!

  • @AnkitGupta-ec4pi
    @AnkitGupta-ec4pi 4 years ago

    Very well explained, sir. Thank you.

  • @deepikadusane9051
    @deepikadusane9051 4 years ago

    Hi Bhavesh, I used your SMOTE code but I am getting an error about ratio, i.e. invalid parameter ratio for estimator SMOTE. How do I resolve this?

    • @bhattbhavesh91
      @bhattbhavesh91  4 years ago

      I guess the function has changed! Do have a look at the documentation to learn more about it!

  • @soumyadeeparinda1692
    @soumyadeeparinda1692 4 years ago

    Can you please share the notebook with us using Google Colab?

  • @wenhongzhu8637
    @wenhongzhu8637 4 years ago

    Hi, can you share the dataset?

  • @bhagwatchate7511
    @bhagwatchate7511 4 years ago

    Nice

  • @burhanrashidhussein6037
    @burhanrashidhussein6037 5 years ago

    Does SMOTE guarantee an improvement in classifier performance?

    • @bhattbhavesh91
      @bhattbhavesh91  5 years ago

      Nope, it doesn't! It only upsamples your data by generating artificial samples. How well the model performs depends on how well separated your classes are!

  • @travelsome
    @travelsome 5 years ago

    Perfection

  • @sanyajain2127
    @sanyajain2127 4 years ago

    Getting an error: ValueError: Unknown label type: 'continuous-multioutput'

  • @guico3lho
    @guico3lho 1 year ago

    At the end of the video, how did all 4 metrics score above 70% if the model did not correctly predict any of the samples labelled 1? There were 0 true positives and 63 false negatives!

  • @atwinemugume
    @atwinemugume 5 years ago

    Thanks

  • @dastola8330
    @dastola8330 5 years ago

    What is the use of defining random_state?
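
Fixing the seed makes a random procedure repeatable, which is the role random_state plays in scikit-learn and imbalanced-learn. The sketch below illustrates the effect with Python's own random module as an analogy. It is not the libraries' internal code, and the helper name is made up for the example.

```python
import random


def shuffled(items, seed=None):
    """Return a shuffled copy; a fixed seed makes the order repeatable."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    return items


data = list(range(10))
# Same seed, same order on every call and every run.
print(shuffled(data, seed=12) == shuffled(data, seed=12))  # True
```

The particular number, 12 or anything else, does not matter; what matters is that it stays the same between runs.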

  • @niyazahmad9133
    @niyazahmad9133 4 years ago

    Smote__ratio is not a parameter of SMOTE. Please help me out.

    • @bhattbhavesh91
      @bhattbhavesh91  4 years ago +1

      The SMOTE function has changed after I created this video! Please refer to the official documentation!

  • @The_Option_Seller_Room
    @The_Option_Seller_Room 4 years ago

    How do you handle extremely imbalanced data for a regression problem?

  • @dhananjaykansal8097
    @dhananjaykansal8097 5 years ago

    Lovelyyyyyyy