Simple and clean explanation of a complicated concept. Thank you.
This is an excellent explanation and is remarkably well-structured
Good video! What doesn't convince me much about SMOTE is that it basically creates new individuals in between minority class samples. When you work with real datasets, unfortunately, you usually never have nicely and cleanly separated classes. Instead you have a few minority samples sitting right in the middle of a BIG, dense conglomerate of majority samples. I think applying SMOTE in such a scenario might create artificial samples of the minority class with the same characteristics as samples of the majority class. How would you proceed in such a scenario?
In that case, I would probably try using a custom loss function.
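For anyone curious what that could look like in practice, here is a minimal sketch of a weighted log loss in NumPy, assuming a single weight A that scales the minority-class term (the name A and the exact form are illustrative and may not match the formula used in the video):

```python
import numpy as np

def weighted_log_loss(y_true, p, A=10.0):
    """Log loss where errors on the minority class (label 1) are scaled by A."""
    eps = 1e-12
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(A * y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Tiny example: two minority samples, three majority samples
y_true = np.array([1, 0, 0, 0, 1])
p_hat = np.array([0.3, 0.1, 0.2, 0.05, 0.8])
print(weighted_log_loss(y_true, p_hat, A=10.0))
```

With A > 1, a confident mistake on a minority sample costs much more than the same mistake on a majority sample, which is the whole point of the weighting.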
Nice illustration, thanks!
The best video of them all; most of the others are just rushed.
In SMOTE, what's the point of setting K when you then randomly choose just one of the neighbors? Doesn't it implicitly mean that K = 1? I see now: with this approach, when k = 2 we can sometimes pick the 2nd-nearest neighbor instead of the 1st.
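To make the role of k concrete, here is a rough sketch of the interpolation step (a simplified illustration, not the reference SMOTE implementation): because the neighbor is drawn at random from all k nearest neighbors, k > 1 means synthetic points are not always pulled toward the single nearest neighbor.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like_sample(X_minority, k=5, n_new=10, seed=0):
    """Rough SMOTE-style oversampling: each new point is interpolated between a
    minority sample and a randomly chosen one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)  # +1: each point is its own nearest neighbor
    _, idx = nn.kneighbors(X_minority)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_minority))   # pick a minority sample at random
        j = rng.choice(idx[i][1:])          # pick ONE of its k neighbors at random -- this is where k matters
        gap = rng.random()                  # interpolation factor in [0, 1)
        synthetic.append(X_minority[i] + gap * (X_minority[j] - X_minority[i]))
    return np.array(synthetic)

X_min = np.array([[0.0, 0.0], [1.0, 0.2], [0.8, 1.0], [0.2, 0.9], [1.2, 1.1]])
print(smote_like_sample(X_min, k=3, n_new=5))
```

So K controls how spread out the synthetic points can be; with K = 1 every synthetic point would lie on the segment toward the same single nearest neighbor.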
Great explanation
Great video!
How do you decide a value for A in the custom loss function?
But how do you choose between undersampling, oversampling, and SMOTE?
the online store company?
For the log loss function, isn't p a value in [0, 1] rather than a prediction?
Try using embedded structured data with different classification thresholds penalizing the backprop error, custom loss functions altering the gradient, unbalanced learning rates for different classes, and pretraining on a resampled data set. Batch normalization and dropout for regularization. I use TensorFlow since it is easily modified. Resampling, no matter what kind, just does not generalize well.
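As one concrete illustration of the "custom loss functions altering the gradient" idea, here is a minimal tf.keras sketch with a hand-written weighted cross-entropy; the class weights, layer sizes, and W_* names are hypothetical placeholders, not anything prescribed in this comment or the video:

```python
import tensorflow as tf

# Hypothetical per-class weights: minority-class (label 1) errors cost 10x more.
W_MINORITY = 10.0
W_MAJORITY = 1.0

def weighted_bce(y_true, y_pred):
    """Binary cross-entropy with per-class weights, so the gradient pushes
    harder on minority-class mistakes."""
    y_true = tf.cast(y_true, y_pred.dtype)
    eps = tf.keras.backend.epsilon()
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    loss = -(W_MINORITY * y_true * tf.math.log(y_pred)
             + W_MAJORITY * (1.0 - y_true) * tf.math.log(1.0 - y_pred))
    return tf.reduce_mean(loss)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.BatchNormalization(),   # regularization, as suggested above
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss=weighted_bce, metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=10, batch_size=32)
```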
That is absolutely correct! I was only dealing with the edge cases to build out a stronger intuition for the formula.
If you think about the output of a logistic regression, for example, the prediction will take the form p = 1 / (1 + e^(-w * x)), where w is your weight vector and x is your input vector. Taking the simplest case, where x is one-dimensional, if you plot this formula (viz. 1 / (1 + e^(-w*x))) for different possible values of x, you'll see that its output spans the range [0, 1]. This means that the prediction can be any value between 0 and 1. Most packages will round this value to the nearest integer, which might be why you're thinking that predictions can't actually be a value in [0, 1].
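A tiny sketch of that, assuming a hypothetical 1-D weight w, just to show that the raw predictions lie strictly between 0 and 1 and only become 0/1 labels after thresholding:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = 2.0                          # hypothetical 1-D weight
x = np.linspace(-5, 5, 11)       # a few 1-D inputs
p = sigmoid(w * x)               # raw predictions, all strictly between 0 and 1
labels = (p >= 0.5).astype(int)  # what most packages report after rounding/thresholding

print(np.round(p, 3))
print(labels)
```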
🐇