Hey everyone! Did you like this tutorial? Please let me know your thoughts below!
Great tutorial. How do you do feature selection on imbalanced data? Should we do it with the original (imbalanced) data or with the oversampled data (SMOTE)? Which is better?
@@deepaksurya776 hey Deepak, glad you like the tutorial. There is no right or wrong answer here; it depends. The approach I follow is to run it on both, see if the results differ, and try to understand why.
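A minimal sketch of that "run it on both" comparison, on synthetic data and using the imbalanced-learn package for SMOTE (none of this is from the video itself):

# compare feature importances on the original vs. SMOTE-oversampled data
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=1000, n_features=8, weights=[0.9, 0.1], random_state=42)

rf_orig = RandomForestClassifier(random_state=42).fit(X, y)
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
rf_smote = RandomForestClassifier(random_state=42).fit(X_res, y_res)

for i, (a, b) in enumerate(zip(rf_orig.feature_importances_, rf_smote.feature_importances_)):
    print(f"feature {i}: original={a:.3f}  smote={b:.3f}")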
Thanks
Can all these steps also be used when one wants to become a machine learning engineer?
@@joe8ones yes, of course!
You showed me some steps that I had never thought of doing. Thank you for your input. Hope to see more.
Glad you liked the video! Will keep them coming!
Loved it. I learned from it because it is a full project from start to finish. Thank you for sharing your knowledge.
Glad it was helpful!
This is one damn useful tutorial!! Thanks for it :D
It is very well explained in my opinion
I am amazed by your explanations. There are several good tutorials out there, but yours are the best so far. I also like how detailed your tutorial is in terms of steps, and how you use the sklearn manual to find the code for the different methods. Thank you.
Thank you very much!! More to come in the coming weeks!
This guy is awesome. To think I hadn't come across your videos until now! I'm really enjoying your content. Well done and good job.
Glad you like them!
Thank you so much. This is great. I like your video because it is very practical and comprehensive so I can understand the whole process. Thank you again.
Feature Selection at 11:30
This is good stuff. Congratz on the video and keep up with the great work!
Thanks Victor! Glad you liked it!
Incredible tutorial! Please keep up the good work.
Liked and subscribed, because you put good effort into this video to teach us how to do this in Python step by step, instead of merely jumping to the model-fitting part.
Great video on the steps of building a logistic regression! I just wish you could zoom in some on the coding screen.
Glad you liked it! Noted!
Very informative second part tutorial! Keep it up!
Great tutorials!
So sick of little 5-minute oversimplified tutorials. This is the real deal that doesn't skimp. Thanks for sharing!
Glad it was helpful!
Hi! Great video! It would be nice if you could make your notebooks a bit larger though, the text is pretty small.
Hey Tamas, yes you are right! I have started doing it in the new videos!
Great work, thank you! One question: you used feature importance on all of the data before the train/test/validation split. Shouldn't we do it only on the training data? Because we don't know which features are going to be important in the future, on data we have never seen before.
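A minimal sketch of that idea, on synthetic data with made-up variable names: fit the feature-importance model on the training split only, so the test set stays unseen.

# fit feature selection on the training split only to avoid leakage
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

selector = SelectFromModel(RandomForestClassifier(random_state=42))
selector.fit(X_train, y_train)            # importances learned from training data only
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)   # same selection applied to the unseen split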
You made my day... thanks!
Glad you liked the video!
Great video! Very helpful. I was wondering about something, though. I've heard that it's not a good idea to remove dummy variables, unless you just drop 1, but in your case you didn't use some of them because they weren't important. Doesn't that cause a problem because it's basically as if you're removing some of the levels of a certain categorical variable?
Each case is different, so I cannot say what is right and what is wrong. Why is it not a good idea to remove dummy variables? One solution would be to test both with and without the dummy variables and compare the performance.
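A hedged sketch of that with-vs.-without comparison; the DataFrame and its column names ("channel", "converted") are made up for illustration:

# compare cross-validated accuracy with all dummy columns vs. a reduced set
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 70, 200),
    "channel": rng.choice(["email", "ads", "organic"], 200),
    "converted": rng.integers(0, 2, 200),
})

dummies = pd.get_dummies(df["channel"], prefix="channel", dtype=int)
X_full = pd.concat([df[["age"]], dummies], axis=1)
X_reduced = X_full.drop(columns=["channel_email"])  # drop one "unimportant" level

for name, X in [("all dummies", X_full), ("reduced", X_reduced)]:
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, df["converted"], cv=5)
    print(name, round(scores.mean(), 3))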
It's really inspiring to see you coding :)
Glad you enjoy it!
What do we do in case we have an imbalanced dataset? Let's say it is an e-commerce website with a conversion rate of 1-3%?
Great tutorial
I would like to know how to deal with the outliers found using the boxplot.
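One common option (not from the video, just a standard recipe) is the same 1.5 x IQR whisker rule the boxplot itself uses, either capping the values or dropping the rows:

# cap (winsorize) or drop values outside the boxplot's 1.5*IQR whiskers
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 95, 11, 10, 200])  # toy data with two outliers
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

capped = s.clip(lower, upper)           # option 1: cap values at the whiskers
kept = s[(s >= lower) & (s <= upper)]   # option 2: drop the outlier rows
print(capped.tolist())
print(kept.tolist())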
Thanks
Glad you liked it!!
Sir, I'm encountering a problem while getting the dummies: they come out as True/False values. How do I convert them into numeric 0/1?
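Recent pandas versions return boolean dummies by default; passing dtype=int gives 0/1 directly (the "city" column here is made up):

# pandas >= 2.0 returns True/False dummies; dtype=int produces 0/1 instead
import pandas as pd

df = pd.DataFrame({"city": ["NY", "LA", "NY"]})
dummies = pd.get_dummies(df["city"], dtype=int)  # 0/1 columns
# or convert an existing boolean frame afterwards: dummies.astype(int)
print(dummies)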
Hi! Great video!
Would you please provide links to the details of feature importance in decision tree and random forest models?
Thanks in Advance :)
Good one... thanks!
Most welcome 😊
If your dependent y variable instead had 3 values (say 0, 1, 2), rather than being binary, what would be the interpretation of feature importance in that case?
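A small sketch of what changes with three classes, on synthetic data: scikit-learn's logistic regression fits one coefficient row per class, while a random forest still reports a single importance per feature.

# with 3 classes, coef_ has one row per class; tree importances stay one vector
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=5, n_informative=3,
                           n_classes=3, random_state=42)

lr = LogisticRegression(max_iter=1000).fit(X, y)
print(lr.coef_.shape)                   # (3, 5): one coefficient row per class

rf = RandomForestClassifier(random_state=42).fit(X, y)
print(rf.feature_importances_.shape)    # (5,): one overall importance per feature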
What do you consider unbalanced data? How do you deal with it? Actually, I have an unbalanced dataset and I'm wondering how to deal with it using a logistic model. Thanks!
It depends on a lot of factors, to be fair. Anything outside of 60-40 is worth looking at to decide if you should balance it, but every case is different. To balance datasets you can look at up-sampling or down-sampling, then feed the result into your model and check the performance.
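A minimal up-sampling sketch with sklearn.utils.resample on a toy 80/20 dataset; down-sampling is the same idea applied to the majority class with replace=False:

# up-sample the minority class until the two classes are the same size
import pandas as pd
from sklearn.utils import resample

df = pd.DataFrame({"x": range(10), "y": [0] * 8 + [1] * 2})  # 80/20 imbalance
majority = df[df["y"] == 0]
minority = df[df["y"] == 1]

minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_up])
print(balanced["y"].value_counts())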
@@YiannisPi Thank you very much for your tutorials. Please do one on imbalanced datasets.
Dummy variable trap? Shouldn't you set drop_first=True in the get_dummies function?
Dummy variable trap +1
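For reference, a tiny sketch of what drop_first=True does (the "city" column is made up): one level per categorical becomes the all-zeros baseline, which avoids perfect collinearity.

# drop_first=True drops the first level, which becomes the implicit baseline
import pandas as pd

df = pd.DataFrame({"city": ["NY", "LA", "SF", "NY"]})
print(pd.get_dummies(df["city"], drop_first=True, dtype=int))  # "LA" is the baseline here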
How do you plot the logistic regression model?
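One common way (a sketch, not the video's code) is to plot the fitted probability curve over a single feature:

# plot the fitted sigmoid: predicted P(y=1) as a function of one feature
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 200).reshape(-1, 1)
y = (X.ravel() + rng.normal(0, 1.5, 200) > 5).astype(int)

model = LogisticRegression().fit(X, y)
grid = np.linspace(0, 10, 300).reshape(-1, 1)

plt.scatter(X, y, s=10, alpha=0.5, label="data")
plt.plot(grid, model.predict_proba(grid)[:, 1], color="red", label="P(y=1)")
plt.xlabel("x")
plt.ylabel("probability")
plt.legend()
plt.show()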
Best video
Glad you liked it!
Maybe it would be better to split this video at the start of the 'logistic regression' part; 45 minutes is a long time for one video. Also, the resolution of the screen is not very good. In particular, I liked part 1 a lot. Please try not to speak so quickly in the next videos! Thank you for creating content and sharing it with us ;)