One of the best, most underrated and most helpful channels I have ever found on YouTube
thank you so much 😇
I was searching for the best content to learn Machine Learning and Data Science. This is the best resource I have found online. Thank you, brother.
Please don't stop. You are the best.
Thanks. I love your explanations, I follow them very easily.
Thank you so much for this.
Great explanation, I will keep going with this course!
What is the use of the random_state parameter, bro? You said that if we give the same value, the data will be split the same way as yours. But my question is: how does it split, and what does it do?
great video well explained thanks
Amazing 😍
Should we apply data standardization before train test split or after train test split ??
After split, ik I am late in case you haven't figured it out yet.
Absolutely before splitting. After splitting means you need to standardize the train and test data separately.
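For anyone comparing the two answers above: a common recommendation is to split first, fit the scaler on the training rows only, and then reuse those training statistics on the test rows (so the test data is not standardized with its own mean and standard deviation). Here is a minimal sketch of that order, assuming scikit-learn's `StandardScaler` and made-up random data rather than the dataset from the video:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up data (assumption): 100 rows, 3 numeric features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
y = rng.integers(0, 2, size=100)

# Split first, so the test rows play no part in computing the statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # mean/std learned from train only
X_test_std = scaler.transform(X_test)        # train mean/std reused on test
```

With this order the test set stays unseen until evaluation, which is the usual argument for splitting before standardizing.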
Thanks for your effort
My pleasure
Thank you😊
Could you please tell me about inplace=True and inplace=False? When do we use them?
When you give inplace=True, the change is saved in your original dataframe. If you leave it out (the default is inplace=False), the method returns a new dataframe and the original stays unchanged unless you assign the result back.
@@Siddhardhan thanks..got it!
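A small sketch of the `inplace` behaviour described above, using pandas `DataFrame.drop` on a made-up two-column dataframe (the column names are placeholders, not from the video):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Default (inplace=False): a new dataframe is returned, df itself is untouched.
new_df = df.drop(columns="b")
cols_after_copy = list(df.columns)      # df still has both columns

# inplace=True: df itself is modified and the call returns None.
result = df.drop(columns="b", inplace=True)
cols_after_inplace = list(df.columns)   # df now has only column "a"
```

A frequent mistake is writing `df = df.drop(..., inplace=True)`, which replaces `df` with `None`.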
Thanks for the video. However, the standardization is wrongly done before splitting the data.
can you tell me some mandatory and important steps to follow from getting a dataset to making a final model, which can be used on each and every project?
1. Data collection
2. Data cleaning & processing
3. Exploratory data analysis
4. Model selection
5. Model Training
6. Tuning the parameters & optimization of the model
7. Model evaluation
8. Deployment
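To make the list above more concrete, here is a rough sketch of steps 2 and 4 to 7 in scikit-learn. Everything specific in it is an assumption for illustration: the data is synthetic (`make_classification` stands in for data collection), logistic regression is just one possible model choice, and the small grid over `C` is a placeholder for real tuning. Exploratory analysis and deployment are left out.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Step 1 stand-in (assumption): synthetic data instead of a real dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=2)

# Hold out a test set before any fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# Steps 2, 4, 5: preprocessing and model chained into one estimator.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step 6: try a few regularization strengths with 5-fold cross-validation.
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Step 7: evaluate the tuned model once on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```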
1. What is the difference between data preprocessing and data analysis?
2. How to deploy model?
In data preprocessing, we make the data suitable to feed to a machine learning model: for example, standardizing the data, splitting the data, converting text data to numerical data, etc. (You can refer to the videos in the 4th Module.) Data analysis is about understanding the data: seeing which features are related to each other, finding correlations, etc. That's the difference.
For deployment, you can use tools like Flask. Deployment won't be covered in this course. We can do it later.
@@Siddhardhan Thank you. I got the difference between data preprocessing and data analysis.
You're welcome 😇
Will exploratory data analysis be covered here?
Hey, I watched your data standardization video, and I think in that video we split the data before we standardized it, but here we standardized it first. Is that a problem or not? Thank you.
I watched it too. He said that "we can also standardize the data before splitting, but if our data has some outliers then that would be a problem, because outliers are abnormal; it's better to split the data first and then standardize." Maybe there are no outliers in this diabetes data?
@@farahamirah2091 @Sezer Mezgil
He said that about Placement_Dataset, which is regression based, so outliers are possible there. In this video he is talking about the diabetes dataset, which is a classification one where outliers hold no meaning because there are only two possible values, 1 or 0, whereas in the previous case many different salaries are possible.
sick👍
Okay!
Thank you
The content is good stuff, but it would be more pleasant to listen to if the talking were a bit slower (0.75x) in future videos. Other than that, big thanks!
Thanks for the tip😇 will work on that
Hi, I think the pace is good. Maybe you could go to the settings and set the playback speed to 0.75.
Hi,
Is "data analysis, train-test split, data preprocessing" better, or
"data preprocessing, analysis, train-test split"?
Which order is best?
My doubt is that if we do preprocessing and analysis first, it may lead to bias, as we impute/drop some data.
And if we do the splitting at the end, it may cause feature leakage, as the imputed values or transformations are based on the whole data.
Could you please clarify?
I think we do Data Preprocessing, Analysis, Train test split
You can read the comments above to understand the concepts
(And we all are waiting for Mr Siddhardhan's answer to your question)
Hi! It depends on the dataset and the approach we take. I often prefer doing analysis, data preprocessing, and then the train-test split. Data leakage won't cause a problem every time; it tends to matter most when the data has outliers. If we find that the model is overfitting, it may be a sign that data leakage has occurred. In that case we can do analysis, then the train-test split, and then data preprocessing. This is the method I follow.
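On the leakage concern in the question above: one way to keep imputed values from depending on the test rows is to split first and fit the imputer on the training rows only. The sketch below assumes scikit-learn's `SimpleImputer` with mean imputation and made-up data with roughly 10% missing values; it is an illustration, not the approach used in the video.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Made-up data (assumption): 200 rows, 4 features, ~10% of values missing.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X[rng.random(X.shape) < 0.1] = np.nan
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=2
)

# Fit on the training rows only, then reuse those column means on the test rows.
imputer = SimpleImputer(strategy="mean")
X_train_filled = imputer.fit_transform(X_train)
X_test_filled = imputer.transform(X_test)

# For comparison: means over the whole dataset would also include test rows.
whole_data_means = np.nanmean(X, axis=0)
train_only_means = imputer.statistics_
```

The gap between `whole_data_means` and `train_only_means` is usually small, which is why leakage of this kind often goes unnoticed, but splitting first removes the question entirely.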
How can you visualize your split data in folders using the split function, please?
9:14
Can you tell us how 'random_state' works?
It is the seed for the pseudo-random number generator, which lets you reproduce the same train_test_split each time you run it.
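To see what that means in practice: with the default settings, `train_test_split` shuffles the rows using a pseudo-random generator and then cuts the shuffled rows into a train part and a test part according to `test_size`. `random_state` fixes the seed of that generator, so the shuffle (and therefore the split) comes out the same on every run. A minimal sketch with toy arrays (the data and the seed value 2 are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 toy samples, 2 features each
y = np.arange(10)                 # one toy label per sample

# Two calls with the same seed shuffle the rows identically.
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, test_size=0.2, random_state=2
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, test_size=0.2, random_state=2
)

same_split = (
    np.array_equal(X_test_a, X_test_b) and np.array_equal(y_test_a, y_test_b)
)
print(same_split)
```

If `random_state` is left out, a different shuffle is used on each run, so results such as accuracy can vary slightly between runs.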