That was intense!!!
This is probably the first time I have watched a tutorial this long without any break
You are Awesome sir
Thanks a lot Aakash for the fabulous explanations and infectious passion to empower others. These tutorials are simply unmatched! Bravo!
Thanks for the feedback, help us spread the word :)
@@jovianhq sir what can we do if there is a column of string type values like disease name and symptoms
This video is still one of the best. A literal game changer!
Nicely explained Akash and Jovian Team..this was probably the most thorough and clearly explained tutorial I came across
great explanation with reasonable depth for this topic, such a great video...
Thanks for the feedback, help us spread the word :)
Really, a lecture full of knowledge
Thanks for the feedback, help us spread the word :)
Nice Video....Really appreciated. Can we also include the topic of setting up data pre processing pipelines in future sessions.
Thanks for the feedback and suggestion!
Thank you, this was very beginner friendly and it helped me understand a lot of practical topics.
You're very welcome! Glad it was helpful.
Great content Aakash sir , that too free...really amazed and impressed by jovian !
Glad you liked it!
Hello, I have a question. Should we scale the features before or after imputation? Here you imputed the raw_df dataframe, which had not been scaled yet. Thanks!
I have a doubt. When we do imputation, we replace the missing values with the mean of each column, computed over the entire dataset.
That mean will differ from the means computed separately on train_df, val_df, and test_df, which could create some discrepancy in the final result. What's your position on this? Should we impute based on the entire dataframe or on each subset separately?
A sample of the data should represent the entire dataset. Also, the validation and test sets should be independent of the training set, so imputation can be done separately for the validation set and the training set.
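One common convention (not necessarily what the video does) is to compute the imputation statistic on the training set only and reuse it for the validation and test sets; a minimal sketch with scikit-learn's SimpleImputer and made-up toy data:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data: one numeric column with missing values
train = np.array([[1.0], [3.0], [np.nan]])   # train mean (ignoring NaN) = 2.0
val = np.array([[np.nan], [10.0]])           # val's own mean is never used

imputer = SimpleImputer(strategy="mean")
imputer.fit(train)                   # statistics come from the training set only
train_imp = imputer.transform(train)
val_imp = imputer.transform(val)     # NaN in val is filled with the TRAIN mean (2.0)

print(val_imp.ravel())  # [ 2. 10.]
```

This keeps the validation and test sets "unseen" by the preprocessing step, at the cost of the small discrepancy discussed above.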
Thank you for such a detailed lecture. Very very helpful. Would love to know about more.
Glad it was helpful! Go to zerotogbms.com for more lectures on Machine Learning
Salute Boss. This is wholesome 💝💝
Great video! I learned a lot! Thank you!
So the higher the weight, the more important the column is (but only if the numerical columns are scaled)? If the data is not scaled, we cannot draw this conclusion?
True! Also, it's not just higher weights: the more negative the weight, the more important it is too. The weights closest to 0 have the least importance.
I was working on a mini data science project in which test.csv and train.csv datasets were given to me. I trained my model using the training data. Now if I want to find the accuracy score of my model on the testing data, what should I do? If I write model.predict(test_data), how do I compare the predicted testing values to the true values? There are no target values in the testing dataset.
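When the provided test.csv has no target column (as in Kaggle-style competitions), one common workaround is to hold out part of the labelled train.csv as your own test set and score on that; a hypothetical sketch with generated stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in for train.csv: features X with known targets y
X, y = make_classification(n_samples=200, random_state=42)

# Hold out 20% of the labelled data as a private "test" set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
preds = model.predict(X_te)
acc = accuracy_score(y_te, preds)   # accuracy on data whose labels we know
print(acc)
```

The unlabelled test.csv is then only used for generating predictions to submit, not for computing accuracy yourself.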
Very good tutorial, elaborate and detailed. Thanks!
Thanks for the feedback, help us spread the word :)
(1:53:40) When you plot the weights, the negative weights are not considered.
But negative weights also affect the model, just in the opposite direction.
What are your thoughts? Should the negative weights be considered?
Yes, the negative weights should be considered. In fact, you can try ignoring the columns whose weights are close to 0. Both negative and positive weights affect the model in some way.
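Since both signs matter, a common way to rank features (assuming the inputs were scaled) is by the absolute value of the coefficients; a sketch with hypothetical weights:

```python
import pandas as pd

# Hypothetical learned weights for three scaled features
weights = pd.Series({"age": 0.1, "rainfall": -2.3, "humidity": 1.5})

# Rank by magnitude: the sign gives direction, |weight| gives influence
ranked = weights.reindex(weights.abs().sort_values(ascending=False).index)
print(ranked)
# rainfall   -2.3
# humidity    1.5
# age         0.1
```

Here "rainfall" is the most influential feature even though its weight is negative, and "age" (closest to 0) matters least.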
Hey, also isn't it common practice to fit the scaler only on the training set and then use it to transform the test and validation data?
would you mind switching to dark mode?
TIA
1:45:00 Even after fitting the transformed columns into the model as you did, I am still getting a TypeError.
finished watching
Thanks sir... but how do we deploy the model on a website?
At 1:35:35 (encoder transform), I am getting an error that "Columns must be same length as key". Please tell me how to resolve it.
Facing the same problem too
@@andme-tech102 Try using sparse_output=False in the OneHotEncoder constructor.
3 hrs worth watching
Thanks for watching!
Information Leakage
timestamp: 1:25:10 — He fitted the scaler on the whole numerical dataset and used it to transform the train, validation, and test sets. But isn't that information leakage, since the scaler saw the test and validation data while fitting?
Well, if you have access to the validation dataset, you can do scaling on the training and validation both. Generally, you won't be able to touch the test dataset so we shouldn't fit scaler/encoder on the test dataset.
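To avoid the leakage described above, the usual convention is to fit the scaler on the training rows only and then transform all the splits with it; a minimal sketch with toy numbers:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[0.0], [10.0]])   # train min = 0, max = 10
test = np.array([[5.0], [20.0]])    # contains a value outside the train range

scaler = MinMaxScaler().fit(train)  # statistics from the training set only
print(scaler.transform(train).ravel())  # [0. 1.]
print(scaler.transform(test).ravel())   # [0.5 2. ]
```

Note the side effect: test values can fall outside [0, 1] when they exceed the training range, which is expected and usually harmless.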
Thanks a lot, bro. It's a nice dataset and you covered it very well from start to end.
excellent brother!
Thank you very much.🙏
Nice lecture
Great content
1:26:54 I can't understand why the max value in some columns is not 1 — shouldn't it be 1?
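One likely explanation: the scaler was fit on the full dataset, but the split you are inspecting may simply not contain the row where a column hits its overall maximum, so that split's max ends up below 1. A sketch with toy data:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

full = np.array([[1.0], [4.0], [9.0]])   # column max is 9 -> scales to 1.0
scaler = MinMaxScaler().fit(full)

subset = full[:2]                        # this "split" is missing the max row
print(scaler.transform(subset).ravel().max())  # 0.375, not 1.0
```

The column-wide maximum still maps to 1; it just lives in a different split.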
Thank you🙂
Welcome!
FINISHED CODING FULL
Hi, I noticed that at 1:53:44 you are making a prediction using the train inputs (X_train)... but shouldn't you be making a prediction using the validation inputs instead? I don't think you passed X_val into any of the logistic regression model predictions... or am I just confused? HAHAHA.
Please check ua-cam.com/video/sjIzfC4AOI0/v-deo.html, at first we're predicting with the train set, later we are also predicting with the validation and test sets.
@@jovianhq I am sorry ahahah. you are right. I must have missed this part.
Thank you, this is so good! Thanks again.
You're welcome!
Sir, will you continue these videos?
Yeah, this is a course on ML. The new video structure is provided on his website: jovian.ai/learn/machine-learning-with-python-zero-to-gbms
amazing
THANKYOU!
bookmark 1:03:15 .. for me imp part start here
0:58:00
1:39:11
What's a solver?
Hey, please go through the blog to know more about solvers. -> towardsdatascience.com/dont-sweat-the-solver-stuff-aea7cddc3451
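In short, a solver is the numerical optimisation algorithm scikit-learn uses to fit the model, chosen via the solver parameter; a sketch with generated data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=0)

# 'liblinear' and 'lbfgs' are two of the solvers LogisticRegression supports;
# they reach (approximately) the same fit by different numerical routes
for solver in ["liblinear", "lbfgs"]:
    model = LogisticRegression(solver=solver).fit(X, y)
    print(solver, model.score(X, y))
```

The blog linked above goes into when each solver is the better choice.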
Please add subtitles.
Hey we are in the process of adding subtitles to videos, it will be added soon. Thanks!
@@jovianhq thanks! you are doing great!
1:00:55
Wow!
1:56:49 nice
Here is another simplified Logistic Regression tutorial if you are a beginner: ua-cam.com/video/tcjR8JYSb9E/v-deo.html
1:18:01
1:06:09
1:08:30