Sir, because of your motivation I participated, and now I feel capable of doing some Kaggle work. A very nice approach to guiding others.
I am a 1st-year CSE student... what are the main prerequisites to be comfortable doing Kaggle?
@@jayadithyanalajala9604 You need to first go through machine learning and deep learning tutorials and do a lot of assignments or case studies; then you'll be comfortable with Kaggle. If you jump directly into Kaggle, it may be difficult to understand :)
@@teja2775 Thank you, sir.
Big, big, big thanks for these videos... please continue this Kaggle competition on the same dataset and show us how we can reverse-engineer and iterate on our solution.
Keep motivating aspiring data scientists. Thank you sir.
Thanks. These 2 videos were great to get me started
Good stuff. Appreciate your effort. Can you please make a video covering a scenario that involves CNNs and hyperparameter tuning w.r.t. CNNs?
Krish,
You are really doing awesome work. I really appreciate it. Please keep doing it. You are superb and very good at explaining every step. 👏
I'm really thankful Krish for such informative videos, great work
Nice! As you have already promised, please let us know regarding opportunities in Europe (especially Germany) for data science. Please guide us through the entire step-by-step process to crack interviews, along with resume preparation. Let us know how to approach someone on LinkedIn.
Thanks in advance!
Looking forward to this pls 😊
It was a nice tutorial.. thank you..
Informative video. A suggestion would be to fix the audio.
Thank you, sir. Much obliged!
Hi Krish, will a LightGBM model perform better in this scenario with Stratified K-fold? Or a CatBoost model, in place of XGBoost? Asking because I am currently implementing LightGBM for a Kaggle competition and it is doing comparatively better than XGBoost.
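Not from the video, but in case it helps: a minimal sketch of evaluating an LGBMClassifier with StratifiedKFold. The synthetic data and parameter values here are placeholders, not the competition dataset or Krish's settings.

```python
# Hedged sketch: cross-validating LightGBM with StratifiedKFold on synthetic data.
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

model = LGBMClassifier(
    n_estimators=500,
    learning_rate=0.05,
    num_leaves=31,       # main capacity knob in LightGBM
    random_state=42,
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print("AUC per fold:", scores)
print("Mean AUC:", scores.mean())
```

Whether it beats XGBoost is dataset-dependent, so comparing both with the same CV splits is the fairest test.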
How did you define the parameter grid for random search? Is there any anchor we can refer to when deciding the ranges? Thanks.
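One common (but not authoritative) rule of thumb is to start from the library defaults and spread values a few times above and below them. This is just an illustrative search space for an XGBoost regressor, not the exact grid from the video:

```python
# Illustrative RandomizedSearchCV search space for XGBoost (ranges are guesses).
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBRegressor

param_distributions = {
    "n_estimators": randint(100, 1000),
    "max_depth": randint(3, 10),
    "learning_rate": uniform(0.01, 0.3),   # samples from [0.01, 0.31)
    "subsample": uniform(0.6, 0.4),        # samples from [0.6, 1.0)
    "colsample_bytree": uniform(0.6, 0.4),
}

search = RandomizedSearchCV(
    XGBRegressor(objective="reg:squarederror", random_state=42),
    param_distributions=param_distributions,
    n_iter=50,
    cv=5,
    scoring="neg_root_mean_squared_error",
    random_state=42,
    n_jobs=-1,
)
# search.fit(X_train, y_train)  # X_train / y_train are your own data
```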
Can you make videos on Text Analytics, NLP, and Text Mining concepts?
Can a box plot be used for better clarity here?
Nice and informative one. I have one query: what if I have some categorical variables with 50+ levels? (I can't drop them, since they are important variables.) I tried dummy variable creation, but the accuracy is lower. Any assistance on how I can deal with this?
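Not the approach from the video, but for 50+ levels, frequency (count) encoding or target encoding are common alternatives to one-hot dummies. A minimal sketch with made-up column names:

```python
# Hypothetical example: encoding a high-cardinality column "city" without dummies.
import pandas as pd

df = pd.DataFrame({"city": ["pune", "delhi", "pune", "mumbai", "delhi", "pune"],
                   "target": [1, 0, 1, 0, 1, 1]})

# Frequency encoding: replace each category by how often it occurs.
freq = df["city"].value_counts(normalize=True)
df["city_freq"] = df["city"].map(freq)

# Mean target encoding: replace each category by the mean target of its group.
# (In practice compute this inside CV folds to avoid target leakage.)
target_mean = df.groupby("city")["target"].mean()
df["city_target_enc"] = df["city"].map(target_mean)

print(df)
```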
Can you make a video about an MS in Business Analytics? I think it will get a lot of traction.
My question is: you are going to build XGBoost using both train and test, but we don't have the target array for test, so how can we include the test set?
You shouldn't train using the test data; the cross-validation is done on the train data itself.
Hi Krish, you make very nice videos. One question regarding the deletion of features based on correlation: how do you decide which feature to delete among two features that are correlated?
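One common heuristic (not necessarily what Krish does in the video) is to keep the one that is more correlated with the target, or simply drop one arbitrary member of each highly correlated pair. A rough sketch of the latter:

```python
# Illustrative sketch: drop one feature from each pair with |correlation| above a threshold.
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    corr = df.corr().abs()
    # Look only at the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Example with made-up data: "b" is almost a copy of "a", so it gets dropped.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1.1, 2.0, 3.2, 3.9], "c": [7, 1, 5, 2]})
print(drop_correlated(df).columns.tolist())
```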
Sir, I have a small question. I have a dataset of 1,000 rows containing integer and floating-point values. Is there any way to generate more data based on it? I want the machine to learn this small dataset and generate more similar data, up to 20,000 rows. Is it possible, and how can I do it? Kindly reply.
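Generating 20,000 realistic rows from 1,000 is risky, and this is not something covered in the video. A very naive way to augment purely numeric data is bootstrap resampling with a little Gaussian noise, sketched below; for anything serious, look at dedicated tools (e.g. SMOTE for classification, or generative models).

```python
# Naive augmentation sketch: resample existing rows and add small noise.
# This does NOT create genuinely new information; it only jitters what you have.
import numpy as np
import pandas as pd

def augment(df: pd.DataFrame, n_new: int, noise_scale: float = 0.05, seed: int = 0) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    sampled = df.sample(n=n_new, replace=True, random_state=seed).reset_index(drop=True)
    noise = rng.normal(0.0, noise_scale * df.std(ddof=0).values, size=sampled.shape)
    return sampled + noise

original = pd.DataFrame({"x1": np.random.rand(1000), "x2": np.random.randn(1000)})
bigger = pd.concat([original, augment(original, n_new=19000)], ignore_index=True)
print(bigger.shape)  # (20000, 2)
```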
Thank you so much
ValueError: DataFrame.dtypes for data must be int, float, bool or category. When categorical type is supplied, DMatrix parameter `enable_categorical` must be set to `True`.
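Not from the video, but two ways people usually get past this error: convert the pandas categorical columns to numeric codes before building the DMatrix, or pass `enable_categorical=True` (needs a reasonably recent XGBoost). A hedged sketch with placeholder column names:

```python
# Sketch of two workarounds for the DMatrix categorical-dtype ValueError.
import pandas as pd
import xgboost as xgb

X = pd.DataFrame({"color": pd.Categorical(["red", "blue", "red"]), "size": [1.0, 2.0, 3.0]})
y = [0, 1, 0]

# Option 1: convert categorical columns to their integer codes.
X_codes = X.copy()
X_codes["color"] = X_codes["color"].cat.codes
dtrain = xgb.DMatrix(X_codes, label=y)

# Option 2: keep the categorical dtype and tell XGBoost about it explicitly.
dtrain_cat = xgb.DMatrix(X, label=y, enable_categorical=True)
```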
I have applied the same model to my individual project, in which I have around 200 data points. I am getting an overfitted model; it doesn't work well on the test data. Can you please tell me what I can do?
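With only ~200 rows, a heavily constrained booster plus cross-validation usually behaves better than a deep one. A rough illustration; the parameter values are just guesses to show which knobs to turn, not tuned settings:

```python
# Illustrative sketch: regularising XGBoost for a tiny dataset and checking CV error.
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

model = XGBRegressor(
    n_estimators=200,
    max_depth=2,          # shallow trees
    learning_rate=0.05,
    subsample=0.8,        # row subsampling
    colsample_bytree=0.8, # column subsampling
    reg_lambda=5.0,       # L2 regularisation
    min_child_weight=5,   # require more samples per leaf
    random_state=0,
)

scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
print("CV RMSE:", -scores.mean())
```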
great video.
Sir, how will you combine train and test, given that we don't have predictions for the test set?
He will first remove the output feature from the training set and then combine both the train and test sets.
@@durgeshkumar1023 That is fine, but how will it be useful for training? Or will he be doing this just to gain insight from the data to decide which columns to drop?
@@rishisharma5249 At 10:40 he said that he will be combining the predicted output for the test set with the train set and then predicting for that test set again.
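For what it's worth, the most common reading of "combine train and test" is concatenating the feature columns only (no target) so that encodings and engineered features stay consistent, then splitting back before modelling. A hedged sketch; the column names and values are made up, not the competition data:

```python
# Sketch: concatenate train and test features for consistent encoding, then split back.
import pandas as pd

train = pd.DataFrame({"city": ["pune", "delhi"], "area": [900, 1200], "price": [50, 80]})
test = pd.DataFrame({"city": ["mumbai", "pune"], "area": [1000, 800]})  # no "price" column

y = train["price"]
combined = pd.concat([train.drop(columns=["price"]), test], keys=["train", "test"])
combined = pd.get_dummies(combined)          # identical dummy columns for both parts

X_train = combined.xs("train")
X_test = combined.xs("test")
print(X_train.columns.tolist() == X_test.columns.tolist())  # True
# model.fit(X_train, y); predictions = model.predict(X_test)
```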
I am currently doing a project on detecting malicious websites, but I am having difficulty finding the features of the dataset, like URL server, special characters, remote address... Can you help me find these attributes? Please, sir. 🙏
Can I have a look at the dataset?
When will you start online class for machine learning
Bro... update your GitHub repo. It seems you haven't uploaded the updated notebook to your repo! Anyway, nice job... Thanks!
I have already mentioned it in the video: please perform the hyperparameter tuning yourself by writing the code, so that you can practise.. :)
@@krishnaik06 Sorry bro... I hadn't watched your video completely, so I didn't notice that... Now I've watched it... Thank you!
Why does my XGBoost model predict negative values??
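Not specific to this video, but XGBoost regression with the squared-error objective can output negative numbers even when the target is always positive. Two usual fixes are clipping predictions at zero or modelling log(1+y); whether either fits your problem is an assumption on my part:

```python
# Sketch: two ways to handle negative predictions for a strictly positive target.
import numpy as np
from sklearn.datasets import make_regression
from xgboost import XGBRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)
y = np.abs(y) + 1.0   # pretend the target is strictly positive (e.g. a price)

model = XGBRegressor(objective="reg:squarederror", random_state=0)

# Option 1: train on the raw target and clip predictions at zero afterwards.
model.fit(X, y)
preds_clipped = np.clip(model.predict(X), 0, None)

# Option 2: train on log1p(y) and invert with expm1, which stays positive in practice.
model.fit(X, np.log1p(y))
preds_log = np.expm1(model.predict(X))
print(preds_clipped[:3], preds_log[:3])
```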
Can you tell me how many hours data scientists work per week in service-based companies?
Probably all of them. =|
Brilliant..!!
Why create another model with the best parameters? Doesn't RandomizedSearchCV return a final model that already has the best parameters?
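You're right that refitting by hand is often unnecessary: with refit=True (the default), RandomizedSearchCV refits the best parameter combination on the full data and exposes it as best_estimator_. A small self-contained sketch with synthetic data:

```python
# Sketch: using the model that RandomizedSearchCV has already refit for you.
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, random_state=0)

search = RandomizedSearchCV(
    XGBClassifier(eval_metric="logloss", random_state=0),
    param_distributions={"max_depth": randint(2, 8), "n_estimators": randint(50, 300)},
    n_iter=10,
    cv=3,
    refit=True,          # default: refit the best parameters on all of X, y
    random_state=0,
)
search.fit(X, y)

best_model = search.best_estimator_   # already trained; no need to rebuild it
print(search.best_params_)
print(best_model.predict(X[:5]))
```

Building a fresh model from best_params_ is mainly useful if you want to retrain on different data or tweak the parameters further.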
When I tried linear regression, I got a high MSE on the cross-validation data. I don't know what I missed. Any suggestions?
Root Mean Square Error train = 0.08183219186293908
Root Mean Square Error CV = 6982770.936819642
Make sure your train features and test features are the same, and that the label encoding is the same for both.
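One hedged way to guarantee identical columns after one-hot encoding is to reindex the test (or CV) frame to the train columns. A small illustration with made-up frames:

```python
# Sketch: aligning one-hot encoded train and test frames to the same columns.
import pandas as pd

train = pd.DataFrame({"city": ["pune", "delhi", "pune"], "x": [1, 2, 3]})
test = pd.DataFrame({"city": ["delhi", "mumbai"], "x": [4, 5]})  # unseen category "mumbai"

train_enc = pd.get_dummies(train)
test_enc = pd.get_dummies(test)

# Reindex test to the train columns: missing dummies become 0, extra ones are dropped.
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)

print(train_enc.columns.tolist())
print(test_enc.columns.tolist())
```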
Your voice is not clear when you share the screen
Ideally we should not touch the test data during the training phase, from what I've heard.
In a Kaggle problem we can.
@@krishnaik06 Hi sir, when you say combine the train and test sets, since you don't have access to the true y_test values, do you mean you will first train the model on the initial train set, then use that model on X_test to predict the y_test values? Once you have those predictions, you combine X_test with the predicted y_test to form your test group, which you randomly combine with the train group to make a full dataset. Once that's done, you repeat the process with a much more similar train and test set. If I understand correctly, is this what you mean?
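If the intent is what you describe, it matches what is usually called pseudo-labelling. A rough sketch of that loop, as I read it (this is my interpretation, not necessarily exactly what Krish meant; the data here is synthetic):

```python
# Hedged pseudo-labelling sketch: predict labels for the test set, append the
# most confident ones to the train set, and retrain.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, _ = train_test_split(X, y, test_size=0.3, random_state=0)

model = XGBClassifier(random_state=0)
model.fit(X_train, y_train)                      # 1. train on the labelled data

proba = model.predict_proba(X_test)              # 2. predict the unlabelled test set
confident = proba.max(axis=1) > 0.9              # 3. keep only confident predictions
pseudo_y = proba.argmax(axis=1)[confident]

X_aug = np.vstack([X_train, X_test[confident]])  # 4. combine and retrain
y_aug = np.concatenate([y_train, pseudo_y])
model.fit(X_aug, y_aug)
print("pseudo-labelled rows added:", confident.sum())
```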
audio quality is low
The mic sounds very bad.