Xgboost Classification Indepth Maths Intuition- Machine Learning Algorithms🔥🔥🔥🔥
- Published 5 Feb 2025
- XGBoost is a decision-tree-based ensemble Machine Learning algorithm that uses a gradient boosting framework. In prediction problems involving unstructured data (images, text, etc.) artificial neural networks tend to outperform all other algorithms or frameworks.
All Playlist In My channel
Complete ML Playlist : • Complete Machine Learn...
Complete NLP Playlist: • Natural Language Proce...
Docker End To End Implementation: • Docker End to End Impl...
Live stream Playlist: • Pytorch
Machine Learning Pipelines: • Docker End to End Impl...
Pytorch Playlist: • Pytorch
Feature Engineering : • Feature Engineering
Live Projects : • Live Projects
Kaggle competition : • Kaggle Competitions
Mongodb with Python : • MongoDb with Python
MySQL With Python : • MYSQL Database With Py...
Deployment Architectures: • Deployment Architectur...
Amazon sagemaker : • Amazon SageMaker
Please donate if you want to support the channel through GPay UPID,
Gpay: krishnaik06@okicici
Discord Server Link: / discord
Telegram link: t.me/joinchat/...
Please join as a member in my channel to get additional benefits like materials in Data Science, live streaming for Members and many more
/ @krishnaik06
Please do subscribe my other channel too
/ @krishnaikhindi
Connect with me here:
Twitter: / krishnaik06
Facebook: / krishnaik06
instagram: / krishnaik06
#xgboostclassifier
#xgboost
We are near 250k. Please do subscribe my channel and share with all your friends. :)
Krish Naik, please make a video on decision tree pruning with mathematical details
Lgbm is Missing
@@tamildramaclips8548 Depends on your college. Which college with these branches are you talking about?
@@tamildramaclips8548 You should definitely go with ECE. Since AI DS is a very new branch there is no surety how your college would groom the students with this branch. Also your college is not a national level college. So you shouldn't take any risk. That's all my suggestion.
sir could you make any video for a roadmap of machine learning engineer??
Man, this guy is now coming in my dreams. Who else have been binge watching his channel for months?
😂😂
I am learning data science from him
Same here 😂😂😂 But this man should be given nobel prize for inspiring the present and future generations!
I have started following his machine learning series..And it's very nice..
I am also doing data science course simultaneously . His videos are helping a lot .
HAHAHAHA ! You are being haunted by Ghost Naik
Great video. Understood in depth
I have jotted down the processing steps from this video:
1. We have the data
2. Construct the base learner
3. The base learner predicts a probability of 0.5; compute the residuals (actual value - 0.5)
4. Construct the decision tree as follows
Computing Similarity Weights: (∑Residual)^2 / (∑P(1-P) + lambda)
- Compute the Similarity Weight of the root node
- Compute the Similarity Weight of the left-side decision node & its leaf nodes
- Compute the Similarity Weight of the right-side decision node & its leaf nodes
Computing Gain = Leaf1 Similarity W + Leaf2 Similarity W - Root Node Similarity W
- Compute the Gain of the root node & the left side of the decision node and its leaf nodes
- Compute the Gain of the root node & the right side of the decision node and its leaf nodes
- Compute the Gain of the other feature combinations for the decision node and its leaf nodes
- Select the root node, decision nodes and leaf nodes that have the highest gain
5. Predict the probability = Sigmoid(log(odds) of the base learner's prediction + learning rate * (output of the decision tree))
6. Compute the new residual = actual value - predicted probability
7. Repeat steps 2 to 6; by the end of the iterations, the residuals will be minimal.
8. Make test predictions with the model from the iteration that has the minimal residual
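The steps above can be sketched in a few lines of Python. This is only a minimal illustration of one boosting round, not the XGBoost library's actual implementation: the four-row dataset, the fixed split, the learning rate of 0.3 and all function names are assumptions made for the example.

```python
import math

def similarity(residuals, prev_probs, lam=0.0):
    """Similarity weight: (sum of residuals)^2 / (sum of p*(1-p) + lambda)."""
    return sum(residuals) ** 2 / (sum(p * (1 - p) for p in prev_probs) + lam)

def leaf_output(residuals, prev_probs, lam=0.0):
    """Leaf output value: same denominator, but the numerator is not squared."""
    return sum(residuals) / (sum(p * (1 - p) for p in prev_probs) + lam)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pick(values, indices):
    """Select the entries of `values` at the given row indices."""
    return [values[i] for i in indices]

# Hypothetical toy data: 4 rows with binary labels (not the video's table).
labels = [1, 1, 0, 0]
base_prob = 0.5                                      # step 3: base learner
probs = [base_prob] * len(labels)
residuals = [y - p for y, p in zip(labels, probs)]   # actual - predicted

# Step 4: one candidate split, given as row indices on each side (assumed).
left, right = [0, 1], [2, 3]
root_sim = similarity(residuals, probs)
left_sim = similarity(pick(residuals, left), pick(probs, left))
right_sim = similarity(pick(residuals, right), pick(probs, right))
gain = left_sim + right_sim - root_sim

# Step 5: update each row in log-odds space, then map back with the sigmoid.
learning_rate = 0.3                                  # hypothetical value
base_log_odds = math.log(base_prob / (1 - base_prob))
new_probs = list(probs)
for side in (left, right):
    out = leaf_output(pick(residuals, side), pick(probs, side))
    for i in side:
        new_probs[i] = sigmoid(base_log_odds + learning_rate * out)

# Step 6: the new residuals come from the actual labels, not the old residuals.
new_residuals = [y - p for y, p in zip(labels, new_probs)]
print(gain, new_probs, new_residuals)
```

In this toy case every residual shrinks in magnitude after a single round, which is the behaviour step 7 relies on.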
What if the output has several classes (0, 1, 2, 3)? The average would be 1.5, which is more than 1, so it can't be a probability like the 0.5 given to the base learner. What should we do in that case?
Thank you🙏
@@manojsamal7248 yes bro..same question ...did you get the answer of this?..please let me know..
@@manojrangera not yet bro
@@manojsamal7248 I was thinking if there are 4 classes then probability will be 1/4 = .25 and if there are 5 then 1/5 =.20 because we are calculating probability ..I will confirm this but I think this is right..
Great work Krish. Don't ever lose your passion for teaching, you're a natural. I appreciate how you simplify the details.
Hats off to you Krish for doing so much hardwork so that we can learn each and every concept of ML, DataScience!
I have been desperately waiting for this for the last 7 months... now I will complete the machine learning playlist💥
Thank you Krish.. god bless you😀
Guys, please watch out for a mistake made at 16:10: for credit >50, (G,B) = {-0.5, 0.5}. There are only two residuals, not three. The information gain for the right side is 0.67. However, you chose the right node.
Btw, your teaching is very simple and understandable. Keep doing more videos. Love your content.
Thanks a lot for everything you do. You turned off the fan so that it wouldn't interrupt the audio, and you were sweating and breathing heavily; with all this trouble and hardship you deserve more. I wish you success and a healthy and prosperous life.
Very, very important for cracking product-based companies. Great explanation too. Thanks
Hi krish, i have been watching ur videos for the last few months and it has helped me a lot in my interviews. A special thanks from my end. In this video, at 10:54 min 0.33 - 0.14 should be 0.19.
yes indeed bdw were u a fresher when u went for an interview?
How do u stay so focused , strong and learn everything in a very efficient way?
Nation wants to know🙃
Willpower
Great Explanation sir... keep contributing to the community. We love your videos and most importantly you are serving your experience is the best thing.
So much to learn from a single video, hats off to you sir
Just what I was waiting for 🔥
This was amazing, I literally feel like I'm sitting in your class at a Uni.
This is pure gold! Thanks for the tutorial!
I am the happiest person to see these videos, thank you
"Day 1 or 1 Day your Choice" Thanks a lot Krish!
what does this mean?
Thanks, Krish for building the nation Towards AI Journey.
At least give us a job too, you fool
Great.... Clear explanation !! Thanks a lot 😄
Great
Sir the way you teaching us is more better than any varsity classes. pls do a practical implementation on XGBoost. sir pls it will be very helpful for us...
Please do a indepth maths intuition video on catboost
agree
I don't know why people don't talk about Catboost and LightGBM much..
Congratulations on your new job in E&Y. Checked you on LinkedIn. Very impressive profile.
Amazing !!!
Really Data science Bisham Pitama🙏 Respect you a lot👍
Hey Krish, you should also have a video about Similarity Based Modelling (SBM) and Multivariate State Estimation Technique (MSET). They are actually widely used in the industries since 90s. There are many research papers to validate that. They also calculate similarity weight and residuals.
Hi, I have one doubt. For the p(1-p) + lambda in the denominator of the similarity weight, if the residual is -0.5, should it be 0.5(1-(-0.5)) = 0.75? Or does the negative sign not matter?
Great sir🔥🔥
It's tough to understand on the first attempt, but thanks for giving the outline so clearly. I will watch it until I understand it and can implement it from scratch.
Lovely explanation !
16:33 In my opinion there is a mistake in calculations.
It should be computed for (>50K) but G & B are also included from
I also noticed that, i guess maybe that is a mistake
Thank You, Krish. Well explained!
Super explanation
Loved It. Thank You!
Sir you are too pleasant and amazing in teaching
the most awaited video
Quite amazing and clear explanation
1st view 1st like krish sir op
Finally !!!!
You are legend sir.
It started good but I got lost as the video ended. Can you please prepare something simpler and show that? as u did for adaboost and gradboost?
Is there any detailed videos about Adaboost regressor and gradient boosting classifier? Please help me
thank you alot sir, you are my best teacher
great
Good! Could you make a video explain the difference between XGB and Gradients Boosting? Thanks
thank you so much
Statquest Light !!!!
Fantastic effort though.
Seriously thank u so much
Hi, thank you very much for this explanation! Great video! But I have one question. In 19:39 you first wrote 0 which is the probability of first row then you added learning rate*similarity weight. My question is instead of 0 shouldn't we write 0.5 which is the average probability of first (base model). 0.5+learning rate*similarity. Please correct me if I am wrong.
base model comes after we put the first probability (0.5) through log(odds) at bottom right corner. Hence it is 0
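The reply above can be checked numerically. A minimal sketch, assuming a learning rate of 0.1 and a leaf value of 1.0 purely for illustration: a base probability of 0.5 corresponds to log(odds) of 0, the tree's contribution is added in log-odds space, and the sigmoid maps the sum back to a probability.

```python
import math

def log_odds(p):
    """Convert a probability to log(odds)."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Convert log(odds) back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

base_prob = 0.5
base_score = log_odds(base_prob)   # log(0.5 / 0.5) = log(1) = 0.0

learning_rate = 0.1                # hypothetical value
leaf_value = 1.0                   # hypothetical output of the first tree
updated_prob = sigmoid(base_score + learning_rate * leaf_value)

print(base_score, sigmoid(base_score), updated_prob)
```

So the 0 written in the video is the base model in log-odds form, and the 0.5 reappears as soon as the sigmoid is applied to it.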
Sir can you refer some NLP projects using python. I mean with live implementation
This video is "pretty much important!"
XG-Boost is the secret of my energy
Krish, I have a question:
When you compute the output value, you are taking the similarity weight. I think that is incorrect for classification, isn't it?
To compute the output you shouldn't square the residuals.
THANKS for the video!!
Sir, How will the Prob value( 0.5 for the base tree ) be updated in each tree?
The similarity score is not the output value, there is a different formula for calculating the output based on residuals, you just have to remove the square in the numerator of the similarity score function.
Dear Krish, We have a course on machine learning. Around 40000 people subscribe to this course. But since they dont understand many of them will drop out in the middle. Why dont you start creating videos parallel to what is taught in the class and make a playlist for it. So that you can easily many views with one shot. Are u interested in this.
Grt teacher. Just a doubt, can't we take the credit as first node?
Hi Krish,
I have a doubt here. Here all the input features (salary, credit) are categorical. so we are making the decision tree easily based on the categories. Say suppose if we get the salary feature as continuous like 30k, 50k and not like 50k, how this split of decision tree will be done.
Check out decision tree algorithm video in ml playlist. Inside it, he has mentioned how to handle numerical features..
Hi Ashwin, for numerical features, you have to set a threshold for each value by taking the average of adjacent values for example for 30k - 40k you have to take (30+40)/2 i.e 35k and create a decision tree by setting value less than 35k i.e
Hi Krish, I have a doubt, can you please confirm if XGBOOST is a part of ensemble technique or not as while importing from the library we are doing it separately not from sklearn library.
It is a separate library
@@krishnaik06 but is it an ensemble technique?
@@vishaldas6346 what is XGBoost and where does it fit in the world of ML? Gradient Boosting Machines fit into a category of ML called Ensemble Learning, which is a branch of ML methods that train and predict with many models at once to produce a single superior output.
Finally❤
Please subtitle the videos in Spanish. There is a community that speaks Spanish and listens to your videos
is the formula for similarity score of the root node correct? since this is a classification problem?
Thank you so much for such a step to step explanation. but I have a quick question what would we do if we have continuous variable than categorical. would we proceed as we do in decision tree for continuous features? or it's not recommended to use XGBoost in case of continuous features?
i think we use all the models and will take the result by comparing those, I think It will be better for that.
For continuous data, like salary, it will first sort that column in ascending order, then take the average of each pair of consecutive values. Each average is then tried as a splitting condition, and the one with the highest gain is chosen for the split. Suppose you have 5 salaries: 10, 20, 30, 40, 50. The first split would be on salary
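The threshold search described in the replies above can be sketched as follows. This is a simplified illustration: the labels are made up, the function names are hypothetical, and the real library also offers histogram-based and approximate split finding (its `tree_method` setting) rather than always trying every midpoint.

```python
def candidate_thresholds(values):
    """Midpoints between consecutive sorted unique values."""
    uniq = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]

def similarity(residuals, prev_probs, lam=0.0):
    """Similarity weight: (sum of residuals)^2 / (sum of p*(1-p) + lambda)."""
    return sum(residuals) ** 2 / (sum(p * (1 - p) for p in prev_probs) + lam)

def best_split(values, residuals, prev_probs, lam=0.0):
    """Try every candidate threshold and return (threshold, gain) with the highest gain."""
    root = similarity(residuals, prev_probs, lam)
    best = None
    for t in candidate_thresholds(values):
        left = [i for i, v in enumerate(values) if v < t]
        right = [i for i, v in enumerate(values) if v >= t]
        g = (similarity([residuals[i] for i in left], [prev_probs[i] for i in left], lam)
             + similarity([residuals[i] for i in right], [prev_probs[i] for i in right], lam)
             - root)
        if best is None or g > best[1]:
            best = (t, g)
    return best

salaries = [10, 20, 30, 40, 50]      # the salaries from the comment above
labels = [0, 0, 1, 1, 1]             # hypothetical approval labels
probs = [0.5] * len(salaries)        # base learner probability
residuals = [y - p for y, p in zip(labels, probs)]

thresholds = candidate_thresholds(salaries)
threshold, split_gain = best_split(salaries, residuals, probs)
print(thresholds, threshold, split_gain)
```

With these made-up labels the candidates are 15, 25, 35 and 45, and the split at 25 wins because it separates the two classes cleanly.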
sir please make a video on differences in all the boosting techniques , they are elaborate and couldn't find out the exact differences
Hi @krish
First of all kudos to you Great video
Can you tell me how XGBoost is different from the Apriori algorithm? Does it cover every combination while creating the tree, the way Apriori would for the same problem statement?
Thanks and love your work
Keep rocking
How is Pr gonna change please explain!!!!
Thank you for your fabulous video! I enjoy it and understand well!
Could you tell me if the output from the xgb classifier gives 'confidence' in a specific output (allowing you to assign a class) ? or is this functionally equivalent to statistical probability of an event occuring?
what should be the new probability value we need to consider when we are considering the second decision tree?
How u determine value of pr in base model
When training, we first calculate the residuals and create the decision tree, but I can't see how it classifies a point, or what happens when a new data point comes in. I am confused about this.
How do you decide on the Learning Rate parameter?
can you do a video difference between statistical models and machine learning models
Please upload a video on Light GBM.
What is the similarity weight? Why do we use it, what is its advantage, and what is the intuition behind it?
Can anyone explain to me the video during 21:38 Mins ( 0-0.6)=-0.6 right not 0.4 right? or did I get it wrong Please Advise
I got the same question .
yeaaa me toooooooooooo....helpppwwwww meeee!! arghhh
Sir . krish Do you have a code that deal with more than one target ( y1,y2,.. Y is 2 columns or 3 columns . (two target , three target )
Thank you sir! I have a question in this how we predict the probability value at the begging from 0-1
Aren't gradient boosting and XGBoost the same, with minor differences?
is any other value except 0 as a hyperparameter in XGboost algorithm
Wht is the role of lambda in the similarity weight here.
Shouldn't your similarity weight be 1? Residuals must be squared first before adding up.
Sir how does the model chooses which similarity weight should be multiplied with learning rate . Thank you sir u r doing great by helping us🙂
It's not the similarity weight that is multiplied, it's the output of the leaf node. The similarity weight is used to calculate the gain for splitting the nodes of the decision tree.
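To make the distinction concrete, here is a small sketch with an assumed leaf holding residuals -0.5, 0.5, 0.5 and a previous probability of 0.5 for every row. It also shows what lambda does: it is added to the denominator, so a larger lambda shrinks both the similarity weight and the leaf output. The function names are illustrative, not library calls.

```python
def similarity(residuals, prev_probs, lam=0.0):
    """Used for the gain: the sum of residuals is squared."""
    return sum(residuals) ** 2 / (sum(p * (1 - p) for p in prev_probs) + lam)

def leaf_output(residuals, prev_probs, lam=0.0):
    """Multiplied by the learning rate: the sum of residuals is not squared."""
    return sum(residuals) / (sum(p * (1 - p) for p in prev_probs) + lam)

# Assumed leaf contents, for illustration only.
leaf_residuals = [-0.5, 0.5, 0.5]
leaf_probs = [0.5, 0.5, 0.5]

results = {}
for lam in (0.0, 1.0):
    results[lam] = (similarity(leaf_residuals, leaf_probs, lam),
                    leaf_output(leaf_residuals, leaf_probs, lam))
    print(lam, results[lam])
```

Note that the denominator uses the previous probability p of each row, not the residual, so the sign of a residual never enters p(1-p).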
250k coming soon
what is the need of LOG(odd) function
the max_depth in xgboost for each tree is 2? plz answer ,
sir please make a video on gradient boosting for classification problem
You didn't add the lambda. Why?
23:00 that's lambda not alpha, please correct that
Can you please do a video on feature selection approaches? Especially the use of Mutual Information. Thanks. Great videos!!
It's lambda as hyper parameter, which u mentioned as alpha...
Hi sir @Krish Naik. What will be the initial probability when there are multiple classes....if anyone knows the answer please share...
How you have calculated the probability ?? How you have got 0.5 ??
Krish How do u stay so focused
can someone please clear the log of odds part? similarity wt=1 means that's the output but to compute that we calculate the base model output with respect to 0.5 probability, why?
how can we subtract probability of a value from that value. if suppose i take approvals in terms of Y and N then also their probability remains same at 0.5. but we cannot subtract 0.5 from Y or N. I did not get your concept of subtracting the probability from value.
why we split G,N into one but not separately
Please put lgbm mathematical explanation sir
U didn't upload gradient boosting classification videos i. e part 3 and part 4 of gradient boosting
how krish calculating gain ??
What's is the use ?