Machine Learning Tutorial Python - 11 Random Forest
- Published 4 Jul 2024
- Random forest is a popular regression and classification algorithm. In this tutorial we will see how it works for classification problems in machine learning. It uses decision trees underneath: it builds multiple trees and takes a majority vote among them. We will go over some theory first and then solve a digits classification problem using sklearn's RandomForestClassifier. At the end there is an exercise for you to solve.
#MachineLearning #PythonMachineLearning #MachineLearningTutorial #Python #PythonTutorial #PythonTraining #MachineLearningCource #MachineLearningAlgorithm #RandomForest #sklearntutorials #scikitlearntutorials
Code: github.com/codebasics/py/blob...
Exercise: The exercise description is available towards the end of the notebook above
Exercise solution: github.com/codebasics/py/blob...
Topics that are covered in this Video:
0:00 Random forest algorithm
0:50 How to build multiple decision trees based on single data set?
2:34 Use of sklearn digits data set to make a classification using random forest
3:04 Coding (Start) (Use sklearn digits dataset for classification using random forest)
7:10 sklearn.ensemble RandomForestClassifier
10:36 Confusion Matrix (sklearn.metrics confusion_matrix)
12:04 Exercise (Classify iris flower using sklearn iris flower dataset and random forest classifier)
Do you want to learn technology from me? Check codebasics.io/?... for my affordable video courses.
Next Video:
Machine Learning Tutorial Python 12 - K Fold Cross Validation: • Machine Learning Tutor...
Popular Playlists:
Data Science Full Course: • Data Science Full Cour...
Data Science Project: • Machine Learning & Dat...
Machine learning tutorials: • Machine Learning Tutor...
Pandas: • Python Pandas Tutorial...
matplotlib: • Matplotlib Tutorial 1 ...
Python: • Why Should You Learn P...
Jupyter Notebook: • What is Jupyter Notebo...
Tools and Libraries:
Scikit learn tutorials
Sklearn tutorials
Machine learning with scikit learn tutorials
Machine learning with sklearn tutorials
To download csv and code for all tutorials: go to github.com/codebasics/py, click on a green button to clone or download the entire repository and then go to relevant folder to get access to that specific file.
🌎 My Website For Video Courses: codebasics.io/?...
Need help building software or data analytics and AI solutions? My company www.atliq.com/ can help. Click on the Contact button on that website.
#️⃣ Social Media #️⃣
🔗 Discord: / discord
📸 Dhaval's Personal Instagram: / dhavalsays
📸 Codebasics Instagram: / codebasicshub
🔊 Facebook: / codebasicshub
📱 Twitter: / codebasicshub
📝 Linkedin (Personal): / dhavalsays
📝 Linkedin (Codebasics): / codebasics
🔗 Patreon: www.patreon.com/codebasics?fa...
Do you want to learn technology from me? Check codebasics.io/ for my affordable video courses.
Hi, I have a question regarding fitting the model. Every time we call model.fit, a different random set of samples is used for training. For example, on the iris dataset I fit my model and then tune it with n_estimators = 10, 20, 100, etc. Sometimes it gets a score of 1.0 with 20, but if I run it again it gets 0.98. How can I fix x_train and y_train so that they do not change every time?
And I am really thankful for your lectures I am learning day by day.
Thank you.
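One way to make the scores in the question above repeatable is to pass a fixed `random_state` to both the split and the model. The sketch below assumes the exercise uses sklearn's `load_iris`, `train_test_split` and `RandomForestClassifier`; the seed value 42 and the helper name `run_once` are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()

def run_once(seed=42):
    # A fixed random_state in the split keeps X_train / X_test identical on every run
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2, random_state=seed
    )
    # A fixed random_state in the model keeps the bootstrap samples and feature draws identical
    model = RandomForestClassifier(n_estimators=20, random_state=seed)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)

first = run_once()
second = run_once()
print(first, second)  # both runs use the same seeds, so the two scores match
```

Without the seeds, both the split and the forest draw new random numbers on each run, which would explain a score that moves between 1.0 and 0.98.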
github.com/codebasics/py/blob/master/ML/11_random_forest/Exercise/random_forest_exercise.ipynb
Complete machine learning tutorial playlist: ua-cam.com/video/gmvvaobm7eQ/v-deo.html
xlabel is the truth
and
ylabel is the prediction
but in the video it is reverse....
Am I right?
because we take "confusion_matrix(y_test,y_predicted)"
@@lokeshplssl8795 I do have same question
@@jiyabyju I figured it out
@@lokeshplssl8795 hope there is no mistake in code..
@@jiyabyju no mistake,
He computed y_predicted as the model's prediction on X_test.
Keeping the tutorial part aside (which is great), I really love your sense of humor and it's an amazing way to make the video more engaging. Kudos!!
Also, thank you so much for imparting such great knowledge for free.
Thanks for your kind words and appreciation shankey 😊
The way you teach and explain the concepts is completely different, thanks a lot!!!!!! Please make more videos
I can watch this type of video the whole day without taking any break. Thank you!!!
Sir, I am damn impressed by you!!!! You are the best ML instructor here on YT!!!!
Lets promote this channel.
I am just a humble Python hobbyist who took a local course, yet I still don't understand most of what the lecturer says. Because of this channel I've finally found fun with Python. In just 2 weeks (or a bit more) I am already at this level? Man....! Can't wait for neural networks, but only from this channel
For Iris Datasets I got score =1 for n_estimators = 40,50,60
Thank sir very much
Great Video! I'm working on my first project using machine learning and am learning so much from your videos!
Hey Alex, good luck on your project buddy. I am glad these tutorials are helpful to you :)
again, just spectacular graphics and easy to understand explanation. thank you so much.
It is good practice to loop over n_estimators and check the score for each value:
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier

# Assumes X_train, X_test, y_train, y_test come from the tutorial's train_test_split
scores = []
n_estimators = range(1, 51)  # example range
for i in n_estimators:
    model = RandomForestClassifier(n_estimators=i)
    model.fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))
    print('score:{}, n_estimator:{}'.format(scores[i - 1], i))
plt.plot(n_estimators, scores)
plt.xlabel('n_estimators')
plt.ylabel('testing accuracy')
And then you can sort of see what's going on. This practice is also very useful in the k-nearest neighbors technique for choosing k.
Thank you! I was looking for something like this. I think in the fourth line the i is missing, as in model=RandomForestClassifier(n_estimators = i)
@@cololalo yep forgot it thanks.
Thank you, I am trying to find something like this since the previous video!
I cannot quite express how amazing teaching you are doing. I am doing masters one of the finest universities in America and this is better than the supervised learning class I am taking there. Kudos! Please keep it up. appreciate you are making this available for free although I would be willing to see your lectures even for a fee.
Thanks for leaving the feedback aditya
Just love ur videos. I was struggling with python. With ur videos I was able to get everything in a week's time. Also completed the pandas and NumPy series. I would highly encourage u to start a machine learning course with some real life projects
FYI: the default value of n_estimators changed from 10 to 100 in scikit-learn 0.22, so versions 0.22 and later use 100 trees by default.
Man, it's great! Your videos are the best I have ever seen about machine learning. It's very helpful material. I am waiting for you to make a tutorial about gradient boosting and neural networks. I think you can make easily to report it. Thanks!
Frankly, your videos are neater and clearer than any other videos on YouTube
Thanks Ramesh for your valuable feedback :)
Thank you very much! This tutorial is really amazing!
Hi Sir,
Can we use any other model (eg: svm) with the random forest approach, that is, by creating an ensemble out of 10 svm models and getting a majority vote?
Thank you for the wonderful video.
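The idea in the question above, several SVMs each trained on a resampled copy of the data and then combined, can be sketched with sklearn's `BaggingClassifier`. This is an illustration under assumptions rather than anything shown in the video: the digits data, the 80/20 split and the seed are arbitrary choices.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

# Ten SVMs, each trained on a bootstrap sample of the training rows.
# SVC() exposes no predict_proba by default, so BaggingClassifier resorts to voting.
svm_ensemble = BaggingClassifier(SVC(), n_estimators=10, random_state=0)
svm_ensemble.fit(X_train, y_train)
svm_score = svm_ensemble.score(X_test, y_test)
print(svm_score)
```

Whether such an ensemble beats a single SVM depends on the data, so it is worth comparing both on the same test set.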
Amazing man, keep it up and share more tutorial like this.
Amazing, I like how you explain simply
Hello Sir, I have started learning pandas and ML from your channel, and i am amazed the way you are teaching.
For Iris Datasets I got score =1 for n_estimators = 30
Great Vivek. I am glad you are working on exercise. Thanks 😊
Thank you Sir for this awesome Explanation about RandomForestClassifier . I got score of 1.0 for every increased value in n_estimators
Nice work!
I achieved an accuracy of .9736. Earlier, I got an accuracy of .9 when the test size was 0.2, and changing the number of trees wasn't changing the accuracy much. So I tweaked the test size to .25 and tried different numbers of trees. The best I got was .9736 with n_estimators = 60, and criterion = entropy gives a better result.
Thank you so much sir for the series. This is the best YouTube series on Machine Learning out there!!
xlabel is the truth
and
ylabel is the prediction
but in the video it is reverse....
Am I right?
because we take "confusion_matrix(y_test,y_predicted)"
@@lokeshplssl8795 I think I know why you are probably confused. This is not a plot. You should not assume that because you passed y_test as the first argument you will see it on the horizontal axis, the way you would with xlabel.
Unfortunately, the confusion matrix is printed unlabeled. True/actual/test values run vertically (one row per true class) and predicted values run horizontally (one column per predicted class).
A couple of videos before he used another library to demonstrate the matrix labeled.
If you have any questions regarding confusion matrix this is by far the best video ua-cam.com/video/8Oog7TXHvFY/v-deo.html .
Also a similar use case has to do with Bayesian statistics. Another great example ua-cam.com/video/-1dYY43DRMA/v-deo.html
You don't have to get into it since the software does it for you, but it would help understand what is going on
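The row and column orientation described above can be checked on a tiny example. The labels below are hypothetical and chosen so that the matrix is not symmetric, which makes the orientation visible.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: three samples of class 0, two samples of class 1
y_true = [0, 0, 0, 1, 1]
y_pred = [0, 1, 1, 1, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# cm[i, j] counts samples whose true label is i and whose predicted label is j,
# so rows are the truth and columns are the prediction.
```

The 2 in the first row, second column counts the two class-0 samples that were predicted as class 1. A heatmap draws rows along the y axis, which is consistent with labelling the y axis 'Truth' and the x axis 'Predicted'.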
This is sooo awesome! Amazing work sir💎
It's nice to see you bhaiya again
Again a nice video from you.
Sir I have one general question. What is random_state and why we sometime take 0 and sometimes we assign value to it. What's the significance of this.
This is a great series! Would you be interested in allowing us to repost it on our channel? We'll link to your channel in the description and comment section. Send me an email to discuss further: beau [at] [channelname]
mega.nz/file/LaozDBrI#iDkMIu6v-aL9fMsl-X1DETkOqnMqwptkn54Z51KINyw (like data in this file )//help if anyone understand.
Sir, can you tell me how to plot a random forest classification with multiple independent variables? I am so confused about that.
yes sure. go ahead. You can post it.
You are amazing brother. I really loved this. You made it so simple. Thank you so much.
Thanks for another post.. It's really helpful.... Just a question- Considering the fact that Random forest takes the majority decision from multiple decision trees, does it imply that Random forest is better than using Decision tree algorithm? How do we decide when to use Decision tree versus Random forest?
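Which model is better depends on the data, so one practical way to approach the question above is to score both on the same cross-validation folds. The sketch below assumes the tutorial's digits dataset; the 5 folds, 50 trees and seed are arbitrary choices.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()

# Same data and the same 5 folds for both models
tree_scores = cross_val_score(
    DecisionTreeClassifier(random_state=0), digits.data, digits.target, cv=5
)
forest_scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0),
    digits.data, digits.target, cv=5
)

print("decision tree:", tree_scores.mean())
print("random forest:", forest_scores.mean())
```

A single tree is still easier to inspect and faster to train, which can matter more than a few points of accuracy.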
ok so i read one comment and put test_size = 0.25 and n_estimator = 60. I rerun my test sample cell as well as model.fit and model.predict cell and got the accuracy of 100%. I am having a god complex right now thank you for this amazing series
Please upload frequently..we will wait for you
Another Great Video. Thanks for that. I got 1.0 as score with n_estimators=1000. Keep doing these kind of great videos. Thank you.
Anji, it's great you are getting such an excellent score. Good job 👍👏
Thank you so much. I need some help on this classifier for my data set. This helped a lot.
Glad it helped!
Nice to watch your videos.. you make us understand things end to end !!
👍😊
Thank you so much for very dynamic and clear content with the ideal depth on the topic details
Glad it was helpful!
Nice explanation as always. Great work.
Thank you so much! I love your videos.
This is so awesome explanation!! Thank you so much!!!
Maybe I am a bit late jumping on the train, even though, I still want to say thank you for everything you have been doing. Your videos are much better to understand the field rather than the courses of top class Universities such as MIT. I have to say that you outperform all your competitors in a very simple way. As far as I know you had some problems with your health and I hope everything is good now. Wish you good luck and stay healthy at least for your YouTube community. ^_^
Hey Yea James, thanks for checking on my health. You are right, I was suffering from chronic ulcerative colitis and last year, 2019, was pretty rough. But guess what, I cured it using raw vegan diet, ayurveda and homeopathy. I have been 100% all right and symptom-free for almost 10 months and am back in full force doing youtube tutorials :)
@@codebasics Good to hear, Things are working out in a positive way! Be safe and I pray everything works well in the long run.
Jai SriRam
I Got 100% accuracy!.... by changing criterion = "entropy"
xlabel is the truth
and
ylabel is the prediction
but in the video it is reverse....
Am I right?
because we take "confusion_matrix(y_test,y_predicted)"
@@lokeshplssl8795 it doesn't change much, I mean u are just transposing the confusion matrix. The info still remains the same
Thank you for such wonderful videos, I got accuracy score a 1 in the exercise question
This video was amazing. Thanks!
Love your videos. They're helping me a lot. thanks
Hey Mohamed,
Thanks for nice comment. Stay in touch for more videos.
Thanks a lot sir for the videos, I wanna know when to use random forest or just tree?
great videos! thank you so much
Hey, Hope you're doing well! I have a query regarding random forest algo! I want to ask that I have predicted random forests algo and made 70 30 ratio! But how i can specify the prediction for 30days! Any variable or specifier?
Looking forward to hearing from you soon!
Thanky!
Hi sir,i did your exercise of iris data and got an accuracy of 1.0 with n_estimators=80
I got 93.33 accuracy at n_estimators=30 after that accuracy not increasing w.r.t increase in n_estimators. Thankyou very much for simply great explanation
your teaching is superb, and your knowledge sharing to Data Science community is Noble.
I tried the exercise by giving the criterion = "entropy" got score as 1
Hi, I just want to ask: when splitting the data set, why should we drop the target column? That column is the actual, final result, telling us whether each row is true or false. So why do we have to drop it while splitting?
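A short sketch of what dropping the target column does, assuming the iris data is loaded into a pandas DataFrame with the label stored in a column named `target` (the column name is an assumption made for this example):

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target

# X holds only the inputs; leaving 'target' in would hand the model the answer
X = df.drop("target", axis="columns")
# y is the column the model is asked to predict
y = df["target"]

print(X.shape, y.shape)
```

The target is not thrown away: it moves into `y`, and `train_test_split(X, y)` then splits inputs and answers together.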
first of all, your video is amazing. it simplifies image analysis to a 15 minute task. may i ask you a question regarding brain tumor data? i want to implement a random forest on the BraTS data set. i have a 4d array with 4 modalities: (flair,t1,t1ce,t2) => [modalities, image slices, x-plane,y-plane] and the labels are just 2d. your video is amazing but i don't know what to do with these 2d labels because target variable in your video is 1d. might you be able to give me an idea of how to deal with my labels or how to approach this problem generally?
Great content!! I have a question though, shouldn't the xlabel be 'Truth' and ylabel be 'Predicted' ?
i though the same thing as well
I've done all the Exercise till here. But I was planning not to do it for this video until I saw your last picture! I don't want you to be angry! so I am going to do it right now!
Ha ha nice. Javad. Wish you all the best 🤓👍
n_estimators = 1 (and also 290 or bigger) even gave 100% accuracy, but as we all know, datasets like this are prepared for learning purposes, so reaching 100% accuracy is easy.
this is crash course; if you are in hurry; this is the best series out there on youtube
Best explanation of Random Forest!!!!!!
I am happy this was helpful to you.
The default value of n_estimators changed from 10 to 100 in version 0.22 of sklearn. I got an accuracy of 95.56 with n_estimators = 10, and the same with 100.
I got an accuracy of 0.982579 by giving, n_estimators = 100, well 100 is the default value now, and sir, big fan of your teaching 🙂
Good job Abhinav, that’s a pretty good score. Thanks for working on the exercise
@@codebasics sir just wished to get in contact with you, to get a proper guidance
Is it possible to predict a set of numbers that will output from a random number generator, finding the algorithm, in order to duplicate the same pattern of results?
This is the only channel i subscribed.
J Es, thanks. I am happy to have you as a subscriber 👍😊
Hi Sir, we are blessed that we got your videos on youtube. Your videos are unmatchable. I am interested in your upcoming python course. When can I expect starting of the course?
Python course is launching in June, 2022. Not sure about exact date though
You made that so simple thank you so much
Fantastic!
Thanks a lot Bro great videos👍👍👍....where can I get more Exercises for machine Leaning
great, thank you so much!!!!
Excellent. Thank you.
I got 100% accuracy with default estimator and random_state=10. Thanks a lot Sir
Good job Praveen, that’s a pretty good score. Thanks for working on the exercise
Thank you so much for the tutorials sir. My interest in learning machine learning made easy by you. Can you please make tutorials on chatbots using python.
Thank you
Hey harshitha, thanks for your kind words of appreciation and sure I will note down the topic you suggested 👍
When I fit the data into the model, I didn't get output like yours, with all the parameters of the model listed. It just showed me that the model was fitted, nothing else. What can I do to see the full details of the model at fitting?
default 100 n_estimators or 20 n_estimator , each case it gives 1.0 accuracy. well after getting on this channel , i can feel the warmth on the tip of my fingers.
you're so much fun dude
Sir u r great thnx for these kinds of videos please make more videos 😊😊😊😊
Sir make more videos
Hi sir, i have a simple query regarding jupyter notebook. I can't see the parameters of randomforestclassifier() after applying model.fit()
Is there any way to see those parameters
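Newer scikit-learn releases print only the parameters that differ from their defaults, which is likely why the notebook output looks shorter than in the video. Two ways to see everything are sketched below; `n_estimators=20` is just an example value.

```python
from sklearn import set_config
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=20)

# get_params() returns every constructor parameter, including the defaults
params = model.get_params()
print(params["n_estimators"], params["criterion"])

# Ask scikit-learn to show all parameters whenever an estimator is printed
set_config(print_changed_only=False)
full_repr = repr(model)
print(full_repr)
```

`set_config` changes the display for the whole session, so it only needs to run once at the top of the notebook.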
Thank you for this tutorial. How can I visualize a random forest and a decision tree?
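A fitted forest keeps its individual trees in `estimators_`, so one option is to inspect them one at a time. The sketch below prints the rules of the first tree as text with `sklearn.tree.export_text`; the small forest (5 trees, depth 2) is an assumption made to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

iris = load_iris()
forest = RandomForestClassifier(n_estimators=5, max_depth=2, random_state=0)
forest.fit(iris.data, iris.target)

# Each element of estimators_ is an ordinary fitted decision tree
first_tree = forest.estimators_[0]
tree_rules = export_text(first_tree, feature_names=list(iris.feature_names))
print(tree_rules)
```

For a graphical version, `sklearn.tree.plot_tree(first_tree)` draws the same tree with matplotlib.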
Hey your vidoes are great. But where do you go away in the middle of making videos then come back after a long time.
Awesome channel.
I have a question though.
To find the optimal n_estimators, I made a loop that went from n_estimators=1 up to a number of my choice (number_trees).
But I thought that a lucky train_test_split could give a very good score to a shitty model. So I made an inner loop that runs a number of times of my choice (number_sets): split, train the model, score, and keep the best and worst scores.
The result is that I see absolutely no trend in the score as a function of n_estimators.
For example, with n_estimators = 3 and doing the split 5 times, the worst I get is 0.97 accuracy, which is great.
But with n_estimators = 4, the worst I get is 0.89, which is worse.
But then again, with n_estimators = 10 I get 0.97.
And so on and so forth.
My question is: why do I not see a trend in the score depending on n_estimators? I was expecting the score to go up until a certain n_estimators and then stop changing.
CODE (YouTube doesn't allow copy-paste, so there might be a typo)
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumes X and y (features and target) are already loaded
number_trees = 100
number_sets = 5
pd.set_option("display.max_rows", None)
results = pd.DataFrame(columns=["min_score", "max_score"])
for i in range(1, number_trees + 1):
    modeli = RandomForestClassifier(n_estimators=i)
    min_score = 1
    max_score = 0
    for j in range(number_sets):
        # A new random split on every pass, so scores vary from pass to pass
        X_train, X_test, y_train, y_test = train_test_split(X, y)
        modeli.fit(X_train, y_train)
        score = modeli.score(X_test, y_test)
        if score > max_score:
            max_score = score
        if score < min_score:
            min_score = score
    results.loc[i, "min_score"] = min_score
    results.loc[i, "max_score"] = max_score
results
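Scores from individual random splits are noisy, and the minimum or maximum over a handful of splits is noisier still, which can hide a trend. A sketch that averages over cross-validation folds instead is shown below. It assumes the tutorial's digits data (swap in your own X and y), and the list of tree counts is an arbitrary choice.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

digits = load_digits()  # assumption: the tutorial's digits data

mean_scores = {}
for n in [1, 5, 10, 50]:
    model = RandomForestClassifier(n_estimators=n, random_state=0)
    # Averaging over 5 folds smooths out the luck of any single split
    fold_scores = cross_val_score(model, digits.data, digits.target, cv=5)
    mean_scores[n] = fold_scores.mean()
    print(n, round(fold_scores.mean(), 3), round(fold_scores.std(), 3))
```

Printing the standard deviation next to the mean shows how much of a difference between two settings is just split-to-split variation.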
The R2 we got is for test set (R2test), what about the model's R2 which is generally termed as R2training
Sir how are you deciding the xlabel and ylabel in the heatmap
n_estimators = 10, criterion = 'entropy' led to a 100% accurate model !! Thanks!
Great job Sagnik :) Thanks for working on exercise
@@codebasics My pleasure ! Amazing tutorials !! Been a great learning experience so far ! Cheers :)
Sir, suppose there are 4 decision trees, 2 of which give one output and the other 2 a different output. Which output is chosen when the vote is tied? Please clarify this doubt.
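scikit-learn's `RandomForestClassifier` documents its prediction as the class with the highest mean probability across the trees, rather than a raw count of votes. The NumPy sketch below imitates that averaging for the hypothetical 2-versus-2 split in the question. It is an illustration with made-up probabilities, not the library's own code.

```python
import numpy as np

# Hypothetical per-tree class probabilities for one sample and two classes (0 and 1):
# two trees are sure of class 0, two trees are sure of class 1
tree_probas = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 1.0],
])

mean_proba = tree_probas.mean(axis=0)   # average over the four trees
predicted = int(np.argmax(mean_proba))  # argmax returns the first maximum on a tie
print(mean_proba, predicted)
```

In this sketch the exact tie goes to the class that comes first in the class ordering.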
Nice work.
What makes you put truth on the y_label and predicted on the x_label?
I see many people saying that on Iris they got 1.0 with 50+ estimators. I am just starting with ML, but to me the 4 features in Iris mean that we don't need many estimators: there are actually only 6 unique combinations of features, or 10 if we also used single columns as estimators, which I presume is not happening. Am I correct that anything beyond 6 estimators shouldn't improve the model?
Sir, when we load dir(iris) or dir(digits) datasets we get some other stuff.
Sir can you please provide information on django framework
Nice videos, Your videos are the best..Keep doing
Jyothish, I am happy this was helpful to you.
Good one.
Very nice sir.... Expecting more videos 😀
Glad it was helpful!
ure great , thank you so much
very Great video!!!!! thanks
glad you liked it Han
looking for NLP videos on Sentimental analysis
Sir I got score=1.0 for estimator=10
And random_state=10
Very nice explanation👌👌👌
Great score. Good job 👌👏
when will u make an video on NLP?
I can't get any better than 93.3333333% on the exercise even with more n_estimators.
Sir I have Done the Exercise with 100% Accuracy
Sir, while changing the number of trees in the code //RandomForestClassifier(n_estimates=100)// I am getting this error: "__init__() got an unexpected keyword argument 'n_estimator'", but without specifying the number of trees the model works fine!
Kindly help.
Sir you are great !! We eagerly waiting for your videos ..thank you so much
I hope soon you will teach us unsupervised algorithms such as K means DBScan ! Guys what do you say?? Thanks once again
excellent lesson!
Glad it was helpful!
Ty sir, any chance to make a overview of GridSearch applied to those models you chose?
Sure george. I recently uploaded a tutorial on hyperparameter tuning and GridSearchCV. Please check it out.
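As a rough idea of what grid search looks like for the random forest from this video, here is a minimal sketch with sklearn's `GridSearchCV`. The iris data, the parameter grid and the 5 folds are assumptions chosen for the example, not settings taken from the tutorial mentioned above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

iris = load_iris()

# Hypothetical grid; the values are only examples
param_grid = {"n_estimators": [10, 50], "criterion": ["gini", "entropy"]}

# Every combination in the grid is scored with 5-fold cross-validation
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(iris.data, iris.target)

print(search.best_params_, search.best_score_)
```

`search.cv_results_` holds the score for every combination, which helps show whether the best setting is clearly ahead or within noise of the others.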
I am not afraid of you, but I respect you!
So I am gonna do the exercise right now!