Decision Tree with R | Complete Example

  • Published 5 Sep 2024

COMMENTS • 261

  • @olivergasior8005
    @olivergasior8005 Рік тому +1

    I watched your videos to help through a data analytics degree and I'm now working in a job type similar to business analyst and looking back at these videos. Very easy to follow, punctual, and informative for getting the job done. Thank you

    • @bkrai
      @bkrai  Рік тому

      You are welcome and good luck!

  • @animeshdevarshi
    @animeshdevarshi 7 років тому +5

    Sir, I've been following a lot of courses but never found anything with such clarity. Thanks for posting these!

    • @bkrai
      @bkrai  7 років тому

      Thanks for the feedback!

  • @user-uf5bk8zc7n
    @user-uf5bk8zc7n 4 роки тому +3

    Thanks Doc, after my 6 hrs class ...you went through all my confusions in just 18:43 mins. Such a worthy job!!!

    • @bkrai
      @bkrai  4 роки тому

      Thanks for your feedback and comments!

  • @vijayarjunwadkar
    @vijayarjunwadkar 2 роки тому +1

    Take a bow sir! For the first time, I had full clarity on Decision Tree and its usage! Thanks a lot for this superb tutorial, lucky to find your channel, stay blessed! 👌👍🙏

    • @bkrai
      @bkrai  2 роки тому

      Thanks for comments!

  • @askpioneer
    @askpioneer 2 роки тому +1

    Hello sir, your way of explaining is so simple and effective; it made the topic simple.
    I would also like to add, for everyone, that I was getting an error while using controls = ctree_control; after some googling and forum support I am now able to run and view the tree. Great work, sir.

    • @bkrai
      @bkrai  2 роки тому

      Thanks for the update!

  • @ShivaKumarbudda
    @ShivaKumarbudda 4 роки тому +1

    Hi, video posted 4 years ago today has become a saviour for my internal assessment
    Thank you 😃

    • @bkrai
      @bkrai  4 роки тому

      Welcome! You may also find this recent one useful:
      ua-cam.com/play/PL34t5iLfZddvGr66DPf-L-sSJ50XNwN3K.html

  • @UmairSajid
    @UmairSajid 5 років тому +2

    Hello Dr. Rai, thank you for a very informative video.
    One thing that I would like to add based on my limited knowledge:
    For a skewed class distribution such as in this data, it is more important that the model is able to predict the abnormal cases than the normal cases. If we just look at the mis-classification error, the model may be biased towards the class with the higher percentage of data. One way to avoid that is to reduce the disparity between the class types with over/under-sampling techniques. Another way is to use the area under the precision-recall curve as a measure of model evaluation.
    Your comments and feedback on this would be appreciated.

    • @bkrai
      @bkrai  5 років тому

      That's correct. For more details about class imbalance problem, refer to this link:
      ua-cam.com/video/Ho2Klvzjegg/v-deo.html
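
The over-sampling idea described above can be sketched in base R; the data frame, class labels, and counts below are made up for illustration, not taken from the video's data set:

```r
# Toy imbalanced data: 90 "Normal" vs 10 "Pathologic" cases (made-up labels)
set.seed(1)
df <- data.frame(x = rnorm(100),
                 class = factor(rep(c("Normal", "Pathologic"), c(90, 10))))

# Random over-sampling: resample minority rows (with replacement)
# until the two classes are balanced
minority <- df[df$class == "Pathologic", ]
extra    <- minority[sample(nrow(minority), 80, replace = TRUE), ]
balanced <- rbind(df, extra)

table(balanced$class)  # both classes now have 90 rows
```

Under-sampling works the same way in reverse: drop a random subset of the majority class instead of duplicating the minority.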

  • @ivanjcardona
    @ivanjcardona 2 роки тому +1

    You really made it simple. I had been watching other tutorials, but not anymore. I already subscribed. Thanks a lot.

    • @bkrai
      @bkrai  2 роки тому

      You are welcome!

  • @kabeeradebayo9014
    @kabeeradebayo9014 7 років тому +1

    Thank you again for these complete episodes. You have been a great help to me, Rai. I'd appreciate a complete episode on ensembles, essentially a heterogeneous ensemble using DT, SVM, etc. as the base classifiers.
    Comprehensive videos on ensembles are not common; in fact, I haven't come across any. It would go a long way if you could put something together on this. Thank you for your help!

    • @bkrai
      @bkrai  7 років тому +1

      Thanks for the suggestion, I'll do it in near future!

    • @kabeeradebayo9014
      @kabeeradebayo9014 7 років тому

      Sounds really great. Looking forward to it. Can't wait!

  • @plum-ish6679
    @plum-ish6679 2 роки тому +2

    You are truly remarkable! The way you explain things is very simple to understand.

    • @bkrai
      @bkrai  2 роки тому

      Thanks for comments!

  • @sujitcap
    @sujitcap 6 років тому +1

    Sir, so much clarity! How simple and easy you made it! Thank you.

    • @bkrai
      @bkrai  6 років тому

      Thanks for comments!

  • @sudzbyte2215
    @sudzbyte2215 4 роки тому +2

    This is a great example of decision trees. Thank you!

    • @bkrai
      @bkrai  4 роки тому

      Thanks for comments!

  • @wasafisafi612
    @wasafisafi612 2 роки тому +1

    Thank you so much for your videos. I am learning everyday with them. May God bless you

    • @bkrai
      @bkrai  2 роки тому

      Thanks for comments!

  • @christan7434
    @christan7434 5 років тому +1

    Thank you Professor Rai for taking the time to show us the ropes. Regarding the mis-classification error table, may I ask what the difference is between that and the confusion matrix? I notice the calculation for "accuracy" is the same as for the confusion matrix, simply "sum(diag(tab))/sum(tab)", but for the confusion matrix the actuals are on the vertical axis, versus what you stated in the video, with actuals on the horizontal. Thanks, and looking forward to more videos from you.

    • @bkrai
      @bkrai  5 років тому

      The confusion matrix and the mis-classification table are the same thing.
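
The accuracy formula quoted in the comment, sum(diag(tab))/sum(tab), on a made-up 2x2 table:

```r
# A made-up 2x2 confusion table: rows = predicted, columns = actual
tab <- matrix(c(50, 5,
                 3, 42), nrow = 2, byrow = TRUE,
              dimnames = list(Predicted = c("No", "Yes"),
                              Actual    = c("No", "Yes")))

accuracy <- sum(diag(tab)) / sum(tab)   # correct predictions / all predictions
misclass <- 1 - accuracy                # mis-classification error
accuracy  # 0.92
```

The formula does not care which margin is which: swapping rows and columns transposes the table but leaves the diagonal, and hence the accuracy, unchanged.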

  • @ekfistek
    @ekfistek 4 роки тому +1

    Dr Rai, thanks for your videos. I have found them useful in explaining basic machine learning methods. Thank you!

    • @bkrai
      @bkrai  4 роки тому

      Thanks for comments!

  • @shesadevsha1994
    @shesadevsha1994 5 років тому +1

    Hi Sir, I am so glad to see all your videos related to machine learning in R. One request: if you could share the datasets you used in your sessions, that would be great.

    • @bkrai
      @bkrai  5 років тому

      You can get the data file from the link in the description area below the video.

  • @rithishvikram1759
    @rithishvikram1759 4 роки тому +3

    Wow, thank you sir! Sir, please make a video on the entropy-based split calculation; it would be very useful.

    • @bkrai
      @bkrai  4 роки тому +1

      Thanks for the suggestion, I've added it to my list.

  • @shaliniguha1822
    @shaliniguha1822 6 років тому +1

    Sir, it'd be really nice if you can make a blog explaining the output in more details. For instance, an explanation of the statistical parameters measured in the confusion matrix. Your videos are really helpful! :)

    • @bkrai
      @bkrai  6 років тому

      Thanks for your comments and suggestion! You may find decision tree related explanations in following video too:
      ua-cam.com/video/J2a9yV3kl-M/v-deo.html

  • @DABANG125
    @DABANG125 4 роки тому +3

    Sir,
    Greetings from the US,
    I have enrolled in a machine learning course through Udemy as well, but your explanation is super simple and easier to implement.
    Please guide me to any book I can use to practice on more such datasets.

    • @bkrai
      @bkrai  4 роки тому

      Deep learning is the hottest topic currently within machine learning field. To get started with practical examples you can try:
      www.amazon.com/Advanced-Deep-Learning-designing-improving/dp/1789538777

  • @carlosfernandezgalvez3023
    @carlosfernandezgalvez3023 5 років тому +3

    Hi! thank you for all your videos.
    I'd just like to make a little comment: the ctree function implements a 'Conditional Inference Tree', not a 'Classification Tree'. It can build classification trees, but the fundamentals are different.
    Thank you for all the work you are doing! Very useful.
    Carlos

    • @bkrai
      @bkrai  5 років тому +1

      Thanks for the update!

  • @nayeemislam8123
    @nayeemislam8123 5 років тому +1

    Sir, I have a few questions:
    1. How do you find the statistically significant variables after developing a decision tree model with all variables?
    2. Suppose all variables in a decision tree are coded as POOR, FAIR, GOOD; how do I find the probabilities of each (POOR, FAIR, GOOD) at the non-terminal nodes of the tree, and also the number of samples in each category? I need to show this in my plot.
    3. What is the best approach for developing a decision tree model: developing the model on the training data using k-fold cross-validation, OR developing the model on training data and then cross-validating and pruning with a function like cv.tree(), which lets us choose the tree with the lowest cross-validation error rate? Which method is better?
    4. How do I find the standardized importance of the independent variables using CART in R?

    • @bkrai
      @bkrai  5 років тому

      1. P-values on the tree indicate statistical significance.
      2. You can find it only at the terminal nodes.
      3. k-fold CV is always better for avoiding over-fitting.
      4. The higher a variable appears on the tree, the more important it is. For variable importance you can also try this link:
      ua-cam.com/video/dJclNIN-TPo/v-deo.html

  • @hridayborah9750
    @hridayborah9750 4 роки тому +1

    very very clear and helpful. thanks tons

    • @bkrai
      @bkrai  4 роки тому

      Thanks for comments!

  • @akshitbhalla874
    @akshitbhalla874 5 років тому +1

    Your videos are honestly so amazing.

    • @bkrai
      @bkrai  5 років тому +1

      Thanks for comments!

  • @halyad4384
    @halyad4384 7 років тому +1

    Very informative and easy to understand. Thanks for sharing such a useful video.

    • @bkrai
      @bkrai  7 років тому

      Thanks for the feedback!

  • @ehtishamraza2623
    @ehtishamraza2623 3 роки тому +1

    Really Great Explanation

    • @bkrai
      @bkrai  3 роки тому

      Thanks for comments!

    • @bkrai
      @bkrai  3 роки тому

      Also here is a link to more recent one:
      ua-cam.com/video/RCdu0z2Vyrw/v-deo.html

  • @rakeshv6322
    @rakeshv6322 2 роки тому +1

    Thanks sir for detailed video..

    • @bkrai
      @bkrai  2 роки тому

      Most welcome!

  • @nagarajaraja2546
    @nagarajaraja2546 7 років тому +1

    Hi Sir,
    this is S. Nagaraj Adiga; your videos are very simple to listen to and easy to understand. Thank you very much.

    • @bkrai
      @bkrai  7 років тому

      Thanks for the feedback!

  • @vairachilai3588
    @vairachilai3588 4 роки тому +1

    In the confusion matrix (tab), the columns are predicted data and the rows are actual data.

    • @bkrai
      @bkrai  4 роки тому

      In this video I used predicted data in the rows and actual data in the columns for the confusion matrix.

    • @vairachilai3588
      @vairachilai3588 4 роки тому +1

      Kindly check it: with table(predict(tree), data$NSP), the output is read as follows: the columns are predicted data and the rows are actual data.

    • @bkrai
      @bkrai  4 роки тому

      Try this; it will make it clearer:
      table(Predicted = predict(tree), Actual = data$NSP)
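
Naming the table dimensions, as in the reply above, makes the orientation explicit no matter which argument comes first; the vectors below are made up:

```r
# Made-up predicted and actual class labels
pred   <- factor(c("Normal", "Normal", "Suspect", "Pathologic", "Normal"))
actual <- factor(c("Normal", "Suspect", "Suspect", "Pathologic", "Normal"))

# table() labels each margin with the argument name, so there is
# no ambiguity about which side is predicted and which is actual
tab <- table(Predicted = pred, Actual = actual)
tab

sum(diag(tab)) / sum(tab)  # accuracy = 4/5 correct
```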

  • @raymondjiii
    @raymondjiii 2 роки тому +1

    That was awesome, but I found that with my dataset I get a completely different decision tree using the rpart package. Without rpart the tree is what I expected it to be, but with rpart it's in some ways almost the opposite. I'm only comparing the two trees on my training data.

    • @raymondjiii
      @raymondjiii 2 роки тому +1

      I think I know what the problem is: with rpart trees you only get a little "yes"/"no" marker on the root node. In my case "yes" goes to the left of the tree and "no" goes to the right. If I assume that direction is always the case, then things are okay. I do wish the little white "yes"/"no" boxes were printed at every non-leaf node, so it's very clear which way the path is going. (I wonder if there's an option for that?) Thanks for the great video.

    • @bkrai
      @bkrai  2 роки тому

      See link below that has more detailed coverage:
      ua-cam.com/video/6SMrjEwFiQY/v-deo.html

  • @bonelwamnyameni
    @bonelwamnyameni 7 років тому +1

    This video has helped me a lot with my assignment, thank you so much.

    • @bkrai
      @bkrai  7 років тому

      that's great!

  • @tarapaider1729
    @tarapaider1729 7 років тому

    Your videos are always very easy to follow!!

    • @bkrai
      @bkrai  7 років тому

      +Tara Paider thanks for the feedback 👍

  • @mateuszbielik2912
    @mateuszbielik2912 2 роки тому +1

    Greetings! I came back to this video after a while, as it still seems to be the best one on Decision Trees out there. I have a question regarding the significance of variables. Do you have a video covering this subject? Any techniques I could apply while working on my decision tree? Thank you.

    • @bkrai
      @bkrai  2 роки тому

      You can use this link. For tree based methods, it provides variable importance plots to show which variables are important and which ones do not contribute much.
      ua-cam.com/video/hCLKMiZBTrU/v-deo.html

  • @lorihearn6859
    @lorihearn6859 3 роки тому +1

    Is it only useful for numerical data, when all the independent variables are continuous? Or can it be used for categorical ones too?

    • @bkrai
      @bkrai  3 роки тому

      It's useful for both. See this more detailed example:
      ua-cam.com/video/6SMrjEwFiQY/v-deo.html

  • @aditidalvi255
    @aditidalvi255 5 років тому +1

    Sir, please can you suggest a good book for beginners in machine learning, to get basic knowledge of all the statistical tools?

  • @sushantchaudhary2008
    @sushantchaudhary2008 3 роки тому

    Thank you Dr Rai. I have a question about the tree pruning. Prior to the pruning some of the trees were able to classify patients as pathological but after pruning( by changing the control functions) none of the trees identify the pathological patients. If we were to specifically identify patients with suspected pathology how can we modify the control functions or the initial formula included in the "ctree()" function?

  • @sovon08
    @sovon08 6 років тому +2

    Sir, if you could create a video for how to calculate gini, KS using R that would be really great

    • @bkrai
      @bkrai  6 років тому

      Thanks for the suggestion, I've added this to my list.

  • @AmarLakel
    @AmarLakel 5 років тому +1

    Thank you for your help and all your videos. They help me a lot.

    • @bkrai
      @bkrai  5 років тому

      Thanks for your comments!

  • @atiquerahman3766
    @atiquerahman3766 7 років тому +1

    Hi Sir, your videos are really helpful and have helped me a lot. I have a few doubts, though; I have just started learning data science, so these may be naive.
    1) On what basis do we decide how much data to put into training, validation, and testing respectively?
    2) Is there any criterion (like R-squared for regression models, or chi-square for logistic regression) for decision trees, so that we can say how good our model is?

    • @bkrai
      @bkrai  7 років тому

      1) One may experiment with different partitions such as 50:50, 60:40, 70:30, etc., and see what works best. There is no single partition ratio that works well in all situations.
      2) If your y variable is categorical, mis-classification error is used to assess model performance.

    • @atiquerahman3766
      @atiquerahman3766 7 років тому +1

      Thank you, sir!!
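
The partitioning discussed above can be sketched in base R; the data frame here is made up, and 70:30 is just one of the ratios mentioned:

```r
set.seed(1234)                      # for reproducible partitions
df  <- data.frame(x = rnorm(100), y = rnorm(100))

# Assign each row the label 1 or 2 with probabilities 0.7 and 0.3
ind <- sample(2, nrow(df), replace = TRUE, prob = c(0.7, 0.3))
train    <- df[ind == 1, ]          # roughly 70% of rows
validate <- df[ind == 2, ]          # roughly 30% of rows

nrow(train) + nrow(validate) == nrow(df)  # TRUE: every row is used once
```

To try 60:40 or 50:50 instead, only the prob vector changes.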

  • @rakeshvikhar
    @rakeshvikhar 2 роки тому +1

    I am a beginner. Could you help me understand whether we can use linear/logistic regression to do the prediction here? I have referred to your vehicle example, and got confused about whether we can use that model here.

    • @bkrai
      @bkrai  2 роки тому

      Yes, you can use logistic regression, as the response variable is of factor type. For more see:
      ua-cam.com/video/AVx7Wc1CQ7Y/v-deo.html

  • @vishalaaa1
    @vishalaaa1 4 роки тому +1

    ctree doesn't support dates. I tried dates converted from POSIX. Can you please suggest a ctree parameter that resolves this problem?

    • @bkrai
      @bkrai  4 роки тому

      A decision tree is not a good method for working with dates. For dates you should use time series:
      ua-cam.com/play/PL34t5iLfZddt9X6Q6aq0H38gn-_JQ1RjS.html

  • @satishbharadwaj9539
    @satishbharadwaj9539 6 років тому +1

    Sir, please post a video on Regression Splines, Polynomial Regression & Step Functions etc

    • @bkrai
      @bkrai  6 років тому

      Thanks for the suggestion, I've added it to my list.

  • @ricardobrubaker4109
    @ricardobrubaker4109 2 роки тому

    How can we export the first tree prediction (View(predict(tree,validate,type="prob"))) into Excel? When using a data frame they come out horizontal and unreadable.

  • @sallymusungu8983
    @sallymusungu8983 Рік тому

    How do you remove ticks on the axes? Or realign the axis labels?

  • @harishnagpal21
    @harishnagpal21 6 років тому +1

    Nice video Bharatendra. One question: you said that we need to optimize the model. How do we do that, i.e. how do we optimize our model? Thanks

    • @bkrai
      @bkrai  6 років тому +1

      You can make changes to settings in 'control' to see what helps to improve the model. In the example, I used only 3 variables just for illustration, but you must start with all variables for a better performance.

    • @harishnagpal21
      @harishnagpal21 6 років тому

      thanks :)

  • @bala4you01
    @bala4you01 8 років тому

    Thank you, Dr. Rai, for sharing a simple and detailed explanation of Decision Trees. My query is: can we plot a ROC curve for multiclass data? (The pROC package calculates the AUC, but I could not find how to plot a ROC graph for multinomial data.)

    • @bkrai
      @bkrai  7 років тому

      At this time it only does it for the binomial situation. You can now find the ROC curve video here:
      ua-cam.com/video/ypO1DPEKYFo/v-deo.html

  • @mayankhmathur
    @mayankhmathur 6 років тому +1

    Nice explanation. thanks.

  • @sudiptomitra
    @sudiptomitra 3 роки тому

    A comparative analysis of the model pre- and post-pruning would have completed this tutorial on Decision Trees.

  • @mateuszbielik2912
    @mateuszbielik2912 2 роки тому +1

    Great video, everything explained step by step. I have a question though: some of my data in the DB file is of type character, and I keep getting the error "data class "character" is not supported". How can I include this data in my experiments?

    • @bkrai
      @bkrai  2 роки тому +1

      You change such variables to ‘factor’.

    • @mateuszbielik2912
      @mateuszbielik2912 2 роки тому +1

      @@bkrai omg thank you. so I can just use data$variableF

    • @bkrai
      @bkrai  2 роки тому +1

      yes that should work.
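
The character-to-factor fix above, sketched on a made-up data frame (grade and score are stand-in column names):

```r
# Toy data frame with a character column
df <- data.frame(grade = c("POOR", "FAIR", "GOOD", "FAIR"),
                 score = c(10, 20, 30, 20),
                 stringsAsFactors = FALSE)
class(df$grade)   # "character": this is what ctree() rejects with
                  # 'data class "character" is not supported'

df$grade <- as.factor(df$grade)   # the fix: convert to factor
class(df$grade)   # "factor"

# To convert every character column in one go:
chr <- sapply(df, is.character)
df[chr] <- lapply(df[chr], as.factor)
```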

  • @MrCaptainJeeves
    @MrCaptainJeeves 8 років тому +1

    Love all your videos... please keep uploading!

    • @bkrai
      @bkrai  8 років тому

      +pradeep paul Thanks for your feedback!

  • @takakosuzuki2514
    @takakosuzuki2514 5 років тому +1

    Hi Dr. Rai. I encountered an error in the #Misclassification part. I got the table when using library(party), but I got "all arguments must have the same length" when using the rpart() one. But if I use the validate set with the rpart package, the table can be generated.

    • @bkrai
      @bkrai  4 роки тому

      It's difficult to say much without looking at the code. But review your code again; there may be a typo.

  • @anananan3635
    @anananan3635 2 роки тому +1

    Is it just for numeric variables, or is there another way to handle character variables?

    • @bkrai
      @bkrai  2 роки тому

      Change character variables to factor variables before using this.

  • @Twiste_Z
    @Twiste_Z 4 роки тому +1

    I followed your method with a dataset I created. It's a simple one, but the output just prints the values of my dataset rather than plotting a tree and predicting. Can you help me understand why?

    • @bkrai
      @bkrai  4 роки тому

      Difficult to say much without looking at data and code.

  • @TheIanoTube
    @TheIanoTube 4 роки тому +3

    Would this work just as well if some variables were categorical? I.e. written in text but limited options
    Thanks for the video

    • @bkrai
      @bkrai  4 роки тому +1

      Yes, absolutely

    • @bkrai
      @bkrai  4 роки тому +1

      You may also try this link:
      ua-cam.com/play/PL34t5iLfZddvGr66DPf-L-sSJ50XNwN3K.html

    • @TheIanoTube
      @TheIanoTube 4 роки тому +1

      Thank you, great channel. Subscribed!

    • @bkrai
      @bkrai  4 роки тому

      Thanks!

  • @muhammadnurdzakki1605
    @muhammadnurdzakki1605 4 роки тому +2

    Reading /Preparing csv data : 0:32
    Decision Tree using rpart Package : 11:22

    • @bkrai
      @bkrai  4 роки тому

      Thanks!

  • @m.z.1809
    @m.z.1809 5 років тому +1

    How can we validate the accuracy or discriminatory power of this model?
    I believe you can use the model outputs from train and validate to somehow calculate chi-square, etc.?

    • @bkrai
      @bkrai  5 років тому

      You can validate the model built on training data with the help of validate data.

  • @Fsp01
    @Fsp01 4 роки тому +1

    brilliant! thank you Dr

    • @bkrai
      @bkrai  4 роки тому

      You're most welcome!

  • @leolee618
    @leolee618 6 років тому +1

    Thank you so much for your awesome video. I've learned a lot from it.

    • @bkrai
      @bkrai  6 років тому

      Thanks for your feedback!

  • @oguzyavuz2010
    @oguzyavuz2010 4 роки тому +1

    Let me ask: the variable at the top of the picture is not the dependent variable, right? 5:46

    • @bkrai
      @bkrai  4 роки тому

      It's an independent variable.

    • @oguzyavuz2010
      @oguzyavuz2010 4 роки тому

      @@bkrai sir can i ask some simple questions about tree diagram if you do not mind. I leave it here my gmail adress: ogzhnyvzz@gmail.com

  • @fadedmachine
    @fadedmachine 6 років тому +1

    You're the man. Keep up the great work!

  • @uhsay1986
    @uhsay1986 5 років тому +1

    Hi Sir, how do we apply the test set to the predict function when the target variable has NA values? When I run the function it says the predictor must have 2 levels.

    • @bkrai
      @bkrai  5 років тому

      You need to impute missing values before developing the model.
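
The imputation step mentioned above, sketched in base R with median/mode imputation (the data frame and column names are made up; more sophisticated imputation methods exist):

```r
# Toy data with missing values
df <- data.frame(age  = c(25, NA, 31, 40),
                 risk = factor(c("low", "high", NA, "low")))

# Numeric column: replace NA with the median of the observed values
df$age[is.na(df$age)] <- median(df$age, na.rm = TRUE)

# Factor column: replace NA with the most frequent level
mode_level <- names(which.max(table(df$risk)))
df$risk[is.na(df$risk)] <- mode_level

anyNA(df)  # FALSE: safe to pass to the model now
```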

  • @piyalichoudhury3493
    @piyalichoudhury3493 5 років тому +1

    Like your videos... can you upload some on ensembles and AIC as well? It would be very kind of you.

    • @bkrai
      @bkrai  5 років тому +1

      Thanks for comments and suggestion, I've added it to my list.

  • @abhinavmishra7786
    @abhinavmishra7786 6 років тому +1

    Hi sir, nice explanation; I learnt about the ctree function. Can you please illustrate how we can tune the decision tree model?

    • @bkrai
      @bkrai  6 років тому +1

      Around 7:30 point in the video tuning is shown using "mincriterion" and "minsplit".

    • @abhinavmishra7786
      @abhinavmishra7786 6 років тому

      Bharatendra Rai my mistake sir...I mean pruning the decision tree

    • @bkrai
      @bkrai  6 років тому +1

      You can do pruning by increasing values for "mincriterion" and "minsplit".

    • @abhinavmishra7786
      @abhinavmishra7786 6 років тому +1

      Bharatendra Rai thank u for clarifying sir

  • @kartikchauhan2845
    @kartikchauhan2845 4 роки тому +1

    Sir how would you increase the number of nodes?

    • @bkrai
      @bkrai  4 роки тому

      You can change mincriterion and minsplit in the controls part for that.

    • @bkrai
      @bkrai  4 роки тому

      For a more recent one, see below:
      ua-cam.com/play/PL34t5iLfZddvGr66DPf-L-sSJ50XNwN3K.html

  • @aravindhp5612
    @aravindhp5612 4 роки тому +1

    Sir, why do you use set.seed(1234)? Why can't you use set.seed(12345)? Can you please tell?

    • @bkrai
      @bkrai  4 роки тому +1

      It can be any number, but to get the same samples, use the same number next time too.
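
The point about set.seed() can be seen directly: the seed value is arbitrary, and reusing it reproduces the same "random" sample:

```r
set.seed(1234)
s1 <- sample(1:100, 10)

set.seed(1234)            # same seed ...
s2 <- sample(1:100, 10)   # ... same sample

set.seed(12345)           # a different seed gives (almost surely)
s3 <- sample(1:100, 10)   # a different sample

identical(s1, s2)  # TRUE
```

This is why re-running a partitioned analysis with the same seed gives the same train/validate split.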

  • @zahraadamabdallah4116
    @zahraadamabdallah4116 Рік тому +1

    Very useful

  • @sachiniwickramasinghe1912
    @sachiniwickramasinghe1912 4 роки тому +1

    thank you ! so helpful !

    • @bkrai
      @bkrai  4 роки тому

      Thanks for comments!

  • @sudanmac4918
    @sudanmac4918 4 роки тому +1

    Sir, what is the difference between rpart() and ctree()? And when should each be used?

    • @bkrai
      @bkrai  4 роки тому

      They are different implementations: rpart builds CART-style trees while ctree builds conditional inference trees, so the splitting criteria differ, but both produce decision trees.

  • @ronithNR
    @ronithNR 7 років тому +1

    sir, could u make a video on Random forest.

  • @satyanarayanajammala5129
    @satyanarayanajammala5129 7 років тому

    very nice explanation keep it up

    • @bkrai
      @bkrai  7 років тому

      thanks for the feedback!

  • @sndrstpnv8419
    @sndrstpnv8419 7 років тому +1

    Maybe add more about CHAID trees?

    • @bkrai
      @bkrai  7 років тому +1

      Thanks! I'll keep it in mind.

  • @ateendraagnihotri9744
    @ateendraagnihotri9744 3 роки тому +1

    Sir, can you provide the dataset which you have used?

    • @bkrai
      @bkrai  2 роки тому

      There is a link below this video.

  • @saniamadoo5558
    @saniamadoo5558 6 років тому +1

    Hello sir, can you please make a tutorial on how to implement FP-growth in RStudio? It's urgent, please help!

  • @kanhabira
    @kanhabira 3 роки тому

    Thanks, sir, for this interesting video. I am facing a problem: my dependent variable is binary (0,1). When I run predict, the estimated values appear as decimals despite removing "type", so the misclassification error is close to 1. Could you please suggest how I can get the predicted values as 0/1?

  • @vishnukowndinya
    @vishnukowndinya 7 років тому

    Hi sir, can you please explain the pruning of a tree? On what basis do we prune?

    • @bkrai
      @bkrai  7 років тому +1

      When you have decision trees that are too big, pruning helps reduce the size of the tree by removing parts that do not help much in correctly predicting the outcome. It helps avoid over-fitting and improves prediction accuracy.
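
One common pruning recipe, sketched with the rpart package (which ships with R); this uses the built-in iris data rather than the video's data set, and cp-based pruning rather than the ctree controls discussed elsewhere in the thread:

```r
library(rpart)

set.seed(42)  # rpart's internal 10-fold cross-validation is randomized
fit <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(cp = 0, minsplit = 2))  # deliberately overgrown

# printcp(fit) lists, for each candidate cp, the cross-validated error
# (xerror); pick the cp with the smallest xerror and prune back to it
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)

nrow(pruned$frame) <= nrow(fit$frame)  # the pruned tree is never larger
```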

  • @romanozzie3530
    @romanozzie3530 6 років тому +1

    Amazing, thanks

  • @raniash3ban383
    @raniash3ban383 6 років тому +1

    very helpful thanks

    • @bkrai
      @bkrai  6 років тому

      Thanks for comments!

  • @javeda
    @javeda 6 років тому

    Hi,
    I wanted to ask which is most appropriate software for conducting SEM along with moderation analysis, in case of categorical, nominal (binary and multinomial) and ordinal variables as outcome/dependent/endogenous variables ?
    P.S:The predictor variables are scale,nominal and ordinal variables.
    Regards

  • @sriharshabsathreya
    @sriharshabsathreya 7 років тому +1

    Sir, how do we choose the complexity parameter (CP value) for tree pruning?

    • @kumarmithun2723
      @kumarmithun2723 6 років тому

      For this, you build an rpart model and then prune the tree based on the CP value (via printcp(rpart_model)); choose the CP value with the minimum cross-validated error to prune further.

  • @ningrongye339
    @ningrongye339 7 років тому

    Hi sir, thank you for the video, it's very helpful! But I still don't understand why your model could not predict class 3. If we use all the variables, could we predict more precisely? Thank you!

    • @bkrai
      @bkrai  7 років тому

      That's correct! To obtain the final model we need to include all the variables, and that will improve model performance.

  • @OrcaChess
    @OrcaChess 6 років тому +1

    Hello! I gave my decision tree 97 different features, but it only picked one of them to make its decision. Is it normal that it doesn't consider all the features?

    • @bkrai
      @bkrai  6 років тому

      It runs with default setting. By making changes to default settings you may be able to make it include some more. But features that have very little impact on the response are unlikely to be included.

    • @DhingraRajan
      @DhingraRajan 6 років тому

      It can happen when one of the features is a close predictor of y. Then that value alone is enough to predict y.

  • @mahumadil
    @mahumadil 8 років тому

    I have a query, and I tried to google it but couldn't find a satisfactory answer: what is the difference between a ctree and an rpart tree?

    • @bkrai
      @bkrai  8 років тому

      +Mahum Khan ctree is a function for decision trees in the package called "party". Similarly, rpart is a function in the package of the same name. Both are used for decision trees. I prefer party, as it is said to be more accurate. If you search "party vs rpart" you can find many good explanations.

  • @ronithNR
    @ronithNR 7 років тому

    Hello sir, it's a great video. Does rpart use the Gini index?

    • @bkrai
      @bkrai  7 років тому

      It uses altered priors method.

  • @uchenzei5160
    @uchenzei5160 4 роки тому +1

    When I try to create the misclassification table, it always gives me the error "all arguments must have the same length". What can I do? I am new to data science.

    • @neera842006
      @neera842006 4 роки тому +1

      I am also getting same error message

    • @dhavalpatel1843
      @dhavalpatel1843 4 роки тому +1

      You should always pass the model as the first argument to the predict function. The second argument should be a data frame of the predictor variables only. You can specify type="prob" as an extra argument to get probabilities for every level of y, or type="class" to directly get the predicted class. By default, the type argument is set up differently in every R version.

    • @bkrai
      @bkrai  4 роки тому +1

      Thanks for the update!
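
The advice above about predict() sketched with rpart on the built-in iris data (a stand-in for the video's data set):

```r
library(rpart)

# Model as the first argument, a data frame of predictors as the second
fit <- rpart(Species ~ ., data = iris, method = "class")

cls  <- predict(fit, iris, type = "class")  # factor of predicted classes
prob <- predict(fit, iris, type = "prob")   # matrix: one probability per class

# cls has one prediction per row of the data, so table() with the actual
# labels works without a length mismatch
length(cls) == nrow(iris)  # TRUE
```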

  • @Steamlala
    @Steamlala 5 років тому +1

    Dear Sir
    Thank you for your video. Can you do a tutorial in R where multiple tree-based models (decision tree, random forest, gradient boosting, logistic regression, etc.) are compared on the same chart using ROC curves, split by training vs validation data set? It would be a great help for this type of visualization, especially when presenting to management. Thank you!

    • @bkrai
      @bkrai  5 років тому

      Thanks for comments and suggestion that I'll work on in near future. Meanwhile here is a link where you can quickly get ROC that plots and compares several methods such as decision tree, logistic regression, svm, random forest, etc., on the same ROC plot.
      ua-cam.com/video/J2a9yV3kl-M/v-deo.html

    • @Steamlala
      @Steamlala 5 років тому +1

      Thank you, Sir. The above tutorial is really good. Looking forward to your tutorial comparing multiple classification models in one graph, split between train and validate.

    • @bkrai
      @bkrai  5 років тому

      Thanks!

  • @caterinacevallos9822
    @caterinacevallos9822 6 років тому

    Could you please explain this to me a little bit more?
    pd

    • @bkrai
      @bkrai  6 років тому

      You can go over this that has more detail:
      ua-cam.com/video/aS1O8EiGLdg/v-deo.html

  • @raghul4457
    @raghul4457 6 років тому

    Hi, can you provide an explanation of how over-fitting occurs in a decision tree?

    • @bkrai
      @bkrai  6 років тому

      When terminal nodes have very small sample sizes, a decision tree model is likely to over-fit: with small samples, the decisions arrived at in the terminal nodes may not be very stable.

  • @anandsalunke180
    @anandsalunke180 8 років тому

    What if there are two target variables, like NSP and another one? What decision tree technique should be used? What will the formula be?

    • @bkrai
      @bkrai  8 років тому

      You can make two separate trees.

    • @anandsalunke180
      @anandsalunke180 8 років тому

      How will we derive the formula? Based on what attributes?

    • @bkrai
      @bkrai  8 років тому

      Decision tree algorithm will automatically choose the attributes or independent variables depending on the parameters such as minimum sample size for splitting, statistical significance, etc., that you choose.

  • @preeyank5
    @preeyank5 8 years ago +1

    Thanks a ton!!

    • @bkrai
      @bkrai  8 years ago

      +Preeyank Pable 👍👍👍

    • @tayabakhanum9707
      @tayabakhanum9707 8 years ago

      Sir, please tell me about classical (crisp) decision trees.

  • @sriharshabsathreya
    @sriharshabsathreya 7 years ago

    Sir, how can a decision tree be used for variable selection?

    • @bkrai
      @bkrai  7 years ago

      The importance of a variable in the tree is reflected by its position. For example, the one at the top of the tree is the most important.

  • @ITGuySam
    @ITGuySam 8 years ago

    Thank you for your video. I'd like to know what you mean by "set.seed(1234)". Why not use set.seed(2) or something else?
    Also, can we use "ifelse" instead of defining "pd"? Which way is better?

    • @bkrai
      @bkrai  8 years ago

      +Info A set.seed(1234) is just an example; you may use any other number. The idea is to make the results reproducible, which any number can achieve. 'pd' was used for 'partitioning data' and is just a name; any other name will be fine too.
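
A minimal illustration of what `set.seed()` buys you: rerunning the same seed reproduces the same random train/test partition, and the specific number (1234, 2, ...) is arbitrary. The `sample(2, ...)` idiom mirrors the partitioning step used in the video.

```r
set.seed(1234)
split1 <- sample(2, 10, replace = TRUE, prob = c(0.8, 0.2))

set.seed(1234)   # same seed again
split2 <- sample(2, 10, replace = TRUE, prob = c(0.8, 0.2))

identical(split1, split2)   # TRUE: the same partition both times

set.seed(2)      # a different seed is equally valid --
split3 <- sample(2, 10, replace = TRUE, prob = c(0.8, 0.2))  # just a different split
```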

  • @akkimalhotra26
    @akkimalhotra26 8 years ago +1

    Dear sir, how can I get the data set that you are using?

    • @bkrai
      @bkrai  8 years ago +1

      Your email?

    • @bkrai
      @bkrai  8 years ago +1

      Actually I don't need the email. You can get the data from:
      sites.google.com/site/raibharatendra/home/decision-tree

  • @divyadamodaran53
    @divyadamodaran53 8 years ago +1

    What does the p-value represent?

    • @bkrai
      @bkrai  8 years ago

      +divya damodaran A p-value of 0.05 means 95% (1 - 0.05 = 0.95) confidence in concluding that the variable is statistically significant.

    • @divyadamodaran53
      @divyadamodaran53 8 years ago

      Okay, thank you.
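
For context, the p-value at each ctree split comes from a statistical test of association between a predictor and the outcome. Base R's `chisq.test` gives a quick feel for this; the counts below are invented purely for illustration.

```r
# Hypothetical cross-tabulation of a two-level predictor vs. the outcome
tab <- matrix(c(40, 10,    # predictor level A: 40 Normal, 10 Pathologic
                15, 35),   # predictor level B: 15 Normal, 35 Pathologic
              nrow = 2, byrow = TRUE,
              dimnames = list(c("A", "B"), c("Normal", "Pathologic")))

res <- chisq.test(tab)
res$p.value   # far below 0.05, so with more than 95% confidence the
              # predictor is associated with the outcome
```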

  • @vishnukowndinya
    @vishnukowndinya 7 years ago

    How is cross-validation useful in pruning the tree?

    • @bkrai
      @bkrai  7 years ago

      When you develop different trees with different validation data, you can choose the one that has a smaller size as well as better accuracy. This way you are able to prune the decision tree.
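
A sketch of the k-fold split behind this idea, in base R only: each fold serves once as validation data, and tree size versus validation accuracy can then be compared across folds when choosing how far to prune. The row count and fold count below are illustrative.

```r
set.seed(1234)
n <- 100                                  # e.g., 100 training rows
k <- 5
fold <- sample(rep(1:k, length.out = n))  # random fold assignment, 1..5

for (i in 1:k) {
  train_idx <- which(fold != i)           # 4/5 of the data for fitting
  valid_idx <- which(fold == i)           # 1/5 held out for validation
  # fit a tree on rows train_idx here, then record its size and its
  # accuracy on rows valid_idx; prune to the size that validates best
}

table(fold)   # five folds of 20 rows each
```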

  • @meghadabhade6967
    @meghadabhade6967 7 years ago

    Hello Bharatendra sir, can you please guide me on how to implement the perturbation method in R?
    Currently I classify data using a decision tree. Now I want to perturb the data and run the same classification, but I am unable to proceed. Can you please upload some videos illustrating how to implement the perturbation method in R? It's very urgent for me.

    • @bkrai
      @bkrai  7 years ago +1

      Megha, here is the link for perturbation analysis. Note that it can be used only for regression-like models; it may not work with decision trees.
      ua-cam.com/video/Jz97ccAIyj8/v-deo.html

  • @gebriadinda6405
    @gebriadinda6405 7 years ago

    Excuse me, sir. Can you help me? I tried this script on my data, which has 100 observations of 1383 variables. I got the result "Conditional inference tree with 1 terminal nodes" and "Number of observations: 83". However, I can't get the decision tree; I just get a histogram. Can you help me, sir? Why does this happen? Thank you, sir.

    • @bkrai
      @bkrai  7 years ago

      +Gebri Adinda You can send the data and I can look into it.

    • @aisha555ms2000
      @aisha555ms2000 5 years ago

      @@bkrai Sir, I get the same result, "Conditional inference tree with 1 terminal nodes": only a histogram, and number of observations = 144. Can you help?

  • @Walkot2
    @Walkot2 6 years ago

    Can you use categorical variables as predictors? For example, man/woman or American/Asian/African?

    • @bkrai
      @bkrai  6 years ago

      Yes, you can use categorical variables as predictors.

    • @Walkot2
      @Walkot2 6 years ago

      Thank you! So I just input them as factors into the model?

    • @bkrai
      @bkrai  6 years ago

      That's correct.

    • @Walkot2
      @Walkot2 6 years ago

      How can you plot the tree for the prediction of a new data entry?
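
The factor advice in this thread can be sketched in a few lines of base R: convert character columns to factors before modeling, and the tree function then treats them as categorical predictors. The data frame here is invented for illustration.

```r
df <- data.frame(
  gender    = c("Man", "Woman", "Woman", "Man"),
  ethnicity = c("American", "Asian", "African", "Asian"),
  outcome   = c("yes", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Convert each categorical column to a factor
df$gender    <- factor(df$gender)
df$ethnicity <- factor(df$ethnicity)
df$outcome   <- factor(df$outcome)

str(df)                  # all three columns now show as Factor
levels(df$ethnicity)     # "African" "American" "Asian"

# A call like ctree(outcome ~ gender + ethnicity, data = df) would then
# treat both predictors as categorical.
```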

  • @subashinirajan2841
    @subashinirajan2841 7 years ago

    Hello sir, I'm implementing the same steps on my own data set, but I am getting an error in the misclassification part: "all arguments must have the same length". Would it be OK if you checked my code and let me know where I am going wrong? If so, I will send you the code and data.

    • @bkrai
      @bkrai  7 years ago

      Yes, send the code.

    • @subashinirajan2841
      @subashinirajan2841 7 years ago

      Thank you, sir. To which email ID should I send the code? My email ID is subashinivec@gmail.com
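
For readers hitting the same error: `table()` raises "all arguments must have the same length" when the predicted and actual vectors differ in length, most often because predictions were made on one data set (e.g., training) but compared against another (e.g., test). The vectors below are illustrative only.

```r
actual_test <- factor(c("yes", "no", "yes", "no", "yes"))   # 5 test labels
pred_train  <- factor(c("yes", "no", "no"))                 # 3 predictions (wrong set!)

# Mismatched lengths (3 vs 5) reproduce the error from the comment above
bad <- try(table(pred_train, actual_test), silent = TRUE)
inherits(bad, "try-error")   # TRUE

# Predict on the SAME data whose labels you compare against, and it works
pred_test <- factor(c("yes", "no", "yes", "yes", "yes"))
table(Predicted = pred_test, Actual = actual_test)
```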

  • @ranjithkarnamsatya8400
    @ranjithkarnamsatya8400 7 years ago

    Sir, nice video.
    I would like ID3 and C4.5 implementations in R. Please help me with the code.

  • @VenkateshDataScientist
    @VenkateshDataScientist 6 years ago

    RStudio doubt:
    I am building a predictive model with 1 million observations and 15 variables, and I am getting errors like "cannot allocate vector of size 432 GB" or "cannot allocate vector of size 3.8 GB".
    I am using 16 GB of RAM and my file size is just 140 MB. I closed all other applications on my system, but the error remains.
    Any suggestions much appreciated.

    • @bkrai
      @bkrai  6 years ago

      With huge data you can probably take a sample for creating the model. The difference between a model based on a good sample and one based on all the data may not be significant. You can also try faster algorithms such as extreme gradient boosting:
      ua-cam.com/video/woVTNwRrFHE/v-deo.html

    • @VenkateshDataScientist
      @VenkateshDataScientist 6 years ago +1

      Bharatendra Rai, sure sir, I will try it today.
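
The sampling suggestion above can be sketched as follows: fit the model on a random subset of the rows instead of all 1 million, which keeps memory use bounded. `big` below is simulated stand-in data, not the commenter's file.

```r
set.seed(1234)
big <- data.frame(x1 = rnorm(1e6), x2 = runif(1e6))   # stand-in for the full data

idx  <- sample(nrow(big), 100000)   # e.g., a 10% random sample of rows
samp <- big[idx, ]

nrow(samp)                          # 100000 rows for model building
as.numeric(object.size(samp)) < as.numeric(object.size(big))  # far smaller in memory
```

Fitting on `samp`, then checking accuracy on a second, independent sample, gives a quick read on whether the sample-based model is good enough before investing in bigger hardware or out-of-memory tooling.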

  • @bharathjc4700
    @bharathjc4700 7 years ago

    Hi sir, how much of the math behind the algorithm do I need to learn?

    • @bkrai
      @bkrai  7 years ago

      In business applications you don't really need much math. It's more about how to correctly apply a method and interpret the results to solve a business problem.

    • @bharathjc4700
      @bharathjc4700 7 years ago +1

      Thanks, sir, for your valuable inputs.