Machine Learning Mastery
What is KFold Cross Validation? When NOT to use it? How to use it with modifications for your data
KFold cross validation plays a very important role in understanding the variance behaviour of your model. Most people take it for granted and don't use its full potential. I explain how to use it right, how to read the variance across folds, and when NOT to use vanilla KFold but rather its extensions as implemented in scikit-learn (a short sketch follows this entry).
My AI and Generative AI Courses are detailed here:
ai.generativeminds.co
To get a FREE invite to our classes, fill in the link below:
invite.generativeminds.co
Views: 390
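As a quick illustration (a minimal sketch, not the code from the video; the scikit-learn breast-cancer toy dataset is assumed just to make it runnable): read the spread of the fold scores, and swap in StratifiedKFold, GroupKFold or TimeSeriesSplit when vanilla KFold's assumptions don't hold.

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

    X, y = load_breast_cancer(return_X_y=True)
    model = LogisticRegression(max_iter=5000)

    # The std across folds is the "variance" to watch, not just the mean score.
    scores = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=42))
    print("KFold           mean=%.3f std=%.3f" % (scores.mean(), scores.std()))

    # Imbalanced labels -> StratifiedKFold; grouped rows -> GroupKFold; time order -> TimeSeriesSplit.
    scores = cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42))
    print("StratifiedKFold mean=%.3f std=%.3f" % (scores.mean(), scores.std()))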

Videos

How to really find if my Test Data is diverging from my Training dataset? This WORKS!
389 views · 7 months ago
Adversarial Validation is a practical method for finding out whether the test set (as seen in production) has started to diverge from the training set. We detail the scoring function and how you can implement it. Very effective for mixed tabular data use cases. My AI and Generative AI Courses are detailed here: ai.generativeminds.co To get a FREE invite to our classes, fill in the link below: invite.generativeminds.co
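A minimal sketch of the idea (not the video's exact code; names like X_train and X_prod are assumptions): label training rows 0 and production/test rows 1 and see how well a classifier can tell them apart. A cross-validated ROC AUC near 0.5 means no detectable divergence; values approaching 1.0 mean strong divergence.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    def adversarial_auc(X_train, X_test):
        # Stack the two sets and label where each row came from.
        X = np.vstack([X_train, X_test])
        y = np.hstack([np.zeros(len(X_train)), np.ones(len(X_test))])
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        return cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()

    # e.g. score = adversarial_auc(X_train.values, X_prod.values)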
Use Central Limit Theorem to turn any distribution to Normal? Really?
200 views · 7 months ago
The Central Limit Theorem is closely related to the law of large numbers. We list exactly what the theorem states and how empirically non-Gaussian distributions can be handled using it in our applications. My AI and Generative AI Courses are detailed here: ai.generativeminds.co To get a FREE invite to our classes, fill in the link below: invite.generativeminds.co
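A tiny empirical check of the theorem (illustrative only, not from the video): sample means of a heavily skewed exponential distribution concentrate around the true mean and their spread shrinks like 1/sqrt(n).

    import numpy as np

    rng = np.random.default_rng(0)
    for n in (1, 5, 30, 200):
        # 10,000 experiments, each averaging n draws from a skewed exponential(scale=2)
        means = rng.exponential(scale=2.0, size=(10_000, n)).mean(axis=1)
        print(f"n={n:4d}  mean of sample means={means.mean():.3f}  std of sample means={means.std():.3f}")
    # The std column shrinks roughly as 2/sqrt(n), and the histogram of `means` looks Gaussian for large n.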
How Bootstrapping helps with scoring your Train Test Divergences?
177 views · 7 months ago
How do you score train/test divergences? Bootstrapping is one simple approach to help you get a grip on this topic. Relying on random sampling, it is statistically valid and practically a good reference point to use alongside adversarial scoring techniques. My AI and Generative AI Courses are detailed here: ai.generativeminds.co To get a FREE invite to our classes, fill below link: inv...
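One possible shape for such a score (a hedged sketch with assumed inputs, not the video's exact implementation): pool the per-row errors from train and test, resample under the null hypothesis that they come from the same distribution, and check how extreme the observed gap is.

    import numpy as np

    def bootstrap_divergence_score(errors_train, errors_test, n_boot=2000, seed=0):
        rng = np.random.default_rng(seed)
        observed_gap = errors_test.mean() - errors_train.mean()
        pooled = np.concatenate([errors_train, errors_test])
        gaps = np.empty(n_boot)
        for i in range(n_boot):
            a = rng.choice(pooled, size=len(errors_train), replace=True).mean()
            b = rng.choice(pooled, size=len(errors_test), replace=True).mean()
            gaps[i] = b - a
        # Fraction of resampled gaps at least as extreme as the observed one (a p-value-style score).
        return observed_gap, float((np.abs(gaps) >= abs(observed_gap)).mean())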
How I built Generative AI for Retail in 60 Days
518 views · 1 year ago
Below is the link to a FREE interactive video where I explain the step-by-step path to building your own Generative AI for your business within 60 days. Just follow the steps and you will get RESULTS !! WATCH it FREE here : how-to-llm.generativeminds.co/ My AI and Generative AI Courses are detailed here: ai.generativeminds.co To get a FREE invite to our classes, fill below link: invite.genera...
Bayesian Optimization - Math and Algorithm Explained
48K views · 3 years ago
Learn the algorithm behind Bayesian optimization, Surrogate Function calculations and the Acquisition Function (Upper Confidence Bound). Visualize a scratch implementation of how the approximation works iteratively. Finally, understand how to use the scikit-optimize package to do hyperparameter tuning using Bayesian optimization. My AI and Generative AI Courses are detailed here: ai.generativeminds.co ...
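A short scikit-optimize sketch of the workflow (the toy objective is an assumption standing in for a real CV loss; note that skopt minimizes, so the confidence-bound acquisition appears as acq_func="LCB" rather than UCB):

    from skopt import gp_minimize
    from skopt.space import Real

    def objective(params):
        x, = params
        return (x - 0.3) ** 2 + 0.1        # toy black-box function to minimize

    result = gp_minimize(objective,
                         dimensions=[Real(-2.0, 2.0, name="x")],
                         acq_func="LCB",   # confidence-bound acquisition on the GP surrogate
                         n_calls=25,
                         random_state=0)
    print(result.x, result.fun)            # best parameter(s) and best objective value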
Decision Tree Hyperparam Tuning
3.7K views · 3 years ago
Learn how to use training and validation datasets to find the optimum values for the hyperparameters of your decision tree, demonstrated for the Max Tree Depth and Min Samples per Leaf hyperparameters. My AI and Generative AI Courses are detailed here: ai.generativeminds.co To get a FREE invite to our classes, fill in the link below: invite.generativeminds.co
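A minimal version of that sweep (assumed toy dataset, not the video's data): compare training vs validation accuracy for each hyperparameter combination and pick the point where the validation score stops improving.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

    for max_depth in (2, 3, 5, 8, None):
        for min_samples_leaf in (1, 5, 20):
            tree = DecisionTreeClassifier(max_depth=max_depth,
                                          min_samples_leaf=min_samples_leaf,
                                          random_state=42).fit(X_tr, y_tr)
            print(max_depth, min_samples_leaf,
                  round(tree.score(X_tr, y_tr), 3),    # training accuracy
                  round(tree.score(X_val, y_val), 3))  # validation accuracy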
Decision Tree Cost Pruning - Hands On
2.3K views · 3 years ago
In this hands-on video you will learn how to find the right cost-pruning alpha parameter for your decision tree. My AI and Generative AI Courses are detailed here: ai.generativeminds.co To get a FREE invite to our classes, fill in the link below: invite.generativeminds.co
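A hedged scikit-learn sketch of the same search (assumed toy dataset): cost_complexity_pruning_path gives the candidate alphas, and the validation score picks the winner.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
    for alpha in path.ccp_alphas[::5]:     # subsample the path for brevity
        tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
        print(f"alpha={alpha:.5f}  val_acc={tree.score(X_val, y_val):.3f}")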
Gradient Boosting Hands-On Step by Step from Scratch
2.7K views · 3 years ago
Learn how to write the gradient boosting tree algorithm from scratch. Learn how the loss function is derived and applied in Python code as part of your boosting iteration. Learn a trick to present your charts as interpretable categorical values rather than encoded numerical values (this is done a lot in practice). My AI and Generative AI Courses are detailed here: ai.generativeminds.co To get a FR...
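A compact from-scratch sketch for squared loss (illustrative, not the video's exact code): each boosting round fits a small tree to the negative gradient, which for squared loss is simply the residuals.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gradient_boost_fit(X, y, n_rounds=100, learning_rate=0.1, max_depth=2):
        f0 = y.mean()                          # initial constant prediction
        pred = np.full(len(y), f0)
        trees = []
        for _ in range(n_rounds):
            residuals = y - pred               # negative gradient of 0.5 * (y - f)^2
            tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
            pred += learning_rate * tree.predict(X)
            trees.append(tree)
        return f0, trees

    def gradient_boost_predict(X, f0, trees, learning_rate=0.1):
        return f0 + learning_rate * sum(t.predict(X) for t in trees)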
Hyperparameters - Introduction & Search
5K views · 3 years ago
We cover: 1. What hyperparameters are and how they differ from model parameters. 2. Why hyperparameter tuning is important, with 2 examples from Deep Learning. 3. Searching hyperparameters - Grid vs Random Search. 4. The mathematical edge Random Search has over Grid Search. My AI and Generative AI Courses are detailed here: ai.generativeminds.co To get a FREE invite to our classes, fill bel...
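A small side-by-side sketch (the estimator and search space are assumptions): with the same budget of 9 candidate settings, grid search only ever tries 3 distinct values per hyperparameter, while random search samples up to 9 distinct values of each.

    from scipy.stats import randint
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

    X, y = load_breast_cancer(return_X_y=True)
    rf = RandomForestClassifier(random_state=0)

    grid = GridSearchCV(rf, {"max_depth": [3, 5, 8], "min_samples_leaf": [1, 5, 20]}, cv=3)
    rand = RandomizedSearchCV(rf, {"max_depth": randint(2, 12), "min_samples_leaf": randint(1, 30)},
                              n_iter=9, cv=3, random_state=0)

    print(grid.fit(X, y).best_params_)
    print(rand.fit(X, y).best_params_)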
Feature Importance Formulation of Decision Trees
6K views · 3 years ago
Learn the feature importance formulation for both a single decision tree and for multiple trees, illustrated with a simple example. My AI and Generative AI Courses are detailed here: ai.generativeminds.co To get a FREE invite to our classes, fill in the link below: invite.generativeminds.co
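A quick way to see the formulation in code (assumed toy dataset): a tree's feature_importances_ is its normalized total impurity decrease per feature, and a forest's importance is the average of its trees' importances.

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    print(tree.feature_importances_.round(3))        # single tree: normalized impurity decrease

    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    averaged = np.mean([t.feature_importances_ for t in forest.estimators_], axis=0)
    print(np.round(averaged, 3))                     # average over the individual trees ...
    print(forest.feature_importances_.round(3))      # ... matches the forest's reported importances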
How to Regularize with Dropouts | Deep Learning Hands On
598 views · 3 years ago
1. Dropout - Benefits, Effect on your architecture and Types of Dropouts. 2. How to implement Dropout with Weight Constraints? 3. How to implement Dropout and still maintain the original capacity of the network? 4. How to find the ideal Dropout Ratio for your architecture? 5. How important are Activation and Gradient Distributions in deciding the dropout rate for your architecture? All this wit...
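A hedged Keras sketch of two of the points above (the layer sizes and input width are assumptions, not the video's code): Dropout combined with a MaxNorm weight constraint, and a hidden layer widened by 1/(1 - dropout rate) to roughly preserve the original capacity.

    from tensorflow.keras import Input, Sequential
    from tensorflow.keras.constraints import MaxNorm
    from tensorflow.keras.layers import Dense, Dropout

    dropout_rate = 0.2
    base_width = 64
    model = Sequential([
        Input(shape=(20,)),
        Dense(int(base_width / (1 - dropout_rate)),      # widen to offset the dropped units
              activation="relu", kernel_constraint=MaxNorm(3)),
        Dropout(dropout_rate),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])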
How to Regularize with Weight & Activation Regularizations | Deep Learning
551 views · 3 years ago
1. How to regularize neural networks using Weight and Activation Regularizations. 2. How Weight & Activity Regularizations are two sides of the same coin. 3. What the signatures of the Activation distribution for your architecture are, and how to tell whether you are correctly optimizing your hyperparameters for regularization. 4. Identify the signature of "optimal" Activation distributions using f...
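A hedged Keras sketch of the two knobs (the sizes and penalty strengths are assumptions): kernel_regularizer penalizes the weights themselves, while activity_regularizer penalizes the layer's output activations.

    from tensorflow.keras import Input, Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.regularizers import l1, l2

    model = Sequential([
        Input(shape=(20,)),
        Dense(64, activation="relu",
              kernel_regularizer=l2(1e-4),      # weight regularization
              activity_regularizer=l1(1e-5)),   # activation regularization
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")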
How to Fix Vanishing & Exploding Gradient Problems | Deep Learning
2.8K views · 3 years ago
1. How to identify whether you are facing a Vanishing or Exploding Gradient problem. Take a classification example and understand the signature of convergence with and without gradient problems. 2. Fix Vanishing Gradients with ReLU & correct Weight Initializations. 3. What causes exploding gradients? Take a regression example & analyze how it looks when we do sensitivity analysis on the architecture. ...
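A hedged Keras sketch of the standard fixes (the depth and sizes are assumptions): ReLU with He initialization against vanishing gradients, and gradient clipping on the optimizer against exploding gradients.

    from tensorflow.keras import Input, Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.optimizers import SGD

    model = Sequential([Input(shape=(20,))] + [
        Dense(32, activation="relu", kernel_initializer="he_normal")  # He init pairs with ReLU
        for _ in range(6)
    ] + [Dense(1)])

    # clipnorm caps the gradient norm, a common guard against exploding gradients
    model.compile(optimizer=SGD(learning_rate=0.01, clipnorm=1.0), loss="mse")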
How to Accelerate training with Batch Normalization? | Deep Learning
752 views · 3 years ago
1. How to perform sensitivity analysis of your neural network architecture when Scaling and Batch Normalization are part of your design. 2. Understand how scaling really benefits the convergence properties of your architecture. 3. Input Scaling & Output Scaling benefits. 4. What does Batch Normalization really do? 5. How BatchNormalization boosts both the speed of convergence & smoothness of converg...
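A hedged Keras sketch (the sizes are assumptions; scale the inputs first, e.g. with StandardScaler): BatchNormalization between layers keeps each layer's activation distribution stable, which is where the convergence speed-up comes from.

    from tensorflow.keras import Input, Sequential
    from tensorflow.keras.layers import BatchNormalization, Dense

    model = Sequential([
        Input(shape=(20,)),
        Dense(64, activation="relu"),
        BatchNormalization(),
        Dense(64, activation="relu"),
        BatchNormalization(),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")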
What is a Perceptron Learning Algorithm - Step By Step Clearly Explained using Python
21K views · 3 years ago
What is a Perceptron Learning Algorithm - Step By Step Clearly Explained using Python
How to Tune Learning Rate for your Architecture? | Deep Learning
1.4K views · 3 years ago
How to Tune Learning Rate for your Architecture? | Deep Learning
How to Find the Right number of Layers/Neurons for your Neural Network?
12K views · 3 years ago
How to Find the Right number of Layers/Neurons for your Neural Network?
How to Configure and Tune Batch Size for your Neural Network?
2.6K views · 3 years ago
How to Configure and Tune Batch Size for your Neural Network?
Back Propagation Math Step By Step Detailed with an Example | Deep Learning
2.4K views · 3 years ago
Back Propagation Math Step By Step Detailed with an Example | Deep Learning
Back Propagation Concept Math Step By Step for a Two Layer Feed Forward Network
445 views · 3 years ago
Back Propagation Concept Math Step By Step for a Two Layer Feed Forward Network
How Gradient Descent finds the weights? Gradient Descent Math Step By Step with Example | Neural Net
12K views · 3 years ago
How Gradient Descent finds the weights? Gradient Descent Math Step By Step with Example | Neural Net
How to use Gaussian Mixture Models, EM algorithm for Clustering? | Machine Learning Step By Step
19K views · 4 years ago
How to use Gaussian Mixture Models, EM algorithm for Clustering? | Machine Learning Step By Step
Principal Component Analysis (PCA) Maths Explained with Implementation from Scratch
632 views · 4 years ago
Principal Component Analysis (PCA) Maths Explained with Implementation from Scratch
How to cluster using Hierarchical Clustering Algorithm | Machine Learning Step By Step
708 views · 4 years ago
How to cluster using Hierarchical Clustering Algorithm | Machine Learning Step By Step
DBSCAN Math and Algorithm Explained Step by Step
1.7K views · 4 years ago
DBSCAN Math and Algorithm Explained Step by Step
KMeans Clustering Math Assumptions & Algorithm Explained - When Not to use? , How it works?
1.5K views · 4 years ago
KMeans Clustering Math Assumptions & Algorithm Explained - When Not to use? , How it works?
XGBOOST Math Explained - Objective function derivation & Tree Growing | Step By Step
9K views · 4 years ago
XGBOOST Math Explained - Objective function derivation & Tree Growing | Step By Step
What is Extreme about XGBoost?, Why XGBoost wins Kaggle?, Algorithmic, Model & System Optimizations.
2.7K views · 4 years ago
What is Extreme about XGBoost?, Why XGBoost wins Kaggle?, Algorithmic, Model & System Optimizations.
Gradient Boosting - Math Clearly Explained Step By Step | Machine Learning Step By Step
5K views · 4 years ago
Gradient Boosting - Math Clearly Explained Step By Step | Machine Learning Step By Step

COMMENTS

  • @sumangorkhali5748
    @sumangorkhali5748 11 days ago

    Best explained... millions of thanks

  • @mshika2150
    @mshika2150 27 days ago

    can i get the code ?

  • @khemchand494
    @khemchand494 1 month ago

    Very well explained. I got the complete intuition of GMMs in a go.

  • @vrhstpso
    @vrhstpso 2 months ago

    😀

  • @sm-pz8er
    @sm-pz8er 3 months ago

    Very well simplified explanation. Thank you

  • @prabhjot-ud6ru
    @prabhjot-ud6ru 4 months ago

    best ever explanation for GMM. Thanks a lot for such a helpful video.

  • @benheller472
    @benheller472 5 months ago

    Hello, I’ve been watching your videos. Thank you! They are great. Is there a way to contact you directly?

  • @9951468414
    @9951468414 5 months ago

    Which reference book do you use?

  • @9951468414
    @9951468414 5 months ago

    Hello there, Can you give the material notes

  • @VIVEK_InLoop
    @VIVEK_InLoop 6 months ago

    Nice sir

  • @xiaoyongli
    @xiaoyongli 7 months ago

    well done in <15 min!!! highly recommended

  • @DM-py7pj
    @DM-py7pj 7 months ago

    is the end of the video missing?

  • @nashtashasaint-pier7404
    @nashtashasaint-pier7404 7 months ago

    This seems to be correct if and only if you assume that your three models are independent. This is fine, but I think this does not say much in practical cases, as it is very unlikely that you will have 3 base learners that are not correlated. In general, it seems pretty complicated to come up with a "comprehensive" formula that takes into account the respective covariances of these three models with each other and expresses the probabilistic advantage ensembling has.

    • @machinelearningmastery
      @machinelearningmastery 7 months ago

      The formulation is the premise for why variance reduces theoretically when ensembling is in place compared to individual models. From a practical standpoint it works well, which is why random forest is such a star, with so many hyperparameters designed to make the trees as different as possible across the hundreds of features faced in real applications.
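      A tiny simulation of that premise (illustrative only, not from the video): averaging three models with independent errors cuts the variance to roughly 1/3, while correlated base learners give a smaller reduction, which is exactly the commenter's point.

          import numpy as np

          rng = np.random.default_rng(0)
          n_models, n_trials, rho = 3, 100_000, 0.7

          independent = rng.normal(0, 1, size=(n_trials, n_models)).mean(axis=1)
          shared = rng.normal(0, 1, size=(n_trials, 1))          # common error component
          own = rng.normal(0, 1, size=(n_trials, n_models))      # model-specific component
          correlated = (np.sqrt(rho) * shared + np.sqrt(1 - rho) * own).mean(axis=1)

          print("single model variance    ", 1.0)
          print("independent ensemble var ", round(independent.var(), 3))   # ~ 1/3
          print("correlated ensemble var  ", round(correlated.var(), 3))    # ~ rho + (1 - rho)/3 = 0.8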

  • @VictorTimely-9
    @VictorTimely-9 7 months ago

    More on Statistics.

  • @wenkuchen
    @wenkuchen 8 months ago

    very clear explanation for decision tree features importance, thanks

  • @meha1233
    @meha1233 8 months ago

    You should mention the normalized method. I kill myself to find out how to normalize those numbers

    • @machinelearningmastery
      @machinelearningmastery 8 months ago

      Which normalization would you like to see? The weight computation in each iteration is normalized. Could you clarify?

  • @countrylifevlog524
    @countrylifevlog524 8 months ago

    can you provide these slides

  • @tomryan7679
    @tomryan7679 8 months ago

    @machinelearningmaster Great video, thanks! Could you please share the dataset used so that we can replicate this?

  • @namanjha4964
    @namanjha4964 9 months ago

    Thanks a lot for the video

  • @saleemun8842
    @saleemun8842 9 months ago

    by far the clearest explanation of bayesian optimization, great work, thanks man!

  • @Xavier-Ma
    @Xavier-Ma 9 months ago

    Wonderful explaination! Thanks professor.

  • @YuekselG
    @YuekselG 9 months ago

    Is there a mistake at 9:10? There is one f(x) too many, I think. It should be N(f(x_1), ..., f(x_n) | 0, C*) / N(f(x_1), ..., f(x_n) | 0, C). Can anyone confirm this? ty

  • @syedtalhaabidalishah961
    @syedtalhaabidalishah961 9 months ago

    what a video!!! simple and straight forward

  • @Goop3
    @Goop3 10 months ago

    Very intuitive explanation!! Thank you so much! I found this gem of a channel today!

  • @hosseindahaee2886
    @hosseindahaee2886 10 months ago

    Thanks, but there is a typo: for y = -1 it should be w^T x + b <= -1, not w^T x + b <= 1.

  • @gvdkamdar
    @gvdkamdar 10 months ago

    This entire series is one of the most comprehensive explanations I have found for SVMs. Extremely grateful for it

  • @agc444
    @agc444 11 months ago

    Wonderful video, many thanks. Perhaps it would be nice if you made the code available for us learners to play with. Thanks.

  • @saremish
    @saremish 11 months ago

    Very clear and informative. Thanks!

  • @hatemmohamed8387
    @hatemmohamed8387 11 months ago

    is there any repo containing the codes for the entire playlist

  • @mahdiyehbasereh
    @mahdiyehbasereh 11 months ago

    Why don't we inherit from the keras.model class? Thanks alot for your tutorials

    • @machinelearningmastery
      @machinelearningmastery 7 months ago

      Yes, you can do that and make it easier to use in multiple places.

  • @ywbc1217
    @ywbc1217 11 months ago

    extremely not good explanations

  • @dhanushka5
    @dhanushka5 11 months ago

    Thanks

  • @Ruhgtfo
    @Ruhgtfo 1 year ago

    Best explanation find the most, thank-you

  • @DilipKumar-dc2rx
    @DilipKumar-dc2rx 1 year ago

    You taught better than my instructor 🙂

  • @farhaddotita8855
    @farhaddotita8855 1 year ago

    Thanks so much, the best explanation of xgBoost I´ve seen so far, most people doesnt matter about the math intuition!

  • @JLBorloo
    @JLBorloo 1 year ago

    Good stuff but consider sharing the Notebooks in the future

  • @chinmayb172
    @chinmayb172 1 year ago

    Can you please tell me if I have 10 classes of training data, what number of epochs should I use?

    • @machinelearningmastery
      @machinelearningmastery 1 year ago

      In general, I recommend setting epochs to a very large value, say 50,000, and then setting up early-exit logic as part of training. This works best for most cases since the training fit will automatically exit once convergence has happened. Hope that helps.
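      For reference, a hedged Keras sketch of that recipe (model, X_train and y_train are assumed to exist):

          from tensorflow.keras.callbacks import EarlyStopping

          early_stop = EarlyStopping(monitor="val_loss", patience=20, restore_best_weights=True)
          history = model.fit(X_train, y_train,
                              validation_split=0.2,
                              epochs=50_000,              # effectively "train until converged"
                              callbacks=[early_stop])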

  • @fardian6818
    @fardian6818 1 year ago

    I am a silent internet user, what I usually do when I like a content is just by pressing the like button and save the link on the txt file, but this time is an exception, your content is very simple and completely what I'm looking for. I write you a comment, as the first commentator in this video 😀 You have a new subscriber now. Keep up the good work

  • @isultan
    @isultan 1 year ago

    Wow!!! Excellent lecture!!

  • @mikehawk4583
    @mikehawk4583 1 year ago

    Why do you add the mean of the predicted points back to the predicted points?

    • @machinelearningmastery
      @machinelearningmastery 1 year ago

      Let's see if we can relate it to how humans learn. Say we are in a forest, searching for trails of human footprints to get out of it. Every time we find a footprint, we validate and learn about the surroundings, vegetation, terrain, etc. Over a period of time we learn what leads to the exit and what doesn't. That's precisely the idea here. Hope that helps.

    • @mikehawk4583
      @mikehawk4583 1 year ago

      @@machinelearningmastery I'm sorry but I still don't get it. You can explain it with more math. What I don't get is after predicting a miu, why do we need to add omega? Like what does omega do where?

  • @abhishekchaudhary6975
    @abhishekchaudhary6975 1 year ago

    NIce video!!

  • @mohammedakl2077
    @mohammedakl2077 1 year ago

    thank you

  • @vipuldogra6600
    @vipuldogra6600 1 year ago

    The best there is.

  • @yurigansmith
    @yurigansmith 1 year ago

    In this example the new weights for the formerly misclassified examples are increased, while the weights for the correctly classified are decreased (which seems reasonable to me at the moment). But if e_t becomes greater than 0.5, lambda_t becomes negative and the direction of the weight adaptation is swapped, which would lead to undersampling of the misclassified and oversampling of the correctly classified examples in the next round. Is lambda "allowed" to become negative in the first place? Somewhere (slides on boosting algorithms) I read that lambda is supposed to be non-negative, but I'm not sure if I understood the statement resp. context of the statement correctly.

    • @machinelearningmastery
      @machinelearningmastery 1 year ago

      Great question. First, there are two weights in the system - one driving the weight of each data point and another driving the weight of the classifier; I explain both below. Second, it is a fact that error > 0.5 gives a negative lambda, and this is part of the design. When error < 0.5, lambda > 0; when error = 0.5, lambda = 0; when error > 0.5, lambda < 0. This means the model actually goes nowhere when the error is exactly 0.5, which is why implementations creatively make sure it never hits this magic number and breaks the model. Now, back to the first point about weights, with examples. Say the error is 0.1, then lambda = 1.10: the weight factor for a correctly classified data point is 0.33, the weight factor for a misclassified data point is 3.0, and the weight of this classifier is 1.10 (the lambda itself). Say the error is 0.9, then lambda = -1.10: the weight factor for a correctly classified data point is 3.0, the weight factor for a misclassified data point is 0.33, and the weight of this classifier is -1.10 (the lambda itself). From the above, a few things are clear: 1. If a classifier classifies well (low error rate), its overall weight in the ensemble stack is positive, and whatever misclassification remains is given priority in the next run. 2. If a classifier is poor (high error rate), its overall weight in the ensemble stack is deeply negative, and whatever was classified correctly we keep going for the next run, with the hope of gradually improving its influence. Hope that clarifies. Let me know if there is still an open point on this.
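      A quick numeric check of those numbers, using the standard AdaBoost formulas (illustrative, natural log assumed):

          import numpy as np

          for error in (0.1, 0.5, 0.9):
              lam = 0.5 * np.log((1 - error) / error)   # classifier weight (lambda)
              print(f"error={error}  lambda={lam:+.2f}  "
                    f"correct-point factor {np.exp(-lam):.2f}  misclassified-point factor {np.exp(lam):.2f}")
          # error=0.1 -> lambda=+1.10, factors 0.33 / 3.00; error=0.9 -> lambda=-1.10, factors 3.00 / 0.33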

  • @kevinchaplin672
    @kevinchaplin672 1 year ago

    Very nice, clear and concise

  • @stefanmisanovic3341
    @stefanmisanovic3341 1 year ago

    Great video, very helpful to get ideas how to visualize the accuracy of a model and how to automate the process.

  • @ranaiit
    @ranaiit 1 year ago

    Thanks....missing negative sign in exponent of Gaussian function !

  • @bigh8438
    @bigh8438 1 year ago

    what about the bias term? can you do one with b?

    • @machinelearningmastery
      @machinelearningmastery 1 year ago

      Yes, the bias term does exist. Remember, whenever you see the W matrix in the optimization space, W = [w, b]. The optimizer works on W as a whole, so it is a parallel update that lets it calculate the actual loss and minimize it. I will try to see if I can do a video to illustrate this point. Also, as you progress in your ML study, you may come across models like RNNs; in such cases W = [w, u, b]. Therefore, to generalize our learning, it is important to see that W is a matrix that contains all the weight units (including bias) that the optimizer must search over and feed into the system to get a loss. Hope that clarifies.

    • @bigh8438
      @bigh8438 1 year ago

      @@machinelearningmastery I see, what I meant was the video we have function = wx but if it was possible to do function = wx + b, I think it is the same but just another term added

    • @machinelearningmastery
      @machinelearningmastery 1 year ago

      Yes, that is correct: the bias would be another parameter found by gradient descent, along with all the other weights it finds for the input features.
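      A tiny illustration of that point (not from the video): gradient descent updates w and b together, exactly as if they were one combined parameter vector W = [w, b].

          import numpy as np

          rng = np.random.default_rng(0)
          x = rng.uniform(-1, 1, 200)
          y = 3.0 * x + 0.5 + rng.normal(0, 0.1, 200)    # true w = 3.0, true b = 0.5

          w, b, lr = 0.0, 0.0, 0.1
          for _ in range(500):
              err = (w * x + b) - y
              w -= lr * (2 * err * x).mean()             # dL/dw for mean squared error
              b -= lr * (2 * err).mean()                 # dL/db
          print(round(w, 2), round(b, 2))                # converges to roughly 3.0 and 0.5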

  • @Darazfinds4350
    @Darazfinds4350 1 year ago

    Can you tell me the values to be put to calculate gradient for 1st iteration ???

    • @machinelearningmastery
      @machinelearningmastery 1 year ago

      To get results similar to those in my video in, say, a Python snippet or Excel, you may use a small value like 0.01 to start off. Such small values should not impact your convergence time; if we use large values (and say we got them wrong), it would take longer to converge. What happens in practice with neural-net training is that the first 10 or 25 epochs are used as "warm up" epochs. Generally, a large learning rate is used during the warm-up epochs, and the purpose of warm-up is to get some of the parameters initialized. Following that, the actual training starts. Hope that clarifies.
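      A hedged Keras sketch of that warm-up recipe (the epoch counts and learning rates are assumptions, not prescriptive values):

          from tensorflow.keras.callbacks import LearningRateScheduler

          def warmup_schedule(epoch, lr, warmup_epochs=10, warmup_lr=0.05, base_lr=0.01):
              # larger learning rate for the first few "warm up" epochs, then the base rate
              return warmup_lr if epoch < warmup_epochs else base_lr

          # model.fit(X_train, y_train, epochs=200, callbacks=[LearningRateScheduler(warmup_schedule)])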

  • @enter-galactic
    @enter-galactic 1 year ago

    Can you provide code on how to plot sensitivity analysis?

    • @machinelearningmastery
      @machinelearningmastery 1 year ago

      If you are on Keras, it's a two-liner. Here is the code:

          # fit the model and capture the return value of history
          history = model.fit(X_train, y_train, epochs=epochs, .....)
          # plot how the validation vs training curves look using the "history" variable
          plt.plot(history.history['accuracy'], label='train')
          plt.plot(history.history['val_accuracy'], label='test')
          plt.legend()
          plt.show()