All the materials are given below github.com/krishnaik06/The-Grand-Complete-Data-Science-Materials/tree/main
Thanks, Sir!!
Sir I am unable to find the pdf link of this video.
Can I get the notes?
When you draw a line on your screen and it automatically becomes straight, that is the best example of applying a best-fit line (linear regression).
you are definitely an anime watcher aren't you!? That observation was something!!!
How come it is an example of linear regression?😊
@@swatisingh-yw1fw bro is on weeds
I think it just draws a line between the start point and the end point.
Wow, well said!
04:08:29
The Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it was randomly labeled according to the distribution of labels in the subset. The Gini impurity can be computed by summing the probability of each item being chosen times the probability of a mistake in categorizing that item. It reaches its minimum (zero) when all cases in the node fall into a single target category.
In the case of the Iris dataset, the root node contains all the instances, and if they are evenly distributed among the three classes (setosa, versicolor, virginica), the Gini impurity will be 0.667. This is because the probability of choosing an instance from any class is 1/3, and the probability of misclassifying it is 2/3 (since there are two other classes). The calculation is as follows:
Gini Impurity = 1 - (1/3)^2 - (1/3)^2 - (1/3)^2 = 0.667
This indicates that there is a 66.7% chance of misclassifying a randomly chosen element from the dataset if it was labeled according to the distribution of labels in the entire dataset.
The code you provided is plotting the decision tree. The Gini impurity for each node is calculated during the creation of the decision tree, not during the plotting. The Gini impurity is shown on the plot for each node.
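To make the arithmetic above concrete, here is a minimal sketch (my own illustration, not the code from the video) that recomputes the Gini value sklearn's plot_tree prints on each node from that node's class counts:
# Minimal sketch: recompute a node's Gini impurity from its class counts.
def gini_impurity(class_counts):
    total = sum(class_counts)
    # Gini = 1 - sum over classes of p_k^2
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

# Iris root node: 50 setosa, 50 versicolor, 50 virginica
print(round(gini_impurity([50, 50, 50]), 3))  # 0.667 -> the value shown at the root
print(gini_impurity([50, 0, 0]))              # 0.0   -> a pure node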
How do I get the notes for this video?
ALERT!!!!!
For a new person who is here to explore ML and wondering whether this video is good or just another video that will waste your time: believe me, it's the best ML video on YouTube from an Indian creator. It's totally worth watching and taking notes. From now onwards I am a big fan of Krish Naik.
Is it enough? Please reply
Hoping this is not a paid comment, I'm gonna watch this video.
@@sandeepyadav8397 Yes, trust me it is more than enough.
@@JohnCena-uf8sz I assure you that it is not. I've been religiously following his ML and AI related content and I'm just so grateful that I found him. You can learn entire ML and AI by watching his videos with simple explanations. No need for any other channel.
Bruh, I'm from Chile and watching this. This is the best teacher and the clearest explanation I could find! All of his courses!!
🎯 Key points for quick navigation:
00:48 *🤖 Introduction to AI vs ML vs DL vs Data Science*
- Explanation of AI as creating applications without human intervention.
- Supervised ML focuses on regression and classification problems.
18:21 *📈 Linear Regression Basics*
- Definition of linear regression and its purpose in modeling relationships between variables.
22:30 *📉 Understanding Linear Regression Basics*
- Understanding intercept (theta 0) and slope (theta 1),
25:17 *📊 Cost Function in Linear Regression*
- Definition and significance of the cost function,
34:35 *📉 Impact of Theta 1 on Cost Function*
- Demonstrating the effect of different theta 1 values on the cost function,
41:50 *🔄 Gradient Descent and Convergence Algorithm*
- Introduction to gradient descent as an optimization technique,
45:25 *📈 Gradient Descent Basics*
- Understanding gradient descent in machine learning,
47:31 *🏔️ Dealing with Local Minima*
- Addressing challenges posed by local minima in gradient descent,
49:37 *🔄 Iterative Convergence*
- Iterative convergence process in gradient descent algorithms,
55:20 *📊 Performance Metrics in Linear Regression*
- Explaining the importance of R-squared and adjusted R-squared in evaluating model performance,
01:07:05 *🔍 Overview of Regression Techniques*
- Introduction to ridge and lasso regression as regularization techniques,
01:09:16 *📊 Understanding Overfitting and Underfitting*
- Understanding overfitting and underfitting in machine learning,
01:16:13 *🧮 Introducing Ridge and Lasso Regression*
- Introducing ridge and lasso regression for regularization purposes,
01:30:56 *📊 Linear Regression Assumptions and Standardization*
- Linear regression assumes linearity between variables and the target.
01:33:18 *📈 Introduction to Logistic Regression*
- Logistic regression is used for binary classification tasks.
01:40:16 *🎯 Understanding Logistic Regression's Decision Boundary*
- Logistic regression's decision boundary is determined by the sigmoid function.
01:51:28 *📉 Logistic Regression Cost Function and Gradient Descent*
- Logistic regression cost function derivation and explanation,
01:59:06 *📊 Performance Metrics: Confusion Matrix and Accuracy Calculation*
- Detailed explanation of the confusion matrix in binary classification,
02:03:08 *⚖️ Handling Imbalanced Data in Classification*
- Definition and identification of imbalanced datasets in classification problems,
02:08:57 *📈 Precision, Recall, and F-Score: Choosing Metrics for Different Problems*
- Explanation of precision and recall metrics in classification evaluation,
02:14:13 *📊 Introduction to sklearn Linear Regression*
- Introduction to sklearn's linear regression model.
02:16:15 *📈 Dataset Loading and Preparation*
- Loading the Boston house pricing dataset from sklearn.
02:22:08 *📉 Data Splitting for Regression*
- Separating the dataset into independent (X) and dependent (y) features.
02:24:04 *📊 Cross Validation and Mean Squared Error Calculation*
- Explanation of cross-validation importance in machine learning model evaluation.
02:28:31 *🔄 Introduction to Ridge Regression and Hyperparameter Tuning*
- Introduction to Ridge Regression as a method to mitigate overfitting in linear regression.
02:34:00 *📊 Ridge Regression Hyperparameter Tuning*
- Understanding Ridge Regression and its role in reducing overfitting,
02:37:30 *📉 Impact of Hyperparameters on Model Performance*
- Exploring the effect of different alpha values on Ridge Regression's performance,
02:45:30 *🔄 Logistic Regression for Classification*
- Introduction to Logistic Regression for binary classification tasks,
02:55:14 *🎲 Probability Fundamentals*
- Probability basics: Understanding independent and dependent events.
02:56:34 *📊 Conditional Probability*
- Explaining conditional probability using the example of drawing marbles.
02:58:12 *🧮 Bayes' Theorem*
- Introduction to Bayes' Theorem and its significance in probability.
03:05:14 *📊 Applying Probability in Classification*
- Applying probability concepts (e.g., conditional probability) in classification problems.
03:17:30 *📊 Understanding Distance Metrics in Machine Learning*
- Understanding Euclidean and Manhattan distances,
03:20:18 *🌳 Exploring Decision Trees for Classification and Regression*
- Decision tree structure and node representation,
03:24:15 *🔍 Information Gain and Splitting Criteria in Decision Trees*
- Explaining entropy and Gini impurity as measures of impurity,
03:39:40 *📊 Understanding Entropy and Information Gain*
- Explained the concept of entropy in decision trees and how it relates to determining pure splits.
03:41:19 *📈 Using Information Gain for Feature Selection*
- Detailed the process of calculating information gain for different features in decision tree nodes.
03:49:17 *📉 Understanding Gini Impurity vs. Entropy*
- Explained the concept of Gini impurity as an alternative to entropy for decision tree construction.
03:54:01 *🧮 Handling Numerical Features in Decision Trees*
- Explored how decision trees handle continuous (numerical) features using sorted feature values.
03:59:34 *⚙️ Hyperparameters in Decision Trees*
- Defined hyperparameters and their role in controlling decision tree complexity.
04:06:03 *🌳 Decision Tree Visualization and Pruning Techniques*
- Understanding the structure of a decision tree through visualization.
04:09:16 *🛠️ Ensemble Techniques: Bagging and Boosting*
04:21:31 *🌲 Random Forest Classifier and Regressor*
- Solving overfitting in decision trees through ensemble learning.
04:24:20 *🌳 Random Forest: Overview and Working*
- Random Forest combines multiple decision trees to create a generalized model with low bias and low variance.
- Combines predictions from multiple decision trees (ensemble method).
- Uses bootstrapping and feature sampling to train each tree on different subsets of data.
- Prevents overfitting present in individual decision trees.
04:29:27 *🚀 Boosting Techniques: Introduction to Adaboost*
- Adaboost is a boosting technique that sequentially combines weak learners to form a strong learner.
- Begins by assigning equal weights to all training examples.
- Focuses on correcting misclassified examples in subsequent models.
- Uses weighted voting to combine outputs of weak learners into a final prediction.
04:42:27 *📊 Adaboost: Training Process and Weight Update*
- Adaboost updates weights of training examples based on the performance of each weak learner.
- Calculates the total error of each weak learner to determine performance.
- Adjusts weights of training examples to emphasize incorrectly classified instances.
- Normalizes weights to ensure they sum up to 1 for the next iteration of training.
04:45:24 *🌲 Decision between Black Box and White Box Models*
- Decision trees are considered white box models because their splits are visible and interpretable.
04:47:15 *🎯 Introduction to K-means Clustering*
- K-means clustering is an unsupervised learning method used to group similar data points together.
04:50:00 *📊 Understanding Centroids in K-means*
- Centroids in K-means represent the center of each cluster and are initially placed randomly.
04:56:31 *📉 Determining Optimal K in K-means Clustering*
- The elbow method is used to determine the optimal number of clusters (k) by plotting within-cluster sum of squares (WCSS) against different k values.
05:05:22 *🌐 Hierarchical Clustering Overview*
- Understanding hierarchical clustering involves identifying clusters based on the longest vertical lines without horizontal intersections.
05:07:30 *🕰️ Time Complexity in Clustering Algorithms*
- Hierarchical clustering generally takes longer with large datasets due to dendrogram construction, compared to faster performance by k-means.
05:09:04 *📊 Validating Clustering Models*
- For clustering validation, methods like silhouette scores are crucial, quantifying cluster quality.
05:17:21 *🌌 DBSCAN Clustering Essentials*
- DBSCAN identifies core points, border points, and noise points based on defined parameters like epsilon and min points.
05:26:37 *📊 Exploring K-Means Clustering and Silhouette Score*
- Explains the process of using K-Means clustering and evaluating it with silhouette scores.
05:35:30 *🧠 Understanding Bias and Variance*
- Defines bias as a phenomenon influencing algorithm results towards or against a specific idea or training data.
05:48:51 *🌳 Decision Tree Construction*
- Understanding binary decision tree creation in XGBoost,
05:51:39 *📊 Similarity Weight Calculation*
05:57:22 *📈 Information Gain Computation*
06:05:01 *🚀 XGBoost Classifier Inference Process*
06:09:39 *🌳 Decision Tree - Splitting Based on Experience*
06:11:31 *📊 Calculation of Similarity Weight and Information Gain*
06:18:59 *🌳 Regression Tree - Inference and Output*
06:26:24 *🚀 SVM - Marginal Planes and Hyperplanes*
06:30:51 *📈 SVM Margin Maximization*
06:31:34 *🛠️ SVM Optimization Objectives*
06:32:29 *🔍 SVM Decision Boundary Clarity*
Made with HARPA AI
And credits to Andrew Ng at 20:23.
The world should have more people like you, sir. Your way of teaching is outstanding. Thank you for taking the time to educate the world.
1:13:13 - For underfitting, it should be high bias & low variance @krish naik
Not recommended for beginners, but if you already have some knowledge and want to revise concepts, this is the best video. Very clear and concise explanation.
Any suggestions for a beginner?
@@ahmedhaigi2900 Get a machine learning syllabus from somewhere; if you don't have one, mail me and I'll send it, and then study everything topic-wise.
@@ahmedhaigi2900 Andrew Ng's course on Coursera; you can audit that course.
Thanks
Then please recommend something for beginners.
52:00 The partial derivative for theta 0 (the bias) is (1/m) * sigma (i=1 to m) (y_hat - y); there will be no square left after taking the partial derivative.
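A quick worked step to back this up, assuming the cost function used in the video is J(theta0, theta1) = 1/(2m) * sigma (i=1 to m) (y_hat_i - y_i)^2 with y_hat_i = theta0 + theta1*x_i (the usual Andrew Ng convention): the chain rule brings down a factor of 2 that cancels the 1/2, so the square disappears:
dJ/dtheta0 = (1/m) * sigma (i=1 to m) (y_hat_i - y_i)
dJ/dtheta1 = (1/m) * sigma (i=1 to m) (y_hat_i - y_i) * x_i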
Hello sir, I'm delighted to inform you that I've secured my dream job as an AI/ML developer, despite being a fresher. Your video on machine learning was instrumental in my success, and I'm extremely grateful for your contribution to my learning journey.
Hi bro, tell me how I should start; I have a career gap.
Hey bro... which videos did you cover? I have an AI/ML developer intern interview on Monday.
This course is very easy to understand if you have a high school maths background. Thank you, Krish Naik sir, for explaining the concepts clearly.
I'm amazed by your understanding of every algorithm 👏👏. One day I'll also be able to do the same.
One of the best ML videos available on the internet. This video is crisp yet covers most ML topics. Also, I like the way Krish explains the theory first and then explains the same using practical examples.
The video is an aggregation of the live machine learning community sessions that Krish did.
But he has edited out all the time wasting discussions and kept only the most important bits where the topics are explained.
A lot of time and effort has gone into compiling and editing these videos. Kudos to him for that.
Brother, can you tell me, does this ML video cover the whole ML syllabus?
@@anexocelisia9377 Of course not, there's so much more to ML.
@@yes.0 is the content in this video enough to crack data science interviews?
5:38:30 You're interchanging the definitions of high bias and low bias.
The way you teach is cakewalk coaching... even a beginner starting from scratch can shine in DS if they watch all your videos... Thank you!!!
2:49:15 -> We set class_weight='balanced' when we want the class weights to be automatically assigned according to the class distribution.
5:38:43 When a machine performs well with the training dataset, it is said to have low bias.
For me this is the best video on Krish's channel... The knowledge and its presentation are at a class level... The mastery over major and minor things is at its best. May Lord Shiva bless you with happiness, brother. Kudos...
I agree
Bro, can you send me the notes of the lecture?
Perfect binge watch for interview preparation. Thanks for uploading this Krish.
At least you have interviews and workplaces in this field; you are lucky to get to apply your knowledge and earn a living with it. Where I live, there are none.
@@amrdel2730 Apply to other countries, bro... Simple... If the opportunities are not there where you live, you have to go abroad.
Excellent session...everything about ML is summarised in a single video, which provides the complete picture of the elephant!
Thank you Krish and team; a million-dollar course, free of cost. Thank you.
I have a question around 1:25:40. You mentioned that we use Lasso to avoid less important features. The lower the slope, the lower is the modulus of that slope (or theta).
If I consider the mathematical definition,
in L2 Regularization: Cost is J(theta) + lambda * (sum of squares of thetas)
and in L1 Regularization: Cost is J(theta) + lambda * (sum of modulus of thetas)
So, if the absolute value of the slope is less than one, the square of it would be lesser, and hence we would be able to discard that feature more prominently.
Eg., (0.5)^2 = 0.25 < |0.5|
Correct me if my understanding is wrong. Thanks
So, that is partially true, but the logic is a bit flawed. Yes - x^2 makes numbers less than 1 smaller, and numbers greater than 1 larger. And that's the whole point. If we want to decide whether a certain theta parameter is suitable to omit (meaning we don't want to select that feature), we want to look at the sole value of that parameter (or the absolute value, in this case), not the square, the reason being that squaring makes small errors smaller and large errors larger. Discarding a certain feature based on the square of the parameter would be more prone to mistakes. In other words, it gets increasingly more difficult to tell well-suited and badly-suited parameters apart based on the squares of their values, rather than the modulus of their values, as the values grow large or small (basically when they start to deviate from 1 more and more). That's how L1 differs from L2 and why it can help with feature selection.
We can square the value of the slope, but that doesn't change the slope's value itself, just how we look at it. Otherwise, we could just raise the slope to some astronomical power and discard all slopes that were smaller than 1 (because all of them would end up close to 0 after raising to some huge power). But that does not reflect reality. If we want to look at slope values in L1, to imply some feature selection, we don't want to make those values artificially smaller or larger, because there is no benefit to that - we would basically be losing information. You usually want to apply that transformation to errors, because when it comes to predictions, an error of 4 (2^2) is obviously worse than an error of 2, and an error of 0.1 is not that bad, so making it 0.01 (0.1^2) isn't a big deal. So you focus on minimizing the error of 4 rather than the error of 0.01 (the actual errors being 2 and 0.1). So Ridge basically treats slopes in the same way as it treats errors, and Lasso does not.
And by the way, that is a big reason behind choosing loss functions to be (error)^2. We punish large errors and diminish small errors. Because at the end, when we look at our cost function and the value it produces (when we sum up our losses), the small errors/losses don't add up to that much, but the large errors/losses do - so we want to focus on them a bit more. So (error)^2 is especially good for linear regression, because it serves 3 purposes. One - squaring makes negative values positive, so the errors don't cancel out, but add up. Two - as stated previously, squaring helps disregard small errors and focus on large errors, because that's where the gain in performance is. Three - it's convex, because y_hat (the estimator of y) is linear, and linear functions are both convex and concave, so L(theta) = [ y_hat(theta) - y ]^2 is also convex (y_hat(theta) simply doesn't impact convexity, which is not true in general, as in DL or logistic regression). That grants us the ability to use the regular gradient descent algorithm without any issues. This cannot be said for things like logistic regression, or many squared loss functions in deep learning, because the estimate itself is not linear, so the square may not be convex, and we might introduce multiple local minima (the loss function L(theta, y_i) is basically a composition of the estimation function for the ith observation minus that observation and some other function, like x^2). Therefore, for logistic regression, you adjust the loss and cost functions (in reality they come directly from MLE), and for neural networks, you can use things like the Adam optimizer and so on, so x^2 in this case is still nice and still leaves us with the benefits from points 1 and 2.
Hope that clears it up, but if not, I'm sure there's someone better than me at relaying this information somewhere on the internet. Cheers.
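A tiny numeric illustration of the point above (my own sketch, not from the video): for coefficients smaller than 1 the squared (ridge) penalty almost vanishes, so there is little pressure to drop the feature, while the absolute (lasso) penalty keeps a steady pull toward zero.
# Illustrative sketch: per-coefficient penalty under L1 (lasso) vs L2 (ridge)
thetas = [2.0, 0.5, 0.05]
for theta in thetas:
    l1 = abs(theta)   # lasso penalty contribution of this coefficient
    l2 = theta ** 2   # ridge penalty contribution of this coefficient
    print(f"theta={theta:>5}: L1 penalty={l1:.4f}, L2 penalty={l2:.4f}")
# For theta=0.05 the ridge term is only 0.0025, so ridge barely "notices" small
# coefficients and shrinks them without ever setting them exactly to zero.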
An excellent and valuable 6 hours session on ML Algos. Very handy to make the learning process on ML smoother for the people who are new to it. Thank you Sir!!
Thanks!
For underfitting models we have high bias and low variance, as bias means systematically wrong predictions and variance means how much the model changes as it adapts to different datasets.
While you might encounter Gini impurity values higher than 0.5 in the context of the Iris dataset, this is due to the multiclass nature of the problem and the specific calculation used for multiclass Gini impurity. It doesn't imply that the maximum impurity for multiclass problems is 0.5; that limit applies only to the binary case.
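A short worked check of that statement: with k equally likely classes,
Gini_max = 1 - k*(1/k)^2 = 1 - 1/k
which gives 0.5 for k = 2 (the binary case) and 1 - 1/3 = 0.667 for the three Iris classes, matching the root-node value discussed above.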
In 1:25:50,
When we're talking about getting the slope closer to zero, it is definitely between 0 and 1, so squaring it in ridge will definitely make it smaller. So I don't get the explanation that you're giving.
Can we say that for a feature that is not that useful, while computing the lasso cost function (say LCF), the LCF increases faster due to the modulus values of the slopes (between 0 and 1), whereas the ridge cost function (RCF) increases slowly because the squared values of the slopes reduce the rate at which the RCF grows? Here I am talking about the change in LCF and RCF with respect to increasing lambda values.
So, Lasso will give an optimal slope equal to zero when lambda is increased, but Ridge might not give a zero slope even with higher lambdas due to the slowly increasing RCF w.r.t. increasing lambdas.
Please refer this link and give an explanation to your statements made at the above-mentioned timestamp:
ua-cam.com/video/Xm2C_gTAl8c/v-deo.htmlfeature=shared
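A small sklearn sketch of the behaviour being asked about (a hypothetical toy dataset, not the one from the video): as alpha (the lambda above) grows, Lasso drives the weak feature's coefficient to exactly 0, while Ridge only shrinks it.
# Sketch: Lasso zeroes out a weak feature as alpha grows; Ridge only shrinks it.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.5, size=200)  # feature 2 is weak

for alpha in [0.01, 0.1, 1.0]:
    lasso = Lasso(alpha=alpha).fit(X, y)
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha}: lasso coefs={lasso.coef_.round(3)}, ridge coefs={ridge.coef_.round(3)}")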
05:38:37 If the model performs well on the training data, it is low bias, right?
06:25:30 How are you multiplying the matrices? A 2x1 times a 1x2 will not give a constant value; it gives a 2x2 matrix. (See the short NumPy check after this thread.)
Your 1st question is right... dude.
For your 2nd question, go through the vector operations; it is correct, dude.
@@sam-uw3gf Vector multiplication, you are saying?
@@chinmay4452 yes
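A quick NumPy check of the shapes being debated here (my own illustration, not the video's working): a 1x2 row times a 2x1 column gives a single number (the inner product, which is what shows up in w.x + b), whereas a 2x1 column times a 1x2 row gives a 2x2 matrix (the outer product).
import numpy as np

row = np.array([[1, 2]])    # shape (1, 2)
col = np.array([[3], [4]])  # shape (2, 1)

print(row @ col)  # [[11]] -> a 1x1 result, effectively a scalar (inner product)
print(col @ row)  # [[3 6]
                  #  [4 8]] -> a 2x2 matrix (outer product)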
You are great!! That's all I need to say after this class.
Your explanation is really good and content-wise excellent, sir. Thanks for sharing your videos, roadmaps, and end-to-end explanations from an interview point of view.
Please be patient with this course. Definitely an awesome course.
Definitely a good and great refresher for anyone who has exposure to ML, stats, and math (calculus and algebra), but not for absolute beginners.... If you want to learn ML without prior knowledge, Andrew Ng's course on Coursera is the best; you can audit the course for free over there.
I've completed the stats part. Should I watch this, or should I learn the extra math parts (algebra and calculus) first and then start here?
do you know any free resources to learn machine learning?
@@ritamsantra2372 Did you watch this video, or how did you go about it? And where did you learn the stats part?
Thoroughly enjoyed the videos. I was able to get over the fear of learning ML as it made my learning process smooth. Thank you ❤️
Definitely not beginner-friendly!! But you will get an idea of all the algorithms. Watch it if you are revising concepts. He never starts from the basics; I needed to do many searches while pausing this video.
Can I do it if I am a beginner?
Please suggest some YouTube channels.
Other than this video, which videos do I need to watch in order to complete machine learning?
6hrs ago, I don't know machine learning 💀💥. Classic✨
Really??
*I didn't knew
If it took you only 6 hours, that means you did not understand it fully. I just watched it without practicing, so you can say you still don't know ML.
@@roninbromine1670didn't know*
@@roninbromine1670I didn't Know!!
You are the best teacher 🥰. Regards from Malaysia.
Highly recommend this video. Don't look any further for classical ML methods.
A couple of notes:
Should've explained logistic regression from the odds / log-odds angle, IMO.
Also, the best ridge/lasso regression explanation I've seen, and I've seen a lot. The only thing to add is that since we have squares, the ridge constraint on the 'slope' parameters is a hypersphere in an n-dimensional space of weights, so the solution basically will not touch 0 at any point; it will stop just before.
Your presentation and teaching are excellent!
Why is Random Forest not affected by outliers?
Ans on Google:
The intuitive answer is that a decision tree works on splits and splits aren't sensitive to outliers: a split only has to fall anywhere between two groups of points to split them.
Really great content right here; everything from the rudiments to the practical application of all the traditional ML algorithms is covered! Just amazing. Period.
Just now completed the full video up to 6:37:51 in 2X mode.. Thank you, Guruji.
Thank you soo much Krish for summarising everything here.
Brother, can you tell me, does this ML video cover the whole ML syllabus?
outstanding session with precise and quality knowledge
Thank you sir for explaining the concepts in such a manner that they seem easy to understand...
Thanks krish. Superb delivery.
Great video, thanks Krish!
Sir, please make a separate playlist on reinforcement learning, deep reinforcement learning, and imitation learning. Thanks.
I am searching for the same.
Krish, thank you for these wonderful lectures! Much love.
Very nice voice; no confusion while listening.
Thanks for the session, Krish sir. Now I am doing the practical parts and I noticed that sklearn removed this dataset in newer versions, so I found a solution below; please check:
# Boston housing dataset (removed from sklearn.datasets in version 1.2), fetched from OpenML instead
import pandas as pd
from sklearn.datasets import fetch_openml

boston = fetch_openml(name="boston", version=1, as_frame=True, parser="auto")
dataset = boston.frame  # DataFrame with the 13 features plus the target column "MEDV"

X = dataset.iloc[:, :-1]  # independent features
y = dataset.iloc[:, -1]   # dependent feature (MEDV)

# .iloc returns a DataFrame/Series; scikit-learn accepts these directly,
# but you can convert them to NumPy arrays if you prefer.
X = X.values if isinstance(X, pd.DataFrame) else X
y = y.values if isinstance(y, pd.Series) else y
Hope this will help you get accurate results.
Hi Krish, I really appreciate your work. Your delivery is great, easy to understand and remember.
thanks for the great content.
That's exactly what I was waiting for. Thank you so much, sir, for sharing so much knowledge. 😍🙏🏼🙏🏼
Much needed video, sir..... there are many videos out there, but yours stands out....❤️🔥🔥
The explanation of logistic regression was the most awesome explanation I have ever found. Thank you for the session, Krish.
No option to give more likes... Love from Bangladesh.
Thank you for this amazing lecture, sir.. It's currently 2:30 am and I just finished the whole lecture.... I must say I gained a lot.. Thank you ❤❤❤❤
Great explanations, Krish. I just started my data science prep and have been following you for a few days. This will be my second marathon after just finishing your statistics tutorial. It is a fun learning experience watching your lectures. Thanks again for your efforts!
Please let me know if I am wrong. I have a query about the adjusted R-square performance metric, explained around 1 hour into the video. According to the formula, when we substitute p=2, the value of adjusted R-square should be the same as that of R-square, right? However, you've shown it as lower in your example. Or is there a condition that we should only use adjusted R-square when the number of predictors p is greater than 2?
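For reference, a quick worked substitution (assuming the usual formula, which I believe is the one used in the video) shows the value changes even at p = 2:
Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
e.g. with R^2 = 0.90, n = 50, p = 2: 1 - 0.10 * 49/47 = 0.896, slightly less than 0.90. It only equals R^2 in the degenerate case p = 0 (or when R^2 = 1).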
did you get a job ???
Sir, I just want to say thank you for helping us gain this knowledge and encouraging us to start our data science journey.
Machine learning pa ada
wow! very useful content❤❤
Perfect ML video, the best on all of YouTube... Your explanation is just amazing 🤩... Thank you so much (I'm only at the beginning now 😅... many more hours to go).
Wow, wow - excellent video.
What an amazing tutorial, the best I've ever seen. Thank you, Krish. Could you kindly put up all the PDF materials?
Thank you, Krish, for such an incredible tutorial. Have you made all the PDF files available?
Love the way Krish teaches!
Great job krish!
Thanks for adding the timestamp 💯
great explanation of so many algorithms in a short time
Sunday spent well !! ♥️
I really appreciate you, sir. Your explanation is very easy to understand, sir.
Thank you sir.
You are really good teacher
Sir, it's a very useful set of algorithms. I am following this. Thanks.
Very well explained....thank you sir 🥰💐💐
Amazing teaching skills
🤝🤝
Might be unpopular opinion but Krish is less boring and more coherent than Andrew Ng
Hi Krish, thanks for making this. In this video you missed out the PCA topic; can you please make a video on that? And some detailed videos on model selection, feature selection & feature engineering.
Great video! I think your Bias definition is backwards @5:38:37
48:57
56:00
1:02:00
1:05:00
1:30:00
1:48:00
1:52:00
2:08:00
3:19:00
3:33:45
3:51:20
3:53:10
3:57:10
4:27:40
4:56:20
5:07:40
5:27:16
Thank you Krish, so helpful. As I come from a commerce background I find it tough, but I am understanding the concepts.
Thanks for bringing multiple live streams together into the same video 👍👍
This is an excellent collection, thanks krish for this:)))
bro do you have the notes for this lecture?
Hi Krish. This video is very helpful and lots of fun to watch and it’s amazing that within such a short span of time you’ve completed sort of a bridge course on ML. Kudos to you 👏🏻! However, I had a doubt that I would like to raise here. You mentioned in your video that Lasso Regularisation helps feature selection. If the theta or slope values are negligible, say close to zeros, then squaring them wouldn’t increase the values but decrease further right? Why can’t we do feature selection using ridge regularisation then? But for slopes greater than 1 this would make sense, however, in those cases we would not be able to neglect those right?
We use the modulus in Lasso to do feature selection; with Lasso we have multiple features and slopes, and when the penalty is added it will itself neglect (zero out) those features that are not of use.
We do feature selection using Lasso regularisation because it makes some of the coefficients that are not important to our analysis exactly zero. Maybe that's the reason we use Lasso for feature selection.
thank you soo much sir for your great explanation
5:38:29, please clarify that statement. How does the model have high bias when it's performing well on the training data?
1:31, standardization is not an assumption of linear regression.
This lecture series is amazing. A slight correction I found in the confusion matrix; please correct it (your TN should be FN and FN should be TN)... I think so.
Thank you sir for making it this concise
Thank you for putting everything together ☺️
@krish, I think there is a mistake in the practical implementation of the linear and ridge regression. Since you used negative MSE, your ridge regression was actually better than your linear one. With normal MSE scores, the higher the value, the worse the model; with negative MSE scores this is reversed.
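A minimal sketch of the sign convention being pointed out (assuming cross_val_score with scoring='neg_mean_squared_error' was used, as in the practical section): the scores come back negative, so the value closer to zero, i.e. the larger one, indicates the better model.
# Sketch: sklearn returns the *negative* MSE so that "higher score = better model".
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

for name, model in [("linear", LinearRegression()), ("ridge", Ridge(alpha=1.0))]:
    scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=5)
    print(name, scores.mean())  # both means are negative; the one nearer to 0 is better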
Thanks sir, thank you for merging all videos
Please check at 1:18:48; there is a calculation mistake: 0 + 1(2) = 3?
Krish, you have been like a brother to me when it came to understanding machine learning. Hope to meet you some day.
Thank you for such an informative video, Krish Naik. Can you make a video on StandardScaler, feature transformation, and other preprocessing of data before model implementation?
Clear explained 👍
@krishnaik06 You are nothing but simply amazing. True help to ones in need. This learning has made a difference to my understanding of ML. This has helped me transition to hands-on.
Thank you soo much sir for ur efforts ☺
Clear information; every important point is clarified and all topics are covered. Thanks Krish, I participated in your live sessions also... 👍
Teacher!......... 🙏🙏
A thousand-dollar course, just free. Thank you, Krish sir.
Krish, at 52:06, during the derivative at j==0, how does it come out squared when we take the derivative w.r.t. theta 0?
Same doubt
I am watching this video now but could not fetch the Boston housing prices dataset, as the sklearn maintainers strongly advise against using it.. How can I complete this tutorial now?? @krishnaik sir
2:32:33 That was great, guru.