The scikit-learn documentation refers to it as a "feature matrix", thus I do as well. Calling it a "feature matrix" indicates that it's made up of features, and it's 2-dimensional, and it's implied that the other dimension is the samples.
Hi, thank you for your explanation, it's very clear. But there is something I don't really understand. At 4:00 you say that the data is represented by 2 numerical features, so you have two axes X and Y. But what if there were more features, like in the iris dataset? How does KNN work in that case? Does it take the same steps you explain in this video, but in a 4D space rather than on a 2D graph?
@@dataschool That is very unfortunate to hear. Would it be possible to make an appointment via Skype? I can't get further on my own and need this information for my thesis. I hope you can help me with it.
I don't work with anyone one-on-one, I'm sorry! However, you are welcome to join Data School Insiders and ask a question during a live webcast or on the private forum: www.patreon.com/dataschool - I prioritize answering questions from Insiders because they are investing in me.
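To the question above about more than two features: KNN works the same way in any number of dimensions, because the distance between two observations is computed across all features at once rather than on a plot. A minimal sketch with the 4-feature iris data (variable names are illustrative, not from the video):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target  # X has 4 features per sample

# Euclidean distance between two 4D points, computed by hand;
# this is the same quantity KNN's default metric uses internally
a, b = X[0], X[1]
dist = float(np.sqrt(np.sum((a - b) ** 2)))

# KNN fits and predicts on 4 features exactly as it would on 2
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)
pred = knn.predict([[3, 5, 4, 2]])
```

The only thing lost in higher dimensions is the ability to draw the decision regions on a flat map; the algorithm itself is unchanged.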
Thanks for another great video. I have one conceptual question regarding implementing logistic regression as shown in the video. My understanding is that logistic regression is used where the outcome is binary (for example, A or B). In the iris dataset the outcome can be one of 3 categories, so how does logistic regression work here?
That's really a mathematical question rather than a conceptual question. Logistic regression can be used for multiclass problems, and a few details are here: scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html You can learn more about this topic by searching for multinomial logistic regression. Hope that helps!
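A quick sketch of the point in that reply: scikit-learn's LogisticRegression handles 3+ classes automatically (one-vs-rest with the liblinear solver, multinomial with lbfgs in recent versions), so nothing special is needed for iris. The solver choice here is just to avoid convergence warnings:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # 3 classes of iris

# liblinear fits one binary classifier per class (one-vs-rest)
logreg = LogisticRegression(solver='liblinear')
logreg.fit(X, y)

# predict_proba returns one probability per class, summing to 1
probs = logreg.predict_proba(X[:1])
```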
Hello, thanks for explaining such topics in a clean manner. Could you please do an explanation of imputing categorical and numerical NaN values? (How do you handle NaN values in machine learning?)
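There's no reply in the thread, but for anyone with the same question, a minimal sketch using scikit-learn's SimpleImputer (available in scikit-learn 0.20+; the toy arrays are made up for illustration):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# numerical column with a missing value: fill with the column mean
num = np.array([[25.0], [np.nan], [40.0]])
num_filled = SimpleImputer(strategy='mean').fit_transform(num)

# categorical column with a missing value: fill with the most frequent value
cat = np.array([['NY'], ['NY'], [np.nan], ['LA']], dtype=object)
cat_filled = SimpleImputer(strategy='most_frequent').fit_transform(cat)
```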
This video series is the best I have watched about scikit-learn so far. Once I have finished watching all the videos, I will let you know my comments. At this time I am just wondering whether you have it in mind to do something similar but based on the R platform. I mean, to go over the principal supervised and unsupervised machine learning methods, but in R. Are you planning to do this?
Great! I hope you can cover other algorithms like SVM, decision trees, random forests, and discriminant analysis (DA). At the same time, when we use linear regression, logistic regression, and DA, for instance, we sometimes need to tune our models to check that they are in line with the assumptions. It would be nice if you could consider those topics in the next video series about machine learning and scikit-learn. How can we get the output graphs for all the models too? For instance: tree graphs, ROC curve graphs, etc. I am already working on it by myself (a lot of googling and reading work so far), but it would be great to have this addition from you too. You are a great teacher, very precise and direct. Besides, you give very good additional support in the notes that follow each video, so it is very easy to follow your examples and recommendations. I hope you can get the new video series done soon. My best regards!
Really enjoying this series! Thanks for creating it. Do you know where I might find the code for making one of the lovely classification maps you show at e.g. 4:15?
This solution: scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html works straight out of the box for the iris data, though sadly I'm struggling to adapt it to my dataset.
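For anyone else adapting that example: the part that generalizes is small. Predict every point on a grid covering your two features, then draw the result. A stripped-down sketch (iris with two features stands in for your own 2-column X and labels y):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# use only the first two iris features so the map is 2D
X, y = load_iris(return_X_y=True)
X = X[:, :2]

knn = KNeighborsClassifier(n_neighbors=15).fit(X, y)

# build a grid over the feature space and classify every grid point
xx, yy = np.meshgrid(np.arange(X[:, 0].min() - 1, X[:, 0].max() + 1, 0.1),
                     np.arange(X[:, 1].min() - 1, X[:, 1].max() + 1, 0.1))
Z = knn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# plt.pcolormesh(xx, yy, Z) then draws the colored regions
```

To adapt it, replace X and y with your own data (two feature columns) and keep the grid/predict/reshape pattern unchanged.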
Hi, I am getting an error: "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT." and the prediction comes out as array([0, 0]). Any help would be appreciated.
Thank you for the video. It is really well-explained. I got several FutureWarnings with LogisticRegression. When I used the fit method, it said that the default solver would change to 'lbfgs' and that I should specify a solver. Also, I got a warning that the default multi_class is going to change to 'auto' and that I have to specify the multi_class myself. Even after I specify these two, I get a ConvergenceWarning, claiming that lbfgs failed to converge. I am new to machine learning, and I don't know what to do. Can you please tell me what I can do to resolve these warnings?
Thanks for the video! I noticed you have code completion in your notebook, though I'm not seeing it in my notebook (I installed the latest version of Jupyter). Does it require some kind of plugin? Looking forward to the next lesson!
Laurent Van Winckel You're welcome! It looks like tab completion requires the readline library. More information is here: ipython.org/ipython-doc/stable/interactive/tutorial.html Let me know if you are able to get it working!
Thanks for this great work. I have got a question: I am using the Anaconda distribution, and while coding in the notebook, how would I get the pop-up list of objects that you get? For example, when you type "from sklearn." it lists the modules/properties and you select the one you need (in this case you selected "neighbors"). I am not getting that option.
Thank you sir for your video on machine learning. Sir, I got an error saying "Expected 2D array, got 1D array instead" when using knn.predict. NB: I got the output when it is knn.predict([[3, 5, 4, 2]]), i.e. when two [ are used.
When I try to do the Logistic Regression part, when I predict (logreg.predict), I get the wrong array [0, 0] and a convergence error. What should I do? I tried changing the max_iter number/verbose... but I have no idea if that is where the problem comes from.
Thanks for the material! One question: in the video you try to solve a classification problem with a regression model. If I recall correctly from the previous video, regression models are good for regression problems and classification models for classification problems. Is there any criterion by which I can confidently choose a regression model to solve a classification problem?
Eli Lavi Actually, "logistic regression" is a classification model (not a regression model), despite its name! That's why I used logistic regression in this case. There are some limited circumstances in which regression models can be used to solve classification problems, but it usually doesn't make sense. I wouldn't worry about it for now... that's a very advanced technique. Does that answer your question?
Hi Mark, how do you prepare the data for prediction? If the training data has an empty value in one column of a row, how do you replace this value? Can I use KNN for this purpose?
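There's no reply in the thread, but yes, KNN can be used for imputation: scikit-learn 0.22+ ships a KNNImputer that fills a missing value with the mean of that column among the k nearest complete rows. A minimal sketch with a made-up array:

```python
import numpy as np
from sklearn.impute import KNNImputer

# row 1 is missing its second feature
X = np.array([[1.0, 2.0],
              [2.0, np.nan],
              [3.0, 4.0],
              [4.0, 5.0]])

# the gap is filled with the mean of column 2 among the 2 nearest rows
X_filled = KNNImputer(n_neighbors=2).fit_transform(X)
```

Here the two nearest rows to [2.0, nan] are [1.0, 2.0] and [3.0, 4.0], so the missing value becomes (2 + 4) / 2 = 3.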
Having problems with the code? I just finished updating the notebooks to use *scikit-learn 0.23* and *Python 3.9* 🎉! You can download the updated notebooks here: github.com/justmarkham/scikit-learn-videos
I appreciate the fact that you speak very slowly and express things clearly!
Thanks, I try to make it easy for others to understand me! :)
I totally agree. I don't have to pause the video as frequently while taking notes.
This is hands down the best machine learning tutorial. Definition and concept is well-explained. THANK YOU SO MUCH!
Thank you for your kind words! 🙏
It is so awesome that you combine first-class knowledge with the impressive pronunciation of a professional voice actor. It's super clear! Thank you for the series
Wow, thank you! 🙏 I really appreciate your truly kind words!
Despite this being an old playlist, without a doubt still the best one I found on youtube so far...
Thank you so much!
Oh yeah! It doesn't get better than that!
Your videos are so far superior to the commercial products out there, I just can't believe it. I wish I had found them before dumping a small fortune into the "pay-to-play" courses. Thank you for sharing this information, and be sure that I will join the Patreon group.
Wow! Thank you so much for your kind words! :) I look forward to having you in the Data School Insiders group... you can join here: www.patreon.com/dataschool
No words to Describe How awesome it is...after watching so many tutorials .
Thanks very much for your kind words!
One of the best ML online tutors I have come across, very well thought out, and every minute is informative. And I do support Kevin's slow speech pace; it makes it much easier to comprehend the complex concepts. Thank you Kevin.
You're very welcome!
Your ability to explain this topic in simple terms is remarkable. Thank you so much for these videos.
You're very welcome!
It's a real pleasure to follow this series: clear, concise, and so well taught. As a non-native English speaker, it's 100% understandable. Bravo!
Wow, thanks so much for your kind words! I really appreciate it.
Not just educated, but a talented teacher. Fantastic combination
Thanks so much! I really appreciate it! :)
Lost for words! Your explanation answers every root question, with the aim that one clearly understands. Thanks a lot.
You're welcome!
You have done an awesome job. I'm the TA for a course on Bioinformatics and I'll be using your videos to teach my students a short primer on getting started with ML just so that they can shed that fear and get down to work :)
+Spandan Madan That's awesome! Please let me know how it goes!
You are doing a very good job. The material and tutorials you are providing for free seriously show your dedication to your work and how much you care for those who can't afford expensive tutorials. Thanks
Thanks so much for your kind words! I'm really glad the tutorials have been helpful to you!
Thanks for making such lucid videos Kevin! You have no idea how helpful these videos are for a novice like me.
Excellent! That's very nice to hear!
This literally is best tutorial guide on the internet.. thank you so much
Wow! What a kind thing to say... thank you!
Me too!!!!!!
Wow, ML suddenly feels a lot less scary. Can't wait to watch the rest of the series.
Excellent! Here's a link to the entire video series, for others who are interested: ua-cam.com/play/PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A.html
Many thanks for this video series. I really like the way you develop the subject in manageable chunks and focus on what is really needed to master the subject.
Garrie Daden That is excellent to hear, and is exactly what I was trying to do! Thanks for your thoughtful comment.
Thanks so much for making it so easy to understand. I have watched many videos on machine learning and have never felt so confident in applying the concepts. Well done!
You are very welcome! I'm glad to hear my video was helpful to you!
Overwhelming! I have been trying to learn these basics for a long time, and finally found this video series. Thank you so much for such a clear presentation of such a complex (especially for me) topic.
You are very welcome! I'm so glad to hear it was helpful to you!
One of the best channels. Nice to see someone speaking so coherent and educational, compared to other channels. Great job Kevin.
Totally agree with you. It's a great channel. I just published an e-book about machine learning with clustering algorithms. it's available for free for 5 days. Would you like to get a free copy?
of course, thank you Artem
it's here www.amazon.com/dp/B076NX6KY7 You would really help me if you leave a little review on Amazon
Great, thank you, and will do. I don't use Amazon Kindle, but I'll try to figure out a way around it. Thanks
You can easily create an account on Amazon if you don't have one (you don't necessarily need to enter your credit card). After that, you will be able to read free e-books on Amazon website through their cloud reader. I regularly read free e-books available on Amazon, and it's very convenient.
Thanks Kevin. I like the videos very much. Wish I had known about this series a month back. I dropped my ML course in this semester because the material was very overwhelming. Very useful videos and the material is presented in a very organized manner. Keep up the good work!
Thank you so much for your kind comments!
*Note:* This video was recorded using Python 2.7 and scikit-learn 0.16. Recently, I updated the code to use Python 3.6 and scikit-learn 0.19.1. You can download the updated code here: github.com/justmarkham/scikit-learn-videos
Thanks a lot for your work!
You're very welcome!
Hi, I have trained my model using a neural network and the model is saved, so how can I use the model to classify images?
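There's no reply in the thread, but the usual pattern with scikit-learn models is to persist the fitted estimator with joblib and call predict after loading it back. A minimal sketch (using KNN on iris as a stand-in model; for images you would flatten each image into one feature row first):

```python
from joblib import dump, load
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# train and save the fitted model once...
X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
dump(knn, 'knn_model.joblib')

# ...then later, load it back and classify new samples
model = load('knn_model.joblib')
pred = model.predict([[3, 5, 4, 2]])
```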
I'm running the latest Anaconda 1.9.7 Jupyter Notebook server 5.7.8, Python 3.6.5, iPython 7.4. Upon hitting Run for the 1st two lines of code:
"from IPython.display import IFrame
IFrame('archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', width=300, height=200)"
Jupyter doesn't output the columns of numbers like in the video, but instead asks for "iris.data".
What should I do? Your Pandas videos have helped so much. I'm enrolled at DataQuest but have been considering enrolling in yours too. Thanks Kevin.
@@dataschool Can we also have a lecture for TensorFlow?
You choose your words very carefully. Awesome teaching 👏
Thanks so much! 🙌
This is the best machine learning video I've ever watched. It's amazing how you broke a complicated topic like machine learning into small sections accompanied with very, very clear explanations. Thank you very much. I hope you continue; it's been a long time since you posted a video on UA-cam.
Thanks so much for your kind comments! I really appreciate it :)
P.S. I published 10 videos last month, and will have more in the future!
Wow thanks for the update, I'm gonna check them right now. bless you.
This is the best explanation. I have gone through many videos, but this video helped me a lot towards a better understanding... Thank you, Markham.
You're very welcome! Glad it was helpful to you!
Best series of machine learning tutorials out there!
Ricardo Ferraz Leal Wow, thank you! What a kind compliment. I really appreciate it!
Your explanations in your videos are easy to understand and very or should I say extremely helpful. Keep it up....
+Victor Ekwueme Thanks! I spent a lot of time figuring out how to teach this material in the classroom, and so I thought it was important to spread the knowledge using videos as well :)
Wow, you're the best teacher I've learned from so far. Easy to understand, and the content is well explained.
Please keep on making videos of the same quality. Thank you so much
Thanks!
You are one of the best teachers I have ever been taught by!
Thanks so much! :)
Step by step explanation in a clear way. Just love it. Thank you so much.
You're very welcome!
Love your Speed and Clarity man.
Thank you!
Simply outstanding work. It's highly structured and clearly explained. I also greatly appreciate the excellent references you link for various sections.
Thank you so much for your kind words!
My friend...Thanks a lot..This is the best introduction to machine learning I have ever come across...Please do a deep learning tutorial...Again thanks a lot.
Thanks very much for your kind words! I really appreciate it.
Such an amazing video. I have never seen this type of clear video. I understand many things. Thanks a lot, Please make a video on unsupervised learning also.
Thanks for your suggestion!
You are the best Kevin. I always find the most relevant stuff in your videos.
Thanks Achin!
Amazing explanation! I'm so excited to finish the series! Congrats!
Thanks very much! Glad you are enjoying the series :)
you are one of the best instructors online, thank you so much.
Wow, thanks so much for your kind comment! :)
Finding these tutorials very interesting... do continue putting them up... thanks a lot
***** You're very welcome!
Very well explained, you are a great teacher! Loving this series !
This is the best speed to make a beginner understand the terminologies one by one... I really appreciate it and am thankful to you for this video. I have seen some videos and was not able to get what they were saying because of the speed... thanks!
You're very welcome!
This whole series is helpful and fun to watch. Thanks!
That's excellent to hear. Thanks for watching!
Thank you for all these videos, Kevin! Very clear and easily understandable.
You're very welcome! :)
This is great man, I am watching this in x2 speed, haha.
Great!
same here
I have too much ADHD to process it at x2
Exactly that's why it is so understandable
One of the best tutorials I have ever seen. I love your speech as well.
Thanks! :)
Hi. Are you going to make some new paid course?
I am continuing to work on both free content and paid content. Stay tuned!
You're just awesome...best videos in recent times...like your way of explanation and please do continue teaching and sharing your knowledge...peace..
Thanks so much for your kind words! :)
Kevin, you're a great teacher, your explanations are top notch! Subbed to the channel and the newsletter! Thanks a lot! :)
Wow, thanks so much! Great to hear :)
It's a pretty clear and precise explanation. Thanks for making such videos and keep educating us, @data school
Thank you very much. Your videos really help me understand ML deeply.
That's great to hear! 🙌
Mr Kevin, I really appreciate these tutorials. I hope to become as good as you are some day.
You're very welcome!
Your pronunciation is awesome, I really understand because of you. Thanks a lot, teacher!
Thanks!
It is because of these guys we are able to learn Machine Learning concepts so clearly and easily🎉🎉❤❤
Thank you so much!
@@dataschool Thanks to you Sir!
Very clear and easy to follow. Thanks!
Excellent! You're very welcome!
Thank you very much. I wanted to start off with ML and tried many tutorials, but all were very fast. You explained each line very nicely.
Great to hear!
Kevin, you said you don't know how well your model will do on new data, but when you test your model with predict on the test data, I think it is standard to evaluate the accuracy (or any other metric) of your model.
To be clear, if we are talking about truly "new" data, meaning out-of-sample data, then you actually don't know the true target values, and thus there's no way to check how accurate your model was with those samples. Hope that helps!
@@dataschool Ah ok, thanks for elaborating. Yes, indeed, e.g. a new client asking for a loan (default or not)
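The distinction in this thread can be shown concretely: holding out part of the labeled data simulates "new" data whose answers we happen to know, which is exactly why a test set allows accuracy to be measured while truly out-of-sample data does not. A sketch (the random_state and k values are just illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# hold out 40% of the labeled data to stand in for "new" data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# this comparison is only possible because we kept y_test;
# for truly out-of-sample data there is no y to compare against
acc = accuracy_score(y_test, knn.predict(X_test))
```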
why would anyone dislike this video?
Ha! I ask myself that same question :)
probably some other e-learn teaching competitors :)
They hate-im, cus they ain't-im. ;)
@@dataschool Certainly because they are not using the appropriate classifier for high-end teaching videos :-) Congrats for your video series: very instructive, clearly articulated (pondering theory and examples) and with perfect emphasis in critical points. As a well-known French philosopher used to say: "whatever is well conceived is clearly said and the words to say it flow with ease". Bravo Kevin! Very talented teacher!
@@bardamu9662 very good ML lecture!!!!
Thank you!
Wow, thank you so much Luis! I truly appreciate it! 🙏
Great explanation, simple & effective, Big Thank you for the videos
You're very welcome!
Great video man. You are hero of humanity.
Wow, thank you! :)
It only worked when I used two square brackets: knn.predict([[3, 5, 4, 2]])
Right. See here for an explanation: www.dataschool.io/how-to-update-your-scikit-learn-code-for-2018/#only2ddataarrayscanbepassedtomodels
Thanks Sandeep
op = [[1.77, 2.55],]
linreg.predict(op)
This also works. It is expecting a 2D array, but I don't know why. Adding a comma next to the list makes sense.
Very very good and easy to learn lectures. Thank you..
You're very welcome!
Very simple and straightforward, thank you Data School.
You're welcome!
00:04 Introduction to K Nearest Neighbors (KNN) classification model
02:13 K-nearest neighbors classification model works by selecting the nearest observations and using the most popular response value.
04:44 KNN is a simple machine learning model that predicts the response value based on the nearest neighbor.
07:26 The first step is to import the relevant class and the second step is to instantiate the estimator.
10:11 Training a machine learning model with scikit-learn
12:44 The predict method returns a numpy array with the predicted response value
15:05 Different models can be easily trained using scikit-learn
17:22 Understanding nearest neighbor algorithms and class documentation
Crafted by Merlin AI.
Thanks for sharing! 🙌
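The four steps listed in the timestamp outline above can be condensed into one runnable sketch (using the iris data from the video; with k=1 the video shows the prediction for this sample is class 2):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier  # step 1: import the class

X, y = load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=1)  # step 2: instantiate the estimator
knn.fit(X, y)                              # step 3: fit the model with data
pred = knn.predict([[3, 5, 4, 2]])         # step 4: predict a new observation
```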
Thank you a lot for making such a great video for beginners!
Many thanks!
+TA OR You're very welcome!
Very helpful and clear - thank you, incl the updated notebooks.
Toward the end (t=15:40), using the logreg.fit(X, y) function results in a "/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/_logistic.py:940: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT." warning and a result of [0, 0] even with the updated code. Any suggestions? Changing max_iter to 500 gets rid of the warning, but still ends up with [0, 0] rather than [2. 0] as shown in the video. Any suggestions? I'm using Colab notebooks.
The default solver for LogisticRegression has changed from liblinear to lbfgs. If you change it back to liblinear, it will converge. Try: logreg = LogisticRegression(solver='liblinear') before fitting. Hope that helps!
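The fix in that reply, as a runnable sketch (liblinear was the pre-0.22 default solver; with it, this converges without warnings and reproduces the video's prediction of [2, 0] for these two samples):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# switching back to the liblinear solver avoids the lbfgs
# ConvergenceWarning on this dataset
logreg = LogisticRegression(solver='liblinear')
logreg.fit(X, y)

X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]
pred = logreg.predict(X_new)
```

Alternatively, keeping lbfgs but scaling the features first (e.g. with StandardScaler) also typically resolves the convergence warning.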
Keep up the amazing work!
Thanks!
Thank you, thank you... I was having some doubts about the concepts and now they're cleared. I request you to please make a video on data normalization.
Thank you, sir, for your great explanation. It gives us confidence that we can learn ML.
You're very helpful and intelligent. thank you for these very polished videos.
You're very welcome!
Hello again, your tutorials are awesome. I have an error at here:
In [8]:knn.predict([3, 5, 4, 2])
/usr/local/lib/python3.5/dist-packages/sklearn/utils/validation.py:386:
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and
willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1)
if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
Out[8]:array([2])
But I still get the correct output. I think it is related to NumPy. Where should I use NumPy and how exactly? Or should I just ignore it?
You bring up a great point! It's a long explanation:
The 0.17 release of scikit-learn included the following change: "Passing 1D data arrays as input to estimators is now deprecated as it caused confusion in how the array elements should be interpreted as features or as samples. All data arrays are now expected to be explicitly shaped (n_samples, n_features)." Here's what that means:
When you pass data to a model (an "estimator" in scikit-learn terminology), it is now required that you pass it as a 2D array, in which the number of rows is the number of observations ("samples"), and the number of columns is the number of features. In this example, I make a prediction by passing a Python list to the predict method: knn.predict([3, 5, 4, 2]). The problem is that the list gets converted to a NumPy array of shape (4,), which is a 1D array. Because I wanted scikit-learn to interpret this as 1 sample with 4 features, it now requires a NumPy array of shape (1, 4), which is a 2D array. There are three separate ways to fix this:
1. Explicitly change the shape to (1, 4):
import numpy as np
X_new = np.reshape([3, 5, 4, 2], (1, 4))
knn.predict(X_new)
2. Tell NumPy that you want the first dimension to be 1, and have it infer the shape of the second dimension to be 4:
import numpy as np
X_new = np.reshape([3, 5, 4, 2], (1, -1))
knn.predict(X_new)
3. Pass a list of lists (instead of just a list) to the predict method, which will get interpreted as having shape (1, 4):
X_new = [[3, 5, 4, 2]]
knn.predict(X_new)
Solution #2 is scikit-learn's suggested solution. Solution #3 is the simplest, but also the least clear to someone reading the code.
Thanks for the explanation. I think I will use the second option. Which one are you using? :)
If I'm writing code for myself, I use option #3, otherwise I use option #2.
Hi thanks for these video, they are amazing!
One thing I noticed: it turns out that with the 0.17 release if you just type
>>knn.predict(X_new)
nothing will be output.
My workaround is to type
>>print knn.predict(X_new)
which outputs [2]. But I am not sure it is the best solution...
Glad the videos are helpful to you!
I'm using scikit-learn 0.17, and I'm not seeing the behavior you are describing. Are you sure you're running exactly the same code, in exactly the same order I'm running it?
clear cut and to the point .thanks.
+vishwas s You're welcome!
Just to clarify ... it's an "(n_samples, n_features) matrix" ... not a "feature matrix" as you simply put it ... great video ... thumbs up ...
The scikit-learn documentation refers to it as a "feature matrix", thus I do as well. Calling it a "feature matrix" indicates that it's made up of features, and it's 2-dimensional, and it's implied that the other dimension is the samples.
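For instance, the iris feature matrix follows exactly that shape convention; a tiny check:

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # the "feature matrix"
y = iris.target  # the "response vector"

# rows are samples, columns are features
print(X.shape)  # (150, 4): 150 samples, 4 features
print(y.shape)  # (150,): one response value per sample
```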
Hi, thank you for your explanation, it's very clear. But there is something I don't really understand. At 4:00 you say that the data is represented by 2 numerical features, so you have two axes, X and Y. But what if there were more features, as in the iris dataset? How does KNN work in that case? Does it take the same steps you explain in this video, but on a 4D graph instead of a 2D graph?
I don't know how to explain this briefly, I'm sorry!
@@dataschool That is very unfortunate to hear. Is it possible if we make an appointment via skype? I do not get out and need this information for my thesis. I hope you can help me with it.
I don't work with anyone one-on-one, I'm sorry! However, you are welcome to join Data School Insiders and ask a question during a live webcast or on the private forum: www.patreon.com/dataschool - I prioritize answering questions from Insiders because they are investing in me.
Very effective tutorial series
+HarshaXSoaD Thanks! Glad it's helpful to you.
Great series of video. Thanks.
You're welcome! Glad you are enjoying them :)
Thanks for another great video. I have one conceptual question regarding implementing logistic regression as shown in the video. What I understand is logistic regression is used where the outcome is binary (for example, A or B), In Iris dataset the outcome can be from 3 categories so how does logistic regression work here.
That's really a mathematical question rather than a conceptual question. Logistic regression can be used for multiclass problems, and a few details are here: scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
You can learn more about this topic by searching for multinomial logistic regression.
Hope that helps!
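As an illustrative sketch (not from the video) of multiclass logistic regression on the iris data:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target  # y contains 3 classes: 0, 1, 2

# scikit-learn handles the multiclass case automatically: with the lbfgs
# solver it fits one multinomial model covering all 3 classes at once
logreg = LogisticRegression(solver='lbfgs', max_iter=1000)
logreg.fit(X, y)

# one row of coefficients per class, one column per feature
print(logreg.coef_.shape)  # (3, 4)
```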
Great videos, as great talent and easy method to teach, thanks!
You're welcome!
Hello, thanks for explaining these topics in such a clean manner. Could you please do an explanation of imputing categorical and numerical NaN values? (How do you handle NaN in machine learning?)
Thanks for your suggestion!
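In the meantime, here's a minimal sketch of one common approach, using scikit-learn's SimpleImputer (the arrays here are made up for illustration):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# toy numerical data with missing values
X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

# numerical columns: replace each NaN with the column mean
num_imputer = SimpleImputer(strategy='mean')
X_filled = num_imputer.fit_transform(X)
print(X_filled)  # NaNs become the column means: 4.0 and 2.5

# categorical columns: replace each NaN with the most frequent value
cat = np.array([['red'], ['blue'], ['red'], [np.nan]], dtype=object)
cat_imputer = SimpleImputer(strategy='most_frequent')
cat_filled = cat_imputer.fit_transform(cat)
print(cat_filled)  # the NaN becomes 'red'
```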
6:01 Sir, is the white area called an outlier / noise?
This video series is the best I have watched about scikit-learn so far.
Once I've finished watching all the videos, I will let you know my comments.
At this time I am just wondering if you have in mind doing something similar but based on the R platform. I mean, to go over the principal supervised and unsupervised machine learning methods, but in R. Are you planning to do this?
Glad you like the videos! To answer your question, I'm not planning on making any more videos about R at this time.
How about other machine learning models using python scikit-learn like tree models, cluster models, SVM model and Neural networks?
I do plan on covering more machine learning in Python in the future! :)
Great!
I hope you can cover other algorithms like SVM, Decision Trees, Random forest and, Discriminant Analysis (DA).
At the same time, when we use linear regression, logistic regression, or DA, for instance, we sometimes need to tune our models to check whether they are in line with the assumptions. It would be nice if you could consider those topics in the next video series about machine learning and scikit-learn. How can we get the output graphs for all the models too? For instance: tree graphs, ROC curve graphs, etc. I am already working on it by myself (a lot of googling and reading so far), but it would be great to have this input from you too.
You are a great teacher, very precise and direct. Besides, you give very good additional support in the notes that follow each video, so it is very easy to follow your examples and recommendations.
I hope you can get the new video series done soon.
My best regards!
Thanks so much for your detailed suggestions, and your kind comments! I really appreciate it and will certainly consider your suggestions.
You are great teacher! Thank you very much!
You're very welcome!
Loving the series!
Thanks!
You are awesome! Thanks to you for making these videos.
You're very welcome!
These videos are really great, thanks!
You're welcome!
Really enjoying this series! Thanks for creating it. Do you know where I might find the code for making one of the lovely classification maps you show at e.g. 4.15?
I'm sorry, I don't know how those classification maps were made! If you find a way, feel free to let me know :)
OK, will do! Thanks for letting me know!
This solution: scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html
works straight out of the box for the iris data, though sadly I'm struggling to adapt it to my dataset.
Thanks for sharing! Keep in mind that this technique will only work with two features.
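For anyone else adapting that example, here's a rough sketch of the idea (assuming the first two iris features so the plane can be drawn; k=15 and the grid step are arbitrary choices):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; swap for your own setup
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data[:, :2]  # keep only 2 features so the map is 2D
y = iris.target

knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(X, y)

# classify every point on a fine grid covering the feature space
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
Z = knn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# the colored regions form the classification map
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.savefig('classification_map.png')
```

To use your own dataset, you'd pick (or reduce to) two features and substitute them for the two iris columns.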
Hi, I am getting an error as "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT." and the predictions is coming as array([0, 0]). Any help would be appreciated.
try logreg = LogisticRegression(solver = 'liblinear') instead
@@alenaosipova4660 Thanks a lot, that helps.
Exactly!
Thank you for the video. It is really well-explained. I got several future warnings with LogisticRegression. When I used the fit method, it says that the default solver would change to 'lbfgs' and that I should specify a solver. Also, I got a warning that the default multi_class is also going to change to 'auto' and that I have to specify the multi_class myself. Even after I specify these two, I get a ConvergenceWarning, claiming that lbfgs failed to converge. I am new to machine learning, and I don't know what to do. Can you please tell me what I can do to solve these warnings?
You're doing the right thing! I would just try a different solver. Sorry, I know these warnings can be hard to understand.
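Concretely, two things that often clear the ConvergenceWarning are raising max_iter and standardizing the features; a rough sketch on the iris data:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X, y = iris.data, iris.target

# fix 1: give the lbfgs solver more iterations to converge
logreg = LogisticRegression(solver='lbfgs', max_iter=1000)
logreg.fit(X, y)
print(logreg.predict([[3, 5, 4, 2]]))

# fix 2: standardize the features first, which usually lets lbfgs
# converge in far fewer iterations
pipe = make_pipeline(StandardScaler(), LogisticRegression(solver='lbfgs'))
pipe.fit(X, y)
print(pipe.predict([[3, 5, 4, 2]]))
```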
Very good explanation. Thank you.
***** You're very welcome!
Clear and easy to understand. Thanks, Kevin!
You're welcome!
This totally saved me. I love you so much.
Ha! You are very welcome :)
Great communicator! Thanks!
Thank you!
Thanks for the video! I noticed you have code completion in your notebook, though I'm not seeing it in my notebook (I installed the latest version of Jupyter). Does it require some kind of plugin? Looking forward to the next lesson!
Laurent Van Winckel You're welcome! It looks like tab completion requires the readline library. More information is here: ipython.org/ipython-doc/stable/interactive/tutorial.html
Let me know if you are able to get it working!
Data School Ah the tab key! It's working. Thank you!
Laurent Van Winckel Ah, I should have mentioned that I was hitting the Tab key! :)
Thanks for this great work. I have a question: I am using the Anaconda distribution. While coding in the notebook, how do I get the pop-up list of objects that you get? For example, when you type "from sklearn." it lists the available modules and you select the one you need; in this case you selected "neighbors". I am not getting that option.
I am hitting the Tab key to autocomplete. Does that work for you?
Oh Yes, Thanks Sir:) My mind froze
No problem! :)
Thank you, sir, for your video on machine learning. Sir, I got an error saying "Expected 2D array, got 1D array instead" when using knn.predict.
NB: I got the output when using knn.predict([[3, 5, 4, 2]]), with two brackets.
I have the same problem. If you have resolved it, please let me know.
@@salmafrikha7228 Use two brackets as given in my comment; it worked for me. But I don't know how he got it to work in the video. Maybe he used NumPy.
See this blog post for a detailed explanation: www.dataschool.io/how-to-update-your-scikit-learn-code-for-2018/#only2ddataarrayscanbepassedtomodels
When I try to do the logistic regression part, when I predict (logreg.predict), I get the wrong array [0, 0] and a convergence warning. What should I do? I tried changing the max_iter number and verbose, but I have no idea if that is where the problem comes from.
The default solver changed in version 0.22, so you can try solver='liblinear' instead.
Please inform the reason why you selected KNeighborsClassifier in this case. Why not other classifiers?
It was just for teaching purposes.
Thanks Kevin, these are really great!
You are very welcome!
Thanks for the material! One question: in the video you solve a classification problem with a regression model. If I recall correctly from the previous video, regression models are good for regression problems and classification models for classification problems. Are there any criteria for when I can confidently choose a regression model to solve a classification problem?
Eli Lavi Actually, "logistic regression" is a classification model (not a regression model), despite its name! That's why I used logistic regression in this case.
There are some limited circumstances in which regression models can be used to solve classification problems, but it usually doesn't make sense. I wouldn't worry about it for now... that's a very advanced technique.
Does that answer your question?
Hi Mark,
How do you prepare the data for prediction? If the training data has an empty value in one column of a row, how do I replace that value? Can I use KNN for this purpose?
There's no one answer for this. Yes, you could use KNN, or other imputation methods, or you could drop the row.
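As a rough sketch of the KNN option, using scikit-learn's KNNImputer (the data here is made up for illustration):

```python
import numpy as np
from sklearn.impute import KNNImputer

# toy feature matrix: the last row is missing its first feature
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0],
              [np.nan, 8.0]])

# fill the gap with the mean of that feature across the 2 rows that are
# nearest on the features that ARE present
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(X_filled)  # the NaN becomes (3.0 + 2.0) / 2 = 2.5
```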