Great tutorial!! After watching this and looking at the sklearn docs, it seems as if the LinearRegression() object has only coef_ and intercept_ attributes. Does sklearn not provide metrics such as standard errors, t-statistics, p-values, and R-squared? If not, what is the reasoning behind it? Thanks.
Troy Walters Thanks for your comment! You can indeed compute R-squared using the r2_score function in the sklearn.metrics module. Regarding the others, I think the scikit-learn contributors would argue that those metrics belong in a statistics library, not a machine learning library. Here is a relevant discussion from the scikit-learn mailing list: www.mail-archive.com/scikit-learn-general%40lists.sourceforge.net/msg13102.html
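For anyone who wants to try the R-squared computation mentioned in this reply, here is a minimal sketch. It assumes synthetic data (not the Advertising dataset from the video), and the variable names and data-generating numbers are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data (an assumption, not the dataset from the video):
# y depends linearly on two features plus some noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

linreg = LinearRegression().fit(X, y)
y_pred = linreg.predict(X)

# R-squared via the r2_score function in sklearn.metrics
r2 = r2_score(y, y_pred)

# LinearRegression.score() reports the same quantity for a regressor
r2_from_score = linreg.score(X, y)

print(r2, r2_from_score)
```

Both calls should print the same value. Standard errors, t-statistics, and p-values are not computed here, in line with the reply above.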
Fantastic tutorial series for Python beginners... Can you please start teaching us deep learning and neural networks? I learned pandas and NumPy from your tutorials. Thanks a lot, man.
+C. Drithin It's correct to say that a unit increase in radio spending is associated with a greater increase in sales than a unit increase in TV spending.
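The "unit increase" reading of a coefficient can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are synthetic, the column ranges (0 to 300 for TV, 0 to 50 for radio) and the true coefficients are invented, and none of the numbers come from the video.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Invented data: TV spending spans a much wider range than radio spending.
rng = np.random.default_rng(1)
n = 200
tv = rng.uniform(0, 300, size=n)
radio = rng.uniform(0, 50, size=n)
sales = 3.0 + 0.05 * tv + 0.2 * radio + rng.normal(scale=1.0, size=n)

X = np.column_stack([tv, radio])
linreg = LinearRegression().fit(X, sales)
tv_coef, radio_coef = linreg.coef_

# Predict at a baseline point, then add one unit to each feature in turn.
base = np.array([[100.0, 20.0]])
plus_tv = np.array([[101.0, 20.0]])
plus_radio = np.array([[100.0, 21.0]])

delta_tv = linreg.predict(plus_tv)[0] - linreg.predict(base)[0]
delta_radio = linreg.predict(plus_radio)[0] - linreg.predict(base)[0]

# Each difference equals the corresponding coefficient (up to rounding).
print(tv_coef, delta_tv)
print(radio_coef, delta_radio)
```

Because each coefficient is a per-unit effect, a feature with a wide range (like TV here) can have a small coefficient and still move the prediction a lot across its full range.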
Absolutely amazing material, thank you Kevin! I just wanted to know how you would deal with non-numerical features (e.g., Gender, Occupation, Education, etc.) when constructing your ML model. Would you assign them numerical values? If possible, I'd like some guidance or a push in the right direction. Again, you explain this material much better than most channels do. Please keep up the phenomenal work!
Thanks for the good video! It would be great if a future video took a data set from a Kaggle competition and worked through it. Feature engineering is an interesting topic too.
Two technical notes:
- For people who work behind a proxy: to install seaborn with Anaconda, you have to define the http/https proxy first, so at the Anaconda prompt execute the following command: "set http_proxy=X.X.X.X:port_number"
- For Python 3 users, the zip command looks like: "list(zip(feature_cols,linreg.coef_))"
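The Python 3 note about zip can be illustrated with a short sketch. The coefficient values below are hypothetical placeholders, not the values from the video; with a fitted model you would pass linreg.coef_ instead.

```python
# Hypothetical values for illustration only; not the coefficients from the video.
feature_cols = ['TV', 'Radio', 'Newspaper']
coefs = [0.5, 1.5, 0.25]

# In Python 3, zip() returns a lazy iterator rather than a list,
# so printing it directly does not show the pairs.
pairs = zip(feature_cols, coefs)

# Wrapping the call in list() materializes the (name, coefficient) pairs.
paired = list(zip(feature_cols, coefs))
print(paired)  # [('TV', 0.5), ('Radio', 1.5), ('Newspaper', 0.25)]

# A zip iterator can only be consumed once.
first_pass = list(pairs)
second_pass = list(pairs)
print(second_pass)  # []
```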
Hi Kevin, I'm new to both Python and machine learning. Your tutorials are great learning materials. I understand this is a 5-year-old presentation, and I'm wondering if you would still answer a question I have related to this tutorial. Specifically, when I was trying to get the pairplots you demonstrated, I got the following error: KeyError: "['Sales'] not in index" and I got three blank boxes. What was wrong? Many thanks for your help. FYI, I also tried Googling for answers and haven't been able to find any that work.
It's not perfect, but if you check out minutes four and five, you'll see via the .head() and .tail() methods 10 records out of the overall 200, with sales in the 5 - 25 range. Beyond that, I'm not sure it's explicitly explained, other than the reference you called out.
I was not stating this based on the RMSE. Rather, I was stating this because I knew this about the dataset from examining it. Sorry that was not clear!
Thank you for the awesome videos, clear and to the point. However, I have a question regarding the retraining in the feature selection part (starting 30:31): Won't retraining the model to pick different features introduce data snooping bias?
Great content, you have an inspiring way of presenting, keep it up! I have one question though, why is the TV coefficient smaller than the Radio coefficient, even though from the plots and best fit line it looks like the sales go up faster with more TV ad spending?
Having problems with the code? I just finished updating the notebooks to use *scikit-learn 0.23* and *Python 3.9* 🎉! You can download the updated notebooks here: github.com/justmarkham/scikit-learn-videos
I'm a beginner, but your way of teaching makes me love machine learning; it feels so easy. You even help me understand how the algorithm works behind the scenes. Love from India...
That's awesome to hear! 😊
These are the best ML tutorials I have ever seen! Thank you very much, Sir.
Thank you!
This is unreal! I literally abandoned my DataCamp machine learning course for this one and have no regrets at all. I especially like that you taught the underlying mathematical concepts behind the code. You also speak clear and understandable English, plus the sound quality is top notch. I've taken your data science course, and yours and Prof. Allen's remain my favorites to date, with Hugo's coming in a distant third. And to think you recorded this more than 7 years ago; it was way ahead of its time.
Thank you so much for your kind words, Moruf! 🙏
You're a way better instructor than my college professors. The syntax is fairly simple and the explanation of the statistical intuition behind the metrics made this enjoyable.
Thanks very much for your kind words! Really appreciate it!
MANY THANKS!!!
All other data science tutorials (for beginners) go by way too quickly. Some people may find your slower pace a nuisance, but I found it to be EXTREMELY HELPFUL. THANK YOU! Subbed ^__^
Awesome! That's so great to hear... thanks very much for your comment!
@dataschool Yes, a very good explanation for a beginner like me.
I had been searching for good videos on ML for a long time. After following this series, I can say that it is the best I have ever seen. Each and every concept is covered in great detail. The same applies to the study material and links. Thanks, Data School!
That is great to hear, thanks so much for your very kind words!!
I watch way too many training videos, and I wish you were the presenter in all of them. You rule at this training thing!
Thanks so much! :)
A fairly complete one that puts everything together at once! The best I have watched so far!
Thank you!
*Note:* This video was recorded using Python 2.7 and scikit-learn 0.16. Recently, I updated the code to use Python 3.6 and scikit-learn 0.19.1. You can download the updated code here: github.com/justmarkham/scikit-learn-videos
Can we please get a video about ensemble learning (bagging and boosting)?
This is the best tutorial series on machine learning.
Wow, thank you so much!
To be candid, this is the best video I've ever watched on scikit-learn. Thumbs up!!!
That's awesome to hear... thank you! 🙏
Your video tutorial is outstanding! You can simplify complex concepts in an elegant manner. And unlike other instructors, you don't show off how smart you are. That's why we know that you're really a smart guy :)
Thank you SO MUCH for this kind comment! I truly appreciate it.
Please add more videos to the series. It is really helpful and amazing to watch your videos. You are a great teacher.
Thanks for your suggestion, and for your kind words!
More pandas please! And more Seaborn!
A large part of Machine Learning is "messing" with the data BEFORE you apply any of the algorithms on it, and pd and sns are really good at that.
Also, I think it'd be interesting (maybe later in the series) if you could do an all-out example, like working with the Titanic dataset from Kaggle, and give hints on how to visualize and understand the data and choose the best algorithm for it.
As a final note, I'm already a bit familiar with the techniques you use, but your comments and clear explanations make everything clearer and help me retain some of these techniques.
Thank you for that! Excellent series, and keep on the good work.
Antonio Augusto Santos Thanks for the feedback! I am planning to cover more examples later in the series, probably using a Kaggle competition. And, I appreciate your kind words! I was hoping to reach both users new to machine learning and those with some machine learning familiarity, so it's nice to hear that it's working :)
Your videos are fantastic. For people with random gaps in their knowledge, you explain things very clearly.
+Siddharth Gupta For people who have random chunks of exposure to certain aspects of sklearn/pandas/etc: watch the video at 1.25 or 1.5x speed. You can get through the lesson faster, and the increased speed will actually have a counterintuitive effect of making you focus more. Also when you start losing focus or miss a concept, you will notice right away because you will suddenly be totally lost, so you will know to rewind.
+Siddharth Gupta Thanks for your kind comments!
Thank-you so much for your explanations of sk-learn, it finally makes sense to me! I'm already pretty familiar with Pandas so I'd love to learn more about sk-learn, because I feel there are so many other machine learning algorithms I'd love to get my head around.
Nice! I love to hear that my explanations are helping things to "click" for people. Thanks for your comment!
Definitely one of the best tutorials I've ever watched. Can't wait to work through the 3 hour presentation at the end of this. Thank you!
Thanks so much for your very nice comment! You're very welcome! :)
Thank you for the awesome videos. I am currently learning Machine Learning as part of a course. I don't have previous knowledge of Python (I'm currently taking an introduction to Python as well), and I am really struggling to understand. This is my midterm break, and I found one of your videos while I was searching. I am fortunate to have found your videos. Thanks for your effort.
You're very welcome! Glad I could help!
Really appreciate that you also explain the algorithms and how to find the coefficient governing the equations. Thank you so much!
This is the best video tutorial series on machine learning I have seen. You have me hooked! Thanks for creating the series; you are an amazing teacher. Keep it up!
+Aashish Kumar You're very welcome, and thanks for your kind words!
You are undeniably the best tutor I have ever had. Thank you for teaching DS precisely. :)
Wow, thank you! I'm glad my teaching style works well for you :)
Your teaching methodology is the best; your step-by-step teaching method makes it very easy for me to understand. You are the best.
Thank you!
I wish you were my data analysis lecturer... Thank you very much for this.
Thanks very much for your kind words!
Wonderful videos! I would like you to focus on scikit-learn. Your style of teaching, which combines hands-on scikit-learn work, real examples, and explanations of ML techniques, is very helpful!
+Kenny L Thanks for your kind comments and your feedback!
Word! I agree with you!
Kenny L i
I'll say one thing: you are an excellent teacher. My teachers at engineering school and on Udemy don't explain things half as well as you do! That should tell you a lot!
I wish I could hire you personally.
Thanks so very much for your kind words! You might be interested in joining my membership community: www.patreon.com/dataschool
Want to learn more pandas? I have a new video series about it: ua-cam.com/play/PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y.html
Hi, I'm a beginner in data science and your videos are helping me a lot, thanks.
You're welcome!
As for an answer to your question: I would like to learn more about sklearn. Pandas is amazing, and I'm just starting to learn it, but there are already a lot of nice tutorials out there. Keep up the good job :)
Daniel Andreasen Good point! There are lots of Pandas tutorials already out there.
Really thankful for your video series. It is straightforward and easy to understand. I highly recommend it to others who are interested in Python, machine learning, etc.
Awesome! Thanks for sharing it with others :)
Lots of great information at the end and links in the description. Very valuable. Really appreciate it!
Thanks!
Excellent description of the end-to-end ML flow. Thank you.
You're welcome!
You are the best teacher in the world. I learned something very important to me in this video. Thank you so much. Please keep the good work going.
Wow! Thank you so much for the very kind comment! Good luck to you :)
This is one of the best online resources available for an introduction to data science. Thank you for these amazing videos. It's teachers like you who inspire students like me :)
Wow, what a kind comment! Thank you so much!
Gaurav, I'm having trouble reading advertisemets.csv
Can you help me?
The tutorial content is pretty cool. Adding some humor while explaining would make for a better experience for learners. :)
Thanks!
Thank you very much
Your teaching methodology is awesome, making things crystal clear.
Thanks!
Very good content. I have tried so many video series for data science and this is by far the best! Thanks!
That's great to hear - thanks so much for your kind comment!
You made it so easy to learn. You led me to ML right here. Thank you so much.
You're very welcome!
Wow, one of the best YT tutorials about this topic, thank you!
Thank you!
@Apocalypse-, Hey, I need some help!!
Really awesome tutorials, sir...
It's very easy to understand... Better than other ML tutorials I have watched... ☺☺☺
Thanks for your kind comment!
A really great way of teaching. I really liked your pace and pronunciation, the way you cover possible mistakes, and the explanations. This is a great series and you are a great tutor. I'm a fan and have subscribed. Please make separate, slightly more detailed series on Machine Learning, Deep Learning, AI, and Data Science. I am not sure which one should be learned first and how. I decided you are the best guru to get me to a good level in all these skills. Please help.
Thanks for your suggestions! I'll consider them for the future :)
Great work, please upload more tutorials like these; they're really helpful for getting started.
Before watching this tutorial I was not at all aware of ML, but now, after watching 4/5 videos, I've got a good overview. Thank you!
Great to hear! Thanks for your kind comment.
Hi Kevin, first of all, thank you very much for those great videos. If you have a chance to make a tutorial on deep learning, it would be great. You are the best instructor I've ever seen in this field. You are the best.
Thanks so much for your kind words, and for your suggestion!
This guy is great at teaching. Much appreciated!
Thanks for your kind comment!
Excellent and straight to the point content again. Thanks a lot for the videos and also the additional references you provide. It's always good to know where to go next :)
And please continue on with scikit-learn rather than pandas/seaborn.
Romain Lepert Thanks for the feedback! :)
Thank you for making the effort to produce these videos. It's a great resource and your delivery is superb.
Wow, what a kind compliment, thank you so much!
Another excellent video.
Please continue to focus on ML and scikit-learn.
João Matos Thanks for your feedback, much appreciated!
Thank you! Your videos are helping to make the concepts click! This is the best resource I have found
You're very welcome!
Thalaiva (Tamil for "leader"), you are great.....
Thank you! :)
Watched all your videos. Your teaching skills are amazing, thank you for compiling those videos.
I'm looking forward to your next videos about machine learning using sklearn.
+AvivProg Wow, thank you! You are very welcome -- I enjoyed creating the videos.
Here is the playlist containing the entire video series: ua-cam.com/play/PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A.html
Pretty amazing video!
+1 for sk-learn as next video in this series. I also think that plotting stuff helps a lot. Whenever possible it would be nice to show seaborn in action.
Great job and looking forward to the next one.
Ricardo Ferraz Leal Thanks for the feedback!
Really perfect explanation and walk through. Thanks a lot!
You're very welcome!
I am answering your question 5 years later, but I would love to see more video tutorials from you about scikit-learn (e.g., neural network models (supervised)) or scikit-multilearn if you want!! :) Thanks a lot, Kevin!
Thanks for your suggestions!
Thank you so much for having this series!
You're welcome!
This series is amazing, thank you!
You're welcome! Thanks for your kind words!
You are doing a great job !!!!!! Thank you very much for all your valuable videos !!! They are really helping me !!!! Thanks again :-)
That's great to hear! I'm glad the videos are helpful to you!
Cool video! I just finished your pandas video series, though I thought pandas should be learned before sklearn. Well, anyway, thank you for making such great videos for us.
Great! I also have a scikit-learn video series: ua-cam.com/play/PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A.html
I totally agree, an excellent guide for data learning, visualisation and machine learning. Great work.
Thanks for your kind comment!
These videos are outstanding. I am new to data science, and many of the videos out there are too simple or too hard. You have found the Goldilocks zone of data science. I also like that they are on YouTube, where I can speed them up to 1.5x to match my comprehension rate. Vimeo can't do that.
I would like you to focus on scikit-learn, but use pandas, as most of us will be using both. I think a single lesson on how to use pandas, as well as how to customize IPython/Jupyter, would also be useful. I'd also like to see a video focused on data sources and on how to approach complex problems (à la Kaggle challenges).
Improvement suggestions: 1) Focus on technical quality. Use basic stage lighting (diffused above, side, front, with reflector) and a condenser mic to better pick up your voice without echo. 2) Put a whiteboard or a similarly simple background behind you; there is way too much background clutter.
And I think you are missing an opportunity to end with marketing your courses at Data School, your book, etc. Not that I love ads, but... marketing!
Harvey Summers Thanks for all of the suggestions, and your kind comments! Very helpful. Building up to more complex problems is definitely on the list. And, it's nice to know that I'm hitting the "sweet spot" in terms of difficulty level.
Thanks!
Wow, thank you so much! That is truly kind of you! 🙏
Your explanations are wonderful. Thank you.
Thanks!
Your videos really helped me understand the sklearn basics easily. It would be great if you could do a similar video series on SVMs using scikit-learn and its applications. Your explanations and methods are great!
Thanks a lot!
Thanks for your suggestion as well as your kind words! I appreciate it :)
The csv file does not load up. Has the url changed?
Thanks for your lessons :-)
Clear, detailed and to the point.
Thanks for your kind comments!
Great video!! Thanks for that. I'd like to keep learning about Scikit-learn. Although, Pandas is also definitely a powerful Python data analysis toolkit.
Sebastian Pineda Arango Glad you liked it! Thanks for the feedback.
amazing videos. Very streamlined and easy to understand.
Thanks!
You are the best.
Thank you, I'm glad this content is useful to you!
Currently the URL for the dataset is: faculty.marshall.usc.edu/gareth-james/ISL/Advertising.csv
Thanks for sharing! I also have it on GitHub: github.com/justmarkham/scikit-learn-videos/tree/master/data
Great video once again. I think the focus of this series should be on ML and scikit-learn. You can explain the relevant pandas code wherever required, as you did in this video.
One question: Is there any algorithm in ML that can select the most relevant/explanatory predictor variables (features) from the data set, instead of the user relying on trial and error? I think this is critical for data sets with a large number of features.
umair durrani Great question! There is no "silver bullet" for feature selection, meaning no single strategy that will always tell you which variables to keep in your model. Domain understanding, data exploration, and human intuition are key.
That being said, the Random Forests model will give you a measure of "variable importance" (on a scale of 0 to 1), and you could use that to guide the selection. As well, regularized linear models will shrink coefficients down to zero as the "penalty term" increases, effectively performing feature selection. Just keep in mind that both need to be tuned to perform properly, and features need to be scaled when performing regularization. scikit-learn has some more guidance on feature selection here: scikit-learn.org/stable/modules/feature_selection.html
Thanks again for your kind and helpful comments!
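For anyone who wants to try the two ideas from the reply above, here is a minimal sketch (not code from the video). The data is synthetic and made up for illustration: three features, only the first two of which affect the target. The settings `n_estimators=100` and `alpha=0.5` are arbitrary assumptions and would need tuning on real data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): the third column is pure noise.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# 1) Random forest "variable importance": non-negative values that sum to 1.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_
print("importances:", importances)

# 2) Regularized linear model: scale the features first, then let the L1
#    penalty shrink the coefficients of unhelpful features toward zero.
X_scaled = StandardScaler().fit_transform(X)
lasso = Lasso(alpha=0.5).fit(X_scaled, y)
lasso_coefs = lasso.coef_
print("lasso coefficients:", lasso_coefs)
```

Features with a low importance or a coefficient shrunk to zero are candidates for removal, but it is still worth checking the effect on a held-out error metric before dropping anything.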
umair durrani Umair, there are several useful techniques for feature selection that I recommend you look into. Statistical methods such as forward selection and backward elimination are well suited to determining the most predictive variables in a regression model, and they are easy to understand and implement. Decision Trees inherently perform feature selection in that the variable splits are deemed significant and automatically chosen by the algorithm. A bit more on the complex side are Principal Component Analysis (PCA) and Association Rules; I believe PCA is available in scikit-learn. Good luck! Darron. www.linkedin.com/in/votefordata
+Data School Could you please advise in another course more about Feature Selection? Which models are more suitable for several cases etc. Like for example, sorting features' scores from RandomizedLasso, or by ranking from RecursiveFeatureElimination, or by selecting K best?
+Sabr Tasbolatov Thanks for the suggestion! I'll consider it for the future.
I just released a video about feature selection which might be helpful to you! ua-cam.com/video/YaKMeAlHgqQ/v-deo.html
Very useful video on linear regression.
Thanks very much, Mr. Kevin Markham.
You're very welcome! :)
Such great content you provide sir! Thank you so much.
You're very welcome!
It's the most wonderful tutorial I have ever seen on machine learning. I hope to see more videos related to machine learning. If you made a video about optimization techniques for linear regression (like BFGS, etc.), it would be even more beneficial.
Thanks so much for your kind words! I'll take your suggestion under consideration.
Cool stuff, would like to see more pandas integrated with scikit learn.
Tony770jr Thanks for the suggestion!
Thank you so much for the video, really great introduction to Pandas and SKlearn, I hope you can focus more on the sklearn with pandas dataframe, again, thanks for the great video!
+Siming Zhao You're very welcome, and thanks for your comment!
I absolutely love what you do!. Thank you very very much!
You are very very welcome!
These videos helped me a lot! Thank you so much!!
Great, I'm glad the series is helpful to you!
Super helpful! Your explanations are clear and clean :) Thanks
You're very welcome!
Nicely presented and delivered. Thank you!. I have subscribed to your channel!
Thank you!
dude you're one of the best
Thank you very much for this video series!!! This is really helpful!
+shalin LUO You're very welcome!
Great tutorial!! After watching this and looking at the sklearn docs, it seems as if the LinearRegression() object has only coef_ and intercept_ attributes. Does sklearn not provide metrics such as standard errors, t-statistics, p-values, and R-squared? If not, what is the reasoning behind it ? Thanks.
Troy Walters Thanks for your comment! You can indeed compute R-squared using the r2_score function in the sklearn.metrics module. Regarding the others, I think the scikit-learn contributors would argue that those metrics belong in a statistics library, not a machine learning library. Here is a relevant discussion from the scikit-learn mailing list: www.mail-archive.com/scikit-learn-general%40lists.sourceforge.net/msg13102.html
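As a small illustration of the `r2_score` function mentioned above, here is a sketch with made-up numbers (the `y_true` and `y_pred` values are hypothetical, not from the Advertising data). It also computes R-squared by hand so the definition is visible.

```python
from sklearn.metrics import r2_score

# Hypothetical true values and predictions, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]

# R-squared from scikit-learn.
r2 = r2_score(y_true, y_pred)

# The same quantity by hand: 1 - (residual sum of squares / total sum of squares).
mean_true = sum(y_true) / len(y_true)
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2_manual = 1 - ss_res / ss_tot

print(r2, r2_manual)
```

For a fitted `LinearRegression` object, calling `linreg.score(X, y)` returns the same R-squared measure.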
Fantastic tutorial series for Python beginners... Can you please start teaching us deep learning and neural networks?
I learned pandas and NumPy from your tutorials.
Thanks a lot man
Thanks for your suggestion!
At 20:35, isn't 0.179 > 0.046? Then radio ads should lead to a greater increase in sales than TV ads, right?
Please clarify :)
+C. Drithin It's correct to say that a unit increase in radio spending is associated with a greater increase in sales than a unit increase in TV spending.
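To make the "unit increase" wording concrete, here is a tiny sketch using the two rounded coefficients quoted in the question (0.046 for TV and 0.179 for Radio). The helper name `sales_change` is hypothetical, and the intercept is left out because it cancels when you compare changes.

```python
# Rounded coefficients as quoted in the comment above.
coef = {"TV": 0.046, "Radio": 0.179}

def sales_change(feature, extra_spend):
    """Predicted change in sales when only `feature` spending rises by `extra_spend` units."""
    return coef[feature] * extra_spend

for feature in coef:
    print(feature, sales_change(feature, 1))
```

Each coefficient is a per-unit effect with the other features held fixed, so a larger coefficient means a larger predicted change in sales per unit of spending.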
Hi, the file URL isn't valid. Can you please share it?
Wow! this is very clear. You are the best.
Thanks very much for your kind comment!
Very helpful video !!! thanks for sharing your knowledge.
Looking forward to more!!
You're very welcome! Glad to hear it was helpful to you!
Impressive teacher!
Absolutely amazing material, thank you Kevin!
I just wanted to know how would you deal with non-numerical features (i.e Gender, Occupation, Education, etc.) when constructing your ML model? Would you assign them numerical values? If possible, I'd like some guidance or a push in the right direction.
Again you explain this material much better than most channels do, please keep up the phenomenal work!
Thanks very much for your kind words!
This might be helpful to you: ua-cam.com/video/0s_1IsROgDc/v-deo.html
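For a quick push in that direction, one common option (not necessarily the approach taken in the linked video) is one-hot encoding with pandas. This is a minimal sketch; the DataFrame and its column values are made up for illustration.

```python
import pandas as pd

# Hypothetical data with two non-numerical features and one numerical feature.
df = pd.DataFrame({
    "Gender": ["F", "M", "F"],
    "Education": ["HS", "BSc", "MSc"],
    "Income": [40, 55, 70],
})

# Create one indicator column per category; drop_first=True drops one category
# per feature so the indicator columns are not perfectly collinear.
encoded = pd.get_dummies(df, columns=["Gender", "Education"], drop_first=True)
print(encoded)
```

The resulting indicator columns can be passed to a scikit-learn model like any other numerical features. Assigning arbitrary integers (1, 2, 3) to unordered categories is usually avoided for linear models, because it implies an ordering that does not exist.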
Thanks for the good video!
It would be great if, in a future video, you could take a dataset from a Kaggle competition and work through it.
Feature engineering is an interesting topic too.
Two technical notes:
- For people working behind a proxy: to install seaborn with Anaconda, you have to define the http/https proxy first, so in the Anaconda prompt execute the following command: "set http_proxy=X.X.X.X:port_number"
- For Python 3 users, the zip command looks like this:
"list(zip(feature_cols,linreg.coef_))"
Eli Lavi Sounds good... thanks for the notes!
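A short sketch of why the `list(...)` wrapper matters in Python 3: `zip` returns a lazy iterator there, so printing it directly does not show the pairs. The coefficient values below are placeholders standing in for `linreg.coef_` (the Newspaper value in particular is made up).

```python
feature_cols = ["TV", "Radio", "Newspaper"]
coefficients = [0.046, 0.179, 0.003]  # placeholder values standing in for linreg.coef_

# In Python 3, zip() is lazy: this prints a zip object, not the pairs.
print(zip(feature_cols, coefficients))

# Wrapping it in list() materializes the (feature, coefficient) pairs.
paired = list(zip(feature_cols, coefficients))
print(paired)
```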
Great videos - all of them! Thanks for doing this.
Thanks for your kind comment!
wonderful videos for machine learning beginners.
Thanks! Glad it was helpful to you.
Thanks. Awesome tutorials. I'm learning a lot. Thank you again.
You're very welcome!
Hi Kevin, I'm new to both Python and machine learning. Your tutorials are great learning materials. I understand this is a 5-year-old presentation, and I'm wondering if you would still answer a question I have related to this tutorial. Specifically, when I was trying to get the pairplots you demonstrated, I got the following error: KeyError: "['Sales'] not in index" and I got three blank boxes. What was wrong? Many thanks for your help. FYI, I also tried to find answers by searching online and haven't been able to find any that work.
at 29:28, how could you indicate 'sales ranged from 5 to 25' based on the RMSE?
It's not perfect, but if you check out minutes four and five you'll see via the .head() and .tail() methods 10 records out of the overall 200 with sales in the 5 - 25 range. Aside from that I'm not sure it's explicitly explained aside from the reference you called out.
I was not stating this based on the RMSE. Rather, I was stating this because I knew this about the dataset from examining it. Sorry that was not clear!
Thank you sooo much these are the best tutorial series :)
Thanks for your kind comment!
Could you teach how to program Neural Networks and SVM using scikit-learn?
+ankit biradar Thanks for the suggestion! I'll consider it for a future video.
Thank you for the awesome videos, clear and to the point. However, I have a question regarding the retraining in the feature selection part (starting at 30:31): won't retraining to pick different features introduce data snooping bias?
Great content, you have an inspiring way of presenting, keep it up!
I have one question though, why is the TV coefficient smaller than the Radio coefficient, even though from the plots and best fit line it looks like the sales go up faster with more TV ad spending?
Very Very nice explanation. Thank you Kevin
You're welcome!
I am getting a parser error for reading the csv file from the website. (3:00)
Here it is..
github.com/justmarkham/scikit-learn-videos/tree/master/data