Comparing machine learning models in scikit-learn

  • Published Dec 24, 2024

COMMENTS •

  • @dataschool
    @dataschool  3 years ago +10

    Having problems with the code? I just finished updating the notebooks to use *scikit-learn 0.23* and *Python 3.9* 🎉! You can download the updated notebooks here: github.com/justmarkham/scikit-learn-videos

  • @rajatpai5048
    @rajatpai5048 5 years ago +3

    Really simple to understand. Doesn't make it seem like "it's a library thing, the library does it for ya". Thank you for doing this.

    • @dataschool
      @dataschool  5 years ago

      You're very welcome! Thanks for your kind words!

  • @suhailchougle7315
    @suhailchougle7315 4 years ago +5

    This is by far the best scikit-learn tutorial on UA-cam. I can say this because I have seen almost every tutorial, and this covers everything starting from scratch. I knew how all the algorithms work, but what I needed was how to implement those algorithms, from loading the dataset to all the terminology to checking the accuracy and what not, and this series has everything I was looking for. Thank you so much for this. Really appreciate it.

    • @dataschool
      @dataschool  4 years ago +1

      Wow! Thank you so much for your kind words! :)

  • @siddharthkotwal8823
    @siddharthkotwal8823 8 years ago +63

    That's some killer delivery, you didn't waste a word! Great tutorial!

  • @WanderingJoy
    @WanderingJoy 5 years ago +7

    "models that overfit have learned the noise in the data rather than the signal" - yes, well said!

    • @dataschool
      @dataschool  5 years ago

      Glad it was helpful to you!

  • @LordBadenRulez
    @LordBadenRulez 8 years ago +11

    I like the pace of these videos. You speak really slowly and clearly, which helps your viewers digest the information on the fly. Loving your work!

    • @dataschool
      @dataschool  8 years ago +4

      Thanks for the feedback! I'm really glad to hear that my presentation of the material works well for you. Good luck with your education!

    • @CausticCatastrophe
      @CausticCatastrophe 7 years ago

      Yeah, the slow pace is generally great, though personally I view these at 1.25 speed. Still clear at that rate too. :)

  • @dataschool
    @dataschool  6 years ago +19

    *Note:* This video was recorded using Python 2.7 and scikit-learn 0.16. Recently, I updated the code to use Python 3.6 and scikit-learn 0.19.1. You can download the updated code here: github.com/justmarkham/scikit-learn-videos

    • @dataschool
      @dataschool  6 years ago +3

      You're very welcome!

    • @aquaman788
      @aquaman788 4 years ago

      @@dataschool can we have a lecture about Tensorflow?

  • @fubar0sid
    @fubar0sid 8 years ago +3

    Most notable takeaways from the video:
    - "Plotting testing accuracy vs. model complexity is a very useful way to tune any parameters that relate to model complexity."
    - "Once you have chosen a model and its optimal parameters and are ready to make predictions on out-of-sample data, it's important to retrain your model on all of the available training data."
    - Repeating the train/test split process multiple times in a systematic way using k-fold cross-validation

    • @dataschool
      @dataschool  8 years ago

      Great summary! I approve :)
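
The workflow in this summary can be sketched in code. A minimal illustration using the iris dataset from the series (the split size and the range of k values are arbitrary choices, not taken from the video):

```python
# Sketch: tune model complexity on a train/test split, then retrain on all data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4)

# Takeaway 1: testing accuracy vs. model complexity (n_neighbors) tunes the parameter
scores = {}
for k in range(1, 26):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    scores[k] = knn.score(X_test, y_test)
best_k = max(scores, key=scores.get)

# Takeaway 2: with the parameter chosen, retrain on ALL available data
final_model = KNeighborsClassifier(n_neighbors=best_k)
final_model.fit(X, y)
```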

  • @priyanshugupta7591
    @priyanshugupta7591 4 years ago +2

    Your delivery is exceptional. I have never seen somebody teach as well as you. It made me interested in ML. Thanks bro... God bless you.

  • @beansgoya
    @beansgoya 7 years ago

    The last two videos are the best ones I’ve seen someone explain scikit learn’s predictions. Every other video jumps straight to the full analysis but in reality, you can predict in as little as 4 lines of code. Great job!

  • @pranavjoshi7021
    @pranavjoshi7021 8 years ago

    This video series sets such high standards for the content, context, and delivery of machine learning training! It's a winner for all those who are starting to learn machine learning!! Thank you so much for your efforts, Kevin!!!

    • @dataschool
      @dataschool  8 years ago

      Wow, thank you so much for your very kind comment! I really appreciate your support!

  • @frankacito8076
    @frankacito8076 6 years ago

    Your teaching style is outstanding. As someone who has used R in the past, I really appreciate the clarity of your explanations and demonstrations.

  • @pabloalonso5440
    @pabloalonso5440 8 years ago +38

    Dear Kevin. To me your videos are a reference, as those of Mr Andrew Ng. Very good job! Thank you very much from Spain :)

  • @miguelgutierrez2902
    @miguelgutierrez2902 4 years ago +3

    One of the best tutorials on machine learning that I have seen. Thank you for giving us the opportunity to learn. Also, your pronunciation is perfect for Spanish speakers.

  • @xiangxinzhang1770
    @xiangxinzhang1770 1 year ago +1

    When it comes to using the model for future predictions on real-life data, you can directly use the trained model without retraining it with the whole training data, including the test data. The idea is that the model has learned patterns and relationships from the training data that generalize well to unseen data, including real-life data.
    Retraining the model with the entire dataset, including the test data, is generally not recommended as it may lead to overfitting. Overfitting occurs when the model becomes too specific to the training data, capturing noise and irrelevant patterns, which can reduce its performance on new data.

    • @dataschool
      @dataschool  1 year ago

      Thanks for sharing, but if I'm understanding you correctly, I respectfully disagree.

  • @kushsheth4801
    @kushsheth4801 3 years ago +1

    Loving this series, man. Just started out with ML and DS, and I'm understanding everything.

  • @FULLCOUNSEL
    @FULLCOUNSEL 7 years ago

    I thank God I landed on your videos. I see things clearer than ever. You are a gifted tutor. God bless you sir.

    • @dataschool
      @dataschool  7 years ago

      Wow, thanks so much for your incredibly kind comments!

  • @daokou1851
    @daokou1851 8 years ago

    A student from CN jumping across the Great Wall learned this excellent class. Thx.

    • @dataschool
      @dataschool  8 years ago

      Awesome! You're very welcome!

  • @juancastillo2249
    @juancastillo2249 8 years ago

    Wow I must say your teaching style is amazing. Very organized, thorough and easy to follow. Thanks for your time, and keep making great videos! I wish more professors were like you at my school.

    • @dataschool
      @dataschool  8 years ago

      +Juan P Castillo What a nice comment! Thank you so much for your generous words! I'm glad the series has been helpful to you :)

  • @DionMJulien
    @DionMJulien 8 years ago

    I couldn't agree more with berry jordaan.
    The way you deliver the content of a quite complex topic naturally guides me to want to learn more about machine learning.
    Thank you very much

    • @dataschool
      @dataschool  8 years ago

      Thank you so much for your comment - you're very welcome!

  • @NirajKumar-hq2rj
    @NirajKumar-hq2rj 8 years ago

    Excellent teaching!!! I am required to set up competency around advanced analytics involving ML/DS in my organization (since I am coming from a DWH and BI practice), so I wanted to learn and practice. Now I feel like taking this up as a full-time profession and becoming a data scientist. It's such fun and exciting work, and videos like this have made it a lot easier. Thank you!!!

    • @dataschool
      @dataschool  8 years ago

      Awesome! So glad to hear! Thanks for your kind words, and good luck on your educational journey :)

  • @kamruzzamantanim2055
    @kamruzzamantanim2055 6 years ago

    My confidence level for learning machine learning is super high after seeing this video. Your every word is clear and correct. Thank you very much.

  • @susmit03
    @susmit03 7 years ago +103

    for i in range(1, 10001):
        print("THANK YOU VERY MUCH")

    • @dataschool
      @dataschool  7 years ago +5

      HA! Love it! You're very welcome :)

    • @susmit03
      @susmit03 7 years ago +3

      Data School Thank you.
      Your reply shows your passion for programming.
      Keep up the good work of teaching.

    • @dataschool
      @dataschool  7 years ago +2

      Thanks! :)

    • @dioszegizoltan4493
      @dioszegizoltan4493 7 years ago +4

      No, this is the correct one:
      while 1 == 1:
          print("THANK YOU VERY MUCH")

    • @shawndonaldson5674
      @shawndonaldson5674 7 years ago +7

      while True: print("THANK YOU VERY MUCH")

  • @frankgiardina205
    @frankgiardina205 4 years ago +1

    Kevin, this series is excellent; you are able to really simplify the topic to make it easy to learn. Thanks.

  • @personsname0
    @personsname0 6 years ago

    Best video series I've come across on sklearn! I tried a few other channels before this and was left feeling like I still had no idea what was going on, but after only 5 of your videos I already feel way more confident that I can actually get into it, cheers!

    • @dataschool
      @dataschool  6 years ago

      Awesome! Thanks for your kind comments, and good for you! :)

  • @vikramsamal85
    @vikramsamal85 7 years ago

    I have been reading from a lot of sources, but to date this series is the best! I wish there were many more videos and references to take us to the advanced level!

    • @dataschool
      @dataschool  7 years ago

      Thanks so much for your kind comment!

  • @TheGautamj
    @TheGautamj 4 years ago

    Man, he just makes it so easy to learn.
    Wish we had teachers half as good as him in school.

    • @dataschool
      @dataschool  3 years ago +1

      Thank you so much Gautam!

  • @galustbayburcyan1083
    @galustbayburcyan1083 7 years ago +1

    I was looking for ML tutorials and can say that your videos are simply the best. Thanks a lot.

    • @dataschool
      @dataschool  7 years ago

      Wow, thank you so much! What a nice comment!

  • @BrianMoyer-kq2gl
    @BrianMoyer-kq2gl 1 year ago

    This was one of the best videos on the topic that I've found. Thank you for being so succinct and breaking this down so clearly!

  • @beansgoya
    @beansgoya 7 years ago +1

    “Overfitting learns the noise of the data, rather than the signals”
    I finally understand what overfitting means.

  • @jordandixon5307
    @jordandixon5307 7 years ago +1

    I really disliked machine learning after we got taught it at uni. You have really sparked my interest again; thank you so much for this series.

    • @dataschool
      @dataschool  7 years ago

      That's great to hear! You are very welcome.

  • @RajeshSriMuthu
    @RajeshSriMuthu 6 years ago +1

    OMG, I finally found an ML tutor who is awesome... I can't skip a single second of your videos; every word is informative.

    • @dataschool
      @dataschool  6 years ago

      Thanks so much for your kind words! I truly appreciate it!

  • @muhammadbilalahmad4888
    @muhammadbilalahmad4888 7 years ago

    Thank you, Kevin, for sharing these well-organized, normally paced video lectures on scikit-learn. These videos are very helpful for teaching ML in Python to graduate students. The links in the resources are also very valuable. You deserve appreciation. I would suggest uploading lectures on ML with R.

    • @dataschool
      @dataschool  7 years ago

      You're very welcome! I'm glad to hear the videos have been helpful to you! I'm focused on Python these days, so I don't anticipate making any videos on R - sorry!

  • @TheGrebull
    @TheGrebull 5 years ago

    It used to be hard for me to learn machine learning, but now, thanks to you, it isn't anymore.

    • @dataschool
      @dataschool  5 years ago +1

      Thanks so much! That is awesome to hear 😄

  • @uppubhai
    @uppubhai 8 years ago +51

    This is such a gem for beginners. Thank you very much, Kevin.

  • @bijayamanandhar3890
    @bijayamanandhar3890 3 years ago

    Would you please clarify why we need to use `solver='liblinear'` as one of the parameters of the LogisticRegression model? Why do we leave the rest of the parameters at their defaults? Also, why do we import `metrics` from `sklearn` to compute accuracy, when we could simply use the `score` function straight from the `LogisticRegression` model that we imported?

    • @dataschool
      @dataschool  3 years ago +1

      Great questions!
      1. liblinear was the default solver when I recorded the video, but it is no longer the default solver. Thus, in the current version of scikit-learn, you have to set it explicitly because it happens that the current default solver does not converge with this particular dataset.
      2. scikit-learn uses sensible defaults, and so generally you start with the defaults and tweak them as needed (or as part of the hyperparameter tuning process).
      3. You can use the score function, but the metrics module offers far more flexibility, and thus that is what I tend to use.

    • @bijayamanandhar3890
      @bijayamanandhar3890 3 years ago +1

      @@dataschool thank you!
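
Both answers above can be illustrated with a short sketch (assuming the iris data used throughout the series; training accuracy is used here only for brevity):

```python
# Sketch: set the solver explicitly, then compute accuracy two equivalent ways.
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# liblinear is no longer the default solver, so it is set explicitly
logreg = LogisticRegression(solver='liblinear')
logreg.fit(X, y)

y_pred = logreg.predict(X)
acc_via_metrics = metrics.accuracy_score(y, y_pred)  # flexible: swap in any metric
acc_via_score = logreg.score(X, y)                   # convenient shortcut, accuracy only
```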

  • @galanixstudio673
    @galanixstudio673 6 years ago

    Hi Kevin, let me make a remark. At 3:00 you mentioned that the training dataset takes ALL the samples (so including the "test" samples as well)? The thing is, if we do so, the test samples won't demonstrate any error at all. So it must be said that BEFORE training the model, the whole dataset should be split into two groups (one for training the model and one for testing). What's your opinion on that? And happily, at 11:00 you talked about exactly that. I was worried that I would be lost, but it's totally great that you covered that information. Thanks.

    • @dataschool
      @dataschool  6 years ago

      In the video, I outline why you should not use evaluation procedure #1. I included it for explanatory purposes.
      Hope that helps!

  • @SomeIndoGuy
    @SomeIndoGuy 8 years ago +3

    Great series; honestly, it's the most easily understandable lecture on one of the more complicated topics in computer science.
    Love the flow of the video and the tempo of the complexity; it's really easy to follow. I have several suggestions for improvement, in my opinion:
    1. When you point out specific parts of the screen, it would be great to not just use the cursor but also a more visually impactful form of feedback (there are tools for this)
    2. Would love to get a repeated definition of specific terms (such as model complexity: what does that mean? The higher the value of n_neighbors, the more complex it is? What does it mean to be complex?)
    3. I understand that this is an introductory class, but it would be really helpful to show the industry's best practices (an advanced series?)
    Great work. I subscribed, and I'm liking all of your videos.

    • @dataschool
      @dataschool  8 years ago

      +SomeIndoGuy Thanks for your very kind comments, as well as your feedback!
      Regarding model complexity, this is an excellent essay on the bias-variance tradeoff (a critical machine learning topic) that touches on model complexity: scott.fortmann-roe.com/docs/BiasVariance.html

  • @vishwass9491
    @vishwass9491 8 years ago

    I believe the K value you have set for teaching is perfect for my learning. thanks

    • @dataschool
      @dataschool  8 years ago

      +vishwas s HA! Love a good machine learning joke :)

  • @umarnasir1
    @umarnasir1 9 years ago

    I have a question: why have you used logistic regression here (03:24), when before you said that this is not a regression problem, it's a classification problem?

    • @dataschool
      @dataschool  9 years ago +1

      +umar0021 Logistic regression is actually a model used for classification problems, despite its name. (Confusing, I know!)
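
A quick sketch of that point: despite its name, LogisticRegression in scikit-learn predicts discrete class labels (the explicit solver choice below is an assumption for reproducibility, not required):

```python
# Sketch: logistic regression is a classifier, so predict() returns class labels.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
logreg = LogisticRegression(solver='liblinear')
logreg.fit(X, y)

pred = logreg.predict(X[:1])         # a discrete class label (0, 1, or 2)
probs = logreg.predict_proba(X[:1])  # the class probabilities behind that label
```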

  • @KhalilMuhammad
    @KhalilMuhammad 9 years ago +17

    This is a brilliant tutorial -- I love everything about it. Thanks.

    • @dataschool
      @dataschool  9 years ago +1

      +Khalil Muhammad Wow, thank you! I really appreciate your kind words!

  • @hasslefreelabs
    @hasslefreelabs 7 years ago

    I've been watching this tutorial over the last few days. Very precise and accurate content... it made me rewind and rewatch many times! Great!

    • @dataschool
      @dataschool  7 years ago

      Awesome! Glad it's helpful to you!

  • @PankajMishra-ey3yh
    @PankajMishra-ey3yh 8 years ago +1

    I started after you put up a video on how to make a submission on Kaggle at my request. I did well in the last contest and finished 144th on the leaderboard :) All credit goes to you.

    • @dataschool
      @dataschool  8 years ago +1

      Amazing!!! That's great to hear! :)
      For others who might be interested, this is my video about creating Kaggle submissions: ua-cam.com/video/ylRlGCtAtiE/v-deo.html

  • @louisebuijs3221
    @louisebuijs3221 4 years ago +1

    Thanks so much for all these videos! I'm doing an internship at a really nice group, but they're letting me figure out most of the stuff by myself, so this is super useful!

  • @oi4252
    @oi4252 4 years ago +1

    Quarantine with Data School is lit!!

  • @anshusharma-hn3ef
    @anshusharma-hn3ef 6 years ago

    Thank you so much for putting up this series. I was looking for something basic yet comprehensive and easy to follow. This has been very helpful to me. Thanks.

    • @dataschool
      @dataschool  6 years ago

      That's great to hear! You are very welcome.

  • @flashersan4564
    @flashersan4564 8 years ago

    Hi Kevin, regression is supervised learning in which the response is ordered and continuous, so why can we use it to analyze the iris dataset? Thank you!

    • @dataschool
      @dataschool  7 years ago

      Logistic regression is a classification model, not a regression model. (It's confusing, I know!) That's why we can use logistic regression in this case.

  • @MridulBanikcse
    @MridulBanikcse 7 years ago

    Thanks, sir, for making the effort to create these videos. As a beginner, I find these resources extremely helpful.

  • @medgarfsantos
    @medgarfsantos 8 years ago

    Awesome video. I would also love to see a video regarding SVM kernels: the major differences among them, when to choose them, and how the different parameters may affect the classification and the metrics.

    • @dataschool
      @dataschool  8 years ago

      Glad you liked it, and thanks for the suggestion!

  • @cbdave79
    @cbdave79 8 years ago +1

    Thank you for these videos! They are well made and clear. I don't think I understood ML until sitting through your videos.

    • @dataschool
      @dataschool  8 years ago

      Thanks for your kind comment! That's so nice to hear.

  • @RagibHasanOrnob13
    @RagibHasanOrnob13 7 years ago

    I have a question: how can I test the accuracy score of a prediction on a random sample? Consider your previous video, where you calculated knn.predict([3,5,4,2]), which gives the output value 2 (virginica). How can I calculate the accuracy score for this prediction? Thanks anyway.

    • @dataschool
      @dataschool  7 years ago

      The only way to check the accuracy of a single prediction is if you know the "ground truth", meaning the actual value. If you don't know the ground truth, then you can't measure whether the prediction was accurate. Hope that helps!

    • @RagibHasanOrnob13
      @RagibHasanOrnob13 7 years ago

      thanks

    • @zliemails
      @zliemails 7 years ago

      My understanding is that for any kind of prediction (single or multiple), to know the accuracy you have to know the "ground truth", unless the accuracy is derived from the train/test models. However, you can calculate the variance (some also call it precision) for multiple predictions. I read the short paper you linked, "Understanding the Bias-Variance Tradeoff", and wonder: when it comes to a relative system (in some cases an absolute ground truth is not available, and you have to rely on a reference system to build a "relative ground truth"), what should guide the precision (or variance)?
      I also calculated the KNN values versus the accuracy scores in this video. I get different values each time I run the function, and the differences are quite significant. Is it supposed to be so?

    • @dataschool
      @dataschool  7 years ago

      When you say "I have different values each time I ran the function", I can't think of a reason that that would occur, unless you are changing which observations are in the training and testing sets.

  • @dilipgawade9686
    @dilipgawade9686 5 years ago

    Hi Kevin,
    Do you have a blog that you continuously update to stay current with the latest machine learning / data science articles?

    • @dataschool
      @dataschool  5 years ago

      I have a blog, but it doesn't focus on what you are describing: www.dataschool.io

  • @liuchengyu5420
    @liuchengyu5420 5 years ago

    Hello,
    I don't understand why we eventually fit the whole dataset to the model after we get the best k value using the training and testing datasets.
    Jason

    • @dataschool
      @dataschool  5 years ago

      You use all of the data when fitting so that the model can learn from all of the data you have available.

  • @guangleiwang8328
    @guangleiwang8328 7 years ago

    Hi Kevin, wonderful videos for freshmen. Thank you very much.
    At 19:54, 'the relationship between the value of `X` and the testing accuracy': I think it should be `K`, not `X`. Am I right?

    • @dataschool
      @dataschool  7 years ago

      You're right! I meant to say "K".
      Glad the videos have been helpful to you!

  • @robindong3802
    @robindong3802 7 years ago +1

    I have to say this is another great lesson by Kevin. Thank you very much indeed.

    • @dataschool
      @dataschool  7 years ago

      Thanks! I'm glad my videos are helpful to you!

  • @hyanhyan-bf1dx
    @hyanhyan-bf1dx 7 years ago

    Hi, at 21:56, why does a low k value make KNN more complex?

    • @dataschool
      @dataschool  7 years ago

      This article might be helpful to you: scott.fortmann-roe.com/docs/BiasVariance.html
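
One way to see it numerically (a sketch, not from the video): with K=1 each training point's nearest neighbor is itself, so the model memorizes the training data, which is the hallmark of a complex, high-variance model.

```python
# Sketch: low K memorizes the training set; higher K smooths the decision boundary.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)

knn1 = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
train_acc_k1 = knn1.score(X_train, y_train)    # perfect on the training data

knn25 = KNeighborsClassifier(n_neighbors=25).fit(X_train, y_train)
train_acc_k25 = knn25.score(X_train, y_train)  # averaging over 25 neighbors
```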

  • @saisreenivas8875
    @saisreenivas8875 5 years ago

    I didn't understand the use of random_state.
    Is it the case that the same random_state value and the same test_size give the same accuracy on a particular dataset?

    • @dataschool
      @dataschool  5 years ago

      It's complicated to explain briefly, but random_state is used for reproducibility.
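
A sketch of what that means in practice: the same random_state reproduces the identical split, so the resulting accuracy is reproducible too.

```python
# Sketch: a fixed random_state makes train_test_split deterministic.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

X_tr1, X_te1, y_tr1, y_te1 = train_test_split(X, y, test_size=0.4, random_state=4)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.4, random_state=4)

same_split = np.array_equal(X_te1, X_te2)  # same seed -> same rows -> same accuracy
```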

  • @bardamu9662
    @bardamu9662 5 years ago

    Hi Kevin,
    For logistic regression, the model is fit with (X, y) input data. When you call the predict method for this model with input X, we might expect the model to output the y outcomes, since it has been fit with this sampling of data (X, y). Am I getting it wrong here? Thanks for your answer.

    • @dataschool
      @dataschool  4 years ago

      Great question! Models (with a few exceptions) don't exactly memorize the training data. Thus when given the same exact data, they don't make the same exact predictions.

    • @bardamu9662
      @bardamu9662 4 years ago +1

      @@dataschool Hi Kevin,
      Thanks for the clarification. Your channel is very helpful and informative. Keep it up :-)

    • @dataschool
      @dataschool  4 years ago

      Thanks!

  • @gustavomello278
    @gustavomello278 6 years ago

    Great videos, Kevin. I like your deliberately slow style. It is hard to improve on, but if I may suggest something: as your videos are long, it would be useful to have an index in the description with links to the times of the subtopics. That would help a lot with review and would certainly increase the number of re-visits.

    • @dataschool
      @dataschool  6 years ago +1

      Thanks for the suggestion! I know the videos are super long, but ever since making this series, I have tried to make shorter videos.
      And, thanks for the time-coding suggestion! I'll consider it.

  • @나누나누-h6t
    @나누나누-h6t 8 years ago

    I have a question on 05_model_evaluation about prediction. They fit the model with their own X and y: logreg.fit(X, y).
    So why is logreg.predict(X) a little bit different from the actual target y? This X is not even from another source. This part confused me.

    • @dataschool
      @dataschool  8 years ago

      I'm sorry, I don't understand your question!

    • @aleksandrluferov3254
      @aleksandrluferov3254 8 years ago

      Hello Jake Monk. I think the accuracy is lower than 1.0 because our model makes some approximations. You can see this in this plot: scikit-learn.org/stable/auto_examples/linear_model/plot_iris_logistic.html#example-linear-model-plot-iris-logistic-py
      We divide all the points into three areas, but some points belonging to one area end up, in our model, in another area.

  • @berisoalemu2511
    @berisoalemu2511 8 months ago

    What will happen if the values of the features differ in scale?

  • @a.s.5961
    @a.s.5961 4 years ago +2

    I love you man, i have watched every single video of yours.

  • @sharonwaithira
    @sharonwaithira 6 years ago +1

    I love the series so far. I have learned so much. Thank you for creating these. They are quite easy to follow.

    • @dataschool
      @dataschool  6 years ago

      Awesome, that's great to hear!

  • @elilavi7514
    @elilavi7514 9 years ago

    Thanks for the video. What do you think about dividing the data into three categories: training (60%), cross-validation (20%), and test (20%)? Or more advanced techniques: divide the training set into 10 pieces and train the model several times on 9/10 of the training set, each time using a different 1/10 for testing.

    • @dataschool
      @dataschool  9 years ago

      Eli Lavi Great questions!
      1. A three-way split of your data (sometimes called train/test/holdout) is useful if you need a less biased estimate of out-of-sample error. In that case, you use train/test to select model parameters (as shown in the video), and then use the holdout set at the very end to estimate out-of-sample error.
      2. What you described with the 10-way split is called 10-fold cross-validation. It's very useful, and I'll cover it in an upcoming video! Briefly, it provides slightly better estimates of out-of-sample error than train/test split, but is also 10 times more computationally expensive and less flexible than train/test split (in terms of use cases).
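
The two procedures in this reply can be sketched as follows (iris data assumed; the split ratios match the question):

```python
# Sketch: (1) a train/test/holdout split, (2) 10-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 1. Three-way split: 60% train, 20% test, 20% holdout
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=1)
X_test, X_hold, y_test, y_hold = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=1)

# Select parameters with train/test, then estimate out-of-sample error on the holdout set
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
holdout_estimate = knn.score(X_hold, y_hold)

# 2. 10-fold cross-validation: train on 9/10, test on the remaining 1/10, ten times
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=10)
```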

  • @andreygorbunov2968
    @andreygorbunov2968 8 years ago

    Hello Kevin.
    I was following your tutorials, but I encountered this problem.
    I trained a classifier on a (12000, 5) data frame, and I tried to predict from a new data set of shape (15, 5), but I got an error: shapes (15,5) and (9,) not aligned: 5 (dim 1) != 9 (dim 0).
    Would you please explain what I am doing wrong?
    Thank you.

    • @dataschool
      @dataschool  7 years ago

      I'm sorry, I wouldn't be able to help without having access to more of your code and data. Good luck!

  • @rrn0520
    @rrn0520 7 years ago

    Dear Kevin, I need to find the significance of each feature (like the p-value in R), odds ratios, and other stats regarding the model while using scikit-learn for logistic regression. Most sources say that for this we should use Statsmodels instead of scikit-learn. Isn't that possible with scikit-learn? Please help.

    • @dataschool
      @dataschool  7 years ago

      Yes, it's correct that Statsmodels is more appropriate for those tasks. Hope that helps!

    • @rrn0520
      @rrn0520 7 years ago

      Okay, thanks a lot. Can we use both libraries on the same model? Since cross-validation, probability prediction, and class prediction are pretty simple in scikit-learn, right?

    • @dataschool
      @dataschool  7 years ago

      No, you can't use both libraries with the same model. And yes, scikit-learn does make many common machine learning tasks simple!

  • @suprotikdey1910
    @suprotikdey1910 7 years ago

    You did the job quickly enough for me to jump-start ML. All the courses I saw had durations ranging from 3 months to a year. Thank you!!
    Do you have any tutorials on neural nets and deep learning?

    • @dataschool
      @dataschool  7 years ago

      Glad the videos were helpful to you! I don't currently have any tutorials on neural networks or deep learning, but please subscribe to my newsletter for updates on future tutorials: www.dataschool.io/subscribe/

  • @aimene_tayebbey
    @aimene_tayebbey 7 years ago

    If it means anything to you, I really like the way you put things and simplify them. Thanks, man.

    • @dataschool
      @dataschool  7 years ago

      Thanks for your kind comment!

  • @Golden_B1
    @Golden_B1 6 years ago

    I get this error when trying to make a prediction: ValueError: Expected 2D array, got 1D array instead:
    array=[3 5 4 2].
    Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
    Can anyone help me?

    • @dataschool
      @dataschool  6 years ago

      Try this: knn.predict([[3, 5, 4, 2]])
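
Why the extra brackets fix it (a sketch): predict expects a 2D array of shape (n_samples, n_features), so a single sample must be wrapped in an outer list.

```python
# Sketch: pass predict() a 2D array, even for a single sample.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

sample = [[3, 5, 4, 2]]     # one sample with four features: shape (1, 4)
pred = knn.predict(sample)  # knn.predict([3, 5, 4, 2]) would raise the ValueError
```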

  • @humphreymuriuki3715
    @humphreymuriuki3715 3 years ago

    I am getting the error below when I try to use the LogisticRegression model: "ConvergenceWarning: lbfgs failed to converge (status=1):
    STOP: TOTAL NO. of ITERATIONS REACHED LIMIT." Does anyone know how I can resolve it?

    • @dataschool
      @dataschool  3 years ago

      Try changing the solver to liblinear when creating the LogisticRegression object.

  • @albertovalesalonso
    @albertovalesalonso 7 years ago

    Great explanation of noise and signal!

  • @jamesrajendranVideo
    @jamesrajendranVideo 7 years ago

    Awesome... highly effective communication... by far the best machine learning videos. Very grateful to the author. The flow and methodology make machine learning look simple, even though it is quite complex for beginners like me.

    • @dataschool
      @dataschool  7 years ago

      Thanks so much for your kind comment! I'm glad to hear the machine learning videos have been helpful to you. I know it's complex but you will get it eventually... good luck with your education!

  • @khurshidkhan3984
    @khurshidkhan3984 8 years ago

    Does train_test_split use simple random sampling or stratified random sampling? If it uses simple random sampling, how can I use stratified sampling?

    • @WillKriski
      @WillKriski 8 years ago +1

      There are input options to the function where you can specify stratified sampling; see the scikit-learn train_test_split page for details.

    • @dataschool
      @dataschool  7 years ago +1

      This section of the scikit-learn documentation should be helpful to you: scikit-learn.org/stable/modules/cross_validation.html#a-note-on-shuffling
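
A sketch of the stratify option mentioned in these replies (iris data assumed):

```python
# Sketch: stratify=y preserves class proportions in both the train and test sets.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 150 samples, 50 per class

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

test_counts = np.bincount(y_test)  # 45 test samples -> 15 per class
```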

  • @harikrishnan-pp2un
    @harikrishnan-pp2un 5 years ago

    I have a doubt about the accuracy part: we are testing (y, y_pred), in which both are the (labeled) targets, so how can we compute the accuracy with targets only? I think for accuracy we need both the data from X_test and the targets predicted by the model trained on X_train. Can anyone please explain and clarify?

    • @dataschool
      @dataschool  4 years ago

      I don't quite follow your question, I'm sorry!

  • @alifazal9062
    @alifazal9062 4 роки тому

    You're doing a great job. I would just emphasize giving more examples that are relatable, and speaking like you're talking to another person in the room. I only give feedback because that's what I would've wanted from people tuning in.

    • @dataschool
      @dataschool  4 роки тому

      Thanks for your suggestions!

  • @kostasnikoloutsos5172
    @kostasnikoloutsos5172 7 років тому

    Dear lovely Kevin,
    I have read the NumPy PDF you suggested.
    I am wondering if you can suggest any PDF for matplotlib.

    • @dataschool
      @dataschool  7 років тому

      Dear lovely Kostas, I can't think of a PDF for matplotlib right now, sorry! :)

  • @somimukherjee5151
    @somimukherjee5151 6 років тому

    Awesome... Can you post some videos on using random forest and SVM techniques, with examples?

    • @dataschool
      @dataschool  6 років тому

      Thanks for your suggestion!

  • @libardomm.trasimaco
    @libardomm.trasimaco 7 років тому

    Wow. This video in particular is one of the most useful videos that I have found on all of UA-cam. Thank you very much; you're a great person and a great teacher!

    • @dataschool
      @dataschool  7 років тому +1

      Wow, thank you so much! :)

  • @karlavillarrealleal1254
    @karlavillarrealleal1254 Рік тому +1

    Thank you for such clear and well done tutorials!

  • @TheReferrer72
    @TheReferrer72 7 років тому

    Nicely paced set of tutorials. Thanks

  • @asawanted
    @asawanted 3 роки тому

    Can we say that KNN would overfit as K values get smaller?

  • @turnoffthesystem4789
    @turnoffthesystem4789 5 років тому +1

    Mate, awesome videos. You saved my ass for an ML deadline. Awesome, really.

  • @feravladimirovna1044
    @feravladimirovna1044 5 років тому

    I have a recurring problem, please, and I could not find a solution.
    Yesterday I followed the instructions and everything was OK, but
    this error popped up:
    ModuleNotFoundError: No module named 'sklearn'
    Sometimes it is enough to restart the computer and everything works fine;
    sometimes that doesn't work and I need to reinstall Anaconda and all the packages.
    This happens with all packages.
    What should I do to install everything once and have it always work?
    I am tired of these errors.

  • @okao08
    @okao08 7 років тому

    Hi Kevin... I have several tokenized text files. I want to compare each of these text files with another text file and check the similarities or differences.
    How am I able to do that using scikit-learn or NLTK?

    • @dataschool
      @dataschool  7 років тому +1

      I'm sorry, it's hard for me to say how you should approach this task without knowing a lot more information from you. Good luck!

    • @okao08
      @okao08 7 років тому

      I want to build a ranking system for CVs submitted for a job post. First, through an API, I get all the CVs in JSON format. Then I separate each skill, education, and experience entry of each resume into separate text files after stemming and lemmatizing. Each CV has many skills, education entries, and experiences. In the same way, through an API, I get the job requirements, from which I extract the required skills, education, and experience. After that, I want to compare each CV's skills, education, and experience with the required ones. For skills I did keyword matching and checked whether each skill is present in the required skills set, but for experience and education I am planning to build models and do the comparison through machine learning. I wish you could give me any idea of how to approach this, sir. :)

    • @dataschool
      @dataschool  7 років тому

      If you want to use supervised machine learning with this task, there has to be a "ground truth" that you are trying to predict. It doesn't sound like there is a ground truth in your case, such as "did a particular resume result in a job offer for a particular job".

    • @okao08
      @okao08 7 років тому

      I can provide you my code, sir. I am not good at Python and my code is not up to standard: pastebin.com/4CQnjMd8
      I know most of the code there is redundant, but I did it that way since I am testing some of the outputs while writing the code.

  • @chiradeepdeb745
    @chiradeepdeb745 6 років тому

    How do I deal with 3D medical image datasets? And suppose I have a 3D image: how can I test that image to get the categorical result?

    • @dataschool
      @dataschool  6 років тому +1

      I'm not sure I understand your question, I'm sorry!

    • @chiradeepdeb745
      @chiradeepdeb745 6 років тому

      1. Suppose I want to do k learning or CNN on medical images. Take skin disease images, for example: how will I preprocess them and create that k learning or CNN network?
      2. If for skin disease images we have our own image and values, then how do we feed that image and the CSV values in to get the categorical result of what kind of disease it is?

    • @dataschool
      @dataschool  6 років тому

      Sorry, I won't be able to help... good luck!

  • @vishalshah1363
    @vishalshah1363 8 років тому

    If we train the model on the entire dataset with KNN (K=5), and then pass the same dataset to it to evaluate its performance, how come we don't receive 100% accuracy?

    • @dataschool
      @dataschool  7 років тому

      When you train and test on the same dataset, you will get an overly optimistic estimate of out-of-sample performance. However, the estimated performance will rarely be 100%, because most models will only learn an approximation of the training data during the model fitting step, rather than learning it exactly.
      I'm sorry if that is not clear - it's a very difficult question to answer in a few sentences!
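      One way to see this concretely on the iris data (a sketch, not from the video): a model that truly memorizes the training points, like KNN with K=1, does score 100% on its own training data, while logistic regression only learns an approximation:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# K=1 memorizes the training data: each point's nearest
# neighbor is itself, so training accuracy is a perfect 1.0
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(knn.score(X, y))     # 1.0

# logistic regression fits a smooth decision boundary instead,
# so some overlapping training points are misclassified
logreg = LogisticRegression(solver='liblinear').fit(X, y)
print(logreg.score(X, y))
```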

  • @manojg9024
    @manojg9024 7 років тому

    Good video, thanks for the information. Is there any other video with an example other than the iris dataset (regarding logistic regression, KNN)?

    • @dataschool
      @dataschool  7 років тому

      Sure, how about this video: ua-cam.com/video/85dtiMz9tSo/v-deo.html

    • @manojg9024
      @manojg9024 7 років тому

      Thank u

    • @manojg9024
      @manojg9024 7 років тому

      Thank you. Is there any video on time series forecasting? Thank you.

  • @sangyetenphel
    @sangyetenphel 6 років тому

    Great videos, sir. But I have one question: how come the accuracy is not 100%, since we are predicting on the same features X that we fit with:
    logreg.fit(X,y)
    y_pred = logreg.predict(X)
    from sklearn import metrics
    print(metrics.accuracy_score(y, y_pred))
    0.96

    • @dataschool
      @dataschool  6 років тому +1

      Great question! It's because this type of model does not memorize the training data in the way you are assuming. That's the best I can explain it briefly.

  • @saider895
    @saider895 7 років тому +1

    Hi, thanks for the very detailed tutorial, very useful. I am just having a problem understanding this line:
    X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=4)
    Maybe this is a Python question more than a machine learning question, but how can you have "X_train, X_test, y_train, y_test = something"? Aren't all these variables going to end up with the same value? Thanks

    • @dataschool
      @dataschool  7 років тому

      That is a great question! It's a Python question, not a machine learning question.
      It's called tuple unpacking. Try this:
      a, b = 1, 2
      print(a)
      print(b)
      Search for "tuple unpacking" on this page for more examples: www.dataschool.io/python-quick-reference/
      Hope that helps!
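      Applied to train_test_split, the four names on the left simply pick up the four arrays the function returns, in order, so each variable gets a different value (a sketch with the iris data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# tuple unpacking: train_test_split returns four arrays,
# and each name on the left is bound to one of them in order
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=4)

print(X_train.shape)  # (90, 4)
print(X_test.shape)   # (60, 4)
```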

    • @saider895
      @saider895 7 років тому

      awesome, thanks

  • @ggg-ox3hr
    @ggg-ox3hr 7 років тому

    The train/test split module will be removed in 0.20. Any suggestions?

    • @dataschool
      @dataschool  7 років тому

      It was just moved to the model_selection module after I recorded this video: scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection
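      So only the import line changes; everything else stays the same (a sketch):

```python
# old location, removed in scikit-learn 0.20:
#   from sklearn.cross_validation import train_test_split
# new location:
from sklearn.model_selection import train_test_split

from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)
```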

  • @zacharykirby1361
    @zacharykirby1361 8 років тому

    Love the videos! I'm curious, why does the accuracy drop so harshly at k = 18, but then rise again at k = 19. Is there a mathematical explanation? Does it just have to do with the way testing/ training accuracy works. Just trying to fully wrap my head around everything, thank you!

    • @dataschool
      @dataschool  8 років тому

      Glad the videos are helpful to you! Regarding your question, that drop is just due to the natural variation in the data. As well, train/test split is a high variance procedure, meaning its results will vary depending upon which observations happen to be in the training set versus the testing set. Finally, I would just mention that the drop is not actually large -- this is a tiny dataset, and we are zoomed way in to the plot. Hope that helps!
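      That variance is easy to demonstrate (a sketch, not from the video): refitting the same model after splits with different random_state values moves the testing accuracy around:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# same model and data -- only the random split changes
scores = []
for state in range(5):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.4, random_state=state)
    knn = KNeighborsClassifier(n_neighbors=18).fit(X_train, y_train)
    scores.append(knn.score(X_test, y_test))

print(scores)  # the accuracies differ from split to split
```

      Cross-validation, covered later in this series, averages over many such splits to reduce exactly this variance.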

  • @BillTubbs
    @BillTubbs 8 років тому

    Great video lecture series. Love the slow, clear delivery. I noticed a few deprecation warnings when running the code myself. Is there a forum for reporting technical issues/questions?

    • @dataschool
      @dataschool  8 років тому

      Glad you like the series!
      Regarding technical issues, you are welcome to log them as issues in my GitHub repo: github.com/justmarkham/scikit-learn-videos/issues. I will eventually update the notebooks to reflect Python 3, and I know the API is changing slightly in the upcoming scikit-learn 0.18 release, so I will address that as well. But I'd love to know the specifics of any errors or warnings that you receive!
      Regarding questions, you can ask them on GitHub, or post UA-cam comments, and I'll see them either way. Thanks!

  • @Hairihusky
    @Hairihusky 7 років тому

    I've tried using the fit method but I got the following error:
    fit() missing 1 required positional argument: 'y'
    Help?

    • @dataschool
      @dataschool  7 років тому

      I'm sorry, I can't diagnose your code without a lot more information. Good luck!
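      For anyone else hitting this: that exact message usually means fit() was called with the features only. Supervised estimators need both X and y (a minimal sketch with KNN; the same applies to other classifiers):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier()

# knn.fit(X)   # TypeError: fit() missing 1 required positional argument: 'y'
knn.fit(X, y)  # correct: pass the features AND the response
```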

  • @chuanliangjiang7390
    @chuanliangjiang7390 7 років тому

    This lecture is fantastic and extremely helpful for learning machine learning from scratch. I very much appreciate you sharing this wonderful video.

    • @dataschool
      @dataschool  7 років тому

      Thanks for your kind comment!

  • @pierrelaurent8284
    @pierrelaurent8284 8 років тому +2

    great video ! great resources to understand the Bias-Variance trade-off. You are a reference Kevin. Thanks a ton.

  • @UnknownUserx-xe1tm
    @UnknownUserx-xe1tm 8 років тому

    I tried to play with the iris dataset and a question came to me; I hope you can help me please. Here the response vector is evenly split into 3x50 (one group for each type of iris), so when you "split and test the dataset", the split command has a high chance of splitting equally and providing a training dataset that is representative of the overall starting dataset to train your classification algorithm. Question: what if the response vector that is given to you is *not* evenly split (say 30% of irises type 0, 50% type 1, 20% type 2)? Does the split command of scikit-learn take care of it automatically so that I have a "representative" training dataset to train my model? Or do I have to do it manually? Or is there another command I should use?
    PS: Sorry if it is a silly question, and thanks again for your time.

    • @dataschool
      @dataschool  8 років тому

      +arab ilies Great question! You're asking about stratified sampling, which means that the response class proportions should be (approximately) preserved between the training and testing sets. As to whether 'train_test_split' does this by default, I believe the answer is "no" prior to version 0.17. Starting in version 0.17, there is a 'stratify' parameter you can use with 'train_test_split' to accomplish this. More information is here: scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html
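      A sketch of the stratify parameter in action, using the current model_selection import (the class counts shown assume the iris data's 50/50/50 split):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y preserves the class proportions (50/50/50 here)
# in both the training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=4)

print(np.bincount(y_train))  # [30 30 30]
print(np.bincount(y_test))   # [20 20 20]
```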

    • @UnknownUserx-xe1tm
      @UnknownUserx-xe1tm 8 років тому

      thanks a lot sir !!

  • @philipginand555
    @philipginand555 5 років тому

    Do you have a full machine learning Python course I can do? Please post the link.

    • @dataschool
      @dataschool  5 років тому

      www.dataschool.io/learn/

  • @odeshsingh2330
    @odeshsingh2330 5 років тому +1

    Very well explained and great teaching style!! I am doing my first pass through your videos. I will go back and enter the python code and run these on my next pass. I was hoping I could find a set of graded exercises at the end of each video. Any thoughts on this ?

    • @dataschool
      @dataschool  4 роки тому

      Glad you like the videos! The only course for which I offer exercises is my paid online course, Machine Learning with Text: www.dataschool.io/learn/

  • @sbk1398
    @sbk1398 6 років тому +1

    my goodness. what intriguing and useful videos. you have a true gift

  • @rzhang86
    @rzhang86 8 років тому

    What is the convention behind naming X capital and y lowercase?

    • @aleksandrluferov3254
      @aleksandrluferov3254 8 років тому

      As I remember, in the previous parts of this series this convention was explained as: use a capital letter for matrices (2D arrays) and a lowercase letter for vectors (1D arrays).

    • @dataschool
      @dataschool  8 років тому

      Exactly correct, thanks! More information is available in this video: ua-cam.com/video/hd1W4CyPX58/v-deo.html