K-Nearest Neighbor Classification with Intuition and a Practical Solution

  • Published 11 Feb 2019
  • In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification.
    Github link: github.com/krishnaik06/K-NEar...
    You can buy my book, in which I have provided a detailed explanation of how to use Machine Learning and Deep Learning in finance using Python.
    Packt url : prod.packtpub.com/in/big-data...
    Amazon url: www.amazon.com/Hands-Python-F...
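
    A minimal, illustrative sketch of the k-NN classifier described above, using scikit-learn on a synthetic dataset (not the video's data):

        # k-NN is "lazy": fit() only stores the training set; distances are
        # computed at prediction time.
        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.neighbors import KNeighborsClassifier

        X, y = make_classification(n_samples=500, n_features=4, random_state=42)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=42)

        knn = KNeighborsClassifier(n_neighbors=5)
        knn.fit(X_train, y_train)
        print(knn.score(X_test, y_test))  # classification accuracy on the test set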

COMMENTS • 93

  • @programmingpurpose1329
    @programmingpurpose1329 2 years ago +2

    This explanation is one of the most precise that I have seen on the Internet.

  • @nelsondelarosa5490
    @nelsondelarosa5490 9 months ago +1

    This is in fact well explained, defining every term, and assuming no previous knowledge. Thanks so much!

  • @Tapsthequant
    @Tapsthequant 5 years ago +5

    Thank you, you asked a question I had in my head about imbalanced datasets; looking forward to applying the suggested solution...

  • @TechnoSparkBigData
    @TechnoSparkBigData 4 years ago +1

    Sir, you are a great inspiration to me. Thanks a lot for making every complex problem simpler.

  • @sandipansarkar9211
    @sandipansarkar9211 3 years ago +7

    Cool. Also finished my practice in Jupyter notebook. Thanks

  • @sivareddynagireddy56
    @sivareddynagireddy56 2 years ago

    No words for your explanation, sir; a simple, lucid explanation!

  • @shubhamsongire6712
    @shubhamsongire6712 2 years ago

    Thank you so much, Krish, for this great playlist. You are a gem.

  • @ijeffking
    @ijeffking 5 years ago +5

    Very well explained again. Thank you so much.

  • @MaanVihaan
    @MaanVihaan 4 years ago

    Very nice, sir; your explanation and coding technique are very nice.
    I am a new learner of data science. Please keep uploading such videos and new techniques for different kinds of algorithms; they make it easy to understand how to deal with different kinds of datasets.

  • @shyam15287
    @shyam15287 4 years ago +1

    All the best. Superb explanation; you are a superb resource. You will reach great heights; continue your good work.

  • @vaibhavchaudhary1569
    @vaibhavchaudhary1569 3 years ago +9

    Feature scaling (StandardScaler) should be applied after the train-test split, as this avoids an information leak.
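
    A minimal sketch of the leak-free ordering described above (assumes X and y are already loaded; the names are illustrative):

        from sklearn.model_selection import train_test_split
        from sklearn.preprocessing import StandardScaler

        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=0)
        scaler = StandardScaler()
        X_train_scaled = scaler.fit_transform(X_train)  # statistics from train only
        X_test_scaled = scaler.transform(X_test)        # reuse the train statistics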

  • @deshduniya360scan7
    @deshduniya360scan7 3 years ago

    You explain like a pro, thank you.

  • @shreeyajoshi9771
    @shreeyajoshi9771 2 years ago

    Thank you very much for this video. Helped a lot!

  • @kiruthigan2014
    @kiruthigan2014 3 years ago +7

    Loved your videos and your taste in music... Kadhal Vanthale in the bookmarks 😂❤️🔥

  • @padduchennamsetti6516
    @padduchennamsetti6516 7 days ago

    Congratulations, Krish, on 1 million subscribers 🥳

  • @codyphillippi8831
    @codyphillippi8831 3 years ago +4

    This is awesome! Thank you so much. I am working on a project at work for lead segmentation, to help us find our "ideal lead" for our sales reps, with a lot of very messy data. This is a great starting point. Quick question (might be a loaded question, ha): after we find these clusters, how do we go about seeing the "cluster profiles", i.e. which data points make up these clusters (in categorical form)?

    • @joeljoseph26
      @joeljoseph26 8 months ago

      Use any visualization library to see the clustering.

  • @abdelrhmandameen2215
    @abdelrhmandameen2215 3 years ago +1

    Great work, thank you

  • @sazeebulbashar5686
    @sazeebulbashar5686 2 years ago

    Thank you, Naik......
    This is a very helpful video.

  • @rambaldotra2221
    @rambaldotra2221 3 years ago

    Grateful, sir ✨✨ Thanks a lot.

  • @sandipansarkar9211
    @sandipansarkar9211 3 years ago

    Great explanation, Krish.

  • @aditisrivastava7079
    @aditisrivastava7079 5 years ago

    Very well explained

  • @Kishor-ai
    @Kishor-ai 1 year ago

    Thanks for this video, Krish Naik sir 🤩

  • @chaithanyar9961
    @chaithanyar9961 5 years ago +1

    Hello sir, will this code work in TensorFlow? Are there any changes to be made if I want to execute it in TF?

  • @indrajitbanerjee5131
    @indrajitbanerjee5131 3 years ago

    Nicely explained.

  • @laxmiagarwal3285
    @laxmiagarwal3285 3 years ago

    This is a very nice video, but I have one doubt: what values are you using to calculate the mean of the error rate, given that the predicted values are 0s and 1s?

  • @asawanted
    @asawanted 3 years ago

    What if we choose a K value and hit a local optimum? How would we know whether to stop at that K value or proceed to a higher value in search of the global optimum?

  • @louerleseigneur4532
    @louerleseigneur4532 3 years ago

    Thanks Krish

  • @shaz-z506
    @shaz-z506 4 years ago

    Hi Krish,
    Just want to verify: you said at 5:10 that the model is ready, but KNN is instance-based learning with no parameter assumptions, so I don't think it creates any model out of the algorithm. Please let me know if I'm wrong, as I need some clarity.

  • @ramu7762
    @ramu7762 2 years ago

    Spot on. Thank you.

  • @903vishnu
    @903vishnu 3 years ago +1

    It's really good, but you mentioned K=150; as far as I know, we are not supposed to take an even number. There is a chance that an equal number of each class gets selected among the nearest neighbors, and the algorithm may then not be able to decide the class for a new record...

  • @ManashreeKorgaonkar
    @ManashreeKorgaonkar 1 year ago

    Thank you so much for sharing this information. I just have one doubt, sir: if we scale before train_test_split, won't it lead to data leakage? During the scaler's fit, when it computes the average of all the data points, it also uses the test data, so my model will already have some hint about it?

  • @appiahdennis2725
    @appiahdennis2725 3 years ago

    Respect Krish Naik

  • @shaz-z506
    @shaz-z506 5 years ago

    Hi Krish,
    In what scenario would we use Manhattan distance over Euclidean?

  • @parammehta3559
    @parammehta3559 3 years ago

    Sir, is it normal that the error rate sometimes increases as the value of n_neighbors increases?

  • @vishalaaa1
    @vishalaaa1 3 years ago

    Thank you

  • @scifimoviesinparts3837
    @scifimoviesinparts3837 3 years ago

    At 18:52, you said a larger value of K will lead to overfitting, which is not true. A smaller value of K leads to overfitting. I think that if there are 2 K-values giving the same error, we choose the bigger one because it is less impacted by outliers.

  • @adhvaithstudio6412
    @adhvaithstudio6412 3 years ago

    Can you explain how hyperparameters help, and in what scenarios?

  • @vignesh7687
    @vignesh7687 3 years ago +2

    Super explanation, Krish. I have a doubt here: when do we need to use MinMaxScaler() and when do we use StandardScaler()? Is there any difference, or do we have to try both and see which gives better results? Please clarify.

    • @ankurkaiser
      @ankurkaiser 7 months ago

      Hope this answer finds you well. MinMaxScaler() and StandardScaler() are both standard scaling steps, except that normalization (MinMaxScaler) does not assume the data follows a Gaussian distribution, while standardization (StandardScaler) does. Normalization is used with models like KNN and neural networks; it rescales the data to between 0 and 1. So if your data doesn't follow a Gaussian distribution, or your data points basically need to sit close to 0/1, or your business requirement is to normalize the data, go with MinMaxScaler(); otherwise, in general, we use StandardScaler(), which is fast and easy to implement.
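
      For reference, a small sketch contrasting the two scalers on the same toy column (the values are made up):

          import numpy as np
          from sklearn.preprocessing import MinMaxScaler, StandardScaler

          col = np.array([[1.0], [2.0], [3.0], [10.0]])
          print(MinMaxScaler().fit_transform(col).ravel())    # rescaled into [0, 1]
          print(StandardScaler().fit_transform(col).ravel())  # zero mean, unit variance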

  • @princessazula99
    @princessazula99 2 years ago

    For my assignment I am not allowed to import packages for KNN; I have to write it myself. Do you have code without the imported KNN method?
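
    Along the lines the question asks for, a from-scratch sketch using only NumPy: a plain majority vote over the k smallest Euclidean distances. This is an illustrative assumption, not the video's code:

        import numpy as np
        from collections import Counter

        def knn_predict(X_train, y_train, x_new, k=5):
            # Euclidean distance from x_new to every training point
            dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
            # indices of the k nearest neighbours
            nearest = np.argsort(dists)[:k]
            # majority vote among their labels (X_train, y_train as NumPy arrays)
            return Counter(y_train[nearest]).most_common(1)[0][0]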

  • @DeepakSharma-od5ym
    @DeepakSharma-od5ym 4 years ago +1

    error_rate = []
    for i in (1,40):
        knn = KNeighborsClassifier(n_neighbors=i)
        knn.fit(X_train, y_train)
        pred_i = knn.predict(X_test)
        error_rate.append(np.mean(pred_i != y_test))
    plt.figure(figsize=(10, 6))
    plt.plot(range(1, 40), error_rate, color='blue', linstyle='dashed', marker='o')
    plt.xlabel('k')
    plt.ylabel('error rate')
    My code above gives the error "x and y must have same first dimension, but have shapes (39,) and (2,)".
    Please suggest a fix.
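
    The shape mismatch above most likely comes from "for i in (1,40):", which iterates over the two-element tuple (1, 40) rather than the numbers 1..39, so error_rate ends up with 2 entries while range(1,40) supplies 39 x-values; there is also a typo, linstyle -> linestyle. A corrected sketch (assumes X_train, X_test, y_train, y_test already exist from the train-test split):

        import numpy as np
        import matplotlib.pyplot as plt
        from sklearn.neighbors import KNeighborsClassifier

        error_rate = []
        for i in range(1, 40):  # range(1, 40), not the tuple (1, 40)
            knn = KNeighborsClassifier(n_neighbors=i)
            knn.fit(X_train, y_train)
            pred_i = knn.predict(X_test)
            error_rate.append(np.mean(pred_i != y_test))

        plt.figure(figsize=(10, 6))
        plt.plot(range(1, 40), error_rate, color='blue', linestyle='dashed', marker='o')
        plt.xlabel('k')
        plt.ylabel('error rate')
        plt.show()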

  • @manusharma8527
    @manusharma8527 4 years ago

    I am not getting any "Classified Data" csv file on Kaggle. Can you please tell me the real name of that csv file?

  • @birendrasingh7133
    @birendrasingh7133 3 years ago

    Awesome 😁

  • @mahikhan5716
    @mahikhan5716 3 years ago +1

    How can we choose the optimal value of k in KNN?

  • @yathishl9973
    @yathishl9973 4 years ago +3

    Hi Krish, you are really amazing; I learn many things from you.
    I have a doubt: what measures should I take if the error rate increases with the K-value? Please advise.

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 3 years ago +2

      Then you should decrease that k value. Too small a k value leads to overfitting, too large a k value leads to underfitting; you have to choose some middle value wisely ☺️ so that both bias and variance stay as low as possible.

    • @adipurnomo5683
      @adipurnomo5683 3 years ago

      If K is too small, it will be sensitive to outliers; if K is too large, other classes will be included.
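
      As the replies suggest, a middle value of k can be found by scanning a range and keeping the best cross-validated score; a minimal sketch (assumes X_train and y_train already exist):

          from sklearn.model_selection import GridSearchCV
          from sklearn.neighbors import KNeighborsClassifier

          param_grid = {'n_neighbors': list(range(1, 40))}
          search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
          search.fit(X_train, y_train)
          print(search.best_params_)  # the k with the best cross-validated accuracy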

  • @makrandrastogi5588
    @makrandrastogi5588 3 years ago

    Can anybody tell me why, in most cases, we use Euclidean distance and not Manhattan distance?

  • @colabwork1910
    @colabwork1910 2 years ago

    Awesome

  • @mdmynuddin1888
    @mdmynuddin1888 3 years ago

    If both categories have the same number of neighboring points, then which category does the new data point belong to?

  • @krish3486
    @krish3486 1 year ago

    Sir, why do we check only 1 to 40 neighbors in the for loop?

  • @devinpython5555
    @devinpython5555 4 years ago +1

    Could you please explain to me why fit and transform are done on the X values (in the above example, everything except the target column is the X values)?

  • @madeye1258
    @madeye1258 3 years ago

    At 5:03, if we are classifying the points based on the number of points next to them, then why do we need to calculate the distance in step 2?

    • @adipurnomo5683
      @adipurnomo5683 3 years ago +1

      Because the purpose of calculating the distances is to sort the training data points before voting based on the K value.

  • @joeljoseph26
    @joeljoseph26 8 months ago

    Minkowski distance generalizes both Manhattan (p = 1) and Euclidean (p = 2).
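
    As a small illustration of that relationship (note that scikit-learn's KNeighborsClassifier uses metric='minkowski' with p=2, i.e. Euclidean, by default):

        import numpy as np

        def minkowski(a, b, p):
            # p = 1 -> Manhattan distance, p = 2 -> Euclidean distance
            return float((np.abs(a - b) ** p).sum() ** (1.0 / p))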

  • @vibhutigoyal769
    @vibhutigoyal769 3 years ago

    Is KNN a non-linear algorithm???

  • @sunilkumarkatta9062
    @sunilkumarkatta9062 3 years ago

    How will we get the error value to calculate the accurate k value? 😅

  • @Anubhav_Rajput07007
    @Anubhav_Rajput07007 3 years ago

    Hi Krish, hope you are doing well. I am trying to find the best value for K, but the code does not finish executing; it has been running for the last 20 minutes.

    • @techtalki
      @techtalki 3 years ago

      It checks all the candidate values of 'K'. If you want to speed it up, choose a smaller range of K values or a smaller dataset.

  • @shayanshafiqahmad
    @shayanshafiqahmad 4 years ago

    What is the reason for taking pred_i != y_test?

    • @shivaprakashranga8688
      @shivaprakashranga8688 4 years ago

      pred_i contains all the predicted values (like 1,0,1,0,0,1...) against y_test (1,0,0,1,1,...). For K=1, pred_i != y_test picks out the values that were not predicted correctly (the errors); there is no need to count the correctly predicted values. For example, if 60 out of 100 data points are not predicted correctly w.r.t. y_test, the mean over those mismatches is what we record. This continues for K=2,3,... up to 40, and whichever K has the lowest mean value is the one we choose (elbow method).

    • @im_joshila_aj
      @im_joshila_aj 4 years ago

      So this pred_i != y_test will return true/false values, right? In the form of 0 and 1, and then the mean is calculated?

  • @uchuynguyen9927
    @uchuynguyen9927 4 years ago

    Where you take np.mean(pred_i != y_test), I think it should be pred_i = knn.predict(y_test), so that we compare the predicted y_test to the actual y_test and then find the errors. If I am wrong, can somebody explain? Thank you!

    • @manikaransingh3234
      @manikaransingh3234 4 years ago +4

      No, I'm sorry, but you're not right!
      pred_i is already the predicted values from the knn model (what you say it should be is already done in the line above).
      There is no residual error to compute, because this is a classification problem, not a regression one.
      Suppose
      pred_i = [1,1,1,1,0,1,0,0,0,1]
      test_i = [1,1,1,1,1,1,0,1,0,1]
      pred_i != test_i will result in [f,f,f,f,t,f,f,t,f,f], with f = False, t = True.
      Then np.mean takes the mean of the True values, which in this case is 0.2 (the error).
      I hope you get it.

    • @shivangiexclusive123
      @shivangiexclusive123 3 years ago

      @@manikaransingh3234 What does "mean of true values" mean??

    • @shivangiexclusive123
      @shivangiexclusive123 3 years ago

      How is it 0.2??

    • @manikaransingh3234
      @manikaransingh3234 3 years ago

      @@shivangiexclusive123 The mean of the True values is the number of True values divided by the sample size.
      The result has 2 True values out of 10.
      2/10 = 0.2

    • @shivangiexclusive123
      @shivangiexclusive123 3 years ago

      OK, got it... thanks.
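
      The worked example in this thread, as runnable NumPy:

          import numpy as np

          pred_i = np.array([1, 1, 1, 1, 0, 1, 0, 0, 0, 1])
          test_i = np.array([1, 1, 1, 1, 1, 1, 0, 1, 0, 1])
          print(np.mean(pred_i != test_i))  # 0.2: two mismatches out of ten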

  • @sriramswar
    @sriramswar 5 years ago

    Hi Krish, I am unable to open the ipynb file in Jupyter Notebook. I'm getting the error below:
    Error loading notebook
    Unreadable Notebook: C:\Users\Srira\01_K_Nearest_Neighbors.ipynb NotJSONError('Notebook does not appear to be JSON: ...')

    • @krishnaik06
      @krishnaik06 5 years ago

      Dear Sriram, I am able to open the ipynb file. Please use Jupyter Notebook to open the file.

    • @sriramswar
      @sriramswar 5 years ago

      @@krishnaik06 Hi Krish, I used Jupyter Notebook only; I'm not sure if there is a problem at my end. Also, a suggestion: it would be better if the random_state parameter were used in the code/tutorial so that everyone gets consistent results. I got different results when I tried the same code, was confused for a moment, and then understood the reason. Others may get confused too, so just giving a suggestion!

    • @krishnaik06
      @krishnaik06 5 years ago

      Then there is probably a problem with the Jupyter notebook file.

  • @weekendresearcher
    @weekendresearcher 4 years ago

    So, should K always be an odd number?

    • @KnowledgeAmplifier1
      @KnowledgeAmplifier1 3 years ago

      If you choose an odd k value, then there is a higher probability that a tie will not occur, but there are still tie-breakers available, so we have flexibility in choosing the k value ☺️ For example, sometimes we use weighted nearest neighbours, or the class of the nearest neighbor among the tied groups, or sometimes a random tie-breaker among the tied groups, etc. ☺️✌️
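
      One of the tie-breakers mentioned above, distance weighting, is built into scikit-learn; a minimal sketch:

          from sklearn.neighbors import KNeighborsClassifier

          # weights='distance' makes closer neighbours count more, so exact
          # vote ties between classes become unlikely even when k is even.
          knn = KNeighborsClassifier(n_neighbors=4, weights='distance')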

  • @dragolov
    @dragolov 3 years ago

    Here are 2 musical (jazz) solos generated using a K Nearest Neighbor classifier:
    ua-cam.com/video/zt3oZ1U5ADo/v-deo.html
    ua-cam.com/video/Shetz_3KWks/v-deo.html

  • @sensei-guide
    @sensei-guide 4 years ago

    As my k value increases, my error rate also increases, bro.

    • @ahmedjyad
      @ahmedjyad 4 years ago

      It's a normal outcome and a common example of underfitting. Basically, if your k value is too high, you risk ending up with an algorithm that just outputs the class with the highest occurrence.

    • @yathishl9973
      @yathishl9973 4 years ago

      @@ahmedjyad Hi, thanks for your input; please advise how to correct it.

    • @ahmedjyad
      @ahmedjyad 4 years ago

      @@yathishl9973 Changing your k value is an example of hyperparameter tuning, which is the process of finding the parameter values that produce the best classification model. You shouldn't use a very high k value, because that would result in underfitting. So getting a higher error as you increase the k value is itself expected. I hope you understand what I am trying to say; if not, feel free to reach out to me.

  • @tagheuer001
    @tagheuer001 1 year ago

    Why do you repeat phrases so many times?

  • @ArunKumar-yb2jn
    @ArunKumar-yb2jn 2 years ago

    Krish - This seems to be a repeat of over a thousand similar videos on the internet, barring a few. What new insight have you brought here? You didn't define what Y and X were and simply jumped into drawing X marks on the chart. Why do we need an intuition for KNN? Why can't we really understand what it IS? This sort of explanation 'appears' to be clear, but in fact it doesn't really add to a student's understanding. Please take some actual data points and run the algorithm.