Median, Mean, Mode, Percentile | Math, Statistics for data science, machine learning

  • Published 17 Nov 2024

COMMENTS • 123

  • @vikashdas1852
    @vikashdas1852 3 years ago +14

    Subscribing to this channel proved to be a lot more helpful than enrolling in college for a degree.

  • @thomass4153
    @thomass4153 3 years ago +41

    Mean, median and mode are also known as 'Measures of Central Tendency'; percentiles are measures of position.

  • @zishanafzal6671
    @zishanafzal6671 3 years ago +2

    You are the best teacher and have the best content on data analysis. No need to go to any other channel.

    • @codebasics
      @codebasics  3 years ago +1

      I am happy this was helpful to you.

  • @belfloretkoriciza5279
    @belfloretkoriciza5279 2 years ago +7

    Thank you so much, Sir. You're a good teacher, and you're different from others because of the practice you demonstrate.

  • @Damian-lu8sx
    @Damian-lu8sx A month ago

    Clear, to the point, with real life examples. I've been learning pandas and I decided to do a recap on math. The fact that you provided examples in pandas is the happiest coincidence I've come across this week. Thank you!

  • @accountingsapayag
    @accountingsapayag 2 years ago

    As a beginner, I should say this is the best.

  • @georgetzimas6882
    @georgetzimas6882 3 years ago +11

    At 2:45, when you added an extra value, you did not sort the values in ascending order: it should be (7000, 7500, 8000) instead of (7000, 8000, 7500).

  • @bilalzubair6843
    @bilalzubair6843 2 years ago

    The best video to understand the concept of removing outliers

  • @Akshay-vq1uv
    @Akshay-vq1uv 2 years ago +3

    Your content and examples are great😃.
    Please don't stop making such easily explained content.

  • @ravidawade5178
    @ravidawade5178 3 years ago +2

    Sir, please make one video for freshers on a real-life data science project. Your teaching is so simple that everyone can understand it very easily.

  • @wasimrajamiddya7560
    @wasimrajamiddya7560 2 years ago +6

    Thank you, Sir, for making such beginner-friendly videos. I really enjoyed this one and learned a lot. Please make more videos like this so that we can understand easily. ❤️

  • @MuhammadUmar-px6ij
    @MuhammadUmar-px6ij 3 years ago +4

    Before exploring the Codebasics channel, I never had an interest in Math & Stat. Thanks, Bro. Love & Respect from Pakistan

    • @codebasics
      @codebasics  3 years ago +3

      I am happy this was helpful to you.

  • @arunadang7872
    @arunadang7872 3 years ago +10

    This series is a masterpiece. Thank you.

    • @ankurhalke139
      @ankurhalke139 2 years ago

      Yeah, so true... *uck the education system

  • @gowthamannarayanan361
    @gowthamannarayanan361 3 months ago +1

    Thank you very much for the detailed and nice explanation.
    I have a question: do we need to remove outliers every time? What if the salary range is consistent, without an unusually high salary (like Elon Musk's in the use case mentioned)?

  • @chilledvibes5700
    @chilledvibes5700 2 years ago +2

    I have no words to say, really awesome series!

  • @ridayefatima-w5i
    @ridayefatima-w5i 3 months ago +1

    Sir, I used this approach: percentile_95 = df.price.quantile(0.95). It gives around 350. If I increase the quantile value, the outliers leave a very large gap, so I removed all values above this threshold.
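
The percentile filter described in this comment can be sketched as follows. This is a minimal illustration, not the exercise solution: the `price` values are made up, and `df_no_outliers` is a hypothetical name.

```python
import pandas as pd

# Hypothetical listing prices; the real dataset's values are not shown in the thread.
df = pd.DataFrame({"price": [50, 60, 75, 80, 90, 100, 120, 150, 200, 10_000]})

# Threshold below which roughly 95% of the prices fall.
percentile_95 = df.price.quantile(0.95)

# Keep only the rows at or below the threshold; the extreme price is dropped.
df_no_outliers = df[df.price <= percentile_95]

print(percentile_95)
print(df_no_outliers)
```

Whether 0.95 is the right cutoff depends on the data; a higher quantile keeps more rows but may let extreme values through.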

  • @Baburao_Aapte
    @Baburao_Aapte 3 years ago +5

    Your way of teaching is incredible, and I love your videos. Whenever anyone asks me where I learned all this, I share the link to your channel with my juniors.

    • @codebasics
      @codebasics  3 years ago

      Thanks for sharing! I am happy this was helpful to you.

  • @kakumanusridhanalakshmi3203

    Ultimate Explanation🎉 Got a good idea on using mean and median

  • @albertology
    @albertology 3 months ago +1

    you earned a sub man!!! what an explanation

  • @parishjain159
    @parishjain159 2 years ago

    Sir your way of teaching is very awesome

  • @pijushdhar7310
    @pijushdhar7310 2 months ago +1

    Sir
    Your quartile calculation seems to be wrong. The formula for the rank of the 25th percentile is 25/100 * (7 + 1), which is 2. This is universally accepted. It means the value should be 5000 only. I really don't know how pandas is also making the same mistake.
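
Both answers in this comment come from standard percentile conventions rather than from a mistake. A minimal sketch with Python's standard library, assuming seven illustrative incomes (the exact values from the video are not confirmed in this thread): the `exclusive` method places the percentile at rank p/100 * (n + 1), as in the comment, while the `inclusive` method uses rank 1 + p/100 * (n - 1) with linear interpolation, which is the same convention as the default `interpolation="linear"` in pandas `quantile`.

```python
from statistics import quantiles

# Seven illustrative incomes (assumed values, not confirmed by the video).
incomes = [4000, 5000, 6000, 7000, 7500, 8000, 10_000_000]

# "exclusive": rank = 25/100 * (7 + 1) = 2, so the second sorted value.
q1_exclusive = quantiles(incomes, n=4, method="exclusive")[0]

# "inclusive": rank = 1 + 25/100 * (7 - 1) = 2.5, halfway between
# the second and third sorted values.
q1_inclusive = quantiles(incomes, n=4, method="inclusive")[0]

print(q1_exclusive)  # 5000.0
print(q1_inclusive)  # 5500.0
```

The two conventions converge as the number of data points grows, so the difference matters mostly for tiny examples like this one.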

  • @Gadgets-v2w
    @Gadgets-v2w 11 months ago +1

    In the median example at 2:40, shouldn't we order the values first before deciding which value is the median?
    Shouldn't the values be ordered like this: 4,000 < 5,000 < 6,000 < 7,000 < 7,500 < 8,000 < 8,000 < 10 million
    So the median would be the average of 7,000 and 7,500, which is 7,250.
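
The sort-then-pick-the-middle rule from this comment can be sketched in a few lines. The values are the ones listed in the comment; their unsorted order here is an assumption, and `median_by_hand` is a hypothetical helper name.

```python
from statistics import median

# Incomes in an arbitrary (unsorted) order; the last value is the outlier.
incomes = [4000, 5000, 6000, 7000, 8000, 7500, 8000, 10_000_000]

def median_by_hand(values):
    """Sort first, then take the middle value (or the mean of the two middle values)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_by_hand(incomes))  # 7250.0
print(median(incomes))          # 7250.0, the library function sorts internally
```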

  • @balajib.9561
    @balajib.9561 3 years ago +24

    Sir, please upload a real-life data science project 👍😁

    • @codebasics
      @codebasics  3 years ago +17

      On YouTube, search for "codebasics data science project" and you will find my videos. Please watch them.

  • @sudarshanm.s6736
    @sudarshanm.s6736 4 months ago +1

    Sir, how is the median of the data points 7500? After arranging the values in ascending order, the median has to be the average of Tao's and Sofia's incomes, so it will be (7000 + 7500) / 2 = 7250, right?

  • @sathesht7532
    @sathesht7532 3 years ago +5

    Hi Sir, thanks a lot for your extraordinary teaching. I have learned a lot and did my homework by following your machine learning tutorial. Sir, can you do a video about Generative Adversarial Networks (GANs) for regression prediction?

  • @kelvinticllahuanacohuachac9562
    @kelvinticllahuanacohuachac9562 2 years ago

    Besides being educational, this was an enjoyable video. Thanks a lot, sir.

  • @locu83
    @locu83 2 years ago

    Exactly what I wanted: a mentor 👍🏻❤️🙂.

  • @kirankapruwan8892
    @kirankapruwan8892 11 months ago

    When calculating the median (when the number of data values is even), we need to sort the values in ascending order first.

  • @Kaafirpeado54-6ayesha
    @Kaafirpeado54-6ayesha 17 days ago

    I'm hoping that by the end of covering the whole playlist I'll have mastered data science and will then know ML, DL, and LLMs.

  • @iamthebearerofchrist
    @iamthebearerofchrist 2 months ago

    Why is using the median better than leaving Musk out and taking the average of the rest? Is it compulsory for all the data to be used?

  • @ParamitasPotpourri
    @ParamitasPotpourri 6 months ago

    I'm nearly 50. I have completed an MCA from IGNOU and digital marketing from NIIT Imperia. I worked as a software developer and now I'm a digital marketer. If I want to change my career to data science after learning this field, can I get a job in data science?

  • @harishkannan8023
    @harishkannan8023 3 years ago +1

    Beautiful explanation

  • @pavan2926
    @pavan2926 3 years ago

    Only one word: loved your explanation

  • @kIocuchl2
    @kIocuchl2 A year ago

    At 2:43 the values should be sorted, and the median will be equal to (7000 + 7500) / 2.

  • @alokbhushan9026
    @alokbhushan9026 5 months ago

    At 3:02, adding Prem to the dataset disturbs the ascending sort order. So the median should really be (7000 + 7500) / 2 = 7250.

  • @himanshusemwal1889
    @himanshusemwal1889 3 years ago +2

    Again, great video, Sir. I have a silly doubt. As you said, we can't use the average to fill a null value if an outlier has a very large value, like Elon Musk ($10 million), so we take the median to fill NaN values instead. But the NaN values themselves sit in the middle of the data points. So how do we calculate the median if NaN values are present at those points? Is it median = (NaN + NaN) / 2?

    • @abhijeetjain2098
      @abhijeetjain2098 3 years ago +1

      Maybe you can take the median of the non-null values and fill the gaps with that.

    • @shutterup24-7
      @shutterup24-7 3 years ago

      I think that to take the median of a dataset we first have to rearrange the data in ascending order, and that will shift the position of the NaN value!

    • @samvhora9076
      @samvhora9076 2 years ago

      @@shutterup24-7 yes, that's the first step
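
The suggestion in this thread (take the median of the non-null values and fill with it) can be sketched with pandas. The incomes below are hypothetical; the point is that `Series.median()` skips NaN by default (`skipna=True`), so the missing entry never enters the calculation.

```python
import numpy as np
import pandas as pd

# Hypothetical incomes with one missing entry and one extreme value.
df = pd.DataFrame({"income": [4000, 5000, 6000, np.nan, 7500, 8000, 10_000_000]})

# Median of the six non-null values: (6000 + 7500) / 2.
median_income = df.income.median()

# Replace the missing entry with that median.
df["income"] = df.income.fillna(median_income)

print(median_income)  # 6750.0
print(df)
```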

  • @unstoppablesaad1978
    @unstoppablesaad1978 21 days ago

    I am a junior data analyst with less than a year of experience. If I apply for jobs, am I expected to be able to code advanced Python functions? Right now I feel that I can understand code by debugging it, but if I try to write similar code myself, I can't. I know which function does what, and if a problem statement is given, I can identify what we should do to achieve the result, but I am unable to implement it. Please give your opinion on this. Coming from a non-IT background, I always have a sense of insecurity that I don't know Python well enough.

  • @andresfrr100
    @andresfrr100 3 years ago +1

    Hi! At 2:44, for the median you take Tao and Prem, but the values must be sorted first. Then Prem is not counted in the median, but Sofia is. So m = (Tao + Sofia) / 2?

  • @_craig_
    @_craig_ 3 years ago

    Nice video. I would like to suggest a change. 100th percentile doesn't exist, only 99th. In your example, Musk would have to be earning higher than himself to be the 100th percentile.

  • @bhuralal5299
    @bhuralal5299 3 years ago

    Thanks for making this video, it's very helpful.

  • @SURAJKUMAR-ug4oi
    @SURAJKUMAR-ug4oi 2 years ago

    Sir, isn't it possible that Sofia's income is actually very high, in which case the median would not work well?

  • @abhinavkumbalwar6837
    @abhinavkumbalwar6837 A year ago

    Very informative video.

  • @siddharudtevaramani1055
    @siddharudtevaramani1055 3 years ago

    Example of Mode is lit 😀

  • @ridayefatima-w5i
    @ridayefatima-w5i 3 months ago +1

    Really great, no rote memorization

  • @Murlik1604
    @Murlik1604 3 years ago

    One very basic question: should outlier removal also be applied to the labels (the values to be predicted) if outliers exist in them as well?

  • @Javeria_jamil00
    @Javeria_jamil00 2 months ago

    Sir, how did you remove the outlier at the end?

  • @friendonymous
    @friendonymous A year ago

    What is the difference between average and mean?

  • @VishalSingh-dv2vg
    @VishalSingh-dv2vg 5 months ago

    Sir, what if data is missing at or below the 25% or 75% mark? How do we find the average then? Please reply.

  • @cyptowithkelv
    @cyptowithkelv A year ago

    Do you have a full course on data analysis?

  • @annonymous.
    @annonymous. 2 years ago

    Why don't we fill missing values with the mode?
    The mode is the value that appears most often, so why do we use the mean and median most of the time?

  • @arupgorai2320
    @arupgorai2320 3 years ago +2

    Sir, I want to know which language is more important. Should we start with Java or Python?

  • @soheilpalermo491
    @soheilpalermo491 3 years ago

    Thank you, that was very informative content.

  • @HitmanBlitz15
    @HitmanBlitz15 3 years ago

    Sir, can you explain the steps to become a data analyst and the skills required for that?

    • @codebasics
      @codebasics  3 years ago +1

      On YouTube, search for "codebasics learn data analyst skills" and you will find my videos. Please watch them.

    • @HitmanBlitz15
      @HitmanBlitz15 3 years ago

      @@codebasics thank you, sir

  • @shreyas_._
    @shreyas_._ 3 years ago

    One of the best tutorials ❤️🔥

  • @micagar2510
    @micagar2510 3 years ago

    Should we first learn pandas then attempt exercises?

  • @renuprasadnaidu7554
    @renuprasadnaidu7554 4 months ago

    Sir, could you please add the assignment link?

  • @mivaangadewadvlogs
    @mivaangadewadvlogs 3 years ago +1

    Hi Sir, can we use multiple medians for multiple NaN values, like you did in Sofia's case?

  • @mimosveta
    @mimosveta 3 years ago +1

    Am I just scatterbrained, or did you not include the link to the video where you explain how to use IQR to remove outliers? I only see a link to a playlist, but none of the videos seem to be on that particular topic.
    EDIT: Okay, it seems you explained it later in this video, but it really sounded like you had a link for us...

    • @codebasics
      @codebasics  3 years ago

      mimosveta, you are right. I forgot to include a link, but I have just added it. Please check the video description.

  • @mvcutube
    @mvcutube 3 years ago

    Thanks for such a nice tutorial

  • @d3v487
    @d3v487 2 years ago

    Hi, I have a dataset where 3 columns are independent categorical features and 5 are dependent features: the 10th, 25th, 50th, 75th, and 90th percentiles of annual wage. How can I get the values of annual wage (which is missing) from the 5 percentile columns?

  • @shantanughode275
    @shantanughode275 3 years ago

    Is the amount of statistics required for data science and data analytics the same?

  • @ParulBedi
    @ParulBedi 3 years ago

    What is the difference between linear quantile and midpoint quantile?
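
One way to see the difference, assuming the question refers to the `interpolation` options of pandas `Series.quantile`: when the requested quantile falls between two data points, `"linear"` interpolates in proportion to where it falls, while `"midpoint"` always returns the average of the two neighbours. The series below is a made-up example.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# The 0.25 quantile sits at position 0.25 * (4 - 1) = 0.75,
# i.e. three quarters of the way from 1 to 2.
linear = s.quantile(0.25, interpolation="linear")      # 1 + 0.75 * (2 - 1)
midpoint = s.quantile(0.25, interpolation="midpoint")  # (1 + 2) / 2

print(linear)    # 1.75
print(midpoint)  # 1.5
```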

  • @iamthebearerofchrist
    @iamthebearerofchrist 2 months ago

    Links to the software used?

  • @BusinessAnalyticsTV
    @BusinessAnalyticsTV 3 years ago

    Awesome learning 🆗😎👍

  • @sundar6323
    @sundar6323 3 years ago

    Is Careerera a good institute to join as a beginner?
    I'm a final-year ECE student.

  • @lathaloganathan4429
    @lathaloganathan4429 A year ago

    So, how do we identify that there is an outlier in the dataset? Please clarify.
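
One common rule of thumb for this question is the IQR fence, which other comments in this thread say the video covers near the end. A minimal sketch with the standard library, using assumed income values and the conventional 1.5 multiplier (a convention, not a rule taken from the video):

```python
from statistics import quantiles

# Illustrative incomes; the last value is the suspected outlier.
incomes = [4000, 5000, 6000, 7000, 7500, 8000, 10_000_000]

# First and third quartiles (inclusive method, linear interpolation).
q1, _, q3 = quantiles(incomes, n=4, method="inclusive")
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in incomes if x < lower or x > upper]

print(outliers)  # [10000000]
```

A flagged value is only a candidate; whether to drop it still depends on what the data represents.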

  • @saikatdutta1991
    @saikatdutta1991 6 months ago

    Consider my data points: 100 100 100 100
    Here the 50th percentile, which is 100, is kind of misleading, right? Two more 100 values are present to the right of the median, so 100% of the data values are equal to the 50th percentile. Can you please explain where I am confused?

    • @shivasiddharthnarayanan
      @shivasiddharthnarayanan 2 months ago

      I am still not sure, so you can double-check with someone else too. In your case, you should consider the mode as your measure and ignore the mean and median.

  • @universal4334
    @universal4334 3 years ago

    It would have been good if you had taught why not to use the median and mode in some cases.

  • @shubhampathare4892
    @shubhampathare4892 A year ago

    In the example at 3:00 you haven't sorted the data in ascending order for the median.

  • @shariqueansari9921
    @shariqueansari9921 3 years ago

    Sir, I need your suggestion. Can you help me ?

  • @momincomputer9967
    @momincomputer9967 A year ago

    great sir 🥰

  • @vidhikapadia9700
    @vidhikapadia9700 3 years ago

    What is the difference between the 0.99 and 0.999 quantiles, given that 0.999 is used in the exercise?

  • @mayur_variya1219
    @mayur_variya1219 A year ago

    In the case of an even number of data points, you have not sorted them, so the median is wrong.

  • @prathampatel582
    @prathampatel582 A year ago

    Why can't we use a trimmed mean?

  • @dimpisayed9710
    @dimpisayed9710 3 years ago

    How can I code in Jupyter, just like you?

  • @philtoa334
    @philtoa334 3 years ago

    so clear, thx.

  • @pankajjoshi8292
    @pankajjoshi8292 A year ago

    Where might the Power BI course be?

  • @wallahengineer9989
    @wallahengineer9989 7 months ago +1

    Raise your hand if you get the Sandeep Jain sir (GFG) reference 😅😅

  • @universal4334
    @universal4334 3 years ago

    Suppose the data is like this:
    4, 4, 6, 7, 40, 100, 110, 120, 1300. In this case, taking the median doesn't make sense, right? The same goes for the mean, since the outlier 1300 is involved. And for the mode, 4 repeating just 2 times doesn't make sense either, right? What should we do in this case? Can anyone please answer? Could we find a solution from this video?

    • @codebasics
      @codebasics  3 years ago

      Taking mode of 4 is perfectly ok because you are looking for a value that is most frequently occurring and 4 is that value. It really depends on what problem you are trying to solve here. Can you suggest what type of dataset this is? You just made up the values and are generally curious about such distribution?

    • @universal4334
      @universal4334 3 years ago

      @@codebasics I just took it as an example. But we can't blindly use 4 to fill the missing value just because it repeats 2 times, right? It is far lower than the other, higher values.

  • @ramananagavelli3055
    @ramananagavelli3055 10 months ago

    How do you know that your data has an outlier?

  • @arshad1781
    @arshad1781 3 years ago +1

    nice

  • @ABOUTEverything56
    @ABOUTEverything56 3 years ago

    Why 0.999 in the exercise?

  • @financewithsom485
    @financewithsom485 3 years ago

    removing elon from twitter as an outlier is also great

  • @sagarhirapara5455
    @sagarhirapara5455 3 years ago

    Sir, how can I contact you?

  • @TradewithSalim
    @TradewithSalim A year ago

    At 2:44 the median should be 7250.

  • @catherinezeng4917
    @catherinezeng4917 2 years ago

    Hi, I'm a bit confused by the solution to the exercise. To me, outliers are not simply removed by percentile. We should exclude the rows with 365 availability and 0 reviews, plus those with 0 availability and 0 reviews, because those listings are just "ghost" listings that no one actually rents, or the data is just not accurate. Going further, we should probably clean the data by review date too: I see some rows dated 2011, but if we are analyzing the average for this or a recent year, there should be a cutoff for the latest year we can use. Please let me know your thoughts. Thanks.

    • @codebasics
      @codebasics  2 years ago

      Totally agree with your thoughts here. Percentile is just one of the ways; using simple common-sense logic is a totally legitimate way of treating outliers.

    • @catherinezeng4917
      @catherinezeng4917 2 years ago

      @@codebasics Thank you for replying so quickly. So if I first apply what I said in my post and then apply the percentile, is that going to be right, or, let's say, more accurate? Also, how do we measure the accuracy? Should the mean be close to the 50th percentile? How do we know whether our analysis is good or bad? Thank you so much!

  • @Catwomanloyal24009
    @Catwomanloyal24009 2 years ago

    🙏🏻

  • @RH-hv4ir
    @RH-hv4ir 4 months ago

    The video is great, but I didn't like the exercise because there is more in it than was covered in the video.

  • @mrrshaqproduction1255
    @mrrshaqproduction1255 4 days ago

    Legend

  • @troubution
    @troubution 3 years ago +1

    The funniest part is if Elon Musk lives in our town😂😂

    • @codebasics
      @codebasics  3 years ago +1

      Ha ha.. yes he is my neighbor ☺️🧐

  • @akshitsinghal8590
    @akshitsinghal8590 3 years ago

    Sorry Sir, you missed one part in the video: first we have to sort the numbers when finding the median (when the count of numbers is even).

  • @Kaafirpeado54-6ayesha
    @Kaafirpeado54-6ayesha 17 days ago

    I didn't get percentiles at first glance.

  • @ankurhalke139
    @ankurhalke139 2 years ago

    This is legend . Go to hell teachers and education system...

  • @jayasurya3864
    @jayasurya3864 2 years ago

    You really wish musk to be your neighbour it seems

  • @farhanraza42744
    @farhanraza42744 3 years ago

    Your median answer is totally wrong

  • @aquapisces
    @aquapisces A year ago +2

    16:04 df.income.iloc[3] =Nan will work too
