Outlier detection techniques (Python) | How to handle outliers without deleting them
- Published 28 Dec 2020
- We will discuss outlier detection techniques, as used in data mining, and ways to treat outliers effectively using the interquartile range.
We will also discuss how to handle outliers without deleting them.
What is an outlier?
In simple terms, an outlier is an unusual observation that stands out completely from the rest of the observations and can significantly change statistics such as the sample mean. We will plot Q-Q plots and histograms to visualize outliers.
Because of outliers, our analysis and understanding of the data can be completely different from reality, giving an incorrect or false representation.
For example, suppose the salaries of 5 individuals are as follows:
10000,12000,9500,8800,1000000
We can see that the salary of the 5th individual is far higher than that of the others, and if we compute the mean salary we get 208,060, which is far more than what four of the five individuals earn.
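The effect of the single large salary on the mean can be checked with a minimal sketch using Python's standard `statistics` module. The variable names are illustrative and are not taken from the video.

```python
import statistics

# Salaries from the example above; the last value is the outlier.
salaries = [10000, 12000, 9500, 8800, 1000000]

mean_all = statistics.mean(salaries)           # pulled up by the outlier
median_all = statistics.median(salaries)       # barely affected by it
mean_without = statistics.mean(salaries[:-1])  # mean of the four typical salaries

print(mean_all, median_all, mean_without)
```

The mean of all five salaries is 208,060, while the median is 10,000 and the mean of the first four is 10,075, so one value alone moves the mean by a factor of about twenty.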
There are multiple statistical approaches for detecting outliers, such as z-scores and proximity-based models, but for this demonstration we will use a more convenient and widely followed approach and identify them using histograms, box plots, etc.
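For comparison, here is a rough sketch of the z-score approach applied to the salary example. The cutoff of 3 is an assumption (a common convention, not something stated in this description), and the sample standard deviation is used.

```python
import statistics

salaries = [10000, 12000, 9500, 8800, 1000000]

mean = statistics.mean(salaries)
std = statistics.stdev(salaries)  # sample standard deviation

# z-score: how many standard deviations each value lies from the mean
z_scores = [(s - mean) / std for s in salaries]

threshold = 3  # assumed cutoff; a common convention, not from the source
flagged = [s for s, z in zip(salaries, z_scores) if abs(z) > threshold]

print([round(z, 2) for z in z_scores], flagged)
```

On a sample this small, the extreme salary inflates the standard deviation so much that its own z-score stays below 3 and nothing is flagged, which is one reason a quartile-based rule can be more convenient here.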
In this demo we will follow the IQR approach to filter and deal with outliers. The lower limit for any observation is Q1 - 1.5 * IQR and the upper limit is Q3 + 1.5 * IQR.
These terms are as follows:
- Q1 = 25th percentile
- Q3 = 75th percentile
- IQR = Q3 - Q1
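The limits defined above can be sketched in plain Python as follows. This is a minimal illustration, not the code from the video: the helper names `iqr_limits` and `cap_outliers` are hypothetical, and the quartiles use the `inclusive` method of `statistics.quantiles`, which is only one of several quantile conventions. Capping replaces values outside the limits with the nearest limit, so no rows are deleted.

```python
import statistics

def iqr_limits(values, k=1.5):
    """Return (lower, upper) limits: Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" interpolates linearly between data points;
    # other quantile conventions give slightly different limits.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def cap_outliers(values, k=1.5):
    """Clip values to the IQR limits instead of deleting them."""
    lower, upper = iqr_limits(values, k)
    return [min(max(v, lower), upper) for v in values]

# Salary example from earlier in the description
salaries = [10000, 12000, 9500, 8800, 1000000]
lower, upper = iqr_limits(salaries)
capped = cap_outliers(salaries)

print(lower, upper)
print(capped)
```

For the salary example, Q1 is 9500 and Q3 is 12000 under this convention, so the limits are 5750 and 15750. The salary of 1000000 is capped to 15750 while the other four values are unchanged.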
Such a blessing! Been dealing with outliers for the past week and I created duplicates instead of treating them! Thank you!
So glad to hear
excellent overview. thank you :) how did you do the headlines green? :D it must be a markdown feature I'm not yet familiar with. Thank you again.
Excellent! Love the explanation - so clear and detailed
Glad it was helpful!
Thanks a lot. I have been having a lot of difficulty dealing with outliers: almost all of my dataset contains outliers, and using drop/delete techniques removed virtually all of the rows in my dataset, but with the capping methods in this tutorial it seems the problem will be solved. Could I get the code for this tutorial from you? Thanks once again.
How do you color the markdown green? It looks cool!
Very well illustrated.
thank you
Thank you, this is what I am looking for
I am really glad to know that it has been helpful for you.
For a categorical feature, how can we find and remove outliers?
Why is it showing invalid syntax near the variable?
Thank you bhai, it really helped 🥺🥺🥺🥺🥺🥺
Excellent
Glad it is helpful
Siddhesh 👍
Thanks.
Can you also make the code available?
Very well explained 🥳
Thank you shivu
Github link for code please