Complete guide to outliers | How to work with outliers | Finding an outlier in a dataset using Python
- Published 20 Jul 2024
#machinelearning #datascience #chatgpt
Hello,
My name is Aman and I am a Data Scientist.
All amazing data science courses at the most affordable price here: www.unfolddatascience.com
Book a one-on-one session here (Note - these support sessions are chargeable): docs.google.com/forms/d/1Wgle...
Follow on Instagram: unfold_data_science
About Unfold Data science: This channel is to help people understand the basics of data science through simple examples in an easy way. Anybody without prior knowledge of computer programming or statistics or machine learning and artificial intelligence can get an understanding of data science at a high level through this channel. The videos uploaded will not be very technical in nature and hence can be easily grasped by viewers from different backgrounds as well.
Book recommendation for Data Science:
Category 1 - Must Read For Every Data Scientist:
The Elements of Statistical Learning by Trevor Hastie - amzn.to/37wMo9H
Python Data Science Handbook - amzn.to/31UCScm
Business Statistics By Ken Black - amzn.to/2LObAA5
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron - amzn.to/3gV8sO9
Category 2 - Overall Data Science:
The Art of Data Science By Roger D. Peng - amzn.to/2KD75aD
Predictive Analytics By Eric Siegel - amzn.to/3nsQftV
Data Science for Business By Foster Provost - amzn.to/3ajN8QZ
Category 3 - Statistics and Mathematics:
Naked Statistics By Charles Wheelan - amzn.to/3gXLdmp
Practical Statistics for Data Scientists By Peter Bruce - amzn.to/37wL9Y5
Category 4 - Machine Learning:
Introduction to Machine Learning with Python by Andreas C. Müller - amzn.to/3oZ3X7T
The Hundred Page Machine Learning Book by Andriy Burkov - amzn.to/3pdqCxJ
Category 5 - Programming:
The Pragmatic Programmer by David Thomas - amzn.to/2WqWXVj
Clean Code by Robert C. Martin - amzn.to/3oYOdlt
My Studio Setup:
My Camera: amzn.to/3mwXI9I
My Mic: amzn.to/34phfD0
My Tripod: amzn.to/3r4HeJA
My Ring Light: amzn.to/3gZz00F
Join the Facebook group :
groups/41022...
Follow on Medium: / amanrai77
Follow on Quora: www.quora.com/profile/Aman-Ku...
Follow on Twitter: @unfoldds
Watch the Introduction to Data Science full playlist here: • Data Science In 15 Min...
Watch python for data science playlist here:
• Python Basics For Data...
Watch the statistics and mathematics playlist here :
• Measures of Central Te...
Watch End to End Implementation of a simple machine-learning model in Python here:
• How Does Machine Learn...
Learn Ensemble Model, Bagging, and Boosting here:
• Introduction to Ensemb...
Build Career in Data Science Playlist:
• Channel updates - Unfo...
Artificial Neural Network and Deep Learning Playlist:
• Intuition behind neura...
Natural language Processing playlist:
• Natural Language Proce...
Understanding and building a recommendation system:
• Recommendation System ...
Access all my codes here:
drive.google.com/drive/folder...
Have a different question for me? Ask me here : docs.google.com/forms/d/1ccgl...
My Music: www.bensound.com/royalty-free...
I can't lie to you all, Unfold Data Science is one of the best data science learning platforms.
I learned many useful skills from his videos.
Distance- and density-based methods are commonly used for outlier detection. Here are a few such algorithms:
1] k-nearest neighbors (k-NN): For outlier detection, each data point is scored by the distance to its k-th nearest neighbor (or by the average distance to its k nearest neighbors). Outliers are points with an unusually large score, that is, points that have few or no neighbors within a certain distance.
2] Local Outlier Factor (LOF): LOF compares the local density of a data point with the local densities of its k nearest neighbors. It identifies outliers as points whose density is significantly lower than that of their neighbors, and it provides an outlier score for each data point.
3] Isolation Forest: Isolation Forest builds random trees to isolate points and measures the number of random splits required to separate a data point from the rest of the data. Outliers are points with a shorter average path length across the trees. Strictly speaking, it is tree-based rather than distance-based.
4] DBSCAN (Density-Based Spatial Clustering of Applications with Noise): DBSCAN groups together data points that are close to each other based on a density criterion. Points that do not belong to any dense cluster are labeled as noise and can be treated as outliers.
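A minimal sketch of the k-NN idea from item 1, assuming the common variant where each point is scored by the distance to its k-th nearest neighbor. The function name `knn_outlier_scores`, the toy data, and the choice `k=2` are illustrative assumptions, not something taken from the video. It uses only the Python standard library.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbor.

    A larger score means the point sits farther from its neighbors.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical toy data: a tight cluster plus one far-away point.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (8.0, 8.0)]
scores = knn_outlier_scores(points, k=2)

# The point with the largest score is the strongest outlier candidate.
most_outlying = max(range(len(points)), key=lambda i: scores[i])
print(points[most_outlying], round(scores[most_outlying], 2))
```

In practice a threshold on the score (or a fixed fraction of the highest-scoring points) decides which points are flagged. Both the threshold and `k` depend on the data, so treat the values here as placeholders.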
Thanks Sanket for adding your points.
what a beautiful explanation
Thank you for the video. Do you recommend combining multiple outlier treatment methods? For example, log transform + winsorization? Or log transform + winsorization + standard scaler? If so, what should be the order of applying these methods?
Sir, I'm following your "Big Data Hadoop and Unix" playlist, but after the Sqoop installation video there are no further videos. Could you please tell me where the continuation of this playlist is? Kindly update the playlist.
Also, one doubt: is Big Data still important for data science in 2023, or can it be managed with cloud technologies like Databricks and PySpark on AWS, Azure, or GCP? Kindly reply, sir.
Hi Aman
Thank you so much for your valuable videos.
Pinged you on LinkedIn. Please reply🙏
Thanks a lot. Sure.
Tx, Aman
Welcome Umesh.
This Dataset pl