Complete guide to outliers | How to work with outliers | Finding an outlier in a dataset using Python

  • Published 20 Jul 2024
  • Complete guide to outliers | How to work with outliers | Finding an outlier in a dataset using Python
    #machinelearning #datascience #chatgpt
    Hello,
    My name is Aman and I am a Data Scientist.
    All amazing data science courses at the most affordable price here: www.unfolddatascience.com
    Book a one-on-one session here (Note: these sessions are chargeable): docs.google.com/forms/d/1Wgle...
    Follow on Instagram: unfold_data_science
    About Unfold Data Science: This channel helps people understand the basics of data science through simple examples. Anyone without prior knowledge of computer programming, statistics, machine learning, or artificial intelligence can get a high-level understanding of data science through this channel. The videos are not very technical in nature, so viewers from different backgrounds can follow them easily.
    Book recommendation for Data Science:
    Category 1 - Must Read For Every Data Scientist:
    The Elements of Statistical Learning by Trevor Hastie - amzn.to/37wMo9H
    Python Data Science Handbook - amzn.to/31UCScm
    Business Statistics By Ken Black - amzn.to/2LObAA5
    Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron - amzn.to/3gV8sO9
    Category 2 - Overall Data Science:
    The Art of Data Science By Roger D. Peng - amzn.to/2KD75aD
    Predictive Analytics By Eric Siegel - amzn.to/3nsQftV
    Data Science for Business By Foster Provost - amzn.to/3ajN8QZ
    Category 3 - Statistics and Mathematics:
    Naked Statistics By Charles Wheelan - amzn.to/3gXLdmp
    Practical Statistics for Data Scientists By Peter Bruce - amzn.to/37wL9Y5
    Category 4 - Machine Learning:
    Introduction to Machine Learning with Python by Andreas C. Müller - amzn.to/3oZ3X7T
    The Hundred Page Machine Learning Book by Andriy Burkov - amzn.to/3pdqCxJ
    Category 5 - Programming:
    The Pragmatic Programmer by David Thomas - amzn.to/2WqWXVj
    Clean Code by Robert C. Martin - amzn.to/3oYOdlt
    My Studio Setup:
    My Camera: amzn.to/3mwXI9I
    My Mic: amzn.to/34phfD0
    My Tripod: amzn.to/3r4HeJA
    My Ring Light: amzn.to/3gZz00F
    Join the Facebook group :
    groups/41022...
    Follow on medium: / amanrai77
    Follow on quora: www.quora.com/profile/Aman-Ku...
    Follow on Twitter: @unfoldds
    Watch the Introduction to Data Science full playlist here: • Data Science In 15 Min...
    Watch python for data science playlist here:
    • Python Basics For Data...
    Watch the statistics and mathematics playlist here :
    • Measures of Central Te...
    Watch End to End Implementation of a simple machine-learning model in Python here:
    • How Does Machine Learn...
    Learn Ensemble Model, Bagging, and Boosting here:
    • Introduction to Ensemb...
    Build Career in Data Science Playlist:
    • Channel updates - Unfo...
    Artificial Neural Network and Deep Learning Playlist:
    • Intuition behind neura...
    Natural language Processing playlist:
    • Natural Language Proce...
    Understanding and building a recommendation system:
    • Recommendation System ...
    Access all my codes here:
    drive.google.com/drive/folder...
    Have a different question for me? Ask me here : docs.google.com/forms/d/1ccgl...
    My Music: www.bensound.com/royalty-free...

COMMENTS • 11

  • @rajasekaranm1198, 1 month ago

    I can't lie to you all: Unfold Data Science is one of the best data science learning platforms. I learned many useful skills from his videos.

  • @sanketadamapure802, 1 year ago, +3

    Distance- and density-based methods are well suited to outlier detection. Here are a few commonly used algorithms:
    1] k-nearest neighbors (k-NN): For outlier detection, each data point is scored by its distance to its k nearest neighbors (rather than classified by majority vote, as in k-NN classification). Outliers are the points that have few or no neighbors within a certain distance.
    2] Local Outlier Factor (LOF): LOF compares the local density of a data point with the local densities of its k nearest neighbors. It identifies outliers as points whose density is significantly lower than that of their neighbors, and it provides an outlier score for each data point.
    3] Isolation Forest: Strictly speaking, this is a tree-based method rather than a distance-based one. It builds random trees and measures the number of splits required to isolate a data point from the rest of the data. Outliers are the points with a shorter average path length across the trees.
    4] DBSCAN (Density-Based Spatial Clustering of Applications with Noise): DBSCAN groups together data points that are close to each other based on a density criterion. Outliers are the points that do not belong to any dense cluster; DBSCAN labels them as noise.
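
The k-NN idea from the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the video: the function names `knn_outlier_scores` and `flag_outliers`, the toy data, and the choices `k=2` and `threshold=3.0` are all assumptions made for the example.

```python
import math

def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def flag_outliers(points, k=2, threshold=3.0):
    """Return the indices of points whose k-NN distance exceeds `threshold`."""
    scores = knn_outlier_scores(points, k)
    return [i for i, s in enumerate(scores) if s > threshold]

# Toy data (hypothetical): a tight cluster near the origin plus one distant point.
data = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.8), (0.9, 0.4), (0.1, 0.6), (10.0, 10.0)]
outliers = flag_outliers(data, k=2, threshold=3.0)
print(outliers)
```

In practice the threshold would be chosen from the score distribution rather than fixed by hand. For real datasets, scikit-learn provides `LocalOutlierFactor`, `IsolationForest`, and `DBSCAN` classes covering the other three methods.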

  • @rajasekaranm1198, 1 month ago

    what a beautiful explanation

  • @ozan4702, 1 month ago

    Thank you for the video. Do you recommend combining multiple outlier treatment methods? For example, log transform + winsorization? Or log transform + winsorization + standard scaler? If so, what should be the order of applying these methods?
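
To make the ordering question concrete, here is a sketch of one plausible sequence: log transform, then winsorization, then standard scaling. The order is an assumption for illustration, not the video author's answer. The reasoning is that scaling comes last so that the mean and standard deviation are computed on values whose extremes have already been tamed. The helper names, the sample values, and the 10%/90% clipping quantiles are hypothetical.

```python
import math
import statistics

def log_transform(values):
    """Compress a right-skewed scale with log(1 + x); assumes values >= 0."""
    return [math.log1p(v) for v in values]

def winsorize(values, lower_q=0.1, upper_q=0.9):
    """Clip values to rough lower/upper quantiles (rounded-index rule)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    low = ordered[round(lower_q * last)]
    high = ordered[round(upper_q * last)]
    return [min(max(v, low), high) for v in values]

def standardize(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical sample with one extreme value at the end.
raw = [20, 22, 25, 27, 30, 31, 33, 35, 38, 40, 5000]
scaled = standardize(winsorize(log_transform(raw)))
print([round(v, 2) for v in scaled])
```

After winsorization the extreme value 5000 is clipped to the same level as 40, so it no longer dominates the mean and standard deviation used in the final scaling step. Whether all three steps are needed depends on the data and the model.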

  • @balajikomma541, 1 year ago

    Sir, I'm following your "Big Data Hadoop and Unix" playlist, but after the Sqoop installation video there are no further videos. Could you please tell me where the continuation of this playlist is? Kindly update the playlist.
    Also, one question: is Big Data still important for data science in 2023, or can it be handled with cloud technologies such as Databricks and PySpark on AWS, Azure, or GCP? Kindly reply, sir.

  • @manjeerag868, 1 year ago

    Hi Aman,
    Thank you so much for your valuable videos.
    I pinged you on LinkedIn. Please reply 🙏

  • @umeshtiwari800, 1 year ago

    Thanks, Aman

  • @bhuvanavinodh3498, 1 year ago

    This Dataset pl