Anomaly detection using Isolation Forest - Contextual Anomalies

  • Published 8 Mar 2021
  • #datascience #machinelearning #anomaly
    You can check my other videos on anomaly detection here - • Anomaly Detection
    Isolation Forest is an unsupervised learning algorithm for anomaly detection that works by isolating anomalies, rather than by profiling normal points as most common techniques do.
    In statistics, an anomaly (a.k.a. outlier) is an observation or event that deviates so much from other events as to arouse suspicion that it was generated by a different mechanism.
    A data point is considered a global outlier if its value is far outside the entirety of the data set in which it is found.
    A data point is considered a contextual outlier if its value deviates significantly from the rest of the data points in the same context. Note that this means the same value may not be considered an outlier if it occurs in a different context.
  • Science & Technology
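
The contextual-outlier idea in the description can be illustrated with a minimal sketch. It uses synthetic data, not the dataset from the video, and the `dev_from_hour_median` feature is an assumption of this sketch rather than a step confirmed by the video. A reading of 100 is normal during the day but unusual at 03:00, so it is only an outlier in context.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic hourly series (an assumption, not the data from the video):
# values near 100 during the day (08:00-19:59) and near 10 at night.
index = pd.date_range("2021-01-01", periods=24 * 10, freq="60min")
hour = index.hour.to_numpy()
is_day = (hour >= 8) & (hour < 20)
value = np.where(is_day, 100.0, 10.0) + rng.normal(0, 2, size=len(index))
df = pd.DataFrame({"value": value, "hour": hour}, index=index)

# Contextual anomaly: a daytime-level reading in the middle of the night.
anomaly_time = pd.Timestamp("2021-01-06 03:00")
df.loc[anomaly_time, "value"] = 100.0

# Hypothetical engineered feature: distance from the typical value for that hour.
hourly_median = df.groupby("hour")["value"].transform("median")
df["dev_from_hour_median"] = df["value"] - hourly_median

features = ["value", "hour", "dev_from_hour_median"]
model = IsolationForest(n_estimators=200, contamination=0.02, random_state=0)
df["outlier"] = model.fit_predict(df[features])      # -1 = anomaly, 1 = normal
df["score"] = model.decision_function(df[features])  # lower = more anomalous

print(df.loc[anomaly_time])
print(df["outlier"].value_counts())
```

The raw value alone would not stand out here, because 100 occurs every day; it is the hour-aware feature that exposes the point.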

COMMENTS • 45

  • @J_McPhearsom
    @J_McPhearsom 1 year ago

    I've returned to school to learn data science, Python, and coding through a Master's in Industrial Engineering. I was given an awesome research opportunity to develop novel anomaly detection methods for an exclusive industry dataset of critical equipment (multivariate time series of billion-dollar gas turbine-compressor trains for LNG liquefaction). My background is Mechanical Engineering, so I can grasp that stuff, but I've been so overwhelmed trying to learn Python for the first time on such a complicated project, without an 'expert' in my department to ask for help or hints, or to identify my mistakes. Thank goodness for your videos! They've triggered many "ah ha!" or "oh, that's how you do it!" moments that have helped me so much to understand, and importantly to carry out, the analysis I wanted for my thesis project!
    Many thanks and keep it up!

  • @m22shadowalker3
    @m22shadowalker3 1 year ago

    A really inspiring tutorial. Thank you for your work (y)

  • @srivatsa1193
    @srivatsa1193 1 year ago

    Your channel is amazing. It would be awesome if you can make vids on the latest tech like LLMs, Langchain, and how to put LLMs in production.

  • @chigozieanyasor4262
    @chigozieanyasor4262 2 years ago

    Thanks for sharing.
    Would it be useful to add the driver ID as one of the features for the Isolation Forest, assuming driverID was part of the dataset?
    For example: data = df_final[['id','value','hour','day']]
    I ask because I want to know how IsolationForest can determine the pattern for each driver, as opposed to the generic pattern for all drivers.
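
One possible way to get a per-driver pattern, sketched here as an alternative to passing the raw ID as a numeric feature, is to fit a separate model for each driver. This is not from the video: the data below is synthetic, and only the column names `id`, `value`, `hour`, `day` and the frame name `df_final` come from the question above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)

# Hypothetical data: three drivers, each with a different typical value level.
frames = []
for driver_id, level in [(1, 20.0), (2, 50.0), (3, 80.0)]:
    n = 300
    frames.append(pd.DataFrame({
        "id": driver_id,
        "value": level + rng.normal(0, 3, size=n),
        "hour": rng.integers(0, 24, size=n),
        "day": rng.integers(0, 7, size=n),
    }))
df_final = pd.concat(frames, ignore_index=True)

features = ["value", "hour", "day"]
df_final["outlier"] = 0

# Fit one forest per driver so each driver is compared only with its own history.
for driver_id, group in df_final.groupby("id"):
    model = IsolationForest(contamination=0.01, random_state=0)
    df_final.loc[group.index, "outlier"] = model.fit_predict(group[features])

# Number of flagged rows per driver (-1 = anomaly).
print(df_final[df_final["outlier"] == -1].groupby("id").size())
```

A per-driver model needs enough rows per driver to be meaningful, so this only fits cases where each ID has a reasonable amount of history.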

  • @vivekcapirala5288
    @vivekcapirala5288 2 years ago

    Very well explained.....Thank You :)

  • @benjamintalisman8614
    @benjamintalisman8614 2 years ago

    This video is so helpful!

  • @aashi9781
    @aashi9781 3 years ago +2

    Thanks for the detailed video. I have been implementing the same on my data set. Do you think treating the time features as cyclic features is better than treating them as numeric features? That is, ensuring that hours 0 and 23, or months 1 and 12, are closer to each other than their numeric values suggest.

    • @AIEngineeringLife
      @AIEngineeringLife 3 years ago

      Aashi.. yes, and you can do bucketing as well. Say hours can be bucketed into early morning, morning, afternoon, evening and night. Similarly for the week. You can also use any other technique.
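
The two options discussed in this exchange can be sketched as follows. This is a minimal illustration, not code from the video: the sine/cosine encoding is one common way to make a feature cyclic, and the bucket edges are arbitrary assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"hour": np.arange(24)})

# Cyclic encoding: map the hour onto a circle so that 23 and 0 end up adjacent.
df["hour_sin"] = np.sin(2 * np.pi * df["hour"] / 24)
df["hour_cos"] = np.cos(2 * np.pi * df["hour"] / 24)

# Bucketing: the bin edges below are illustrative assumptions.
bins = [0, 5, 12, 17, 21, 24]
labels = ["early_morning", "morning", "afternoon", "evening", "night"]
df["hour_bucket"] = pd.cut(df["hour"], bins=bins, labels=labels, right=False)


def encoded_distance(h1, h2):
    """Euclidean distance between two hours in the (sin, cos) encoding."""
    cols = ["hour_sin", "hour_cos"]
    a = df.loc[h1, cols].to_numpy(dtype=float)
    b = df.loc[h2, cols].to_numpy(dtype=float)
    return float(np.linalg.norm(a - b))


print(encoded_distance(0, 23), encoded_distance(0, 1), encoded_distance(0, 12))
print(df[["hour", "hour_bucket"]].head(8))
```

In the encoded space, hour 23 is as close to hour 0 as hour 1 is, which the plain numeric hour cannot express.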

  • @jongcheulkim7284
    @jongcheulkim7284 2 years ago

    Thank you so much. This is very helpful.

  • @mathavraj9378
    @mathavraj9378 2 years ago

    In time series data, suppose there are idle periods that last a long time, i.e. the series stays constant at a particular value for a considerable amount of time. I can't remove the duplicate values, because doing so essentially creates a new time series. Algorithms like Isolation Forest study the feature space, and generally we don't include time as a feature. So I am confused about whether it's OK to give the entire dataset, with its many duplicate values, to the Isolation Forest.

  • @ArjunSharma-vy5fv
    @ArjunSharma-vy5fv 2 years ago

    What if I have a per-second time series and I don't want to convert it to minute or hourly data? What can I do?

  • @emmang2010
    @emmang2010 1 year ago +1

    Hello, great video, and it was very helpful. Could you explain why the hour column shouldn't be converted to a categorical variable? Won't the model treat the hour of day as a numeric feature?

  • @singlesam41
    @singlesam41 3 years ago +2

    Really loved it ! I never used Isolation Forest :)

  • @pranavgarg3301
    @pranavgarg3301 2 years ago +1

    Hello
    Thanks for the detailed video.
    Why didn't we do train/test split?
    Doesn't that lead to data leakage here?

    • @letslearnjava1753
      @letslearnjava1753 2 years ago

      Hello Pranav, a train/test split is needed when you want to detect outliers in a new dataset, and especially for classification or regression use cases. For clustering we mainly try to find patterns or similarities within a single chunk of data, so it's generally not required in unsupervised ML cases 😊 For an in-depth intuition of Isolation Forest, you can refer to this video: ua-cam.com/video/E0M73NTg3w0/v-deo.html
      Happy Learning 😊✌🏻

  • @priyabokaro
    @priyabokaro 2 years ago

    Hello Sir, I am running Isolation Forest on CPU metric data to find anomalies, first in past data and then in real time. When I run Isolation Forest on a big sample of data, say 2 days (12 samples/min), I get decent anomaly points. But when I run the same model, or even a new one, on a smaller dataset, say 2-4 hours (12 samples/min), I get a lot of NaN values in the last column (outlier) of the dataframe and no point marked as yes, which is what indicates whether a point is an anomaly. Changing a few parameters in the model does not help either. Any idea why we get NaN values, and how we can get anomaly points even for a smaller dataset?

  • @aashi9781
    @aashi9781 3 years ago +2

    Sir, I have a general question: do these anomaly detection models work on streaming data in the cloud as well? For example, can I train my Isolation Forest on training data in batch and deploy it to predict on incoming streams? I am new to streaming anomaly detection, so I do not understand how things work there.

    • @AIEngineeringLife
      @AIEngineeringLife 3 years ago

      You can deploy it into a streaming pipeline or behind a REST endpoint. The model is a sklearn model, so you can save it (for example with joblib or pickle). The saved model can then be deployed on a pipeline of your choice to run inference on individual instances in real time. I do not have streaming covered yet, but I have REST covered in my model deployment playlist.
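
The save-then-serve flow described in this reply can be sketched as below. It assumes joblib for persistence, which is a common way to persist scikit-learn estimators. The training data, the temporary file path and the `score_event` helper are hypothetical and only stand in for a real pipeline or REST handler.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Batch training on historical data (synthetic here).
X_train = rng.normal(0, 1, size=(1000, 2))
model = IsolationForest(contamination=0.01, random_state=0).fit(X_train)

# Persist the fitted model to disk.
model_path = os.path.join(tempfile.mkdtemp(), "isolation_forest.joblib")
joblib.dump(model, model_path)

# In the serving process (stream consumer or REST handler), load it once.
loaded = joblib.load(model_path)


def score_event(features):
    """Hypothetical helper: score one incoming record.

    Returns (label, score), where label is -1 for anomaly and 1 for normal,
    and a lower score means more anomalous.
    """
    row = np.asarray(features, dtype=float).reshape(1, -1)
    label = int(loaded.predict(row)[0])
    score = float(loaded.decision_function(row)[0])
    return label, score


print(score_event([0.0, 0.0]))
print(score_event([8.0, 8.0]))
```

A file saved this way is generally meant to be loaded with the same scikit-learn version that produced it.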

  • @praveentayal2127
    @praveentayal2127 2 years ago

    If the real-time data doesn't contain anomalies, would Isolation Forest still always list 1-5% of the data as anomalous? If yes, that wouldn't be good. Am I missing something here?
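
A small experiment can help with this question. In scikit-learn, the `contamination` parameter sets the decision threshold from the training data, so roughly that fraction of the training set is flagged. New data scored with the already fitted model is compared against that fixed threshold, so the flagged fraction there can be lower or higher. The data below is synthetic and the numbers are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Fit once on historical data; contamination fixes the score threshold.
X_train = rng.normal(0, 1, size=(2000, 1))
model = IsolationForest(contamination=0.05, random_state=0).fit(X_train)

# On the training data, about 5% of points fall below the threshold by construction.
train_rate = float(np.mean(model.predict(X_train) == -1))

# New data that stays close to the centre of the training distribution.
X_new = rng.normal(0, 0.1, size=(500, 1))
new_rate = float(np.mean(model.predict(X_new) == -1))

print(f"flagged in training data: {train_rate:.3f}")
print(f"flagged in new data:      {new_rate:.3f}")
```

Refitting on every new batch with `fit_predict` is different: the threshold is then recomputed each time, and about the `contamination` fraction of each batch is flagged regardless of content. Thresholding `score_samples` with a value of your own choosing is another option.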

  • @denismartinez5443
    @denismartinez5443 1 year ago

    Thank you!

  • @TheNishi42
    @TheNishi42 3 years ago

    Thank you sir.

  • @venkatramanr5172
    @venkatramanr5172 3 years ago

    Really awesome. Can we use this if we have other features, like temperature or hourly weather?

    • @AIEngineeringLife
      @AIEngineeringLife 3 years ago

      Yes. You can add as many features as you like and engineer them to capture time dependencies.

    • @venkatramanr5172
      @venkatramanr5172 3 years ago

      @@AIEngineeringLife thanks a lot again !

  • @praveenpl9263
    @praveenpl9263 2 years ago +1

    What is the point of taking the mean while resampling? Couldn't we use the sum instead, which makes more sense than the mean?
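
The difference between the two aggregations is easy to see on a toy series. This is a sketch with made-up readings, not data from the video: the first hour has four readings and the second has only two, as would happen with missing samples.

```python
import pandas as pd

# Six readings of the same level; the second hour is missing two samples.
ts = pd.Series(
    [10.0, 10.0, 10.0, 10.0, 10.0, 10.0],
    index=pd.to_datetime([
        "2021-03-08 00:00", "2021-03-08 00:15",
        "2021-03-08 00:30", "2021-03-08 00:45",
        "2021-03-08 01:00", "2021-03-08 01:30",
    ]),
)

hourly_mean = ts.resample("60min").mean()  # 10.0 and 10.0
hourly_sum = ts.resample("60min").sum()    # 40.0 and 20.0

print(pd.DataFrame({"mean": hourly_mean, "sum": hourly_sum}))
```

The sum changes with the number of samples in each bucket even though the level is constant, while the mean does not. Which one fits depends on the quantity: a level or rate usually suits the mean, a count or amount usually suits the sum.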

  • @nirbhaymishra2749
    @nirbhaymishra2749 3 years ago

    sir kindly bring a complete series of Anomaly projects using spark and REACT with web development.

    • @AIEngineeringLife
      @AIEngineeringLife 3 years ago

      Nirbhay.. I am not good at web development but will see if I can do it on streamlit

  • @deepakbehera498
    @deepakbehera498 3 years ago

    Hi Sir, are there any freelance projects in data engineering/MLops in your channel. Can i Connect with you over linkedin or facebook?

  • @kirtipandya4618
    @kirtipandya4618 3 years ago +1

    Where can we find this notebook?

    • @AIEngineeringLife
      @AIEngineeringLife 3 years ago +2

      Here you go - github.com/srivatsan88/End-to-End-Time-Series/blob/master/Anomaly_Detection_using_Isolation_Forest_Feature_Engineering.ipynb

  • @nirbhaymishra2749
    @nirbhaymishra2749 3 years ago

    basically React and node please

  • @indrakumari1854
    @indrakumari1854 2 years ago

    Sir, how can we contact you?

  • @ArjunSharma-vy5fv
    @ArjunSharma-vy5fv 2 years ago

    Lovely

  • @karthik323
    @karthik323 3 years ago

    can you please upload the CI CD live stream

    • @AIEngineeringLife
      @AIEngineeringLife 3 years ago

      Sorry karthik.. you mean anomaly detection on live stream?

  • @hunterxhunter9493
    @hunterxhunter9493 10 months ago

    Hello, thanks for this awesome tutorial. Can you please share the notebook file? It would be a great help for many people. Thanks!

    • @AIEngineeringLife
      @AIEngineeringLife 10 months ago

      All of them are in my Git repo:
      github.com/srivatsan88

  • @belxismarquez4447
    @belxismarquez4447 3 years ago +2

    This data is unbalanced. Why didn't you use SMOTE or another technique to treat it?

    • @letslearnjava1753
      @letslearnjava1753 2 years ago

      Hello Belxis, an outlier is itself a rare event, right? So whatever algorithm we use to detect outliers has to work on the principle of detecting rare events. If you balance the data, outlier detection algorithms will no longer work, because the outliers are no longer rare events 😊
      For details, you can check this video, which explains in depth how Isolation Forest detects rare events or data points:
      ua-cam.com/video/E0M73NTg3w0/v-deo.html
      Happy Learning 😊✌🏻

  • @glowiever
    @glowiever 2 years ago

    accent as thick as curry. unsubbed