Anomaly detection using iforest

  • Published 21 Jul 2024
  • Anomaly detection is attracting growing interest across industries. In health care, anomaly detection algorithms can point to patients' health issues; in finance, they can flag fraud. The Isolation Forest algorithm was first introduced in 2008 and has gained a lot of interest since then.
    git link:
    github.com/mesmalif/Practical...
    🔴 Subscribe for more ML projects: ua-cam.com/users/AIwithDrMo?...
    💻 Store sales anomaly detection • How to find anomalies ...
    💻 Anomaly detection with KNN • Anomaly detection with...

COMMENTS • 48

  • @pedromoro561
    @pedromoro561 3 years ago +4

    It is hard to find such good explanations on Isolation Forest. Keep up the good work!

  • @tareqal-masri1782
    @tareqal-masri1782 2 years ago +3

    Hi Dr. Esmalifalak, I'm a huge fan of all your videos; they've helped me get through university and start a career. Can you please upload more videos? What data visualization tool do you use?

  • @rubenr.2470
    @rubenr.2470 3 years ago +1

    Thanks for this video! It's not easy to find high-quality content like this! Keep it up!

  • @wakilkhan8875
    @wakilkhan8875 3 years ago +3

    Please make another video on anomaly detection with One-Class SVM for novelty detection.

  • @saravanannatarajan6515
    @saravanannatarajan6515 3 years ago +3

    Thanks for the great tutorial. I can easily pick it as the best tutorial on this topic. Much appreciated.
    Please continue providing more videos.

    • @AIwithDrMo
      @AIwithDrMo 3 years ago +2

      Thanks Saravanan. I am glad that it helped you. Please post any topic that seems interesting to you here and I will consider it for the next video.

    • @prafulh5252
      @prafulh5252 2 years ago +1

      @AIwithDrMo Please cover other anomaly detection algorithms in a similar way.

  • @zaynyao3863
    @zaynyao3863 3 years ago +2

    You solved a big problem for me, thank you.

    • @AIwithDrMo
      @AIwithDrMo 3 years ago

      I am glad that helped you.

  • @soumikbasu1556
    @soumikbasu1556 3 years ago +2

    A very well-structured yet simple explanation. Can we also have a look at measuring the efficacy of the model?

    • @AIwithDrMo
      @AIwithDrMo 1 year ago

      Thanks for the comment. Isolation Forest is an effective anomaly detection method that can handle high-dimensional data and has several advantages over other methods. Its efficacy depends on the specific characteristics of the data and on the hyperparameters used. For example, the performance of the algorithm can be affected by the choice of subsampling ratio, the number of trees in the forest, and the number of features drawn for each tree. (Isolation Forest splits on randomly chosen features and thresholds, so no distance metric is involved.)
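To make the hyperparameters mentioned in this reply concrete, here is a minimal sketch using scikit-learn's `IsolationForest`. The synthetic data, the specific parameter values, and the variable names are assumptions chosen for illustration; they are not taken from the video or its repository.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (assumption): a dense normal cluster plus a few distant points.
rng = np.random.RandomState(0)
X_normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
X_outliers = rng.uniform(low=6.0, high=8.0, size=(10, 2))
X = np.vstack([X_normal, X_outliers])

model = IsolationForest(
    n_estimators=200,    # number of trees in the forest
    max_samples=256,     # subsample size drawn for each tree
    max_features=1.0,    # fraction of features drawn for each tree
    contamination=0.02,  # expected share of anomalies (illustrative value)
    random_state=0,      # fixed seed so the run is repeatable
)

labels = model.fit_predict(X)    # +1 = normal, -1 = anomaly
scores = model.score_samples(X)  # lower score = more anomalous
n_flagged = int((labels == -1).sum())
print(f"flagged {n_flagged} of {len(X)} observations")
```

Changing `n_estimators`, `max_samples`, or `max_features` and comparing the resulting scores is one simple way to see how sensitive a given dataset is to each setting.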

  • @satishvavilapalli24
    @satishvavilapalli24 2 years ago +1

    Just amazing

  • @neginpirannanekaran1236
    @neginpirannanekaran1236 4 years ago

    Great explanation. Thanks

  • @MrSanghan1990
    @MrSanghan1990 3 years ago

    Thx, I will apply it~~

  • @VladimirOlteanu
    @VladimirOlteanu 3 years ago +1

    Hello! Just a question. Is this algorithm a classic isolation forest or an extended isolation forest (I saw you named the object with the predictions eif)? Is there any way to implement an extended isolation forest? Basically, the difference between EIF and IF is that EIF picks a random intercept and slope and splits along that line instead of along a single axis. Thank you for the video!

    • @AIwithDrMo
      @AIwithDrMo 3 years ago

      Hi Vladimir,
      This is the classic isolation forest and, as you mentioned, EIF can be used in a similar way.
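The difference described in this exchange can be sketched with a single split of each kind. This is a simplified NumPy illustration, not the video's code and not a full forest; the function names `axis_parallel_split` and `hyperplane_split` are hypothetical.

```python
import numpy as np

def axis_parallel_split(X, rng):
    """Classic isolation forest split: one random feature, one random threshold."""
    feature = rng.randint(X.shape[1])
    threshold = rng.uniform(X[:, feature].min(), X[:, feature].max())
    return X[:, feature] < threshold  # True = goes to the left child

def hyperplane_split(X, rng):
    """Extended isolation forest split: random slope (normal vector) and intercept."""
    normal = rng.normal(size=X.shape[1])                    # random direction
    intercept = rng.uniform(X.min(axis=0), X.max(axis=0))   # random point in the data range
    return (X - intercept) @ normal <= 0  # True = goes to the left child

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))

left_classic = axis_parallel_split(X, rng)
left_extended = hyperplane_split(X, rng)
print(left_classic.sum(), left_extended.sum())
```

In a real tree, either split would be applied recursively until each point is isolated or a depth limit is reached; only the branching rule differs between the two variants.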

  • @uvs8136
    @uvs8136 3 years ago +1

    Thank you for the easy-to-understand tutorial. What if we don't know the contamination, and finding it is the goal? How do we start: by using "auto"? How do you find the true outliers? It's like k-means, where we have to specify the number of clusters up front; what if the number of clusters is what we want to find out?

    • @AIwithDrMo
      @AIwithDrMo 3 years ago

      Hi Urmil,
      Happy that it helped. For the contamination, we usually start with a small percentage and look at the results. This can be done through plotting (use PCA to plot data with more than three dimensions) or by printing individual anomalous observations and inspecting them. If the model is not sensitive enough and misses anomalies, we increase the contamination percentage. Remember that this is unsupervised: you are not providing anomaly labels before training. You are only checking the results on a small portion of the data for which you know the labels (say, because you are a subject matter expert).
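The trial-and-error procedure described in this reply might look roughly like the sketch below. The synthetic data and the candidate contamination values are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (assumption): 300 normal rows plus 6 distant rows at indices 300-305.
rng = np.random.RandomState(42)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(300, 3)),
    rng.uniform(5.0, 7.0, size=(6, 3)),
])

# Start small and increase contamination, checking how many rows get flagged.
flag_counts = {}
for contamination in (0.01, 0.02, 0.05):
    model = IsolationForest(contamination=contamination, random_state=42)
    labels = model.fit_predict(X)
    flag_counts[contamination] = int((labels == -1).sum())
print(flag_counts)

# Independently of any threshold, rank rows by score and inspect the
# most anomalous ones by hand.
scores = IsolationForest(random_state=42).fit(X).score_samples(X)
top10 = np.argsort(scores)[:10]  # indices of the 10 lowest (most anomalous) scores
print(X[top10])
```

Inspecting the ranked rows first and choosing the contamination afterwards keeps the threshold decision separate from the scoring itself.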

  • @rezamonadi4282
    @rezamonadi4282 3 years ago +1

    Great explanation...

    • @AIwithDrMo
      @AIwithDrMo 3 years ago

      Thanks Reza.
      I'm glad you liked it.

  • @alhanoufalsuwailem3992
    @alhanoufalsuwailem3992 3 years ago

    Thanks for the clarification!
    After applying iforest, how can I evaluate the clustering result? Do you have a specific method for evaluating this type of unsupervised learning?
    I'd really appreciate that.

    • @AIwithDrMo
      @AIwithDrMo 2 years ago

      I usually prefer to have a small labeled dataset (from client etc.) and validate my results with those labels.
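A rough sketch of this kind of validation is shown below, assuming a small labeled subset is available. The data, the labels, and the choice of precision and recall as metrics are assumptions for illustration; the reply does not name specific metrics.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import precision_score, recall_score

# Synthetic data (assumption): 400 normal rows, 8 anomalous rows.
rng = np.random.RandomState(1)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(400, 2)),
    rng.uniform(6.0, 8.0, size=(8, 2)),
])
y_true_all = np.r_[np.zeros(400, dtype=int), np.ones(8, dtype=int)]  # 1 = anomaly

# Hypothetical labeled subset, e.g. rows reviewed by a client or domain expert.
labeled_idx = np.r_[rng.choice(400, size=40, replace=False), np.arange(400, 408)]

# Train without labels on all the data, then compare predictions to the
# labels only on the small labeled subset.
model = IsolationForest(contamination=0.02, random_state=1).fit(X)
y_pred = (model.predict(X[labeled_idx]) == -1).astype(int)
y_true = y_true_all[labeled_idx]

precision = precision_score(y_true, y_pred, zero_division=0)
recall = recall_score(y_true, y_pred, zero_division=0)
print(f"precision={precision:.2f} recall={recall:.2f}")
```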

  • @hamzasmidi3445
    @hamzasmidi3445 3 years ago +1

    Thank you Mohammad

    • @AIwithDrMo
      @AIwithDrMo 3 years ago +1

      I am glad that you liked it.

  • @aashi9781
    @aashi9781 3 years ago +1

    Hello Dr. Mohammad,
    Is the algorithm effective with real-time streaming data? I have data from more than 100 sensors. Should I find the important variables before feeding the data into the model, or should I pass all the variables and let the algorithm decide by itself? Multicollinearity exists in the data.

    • @AIwithDrMo
      @AIwithDrMo 3 years ago +2

      Hi Aradhna,
      Isolation forest is one of the faster anomaly detection algorithms, and people use it with large datasets such as financial datasets. For sensor data, you don't have to process very high-frequency data. You may need to find the right sampling rate (for example, temperature usually does not change faster than every 10-20 seconds, so sampling every second is not necessary). If your window is 1 minute, you should not see a noticeable problem in a regular application. I usually start with all of the data and then drop or reduce variables if I have to.

  • @tiger06t
    @tiger06t 3 years ago +2

    Hi! Thanks for the great tutorial. I have a question: is it possible for isolation forest to output different results? I have used isolation forest on my dataset, but the results are a bit different from the previous ones every time (I haven't changed any parameter in the model, and the dataset is the same).

    • @AIwithDrMo
      @AIwithDrMo 3 years ago +2

      Thanks Johnson. Isolation forest splits the dataset randomly, so there is no guarantee of getting exactly the same results each time. However, if you run it enough times and average the results, it should converge to one solution (with reasonable datasets, of course).

    • @tiger06t
      @tiger06t 3 years ago

      @AIwithDrMo Thank you, Dr. Mohammad!
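Two practical ways to handle the run-to-run variation discussed in this thread are sketched below, assuming scikit-learn's `IsolationForest`: fixing `random_state` to make a run repeatable, and averaging scores over several seeds as the reply suggests. The data and the number of seeds are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(7)
X = rng.normal(size=(300, 2))  # synthetic data (assumption)

# 1) A fixed seed gives the same forest, and so the same scores, on every run.
scores_a = IsolationForest(random_state=0).fit(X).score_samples(X)
scores_b = IsolationForest(random_state=0).fit(X).score_samples(X)
same_seed_identical = bool(np.allclose(scores_a, scores_b))
print("identical with fixed seed:", same_seed_identical)

# 2) Averaging scores over several differently seeded forests smooths out
#    the randomness of any single run.
all_scores = np.array([
    IsolationForest(random_state=seed).fit(X).score_samples(X)
    for seed in range(10)
])
mean_scores = all_scores.mean(axis=0)
print(mean_scores[:5])
```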

  • @joshuasuasnabar6058
    @joshuasuasnabar6058 1 year ago

    Thank you, professor. Just a question: is it possible to deal with categorical variables? Does the type of encoding (one-hot or label encoding) matter? Thank you in advance.

    • @AIwithDrMo
      @AIwithDrMo 1 year ago +1

      Joshua,
      Thanks for your comment. Yes, it is possible! You can use Extended Isolation Forest (EIF). Please take a look at this page for more info and a Python example:
      capable-timimus-00a.notion.site/Isolation-Forest-in-Categorical-Values-b5534c14548b4ba881199477939044c2

  • @shahrzadamini5746
    @shahrzadamini5746 3 years ago

    Hi, good job. I have a question: how can we resample by year?

    • @AIwithDrMo
      @AIwithDrMo 3 years ago +1

      I usually use 12-month resampling, like "resample('12M')".
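As a sketch of yearly aggregation in pandas: the reply uses `resample('12M')`, but frequency alias spellings have changed between pandas versions, so the example below swaps in a plain group-by on the calendar year of the index, which behaves the same across versions. The daily sales series is a made-up example.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series covering three calendar years (one unit sold per day).
index = pd.date_range("2020-01-01", "2022-12-31")
sales = pd.Series(np.ones(len(index)), index=index)

# Aggregate by calendar year using the year of the DatetimeIndex.
yearly_total = sales.groupby(sales.index.year).sum()
print(yearly_total)
```

Note that grouping on `index.year` produces calendar-year bins, whereas a 12-month resample produces 12-month windows anchored on month ends starting from the first bin of the data, so the two can differ when the data does not start in January.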

  • @shahrzadamini140
    @shahrzadamini140 3 years ago +1

    Hi, thanks, I found it really helpful. I have a question about the contamination parameter: how can we choose a suitable value for it?

    • @AIwithDrMo
      @AIwithDrMo 3 years ago +3

      Glad you liked it.
      Contamination should be tested for your application. You can start with a small number (like 2%) and look at the results. If the algorithm flags things that look normal to you, decrease the contamination; otherwise, keep increasing it.
      You will find something reasonable for the dataset you are working with.

    • @shahrzadamini140
      @shahrzadamini140 3 years ago +1

      @AIwithDrMo Thanks a lot for your explanation.

  • @tenten7379
    @tenten7379 1 year ago

    I have a question: this is an unsupervised model, right? Is there a way to make the model predict on a user input?

    • @AIwithDrMo
      @AIwithDrMo 1 year ago

      This is an unsupervised anomaly detection method. It can be applied to user input data to detect anomalies or unusual patterns in user behavior over time. The basic idea is to use the algorithm to learn the normal patterns of user behavior from historical data, and then to use the model to identify any deviations from these patterns.
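A minimal sketch of this fit-then-score workflow with scikit-learn is shown below. The "historical" data and the two new observations are made up for illustration, and `new_inputs` is a hypothetical name.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical historical data representing normal behavior.
rng = np.random.RandomState(0)
history = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# Learn the normal patterns from history (no labels involved).
model = IsolationForest(random_state=0).fit(history)

# Score new, previously unseen inputs: one near the bulk of the data,
# one far away from it.
new_inputs = np.array([[0.1, -0.2], [9.0, 9.0]])
predictions = model.predict(new_inputs)          # +1 = normal, -1 = anomaly
decision = model.decision_function(new_inputs)   # lower = more anomalous
print(predictions, decision)
```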

  • @gamesandroidpc2146
    @gamesandroidpc2146 4 years ago

    Hello Doctor, I have a question: how do I model anomaly detection for time series in tkinter?

    • @AIwithDrMo
      @AIwithDrMo 4 years ago +1

      Hey,
      It is indeed on my to-do list, and I will hopefully create one in October. Stay tuned!

    • @gamesandroidpc2146
      @gamesandroidpc2146 4 years ago

      @AIwithDrMo
      Thank you, Doctor,
      but I need it these days.
      Can I contact you on LinkedIn or by email?

  • @alwaaffa
    @alwaaffa 2 years ago

    Can you help me with the software (coding) part of my master's thesis in Python?

    • @AIwithDrMo
      @AIwithDrMo 2 years ago

      Please fill out the following form for any specific questions,
      forms.gle/Jz4pkrNSGUqGhPug9

    • @alwaaffa
      @alwaaffa 1 year ago

      @AIwithDrMo Can I connect with you by email?

  • @som856
    @som856 2 years ago +1

    Can you please provide the code?

    • @AIwithDrMo
      @AIwithDrMo 2 years ago

      github.com/mesmalif/Practical_Machine_learning/tree/develop_practical_ML