Shapley Additive Explanations (SHAP)

  • Published Sep 23, 2024

COMMENTS • 82

  • @КириллКлимушин · 9 months ago +9

    This is literally the best explanation of Shapley values I've found on YouTube, and probably on the entire internet. The voice, the visualizations - everything is top level.

  • @chinameng4636 · 2 years ago +18

    Really brilliant work! I've seen so many videos, but none of them talk about the background data! Your video goes deep enough into this question in such a short time! THANKS A LOT!

  • @cornevanzyl5880 · 3 months ago

    As my PhD involves understanding how SHAP works for model explainability, this video is by far the most accurate and in-depth explanation of what it is and how it works. You demonstrate a very good grasp of the topic😊

    • @imwithu2532 · 1 month ago

      Can I contact you? I'm also a PhD student working on SHAP. Can we discuss?

  • @ssethia86 · 3 years ago +6

    concise, clean, and clear. Nicely delivered!! Bravo!!

  • @GleipnirHoldsFenrir · 3 years ago +2

    Best video on that topic I have seen so far. Thanks for your work.

  • @mohammadsharara3170 · 1 year ago +2

    Very clear explanation! I've watched several videos, and so far this is the best. Thank you

  • @NilayBadavne · 3 years ago +4

    Thank you so much for this video. Really well articulated.
    You start from the basics - which is what many are missing from their blogs/videos.

  • @dianegenereux1264 · 2 years ago +1

    I really appreciated the clear conceptual explanation at the very start of the video. Thank you!

  • @mehulsingh3497 · 3 years ago +4

    Thanks for sharing! It's the best explanation of SHAP. You are an absolute rockstar \m/

  • @zahrabounik3390 · 1 year ago

    This is a fantastic explanation for SHAP. Thank you so much for sharing your knowledge.

  • @hyunkang2090 · 1 year ago

    Thank you. It was the best presentation on SHAP

  • @SanderJanssenBSc · 11 months ago

    Such an excellent video, very high value and useful! Thanks for taking the time out of your life to produce such value for us!

  • @Sam-vi8iw · 2 years ago +1

    Awesome video! Love that.

  • @fatemekakavandi · 1 year ago

    Really nice explanation. I thought understanding this concept would be difficult, but it's actually really easy with a good explanation.

  • @yedmitry · 1 year ago

    Great explanation. Thank you very much!

  • @marcelbritsch6233 · 4 months ago

    brilliant. Thank you!!!!!

  • @joshuadarville1915 · 2 years ago +3

    Love the video. However, I am a bit confused about how the total number of coalitions is calculated. The samples at 1:58 show 15 coalitions for 4 features, but at 5:55 you state we need to sample 64 coalitions for 4 features. I think the discrepancy comes from calculating coalitions using combinations initially vs. permutations later on. Thanks again for the video!

    • @robgeada6618 · 2 years ago +2

      Yes, you are exactly right, it's an error in the video: 4 features should be 2^4=16 coalitions.

    • @astaragmohapatra9 · 2 years ago

      @Rob Geada, how is it 2 to the power of the number of features? For 4 features we can have 3, 2, or 1 features in a combination. For each it is 3CN, so it should be around 7 (3C1+3C2+3C3). So the total is 28 for four features. Am I right?

    • @AJMJstudios · 1 year ago

      @@astaragmohapatra9 It's 4C0 + 4C1 + 4C2 + 4C3 + 4C4 = 16
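
      As a quick check, this count is easy to reproduce in a couple of lines of Python (the feature names below are purely illustrative):

          from itertools import combinations

          features = ["crime", "rooms", "age", "lstat"]  # hypothetical names

          # Enumerate every coalition (subset) of the features, empty set included
          coalitions = [c for k in range(len(features) + 1)
                        for c in combinations(features, k)]
          print(len(coalitions))  # 16, i.e. 2^4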

  • @Jorvanius · 2 years ago

    Thank you very much for the awesome explanation 👍

  • @avddva1367 · 1 year ago +4

    Hello, I really appreciate the video! I have one question: how is the number of coalitions calculated? I thought it would be 2^(number of features)

    • @kruan2661 · 1 year ago +1

      It depends on whether the order of the features matters. If not, then 2^k. If yes, then sum up all the permutations.

    • @AJMJstudios · 1 year ago +1

      @@kruan2661 Still not getting 64 coalitions for 4 features even if order matters.

    • @andyrobertshaw9120 · 1 year ago +1

      @@AJMJstudios you do.
      If all 4 are there, we have 4! = 24.
      If 3 are included, then we have 4x3x2 = 24.
      If 2 are included, we have 4x3 = 12.
      If just 1 is included, we have 4.
      24+24+12+4 = 64
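
      For what it's worth, both counts are easy to verify; the 64 only appears if ordered permutations are treated as distinct coalitions, whereas unordered subsets (which, per the creator's replies elsewhere in this thread, is the correct count) give 2^4 = 16. A minimal check in Python:

          from math import comb, perm

          n = 4
          subsets = sum(comb(n, k) for k in range(n + 1))       # unordered coalitions
          orderings = sum(perm(n, k) for k in range(1, n + 1))  # non-empty ordered sequences
          print(subsets, orderings)  # 16 64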

  • @captainmarshalliii3304 · 3 years ago +4

    Awesome video and explanation! Are you going to release your implementation? If so, where? Thanks.

  • @murilopalomosebilla2999 · 3 years ago +1

    Excellent work!

  • @giuliasantai4853 · 3 years ago +1

    This is just great!! Thanks a lot

  • @caiyu538 · 1 year ago

    great to revisit again.

  • @apah · 1 year ago +2

    Excellent video!
    I'm wondering, however: isn't the difference between SHAP delta and actual delta due to the possible interactions between the "lower status" feature and the others?
    If I'm understanding it correctly, your computation of "actual delta" is equivalent to a permutation importance, whereas SHAP takes the interactions into account by averaging the score over subsets "excluding" our feature of interest.

  • @jamalnuman · 7 months ago

    really great

  • @joshinkumar · 2 years ago

    Nicely explained.

  • @rusmannlalana8702 · 2 years ago +2

    "TENTARA ITU HARUS HITAM"
    This video :

  • @1黄-m5m · 2 years ago

    So clear! Thanks.

  • @arunshankar4845 · 9 months ago

    How exactly do 4 features require sampling 64 coalitions?

  • @caiyu538 · 1 year ago

    I studied it again. SHAP is a brute-force search over features that considers all kinds of feature combinations, with a trick to reduce that complexity. Is my understanding correct? How does it reduce the computational complexity?

  • @ea2187 · 2 years ago +2

    Thanks for sharing.
    I'm currently developing a multi-class classifier (via XGB classification) and would like to know whether SHAP can be used for multi-class classification problems. During my research I could only find that SHAP can be used for classification problems which output probabilities (my model outputs three classes). Can anyone help?

    • @robgeada6618 · 2 years ago

      I answered this question in a private message, but I'll post the answer here as well:
      Yes, because the XGBClassifier does indeed output probabilities (or more specifically, margins); they're just hidden by default. You can use these margins and probabilities to compute SHAP values, which will then indicate how much each feature contributed to the margins or probabilities.
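
      To make that concrete, here is a minimal sketch of multi-class SHAP with XGBoost; the dataset and parameters are illustrative stand-ins, not the asker's model:

          import shap
          import xgboost as xgb
          from sklearn.datasets import load_iris

          X, y = load_iris(return_X_y=True)  # a 3-class problem
          model = xgb.XGBClassifier(n_estimators=50).fit(X, y)

          explainer = shap.TreeExplainer(model)
          shap_values = explainer.shap_values(X)

          # Depending on the shap version, shap_values is either a list with one
          # (n_samples, n_features) array per class, or a single array of shape
          # (n_samples, n_features, n_classes); either way, each class gets its
          # own per-feature contributions to that class's margin.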

  • @juanete69 · 1 year ago +2

    Hello.
    If we apply SHAP to a linear regression model, are those Phi_i equivalent to the coefficients of the regression model? Do they also take the variance into account, as the p-values do?
    How is the SHAP value for a variable different from the partial R^2?

  • @ВадимШатов-з2й · 3 years ago +2

    amazing

  • @gustavhartz6153 · 1 year ago +1

    When you pass the data point back through the model at 10:35, which value do you replace the last feature with? You say "values from the background dataset", but it can't just be a random value. Is it the average?

  • @tashtanudji4756 · 2 years ago

    Really helpful thanks!

  • @JK-co3du · 2 years ago

    Thank you very much for this informative video. Could you explain why we use the train set as background but the test set to calculate the SHAP values?

    • @robgeada6618 · 2 years ago +1

      Hi JK; the background simply needs to be taken from a pool of "representative" values that the model expects; in this case, a subset of the data that was used to train the model makes a lot of sense for that. Meanwhile, computing SHAP values for a particular point is simply done to explain how the model behaves given this particular input; there is no requirement that this input be anything similar to what the model has seen before. Basically, the background set needs to come from "representative" data, but we can then compute SHAP values for any arbitrary point. In this case, we pick a point from the test set, as in real-world XAI use cases you are explaining novel points that do not necessarily have corresponding ground-truth values, i.e., the same reason that we use train/test splits when evaluating models.
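
      A minimal sketch of that pattern, with an illustrative dataset and model standing in for the video's own:

          import shap
          from sklearn.datasets import fetch_california_housing
          from sklearn.ensemble import RandomForestRegressor
          from sklearn.model_selection import train_test_split

          X, y = fetch_california_housing(return_X_y=True)
          x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)
          model = RandomForestRegressor(n_estimators=50).fit(x_train, y_train)

          background = shap.sample(x_train, 100)          # "representative" pool from the training data
          explainer = shap.KernelExplainer(model.predict, background)
          shap_values = explainer.shap_values(x_test[0])  # explain a novel test point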

  • @arunmohan1211 · 3 years ago +1

    Nice. Best one.

  • @xaviergonzalez5465 · 2 years ago +1

    What does it mean for the original input x and the simplified x' to be approximately equal? Isn't x' a binary vector of features, whereas x presumably lives in Euclidean space?

    • @robgeada6618 · 2 years ago +1

      Yeah, you're exactly right that x' is binary and x is Euclidean. In the video I'm making a bit of a simplification; in real usage the simplified x' has a translation function h that converts the binary vector back to the original datapoint x, i.e., h(x') = x. The full definition of local accuracy states that g(x') = f(x) if h(x') = x.
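
      Written out in the notation of the SHAP paper, the local accuracy property being described is

          g(x') = \phi_0 + \sum_{i=1}^{M} \phi_i x'_i = f(x) \quad \text{whenever } x = h_x(x')

      where M is the number of features and h_x maps the binary coalition vector x' back to the original input x.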

  • @blueprint5300 · 3 months ago

    In response to the discrepancy that you call a mistake between "SHAP delta" and "Actual delta" at 10:57: those two values are not meant to be the same. Shapley values are the average contribution of a feature across all subsets of features, and "Actual delta" would be only one of the terms in this average. The Shapley value of feature X DOES NOT represent the difference in output that you would get when removing feature X from the model.
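
    For reference, the classic Shapley value formula this comment appeals to, where F is the full feature set and the bracketed term is the marginal contribution of feature i to coalition S:

        \phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left[ f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S) \right]

    The "Actual delta" corresponds to just the S = F \ {i} term of this weighted average, which is why the two numbers diverge whenever features interact.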

  • @juanete69 · 1 year ago

    What are the advantages of SHAP vs LIME (Local Interpretable Model Agnostic Explanation) and ALE (Accumulated Local Effects)?

  • @kjlmomjihnugbzvftcrdes · 1 year ago

    Nice one.

  • @DrJalal90 · 2 years ago

    Great video indeed!

  • @sehaconsulting · 2 years ago +1

    Hi,
    In the video you said, when discussing coalitions, that a model with 4 features must calculate 64 coalitions, but for 32 features it is 16 billion or so. Can you explain the math behind it? In your example you had 4 features, exemplified by the four dots, and it only amounted to 16 coalitions, didn't it?

    • @robgeada6618 · 2 years ago +2

      Hi, you're exactly right; as I've said elsewhere in the comments, it's a mistake in the video. 4 features indeed have 16 possible coalitions; it's always 2^(number of features).

    • @sehaconsulting · 2 years ago

      @@robgeada6618 Thank you!

    • @KountayDwivedi · 2 years ago

      @@robgeada6618 Thanks. I came to the comment section just for clarification. Btw, great video!! 😎
      :-}

  • @cleverclover7 · 2 years ago

    Great video! I have many questions on this subject, but here's one(ish):
    It strikes me that the background sample is not irrelevant, and you must assume it is sufficiently random, i.i.d. There is at least one case - the case where the background sample is the data point being tested - where this is certainly not true. So my question is: if you were to run the experiment again for every possible data point instead of a single background chunk of size 100, and took the average of these, would you get perfect accuracy?

    • @robgeada6618 · 2 years ago

      Yeah, so the choice of background data is a really interesting question, one that I think about quite a bit! In terms of your idea, choosing every available training data point as your background does represent the distribution of your data well, but it gets pretty expensive: SHAP will need to run num_samples * background_size datapoints through the model. For a larger dataset like those seen in ML work, that could be hundreds of millions of model evaluations. One way to get around this is to use something like k-means clustering on your training data, with k set to something like 100. The center points of your clusters are then a great representation of the training data distribution, which means that when you use them for SHAP you end up with very similar results to using the entire training data as background. The advantage is that it's a lot cheaper, in that k ≈ 100 is usually much, much smaller than the full training dataset.
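
      The k-means summarization described above is supported directly by the shap package; a minimal sketch, with an illustrative dataset and model:

          import shap
          from sklearn.datasets import make_regression
          from sklearn.linear_model import LinearRegression

          x_train, y_train = make_regression(n_samples=5000, n_features=8, random_state=0)
          model = LinearRegression().fit(x_train, y_train)

          # Summarize the 5000 training rows into 100 cluster centers and use
          # those as the background: far fewer model evaluations per explanation.
          background = shap.kmeans(x_train, 100)
          explainer = shap.KernelExplainer(model.predict, background)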

  • @amelrahmoune-y7d · 1 year ago

    Can I have the PPT of this presentation, please?

  • @chinuverma5374 · 3 years ago +1

    Thanks for the wonderful session, sir. With the help of SHAP we can find top-feature graphs and correlated-feature graphs using PDP, but simple feature selection and ranking algorithms in machine learning can also give us the top features used in a model according to rank, and we can even plot graphs of correlated features like SHAP does. I am confused about what extra the explainable model is doing here to explain the predictions. Please clear my doubt; I am currently doing research in this area.

    • @robgeada6618 · 3 years ago +1

      So if I understand correctly, you're wondering what the explanatory model at 4:00 is doing? Essentially, the explanatory model g(x') is what SHAP builds to produce its explanation of your actual model f(x). By passing a lot of different permutations of features through the actual model f(x), the algorithm creates a huge number of samples of the inputs and outputs of your real model, from which it can build a linear explanation model g(x') that produces the same outputs given the same inputs. Therefore, the linear explanation model should treat the features of this datapoint in the same way as the actual model would, meaning we can use it to explain the actual model's predictions. So in a way, SHAP explanations are actually explaining g(x'), but since the algorithm is designed such that if x' ≈ x then g(x') ≈ f(x), the explanations of g(x') are equally valid as explanations of f(x). Does that clear it up?
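
      A toy version of that procedure - sample coalitions, score them with the real model, then read the surrogate's coefficients - might look like the sketch below. It omits the Shapley kernel weighting that real Kernel SHAP applies, so it is illustrative only:

          import numpy as np
          from sklearn.linear_model import LinearRegression

          rng = np.random.default_rng(0)
          n_features = 4

          # A made-up "real" model f and a point to explain
          f = lambda x: 3 * x[:, 0] + 2 * x[:, 1] * x[:, 2] - x[:, 3]
          point = np.array([1.0, 0.5, -1.0, 2.0])
          background = rng.normal(size=(100, n_features))

          # For each sampled coalition, absent features are drawn from the background
          masks = rng.integers(0, 2, size=(2048, n_features))
          outputs = np.array([f(np.where(m, point, background)).mean() for m in masks])

          # Fit the linear surrogate g(x') on the binary coalition vectors
          g = LinearRegression().fit(masks, outputs)
          print(g.coef_)  # approximate per-feature contributions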

  • @Brume_ · 3 years ago +1

    Hi, I'm writing my report.
    I have 2 very important questions for you:
    1) How many coalitions are selected when I compute my explainer?
    2) Do the coalitions take all of the values in the background? At 6:38, is y the mean of N outputs if the background has N rows?
    Thank you a lot.
    Sorry for the bad English, I'm French.

    • @robgeada6618 · 3 years ago +3

      Hi Brume!
      1) The number of coalitions is the number of samples, which is usually configurable in the implementation. In our implementation and in the original Python one by Scott Lundberg, the default is (2 * num_features) + 2048 coalitions unless the user specifies otherwise.
      2) Correct, the coalition value is the mean value over the N background datapoints.
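
      In the Python implementation, that default corresponds to nsamples="auto" and can be overridden per call; a sketch, assuming an explainer and test point built as in the earlier snippets:

          # "auto" draws 2 * n_features + 2048 coalition samples;
          # pass an integer to trade accuracy for speed.
          shap_values = explainer.shap_values(x_test[0], nsamples=500)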

    • @ron769 · 2 years ago

      Thanks Rob! So given that the number of coalitions sampled is not all possible combinations (which is NP-hard), how can we be sure that the SHAP values are close enough to the original Shapley values?

  • @yuchenyue1243 · 2 years ago +1

    Thanks for sharing! Can anyone explain why there are 64 coalitions to sample for 4 features at 5:52?

    • @robgeada6618 · 2 years ago +1

      Hi, looking at it again, that's a mistake on my part. It should be 16 coalitions for 4 features, i.e.:
      (4 choose 4) + (4 choose 3) + (4 choose 2) + (4 choose 1) + (4 choose 0)
      = 1 + 4 + 6 + 4 + 1
      = 16

    • @yuchenyue1243 · 2 years ago

      @@robgeada6618 (4 choose 4) + (4 choose 3) + (4 choose 2) + (4 choose 1) + (4 choose 0) = 2^4, is it generally true that for n features there are 2^n coalitions to sample?

    • @robgeada6618 · 2 years ago

      @@yuchenyue1243 Yep, exactly. One way to think about it is by writing out each feature combination as a vector, with a 1 if a feature is included in the coalition and a 0 if it isn't. Doing this for 4 features, you'd have something like 0000, 0001, 0010, 0011, ..., all the way to 1111. This means that enumerating every possible feature combination is the same as counting in binary from 0 to 1111. That means that for n features, the number of coalitions to sample is always equivalent to the number of integers that can be represented by n bits in binary: 2^n.
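
      That counting argument is a one-liner to demonstrate:

          n = 4
          # Each integer 0 .. 2^n - 1 is one coalition: bit i set = feature i included
          coalitions = [format(i, f"0{n}b") for i in range(2 ** n)]
          print(coalitions[0], coalitions[-1], len(coalitions))  # 0000 1111 16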

  • @caiyu538 · 2 years ago

    I am confused about 9:55: where is the variable test_point defined? Is it the previous x_train or y_train from 8:28?

    • @robgeada6618 · 2 years ago +1

      Should have shown that, sorry! test_point is the first datapoint of x_test: test_point = x_test[0]

  • @pedrogallego1673 · 2 years ago

    At 05:58, is it possible that the number of total coalitions with 32 features is wrong? I think it is 32 * 2 * 2³¹ = 2³⁷ (and 17.1 billion ≈ 2³⁴)

    • @robgeada6618 · 2 years ago +1

      Hi Pedro; as I've said elsewhere in the comments, I made a mistake when calculating the total coalition count; it should always be 2^(number of features), so 32 features give 2^32 or ~4.3 billion coalitions.

    • @pedrogallego1673 · 2 years ago

      @@robgeada6618 Thanks! It's a really nice video regardless!

  • @exmanitor · 2 years ago +1

    With regards to your last point, that the "SHAP Delta" does not match the "Actual Delta": I think you are misunderstanding what these values represent. The SHAP value of a specific feature does not represent the difference in prediction if we were to exclude/remove that feature from the model. Instead, it represents the average contribution of the feature across all coalitions. This is why your "SHAP Delta" and "Actual Delta" do not match: the "Actual Delta" is just the contribution of the feature in a single coalition.
    Other than that, great video!

    • @robgeada6618 · 2 years ago +4

      Thanks! Two quick points: first, the "actual" delta I showed there is the average model output when that specific feature is replaced with each value from the background while all other features are held fixed. It's what that feature's SHAP value would be if the background dataset only had variance in that one specific feature column and was otherwise identical to the explained datapoint. So yeah, it absolutely was not an accurate measurement of what a SHAP value is really doing mathematically.
      But second, that was deliberate: SHAP is advertised as producing explanations that are linearly independent measurements of each feature's contribution, but as our result showed, the SHAP value wasn't actually reflective of how this particular model behaved when you removed the feature. And of course, that's because of the exact reasons you pointed out: a SHAP value is a measurement of the difference between that feature's presence and absence in every possible coalition of the background, not an indication of the effects of pure removal/exclusion.
      So in essence, that's the exact point I was trying to make: SHAP values do indeed encode all kinds of subtle information about feature dependence and all of the comparisons against the specific background dataset chosen, but were relatively inaccurate in measuring the effect of replacing a single feature with background values. This difference is what I was trying to show, but I definitely should have been clearer about it: for models with a lot of feature interaction, SHAP will sacrifice single-feature effect accuracy to accurately represent all feature interactions against the background, and whether that is a desirable attribute will depend on the specific use case and user preference.
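
      For anyone wanting to reproduce the comparison, here is a rough sketch of the "actual delta" described in the first paragraph; the function name and signature are made up for illustration:

          import numpy as np

          def single_feature_delta(model, point, background, j):
              """Original prediction minus the average prediction when feature j
              of `point` is replaced by each background value, with all other
              features held fixed."""
              perturbed = np.tile(point, (len(background), 1))
              perturbed[:, j] = background[:, j]
              original = model.predict(point.reshape(1, -1))[0]
              return original - model.predict(perturbed).mean()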

  • @navalo2814 · 2 years ago +1

    Shap shap shap

  • @minhaoling3056 · 2 years ago

    Does kernel SHAP ignore feature dependence?

    • @diaaalmohamad2166 · 2 years ago

      I'm also wondering about that. The paper by Lundberg assumes independent features in order to estimate the contributions; still, the reason for having all possible coalitions is to account for mutual effects!
      On the other hand, a paper that appeared last year ("Explaining individual predictions when features are dependent") addresses SHAP under dependence of the features (shapr is their R package). They estimate the joint conditional distribution of the features given the current coalition using copulas (and other methods). Still, their implementation has quite some computational limitations.

    • @minhaoling3056 · 2 years ago

      @@diaaalmohamad2166 It seems like most explainable AI methods are quite limited for image data. Do you know of any methods that are implemented in R for image data?

    • @diaaalmohamad2166 · 2 years ago

      Sorry, I do not know of R packages specific to image analysis. I tried the package "iml"; there you can find different methods to explain feature contributions. I did not check their limitations. Worst case, you may use the Python package "shap" inside an Rmarkdown code chunk.

  • @nate4511 · 2 years ago

    theme black and white....