Maybe I should have mentioned this in the video: A huge problem in AI interpretability is faithfulness vs. plausibility. Users like *plausible* explanations that look right to them ("aha, this makes sense!"). But sometimes they see things that are counterintuitive or attributions that make no sense to them. Then, even if the explanations are *faithful* to the model's workings, they will seem alien and weird, and users will dislike such a model or blame the interpretability method.
Why is feature attribution seldom used in production? Because it can help users game the system. 😅 If you know your credit score is low because you have two cars, you will sell that extra car and increase your score.
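To make that gaming concern concrete, here is a minimal sketch of the kind of per-feature attribution SHAP gives you. The data, the scoring rule, and feature names like `num_cars` are all made up for illustration, not taken from the video:

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Toy "credit score" data: columns are num_cars, num_loans, years_employed.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(500, 3))
score = 700 - 40 * X[:, 0] - 25 * X[:, 1] + 10 * X[:, 2]  # toy scoring rule

feature_names = ["num_cars", "num_loans", "years_employed"]
model = RandomForestRegressor(random_state=0).fit(X, score)

applicant = np.array([[2, 1, 4]])  # an applicant who owns two cars
shap_values = shap.TreeExplainer(model).shap_values(applicant)

# Per-feature contribution to this applicant's predicted score;
# a big negative number on num_cars is exactly the hint to sell a car.
for name, value in zip(feature_names, shap_values[0]):
    print(f"{name:>16s}: {value:+.1f}")
```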
It's inevitable that some folks will try to exploit any system, no matter how well-designed it is. Recently, we've seen some clever algorithms try to game the benchmarks, but they often fail spectacularly in the real world. It would be great to have a little extra help to detect these kinds of frauds. Something like 'humans' or 'reasoning agents' could be a good place to start.
It's always great to see "old" ideas getting used to solve new problems. I had heard about Shapley values and was hoping you'd make a video explainer about it. Thanks!
Very nice training vid. gj. useful info. good examples and references.
The explanation was farrrr better than anything I expected :D very well done
A series about interpretability would be awesome
Interpretability was the rabbit hole that got me into deep learning, would love to see more content on this topic (and if you need ideas on things to explore, lmk) ♥ (also, SHAP was one of the earliest interpretability techniques I came across after meeting the researcher working on it at the University of Washington at a poster session--so great to see how far this work has come since then!)
This is really cool, I can imagine in the future we'll have really good interpretability tools, for example marking a piece of text in the LLM output and having it highlight the tokens from the context that influenced it the most ❤
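You can already hack together a crude version of this with input-times-gradient saliency. A rough sketch, assuming a small Hugging Face causal LM (the model name and prompt below are just placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model and prompt, purely for illustration.
name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
model.eval()

ids = tok("The capital of France is", return_tensors="pt").input_ids

# Embed the context tokens and track gradients w.r.t. those embeddings.
embeds = model.get_input_embeddings()(ids).detach().requires_grad_(True)
logits = model(inputs_embeds=embeds).logits

# Attribution target: the model's top prediction for the next token.
target = logits[0, -1].argmax()
logits[0, -1, target].backward()

# Input-times-gradient saliency, one number per context token.
saliency = (embeds.grad * embeds).sum(-1).abs()[0]
for token, s in zip(tok.convert_ids_to_tokens(ids[0].tolist()), saliency.tolist()):
    print(f"{token:>12s}  {s:.3f}")
```

The tokens with the largest scores are, under this (very rough) approximation, the context tokens that most influenced the predicted continuation.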
Excellent! Always providing the goods.
Thank you!
best of luck with your Thesis. Stay sound. Love You
Thanks for another great explanation! Good luck with your thesis :)
Thank you!
Came for the AI commentary. Stayed for the god level lipstick.
This is really interesting and your explanation was excellent, but... did that coffee bean really just wink at me?
Thanks for referencing the mathematical equations from research papers. It really validates the authenticity of your work. I felt the video was rushed a bit. I was probably expecting a longer video with more examples.
But I understand you might have time crunch with your thesis. Good luck ✌
🔥🔥🔥
"Neat"! Best part haha
thanks !
Are those acoustic boards for walls? RLHF: pretty easy to get all the words, as another Eastern European English speaker.
Yes, that is acoustic foam. Otherwise I sound like I'm speaking from a bathroom. 🤭
Hm. I would have liked to watch this but the background music is far too loud and very distracting. ... Ah it does stop after a while. Yes it is very interesting and useful for me :)
I agree, I noticed that too in the final pass. Will make it better next time.
Sorry, that's on me (her editor). Something got messed up in the audio mixing and we didn't notice it before uploading. Luckily, it's only during the introduction, so the main part of the video should be fine 😅
Not gonna lie, I think that this is basically useless on autoregressive models.
👀 Don't leave us hanging here, explain your statement. 😅