The Dimpled Manifold Model of Adversarial Examples in Machine Learning (Research Paper Explained)

  • Published 5 Jun 2024
  • #adversarialexamples #dimpledmanifold #security
    Adversarial Examples have long been a fascinating topic for many Machine Learning researchers. How can a tiny perturbation cause a neural network to change its output by so much? While many explanations have been proposed over the years, they all appear to fall short. This paper attempts to comprehensively explain the existence of adversarial examples by proposing a new view of the classification landscape, the Dimpled Manifold Model: it says that any classifier will adjust its decision boundary to align with the low-dimensional data manifold, bending only slightly around the data. This potentially explains many phenomena around adversarial examples. Warning: In this video, I disagree. Remember that I'm not an authority, but simply give my own opinions.
    OUTLINE:
    0:00 - Intro & Overview
    7:30 - The old mental image of Adversarial Examples
    11:25 - The new Dimpled Manifold Hypothesis
    22:55 - The Stretchy Feature Model
    29:05 - Why do DNNs create Dimpled Manifolds?
    38:30 - What can be explained with the new model?
    1:00:40 - Experimental evidence for the Dimpled Manifold Model
    1:10:25 - Is Goodfellow's claim debunked?
    1:13:00 - Conclusion & Comments
    Paper: arxiv.org/abs/2106.10151
    My replication code: gist.github.com/yk/de8d987c4e...
    Goodfellow's Talk: • Lecture 16 | Adversari...
    Abstract:
    The extreme fragility of deep neural networks when presented with tiny perturbations in their inputs was independently discovered by several research groups in 2013, but in spite of enormous effort these adversarial examples remained a baffling phenomenon with no clear explanation. In this paper we introduce a new conceptual framework (which we call the Dimpled Manifold Model) which provides a simple explanation for why adversarial examples exist, why their perturbations have such tiny norms, why these perturbations look like random noise, and why a network which was adversarially trained with incorrectly labeled images can still correctly classify test images. In the last part of the paper we describe the results of numerous experiments which strongly support this new model, and in particular our assertion that adversarial perturbations are roughly perpendicular to the low dimensional manifold which contains all the training examples.
    Authors: Adi Shamir, Odelia Melamed, Oriel BenShmuel
    Links:
    TabNine Code Completion (Referral): bit.ly/tabnine-yannick
    UA-cam: / yannickilcher
    Twitter: / ykilcher
    Discord: / discord
    BitChute: www.bitchute.com/channel/yann...
    Minds: www.minds.com/ykilcher
    Parler: parler.com/profile/YannicKilcher
    LinkedIn: / ykilcher
    BiliBili: space.bilibili.com/1824646584
    If you want to support me, the best thing to do is to share out the content :)
    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar: www.subscribestar.com/yannick...
    Patreon: / yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
  • Science & Technology

COMMENTS • 55

  • @YannicKilcher
    @YannicKilcher  2 years ago +3

    OUTLINE:
    0:00 - Intro & Overview
    7:30 - The old mental image of Adversarial Examples
    11:25 - The new Dimpled Manifold Hypothesis
    22:55 - The Stretchy Feature Model
    29:05 - Why do DNNs create Dimpled Manifolds?
    38:30 - What can be explained with the new model?
    1:00:40 - Experimental evidence for the Dimpled Manifold Model
    1:10:25 - Is Goodfellow's claim debunked?
    1:13:00 - Conclusion & Comments
    Paper: arxiv.org/abs/2106.10151
    My replication code: gist.github.com/yk/de8d987c4eb6a39b6d9c08f0744b1f64
    Goodfellow's Talk: ua-cam.com/video/CIfsB_EYsVI/v-deo.html

  • @ZimoNitrome
    @ZimoNitrome 2 years ago +14

    To add, I agree that their simpler decision boundary is in fact not much simpler, if at all. Readers can easily be fooled between the different models simply because the figures and examples are 2d vs. 3d. The classic "scattered" 2d samples can just as well have extremely awkward positioning in additional dimensions. The dimpled model would be just as complex.
    Edit: Nevermind, Yannic covers everything.

  • @AICoffeeBreak
    @AICoffeeBreak 2 years ago +30

    Me until 01:20: Nice, this is exactly how I have been thinking about adversarial examples this whole time! 😎
    01:27 Yannic: "I don't think this is really useful to think in this way and I will explain why."
    Me: Okay, seems like I am going to watch the whole video now. 😂

    • @ukhu_pacha
      @ukhu_pacha 2 years ago +2

      I haven't finished the whole video and you are already commenting? I guess I'm late

    • @AICoffeeBreak
      @AICoffeeBreak 2 years ago +3

      @@ukhu_pacha I haven't either, I commented about the first minute. 😅 I actually have to postpone watching this video because I have to go in 10 mins. 😑

    • @ukhu_pacha
      @ukhu_pacha 2 years ago +1

      ​@@AICoffeeBreak Coffee bean needs help!

  • @ahnafapathan
    @ahnafapathan 2 years ago +7

    Yannic I simply cannot put into words my gratitude for you. Please never stop.

  • @ZimoNitrome
    @ZimoNitrome 2 years ago +14

    Everyone is biased. Since it's difficult to get around that, it's good that you disclose it.
    Good vid. Keep it up!

    • @josephclements2145
      @josephclements2145 2 years ago +5

      To me, biases are intuitive understandings of patterns observed through experience that are often difficult to explain logically. So biases should be carefully evaluated to determine whether there are legitimate rules generating the pattern or whether our personal perspective is distorting our understanding. Comparing our biases with those of someone with different biases is one way of accomplishing that task.

  • @oshri8
    @oshri8 2 years ago +5

    Great video as always.
    Fun fact: Adi Shamir is the "S" in the RSA encryption algorithm.

  • @WhatsAI
    @WhatsAI 2 years ago

    Amazing video as always Yannic, thank you for sharing and explaining so clearly!

  • @AhmedNaguibVideos
    @AhmedNaguibVideos 2 years ago +3

    36:36 omg that’s right, how did I not see that until that point!

  • @ukhu_pacha
    @ukhu_pacha 2 years ago +11

    Squishy flippy stretchy feature model, say that 100 times.

  • @sayakpaul3152
    @sayakpaul3152 2 years ago

    37:35: Totally agree with the argument on kernelized SVMs, especially if you go to higher dimensions. Not only is that backed by good theory, it also connects to the implicit bias of SGD, with the thought that continued training will converge to an SVM solution.

  • @stacksmasherninja7266
    @stacksmasherninja7266 2 years ago +5

    Kinda unsure how this model scales to the multiclass setting. Moreover, how do the dimples explain targeted adversarial examples? You can make the classifier classify a "cat" image as literally any class in the dataset (by decreasing the loss w.r.t. the new target class instead of increasing the loss w.r.t. the original target class) using PGD. Any idea?
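    For readers unfamiliar with targeted attacks, here is a minimal PyTorch-style sketch of the procedure described above (stepping so as to decrease the loss toward a chosen target class, rather than increasing it for the original class). The model, inputs, and hyperparameter values are hypothetical placeholders, not code from the paper or the video.

```python
import torch
import torch.nn.functional as F

def targeted_pgd(model, x, target_class, eps=8/255, alpha=2/255, steps=40):
    """Targeted PGD sketch: take gradient steps that *decrease* the loss
    w.r.t. the chosen target class. Assumes x is a batch of images in [0, 1]."""
    target = torch.full((x.shape[0],), target_class, dtype=torch.long, device=x.device)
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # minus sign: descend the loss so the prediction moves toward the target class
        x_adv = x_adv.detach() - alpha * grad.sign()
        # project back into the L-infinity ball of radius eps around the original x
        x_adv = x + torch.clamp(x_adv - x, min=-eps, max=eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0).detach()
    return x_adv
```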

    • @YannicKilcher
      @YannicKilcher  2 years ago +1

      Yes, the model would somehow have to explain which "side" the dimples go around the data in high-dimensional space.

  • @ericadar
    @ericadar 2 years ago +1

    Why should it be the case that with a sufficiently diverse distribution of training examples and sufficiently complex function (NN) we don’t see convergence on a feature space via gradient descent with k dimensions (k

    • @YannicKilcher
      @YannicKilcher  2 years ago +2

      I think the fact that our visual system and these models don't use exactly the same architecture means there will always be features picked up by one but not the other.

  • @sacramentofwilderness6656
    @sacramentofwilderness6656 2 years ago

    Thanks a lot for this video and the very deep and thoughtful discussion! I have a question concerning adversarial examples: we start from an image, say, of a cat, and after moving some distance our classifier says it is guacamole. But what happens if one moves further in this direction? Does the model become more and more confident that it is guacamole, or do we hit other classes, like an apple or a helicopter, with weird changes of class as we move far along this direction, even though from the view of human perception nothing meaningful is depicted? Could it be the case that far from the data manifold there are a lot of sharp and rapidly changing decision boundaries?

  • @herp_derpingson
    @herp_derpingson 2 years ago +7

    14:25 This way of thinking is quite similar to SVM kernels. There is some plane and you classify stuff depending on which side of the plane the input data lies.
    .
    31:40 IDK if someone has tried this already. The reason the decision boundaries are so close by is that when SGD sees a zero gradient it says "good enough" and moves on, even if it is right on the edge. I wonder if we can add an "it's good, but let's go a bit further in the same direction just in case" parameter (one possible version is sketched after this comment). IDK how to implement it in SGD though. Otherwise SGD will always go for the minimum-energy manifold. Something that makes SGD push the centroids far from each other, even at zero gradient.
    .
    35:19 Yes, the squiggly line is another way of drawing a decision boundary, but this way of doing it has a "higher energy" associated with it. Forgive me for not knowing the technical term for it; I will just call it "energy". Since SGD is a greedy algorithm, it will always minimize "energy". Think about it like how much energy you would need to hammer a metal sheet to make the squiggly line's valleys and ravines vs. how much energy you would need to simply dimple the sheet metal at the exact data points.
    .
    38:47 Up and "dowm" ;)
    .
    40:40 It can be guacamole, you need a 4D space to visualize it. If you have a sufficiently big D, anything is possible.
    .
    45:35 I am not sure about the fur thing. What I understand is that any "human recognizable features" are along the manifold and all non-human recognizable features are perpendicular to the manifold. So, by definition if a human cannot see it, then it must be perpendicular to the manifold, otherwise it would bring sufficient change in image space.
    .
    57:30 The adversarial datapoints are now *pulling* the manifold towards itself instead of pushing it. So, when the adversarial noise is reverted, the image jumps on the other side of the now pulled manifold. It is still at minimum energy.
    .
    1:00:00 I am not sure about projection. I just think that the SGD like algorithm in human brain is not satisfied at the zero gradient, but continues further and makes the dimple deeper and wider.
    .
    1:08:00 How do you even define "perpendicular" for a hyperplane? Sorry, I did not do that much math in college. I cannot comment on this. I was planning to see if moving the point towards any centroid and making it adversarial is perpendicular to the optimally small norm-ed adversarial example.
    .
    IDK I actually find the dimple manifold theory rather convincing. With image augmentations what we do is make these "dimple pockets" bigger. So, adversarial datapoints have to go further in image space before they can get out of the pocket.
    .
    In fact we can also take this understanding to the double descent paper. Let n be the dimensions of the network and k be the dimensions of the natural image manifold. k should be constant as it is a property of the dataset. In the beginning k > n, so the NN behaves poorly as it cannot separate everything. As n increases, the performance improves. As n approaches k, the dimple manifold effect takes its true form and it starts overfitting. Then when n becomes sufficiently large, the extra dimensions allow more wiggle room, so a small change in image space causes a large change in manifold space, simply because the space is so high-dimensional. Effectively this makes the pockets larger. This in turn prevents overfitting/adversarial examples.
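    One hypothetical way to implement the "keep pushing past zero gradient" idea from the 31:40 point above (not something proposed in the paper) is to subtract a fixed margin from the correct-class logit before the cross-entropy, so the loss only vanishes once the correct class wins by that margin:

```python
import torch
import torch.nn.functional as F

def margin_cross_entropy(logits, labels, margin=1.0):
    """Cross-entropy where the correct-class logit is handicapped by a margin,
    so the optimizer keeps pushing the decision boundary away from correctly
    classified points instead of stopping at "good enough"."""
    adjusted = logits.clone()
    adjusted[torch.arange(len(labels)), labels] -= margin
    return F.cross_entropy(adjusted, labels)
```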

    • @YannicKilcher
      @YannicKilcher  2 years ago +2

      Very good points, nice connections to the double descent phenomenon. The way they make things "perpendicular" is that they linearize the manifold around a data sample, using essentially a first-order Taylor approximation. From there, perpendicular just means a large inner product with the normal to that plane.
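      As a rough illustration of that linearization (a sketch, not the authors' code): if a matrix of tangent directions at the attacked data point is available, e.g. from a local PCA or a decoder Jacobian, and is assumed orthonormal, one can measure how much of a perturbation's norm lies off the linearized manifold.

```python
import numpy as np

def off_manifold_fraction(delta, tangent_basis):
    """delta: (d,) flattened adversarial perturbation.
    tangent_basis: (d, k) matrix whose orthonormal columns span the
    linearized (tangent) approximation of the data manifold at the point."""
    parallel = tangent_basis @ (tangent_basis.T @ delta)  # on-manifold component
    perpendicular = delta - parallel                      # off-manifold component
    # "roughly perpendicular" means this fraction is close to 1
    return np.linalg.norm(perpendicular) / np.linalg.norm(delta)
```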

    • @hsooi2748
      @hsooi2748 2 years ago +2

      I think instead of calling it perpendicular, we can call it "orthogonal".
      Basically it means a direction at a 90° angle into a new dimension, if you visualize a 2D space extending into 3D.
      If it is a 4D space, then it is extending into 5D...

  • @Addoagrucu
    @Addoagrucu 2 years ago +1

    Hey Yannic, could you try the experiment at the end with more randomly generated manifolds, maybe even with more adversarial attacks on your part to see if what you claim isn't possibly an artifact as well?

    • @YannicKilcher
      @YannicKilcher  2 years ago +1

      Code is linked in the description, have at it 😁

  • @jsunrae
    @jsunrae 2 years ago +3

    Ever thought of doing a follow up with the researchers?

  • @DistortedV12
    @DistortedV12 2 years ago

    Does this model explain the "adversarial examples are features, not bugs" finding from Aleksander Madry's group, where they trained the classifier on the adversarial examples and it still had good generalization performance?

    • @senli6842
      @senli6842 2 years ago

      they claim they did, but I don't think they did

  • @004307ec
    @004307ec 2 years ago

    To me, it just looks like some classic SVM plot. The hyperplane can be presented as a curve that would sometimes split some highly similar sample points into two groups.

  • @eugenioshi
    @eugenioshi 2 years ago

    does anyone know which app he's using to do these annotations?

  • @victorrielly4588
    @victorrielly4588 2 years ago +1

    Very good tests. This is what research needs, people stepping up to demonstrate that most research is bogus. Not that this paper is bogus. It makes reasonable claims, but like with all machine learning papers the support for the claims is far poorer than the authors claim.
    We should move to a publication system where reviews of papers and reproductions of results are at least as valuable, if not more valuable, and more widely published than original works.
    I have an idea of how one might go about validating their claim. Assuming we are working in a binary classification framework, the main claim of the paper to test is that the decision boundary lies close to the image manifold.
    They can use an autoencoder to estimate the image manifold. One can also sample from the manifold created by the classification boundary by selecting images with the constraint that when passing the image through the classifier, the output is close to 0.5 (halfway between the two classes). Determining whether the two manifolds are similar is then a simple matter of determining whether, for any "reasonable" sample from the image manifold generated from the autoencoder, there is a sample close to the decision boundary that is also close to this sample, and vice versa.
    Essentially, we can characterize both manifolds; if there is also an effective way to determine the difference between the manifolds, you will have your answer.
    My suggestions would be: first, instead of using real data, use a toy dataset generated from a predefined and known manifold. This will remove the need for training an autoencoder.
    You can estimate the difference between two manifolds by sampling from one, and finding the closest point in the other to your sample, and then sampling from the other and finding the closest point to the first from this sample as well. Do this a bunch of times and this will be something like a least square distance between the manifolds.
    Things to keep in mind, the manifold created by the decision boundary will be 2-d while the actual image manifold may be any dimension. In their example, the manifold learned by the auto-encoder is k-d
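    One way to make the "sample both manifolds and compare nearest points" step above concrete is a symmetric nearest-neighbour (Chamfer-style) distance between the two sampled point clouds. A minimal NumPy sketch, with samples_a and samples_b as hypothetical point sets (e.g. autoencoder samples and near-boundary inputs):

```python
import numpy as np

def chamfer_distance(samples_a, samples_b):
    """samples_a: (n, d) points sampled from the (estimated) image manifold.
    samples_b: (m, d) points sampled near the decision boundary
    (inputs whose classifier output is close to 0.5)."""
    d2 = ((samples_a[:, None, :] - samples_b[None, :, :]) ** 2).sum(-1)  # (n, m) squared distances
    a_to_b = d2.min(axis=1).mean()   # each manifold sample to its nearest boundary sample
    b_to_a = d2.min(axis=0).mean()   # each boundary sample to its nearest manifold sample
    return a_to_b + b_to_a
```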

    • @victorrielly4588
      @victorrielly4588 2 years ago

      I’m sorry, I made a mistake, the manifold defined by the decision boundary is something like d-1 dimensional, because the input is d-dimensional, and there is one constraint on the output. For example, if the model is linear, the input is d-dimensional, and the constraint is x^Tw = c, that manifold is a d-1 dimensional hyperplane. On the other hand, if the problem was a k-class problem, the decision boundary would be something like a d-k+1 manifold?

    • @victorrielly4588
      @victorrielly4588 2 years ago

      In the k-class case the decision manifold would be defined as the set of inputs that give the same score for all classes (if the last layer is a softmax, all outputs would be 1/k); an arbitrarily small perturbation of such an input could then be made to send the output to any desired class. That might answer your question of how these authors expect their results to hold for multi-class classification problems.

  • @eelcohoogendoorn8044
    @eelcohoogendoorn8044 2 years ago

    Why can't I find any Google hits about the intersection of SAM and adversarial examples yet? Must be because I suck at search, because it seems like something pretty obvious to investigate.

  • @BooleanDisorder
    @BooleanDisorder 3 months ago

    Huh, might explain why some image generators had such a hard time creating a pink Labrador for me!

  • @psxz1
    @psxz1 2 years ago

    Maybe a random manifold makes more sense in general, since it's supposed to be noise anyway, from what little I know of the basics of GANs.

  • @bublylybub8743
    @bublylybub8743 2 years ago +1

    I swear I have seen papers talking about this perspective back in 2019.

  • @senli6842
    @senli6842 2 years ago

    I think the proposed dimpled model explanation is somewhat different from the existing explanations. Of course, such discussion is helpful, but I really can't understand why the decision boundary tends to be parallel to the data manifold and puts the training data in small dimples, which does not make sense, as models with many small dimples are hard to generalize to unseen data. Besides, why are these dimples similar enough in different models that adversarial examples transfer across models?

  • @paxdriver
    @paxdriver 2 years ago

    You're such an entertaining character.

  • @G12GilbertProduction
    @G12GilbertProduction 2 years ago

    As 3000 dimensions it adversarial sampling data got?
    How would you calculated for your mindfeed, Yannic? I bet it is more than 10³³. :D

  • @DamianReloaded
    @DamianReloaded 2 years ago

    I kinda intuit that adversarial attacks have more to do with how CNNs are built than with the data distribution. When a CNN outputs a high probability that it's "seeing" a bird, what it's really saying is "the feature detectors that make this value high have been activated". So in reality, as long as you can create a feature-detector activator, you can produce an adversarial attack. It's not about classes or similarity between classes. CNNs probably "know" nothing about classes (the way we do). They aren't "deep"/robust enough. They're just sensitive to pixel values. EDIT: Monochromatic images should be more resilient to adversarial attacks. Maybe monochrome filters should have greater weight for the final classification (of color images). EDIT: I meant B&W as in 1-bit images or posterization.

  • @swordwaker7749
    @swordwaker7749 2 years ago

    Suggestion: training ASMR, where you watch the model TRAIN and see the loss slowly go down.

  • @MrJaggy123
    @MrJaggy123 2 years ago +1

    TL;DR: the authors think that the old way of thinking, x, is deeply flawed. Yannic points out "nobody thinks x".

  • @scottmiller2591
    @scottmiller2591 2 years ago

    Maybe SVM margin maximization had a point?

  • @swordwaker7749
    @swordwaker7749 2 years ago

    People in Israel are also into machine learning? Nice to see papers from all over the world.

    • @drorsimon
      @drorsimon 2 years ago

      Actually, Israel is ranked in the top 10 countries in AI and deep learning when considering the number of publications in NeurIPS.
      chuvpilo.medium.com/ai-research-rankings-2019-insights-from-neurips-and-icml-leading-ai-conferences-ee6953152c1a

    • @swordwaker7749
      @swordwaker7749 2 years ago

      @@drorsimon Interesting... Somehow, Russia ranks at number 11 despite the president himself showing signs of support.

  • @vslaykovsky
    @vslaykovsky 2 years ago

    Ok, meet simple dimple in machine learning!

  • @minecraftermad
    @minecraftermad 2 years ago +2

    I really don't get why they make this so complicated. Isn't it just that there are through-lines through the neural net that have a little bit too much strength, and when you slightly increase the color on those through-lines it affects the end way more than you'd expect, because they stack up? As for how to fix this... now that's a more difficult question... maybe try cutting up the neural net sideways in the middle.
    47:00 And this would make sense in how I think about it, because it would directly not favor the through-lines for those images, killing them.

  • @oleksiinag3150
    @oleksiinag3150 2 years ago +2

    It looks like you were a reviewer for this paper, and they pissed you off.

  • @sieyk
    @sieyk 2 years ago

    I think a big misconception is that NNs encode information in a sensical way; actually, all information of the input is used, with each colour channel being considered separately. It just so happens that to derail a trained network, it requires modification of specific pixels in specific channels such that the target network activates certain kernels (possibly otherwise entirely dormant kernels) to produce a kernel activation pattern that activates the FC layers in such a way (perhaps exploiting dormant neurons again) that the final layer activates the way the adversarial network wanted.
    This may sound complicated, but it's really straightforward. NNs don't necessarily logically group samples together or even know samples are related, they just train to reduce error; obviously, this doesn't apply to tasks that specifically seek to solve that problem. This paper seems to believe that, somehow, a trained network will _understand_ that all blue samples are encoded 'up' and all red encoded 'down', but this doesn't happen in practice generally.
    This also plays into underspecification of training data.

  • @sieyk
    @sieyk 2 years ago

    Doesn't the adversarial dataset problem get explained by simply realising that the adversarial noise for the cat was tailored for the original network, therefore the network that classifies that adversarial cat as a dog would require weights that classify the original cat as a cat?
    I would assume that (very!) minor alterations to the model architecture would make this adversarial-only training fail.

  • @Addoagrucu
    @Addoagrucu 2 years ago

    non natural pseudo guacamole is my rapper name

  • @sieyk
    @sieyk 2 years ago

    What? Why do they think the simple example shows a 2D set separated by a curved 1D line? 1D lines _cannot_ be curved; it is an intersection of a high-dimensional plane defined by each neuron being a 1D line (in a 2D space) on its own axis, where the neurons all share a common y-intercept, since a neuron activation is just y = mx + b, where the b comes from the bias of the _next_ neuron.
    You can't just _have_ a curved 3D sheet; the maximal curvature of the separation plane is directly related to the number of neurons, which is precisely why it's a straight plane in a high dimension.