CS231n Winter 2016: Lecture 9: Visualization, Deep Dream, Neural Style, Adversarial Examples

  • Published 2 Feb 2016
  • Stanford Winter Quarter 2016 class: CS231n: Convolutional Neural Networks for Visual Recognition. Lecture 9.
    Get in touch on Twitter @cs231n, or on Reddit /r/cs231n.
    Our course website is cs231n.stanford.edu/

COMMENTS • 28

  • @leixun
    @leixun 3 years ago +5

    *My takeaways:*
    1. Visualize patches that maximally activate neurons 2:30
    2. Visualize the weights (of the first layer) 3:26
    3. Visualize the representation space (e.g. with t-SNE) 5:24
    4. Occlusion experiments 8:32
    5. Visualize activations 10:21
    6. Deconv approaches (single backward pass) 15:33
    7. Optimization over image approaches 29:15
    8. Deep dream 42:40
    9. Neural style 51:55
    10. Adversarial examples 1:02:05
    11. Summary 1:17:47

  • @ynwicks7142
    @ynwicks7142 8 years ago

    Wonderful!! Thank you for sharing this stuff online!!!

  • @neriyacohen7805
    @neriyacohen7805 1 year ago

    About reconstruction:
    Reconstruction vs. running forward through the network is like integration vs. differentiation,
    in the sense that every node that was activated gives more information than any node that was not.
    d/dx (x^2 + x + 15) = 2x + 1, and integrating back gives x^2 + x + n,
    where n is the missing information that did not survive the thresholding (boundary) function.

  • @rudmax13
    @rudmax13 8 years ago +3

    Andrej, thank you very much for these lectures! My question is, did you work with 3D convolutional nets applied to RGB-D images? Do you think it is possible to extract volumetric features from, for example, terrain, and transfer them to a deep Q-net to teach a robot to walk?

  • @Fatt_Dre
    @Fatt_Dre 8 years ago +1

    Would image reconstruction be as distorted without pool layers? What model was used in the example reconstruction slides?
    Best lecture yet, really strengthened my intuitions. Keep it up!

  • @mynameisZhenyaArt_
    @mynameisZhenyaArt_ 7 years ago +1

    At 37:34, the slide shows "pieces of ocean" in the center of the image, surrounded by gray. It looks like at this layer everything is in the center of the image. Why is it centered? Maybe because the conv network used zero-padding and sliding windows?
    I also wonder why the surround is gray. Is it due to color averaging in places where the neuron doesn't care what is there (so it could be anything, which marks the border)?

  • @yizhao1280
    @yizhao1280 8 years ago +9

    This is a great video, but I can't follow it without subtitles. Would you please add subtitles to the video? Thanks!

  • @AvielLivay
    @AvielLivay 1 year ago

    37:00 is wrong: each of the four images is a different set of hyperparameters for regularization. As for different initializations, look at figure 4 in the paper; the 9 images there are different initializations.

  • @xianxuhou4012
    @xianxuhou4012 8 years ago

    I still cannot understand how to crop the input images to get the corresponding image crops after we get the deconv feature map.

  • @NM-jq3sv
    @NM-jq3sv 7 years ago

    So the reason adversarial examples are misclassified is the network's linear nature. What happens if we use some other activation function, like tanh?

  • @eluz7903
    @eluz7903 2 years ago

    54:12 Andrej started to explain style targets and Gram matrix

  • @EvanZamir
    @EvanZamir 8 years ago +3

    Does t-SNE need an entire batch of images (or more generally, data) to create the low-dimensional feature space? With PCA you can create a low-dimensional feature space on a batch of data and then project new data points onto that same space without having to "retrain". Is that true for t-SNE?
    I ask because I noticed that scikit-learn has t-SNE as part of its manifold class, but that module does not have a transform() method as PCA does. So, at least, in sklearn, it would seem this is not possible.
    My question boils down to this. How would you apply t-SNE in a streaming or online situation where you want to continually update the visualization with new images? Presumably, one would not want to apply the algorithm on the entire batch for each new image.

    • @andrejkarpathy4906
      @andrejkarpathy4906 8 years ago +7

      +Evan Zamir yes this is possible with t-SNE, but maybe not supported out of the box with regular t-SNE implementations. Normally each point's location is a parameter in the optimization, but you can just as well create a mapping from high-D -> low-D (e.g. neural net) and backprop through the locations. Then you end up with the embedding function and can project new points. So nothing preventing this in principle, but some implementations might not support it as it's a less frequent use case.
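
A minimal sketch of the parametric mapping described in the reply above, assuming PyTorch; the network architecture, layer sizes, and data are placeholders, and the high-dimensional affinities use a single fixed Gaussian bandwidth rather than t-SNE's per-point perplexity calibration. This is not code from the course.

```python
import torch
import torch.nn as nn

def high_dim_affinities(X, sigma=1.0):
    # Symmetric Gaussian affinities P over the high-D points (simplified:
    # one fixed bandwidth instead of per-point perplexity search).
    d2 = torch.cdist(X, X).pow(2)
    P = torch.exp(-d2 / (2 * sigma ** 2))
    P = P * (1 - torch.eye(len(X)))        # zero the diagonal
    return P / P.sum()

def low_dim_affinities(Y):
    # Student-t kernel Q over the embedded points, as in t-SNE.
    d2 = torch.cdist(Y, Y).pow(2)
    Q = (1.0 / (1.0 + d2)) * (1 - torch.eye(len(Y)))
    return Q / Q.sum()

# f is the learnable high-D -> 2-D mapping; once trained, f(new_x)
# embeds unseen points with a single forward pass.
f = nn.Sequential(nn.Linear(4096, 256), nn.ReLU(), nn.Linear(256, 2))

X = torch.randn(500, 4096)                 # stand-in for e.g. fc7 CNN codes
P = high_dim_affinities(X)
opt = torch.optim.Adam(f.parameters(), lr=1e-2)

for step in range(1000):
    Q = low_dim_affinities(f(X))
    # KL(P || Q), the usual t-SNE objective, backpropagated into f's weights
    loss = (P * (torch.log(P + 1e-12) - torch.log(Q + 1e-12))).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Once f is trained this way, new points can be projected with a single forward pass, which is what plain t-SNE (where each point's 2-D location is itself a free parameter, hence no transform() in sklearn) does not give you.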

    • @EvanZamir
      @EvanZamir 8 years ago

      Interesting, thanks!

    • @stephanedeny7309
      @stephanedeny7309 7 years ago +2

      If you are dealing with streaming data, you might not want/need to embed all the points in history in a single t-SNE map. As an alternative, you can perform an **online embedding** by following these simple steps:
      1. choose a time-window T, long enough so that each pattern of interest appears at least a couple of times in the window length.
      2. scroll the window as the data stream in, with a time-step dt much smaller than T. For each position of the window, compute a t-SNE embedding of the data points in the time window.
      3. In t-SNE, you need to choose the initial coordinates of the data points in the low-dimensional space (if you use the sklearn implementation [1] in python, the corresponding parameter is "init". By default, the sklearn implementation sets the initial position of the points randomly). Because dt is much smaller than T, two successive embeddings share most of their data points. For all the shared data points, **match their initial coordinates in the present embedding to their final coordinates in the previous embedding**. This step will ensure that similar patterns have a consistent representation across successive embeddings.
      Note 1: This method is particularly relevant when dealing with non-stationary datasets (where patterns evolve slowly), because each embedding is specific to the window on which it is computed, ensuring that it captures local structure optimally (unlike a full embedding of the whole non-stationary dataset).
      Note 2: the embeddings cannot be parallelized because you need the outcome of the previous embedding to compute the next one. However, because the initial condition is well chosen, an embedding typically converges very fast (in a few iterations).
      For an example of application of this method to non-stationary time series, see this article [2] (ICLR 2016), where it was applied successfully to identify syllables across development in the songbird.
      [1]: scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html
      [2]: openreview.net/forum?id=oVgo1jRRDsrlgPMRsBzY
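
A rough sketch of the sliding-window scheme described above, using scikit-learn's TSNE and its `init` parameter (reference [1]); the window length T, step dt, and data layout are placeholder choices, not values from the cited paper.

```python
import numpy as np
from sklearn.manifold import TSNE

def online_tsne(X, T=500, dt=50):
    """X: (n_samples, n_features) time-ordered data; T: window length; dt: step."""
    prev = None
    embeddings = []
    for start in range(0, len(X) - T + 1, dt):
        window = X[start:start + T]
        if prev is None:
            init = "pca"                      # first window: any standard init
        else:
            # seed new points near the origin, and seed the T - dt points
            # shared with the previous window at their previous coordinates
            init = np.random.randn(T, 2) * 1e-4
            init[:T - dt] = prev[dt:]
        emb = TSNE(n_components=2, init=init).fit_transform(window)
        embeddings.append((start, emb))
        prev = emb
    return embeddings
```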

  • @realGBx64
    @realGBx64 5 years ago

    Isn't the problem mentioned at the end just overfitting? A deep network has many more parameters than there are elements in the training set.

  • @ChengZhao
    @ChengZhao 7 years ago +1

    Is there an equivalent way to construct adversarial examples for human vision? If not, why not? Would that not indicate that at a fundamental level there is something different in the way human and computer vision work?

    • @essamal-mansouri2689
      @essamal-mansouri2689 5 years ago

      Well, a big thing with generating adversarial images is that it's usually an iterative process: we ask the network to classify a certain image, then we modify it slightly so that the next time we ask, the network is less confident of the correct classification. This process can take hundreds of thousands or even millions of iterations before it finds an adversarial example. Not only would it be difficult to find a human willing to go through that process, it would also be difficult to measure a "confidence" for a specific classification to know whether your modifications are moving towards an adversarial example.
      Another reason is that many of the efficient ways we generate adversarial examples at the moment require that we know the weights of the network that is classifying our image (to compute gradients). We don't really know how our brain works, or the "weights" of any connection, or what those connections really represent. So constructing an adversarial example for our brain is not really something we're able to do using any of the methods we would use for an artificial network.
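
A minimal sketch of the white-box, gradient-based idea described in this reply, assuming a PyTorch classifier `model` that returns class scores; `model`, `eps`, and the pixel range are placeholders. It uses the one-step fast gradient sign method rather than a long iterative loop, but the key point is the same: the backward pass needs the network's weights.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.007):
    """x: image batch in [0, 1]; y: true labels; eps: per-pixel step size."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # loss on the correct class
    loss.backward()                       # this step is where the weights are needed
    # nudge every pixel a tiny step in the direction that increases the loss
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()
```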

  • @heyloo0511
    @heyloo0511 1 year ago

    Question! (from an absolute beginner):
    For the slide shown @24:52 (the bottom example on "Backward pass: guided backpropagation"), why don't all the negative numbers map to 0? My understanding was that this should happen automatically in order to deconstruct the image properly.
    Thnx.
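
A small numpy illustration of the three backward-pass rules that slide compares (the numbers below are made up, not the slide's): plain ReLU backprop zeroes the gradient where the forward input was negative, the "deconvnet" rule zeroes gradient entries that are themselves negative, and guided backpropagation applies both masks, so negative values can survive plain backprop but not the guided rule.

```python
import numpy as np

forward_input = np.array([[ 1., -1.,  5.],
                          [ 2., -5., -7.],
                          [-3.,  2.,  4.]])    # values entering the ReLU
grad_from_above = np.array([[-2.,  3., -1.],
                            [ 6., -3.,  1.],
                            [ 2., -1.,  3.]])  # gradient flowing back into it

backprop  = grad_from_above * (forward_input > 0)                          # plain ReLU backprop
deconvnet = grad_from_above * (grad_from_above > 0)                        # "deconvnet" rule
guided    = grad_from_above * (forward_input > 0) * (grad_from_above > 0)  # guided backprop
```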

  • @mehulajax21
    @mehulajax21 5 years ago

    Shouldn't it be an inner product and not an outer product, at about 55:20?
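
A quick numpy sketch of the Gram-matrix style target this question is about (the shapes are placeholders): summing the outer products of the C-dimensional feature vectors over all spatial positions gives the same C x C matrix as taking pairwise inner products between whole feature maps, so both descriptions are consistent.

```python
import numpy as np

C, H, W = 64, 32, 32
feats = np.random.randn(C, H, W)       # conv activations at one layer
F = feats.reshape(C, H * W)

gram_inner = F @ F.T                   # entry (i, j): inner product of maps i and j

gram_outer = np.zeros((C, C))
for p in range(H * W):                 # sum of outer products over spatial positions
    v = F[:, p]
    gram_outer += np.outer(v, v)

assert np.allclose(gram_inner, gram_outer)
```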

  • @RaviprasadKini
    @RaviprasadKini 8 years ago +2

    Adversarial examples are present in human vision too. Right?

    • @essamal-mansouri2689
      @essamal-mansouri2689 5 years ago

      We have some "optical illusions", but I can't think of a situation where the same picture with very small changes would start looking like something else completely.

  • @arjunkrishna8873
    @arjunkrishna8873 6 years ago

    Why not use L1 regularization for sparsity? 36:00

  • @dummy7150
    @dummy7150 4 years ago

    Here's the t-SNE link mentioned at 7:00 -> cs.stanford.edu/people/karpathy/cnnembed/

  • @andyyuan97
    @andyyuan97 8 years ago

    Why does lecture 9 lack subtitles?

  • @chan2182
    @chan2182 8 years ago +4

    One goose, ten geese :)

  • @questforprogramming
    @questforprogramming 4 years ago

    @21:00 I guess that in the backward propagation, the zeros in the matrix are in the wrong positions, since the negative values aren't zeroed out.

  • @tilakrajchoubey5534
    @tilakrajchoubey5534 1 year ago

    I wanted to code all that up like you did but I am not able to do it 🥲