SIREN: Implicit Neural Representations with Periodic Activation Functions (Paper Explained)

  • Published 2 Jul 2024
  • Implicit neural representations are created when a neural network is used to represent a signal as a function. SIRENs are a particular type of INR that can be applied to a variety of signals, such as images, sound, or 3D shapes. This is an interesting departure from regular machine learning and required me to think differently.
    OUTLINE:
    0:00 - Intro & Overview
    2:15 - Implicit Neural Representations
    9:40 - Representing Images
    14:30 - SIRENs
    18:05 - Initialization
    20:15 - Derivatives of SIRENs
    23:05 - Poisson Image Reconstruction
    28:20 - Poisson Image Editing
    31:35 - Shapes with Signed Distance Functions
    45:55 - Paper Website
    48:55 - Other Applications
    50:45 - Hypernetworks over SIRENs
    54:30 - Broader Impact
    Paper: arxiv.org/abs/2006.09661
    Website: vsitzmann.github.io/siren/
    Abstract:
    Implicitly defined, continuous, differentiable signal representations parameterized by neural networks have emerged as a powerful paradigm, offering many possible benefits over conventional representations. However, current network architectures for such implicit neural representations are incapable of modeling signals with fine detail, and fail to represent a signal's spatial and temporal derivatives, despite the fact that these are essential to many physical signals defined implicitly as the solution to partial differential equations. We propose to leverage periodic activation functions for implicit neural representations and demonstrate that these networks, dubbed sinusoidal representation networks or Sirens, are ideally suited for representing complex natural signals and their derivatives. We analyze Siren activation statistics to propose a principled initialization scheme and demonstrate the representation of images, wavefields, video, sound, and their derivatives. Further, we show how Sirens can be leveraged to solve challenging boundary value problems, such as particular Eikonal equations (yielding signed distance functions), the Poisson equation, and the Helmholtz and wave equations. Lastly, we combine Sirens with hypernetworks to learn priors over the space of Siren functions.
    Authors: Vincent Sitzmann, Julien N. P. Martel, Alexander W. Bergman, David B. Lindell, Gordon Wetzstein
    Links:
    YouTube: / yannickilcher
    Twitter: / ykilcher
    Discord: / discord
    BitChute: www.bitchute.com/channel/yann...
    Minds: www.minds.com/ykilcher
  • Science & Technology

COMMENTS • 119

  • @PM-4564
    @PM-4564 4 years ago +73

    "If you young kids don't know what an RBF kernel is ... you map it into an infinite space using Guassian Kernels ... yeah ... maybe wikipedia is better at that than I am"

    • @jsonm05
      @jsonm05 4 years ago +4

      I laughed so hard with this one.

  • @herp_derpingson
    @herp_derpingson 4 years ago +21

    At first glance this looks like an incredibly complex paper. Thank you for explaining it so simply.

    • @donniedorko3336
      @donniedorko3336 4 years ago +2

      This, a thousand times. I just had my "but couldn't you do it with sine waves?" moment a couple of hours ago, and this is a great intro.

  • @MiroslawHorbal
    @MiroslawHorbal 4 years ago +25

    I said it before and I'll say it again. Thank you very much for these videos. You are saving me (and I imagine others in the community) a lot of time having to parse through the details of these papers.

  • @gaoxinlipai
    @gaoxinlipai 4 years ago +67

    Please explain this paper "Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains", which is highly related to SIREN.

    • @yangzihaowang3205
      @yangzihaowang3205 3 years ago +4

      The second half of this talk explained the paper you mentioned. ua-cam.com/video/dPWLybp4LL0/v-deo.html It's from one of the co-authors of the paper.

  • @PhucLe-qs7nx
    @PhucLe-qs7nx 4 years ago +7

    Was reading this paper last night and got confused about "one image is one dataset" as well. So glad that what I finally understood is actually true.
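
A minimal sketch of the "one image is one dataset" idea discussed above (not the authors' code; the architecture and names here are illustrative): the network maps pixel coordinates to colors, and the training set is every pixel of a single image.

    import torch
    import torch.nn as nn

    class SineLayer(nn.Module):
        """One SIREN-style layer: sin(omega_0 * (Wx + b)), with omega_0 = 30 as in the paper."""
        def __init__(self, in_f, out_f, omega_0=30.0):
            super().__init__()
            self.omega_0 = omega_0
            self.linear = nn.Linear(in_f, out_f)

        def forward(self, x):
            return torch.sin(self.omega_0 * self.linear(x))

    # phi: (x, y) in [-1, 1]^2 -> (r, g, b)
    phi = nn.Sequential(SineLayer(2, 256), SineLayer(256, 256), nn.Linear(256, 3))

    img = torch.rand(64, 64, 3)  # stand-in for the one ground-truth image
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, 64),
                            torch.linspace(-1, 1, 64), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)  # all pixel coordinates
    targets = img.reshape(-1, 3)                           # their colors

    opt = torch.optim.Adam(phi.parameters(), lr=1e-4)
    for step in range(1000):
        opt.zero_grad()
        loss = ((phi(coords) - targets) ** 2).mean()  # fit this single image
        loss.backward()
        opt.step()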

  • @rahuldeora1120
    @rahuldeora1120 4 years ago +4

    Wow that was quick!! Thanks for this

  • @PYu-ys8ym
    @PYu-ys8ym 3 years ago +1

    Thank you for making these videos!!! It surely saves me a ton of time. Thank you again! Please keep making more!!

  • @andrewmao5747
    @andrewmao5747 4 years ago

    Thank you for the simple and intuitive explanation of what at first glance looked like a dense and difficult paper.

  • @DanielHesslow
    @DanielHesslow 4 years ago +27

    A small note on why the gradient of the SDF is 1 almost everywhere:
    The SDF is just the (signed) distance to the closest point, so the gradient will of course point in the opposite direction to that point, and if you move one unit away from that point the SDF will increase by one. Hence the gradient is one. The "almost everywhere" part is just that there may be multiple points equally far away, or that you are exactly at another point.
    Also, not sure if it was mentioned, but the sign just represents whether we're inside or outside of an object.

    • @imranibrahimli98
      @imranibrahimli98 4 years ago +2

      Thank you!

    • @SunilKumar-zd5kq
      @SunilKumar-zd5kq 4 years ago

      Why does the gradient point in the opposite direction?

    • @SunilKumar-zd5kq
      @SunilKumar-zd5kq 4 years ago

      Why does the gradient point opposite?

    • @DanielHesslow
      @DanielHesslow 4 years ago +1

      @@SunilKumar-zd5kq It's the direction in which the distance to the closest point increases the fastest.
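
A quick numerical check of the point made in this thread (my own sketch with a sphere SDF, not anything from the paper's code): the gradient of a signed distance function is the unit vector pointing away from the closest surface point, so its norm is 1 almost everywhere.

    import torch

    c, r = torch.zeros(3), 1.0                 # sphere centre and radius
    p = torch.randn(5, 3, requires_grad=True)  # random query points

    f = torch.linalg.norm(p - c, dim=-1) - r   # signed distance to the sphere surface
    grad = torch.autograd.grad(f.sum(), p)[0]  # df/dp at each query point

    print(torch.linalg.norm(grad, dim=-1))     # ~1.0 for every point (undefined only at the centre)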

  • @xXMockapapellaXx
    @xXMockapapellaXx 4 years ago +1

    Thank you for making this paper so understandable

  • @sayakpaul3152
    @sayakpaul3152 3 years ago +1

    I don't know if it would have been possible for me to understand this paper all by myself. Thank you so much, Yannic.

  • @alex-nk5dt
    @alex-nk5dt 3 years ago +1

    this was super helpful, thank you so much!

  • @donniedorko3336
    @donniedorko3336 4 years ago +1

    Just saw that you broke this into sections for easier access. You're beautiful. Thank you

  • @kirak
    @kirak 6 months ago

    Wow this helps me a lot. Thank you!

  • @florianhonicke5448
    @florianhonicke5448 4 years ago

    Great video!

  • @Twilightsfavquill
    @Twilightsfavquill 4 years ago +7

    As a CogSci person, I have to recommend that you review the paper "GAIT-prop: A biologically plausible learning rule derived from backpropagation of error" by Ahmad, van Gerven and Ambrogini. I feel like a more bio-inspired way of encoding the propagation of error signals through processing networks could hold potential for the investigation of functional behavior in, for example, Drosophila.

  • @proreduction
    @proreduction 4 years ago

    Great summary, Yannic. I am a PhD student focusing on CNNs for classification of binary images, and I presented this at journal club. Your explanation of implicit neural representations was inspiring.

    • @proreduction
      @proreduction 4 years ago

      Only change I would make is to emphasize that a(x) is the ground truth around 33:15

  • @bithigh8301
    @bithigh8301 1 year ago

    Nice video!
    Yes, start monetizing, your videos are priceless :)

  • @wonyounglee4417
    @wonyounglee4417 2 years ago

    Thank you very much

  • @kristiantorres1080
    @kristiantorres1080 4 years ago +1

    dude...you deserve a new table! Let's chip in so you can buy a new one. Very good job explaining this interesting paper. Thank you!

    • @YannicKilcher
      @YannicKilcher 4 years ago

      Haha, don't worry if I treat it nicely it usually returns the favor :)

  • @wyalexlee8578
    @wyalexlee8578 4 years ago

    Thank you!

  • @sahibsingh1563
    @sahibsingh1563 4 years ago +1

    Awesome

  • @hieuza
    @hieuza 4 years ago +1

    13:51 LOL! I like your sense of humor 😉

  • @twobob
    @twobob 1 year ago

    Interesting tool

  • @slackstation
    @slackstation 4 years ago +1

    The fastest gun in the west. I don't know if it's because I asked for this paper but as always, thank you.

  • @pepe_reeze9320
    @pepe_reeze9320 4 years ago +3

    Great paper. I'm glad the Broader Impact section reads like a persiflage of broader impact statements. We usually call such text „bullshit bingo“. Simply drop it, people!

  • @bluel1ng
    @bluel1ng 4 years ago +10

    As Yannic says at 17:30: sin(x) has been tried many times before, vanilla and also in combinations like SinReLU. I currently doubt that (even with special initialization) vanilla sin(x) outperforms ReLU and variants like SELU in many classic applications (like classification networks). From my personal experience I can confirm that sin(x) converges faster. The 3D scene reconstruction results that they show in their paper video are impressive though. Maybe it is worth trying sin(x) for generative models.

    • @Kram1032
      @Kram1032 4 years ago +2

      To me it seems like the idea here isn't to use these SIRENs in the same scenarios where a classic ReLU or something would be used, but rather that this could be used to augment data. The usual networks could, instead of the original data, work on the weights of some SIREN network.
      It's basically really powerful data compression. An alternative to something like an encoder-decoder with a bottleneck: the SIREN is the vector you'd normally have at that bottleneck.

    • @bluel1ng
      @bluel1ng 4 years ago +1

      @@Kram1032 Yes, I just felt I have to say that it may not give magic results in all cases, e.g. the tweet by Geoffrey Hinton about this paper could have been a bit misleading in this respect: twitter.com/geoffreyhinton/status/1273686900285603843?s=19

    • @Kram1032
      @Kram1032 4 years ago +5

      @@bluel1ng Oh yeah, that is very misleading. It works much better than ReLUs *for this kind of task* - basically anything where you might suspect Fourier analysis to be somehow interesting. Since that's kinda all this is: learning some sort of Fourier-like representation of a single given image.

    • @hyunsunggo855
      @hyunsunggo855 4 years ago +1

      @@alexwaese-perlman8788 Right? If adversarial attacks work because of the linearity of a neural network then it should be less vulnerable against them.

    • @bluel1ng
      @bluel1ng 4 years ago +1

      @@alexwaese-perlman8788 Yes, the output is bounded, and still sin(x) is nearly linear in some epsilon interval (the gradient at 0 is 1). Also important might be that the gradient does not approach 0 asymptotically (as is the case for tanh or the Fermi function) - that means even when the optimizer jumps too far there is a chance to recover.

  • @JTMoustache
    @JTMoustache 4 years ago +9

    The "almost everywhere" comes from measure theory: it means that the property should hold except for elements of omega with measure zero (the boundary omega_zero does not have measure zero, but a distance to the boundary of zero). But it seems a bit overkill to state it there...

  • @maloxi1472
    @maloxi1472 4 years ago +1

    44:04 I think a more intuitive way to formulate this point would be to say that we want the boundary to be a level set for the neural representation, since the signed distance is supposed to be zero everywhere on that boundary

  • @firedrive45
    @firedrive45 1 year ago +1

    A few mistakes: around 12:20, it's not the Laplacian, since the symbol is the upside-down triangle squared; this is the delta, the difference between the ground truth and the result. Also, at 6:36, subpixel values for images can be obtained using linear interpolation, or other interpolation between the values of the pixels surrounding the subpixel region.

  • @edbeeching
    @edbeeching 4 years ago +1

    I think the broader impact statement is there because this is a NeurIPS submission and they require an impact statement this year.

  • @shawnpan4901
    @shawnpan4901 4 years ago +1

    Great introduction! And I wonder what tool you use; the highlighting and pen lines are so comfortable.

  • @AZTECMAN
    @AZTECMAN 4 years ago +2

    After watching the first few minutes of your Gradient Origin Networks video, I am realizing something:
    SIREN seems a lot like a static shader.
    That is, a shader (e.g. shadertoy.com) is defined by input coordinates (x, y) and output colors.
    The mapping is typically non-linear.
    One major difference is that, for shaders, we often use a frame count (time) as an input as well.
    However, it's perfectly possible to craft static shaders.

  • @heinrichvandeventer357
    @heinrichvandeventer357 3 years ago

    Yannic, that function has a similar structure to the mathematics used in physics: a function F that depends on coordinates x, a function phi, the gradient of phi w.r.t. x, and higher-order derivatives. Look at Laplace's equation, the Lagrangian in classical mechanics, and other functionals (oversimplified: functionals map functions to numbers).

  • @anthrond
    @anthrond 4 years ago +1

    Yannic, are you familiar with Stephane Mallat's work in physically based neural networks? He talks a lot about using wavelet functions to improve the function approximations of neural networks. Sirens' use of sine activation functions reminded me of that.

  • @CristianGarcia
    @CristianGarcia 4 years ago +6

    Was just viewing the video from the authors via a Hinton tweet and Yannic already has a video? :o

  • @joirnpettersen
    @joirnpettersen 4 years ago +3

    Is approximating the gradient for the real image as simple as (2/imageSize)(a-b), where a and b are the pixel values to the left and right, and the same for above and below (assuming the image function takes inputs in the range 0 to 1)?
    Would be really cool to see a neural SDF used in raymarching for games.

    • @YannicKilcher
      @YannicKilcher 4 years ago +1

      Almost. Have a look at Sobel filters.
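
For the curious, here is a rough sketch (my own illustration, not from the paper) of what "have a look at Sobel filters" means: the Sobel operator is essentially the central difference from the question above, plus a [1, 2, 1] smoothing perpendicular to the derivative direction, which makes the estimate less sensitive to noise.

    import numpy as np
    from scipy import ndimage

    img = np.random.rand(64, 64)  # stand-in for a grayscale ground-truth image

    # Naive central difference along x (roughly what the question describes, up to scaling):
    gx_naive = (np.roll(img, -1, axis=1) - np.roll(img, 1, axis=1)) / 2.0

    # Sobel estimates of the image gradient, usable as "ground-truth" gradients to fit:
    gx_sobel = ndimage.sobel(img, axis=1)
    gy_sobel = ndimage.sobel(img, axis=0)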

  • @amandinchyba4269
    @amandinchyba4269 4 years ago +4

    pogchamp

  • @isbestlizard
    @isbestlizard 1 year ago

    This is interesting because if errors can be backpropagated as phase shifts rather than magnitude changes, you can have as many layers as you like and the error doesn't decay away.

  • @sistemsylar
    @sistemsylar 4 years ago +1

    Weird, I never understood the math in these papers because I thought it would be way too abstract, but then I realized it's not abstract at all!

  • @felrobelv
    @felrobelv 4 years ago +2

    what about using the Laplace transform instead of Fourier? It's more generic

  • @gabrigamer00skyrim
    @gabrigamer00skyrim 9 months ago

    Great video! Isn't the proposed initialization just Uniform Xavier?

  • @tanguydamart8368
    @tanguydamart8368 4 years ago +1

    In the "Poisson image editing" section, I do not understand what the training dataset is. You said that the data to fit is always (x,y) -> (r,g,b) (or here (x,y) -> luminosity, since it's grayscale). But in this case we don't know the composite image, since we are trying to generate it. So what does phi(x) produce?

    • @YannicKilcher
      @YannicKilcher 3 years ago +3

      Good question. It doesn't have to be RGB; you can also use SIRENs to fit the gradients, which is what they do here. So you train (x,y) -> gradient of RGB, and then at inference you read out (x,y) -> RGB.

  • @PixelPulse168
    @PixelPulse168 10 months ago

    a very important nerf paper

  • @KhaledSharif1993
    @KhaledSharif1993 2 years ago

    At 33:35 what do you mean by "the more differentiable you make it"? I thought functions were either differentiable or not.

  • @vladimirtchuiev2218
    @vladimirtchuiev2218 2 years ago

    I think this is somehow limited by the amount of sine operations, as using it as an activation function uses it a lot less than, let's say, projecting every input to the fully connected layer with a different sine. The computational cost of the sine can be offset by letting the GPU handle a lot of sines simultaneously. Also, this requires a specific learning rate to work well: too small and this converges to a flat surface, too large and this converges to noise. Therefore I think it is beneficial to use periodic learning rate schedulers here. AdamW with amsgrad also seems to work better than vanilla Adam here. I've tried an MNIST classifier with this, didn't work that well...

  • @priyamdey3298
    @priyamdey3298 2 years ago

    21:43 what about using an exponential function?

  • @luciengrondin5802
    @luciengrondin5802 4 years ago +3

    Is it better than a Fourier transform, though? Because it seems to me that it does something similar.

    • @Sjefke3000
      @Sjefke3000 3 years ago

      This one seems to represent the data with non-equally spaced frequencies, whereas a Fourier transform uses equally spaced frequencies.
      That's the largest difference I can see.

  • @dingleberriesify
    @dingleberriesify 4 years ago +3

    For anyone wondering about RBF kernels... An RBF kernel doesn't really do the infinite-dimensional stuff right away. Really, all an RBF (radial basis function; it's a hint!) does is output the distance of two points according to some metric. The usual one people mean when they say RBF is the Gaussian distance. I don't remember the exact form off the top of my head, but it's something like f(x) = [e^ -(x - x_0)] / sigma, where sigma is some scale parameter. This function will output a 1 (if the two points x and x_0 are equal), and will decay to 0 at an exponential rate. In the context of a neural network, the comparison value can be a learned parameter (with the logit obviously being the other input), but the literature I remember reading would normally set these values randomly at init and leave it there. Hope that sates somebody's curiosity!
    Postscript:
    Whenever you hear the words "kernel" and "infinite-dimensional" in the same sentence, you're in the land of what's called an RKHS, and the kernel they're referring to is a very specific kind of distance matrix. That sort of stuff is relevant for SVM theory, but kind of goes beyond the scope of this comment. To give a brief sketch, if you do something like linear regression on that kernel matrix, you're implicitly searching through a family of (potentially nonlinear) functions defined by the distance functions. So, nonlinear function approximation, but the whole operation is strictly convex and solvable in closed form. People often get mixed up between the "infinite-dimensional" function space of the RKHS and the "project the data to a higher dimension" quote which is also associated with SVMs.

    • @priyamdey3298
      @priyamdey3298 2 years ago

      I would suggest people look at Richard Turner's lecture video on Gaussian Processes (it's on YouTube) to get an excellent understanding of what it really means to go from a finite covariance matrix to an infinite covariance matrix (essentially it turns into a distance function comparing two points), which can represent a whole family of curves whose shapes are controlled by the hyperparameters of that function (e.g. sigma in the case of an RBF kernel). What I wrote might be confusing. Please go check that out. Cheers!
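
Since the exact formula above is quoted from memory, here is the standard Gaussian RBF kernel written out as a small sketch (illustrative code, not tied to the paper): k(x, x0) = exp(-||x - x0||^2 / (2 * sigma^2)), which is 1 when the points coincide and decays towards 0 as they move apart.

    import numpy as np

    def rbf_kernel(x, x0, sigma=1.0):
        # Gaussian RBF: 1.0 at x == x0, decaying exponentially with squared distance.
        d2 = np.sum((np.asarray(x, dtype=float) - np.asarray(x0, dtype=float)) ** 2)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))  # 1.0
    print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))  # exp(-12.5), essentially 0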

  • @shairozsohail1059
    @shairozsohail1059 4 years ago +1

    Can you use this to resample a signal like an image to get many similar representations of the same image?

    • @YannicKilcher
      @YannicKilcher 3 years ago

      What do you mean by similar representations? Do you mean similar images? I guess that would work for the hole-filling examples.

  • @user-zu8bx7hq8k
    @user-zu8bx7hq8k 2 years ago

    It seems the initialisation with a uniform distribution, and also the multiplication by 30 inside the sin function, are crucial. If you try the code and change the number to e.g. 5, 6, or 7, the results just mess up. Does anybody know why 30 is a good choice? A mystery.
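
For reference, here is a sketch of the initialization being discussed (following the scheme described in the paper; the class and variable names are mine, not the official code). The first layer is drawn from U(-1/fan_in, 1/fan_in), later layers from U(-sqrt(6/fan_in)/omega_0, sqrt(6/fan_in)/omega_0), and the forward pass multiplies the pre-activation by omega_0 before the sine.

    import math
    import torch
    import torch.nn as nn

    class SirenLayer(nn.Module):
        def __init__(self, in_f, out_f, omega_0=30.0, is_first=False):
            super().__init__()
            self.omega_0 = omega_0
            self.linear = nn.Linear(in_f, out_f)
            with torch.no_grad():
                if is_first:
                    bound = 1.0 / in_f                       # U(-1/fan_in, 1/fan_in)
                else:
                    bound = math.sqrt(6.0 / in_f) / omega_0  # scaled so the sine keeps its input distribution
                self.linear.weight.uniform_(-bound, bound)

        def forward(self, x):
            return torch.sin(self.omega_0 * self.linear(x))

Roughly speaking, the division by omega_0 at initialization and the multiplication by omega_0 in the forward pass cancel for the deeper layers, while a large omega_0 (30 in the paper) lets the first layer span several sine periods over inputs in [-1, 1]; with a much smaller value the sine stays close to its nearly linear regime, which may be why other numbers behave so differently.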

  • @etiennetiennetienne
    @etiennetiennetienne 4 years ago +1

    I am not sure if I understand correctly how they match the gradient of the network. Do you compute the Sobel of the output generated over the entire "dataset" (the image as x,y)? Or do you just compute the true gradient dphi/dx, dphi/dy using autodiff? And if you put that into the loss and run backward, doesn't that mean that you need to compute the derivative of a derivative?

    • @YannicKilcher
      @YannicKilcher 4 years ago +1

      The gradient of the true image (i.e. your "label") is the Sobel filter, yes. And yes, if you match the gradient using gradient descent, you'd need the gradient of the gradient, which you would get using autodiff or an analytic expression, since SIRENs are easily differentiable. At least that's how I understand it.

    • @etiennetiennetienne
      @etiennetiennetienne 4 years ago

      @@YannicKilcher Right! I was trying to make this in PyTorch (gist.github.com/etienne87/e65b6bb2493213f436bf4a5b43b943ca), but with the autodiff gradient as additional supervision it seems to work less well for fitting the image (probably I made a mistake...). Anyway, thanks man, your videos are great!
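
A rough sketch of the "derivative of a derivative" point from this thread (illustrative code under my own assumptions, with a plain MLP standing in for a SIREN): the network's gradient w.r.t. its input is taken with create_graph=True, so the gradient-matching loss can itself be backpropagated into the weights.

    import torch
    import torch.nn as nn

    phi = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))  # stand-in for a SIREN
    coords = torch.rand(100, 2, requires_grad=True)  # pixel coordinates
    grad_targets = torch.rand(100, 2)                # stand-in for Sobel gradients of the real image

    out = phi(coords)
    # create_graph=True keeps the graph of this derivative, so loss.backward()
    # below can differentiate through it (the gradient of the gradient).
    grad = torch.autograd.grad(out.sum(), coords, create_graph=True)[0]

    loss = ((grad - grad_targets) ** 2).mean()
    loss.backward()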

  • @larrybird3729
    @larrybird3729 4 years ago +5

    In the past I fell victim to thinking "sin" could be the holy grail of activation functions, but gradient descent would play with that function like a roller coaster. I even fell victim to trying to use quaternions, but that is another failed story 😆

  • @jonatan01i
    @jonatan01i 4 years ago +1

    How does it perform if we use it for upsampling?

  • @donniedorko3336
    @donniedorko3336 4 years ago

    Can anybody explain the initialization? I read the paper, but I'm missing something.
    I get scaling the later weights by omega_0, but why do we multiply it back into the weights in the forward pass?
    I built one in Julia as an MNIST classifier, and its learning is incredibly fast and stable *only if I don't multiply by omega_0 in the forward pass*

    • @donniedorko3336
      @donniedorko3336 4 years ago

      If anyone's interested, here's the Julia code I finally got to work. Still no idea about the omega_0 in the forward pass so I've ignored it completely (disclosure: semi copypasta'd from the Julia source code for Dense and Conv layers, but it's open-source so I figured nobody would mind)
      using Flux
      # return array of random floats between (-1,1)
      function uniform(dims...)
          W = rand(Float64, dims) * 2 .- 1
          return W
      end
      # Generate SIREN dense layer
      function SinDense(fan_in::Integer, fan_out::Integer; omega_0=30, is_first=false)
          d = Dense(fan_in, fan_out, sin, initW=uniform)
          if is_first
              params(d)[1] ./= fan_in
          else
              params(d)[1] .*= sqrt(6/fan_in) / omega_0
          end
          return d
      end
      # Helper functions for conv layers
      expand(N, i::Tuple) = i
      expand(N, i::Integer) = ntuple(_ -> i, N)
      # Create SIREN conv layer from known weights
      function SinConv(w::AbstractArray{T,N}, b::AbstractVector{T};
                       stride = 1, pad = 0, dilation = 1, is_first = false, omega_0=30) where {T,N}
          stride = expand(Val(N-2), stride)
          pad = expand(Val(2*(N-2)), pad)
          dilation = expand(Val(N-2), dilation)
          fan_in = 1
          s = size(w)
          for i in 1:(length(s)-1)
              fan_in *= s[i]
          end
          if is_first
              w ./= fan_in
          else
              w .*= sqrt(6/fan_in) / omega_0
          end
          return Conv(sin, w, b, stride, pad, dilation)
      end
      # Create SIREN conv layer from size
      SinConv(k::NTuple{N,Integer}, ch::Pair{

  • @shivamraisharma1474
    @shivamraisharma1474 4 years ago +1

    Has anyone tried making a generative model out of this yet?

    • @YannicKilcher
      @YannicKilcher 4 years ago

      Not sure, since it's always just fitting one data point.

  • @HeitorvitorC
    @HeitorvitorC 4 years ago +1

    Thank you a lot for the content, Yannic. I wish you could discuss these two papers in your narrative: arxiv.org/abs/1711.10561 and arxiv.org/abs/1711.10566, as it would be amazing to have an in-depth analysis of such an application of deep learning methods (apart from the fact that modern and didactic narratives about Physics-Informed Neural Networks are not easy to find in video content with an insider's approach). Of course there is content out there, but maybe approaches such as yours could provide a more accessible understanding for those who are starting out in this methodology.
    Best regards!

  • @jorgesimao4350
    @jorgesimao4350 4 years ago +5

    Without reading the paper... it seems that they are simply using a NN to learn a Fourier representation of the image, seen as a sampled field + gradients...

    • @hyunsunggo855
      @hyunsunggo855 4 years ago +1

      That's what I thought as well, kinda.

    • @eelcohoogendoorn8044
      @eelcohoogendoorn8044 4 years ago +5

      No, not really. A single layer of such a network could correspond to a Fourier transform, with the weights for each sine encoded in the last learnable down-projection from the hidden state to the output, given that the weights and biases in the layer itself follow the fixed power-of-two pattern you'd find in an FFT. However, the frequencies are not predetermined but learned, there can be multiple layers, and more importantly, the number of components is much smaller than you'd find in a dense FFT; with networks 512 neurons wide, that's a boatload fewer frequency components than you'd find in the dense FFT of the images they are regressing against.

    • @kazz811
      @kazz811 4 years ago +2

      @@eelcohoogendoorn8044 That's exactly correct. Fitting a Fourier series is a generalized linear regression problem. This is like a weird hierarchical Fourier representation. It is still infinitely differentiable, but it is a different beast. And it seems like that's critical to why it crushes the competition.

  • @tedp9146
    @tedp9146 4 years ago +3

    I saw a video which was about the fact that you can represent every picture through sine-waves (I forgot how and why). Is this somehow related? (Sorry if this is answered later in this video, I’m writing the comment at minute 17)

    • @hyunsunggo855
      @hyunsunggo855 4 years ago +5

      Kinda. Sine/cosine functions with different frequencies can be bases and thus can be used to compress data with spatial information, along with coefficients, through Fourier-transform-like algorithms; that's the idea of JPEG compression, I think. In this case, the coefficients are replaced with weights and biases. You could say it has the structure of multiple discrete Fourier transforms stacked together, with learned frequencies and coefficients.

  • @anthonybell8512
    @anthonybell8512 4 years ago +1

    The initialisation proposed looks like the default weight initialisation in tensorflow: github.com/tensorflow/tensorflow/blob/6bfbcf31dce9a59acfcad51d905894b082989012/tensorflow/python/ops/init_ops.py#L527

    • @YannicKilcher
      @YannicKilcher 4 years ago +1

      The TF one seems to depend on fan_in and fan_out, the SIREN one only depends on fan_in

    • @nikronic
      @nikronic 3 years ago

      @@YannicKilcher Actually, PyTorch also implements this in the same manner. The reason is that in some networks you want to use fan_out. But it is still applicable. The main difference is that if you use fan_out, the standard deviation of the generated distribution would not be equal to 1 (it would be smaller).

  • @karchevskymi
    @karchevskymi 4 years ago

    How to use SIREN for image classification?

    • @YannicKilcher
      @YannicKilcher 3 years ago

      That's not possible out of the box

    • @nuhaaldausari7019
      @nuhaaldausari7019 3 years ago

      @@YannicKilcher Is it possible to use SIREN to encode an image for image or video generation, for example?

  • @hyunsunggo855
    @hyunsunggo855 4 years ago +2

    Kinda reminds me of grid cells in the brain.

    • @hyunsunggo855
      @hyunsunggo855 4 years ago +1

      It would properly learn scales and thus it'd better interpolate/extrapolate. Linear-like activation functions are terrible at extrapolation. I would like to see how it deals with adversarial examples.

    • @jonatan01i
      @jonatan01i 4 years ago

      Thank you for mentioning that. This grid cell thing seems to be interesting stuff to know about.

    • @hyunsunggo855
      @hyunsunggo855 4 years ago

      @@jonatan01i It's crazy interesting. It seems like deep learning is adapting neuroscience one way or another, even accidentally.

  • @JuanBPedro
    @JuanBPedro 4 years ago

    Here is an example of the kind of things that SIRENs allow you to do: github.com/juansensio/nangs

  • @smnt
    @smnt 4 years ago +5

    2:18 Lol, they must have come from physics. That's the general form of an "action".

    • @yuyingliu5831
      @yuyingliu5831 4 years ago +2

      Agreed, I was surprised when he said that's abnormal.

  • @jabowery
    @jabowery 4 years ago

    I believe you misspoke at about 33:08 when you said "over the entire image". You should have said "over all the images", right?

    • @YannicKilcher
      @YannicKilcher 4 years ago +5

      No, it's over the entire image. We're just fitting one image using the neural network.

    • @bluel1ng
      @bluel1ng 4 years ago

      @@YannicKilcher A bit off-topic here, but what is fascinating about this form of image representation: You can plot the activity (output) of each neuron for all (x, y) coordinates of the image and also see how the 'contribution' of each neuron develops over the course of the training. Unfortunately I have never seen this in NN courses - I think it is a really nice visualization, especially when done with different activation functions for different layers etc. It also shows immediately the internal 'craziness' (complexity) and limitations of generalization if you look at the output of coordinates outside the domain used during training.

    • @jabowery
      @jabowery 4 years ago

      Ah, OK. I was jumping the gun for the machine learning section, thinking the "dataset" included multiple images to somehow reduce the number of parameters per image.
      By the way, you really should monetize. You're excellent at capturing and conveying the essence of most papers.

  • @Tferdz
    @Tferdz 4 years ago +1

    Why a fully-connected network and not a CNN, since gradients are local and not global?

    • @dingleberriesify
      @dingleberriesify 4 years ago +3

      Because it's a compressed mapping from pixel point to colour value...your local information is your input.

    • @jorgesimao4350
      @jorgesimao4350 4 years ago +2

      They are not trying to find spatial regularities/invariants as in a CNN... they are simply using a FFN to learn a function that predicts the value of the field/image + gradients... this is just glorified curve fitting... no attempt is made to learn natural local representations like oriented edges... which is what CNNs do... and there is plenty of evidence that brains do that as well.

    • @kazz811
      @kazz811 4 years ago +2

      Because this is a direct function mapping from one point in space to one or more values (like pixel values). CNNs exploit the structure of space (i.e. nearby points are similar). Here that pops out of the function (NN) fit. This is more interpolation than machine learning.

  • @bdennyw1
    @bdennyw1 4 years ago

    Thank you for clearly explaining this paper. It's one that I wanted to dig into but found the math off-putting. The authors should have done a better job of communicating this simple idea.

  • @Marcos10PT
    @Marcos10PT 4 years ago +1

    Such a shame the authors were probably more worried about seeming clever and professional than making their writing approachable 😔 that introduction says it all!
    You explain it so well though! Thank you so much!

    • @YannicKilcher
      @YannicKilcher 4 years ago +1

      It's also a different field than regular ML.

    • @nikronic
      @nikronic 3 years ago +1

      Sorry to say this, but even though Yannic did a great job, the original authors explained it very well too. They have provided ready-to-run how-to-use code for all cases, the original source code, and also a short video explaining the core ideas.

  • @keri_gg
    @keri_gg 2 years ago

    Does anyone else speed up his videos because he speaks so slowly?