Implicit Neural Representations with Periodic Activation Functions

Published Dec 18, 2024

COMMENTS • 50

  • @mannyk7634 · 3 years ago +11

    Very nice work, especially the sinusoidal activation. I'd like to point out that Candes covered periodic activation functions rigorously back in 1997 in "Harmonic Analysis of Neural Networks" (the notion of an "admissible neural activation function"). Strangely enough, that paper is not even cited by the authors.

  • @siarez · 4 years ago +17

    How does this compare to just taking the Fourier (or discrete cosine) transform of the signal?

  • @luciengrondin5802 · 4 years ago +9

    The application of this kind of representation to 3D rendering is fascinating. Could it be that in the future modelers will give up on the polygons+textures model and represent the whole scene with a neural network instead?

    • @kwea123 · 4 years ago

      It requires huge processing time at the inference stage. Take an SDF as an example: you'll need to query the network a huge number of times to find out where the surface is (it also depends on the discretization resolution). I think it's currently only good for offline use. (See the sketch at the end of this thread.)

    • @luciengrondin5802 · 4 years ago

      @@kwea123 I expressed myself poorly. I was thinking of rendering, not modeling.
      More precisely, I was thinking of the problem of rendering scenes with very high poly counts, for instance where a very long draw distance is required. Currently modelers have to use levels of detail (LOD), but that technique has limitations.

    • @qsobad · 4 years ago +2

      @@luciengrondin5802 That is solved by another approach; take a look at UE5.

    • @Oktokolo · 1 year ago

      @@luciengrondin5802 I hope the network does not have to learn to give the pixel values for a coordinate, but could also learn to give coordinates and pixel values for an index. The real issue would be the compression stage, which requires training a network of appropriate size on the scene to be "compressed".
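
A minimal numpy sketch of @kwea123's point above about inference cost, not taken from the paper or the authors' code: `sdf_net` is an analytic stand-in for a trained SIREN SDF, and the grid resolution `res` is an arbitrary illustrative choice.

```python
# Extracting a surface from a neural SDF means evaluating the network on a
# dense 3D grid: the number of forward passes grows with the cube of the
# resolution, which is why real-time use is hard without acceleration tricks.
import numpy as np

def sdf_net(points):
    """Stand-in for a trained SIREN SDF: here just an analytic sphere of radius 0.5."""
    return np.linalg.norm(points, axis=-1) - 0.5

res = 128                                   # samples per axis; doubling it means 8x more queries
axis = np.linspace(-1.0, 1.0, res)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
values = sdf_net(grid.reshape(-1, 3))       # res**3 = ~2.1 million network queries
volume = values.reshape(res, res, res)

print("network queries:", res ** 3)
# The zero level set (the surface) would then be extracted from `volume`,
# e.g. with marching cubes, before any rendering can happen.
```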

  • @TileBitan · 1 year ago +4

    The music part was outstanding. Audio waveforms are just stacked sine waves, as opposed to images or text, where the input may not be closely related to the sine function. So it just feels right to use sine activations, with the required tweaks to make them work, instead of ReLUs. I'm going to be careful with this, though: even though I have some experience in ML, I haven't touched anything other than ReLUs, sigmoids, tanh, and straight-up linear activations.

    • @Oktokolo · 1 year ago +3

      You can approximate _everything_ with stacked sine waves. All modern video and image compression algorithms are based on that.

    • @TileBitan · 1 year ago

      @@Oktokolo Let me rephrase that, then. Audio waveforms can be approximated by a relatively SMALL number of stacked sine waves, so it feels natural to use them in NNs. Everything can be approximated by an infinite number of sine waves, but sometimes it doesn't make sense to do so.

    • @Oktokolo · 1 year ago

      @@TileBitan It obviously makes sense for images, as that is what the best compression algorithms use.
      It should also be possible to encode text reasonably well, even though the resulting set of weights is probably larger than the text itself unless you're encoding the input of a huge language model...

    • @TileBitan · 1 year ago

      @@Oktokolo I don't understand. Sounds are waves of different amplitudes and frequencies inside the hearing range. Images nowadays can be 100M pixels with 3 channels of 256 values in the BEST case, where the relationships between pixels can be really close to nothing. The case is completely different. The text case doesn't really have much to do with a wave.
      They might use FFTs for images, but you have to agree with me: for the same error you need far fewer terms for sound than for images.

    • @Oktokolo · 1 year ago +3

      @@TileBitan It doesn't matter whether it looks like it has anything to do with a wave, or whether adjacent values look like they are in any relation to each other.
      Treating data as signals and then encoding the signal as stacked waves just works surprisingly well.
      It might not work well for truly random bit noise, but most data interesting to humans seems to exhibit surprisingly low entropy and can be compressed using stacked sines. (See the sketch below.)
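
A small numpy sketch of the "stacked sine waves" point in this thread (illustrative only; real codecs use block-wise DCTs plus quantization, and the tone frequencies and `k` below are arbitrary choices): keep only the k largest Fourier coefficients of a toy signal and measure how little is lost.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
t = np.linspace(0.0, 1.0, n, endpoint=False)
# A toy "audio" signal: two tones plus a little noise.
signal = (np.sin(2 * np.pi * 440 * t)
          + 0.5 * np.sin(2 * np.pi * 880 * t)
          + 0.05 * rng.standard_normal(n))

spectrum = np.fft.rfft(signal)
k = 16                                        # number of sinusoids kept
keep = np.argsort(np.abs(spectrum))[-k:]      # indices of the k largest coefficients
compressed = np.zeros_like(spectrum)
compressed[keep] = spectrum[keep]
reconstruction = np.fft.irfft(compressed, n=n)

rel_err = np.linalg.norm(signal - reconstruction) / np.linalg.norm(signal)
print(f"kept {k} of {spectrum.size} coefficients, relative error {rel_err:.3f}")
```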

  • @convolvr · 4 years ago +14

    The first layer sin(Wx + b) could be thought of as a vector of waves with frequencies w_i and phase offsets b_i. After the second linear layer, we have a vector of trigonometric series that looks like a Fourier expansion, except the frequencies and phase offsets can be anything. Although the next nonlinearity might do something new, we can already represent any function with the first 1.5 layers. What advantages does this approach offer over representing and evaluating functions as a Fourier series? (See the sketch at the end of this thread.)

    • @_vlek_ · 4 years ago +4

      Because you can learn the representation of lots of different signals via gradient descent?

    • @luciengrondin5802 · 4 years ago +5

      @@_vlek_ I think this is it, indeed. Efficient Fourier transform algorithms only work with a regularly sampled signal and, if I'm not mistaken, of low dimension. This machine learning approach can work with any kind of signal, I think.

    • @isodoubIet · 1 year ago

      Fourier series are linear

    • @convolvr · 1 year ago

      @@isodoubIet The Fourier transform is linear. The Fourier series is not. I assume you're implying that the neural net is fundamentally more expressive by being nonlinear. But the Fourier series is also nonlinear.

    • @isodoubIet · 1 year ago

      @@convolvr Eh, no: if you have a smooth periodic signal it's still expressible as a linear combination of Fourier components, so yes, this is fundamentally more expressive.
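
For readers wondering what the layer discussed in this thread looks like in code, here is a minimal PyTorch sketch of the sin(ω₀ · (Wx + b)) layer, following the formulation in the paper (ω₀ = 30 is the paper's default; the paper's special weight initialization is omitted here). Unlike a Fourier series, the frequencies (rows of W) and phases (b) are learned by gradient descent rather than fixed to integer multiples of a base frequency.

```python
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """sin(omega_0 * (W x + b)) with learnable W and b."""
    def __init__(self, in_features, out_features, omega_0=30.0):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x):
        # Rows of the weight matrix act as frequencies, the bias as phase offsets.
        return torch.sin(self.omega_0 * self.linear(x))

# Two sine layers plus a linear readout: a trigonometric series with learnable
# frequencies and phases, composed once more through a nonlinearity.
model = nn.Sequential(SineLayer(1, 64), SineLayer(64, 64), nn.Linear(64, 1))
x = torch.linspace(-1.0, 1.0, 100).unsqueeze(-1)   # 1-D coordinates in [-1, 1]
y = model(x)                                       # shape (100, 1)
```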

  • @IsaacGerg · 4 years ago +6

    The arXiv version has an incorrect reference. The paper states "or positional encoding strategies proposed in concurrent work [5]" and the video mentions a 2020 paper, but reference [5] in your current arXiv version is C. Ballester, M. Bertalmio, V. Caselles, G. Sapiro, and J. Verdera, "Filling-in by joint interpolation of vector fields and gray levels," IEEE Trans. on Image Processing, 10(8):1200-1211, 2001. I believe this should reference what you list as [35].

    • @kwea123 · 4 years ago

      Yes, "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis" uses positional encoding. And they recently published a paper that uses the Fourier transform.

  • @Marcos10PT · 4 years ago +5

    Goodbye ReLU, you had a good run! I feel I have to watch this a few more times to get a good idea of what's going on 😄 but it looks like a breakthrough!

  • @paulofalca0 · 4 years ago +3

    Awesome work!

  • @rahuldeora1120 · 4 years ago +7

    Can you please share your code? The link on the project page is not working

  • @anilaxsus6376 · 1 year ago

    Yeah, I was wondering why people weren't using sines and cosines. I watched a video where the presenter explained that a neural network with L layers and N nodes per layer using ReLU activations can perfectly match a function with N to the power L bends (turning points) in its curve (assuming the network has a single scalar output). I guess that is why ReLU failed on the audio: there are a lot of turning points in audio data. So technically SIREN's performance can be matched by a large enough ReLU network, and I'm looking at SIREN as an optimization over the usual ReLU networks. I'm glad I saw this; I will look into it further. I suspect that sinusoidal activations will be useful in domains with some sort of repetition, since ReLUs act more like threshold switches. (Rough sketch of the bend-counting idea below.)
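
A rough sketch of the bend-counting argument above (the N^L figure is a worst-case style bound, and the layer widths below are arbitrary illustrative choices): count how many linear pieces a randomly initialized 1-D ReLU network actually produces by tracking its ReLU on/off pattern along a line.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
l1, l2 = nn.Linear(1, 16), nn.Linear(16, 16)   # two hidden layers of width 16

x = torch.linspace(-3.0, 3.0, 20001).unsqueeze(-1)
with torch.no_grad():
    h1 = torch.relu(l1(x))
    h2 = torch.relu(l2(h1))

# The output is linear wherever the set of active ReLUs stays the same, so each
# change in the on/off pattern along x starts a new linear piece ("bend").
pattern = torch.cat([h1 > 0, h2 > 0], dim=-1)
pieces = (pattern[1:] != pattern[:-1]).any(dim=-1).sum().item() + 1
print("linear pieces found:", pieces)   # typically a few dozen, far below the 16**2 bound
```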

  • @wyalexlee8578 · 4 years ago

    Love this! Thank you!!

  • @tigeruby · 4 years ago +4

    Lol at tanh, but very cool general-purpose work; I can imagine this being a good exploratory topic / bonus project for intro signal processing courses.

  • @tiagotiagot · 1 year ago

    How does it compare to using a sawtooth wave in place of the sine wave?

  • @Singularitarian · 4 years ago +3

    What if you were to use exp(i x) = cos(x) + i sin(x) as the activation function? That seems potentially more elegant.

    • @rainerzufall1868 · 4 years ago

      What would it mean for an activation to have a complex output? Or 2 outputs?

    • @OliverBatchelor · 4 years ago +3

      @@rainerzufall1868 Twice as many outputs, just doubling the features. You can do a similar thing with ReLU, where you threshold at maximum zero and at minimum zero and split into two parts; I'm not sure it's a whole lot better than just one, though... (See the sketch at the end of this thread.)

    • @luciengrondin5802 · 4 years ago +2

      @@rainerzufall1868 Does the activation function necessarily have to be real? I don't think so. I think using a complex exponential could help make the calculations and the implementation clearer. It could have a computational overhead, though.

    • @rainerzufall1868 · 4 years ago +1

      @@luciengrondin5802 I don't think it would simplify things. If you model it as the activation having 2 outputs, it would need some re-implementation, and if you instead use 1 complex output and complex multiplication, the libraries are not optimized for this at all, so the computational hit would be big, I think.

    • @rainerzufall1868 · 4 years ago +1

      Also, cosine and sine are the same except for a constant shift of the input, which we could learn from the bias. Thus, I don't think it would add much value. On the flip side, the derivative of sine is cosine and vice versa (with a minus sign), so we can just reuse the output of the other in the derivative computation.
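
A sketch of the idea in this thread (an illustration, not anything from the paper): treating exp(i·x) as an activation amounts to emitting cos(x) and sin(x) as two real features, doubling the width, analogous to the two-sided ReLU split mentioned by @OliverBatchelor (known as CReLU).

```python
import torch
import torch.nn as nn

class CosSinActivation(nn.Module):
    """Real-valued stand-in for exp(i*x): concatenate cos(x) and sin(x)."""
    def forward(self, x):
        return torch.cat([torch.cos(x), torch.sin(x)], dim=-1)

# The activation doubles the feature count, so the next layer takes 2 * 32 inputs.
net = nn.Sequential(nn.Linear(2, 32), CosSinActivation(), nn.Linear(64, 1))
out = net(torch.randn(8, 2))   # shape (8, 1)
```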

  • @volotat · 4 years ago

    Wow, I just watched Yannic Kilcher's video on this work and it is fascinating... I bet this work is going to change many things in ML. Please share the code!

  • @NerdyRodent · 3 years ago

    That’s amazing!

  • @rodrigob · 4 years ago +4

    No link to the paper in the video description?

    • @rodrigob · 4 years ago +5

      And the project page is at vsitzmann.github.io/siren/

  • @aidungeon4539 · 4 years ago +1

    Super cool!

  • @DrPapaya · 4 years ago +3

    Code available?

  • @dann_y5319 · 6 months ago

    Awesome

  • @_zproxy · 8 months ago

    Is it like a new JPEG?

  • @enginechen7312 · 4 years ago +2

    Hi, could I download this video and upload it to bilibili.com, where Chinese students and researchers can access it freely?

  • @sherwoac · 4 years ago

    An implementation is already available at: github.com/titu1994/tf_SIREN

  • @edsonjr6972 · 7 months ago

    Did anyone try using this in transformers?

  • @sistemsylar · 4 years ago

    Please post a Colab.