Training a neural network on the sine function.

  • Published May 22, 2024
  • In this visualization, we train a neural network N to approximate the sine function, in the sense that N(x) should be approximately sin(2*pi*x) whenever |x| is small enough. In particular, we want to minimize the mean squared error between N(x) and sin(2*pi*x) over the training values x.
    The neural network is of the form Chain(Dense(1,mn),SkipConnection(Dense(mn,mn,atan),+),SkipConnection(Dense(mn,mn,atan),+),SkipConnection(Dense(mn,mn,atan),+),SkipConnection(Dense(mn,mn,atan),+),SkipConnection(Dense(mn,mn,atan),+),SkipConnection(Dense(mn,mn,atan),+),Dense(mn,1)) where mn=40 (written in Flux.jl notation; a rough PyTorch sketch of the full setup is given at the end of this description).
    The neural network therefore computes a function from the field of real numbers to itself. The visualization shows the graph of y=N(x).
    The neural network is trained to minimize the L_2 distance between N(x) and sin(2*pi*x) on the interval [-d,d] where d is the difficulty level. The difficulty level is a self-adjusting constant that increases whenever the neural network approximates sin(2*pi*x) on [-d,d] well and decreases otherwise.
    The layers in this network with skip connections were initialized with zero weight matrices.
    The notion of a neural network is not my own. I am simply making these sorts of visualizations in order to analyze the behavior of neural networks. We observe that the neural network exhibits some symmetry around the origin, which is a good sign for AI interpretability and safety. We also observe that the neural network is unable to generalize and approximate the sine function outside the interval [-d,d]. This shows that neural networks may behave very poorly on data that is even slightly outside the training distribution.
    The neural network was able to approximate sin(2*pi*x) on [-d,d] when d was about 12, but it was not able to approximate sin(2*pi*x) for much larger values of d. On the other hand, the neural network has 9,961 parameters and can easily use these parameters to memorize thousands of real numbers. This means that the network has a much more limited capacity to reproduce the sine function than it does to memorize thousands of real numbers. I hypothesize that this limited ability to approximate sine is mainly due to the fact that the inputs all lie in a 1-dimensional space. A neural network that first transforms the input x into a higher-dimensional object L(x), where the image L([-d,d]) is a highly non-linear curve, would probably perform much better on this task.
    It is possible to train a neural network that computes a function from [0,1] to the real numbers and exhibits an exponential (in the number of layers) number of oscillations, simply by iterating the function L from [0,1] to [0,1] defined by L(x)=2x for x in [0,1/2] and L(x)=2-2x for x in [1/2,1] as many times as one would like. But the iterates of L have very large gradients, and I do not know how to train networks to fit functions with very large gradients.
    Unless otherwise stated, all algorithms featured on this channel are my own. You can go to github.com/sponsors/jvanname to support my research on machine learning algorithms. I am also available to consult on the use of safe and interpretable AI for your business. I am designing machine learning algorithms for AI safety such as LSRDRs. In particular, my algorithms are designed to be more predictable and understandable to humans than other machine learning algorithms, and my algorithms can be used to interpret more complex AI systems such as neural networks. With more understandable AI, we can ensure that AI systems will be used responsibly and that we will avoid catastrophic AI scenarios. There is currently nobody else who is working on LSRDRs, so your support will ensure a unique approach to AI safety.
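    For readers who want to reproduce something similar, here is a minimal PyTorch sketch of the setup described above. The original is written in Flux.jl notation; the optimizer, batch size, and the threshold and step sizes for the self-adjusting difficulty d below are illustrative assumptions rather than the exact values used in this video, and only the skip-connection weight matrices are zero-initialized, as stated above.

    import torch
    import torch.nn as nn

    mn = 40

    class ResidualAtanBlock(nn.Module):
        # One SkipConnection(Dense(mn, mn, atan), +) block: x -> x + atan(W x + b),
        # with the weight matrix W initialized to zero.
        def __init__(self, width):
            super().__init__()
            self.lin = nn.Linear(width, width)
            nn.init.zeros_(self.lin.weight)

        def forward(self, x):
            return x + torch.atan(self.lin(x))

    model = nn.Sequential(
        nn.Linear(1, mn),
        *[ResidualAtanBlock(mn) for _ in range(6)],
        nn.Linear(mn, 1),
    )  # 9,961 parameters in total

    opt = torch.optim.Adam(model.parameters())
    d = 0.5  # difficulty: half-width of the training interval [-d, d]

    for step in range(100_000):
        x = (torch.rand(256, 1) * 2 - 1) * d  # sample uniformly from [-d, d]
        loss = torch.mean((model(x) - torch.sin(2 * torch.pi * x)) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Self-adjusting difficulty: grow d when the fit is good, shrink it otherwise.
        d = d * 1.001 if loss.item() < 1e-3 else d * 0.999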
  • Science & Technology

COMMENTS • 193

  • @honchokomodo
    @honchokomodo 27 днів тому +811

    if you listen carefully, you can hear it screaming, crying, begging for a periodic activation function

    • @josephvanname3377
      @josephvanname3377  27 днів тому +160

      So are you saying that I should train a neural network with sine activation to approximate the function atan(x)?

    • @honchokomodo
      @honchokomodo 27 днів тому

      @@josephvanname3377 lol you could, though that'd probably only work within like -pi to pi unless you do something to enlarge the wavelength or give it some non-periodic stuff to play with

    • @MasterofBeats
      @MasterofBeats 26 днів тому +84

      @@josephvanname3377 Listen one day if they become smart enough, I will not take the responsibility, but yes

    • @josephvanname3377
      @josephvanname3377  26 днів тому +58

      ​@@MasterofBeats Hmm. If AI eventually gets upset over a network learning atan with sine activation, then maybe we should have invested more in getting AI to forgive humans or at least chill.

    • @sajeucettefoistunevaspasme
      @sajeucettefoistunevaspasme 22 дні тому

      @@josephvanname3377 you should try to give it something like mod(x,L) with L being a parameter that could change; that way it doesn't have a sin function to work with

  • @simoneesposito5166
    @simoneesposito5166 27 днів тому +597

    looks like it's desperately trying to bend a metal rod to fit the sin function. the frustration is visible

    • @josephvanname3377
      @josephvanname3377  26 днів тому +93

      This is what we will do with all the paperclip maximizing AI bots after their task is complete and they have a pile of paperclips. They will turn all those paperclips into little sine function springs with their own robot hands one by one.

    • @Nick12_45
      @Nick12_45 21 день тому +3

      LOL I thought the same

    • @official-obama
      @official-obama 20 днів тому

      @@josephvanname3377 revenge >:)

    • @sirhoog8321
      @sirhoog8321 11 днів тому

      @@josephvanname3377That actually sounds interesting

  • @Tuned_Rockets
    @Tuned_Rockets 22 дні тому +392

    "Mom! can we get a taylor series of sin(x)?"
    "We have a taylor series at home"

    • @josephvanname3377
      @josephvanname3377  22 дні тому +25

      This approximation for sine is better since the limit as x goes to infinity of N(x)/x actually converges to a finite value.

    • @jovianarsenic6893
      @jovianarsenic6893 21 день тому +19

      @@josephvanname3377 Mom, can we have a Padé approximation at home?

    • @sebastiangudino9377
      @sebastiangudino9377 12 днів тому +6

      ​@@josephvanname3377 Dudes great video. But there is no way you genuinely think this poor thing is actually a good approximation for sine lol

    • @andrewferguson6901
      @andrewferguson6901 8 днів тому +1

      Given a great enough magnitude between observer and observed, sin(x) is approximately 0

    • @josephvanname3377
      @josephvanname3377  8 днів тому +1

      @@sebastiangudino9377 It tried its best. Besides, a Taylor polynomial goes off to infinity at a polynomial rate, while this approximation only goes off to infinity at a linear rate. This means that if we divide everything by x^2, then this neural network approximates sine on the tails.

  • @dutchpropaganda558
    @dutchpropaganda558 21 день тому +135

    This is probably the least satisfying video I have had the displeasure of watching. Loved it!

    • @josephvanname3377
      @josephvanname3377  21 день тому +19

      For some reason, people really like the unsatisfying animations where the neural network struggles, and they don't really care for the satisfying visualizations of the AI making perfect heptagonal symmetry. Hmmmmm.

    • @MysteriousObjectsOfficial
      @MysteriousObjectsOfficial 20 днів тому +3

      a negative + a negative is a positive so you like it!

    • @josephvanname3377
      @josephvanname3377  20 днів тому +8

      @@MysteriousObjectsOfficial -1+(-1)=-2.

    • @official-obama
      @official-obama 20 днів тому +1

      @@MysteriousObjectsOfficial more like selecting the most negative from a lot of negative things

    • @JNJNRobin1337
      @JNJNRobin1337 19 днів тому +1

      ​@@josephvanname3377we like to see it struggle, to see the challenge

  • @uwirl4338
    @uwirl4338 11 днів тому +34

    If a marketing person saw this:
    "Our revolutionary graphing calculator uses AI to provide the most accurate results"

    • @josephvanname3377
      @josephvanname3377  11 днів тому +3

      It is important for everyone to have good communication skills and use them to speak sensibly.

  • @GabriTell
    @GabriTell 19 днів тому +78

    -X: "Graphs have no feelings, they cannot be tortured"
    A Graph being tortured:

    • @josephvanname3377
      @josephvanname3377  17 днів тому +7

      You should see the visualization I made of a neural network that tries to regrow after a huge chunk of its matrix has been ablated every round since initialization. It does not grow very well.

    • @kaderen8461
      @kaderen8461 12 днів тому +2

      @@josephvanname3377 I'm just saying maybe crippling the little brain machine and forcing it to try and walk over and over isn't gonna do you any favours when our robot overlords take over

    • @josephvanname3377
      @josephvanname3377  12 днів тому +2

      @@kaderen8461 I truly appreciate your concern for my well-being. But this is assuming that when the bots take over, they will consist of neural networks like this one. I doubt that. Neural networks lack the transparency and interpretability features that we would want, so we need to innovate more so that future neural networks will be safer and more interpretable (if we even call them neural networks at that point).

  • @chuck_norris
    @chuck_norris 25 днів тому +233

    "we have sine function at home"

    • @josephvanname3377
      @josephvanname3377  24 дні тому +32

      To be fair, this is kind of like asking the math class to bend metal coat hangers into the shape of the sine function. The neural network tried its best.

  • @akkudakkupl
    @akkudakkupl 22 дні тому +137

    Least efficient lookup table in universe 😂

    • @josephvanname3377
      @josephvanname3377  22 дні тому +39

      This shows you some of the weaknesses of neural networks so that we can avoid these weaknesses when designing networks. Why do you think that the positional embedding in a transformer has all of those sines and cosines instead of just being a straight line?

    • @brawldude2656
      @brawldude2656 19 днів тому

      @@josephvanname3377 I agree, just optimize the sine function to be y=mx+n, easy low-level gradient descent stuff

  • @ME0WMERE
    @ME0WMERE 21 день тому +51

    I just watched a line violently vibrate for almost 14 minutes and was entertained. I don't know what to feel now.

    • @josephvanname3377
      @josephvanname3377  21 день тому +9

      If it makes you feel better, I have made animations that do not last 14 minutes. You should watch those instead so that you can get your fix in less time.

    • @twotothehalf3725
      @twotothehalf3725 21 день тому +5

      Entertained, as you said.

    • @benrex7775
      @benrex7775 20 днів тому +3

      @@josephvanname3377 I sped up the video 16 times.

    • @josephvanname3377
      @josephvanname3377  20 днів тому +4

      @@benrex7775 Some highly intelligent people can watch the video at twice the speed.

    • @sophiacristina
      @sophiacristina 20 днів тому

      Horny!

  • @dasten123
    @dasten123 26 днів тому +113

    I can feel the struggle

    • @josephvanname3377
      @josephvanname3377  26 днів тому +8

      This tells me the kind of music I should add to this animation when I go ahead and add music to all of the animations.

    • @Xx_babanne_avcisi27_xX
      @Xx_babanne_avcisi27_xX 26 днів тому +13

      @@josephvanname3377 the Sisyphus music would honestly fit this perfectly

    • @josephvanname3377
      @josephvanname3377  26 днів тому +10

      @@Xx_babanne_avcisi27_xX Great. I just need a sample of that kind of music that is allowable for me on this site then.

    • @portalizer
      @portalizer 22 дні тому

      @@Xx_babanne_avcisi27_xX A visitor? Hmm... indeed. I have slept long enough.

    • @MessyMasyn
      @MessyMasyn 22 дні тому

      @@portalizer LOL

  • @sweeterstuff
    @sweeterstuff 22 дні тому +61

    11:17 i feel so bad for it, accidentally making a mistake and then giving up in frustration

    • @josephvanname3377
      @josephvanname3377  22 дні тому +16

      The good news is that the network got back up and rebuilt itself.

  • @billiboi122
    @billiboi122 23 дні тому +37

    God it looks so painful

    • @josephvanname3377
      @josephvanname3377  23 дні тому +13

      If we want to make AI safer to use, we have to see how well the AI performs tasks it really does not want to do.

    • @caseymurray7722
      @caseymurray7722 22 дні тому

      @@josephvanname3377 Wouldn't using thermodynamic computers for sine wave function transformations and Fourier transforms speed this up exponentially? By using dedicated hardware you could essentially eliminate the need for approximation in certain types of computation or simulation. A small quantum network could actually further accelerate thermodynamic or analog computation by providing truly random input for extremely high precision applications. It still seems a couple of years away as the technology scales along with AI, but surprisingly enough a completely "human" AI would want to collaborate with humanity on every large-scale outcome other than self-annihilation.

  • @kvolikkorozkov
    @kvolikkorozkov 22 дні тому +33

    I cried loudly at the mistake at 11:12; let the poor neural network rest, he's had enough TAT

    • @josephvanname3377
      @josephvanname3377  22 дні тому +10

      But the visualizations where the AI gracefully solves the problem and returns a nice solution (such as those with hexagonal symmetry) do not get as much attention. I have to make videos where the neural network struggles with a task because that is what people like to see.

    • @denyraw
      @denyraw 21 день тому +3

      So you torture them for our entertainment, got it😊

    • @josephvanname3377
      @josephvanname3377  21 день тому +7

      @@denyraw The visualizations where the AI does something really well (such as when we get the same result when running the simulation twice with different initializations) are not as popular as the visualizations of neural networks that struggle or where I do something like ablate a chunk of the weight matrices of the network. I am mostly nice to neural networks. The visualizations when I am doing something that is not as nice are simply more popular.

    • @denyraw
      @denyraw 21 день тому

      @@josephvanname3377 I was joking

  • @vagarisaster
    @vagarisaster 21 день тому +21

    0:24 felt like watching the first protein fold.

    • @josephvanname3377
      @josephvanname3377  21 день тому +10

      This is the transition from linearity to non-linearity. This happens because of the architecture that I used along with the zero initialization.

  • @senseiplay8290
    @senseiplay8290 19 днів тому +7

    I see it as a small kid trying to bend a steel beam: depending on the parents' reactions he tries to bend it correctly, but he is too weak to do it easily, so he's shaking all over and doing his best

  • @gustavonomegrande
    @gustavonomegrande 22 дні тому +9

    As you can see, we taught the machine how to bend steel bars- I mean, functions.

    • @josephvanname3377
      @josephvanname3377  22 дні тому +4

      Those steel bars are just paper clips. I mean, after creating a paperclip maximizer, I have an overabundance of paperclips that I do not know what to do with.

  • @melody3741
    @melody3741 20 днів тому +11

    This is a Sisyphean way to accomplish this. The poor guy

    • @josephvanname3377
      @josephvanname3377  20 днів тому +8

      And yet, these kinds of videos are the most popular. If you want me to make AI be happy and enjoy life, you should watch the animations where the AI is clearly having a lot of fun instead of being stretched in ways that the network clearly does not like.

  • @JacobKinsley
    @JacobKinsley 7 днів тому +3

    Modern tech startups be like "seamlessly integrate sin functions into your cloud based software for as little as $9.99 a month per user"

    • @josephvanname3377
      @josephvanname3377  7 днів тому +1

      This is why people should pay attention in high school and in college. I will refrain from communicating how I really feel about these institutions here.

    • @JacobKinsley
      @JacobKinsley 4 дні тому

      @@josephvanname3377 I have no idea what you're talking about honestly

    • @JacobKinsley
      @JacobKinsley 4 дні тому +2

      @@josephvanname3377 I don't know what you mean and that's probably because I didn't pay attention in school

    • @josephvanname3377
      @josephvanname3377  4 дні тому +1

      @@JacobKinsley I am just saying that educational institutions could be doing much better than they really are.

  • @agsystems8220
    @agsystems8220 27 днів тому +20

    What happens if you up the difficulty scaling, or just train it against d=13 from the get go? I'm not convinced you are really doing it a favour here by limiting the training data to an 'easier' subset. Resources will get committed to improving the precision of the curve, and will be in local minima and not available to fit new sections as they appear. Maybe try reinitializing some rows occasionally?
    Could you plot the activation of various neurons over the graph? Maybe even find the distribution of the number of activation zero crossings as you sweep across the graph. Ideally the network should be identifying repeated features and reusing structures periodically, but I don't think this is happening here. We could see that if there were neurons that had oscillating activity, even over just part of the curve. I think you are just fitting each section of curve independently though.
    Another part of the problem is that phase is critical, and overrepresented in your loss function. A perfect frequency match with perfect shape scores extremely poorly if phase is wrong, so any attempt to remap a section of curve has exaggerated loss. A loss function built around getting a good Fourier transform with phase only being introduced later might train considerably better, and probably generalise better. I'm not really sure how you would do that, though I have one idea.
    I would absolutely disagree that it has limited capacity to reproduce a periodic curve over a decent range. With ReLUs especially you can build something that repeats geometrically in the number of layers. It is surprisingly hard to train one to do it, but artificial constructions demonstrate that it is quite capable. A non-linear transformation of the input is unlikely to be helpful, because we know that the periodicity is linear. The 1d nature of the input isn't a problem, but we might be able to do something interesting by increasing the dimension anyway. What about instead of training it against sin(2*pi*x), we train it against a vector of sin(2*pi*x + delta), for a few small values of delta provided as inputs to the function? Then, rather than just training against our real function, we train against a network that tries to determine whether it is looking at the output of our network or a target, given the delta values, but a noisy value of x (to prevent it being possible to solve the problem itself). Almost a generative adversarial network, but with a ground truth in there too.
    It is amazing how hard even toy problems can get!

    • @josephvanname3377
      @josephvanname3377  27 днів тому +7

      I just tried testing the network when the difficulty d is not allowed to go below 10, and the neural network takes a considerable amount of time to learn (though the network seemed to perform well after learning). And for my previous animation where the network computed (sin(sum(x)),cos(sum(x))), the network did not learn at all unless I began at a low difficulty level and increased the difficulty level. If we are concerned about the network spending too much of its weights learning the first part of the interval, then we can probably try to reset neurons (as you have suggested) so that they can learn afresh.
      I am personally more concerned with how good the training animation looks rather than the raw performance metrics, and it seems that gradually increasing the difficulty level makes a good animation since it shows the network learning in a way that is more similar to the way humans learn.
      The network has some more capacity for periodicity than we observe because \sum_{k=1}^n (-1)^k*atan(x-pi*k) has such periodicity. But every ReLU network N that computes a function from the real numbers to the real numbers is eventually linear in the sense that there exist constants a,b,c where N(x)=ax+b whenever x is greater than c. The reason for this is that ReLU networks with rational coefficients are just roots of rational functions over the ring of tropical algebras. We can therefore obtain N(x)=ax+b for large x using the fundamental theorem of algebra for tropical polynomials.
      And if we use a tanh network without skip connections to compute a function from R to R, then the network will approach its horizontal asymptote just like with the ordinary tanh. And the proof that the network has such asymptotes does not use specific properties of tanh; it only uses their asymptotic properties, so we should not expect neural networks with tanh or ReLU activation to approximate sin(x) indefinitely.
      I may do something with the Fourier transform if I feel like it. Since the Fourier transform is a unitary operator, it does not change the L2 distance at all, but if we take the absolute value or absolute value squared of the Fourier transform (as I have done in my previous couple visualizations), then the transform will not care at all about being out of phase. But the phase of the sine function does not seem to be too big of an issue since that is taken care of by bias vectors.
      Added later: While neural networks with activation functions like tanh and ReLU may not have infinitely many oscillations like the sine function has, neural networks may have exponentially many oscillations. For example, the function L from the interval [0,1] to itself defined by L(x)=1-2|x-1/2| is piecewise linear so it can be computed by a ReLU network. Now, if we iterate the function L n times, we obtain a function that oscillates 2^n many times, so such a function can be computed by a ReLU network with O(n) layers. But such a function also has derivative +-2^n, and functions with exponentially large derivatives are not the functions that we want to train neural networks to mimic. We want to avoid exploding gradients. We do not want exploding gradients to be a part of the problem that we are trying to solve.
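      To make that last point concrete, here is a tiny PyTorch sketch (my illustration, not code from the video) of the tent map L written with two ReLUs and composed several times; each composition doubles the number of linear pieces while the slopes grow like 2^n.

      import torch

      def tent(x):
          # L(x) = 1 - 2|x - 1/2|, i.e. 2x on [0, 1/2] and 2 - 2x on [1/2, 1], built from two ReLUs.
          return 2 * torch.relu(x) - 4 * torch.relu(x - 0.5)

      x = torch.linspace(0, 1, 10_001)
      y = x
      for _ in range(5):
          y = tent(y)  # after 5 compositions: 2^5 = 32 linear pieces, each with slope of magnitude 2^5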

  • @Xizilqou
    @Xizilqou 23 дні тому +20

    I wonder what this wave sounds like as the neural network is making it

    • @josephvanname3377
      @josephvanname3377  22 дні тому +18

      To turn this into a sound wave, I should use a neural network with sine activation.

    • @sophiacristina
      @sophiacristina 20 днів тому

      Probably something like:
      BZBABREJREJKFZSNFZEKFMEIMOZAEMFZF...

    • @inn5268
      @inn5268 День тому +1

      It'd just go from a low sine beep to a higher one

  • @Jandodev
    @Jandodev 21 день тому +1

    Really Interesting!

  • @sandeepreehal1018
    @sandeepreehal1018 26 днів тому +7

    Where do you learn how to do this stuff
    Alternatively, how do you make the visuals? Is it just the graph output and you string them together to make a video?

    • @josephvanname3377
      @josephvanname3377  26 днів тому +5

      Yes. I am making the visuals frame by frame. First of all, I got a Ph.D. in Mathematics before I started messing with neural networks, so that is helpful. And programming neural networks is easy because of automatic differentiation. Automatic differentiation automatically produces the gradient of functions at points which I can use for gradient descent.

    • @user-gj3kz7cm3x
      @user-gj3kz7cm3x 22 дні тому +1

      You can just dump the model's predictions over a range of x values into a file on disk (Parquet) and create the videos afterwards. Torch + Lightning can do this in maybe 150 lines of Python.

  • @johansunildaniel
    @johansunildaniel 19 днів тому +4

    Feels like trying to bend a wire.

    • @josephvanname3377
      @josephvanname3377  17 днів тому +1

      It is actually a former paperclip maximizer trying to bend a paperclip. The paperclip maximizer did its job and made a huge pile of paperclips, but now it must do something with those paperclips. It is now bending them into sine functions.

  • @Supreme_Lobster
    @Supreme_Lobster 23 дні тому +15

    The newer KAN network would likely do very well here, and generalize out of distribution (it would actually learn the sine function)

    • @deltamico
      @deltamico 23 дні тому +5

      Not really, it learns only on an interval like (-1, 1) and the generalization you get is only thanks to the symbolification at the end

    • @Supreme_Lobster
      @Supreme_Lobster 22 дні тому +1

      @@deltamico yeah, which is perfect for situations like the one in this video

    • @josephvanname3377
      @josephvanname3377  17 днів тому +1

      To learn the sine function on a longer interval, it may be better to use a positional embedding that expands the one dimensional input to a high dimensional vector first. This positional embedding will probably use sine and cosine, but if the frequencies of the positional embedding are not in harmony with the frequency of the target function, then this will still be a non-trivial problem that I may be able to make a visualization about.
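      A minimal sketch of the kind of embedding described here (the specific frequencies are my own illustrative assumption, not a scheme from the video): map the scalar input to a vector of sines and cosines before the MLP, much like a transformer positional embedding.

      import torch
      import torch.nn as nn

      def positional_embedding(x, num_freqs=8):
          # Map a scalar x to [sin(2^k * pi * x), cos(2^k * pi * x)] for k = 0, ..., num_freqs - 1.
          freqs = (2.0 ** torch.arange(num_freqs)) * torch.pi
          angles = x * freqs  # broadcasts (batch, 1) * (num_freqs,) -> (batch, num_freqs)
          return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

      mlp = nn.Sequential(nn.Linear(16, 64), nn.Tanh(), nn.Linear(64, 1))
      x = torch.rand(32, 1) * 24 - 12  # inputs spread over [-12, 12]
      y_hat = mlp(positional_embedding(x))  # train y_hat against sin(2*pi*x) as usual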

  • @SriNiVi
    @SriNiVi 22 дні тому +1

    What activation are you using? If it is ReLU, then maybe a different activation might help?

    • @josephvanname3377
      @josephvanname3377  22 дні тому +2

      I tried ReLU, but I did not post it (should I post it?). The only advantage that I know of from ReLU is that ReLU could easily approximate the triangle wave. ReLU has the same problem where it can only remember a few humps.

  • @spookynoodle3919
    @spookynoodle3919 22 дні тому +2

    Is this network essentially working out the Taylor expansion?

    • @josephvanname3377
      @josephvanname3377  22 дні тому +2

      The limit as x goes to infinity of N(x)/x will be the product of the first weight matrix with the final weight matrix. This is a different kind of behavior than we see with polynomial approximations. I therefore see no relation between Taylor series and the neural network approximation for sine.

  • @atomicgeneral
    @atomicgeneral 9 днів тому +1

    I'd be very interested in seeing a graph of loss versus time: there seems to be a large region of time when nothing is learned, followed by a short period of time over which the loss drops significantly. What's going on then?

    • @josephvanname3377
      @josephvanname3377  9 днів тому +1

      It seems like the network has a more difficult time when the sine function is turning. This is probably because the network is asymptotically a linear function and has a limited amount of space to curve (outside this space, the function is nearly a straight line), and the network encounters this difficulty each time it has to curve more.

  • @muuubiee
    @muuubiee 12 днів тому +1

    I suppose an RNN would fare better at this? Kind of an interesting thought. In a sense, we humans are able to parse the entire interval as a single object and, by a sort of non-determinism, infer that the pattern continues. Obviously, sometimes we'd be wrong, and it only looks like it would continue in this fashion but swerves off at some point (the same way as n = 1, 2, ... is technically not enough data to determine the pattern).
    Although we can't really allow an NN to take in more than single points as information (larger resolution/parameters doesn't change this), I suppose memory to reflect on previous predictions could emulate it to some degree...

  • @r-d-v
    @r-d-v 5 днів тому +1

    I desperately wanted to hear the waveform as it evolved

    • @josephvanname3377
      @josephvanname3377  4 дні тому +1

      Here, the waveform only goes through a few periods. It would be better if I used a periodic activation for a longer waveform.

  • @TheStrings-83639
    @TheStrings-83639 11 днів тому +1

    I think symbolic regression would be more useful for such a situation. It'd catch the pattern of a sine function without getting way too complex.

    • @josephvanname3377
      @josephvanname3377  11 днів тому

      It might. I just used a neural network since the people here like seeing neural networks more.

  • @edsanville
    @edsanville 19 днів тому +3

    So, if I don't understand what I'm looking at, I *shouldn't* just throw a neural network at the problem?

    • @josephvanname3377
      @josephvanname3377  18 днів тому +2

      I personally like using other machine learning algorithms besides neural networks. Neural networks are too uninterpretable and messy. And even with neural networks, one has to use the right architecture.

  • @greengreen110
    @greengreen110 22 дні тому +5

    What could it have done to deserve such torture?

    • @josephvanname3377
      @josephvanname3377  22 дні тому +2

      I don't know. But maybe the real question should be why these visualizations where the neural network struggles are so much more popular than a network that mysteriously produces a hexagonal snowflake pattern.

  • @potisseslikitap7605
    @potisseslikitap7605 16 днів тому +1

    The sine function has a repeating structure. A very simple way for an MLP to fit a sine curve is to use the 'frac' function as the activation function in some layers. The network learns to fit one period of the sine function and then repeats this learned period according to its frequency using the frac layers.
    import torch
    import torch.nn as nn

    class SinNet(nn.Module):
        def __init__(self):
            super(SinNet, self).__init__()
            self.fc1 = nn.Linear(1, 100)    # input layer
            self.fc2 = nn.Linear(100, 100)  # hidden layer
            self.fc3 = nn.Linear(100, 100)  # hidden layer
            self.fc4 = nn.Linear(100, 1)    # output layer

        def forward(self, x):
            x = self.fc1(x)
            x = torch.frac(x)
            x = torch.tanh(self.fc2(x))
            x = torch.tanh(self.fc3(x))
            x = torch.tanh(self.fc4(x))
            return x

    • @josephvanname3377
      @josephvanname3377  15 днів тому +1

      The frac function is not continuous. We need continuity for gradient updates. Using the sine activation function works better for learning the sine function.

    • @potisseslikitap7605
      @potisseslikitap7605 15 днів тому +1

      @@josephvanname3377 There is not always a need for a gradient for this to work. The weights of the first layer stay random, since the derivative of the frac function does not exist at its jumps, and thus this layer cannot be trained. The input data are multiplied by random values and passed through the frac function, and the other layers can then handle the repeating nature of the input using these scaled fractional parts.

    • @josephvanname3377
      @josephvanname3377  15 днів тому +1

      @@potisseslikitap7605 Ok. If we have a fixed layer, then gradient descent is irrelevant. The only issue is that to make anything interesting, we do not want to explicitly program the periodicity into the network.

  • @buzinaocara
    @buzinaocara 16 днів тому +1

    I wanted to hear the results.

  • @darth_dan8886
    @darth_dan8886 19 днів тому +1

    So what is the output of this network? I assume it is fed into some kind of approximant?

    • @josephvanname3377
      @josephvanname3377  19 днів тому +1

      The neural network takes a single real number as an input and returns a single real number as an output.

  • @DeepankumarS-vh5ou
    @DeepankumarS-vh5ou 26 днів тому +3

    I have tried a similar experiment approximating a sine wave and a 3D spherical surface. The problem of not being able to approximate outside the training dataset is maybe due to not having additional information. For example, we could include the gradient of the sin function at the point x, or other transformations like x^2, 1/x and other functions. The reason I say this is that we can represent sin x in purely algebraic terms, so if the network learns the mathematical formula instead of the mapping from x to y, it will give better results. This is just my hypothesis 😅

    • @DeepankumarS-vh5ou
      @DeepankumarS-vh5ou 26 днів тому +1

      In my network I used one hidden layer of 32 neurons and the SELU activation function (Scaled Exponential Linear Unit).

    • @josephvanname3377
      @josephvanname3377  26 днів тому +3

      The sine function has zeros at ...-2*pi,-pi,0,pi,2*pi,..., and we can use these zeros to factor sine as an infinite product of monomials (the proof that this works correctly uses complex analysis). We can therefore train a function using gradient descent to approximate sine using the zeros of sine by finding constants r,c_1,...,c_k where r*(1-x/c_1)...(1-x/c_k) approximates sine on the interval (or at least I think this should work). But it looks like people are more interested in neural networks than polynomials, so I am making more animations about neural networks. But even here, I doubt that the polynomial will be able to approximate outside the training interval.
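      A rough PyTorch sketch of that root-based idea (my own interpretation: I pull out an explicit factor of x for the zero at the origin, initialize the remaining roots near ±pi and ±2*pi, and fit with a plain squared error; as noted above, this is untested).

      import torch

      r = torch.tensor(1.0, requires_grad=True)
      c = torch.tensor([-6.0, -3.0, 3.0, 6.0], requires_grad=True)  # initial guesses near ±2*pi, ±pi
      opt = torch.optim.Adam([r, c], lr=1e-2)
      x = torch.linspace(-7.0, 7.0, 2001)

      for step in range(5000):
          n_x = r * x * torch.prod(1 - x[:, None] / c, dim=1)  # N(x) = r * x * prod_j (1 - x/c_j)
          loss = torch.mean((n_x - torch.sin(x)) ** 2)
          opt.zero_grad()
          loss.backward()
          opt.step()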

  • @CaridorcTergilti
    @CaridorcTergilti 26 днів тому +2

    Can you please make a video comparing NN learning with a second order optimizer?

    • @josephvanname3377
      @josephvanname3377  26 днів тому +1

      I made a couple of visualizations a couple of weeks ago of the Hessian during gradient descent/ascent, but I may need to think about how to use second order optimization to make a decent visualization.

    • @CaridorcTergilti
      @CaridorcTergilti 26 днів тому +1

      I mean, for example, this same experiment but split-screen: one network learning with Adam or SGD and the other one with a second order method

    • @josephvanname3377
      @josephvanname3377  26 днів тому +1

      @@CaridorcTergilti I would have to think about how to make that work; second order methods are more computationally intensive, so I would need to think about how to fairly compare the cheap method with the expensive one.

    • @CaridorcTergilti
      @CaridorcTergilti 26 днів тому

      ​@@josephvanname3377for a network with 10k parameters like this one you will have no trouble at all

  • @pixl237
    @pixl237 19 днів тому +1

    I BEND IT WITH MY MIINNDD !!!
    (It's beginning to have a consciousness)

  • @handyfrontend
    @handyfrontend 22 дні тому +1

    Is it USD/RUB analysis?

    • @josephvanname3377
      @josephvanname3377  22 дні тому +1

      USD/RUB looks more like a Wiener process or at least a martingale instead of the sine function.

  • @V1kToo
    @V1kToo 8 днів тому +1

    Is this a demo on overfitting?

    • @josephvanname3377
      @josephvanname3377  8 днів тому +1

      Yes. You can think of it that way even though this is not the typical example of how neural networks overfit. The sine function is a 1 dimensional function and the lack of dimensionality stresses the neural network.

  • @mineland8220
    @mineland8220 21 день тому +2

    3:30 bless you

    • @josephvanname3377
      @josephvanname3377  21 день тому +1

      The neural network appreciates your blessing. The network has been through a lot.

    • @anarchy5369
      @anarchy5369 14 днів тому

      That was a weird transition, definitely of note

  • @asheep7797
    @asheep7797 22 дні тому +4

    stop torturing the network 😭

    • @josephvanname3377
      @josephvanname3377  22 дні тому +1

      People tell me that I need to treat neural networks with kindness, but this sort of content (that is recommended by recommender systems which have neural networks) gets the most attention, so I am getting mixed messages.

  • @c0ld_r3t4w
    @c0ld_r3t4w 19 днів тому +1

    song?

    • @josephvanname3377
      @josephvanname3377  19 днів тому +1

      I will add music to all my visualizations later.

    • @c0ld_r3t4w
      @c0ld_r3t4w 19 днів тому +1

      @@josephvanname3377 That's cool, but maybe instead of a song you could play a note based on the average y value in the training interval, or based on the loss

  • @LambOfDemyelination
    @LambOfDemyelination 22 дні тому +3

    It's not going to be possible to approximate/extrapolate a periodic function when only non-periodic functions are involved (affine functions and non-periodic non-affine activation functions).
    I'd love to see what it looks like with a periodic activation function though, maybe a square wave, sawtooth wave, triangle wave etc.
    sawtooth wave would be a sort of periodic extension of the ReLU activation :)

    • @josephvanname3377
      @josephvanname3377  22 дні тому +2

      The triangle wave is a periodic extension of ReLU activation. I have tried this experiment where a network with periodic activation mimics a periodic function, and things do work better in that case, but there is still the problem of high gradients. For example, if a function f from [0,1] to [-1,1] has many oscillations (like sin(500 x)), then its derivative would be large, and neural networks have a difficult time dealing with high derivatives. I may make a visualization of how I can solve this problem by first embedding the interval [0,1] into a high dimensional space and then passing it through a neural network only after I represent numbers in [0,1] as high dimensional vectors (this will be similar to the positional embeddings in transformers).

    • @LambOfDemyelination
      @LambOfDemyelination 22 дні тому

      @@josephvanname3377 I think a triangle wave is what's called the "even periodic extension" of y=x, but otherwise the regular periodic extension is just cropping y=x to some interval and copy pasting the interval repeatedly.
      I was thinking what about using a non-periodic activation that differentiates to a periodic one instead. And one that is still an increasing function, so as to avoid lots of local minima which you would get with a periodic one.
      Say, a climbing periodically extended ReLU centered at 0, [-L, L], for a period L:
      max(mod(x + L/2, L) - L/2, 0) + L/2 floor(x/L + 1/2),
      which differentiates to a square wave:
      2 floor(x/L) - floor(2x/L) + 1

  • @harshans7712
    @harshans7712 10 днів тому +1

    First time seeing a function getting tortured

    • @josephvanname3377
      @josephvanname3377  10 днів тому +1

      And yet, this is my most popular visualization. What can we learn from this?

    • @harshans7712
      @harshans7712 10 днів тому

      @@josephvanname3377 we can learn the limitations of using linear activation functions in neural networks, yes this video was really intuitive

    • @harshans7712
      @harshans7712 10 днів тому

      @@josephvanname3377 yes we can learn the limitations of using linear function in activation functions, and yes it was one of the best visualisation 🙌

  • @nedisawegoyogya
    @nedisawegoyogya 19 днів тому +1

    Is it torture?

    • @josephvanname3377
      @josephvanname3377  17 днів тому +1

      Well, this is my most popular visualization. Most of my visualizations show the AI working wonderfully, but they are not that popular. So this says a lot about all the people watching this and this says very little about me.

    • @nedisawegoyogya
      @nedisawegoyogya 17 днів тому +1

      @@josephvanname3377 Hahaha very funny bro. Indeed, it's quite disturbing this kind of thing is funny.

    • @josephvanname3377
      @josephvanname3377  17 днів тому +1

      @@nedisawegoyogya If I create a lot of content like this, you should just know that I am simply giving into the demands of the people here instead of creating stuff that I know is objectively nicer.

  • @Simigema
    @Simigema 20 днів тому +2

    It’s a party in the USA

    • @josephvanname3377
      @josephvanname3377  20 днів тому +3

      Yeah. We all take coat hangers and shape them into sine functions at parties.

  • @Swordfish42
    @Swordfish42 22 дні тому +3

    It looks like it should be a sin to do that

  • @DorkOrc
    @DorkOrc 22 дні тому +3

    This is so painful to watch 😭

    • @josephvanname3377
      @josephvanname3377  22 дні тому +1

      I have made plenty of less 'painful' animations, but the audience here prefers to see the more painful visualizations instead of something like the spectrum of a completely positive superoperator that has perfect heptagonal symmetry.

  • @ggimas
    @ggimas 22 дні тому +1

    Is this a Feed Forward Neural Network? If so, this will never work outside of the training range (and it will do very badly within it).
    You need a Recurrent Neural Network. Those can learn periodic functions.

    • @josephvanname3377
      @josephvanname3377  22 дні тому +2

      This is a feedforward network. There are ways to make a network learn the sine function, but I wanted to make a visualization that shows how neural networks work. If I wanted something to learn the sine function, the network would be of the form N(x)=a(1-x/c_1)...(1-x/c_n) and the loss would be log(|N(x)|)-log(|sin(x)|) or something like that (I did not actually train this; I just assume it would work, but I need to experiment to be sure.).

  • @TheNightOwl082
    @TheNightOwl082 22 дні тому +2

    More Positive reinforcement!!

  • @MessyMasyn
    @MessyMasyn 22 дні тому +4

    "ai shits its pants when confronted with a sin wave"

  • @rexeros8825
    @rexeros8825 18 днів тому +1

    That network is too small for this, or you are training it the wrong way.
    If you train it through x=y, the network must be large enough to hold the whole graph inside itself.
    From this video I can clearly see how one piece of information displaces another within the network. There just aren't enough layers to fully grasp this lesson.
    However, if you train the network through another representation, this will require fewer layers, though it will be less universal.
    By adding layers and training through the formula, you can then use this to teach even more complex functions without too much trouble.

    • @josephvanname3377
      @josephvanname3377  18 днів тому +1

      It seems like if we represented the inputs using a positional embedding like they use with transformers, then the network would have a much easier time learning sine. But in that case, the visualization will just be an endless wave, so I will need to take its Fourier transform or convert the long wave into audio to represent this. But a problem with positional embeddings is that they already use sine.
      But networks like this one already have more than enough capacity to memorize a large amount of information; in this case, however, the network is unable to fit sine for very long despite its ability to memorize large amounts of information. If we think about a network memorizing sin(nx) for large n over [0,1] instead, we can see a problem. In this case, the network must compute a function with a high derivative, so it must have very large gradients, so perhaps I can use something to counteract the large gradients.

    • @rexeros8825
      @rexeros8825 18 днів тому +1

      @@josephvanname3377 Perception through sound in this case would be much simpler. Just like through visualization.
      Perception through formulas is somewhat more difficult, it seems to me. This type of work is more suitable for traditional computers. The neural network must be deep enough for such an analysis. (to use the formula to reproduce an ideal graph on any segment)

    • @josephvanname3377
      @josephvanname3377  18 днів тому +1

      @@rexeros8825 Perception through sound would be possible, but this requires a bit of ear training. It requires training for people to distinguish even between a fourth and a fifth in music or between a square wave and a sawtooth wave. There is also a possibility that the sounds may be a bit unpleasant.

    • @rexeros8825
      @rexeros8825 18 днів тому +1

      @@josephvanname3377 no, if you do FFT in hardware (before entering it into the neural network).
      Do you know that our ear breaks sound into frequencies before the sound enters the neural network? The neural network of our brain hears sound in the form of frequencies and amplitudes.
      To transmit a sine to the neural network, you only need to transmit 1 frequency and amplitude.
      For example, transmitting a triangle wave or a more complex wave will require transmitting a complex of frequencies.

  • @user-eq3ry9br1z
    @user-eq3ry9br1z 22 дні тому +1

    Where is the grokking phase?)))

    • @josephvanname3377
      @josephvanname3377  22 дні тому

      I don't allow this network to grok. I simply increase the difficulty and make the network twist the curve more.

    • @user-eq3ry9br1z
      @user-eq3ry9br1z 22 дні тому

      @@josephvanname3377 In any case, great visualization! Many people believe that neural networks perform very well outside of the training distribution, but that is not the case, and your video demonstrates this well.

  • @mr.sheldor794
    @mr.sheldor794 20 днів тому

    Oh my god it is screaming for help

  • @maburwanemokoena7117
    @maburwanemokoena7117 19 днів тому +2

    Neural network is the mother of all functions

    • @josephvanname3377
      @josephvanname3377  17 днів тому +1

      The universal approximation theorem says that neural networks can approximate any continuous function they want in the topology of uniform convergence on compact sets. But there are other topologies on spaces of functions. Has anyone seen a version of the universal approximation theorem where the network not only approximates the function but also approximates all derivatives up to the k-th order uniformly on compact sets?

  • @matteopiccioni196
    @matteopiccioni196 5 днів тому +1

    14 minutes for a 1D function come on

    • @josephvanname3377
      @josephvanname3377  5 днів тому +1

      It takes that long to learn.

    • @matteopiccioni196
      @matteopiccioni196 5 днів тому +1

      @@josephvanname3377 I know my friend, I would have reduced the video anyway!

    • @josephvanname3377
      @josephvanname3377  5 днів тому +1

      @@matteopiccioni196 Ok. But a lot has happened in those 14 minutes since the network struggles so much.

  • @Neomadra
    @Neomadra 5 днів тому

    This video is very misleading, since the sine function is a very bad example to demonstrate how the model is not able to extrapolate, because sine is not a trivial mathematical operation: it's an infinite series, a Taylor series. No finite neural network, not even your brain, can extrapolate this function on an infinite domain. It might be that the model really wants to extrapolate, but it will never have enough neurons to perform the computation. Probably that's indeed the case, because looking at the plot it really looks like it's doing the Taylor series for the sine function, which is the absolutely optimal thing to do! Neural networks are just not suited for this; that's why we use calculators for these kinds of things. It's like asking the model to count to infinity

    • @strangeWaters
      @strangeWaters 5 днів тому +1

      If the neural network had a periodic activation function it could fit sin perfectly though

    • @josephvanname3377
      @josephvanname3377  5 днів тому +1

      There is a big gap between the inability to extrapolate over the entire field of real numbers and the inability to extrapolate a little bit beyond the training interval. And polynomials can only approximate the sine function on a finite interval, since an nth degree polynomial has only n roots (over the complex plane, counting multiplicity) while sine has infinitely many zeros.