Can you really use ANY activation function? (Universal Approximation Theorem)

  • Published Jan 14, 2025

COMMENTS • 65

  • @Kokurorokuko
    @Kokurorokuko 6 hours ago +5

    Interesting take-home message. I would never think that just the non-linearity itself is so important.

  • @goatfishplays
    @goatfishplays 8 hours ago +6

    Anyone else expect to hear like lecture hall applause at the end of the video lmaooo, that was really good

  • @benwilcox1192
    @benwilcox1192 3 hours ago

    Great educational video! I expected this to have a lot more views; keep it up and you'll grow quickly!

  • @oleonardohn
    @oleonardohn 1 day ago +51

    As long as it is not a linear activation, it should approximate arbitrary functions as you increase the number of parameters. The reason linear activations do not work is that the whole system collapses into a single linear transformation, so it can only approximate linear functions. (A minimal sketch of this collapse appears after this thread.)

    • @PrematureAbstraction
      @PrematureAbstraction  1 day ago +8

      I made a short about this! ua-cam.com/video/eXdVAAFCkHU/v-deo.html

    • @FunctionallyLiteratePerson
      @FunctionallyLiteratePerson 22 hours ago +15

      It can be linear for digital systems, given floating point inaccuracies. They won't be the most effective, but they do work to some extent! (see the video "GradIEEEnt half decent" on YouTube; a tiny demo of this float non-linearity follows this thread)

    • @fnytnqsladcgqlefzcqxlzlcgj9220
      @fnytnqsladcgqlefzcqxlzlcgj9220 21 hours ago

      God I love that video so much @@FunctionallyLiteratePerson

    • @PrematureAbstraction
      @PrematureAbstraction  15 hours ago +1

      @@FunctionallyLiteratePerson All of Suckerpinch's videos come highly recommended!
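
(Editor's note: a minimal sketch of the collapse @oleonardohn describes, in NumPy; the shapes and values below are arbitrary.)

```python
# Two stacked linear layers collapse into one linear map, so extra
# depth adds no expressive power without a non-linearity in between.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x + b1) + b2
W, b = W2 @ W1, W2 @ b1 + b2      # the equivalent single layer
assert np.allclose(two_layers, W @ x + b)
```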

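(Editor's note: a tiny demo of the floating-point loophole mentioned in the reply above; in IEEE 754 arithmetic, addition is not even associative, so nominally "linear" layers are not exactly linear.)

```python
# (a + b) + c and a + (b + c) disagree in float arithmetic, which is
# the kind of non-linearity the "GradIEEEnt half decent" video exploits.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)  # 0.6000000000000001
print(a + (b + c))  # 0.6
assert (a + b) + c != a + (b + c)
```
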
  • @GrinddMaster
    @GrinddMaster 1 day ago +14

    Can't believe I watched this video for free.

  • @ProgrammingWithJulius
    @ProgrammingWithJulius 1 day ago +8

    Stellar video! As another YouTuber who recently started, I wish you all the best :)
    I now know how much effort it takes to make these videos. Great use of manim, too.

    • @PrematureAbstraction
      @PrematureAbstraction  15 hours ago

      @ProgrammingWithJulius Thank you! Your videos also sound fun, subbed. :)
      Yeah, getting started with manim was a pain at first, but after two or three videos you really pick up speed.

  • @The_JPo
    @The_JPo 15 hours ago +1

    keep it up. i watch a lot of these sorts of videos, and normally they don't pull me in. this one did

  • @AIShipped
    @AIShipped 1 day ago +2

    This is a great video! Worth the effort. I would love to see more on different activation functions and their performance if that is the direction you would like to go

  • @milandavid7223
    @milandavid7223 19 hours ago +4

    Honestly the most surprising result was the performance of sine & square

  • @IchHabeGerufen
    @IchHabeGerufen 22 hours ago +1

    bro this is such a good video.. Nice voice, nice animations and overall style... I wish you all the best. Keep it up!

  • @mmmusa2576
    @mmmusa2576 15 hours ago +3

    This is really good. If this is really an AI voice it's so natural lol

  • @eliasbouhout1
    @eliasbouhout1 4 hours ago

    Didn't even notice it was an AI voice, great video

  • @EternalCelestialChambers
    @EternalCelestialChambers 11 hours ago +3

    Can you please give me the source of the lecture from this timestamp? 0:32
    Prof. Thomas Garity?

  • @SpinyDisk
    @SpinyDisk 1 day ago +8

    How does this only have 700 views?!

  • @jasperneo1
    @jasperneo1 1 day ago +4

    Just watched the video, and I am shocked this does not have thousands of views

  • @quadmasterXLII
    @quadmasterXLII 1 day ago +12

    Did the minecraft activation have zero gradient everywhere? (b/c cubes lol)

    • @PrematureAbstraction
      @PrematureAbstraction  1 day ago +2

      @@quadmasterXLII Interesting question! I used PyTorch for the implementation, and if you don't explicitly define the gradient, it will use its autograd feature. You can read in the docs what the applicable rules for this are, but to make it short, it will estimate/interpolate a reasonable continuous gradient from the sampled values.
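
(Editor's note: the author says above that they simply relied on autograd. For comparison, a common explicit alternative is a straight-through estimator via a custom autograd Function, sketched below; this is an illustration under that assumption, not the video's actual implementation.)

```python
# A piecewise-constant activation with a straight-through surrogate
# gradient: floor() in the forward pass, identity in the backward pass.
import torch

class StepSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.floor(x)          # "blocky" forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output             # pretend it was the identity

x = torch.randn(5, requires_grad=True)
StepSTE.apply(x).sum().backward()
print(x.grad)  # all ones: gradients flow instead of being 0 almost everywhere
```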

  • @Blooper1980
    @Blooper1980 2 days ago +7

    Neat.. More please

  • @doormango
    @doormango 23 hours ago +4

    Hang on, don't we need non-polynomial activation functions for the Universal Approximation Theorem? You gave x^2 as an example activation function... (see the note after this thread)

    • @schmeitz_
      @schmeitz_ 9 hours ago

      I was wondering the same..

    • @Kokurorokuko
      @Kokurorokuko 6 hours ago

      I'll wait for the reply here
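
(Editor's note: the question is well taken. A hedged summary of the standard results, stated without proof:)

```latex
% Leshno-Lin-Pinkus-Schocken (1993): a single-hidden-layer network is a
% universal approximator on a compact set K iff sigma is NOT a polynomial:
\[
  \overline{\operatorname{span}}\{\, x \mapsto \sigma(w \cdot x + b) \,\} = C(K)
  \iff \sigma \text{ is not a polynomial.}
\]
% With sigma(x) = x^2 and one hidden layer, every output is a quadratic, so
% universality fails at fixed depth; stacking L such layers reaches degree
% 2^L, and polynomials of growing degree are dense (Weierstrass), so depth
% can rescue a polynomial activation.
```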

  • @sutsuj6437
    @sutsuj6437 1 day ago +5

    How did you define the derivative of the Minecraft activation function to use in backprop?

    • @UQuark0
      @UQuark0 18 hours ago +3

      Maybe numerical differentiation? Literally taking a neighboring height and subtracting. (A sketch of this idea follows the thread.)

    • @PrematureAbstraction
      @PrematureAbstraction  15 hours ago +2

      UQuark0 is correct, I just let PyTorch autograd do its thing.
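
(Editor's note: a minimal sketch of @UQuark0's finite-difference idea; the heightmap here is a hypothetical stand-in, not the actual lookup from the video.)

```python
# Central-difference surrogate gradient for a lookup-style activation.
import torch

def heightmap(x):
    # Hypothetical "blocky" height lookup (piecewise constant).
    return torch.floor(torch.sin(x) * 4.0) / 4.0

class FiniteDiffAct(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return heightmap(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        h = 0.5  # stencil wide enough to step across block boundaries
        slope = (heightmap(x + h) - heightmap(x - h)) / (2 * h)
        return grad_output * slope

x = torch.linspace(-2.0, 2.0, 9, requires_grad=True)
FiniteDiffAct.apply(x).sum().backward()
print(x.grad)  # finite-difference slopes instead of 0 almost everywhere
```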

  • @RaaynML
    @RaaynML 1 day ago +39

    AI Generated voices teaching people how to do machine learning, what a time to be alive

    • @SpinyDisk
      @SpinyDisk 1 day ago +11

      What a time to be alive!📄📄

    • @4thpdespanolo
      @4thpdespanolo 1 day ago +8

      This is not a generated voice

    • @RegularGod
      @RegularGod 9 hours ago +1

      @@4thpdespanolo It 100% is, and the uploader confirmed this in the comments of the video preceding this one in their library.
      Someone asked, and they replied something like 'Yes, it is synthesized (ElevenLabs)'.

  • @purplenanite
    @purplenanite 12 hours ago

    I wonder if you could use this to evolve a good activation function

  • @gokusaiyan1128
    @gokusaiyan1128 1 day ago +5

    Subbed!! I think I like this channel. Hope it grows

  • @peppermint13me
    @peppermint13me 6 hours ago

    What if you used a neural network to approximate the optimal activation function for another neural network?

  • @skeleton_craftGaming
    @skeleton_craftGaming 4 hours ago

    No, the random inputs I use are based on counts of alpha particles

  • @novantha1
    @novantha1 12 hours ago

    🤔
    Makes me wonder how the performance would be if there were some sort of gating mechanism for choosing the most appropriate activation function for any given situation.
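
(Editor's note: a toy sketch of the gating idea in the comment above; the candidate set and the per-layer softmax gate are assumptions, not anything from the video.)

```python
# A learnable softmax gate that blends candidate activation functions.
import torch
import torch.nn as nn

class GatedActivation(nn.Module):
    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(3))  # one weight per candidate

    def forward(self, x):
        w = torch.softmax(self.logits, dim=0)
        candidates = torch.stack([torch.relu(x), torch.tanh(x), torch.sin(x)])
        # Broadcast the 3 gate weights over the input dimensions and blend.
        return (w.view(-1, *([1] * x.dim())) * candidates).sum(dim=0)

act = GatedActivation()
print(act(torch.randn(2, 4)).shape)  # torch.Size([2, 4])
```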

  • @stephaneduhamel7706
    @stephaneduhamel7706 14 hours ago

    So, max pooling with no further activation function would probably work just as well?
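
(Editor's note: plausibly yes, since max pooling is itself non-linear. A tiny sanity check, not from the video: max-pooling a value against a zero channel reproduces ReLU exactly.)

```python
# max(x, 0) == relu(x), so a network with max pooling alone already
# contains a non-linearity.
import torch

x = torch.tensor([[-2.0, -1.0, 0.5, 3.0]])
pooled = torch.stack([x, torch.zeros_like(x)], dim=-1).max(dim=-1).values
assert torch.equal(pooled, torch.relu(x))
```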

  • @jonathanquang2117
    @jonathanquang2117 1 day ago +1

    I'd also be interested in a half-formal proof of the universal approximation theorem instead of just empirical results. Nice video though!

    • @PrematureAbstraction
      @PrematureAbstraction  15 hours ago

      I thought about including it. Sadly, it's very technical and often limited in its "direct" applicability. E.g. in the theorem itself, what matters most is that you have enough neurons, not which activation function is used. In practice, you mainly experiment with the number of layers and see what sticks, rather than working from a theoretical derivation.

  • @foebelboo
    @foebelboo 19 hours ago +1

    underrated

  • @brummi9869
    @brummi9869 1 day ago +2

    How did you train the minecraft network? Doesn't it have the same issue as the step function, with a derivative of 0 everywhere?

    • @markusa3803
      @markusa3803 17 hours ago

      Pretty sure he used each block's height as a single datapoint, connected linearly.

    • @PrematureAbstraction
      @PrematureAbstraction  15 hours ago

      Almost: I implemented it as a step function but did not explicitly define the backward routine, so PyTorch autograd takes over with subgradients and continuous interpolation (see their docs for the rules).

  • @language-qq8xv
    @language-qq8xv 21 hours ago +1

    i thought this was a minecraft 100 days challenge video. i'm too brain rotted

  • @coolplay20
    @coolplay20 1 day ago

    high quality educational vid 🎉 Subscribed, thanks for it

  • @MrEliteXXL
    @MrEliteXXL 21 hours ago

    I wonder how the minecraft+max pooling combo would perform

    • @PrematureAbstraction
      @PrematureAbstraction  15 hours ago +1

      In my experiment, it worked about as well as minecraft+avg pooling (a few percent better).

  • @sssssemper
    @sssssemper 1 day ago +2

    ah great video! new sub

  • @starship9874
    @starship9874 11 hours ago

    Took me a while to realize the voice was AI

  • @eli_steiner
    @eli_steiner 1 day ago +1

    how do you have so few subs 😶

  • @aeghohloechu5022
    @aeghohloechu5022 1 day ago

    Nvidia 6090 rushing to implement this as dlss 6 instead of adding 2 more gigabytes of vram:

  • @john.dough.
    @john.dough. 1 day ago +1

    this is great! :0

  • @paulwaller3587
    @paulwaller3587 1 day ago +1

    obviously not any function will work, the functions have to form a unital point-separating subalgebra (see the note after this thread)

    • @jacobwilson8275
      @jacobwilson8275 1 day ago +1

      Which is a very lax restriction.
      It feels a little pedantic to be so clear.

    • @Galinaceo0
      @Galinaceo0 1 day ago +1

      @@jacobwilson8275 i think it's important to be clear when explaining these things to new people as they might get misconceptions otherwise. Maybe you don't need to be as precise as this, but just saying "nice enough functions" might get the idea across.

    • @jacobwilson8275
      @jacobwilson8275 23 hours ago

      @@Galinaceo0 agreed
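
(Editor's note: the condition named at the top of this thread is the hypothesis of the Stone-Weierstrass theorem, stated here without proof:)

```latex
% Stone-Weierstrass: if A is a subalgebra of C(K), K compact Hausdorff,
% that contains the constants (unital) and separates points, then A is
% dense in C(K) under the sup norm:
\[
  \forall f \in C(K)\ \forall \varepsilon > 0\ \exists g \in A :
  \sup_{x \in K} \lvert f(x) - g(x) \rvert < \varepsilon .
\]
```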

  • @birdbrid9391
    @birdbrid9391 1 day ago

    Approximation*

  • @archonicmakes
    @archonicmakes 3 hours ago

    subbed :)

  • @thecoldlemonade3532
    @thecoldlemonade3532 8 hours ago

    great video but AI voice :(

  • @Neil001
    @Neil001 1 day ago

    Your videos are really great, but i'd rather listen to your real voice; the AI one is just too jarring