Predicting the rules behind - Deep Symbolic Regression for Recurrent Sequences (w/ author interview)

  • Published 19 Jan 2025

COMMENTS • 66

  • @YannicKilcher
    @YannicKilcher 3 years ago +2

    OUTLINE:
    0:00 - Introduction
    2:20 - Summary of the Paper
    16:10 - Start of Interview
    17:15 - Why this research direction?
    20:45 - Overview of the method
    30:10 - Embedding space of input tokens
    33:00 - Data generation process
    42:40 - Why are transformers useful here?
    46:40 - Beyond number sequences, where is this useful?
    48:45 - Success cases and failure cases
    58:10 - Experimental Results
    1:06:30 - How did you overcome difficulties?
    1:09:25 - Interactive demo
    Paper: arxiv.org/abs/2201.04600
    Interactive demo: recur-env.eba-rm3fchmn.us-east-2.elasticbeanstalk.com/

    • @brandom255
      @brandom255 3 years ago

      Great video!
      "cos mul 3 x" is Polish notation though, not reverse Polish (i.e. "3 x mul cos").

    • @anttikarttunen1126
      @anttikarttunen1126 3 years ago

      @brandom255 I guess it's RPN if you read it backwards... ;-) (Noticed the same mistake in the video).
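
To make the distinction in this thread concrete, here is a minimal Python sketch that evaluates the same expression written both ways. The function names and the tiny operator tables are illustrative assumptions, not the paper's tokenizer or code.

```python
import math

# Hypothetical operator tables, just large enough for this example.
UNARY = {"cos": math.cos}
BINARY = {"mul": lambda a, b: a * b, "add": lambda a, b: a + b}


def _leaf(tok, env):
    """Look up a variable, or parse a numeric literal."""
    return env[tok] if tok in env else float(tok)


def eval_prefix(tokens, env):
    """Evaluate Polish (prefix) notation, e.g. ['cos', 'mul', '3', 'x']."""
    def parse(pos):
        tok = tokens[pos]
        if tok in UNARY:
            val, pos = parse(pos + 1)
            return UNARY[tok](val), pos
        if tok in BINARY:
            left, pos = parse(pos + 1)
            right, pos = parse(pos)
            return BINARY[tok](left, right), pos
        return _leaf(tok, env), pos + 1

    value, end = parse(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after a complete expression")
    return value


def eval_postfix(tokens, env):
    """Evaluate reverse Polish (postfix) notation, e.g. ['3', 'x', 'mul', 'cos']."""
    stack = []
    for tok in tokens:
        if tok in UNARY:
            stack.append(UNARY[tok](stack.pop()))
        elif tok in BINARY:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[tok](left, right))
        else:
            stack.append(_leaf(tok, env))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]
```

Both `eval_prefix("cos mul 3 x".split(), {"x": 2.0})` and `eval_postfix("3 x mul cos".split(), {"x": 2.0})` compute cos(3 * 2). Only the token order differs.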

  • @ChaiTimeDataScience
    @ChaiTimeDataScience 3 years ago +18

    This series is absolutely amazing! Thanks Yannic, your videos just keep getting better!

  • @yabdelm
    @yabdelm 3 years ago +3

    The paper explanation + interview format is amazing: the paper explanation provides the interesting nitty-gritty, and the interview sometimes sheds light, with less jargon and more intuition, on the overall concepts discussed in the paper.

  • @volotat
    @volotat 3 years ago +9

    Extremely interesting line of work. I imagine one day something like that could fully automate the process of building mathematical models for any scientific data. Feels like a huge step toward an automated scientific discovery process.

    • @Adhil_parammel
      @Adhil_parammel 3 years ago

      ua-cam.com/video/pkJkHB_c3nA/v-deo.html
      They have a good approach.

  • @majorminus66
    @majorminus66 3 years ago +2

    Those visualizations in the appendix really look like you're staring into some transcendent, unfathomable principles at the very base of reality. Neat.

  • @me2-21
    @me2-21 3 years ago +4

    Cool approach to crack pseudo random generators

  • @Adhil_parammel
    @Adhil_parammel 3 years ago +3

    Having a website like this for every data science algorithm, with dummy data and inference, would be awesome for learners.

  • @benjamindonnot3902
    @benjamindonnot3902 3 years ago

    Really inspiring topics. Interview really well done. Thanks

  • @WahranRai
    @WahranRai 3 years ago +1

    This reminds me of programming "Le compte est bon" from the game show Des chiffres et des lettres (French TV).

  • @billykotsos4642
    @billykotsos4642 3 years ago +1

    This is your best series, after the always punctual ML news.

  • @brll5733
    @brll5733 3 years ago +4

    Anyone else feel like they haven't seen a paper about decision making, planning, neural reasoning etc. in a long time? Nothing about agents acting in a complex environment?

    • @jabowery
      @jabowery 3 years ago

      You need a simple model of a complex environment to make decisions; otherwise, computing the consequences of your decisions becomes intractable. That is true even if you adopt an AlphaZero approach.

    • @brll5733
      @brll5733 3 years ago +1

      @jabowery I mean, we have pretty good toy simulations in the form of video games. Another challenge like DeepMind's StarCraft challenge would really help focus the field, imo.

  • @grillmstr
    @grillmstr 3 years ago

    more like this plz. this is v informative

  • @JTMoustache
    @JTMoustache 3 years ago

    Neurosymbolic for the win, the return !

  • @ericadar
    @ericadar 3 years ago +1

    I wasn't familiar with the even/odd conditional function representation discussed at 40:45 and had to work it out on paper to make sure I got it. "Collatz" generator in LaTeX: (n \mod 2) \times (3n+1)+[1-(n \mod 2)]\times(n \div 2)
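
The formula in this comment can be checked numerically. The sketch below, with function names chosen here for illustration, compares the branch-free arithmetic form against the usual if/else definition of a Collatz step, reading "n \div 2" as integer division.

```python
def collatz_step_arithmetic(n: int) -> int:
    """Branch-free form: (n mod 2) * (3n + 1) + (1 - (n mod 2)) * (n div 2)."""
    parity = n % 2  # 1 for odd n, 0 for even n
    return parity * (3 * n + 1) + (1 - parity) * (n // 2)


def collatz_step_branching(n: int) -> int:
    """Textbook definition with an explicit conditional."""
    return 3 * n + 1 if n % 2 else n // 2


def trajectory(n: int, step=collatz_step_arithmetic) -> list:
    """Iterate the step function from n until the value 1 is reached."""
    seq = [n]
    while seq[-1] != 1:
        seq.append(step(seq[-1]))
    return seq
```

The parity term acts as a 0/1 switch, so exactly one of the two products is non-zero for any n. That is how a conditional can be expressed with only binary arithmetic operators.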

  • @jrkirby93
    @jrkirby93 3 years ago +2

    So I tried out the demo and came across an interesting "paradox". When I click the "OEIS sequence" button on the demo, it almost always loads up a sequence and perfectly predicts a solution with 0% error. But when I go over to OEIS and type in a couple numbers, grab a sequence, and slap that into the user input, the results are... usually not great.
    Very rarely does my OEIS sampling strategy yield a sequence this model can solve. Usually the error is huge. What is going on? Am I somehow only grabbing "very hard" sequences from OEIS? Or are the OEIS sequences that the demo samples coming from a smaller subsection of OEIS that this model can solve reliably?

    • @anttikarttunen1126
      @anttikarttunen1126 3 years ago

      Many (I would say: Most) OEIS sequences are not expressible as such simple recurrences (with such a limited set of operations). For example, almost any elementary number theoretical sequences, sigma, phi, etc.

    • @rylaczero3740
      @rylaczero3740 3 years ago +1

      OEIS does have some ridiculous sequences. If I were an alien, I would think it was specifically generated by humans to train their ML models.

    • @anttikarttunen1126
      @anttikarttunen1126 3 years ago

      @rylaczero3740 Well, OEIS has a _few_ ridiculous sequences, but admittedly there are many sequences of questionable relevance, when somebody has dug themselves into a rabbit hole a little bit too deep for others to follow. As for the training of AI, I admit that sometimes I like to submit sequences that are "a little bit on the fringe", to serve as a challenge to any programmed agents.

    • @anttikarttunen1126
      @anttikarttunen1126 3 years ago

      @rylaczero3740 And in any case, the scope of mathematics is infinitely larger than what is "useful" for any application we might think of.

  • @rothn2
    @rothn2 3 years ago

    Cool paper, interesting observations! I'd be curious to see the UMAP of these (learned) embeddings too. Sometimes these can capture global structure better, whereas t-SNE has a bias towards capturing local structure. The UMAP authors also made a huge effort to justify their approach with algorithmic geometry theory.

  • @fredericln
    @fredericln 3 years ago

    Opening new frontiers for DL, congratulations! A maybe silly question (as I have only watched up to 31' so far): is the "6 is a magic number" finding robust to changes in the hyperparameter "recurrence is limited to n-1… n-6" (fixed at 6 in the example table)?

  • @Emi-jh7gf
    @Emi-jh7gf 2 years ago

    Couldn't Wolfram Alpha do this for some time? How is this better?

  • @yoshiyuki1732ify
    @yoshiyuki1732ify 3 years ago

    Would be interested in seeing how it deals with chaotic systems with some noise.

  • @MIbra96
    @MIbra96 3 years ago +1

    Hey Yannic,
    thank you for your videos. They have been very helpful for me. I just wanted to ask what tablet and app do you use for reading and annotating papers?

  • @axe863
    @axe863 10 months ago

    Why not use deep symbolic regression as a mapping mechanism between different neural architectures?

  • @harunmwangi8135
    @harunmwangi8135 1 year ago

    Amazing stuff

  • @tclf90
    @tclf90 3 years ago +3

    amazing, the next step would be able to produce a sequence of prime numbers :P

    • @Adhil_parammel
      @Adhil_parammel 3 years ago

      Riemann🤔🤔

    • @rylaczero3740
      @rylaczero3740 3 years ago +1

      At this pace, machines will definitely beat humans at finding the secrets of the primes.

  • @victorkasatkin9784
    @victorkasatkin9784 3 years ago +1

    The linked demo fails to continue the sequence 0,0,0,0,0,0: "Predicted expression: Unknown error"

  • @yourpersonaldatadealer2239
    @yourpersonaldatadealer2239 3 years ago

    Love your content, Yannic. I would be extremely interested in hearing your thoughts on AI in cybersec and seeing any interesting papers about it. I imagine the types of networks that were exceptional at playing Atari games would also be relatively easy to shift to a variety of cyber attacks, and given how useless the corporate world appears to be at securing data, I'm guessing this will be unbearable for most to endure (at least early on).

  • @rylaczero3740
    @rylaczero3740 3 years ago

    Next step should be predicting latent space, and when you sample an equation from it, it should give results as expected but also not diverge from other equations sampled from same space.

  • @Eltro101
    @Eltro101 3 years ago

    Why not represent the numbers as matrices or complex values? Then you could reproduce the group structure of addition or multiplication, in addition to being able to make each number high-dimensional.

  • @anttikarttunen1126
    @anttikarttunen1126 3 years ago

    How hard would it be for a system like this to find Euclid's gcd algorithm by itself? With that it could then detect which sequences are divisibility sequences, multiplicative or additive sequences. That is, most of the essential number theoretic sequences, A000005, A000010, A000203, A001221, A001222, A001511, A007913, A007947, A008683, A008966, A048250 (and a couple of thousand others) are either multiplicative or additive. I'm not saying that it would yet find the formula for most such sequences, but it could at least make an informed hypothesis about them.
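
As a rough illustration of the kind of check described here, the sketch below tests whether a finite prefix of a sequence is consistent with being multiplicative, i.e. a(1) = 1 and a(mn) = a(m)a(n) whenever gcd(m, n) = 1. The helper names are made up for this example, and passing the test on a prefix is only evidence, not a proof.

```python
from math import gcd


def looks_multiplicative(a, limit: int) -> bool:
    """Check a(1) == 1 and a(m*n) == a(m)*a(n) for all coprime m, n with m*n <= limit.

    `a` is any function defined on 1..limit.
    """
    if a(1) != 1:
        return False
    for m in range(2, limit + 1):
        # Only n > m is needed, because the condition is symmetric in m and n.
        for n in range(m + 1, limit // m + 1):
            if gcd(m, n) == 1 and a(m * n) != a(m) * a(n):
                return False
    return True


def num_divisors(n: int) -> int:
    """Number of divisors of n (OEIS A000005), computed naively."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)
```

For example, `looks_multiplicative(num_divisors, 200)` is true, while a function such as `lambda n: 1 if n == 1 else 2` fails at m = 2, n = 3.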

    • @anttikarttunen1126
      @anttikarttunen1126 3 years ago

      While in the "base"-world, how hard would it be to detect which sequences are k-automatic or k-regular?

    • @anttikarttunen1126
      @anttikarttunen1126 3 years ago

      I mean, could this be done in "AlphaZero way", so that it would find such concepts by itself, without need of hardcoding them?

  • @anttikarttunen1126
    @anttikarttunen1126 3 years ago

    What is the unary operator "relu" ?

    • @MrGreenWhiteRedTulip
      @MrGreenWhiteRedTulip 3 years ago

      relu(x) = { x if x>0 , 0 otherwise

    • @anttikarttunen1126
      @anttikarttunen1126 3 years ago

      @MrGreenWhiteRedTulip Thanks, that was new to me, as I'm an outsider in this field. Just found a Wikipedia article about the ReLU (Rectified Linear Unit) activation function.

    • @MrGreenWhiteRedTulip
      @MrGreenWhiteRedTulip 3 years ago

      No prob. It’s just used as a function here though, not an activation function!
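
A small sketch of why relu can be handy as a plain unary operator in an expression language: combined with addition and subtraction it yields absolute value and maximum. These identities are standard. Whether the model in the paper actually uses relu this way is not something this thread establishes.

```python
def relu(x):
    """relu(x) = x if x > 0, else 0, as defined earlier in this thread."""
    return x if x > 0 else 0


def abs_via_relu(x):
    """|x| = relu(x) + relu(-x)."""
    return relu(x) + relu(-x)


def max_via_relu(a, b):
    """max(a, b) = b + relu(a - b)."""
    return b + relu(a - b)
```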

    • @rylaczero3740
      @rylaczero3740 3 years ago

      @anttikarttunen1126 Makes sense. For people in the field, all operators except ReLU are outside their common operators.

    • @anttikarttunen1126
      @anttikarttunen1126 3 years ago

      @rylaczero3740 I see: that's why people in the field are so prejudiced about most of the mathematical sequences, thinking that they are absolutely ridiculous if not expressible with relu(s). 🤔

  • @drdca8263
    @drdca8263 3 years ago +1

    Hm, I guess if they wanted to add conditionals to the language so that it could recognize sequences that use them, e.g. Collatz / hailstone / collapse / ((3n+1)/2) sequences, they would have to make the tree support ternary operations. Not sure how much more that would require.

    • @drdca8263
      @drdca8263 3 years ago +2

      The thing about the embeddings for the integer tokens makes me wonder if it would be beneficial to hard-code (before normalizing them, I guess) some basic properties of the integer into a few of the dimensions, such as the number of distinct prime factors, the number of divisors, whether it is divisible by 2, by 3, or by 5, whether it is one more or two more than something divisible by 3, and similarly one more and one less than something divisible by 5, and maybe a handful more, and then let the other dimensions of the embedded vector be initially random and learned.
      Would this increase performance, or reduce it?
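
For concreteness, the hand-crafted properties listed in this comment could be computed along these lines. This is only a sketch of the feature extraction, with names invented here. How such features would be combined with learned embedding dimensions, and whether that helps, is left open.

```python
def distinct_prime_factors(n: int) -> int:
    """Count the distinct primes dividing n, by trial division."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (1 if n > 1 else 0)


def num_divisors(n: int) -> int:
    """Count the divisors of n naively."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def handcrafted_features(n: int) -> list:
    """Fixed (non-learned) features for a positive integer n."""
    return [
        distinct_prime_factors(n),
        num_divisors(n),
        int(n % 2 == 0),  # divisible by 2
        int(n % 3 == 0),  # divisible by 3
        int(n % 5 == 0),  # divisible by 5
        int(n % 3 == 1),  # one more than a multiple of 3
        int(n % 3 == 2),  # two more than a multiple of 3
        int(n % 5 == 1),  # one more than a multiple of 5
        int(n % 5 == 4),  # one less than a multiple of 5
    ]
```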

    • @anttikarttunen1126
      @anttikarttunen1126 3 years ago +1

      @drdca8263 Well, having access to the prime factorization of n would be great (e.g., for 12 = 2*2*3, 18 = 2*3*3, 19 = 19), or figuring that out by itself (of course then you can also detect which numbers are primes). Also, I wonder how utopian it would be to go "full AlphaZero" with OEIS data (if that even makes any sense?), and whether it would then be able to learn to detect some common patterns in sequences, like for example the divisibility sequences (of which keyword:mult seqs form a big subset) and the additive sequences; sequences that are monotonic or not, injective or not, that satisfy the "restricted growth" constraint, etc. Or detecting sequences that are shifted convolutions of themselves (e.g. Catalans, A000108), or eigensequences of other simple transforms.

  • @meisherenow
    @meisherenow 3 years ago

    Work on discovering scientific laws from data has a very long history in AI--Pat Langley and Herb Simon's BACON system was built 40 years ago, with about as much computing power as a modern toaster.
    Damn I'm old.

  • @Kram1032
    @Kram1032 3 years ago +1

    I don't think base 12 enthusiasts have a club name lol
    But imo, base 720720 is *obviously* the *clear* way to go, as any division involving stuff up to 16ths is going to be super easy with it 🤓
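
The joke works because 720720 is the least common multiple of 1 through 16, so every fraction 1/d with d up to 16 has a one-digit expansion in that base. A quick check, using a helper name invented for this sketch:

```python
from functools import reduce
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def smallest_base_divisible_by_all(k: int) -> int:
    """Smallest positive integer divisible by every integer from 1 to k."""
    return reduce(lcm, range(1, k + 1), 1)
```

`smallest_base_divisible_by_all(16)` gives 720720.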

  • @TheDavddd
    @TheDavddd 3 years ago

    Try this with digits of PI or some time series that obeys some visible but mysterious pattern :D

  • @veliulvinen
    @veliulvinen 3 years ago

    14:08 hmmm where have I heard about base 6 being the best fit before... ua-cam.com/video/qID2B4MK7Y0/v-deo.html

  • @viktortodosijevic3270
    @viktortodosijevic3270 3 years ago +2

    I liked it until he said it was trained for weeks on 16 GPUs 😢

  • @anttikarttunen1126
    @anttikarttunen1126 3 years ago +1

    Sorry, but I must object to what you both seem to suggest at 1:04:20: that only the keyword:easy sequences in OEIS (as of now 80236 sequences) are the ones with some logic behind them, and that the rest of the sequences (268849 as of now) are all some kind of "bus stop sequences" with no logic whatsoever behind them. Certainly the absolute majority (98% at least) of the sequences in OEIS are well-defined in the mathematical sense, even though they do not always conform to the simple recurrence model that your project is based on. For example, primes, and most of the sequences arising in elementary number theory. Moreover, although your paper is certainly interesting from the machine learning perspective, its performance doesn't seem to me any better than many of the programs and algorithms listed at the Superseeker page of the OEIS, some of which are of considerable age. (See the separate text file whose link is given in the section Source Code For Email Servers and Superseeker at the bottom of that OEIS page).

    • @anttikarttunen1126
      @anttikarttunen1126 3 years ago

      Also, Christian Krause's LODA project and Jon Maiga's Sequence Machine are very good in mining new programs and formulas for the OEIS sequences, mainly because they can also search for the relations _between_ the sequences, instead of limiting themselves to just standalone expressions with a few arithmetical operations.

  • @WhiterockFTP
    @WhiterockFTP 3 years ago +1

    why do you wear sunglasses when watching a screen? :(

  • @julius333333
    @julius333333 3 years ago

    Your French pronunciation is great for a non-native speaker.

  • @JP-re3bc
    @JP-re3bc 3 years ago

    There is little "symbolic" in this thing other than the name.

    • @ThetaPhiPsi
      @ThetaPhiPsi 3 years ago

      Or is it in the data? Tokens basically represent semantics, and the embeddings represent relations between the numbers.

  • @MrGreenWhiteRedTulip
    @MrGreenWhiteRedTulip 3 years ago

    Jan misali was right all along…