Thank you so much for posting a video about reality... I understand the math behind AI and it's FAR from AGI... But marketing is pushing the fantasy... thanks for the dose of rationality... 😊
The magic isn’t that we got a machine to think, the magic is in realizing there is no such thing as thinking. AI has shown us that there are no facts, only metaphors.
Very unsettling opening speech: "Transform the entire visible universe...." If that is not far-fetched, I do not know what is. Human-created AIs will likely do no such thing. Far more likely is that human society will self-terminate before it reaches this level of technological sophistication.
@ 0:50 That looks like a shroud in the background; I think you were sailing then. What kind of sailboat? I have a Capri 14.2 I just went sailing on a few days ago on a small lake near where I am in Arizona. Thank you for the talk with Jürgen Schmidhuber. It is fascinating: the highly cited November 1997 paper on LSTM and the discussion of "policy gradient for LSTM". It seems a lot of different terminology is used for effectively the same function (I am wondering if Proximal Policy Optimization - PPO - is similar; I have a lot to learn, as always). Jürgen makes a great point that the way to reconcile the different terminology is via the math. Keep up the great work!
Mamba should be able to solve the parity problem if we take state_{t+1} = e^(pi*i*delta(x_t)) * state_t + 0 (no additive input term), with state_0 = 1. If delta(0) = 0 and delta(1) = 1, this multiplies the state by -1 whenever we encounter a 1 and leaves the state as it is when we encounter a 0. At the end, a state of 1 indicates an even number of 1s and a state of -1 indicates an odd number of 1s.
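As a sanity check, the recurrence described above can be simulated directly; this is just a numerical sketch of the construction in the comment, not actual Mamba code:

```python
import cmath

def parity_state(bits):
    # state_{t+1} = e^(i*pi*delta(x_t)) * state_t, with state_0 = 1,
    # delta(0) = 0 (multiply by 1) and delta(1) = 1 (multiply by -1).
    state = 1 + 0j
    for b in bits:
        state *= cmath.exp(cmath.pi * 1j * b)
    return round(state.real)  # +1: even number of 1s, -1: odd

print(parity_state([1, 0, 1, 1]))  # three 1s -> -1
```

Each 1 flips the sign of the state, so the final sign encodes parity exactly as described.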
Great conversation. I know some more content is forthcoming. Did you ask him questions about diffusion, which is the other "species" of AI that is making waves? Can the channel or the community of commenters recommend any conversation on MLST where deep questions are asked about diffusion?
About the NAND gates argument (8:00): it's not that simple. ICs use them, but they also use synchronized sequential elements like flip-flops, which cannot be emulated in a feedforward neural net (unless you start making loops and taking a lot of care to synchronize them, which would make supervised deep learning much, much more complex). I'd also stop overusing the word "Turing" to express sequentiality, because it's vaguer. As for predicting that the future could be more surprising than we think, that's no great feat, as it's always been the case.
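For readers who haven't seen one, the "loop" the comment alludes to is exactly what a flip-flop is: a pair of cross-coupled NAND gates whose feedback stores a bit. A minimal simulation (my own illustrative sketch, not anything from the video):

```python
def nand(a, b):
    return 1 - (a & b)

def sr_latch(s_bar, r_bar, q=0, q_bar=1):
    # Cross-coupled NAND latch (active-low inputs); iterate the
    # feedback loop a few times until it settles.
    for _ in range(4):
        q, q_bar = nand(s_bar, q_bar), nand(r_bar, q)
    return q, q_bar

print(sr_latch(0, 1))  # set   -> (1, 0)
print(sr_latch(1, 0))  # reset -> (0, 1)
```

The memory comes entirely from the feedback loop, which is the commenter's point: plain feedforward supervised training doesn't give you that for free.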
I found that argument very unconvincing anyway, given that it sounds just like a variation of the old "multi-layer perceptrons can approximate any function" argument to me.
@@unvergebeneid Multi-layer perceptrons can approximate any smooth function pretty well though; it's a theorem (granted you let the number of perceptrons be extremely large)
@@srivatsasrinivas6277 yes but we also saw that it doesn't mean much for our practical advances in the field. For how long did people stick with a single hidden layer because it's mathematically proven to be enough? And then deep learning came along and upended everything. The same might be true here.
Really curious to hear more about using search to find a program which produces the network that then generalizes to the problem set. This seems very reminiscent of what Francois Chollet was talking about. Learning on data to predict data seems like creating a "skillful" tool, but learning to predict a model which predicts the data seems to create an "intelligent" tool: something that can figure out how to adapt to any new data.
Current AI models and architectures mostly seem to be built or designed for portability, as in using frameworks that allow people to build and deploy certain types of AI implementations relatively easily without any kind of exotic hardware. That choice limits what can be done to purely mathematical or algorithmic behaviors, which have their own limitations.
I propose a computing taxonomy for this point in time: classical computing, learned computing and quantum computing. Classical computing has to do with processing algorithms, and is required on the side at some capacity with both learned and quantum. Would this have any merits?
The paper by Siegelmann didn't show NNs were Turing universal... it was trying to prove super-Turing computability by using real numbers. Sorry, you got it wrong there. Proving that NNs are Turing universal is trivial, as Schmidhuber explained. But that does not mean NNs can be efficient at learning problems that are solvable by symbolic means. One argument doesn't have much to do with the other... Nice video.
Exactly! He didn't state the complexity side of the argument and that probably made it harder to swallow for some of the posters in the comment section (and for Tim as well judging from his puzzled expression, which was puzzling...)
"Proving that NNs are Turing universal is trivial". No, wrong! They are not Turing universal. He was only talking about practical problems. It is quite an empty take, though, because programs made from a long enough if-then-else chain can compute everything that is practically computable: if x = 1 then 10, else if x = 2 then 32, else if x = 3 then.... So what? A Turing machine is more abstract than an NN, because it can compress infinity into a finitely describable object. The amount of time it takes to perform the computations is determined internally, something which NNs lack.
@@federicoaschieri By virtue of an NN being able to perform universal computation, it can also "compress infinity", to put it in your naive vocabulary. What you are missing, I guess, is that you can also feed an NN with more memory through the equivalent of a tape, just as you do with a TM. A TM rule table is just as finite as the neurons' weight matrix. No difference. The interviewer then perhaps did not know how to phrase his question, because he also got wrong what Siegelmann's paper was about, hence probably also misunderstanding everything else. He even managed to confuse Schmidhuber, who seemed to take the interviewer's word at face value when citing back Siegelmann's paper. Now you also seem very confused, perhaps not knowing what they were talking about in the first place, but also not knowing the meaning of Turing computational equivalence. Siegelmann's paper was, by the way, about hypercomputation, so not Turing universality but super-Turing universality. No need for real-number weights for Turing universality, as the video suggests... Here is one of her main papers: www.cs.princeton.edu/courses/archive/fall05/frs119/papers/siegelmann95.pdf
@@HectorZenil That's conceptually wrong. An NN doesn't have a "tape", LOL. The tape is in your imagination. Feedforward NNs are linear-time devices with respect to the number of neurons, for quick tasks: is this a cat, yes or no, or predicting the next word statistically. They are not trained as if they were the control unit of a Turing machine. That would be very different: you would train the NN to use a tape, feeding its output back in for internal processing. That's not what they do.
Despite the fact that you (Tim) are probably busy up over your ears, I should like to apply for an interview for the Danish media outlet "POV International", which I happen to write for as a journalist. There has been much debate about "the dangers of AI" and the question of anthropomorphization (my god, I cannot spell that word), and those issues interest me about zero. However, what I think is really interesting is the question of the present stage of the LSTM models. What is it, precisely, that they can NOT do? There are many sub-questions involved in this general question. Looking at chatbots such as GPT-4 or other machines, a major question concerns the actual use of these machines. For instance, the use of the machines happens through a prompt, which only allows the user to pose certain questions. A more difficult problem to address is the question of how the machine's understanding of the world is obtained through its training data, and specifically how this "world-model achievement" is reflected in the answers it gives (or the protocol of communication, in more general terms). Yet another question is more concerned with the specific use of the machines. Since the machines are what we might call "general-purpose machines", they tend to answer any question about the world and relations from such a generalised point of view. The most "commonly accepted explanation", seen from a statistical point of view, is sometimes not interesting, except as "...a general point of view on the subject is...". However, sometimes it seems possible to really "argue" with the machines, but only provided the user's ability to really know a lot about the given subject. In such cases the machines seem to actually "argue", or provide what we call an informed, critical discussion about a given subject.
This again raises the question of the user and the usage of these machines, which is very different from the "stupid" questions of whether they think like humans or the question of "danger". The danger could be this: if we do not understand the exact limitations of the machines, the way that they are constructed (and trained), and exactly how to use them, we really have no practical use for them, except of course for the very specific purposes that specifically trained machines can provide. So the question of general AI involves our use of the machine, or to put it bluntly: if you have no real insight into the subject you are investigating with the machines, how do you expect them to give good answers? Why do the machines rarely ask you back what your preconditions for asking the question are, or what your knowledge of the subject is? Of course it could be interesting to have a more detailed video where you analyse, for instance, GPT-4 as you did in the "we got access to ... GPT-3" video. We need more than anything to understand the exact limitations of these machines and how to use them. And we (including general journalists like me) need to communicate those limitations and directions of use to the general public.
This argument at 6:10: once Jürgen pointed out you can make a NAND gate in a hidden layer (7:32), and you can add as many layers as you want plus recurrence, this idea that neural nets are inferior to laptops goes straight out the window. I'm a computer engineer; it's basic that all you need are NAND gates for any computer logic and even memory. Very inefficient, but quite obviously true, i.e. you can make AND, OR and NOT gates out of NAND gates, and that's all there is to computers. (You can do it with NOR gates too, of course.) Proof complete, QED, done, etc.
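The construction the comment cites is a one-liner per gate; a quick sketch for anyone who hasn't seen it:

```python
def nand(a, b):
    return 1 - (a & b)

def not_(a):    return nand(a, a)              # NOT x = x NAND x
def and_(a, b): return not_(nand(a, b))        # AND = NOT of NAND
def or_(a, b):  return nand(not_(a), not_(b))  # OR via De Morgan

# Even XOR falls out of four NANDs, the classic construction:
def xor(a, b):
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

assert [xor(a, b) for a in (0, 1) for b in (0, 1)] == [0, 1, 1, 0]
```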
@@Ikbeneengeit That's a good point. They are equivalent, but that does not mean there's a well-behaved gradient to it. But the point still holds, I think. It's just very hard to train ... 😁
Schmidhuber is awesome! I've actually covered some of his ideas on my channel like Driven by Compression: a simple principle explains motivation and intrinsic reward. The videos have been quite popular. I found his work through a guest on Lex Fridman's podcast answering the question of the meaning of life. Schmidhuber is such a vast intellect!
His arguments about RNNs being Turing machines seem a bit weak, but if some of you have an informed and theoretical point about this, please reply to this comment. Also, he shows some bitterness towards the attention mechanism, which is understandable considering that LSTMs are considered pretty outdated by most researchers due to their vanishing gradient limitations. I would like the interviewer to ask Schmidhuber if he thinks that everybody is wrong except him, because nobody uses RNNs nowadays.
Weak? Not if you have a degree in CS. I didn't know the argument (not an expert in NNs), but I learned/proved in my first year of uni that you can build any logic network with NANDs. Your laptop is just a huge logic network; if an RNN can implement NANDs, it can emulate your laptop. As simple as that. In fact, I was embarrassed for Tim afterwards.
1:18:12 LLMs/transformers seem to be pretty good at identifying whether a binary number is even or odd. It's actually quite simple: if the last bit (LSB) of a binary number is 1, then the number is odd.
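The rule in the comment takes one line to check:

```python
def is_odd_binary(s):
    # A binary number is odd iff its least significant (last) bit is 1.
    return s[-1] == "1"

# Agrees with actually converting the number:
assert is_odd_binary("1011") == (int("1011", 2) % 2 == 1)  # 11 is odd
assert is_odd_binary("0110") == (int("0110", 2) % 2 == 1)  # 6 is even
```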
It is ok. Humans are just part of something bigger to come, just as bacteria are part of the human body and have a place in it, but complexity grows just because it can and is useful at every stage.
A gentle reminder at 59:57: We're all inherently capable of complex calculations, even if we can't explicitly articulate or write them down on paper. For instance, when a toddler grows into a kid and catches an apple, they're unconsciously calculating its trajectory and acceleration. This innate ability is a testament to our brain's capacity to process and understand physics, even without formal mathematical training. Just a humble reminder that may be .. fruitful.
Thanks for that
people are just /that/ (5)
their best is all people do (7)
we're /all/ blossoming (5)
@@moonrootbro what
@mattwesney bro asuh
the flowers we plant (5)
the coops we chickens construct (7)
you're brave, i like that (5)
read my reply
Hell yeah, we are all getting Schmidhubered today 😍
He is the 🐐
Yup. Today we're getting Schmidhubered so hard. I'm here for it.
🫡@@captain_crunk
You guys Jürgen off too?
A dialogue between Schmidhuber and Wolfram or Chaitin or Friston on algorithmic information theory, computational irreducibility, free energy principle and learning would be interesting and insightful.
I would pay to see it! 😀
Wolfram and Schmidhuber will spend their whole conversation debating who discovered this and that first.
@@h.b.1285 Wolfram might be a cool guy, and Schmidhuber might be shunned by academia as well, but I seriously doubt Wolfram could hold a candle to Schmidhuber regarding technical details. He often slips when it comes to these things, and Schmidhuber seems too arrogant to let these things slide.
“Deep learning is all about depth.”
Surprisingly deep statement.
Didn't watch yet, but finally someone who talks ai, not science fiction
"ai, not science fiction"
that sounds as weird as "finally someone talks about self-driving cars, not science fiction". AI IS science fiction.
Actually, the reason why people/groups like Ilya, the NSA and others who are actually doing frontier and related deep learning development (which Jürgen is NOT) realize that the current 1,500% per year perf/cost growth rate (growing an additional 144% year-over-year based on stacked exponential, or what is called tetration-based, growth) is going to provide AGI and beyond starting in the coming year is that we are actually creating it and putting guardrails and other protections on it.
Similarly, if you are wondering why Ilya started his own company on a straight path to ASI (artificial superintelligence) in 3 years, it is that:
(1) OpenAI was about to bring on NSA as babysitters and investor/customers, and
(2) he already had designed the closed-loop synthetic data training and optimization system, which Strawberry and Orion are built on, and
(3) other than fine-tuning and evolving protection, the first AGI was 75% done by the time he left OpenAI.
As such, there are two groups: (1) those who know what's happening, and (2) those who are going to be continually surprised by what happens.
@@TheManinBlack9054 Reality is different from Fiction. LLMs are different from SF AIs.
@@TheManinBlack9054😂❤uA
Whole lot of fluff in this comment 😅
'cos this is gold. Thank you Tim and crew.
MLST is just a permutation of LSTM. We got Schmidhubered, and no one noticed. He can't keep getting away with this.
We appreciate this galaxy brain permutation symmetry generalisation, our previous model failed us
"Unlimited and infinite are two different things" ... is absolutely bloody right!
26:24
Interviewer: **mentions something someone said on Twitter**
Schmidhuber: Yeah, my mom said that in the 70's.
That's a whole new level of Schmidhubering!!
I love Schmidhuber's argument about practical computation machines. I think many CS people don't get it.
WOW !!! he drops heavy wisdom so casually!
I added this to my Listen Later playlist so fast. Jurgen is the best
There are infinitely many Python programs no particular build of its interpreter can execute, and only finitely many it can execute, for the simple fact that there are architectural limitations in hardware and software, like the width of the internal source-file offset counter. Even before reaching that limit, there's the file-system limitation on the size of files, regardless of storage space. There are limits everywhere in any concrete hardware and software implementation of anything.
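Two of those concrete limits are easy to observe in a running CPython build (the exact numbers are build-dependent, so treat this as illustrative, not a specification):

```python
import sys

# Largest index/offset this build's containers support; a source file
# longer than this could not even be addressed internally.
print(sys.maxsize)

# Interpreter-imposed cap on recursion depth; programs needing more
# stack frames than this fail regardless of available RAM.
print(sys.getrecursionlimit())
```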
Jurgen talks in the simplest plain English - a hallmark of intelligence
He is quite an interesting character, with solid vision. He should come on media more regularly.
The joy Schmidhuber has when he talks about intelligence is affectionate and inspiring
Wow. The OG. The Don. Doctor I really did it first. Big daddy. No diddy. Such a fan. I was literally just re-reading his self-referential weight paper. I really hope he expands on that line of research, maybe improve upon test-time training layers.
One more nickname for the legend. Mr I can actually knock out all the opps 😂. Why is he so buff 😂. Cheers Doc.
Jokes aside, the greatest mind in modern deep learning. I truly enjoyed this conversation.
1:15:52 - Jurgen Schmidhuber, renowned researcher, sings "Baby Shark". Best ever!
Jürgen is a Legend
This is the best conversation on AI that i have ever witnessed.
So far the scaling laws crowd has been right and caught the rest of the ML community off guard, but curious to see who turns out to be right going forward!
lmfaooo they been struggling ever since
@@and1play5 Yeah, they're struggling really hard with their hundreds of billions in investment, pulling ML experts' aggregate prediction of when we'll get AGI forward by 18 years, and casually smashing the Turing test 🤡
Been reading this guy's binding problem lately. He's got deep ideas. Thanks for the fantastic interview mate.
I'll be reading other work of his for the next few days.
Schmidhuber! So great to see him on this channel.
26:35 Even Mom Schmidhuber said it first about AI washing the dishes. This family is 🔥
Schmidhuber is a legend!
Imagine the two
Schmidhuber 🐐 and Friston 🐐 together on MLST 😍 Please make it happen one day.
God bless you for your great work
Please continue doing what you are doing!
This is amazing!
Thanks!
He's one of the best computer scientists I know; he inspires me. He really loves his work and is proud of his achievements.
I wish he could lecture me. I can listen to him for hours without end.
Part 2 when! Great conversation, love hearing JS talk about compression and practical algorithms
I have to say that Schmidhuber's AIT perspective has seriously shaped my perspective on computer science, physics, philosophy, and mathematics to the point where I find myself immediately framing my thought in terms of compression. It's really surprising that more AI Researchers don't explicitly develop the mathematics of compression.
I am aware of the vast literature on AIT and also the relation of its bounded version to complexity and cryptography, but compression seems so fundamental I have a suspicion that there lies a grand body of new insights lying just below the tip of the iceberg.
- Can compression be framed in a purely algebraic language, that is, is compression fruitfully viewable as a functor?
- Given two classes of formal objects A, B is there an interesting structure on the space of their compression functions [A, B]?
- Given a formal object C what is the structure of the classifying space of the compressions relative to that object?
- Is there a geometric aspect to compression? Assuming the points of our space are programs, how would a compression mapping be characterized? Maybe something like p' is a compression of p if p' is 'smaller' but every limit point of p' is the same as p?
I'm obviously just spiralling, still FASCINATING stuff.
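For concreteness, the AIT notion of compression these questions build on can be pinned down: relative to a fixed universal machine U, the Kolmogorov complexity of an object x and an induced compression relation on programs might be written (my own tentative formalization of the comment's "p' is a compression of p" idea):

```latex
K_U(x) = \min \{\, |p| : U(p) = x \,\}
\qquad
p' \preceq p \iff U(p') = U(p) \ \text{and} \ |p'| \le |p|
```

The functorial and geometric questions would then ask what structure of this relation survives as one varies U or the class of objects; I'm not aware of a standard treatment, so this is just a starting point.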
Love your channel, greetings from Belarus!
It's fascinating to learn about neural and non-neural AI in depth! Anyone dealing with sophisticated machine learning must comprehend the nuances of AI models such as Transformers and LSTMs.
You again Schmidhuber !
deep conversation indeed ❤
I wish Schmidhuber would go ahead and figure out AGI so someone else can claim they discovered it 10 years from now.
Causality only works one way unfortunately
If there is one area where you cant say that is AGI
At the moment, my self-coding system switches between approaches to solve problems through real-time regulation of parameters. It must be possible to sub-optimize this in very small self-optimizing loops. Self-optimization with a Test-Driven Development approach can optimize an AI that switches between Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRUs), Transformers, Reinforcement Learning (RL), Generative Adversarial Networks (GANs), Support Vector Machines (SVMs), and Random Forests in real-time. By leveraging the strengths of each technique, we can create a robust AI system that adapts dynamically to different tasks and data types.
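A system like the one described could be sketched, in a very reduced form, as an epsilon-greedy switcher over named approaches. Everything here (the class name, the scalar reward signal) is a hypothetical illustration of the idea, not the commenter's actual system:

```python
import random

class ModelSwitcher:
    """Keep a running mean reward per approach; usually exploit the
    best one, occasionally explore another (epsilon-greedy)."""

    def __init__(self, approaches, epsilon=0.1):
        self.scores = {name: 0.0 for name in approaches}
        self.counts = {name: 0 for name in approaches}
        self.epsilon = epsilon

    def pick(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.scores))
        return max(self.scores, key=self.scores.get)

    def update(self, name, reward):
        self.counts[name] += 1
        # Incremental running mean of observed reward for this approach.
        self.scores[name] += (reward - self.scores[name]) / self.counts[name]

switcher = ModelSwitcher(["LSTM", "Transformer", "RandomForest"], epsilon=0.0)
switcher.update("Transformer", 0.9)
print(switcher.pick())  # prints "Transformer"
```

With epsilon > 0 the switcher keeps re-testing the other approaches, which is the "self-optimizing loop" flavor the comment describes.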
This is a really good talk. The part about hierarchical planning especially.
The title I've been searching for all this while! Great work, you guys.
Oh, Schmidhuber. I learn German by watching his lectures. Great scientist.
What a legend of a man.
This channel is always bringing great interviews! Keep it up 👏 you will be at 500k subscribers by 2025!
Amazing that this was filmed at USI!
Just to state some inspirational things: we didn't know how birds "worked" before creating something much faster than a bird (airplanes); and we are now able to connect lab-grown biological human neurons to silicon. As for the number of neurons needed... well, a conscious AI with superintelligence is likely this decade.
Great discussion on the challenges of backpropagation, especially in relation to high-frequency learning problems like parity rules. It seems intuitive that backprop is limited in frequency representation, akin to Fourier transform constraints. Parity learning, XOR, and other high-frequency patterns involve rapid oscillations that backprop struggles to capture. Evolutionary algorithms, random search methods, and gradient-free optimization (like simulated annealing) seem more adept at exploring the solution space in these scenarios. Specialized architectures, like LSTMs, can also help, but the trade-offs in efficiency are significant. Fascinating topic; thanks for touching on it!
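The "high-frequency" intuition can be made concrete: parity is the Boolean function whose output flips whenever any single input bit flips, i.e. it is maximally sensitive, which is exactly what smooth gradient-based fitting struggles with. A small check of that property:

```python
from functools import reduce
from operator import xor

def parity(bits):
    # XOR-fold: 1 iff the number of 1s is odd.
    return reduce(xor, bits, 0)

bits = [1, 0, 1, 1, 0]
p = parity(bits)
# Flipping any single bit flips the output -- maximal sensitivity.
assert all(
    parity(bits[:i] + [1 - bits[i]] + bits[i + 1:]) != p
    for i in range(len(bits))
)
```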
Can we please get Schmidhuber in a room with John Carmack and Rich Sutton. He, and they, are clear-thinking pragmatists sincerely angling for the path through the hype. Keen-Huber Technologies... I'd buy stock in that.
Thanks MLST.
Whatever the conflicts about academic priority, it's pretty clear that Schmidhuber is a deeper thinker than LeCun, Hinton, Suleyman and their crowd.
By far most valuable pod on the internet. not even close.
Wisdom comes not only from knowledge but from how you use it, together with your logic and what is happening in the present, to reach the aha moment. Things have to move into a divine fantasy.
Never heard of the guy but you sure got me hyped up.
The ad segue was pretty good too.
check the LSTM algorithm ;)
Amazing video! Waiting impatiently for the second part! Great, great, great content! Thanks for this!
A great talk! From my basic understanding, our most powerful supercomputers are only as good as the algorithmic models that run on them. In fact, one could say that our computers are just ways to test code (or algorithmic or mathematical models of our world). In light of these discussions, Max Bennett's book, "A Brief History of Intelligence: Evolution, AI, and the Five Breakthroughs That Made Our Brains", is worth reading (so is my book, Origins of Life's Sensoria, but I won't promote it). When we search down these rabbit holes, we discover that there are different types of "intelligence", including procedural, social, artistic (imaginative) and analytic, to name only a few (it also seems like higher intelligence needs a self-critic). These all demand that we understand how intelligence evolved, as elegantly described in Max Bennett's book. Behind this veil of understanding lies the problem of how we sense and connect to the world. This is where an understanding of how our senses and neurons work is critical to any future development of AGI. In the final analysis, it comes down to how well our algorithms mathematically model our world (with or without including probabilities, fuzzy set theory, etc.). It could be algorithms all the way down instead of turtles!
He is wrong about Mamba at 1:20:20 because it does use a hidden state. A state-space model is practically an RNN model, which can solve the parity problem.
True AGI would run continuously and be able to continuously learn new things the way a human can.
Jürgen is a living legend. Thanks !
Thank you so much for posting a video about reality... I understand the math behind AI, and it's FAR from AGI... But marketing is pushing the fantasy... thanks for the dose of rationality... 😊
I wish I could know the right time to short the bubble!
@@LuigiSimoncini OMG no kidding..
The magic isn’t that we got a machine to think, the magic is in realizing there is no such thing as thinking. AI has shown us that there are no facts, only metaphors.
Very unsettling opening speech: "Transform the entire visible universe..." If that is not far-fetched, I do not know what is. Human-created AIs will likely do no such thing. Far more likely is that human society will self-terminate before it reaches this level of technological sophistication.
Fully agree. We humans are arrogant and ignorant. Not a good combo.
Thanks. Looking forward to this one.
@ 0:50 That looks like a shroud in the background, I think you were sailing then, what kind of sailboat? I have a Capri 14.2 I just went sailing on a few days ago on a small lake near where I am at in Arizona.
Thank you for the talk with Jürgen Schmidhuber. It is fascinating: the highly cited November 1997 paper on LSTM, and the discussion of "policy gradient for LSTM". It seems a lot of different terminology is used for effectively the same function (I am wondering if Proximal Policy Optimization, PPO, is similar; I have a lot to learn, as always).
Jürgen makes a great point that the way to reconcile the different terminology is via the math.
Keep up the great work!
Awesome interview. Jürgen is a legend.
Mamba should be able to solve the parity problem if we take state_{t+1} = e^(pi*i*delta_{x_t}) * state_t, with state_0 = 1, delta_0 = 0 and delta_1 = 1. This would multiply the state by -1 whenever we encounter a 1 and leave the state as it is when we encounter a 0. At the end, a state of 1 indicates an even number of 1s and -1 an odd number of 1s.
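A minimal sketch of the recurrence the comment above proposes, in plain Python rather than an actual Mamba/SSM implementation (the function name and the hand-set "deltas" are my own illustration):

```python
import cmath
import math

def parity_state(bits):
    """Diagonal state update state_{t+1} = exp(i*pi*delta_t) * state_t,
    with delta_t equal to the input bit: each 1 multiplies the state
    by -1, each 0 leaves it unchanged."""
    state = 1 + 0j
    for b in bits:
        state *= cmath.exp(1j * math.pi * b)
    return state

# The real part ends up near 1 for an even number of 1s, near -1 for odd.
```

This only shows that a single multiplicative (complex-valued) hidden state suffices to track parity; whether gradient descent actually finds such a solution in a trained model is a separate question.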
Great conversation. I know some more content is forthcoming. Did you ask him questions about diffusion, which is the other "species" of AI that is making waves?
Can the channel or the community of commentors recommend any conversation of MLST where deep questions are asked about diffusion?
His vantablack hat is actually his brain compressing the room's photons. Singularity confirmed!
Comparing LSTM uses to Baby Shark viewings is an innovation in itself. :-))
About the NAND gates argument (8:00): it's not that simple. ICs use them, but they also use synchronized sequential elements like flip-flops, which cannot be emulated in a neural net (unless you start making loops and taking a lot of care to synchronize them, which would make supervised deep learning much, much more complex). I'd also stop overusing the word "Turing" to express sequentiality, because it's vaguer. As for predicting that the future could be more surprising than we think, that's no great feat, as it has always been the case.
I found that argument very unconvincing anyway, given that it sounds just like a variation of the old "multi-layer perceptrons can approximate any function" argument to me.
@@unvergebeneid Multi-layer perceptrons can approximate any smooth function pretty well, though; it's a theorem (granted, you let the number of perceptrons be extremely large).
@@srivatsasrinivas6277 yes but we also saw that it doesn't mean much for our practical advances in the field. For how long did people stick with a single hidden layer because it's mathematically proven to be enough? And then deep learning came along and upended everything. The same might be true here.
Oh boy this is going to be a great one.
He sounds like the German scientist relative at the Walken Family Reunion
Really curious to hear more about using search to find a program which produces the network that then generalizes to the problem set.
This seems very reminiscent of what Francois Chollet was talking about.
Learning on data to predict data seems like creating a "skillful" tool,
but learning to predict a model which predicts the data seems to create an "intelligent" tool:
something that can figure out how to adapt to any new data.
Awesome work!
Current AI models and architectures mostly seem to be built or designed for portability, as in using frameworks that allow people to build and deploy certain types of AI implementations relatively easily without any kind of exotic hardware. That architecture limits what can be done beyond purely mathematical or algorithmic behaviors which have limitations.
Oh, it's You_again Shmidhoobuh
Masterpiece!
schmidhuber is simply the best
I propose a computing taxonomy for this point in time: classical computing, learned computing and quantum computing. Classical computing has to do with processing algorithms, and is required on the side at some capacity with both learned and quantum. Would this have any merit?
"It may take just a few more decades until we will have true human level AI."
01:25
This is from 2018 I think.
we have made it here!!!!
He looks like Owen Wilson in the thumbnail.
Who is the interviewer?
The paper by Siegelmann didn't show NNs were Turing universal... it was trying to prove super-Turing computability by using real numbers. Sorry, you got it wrong there. Proving that NNs are Turing universal is trivial, as Schmidhuber explained. But that does not mean NNs can be efficient at learning problems that are solvable by symbolic means. One argument doesn't have much to do with the other... Nice video.
Exactly! He didn't state the complexity side of the argument and that probably made it harder to swallow for some of the posters in the comment section (and for Tim as well judging from his puzzled expression, which was puzzling...)
"Proving that NNs are Turing universal is trivial". No, wrong! They are not Turing universal. He was only talking about practical problems. It is quite an empty take, though, because a program made of nothing but a long enough if-then-else chain can compute everything that is practically computable: if x = 1 then 10, else if x = 2 then 32, else if x = 3 then... So what? A Turing machine is more abstract than an NN, because it can compress infinity into a finitely describable object. The amount of time it takes to perform the computations is determined internally, something which NNs lack.
@@federicoaschieri By virtue of an NN being able to perform universal computation, it can also compress infinity, to put it in your naive vocabulary. What you are missing, I guess, is that you can also feed an NN with more memory through the equivalent of a tape, just as you do with a TM. A TM rule table is just as finite as the neurons' weight matrix. No difference. The interviewer then perhaps did not know how to phrase his question, because he also got wrong what Siegelmann's paper was about, hence probably also misunderstanding everything else. He even managed to confuse Schmidhuber, who seemed to take the interviewer's word at face value when citing back Siegelmann's paper. Now you also seem very confused, perhaps not knowing what they were talking about in the first place, but also not knowing the meaning of Turing computational equivalency. Siegelmann's paper was, by the way, about hypercomputation, so not Turing universality but super-Turing universality. No need for real-number weights for Turing universality, as the video suggests... Here is one of her main papers: www.cs.princeton.edu/courses/archive/fall05/frs119/papers/siegelmann95.pdf
@@HectorZenil That's conceptually wrong. A NN doesn't have a "tape", LOL. The tape is in your imagination. Feedforward NN are linear time devices with respect to the number of neurons, for quick tasks: is this a cat, yes or no, or predicting statistically the next word. They are not trained as if they were the control unit of a Turing machine. That would be very different, you would train the NN to use a tape, so using their output in reality for internal processing. That's not what they do.
@@federicoaschieri the Dunning Kruger is strong in this one - Yoda Schmidhuber
Despite the fact that you (Tim) are probably busy up over your ears, I should like to apply for an interview for the Danish media outlet "POV International", which I happen to write for as a journalist.
There has been much debate about "the dangers of AI" and the question of anthropomorphization (my god, I cannot spell that word), and those issues interest me about zero. However, what I think is really interesting is the question of the present state of the LSTM models. What is it, precisely, that they can NOT do?
There are many sub-questions involved in this general question. Looking at chatbots such as GPT-4 or other machines, a major question concerns the actual use of these machines. For instance, use of the machines happens through a prompt, which only allows the user to pose certain questions.
A more difficult problem to address is the question of how the machine's understanding of the world is obtained through its training data, and specifically how this "world-model achievement" is reflected in the answers it gives (or, in more general terms, in the protocol of communication).
Yet another question concerns the specific use of the machines. Since the machines are what we might call "general-purpose machines", they tend to answer any question about the world from such a generalised point of view. The most "commonly accepted explanation", seen from a statistical point of view, is often not interesting, except as an "...a general point of view on the subject is..." answer. However, sometimes it seems possible to really "argue" with the machines, but only provided the user really knows a lot about the given subject. In such cases the machines seem to actually "argue", or provide what we would call an informed, critical discussion of a given subject.
This again raises the question of the user and the usage of these machines, which is very different from the "stupid" questions of whether they think like humans or whether they are "dangerous". The real danger could be this: if we do not understand the exact limitations of the machines, the way they are constructed (and trained), and exactly how to use them, we really have no practical use for them, except of course for the very specific purposes that specifically trained machines can serve. So the question of general AI involves our use of the machine, or to put it bluntly: if you have no real insight into the subject you are investigating with the machines, how do you expect them to give good answers? Why do the machines rarely ask you back what your preconditions for asking the question are, or what your knowledge of the subject is?
Of course, it could be interesting to have a more detailed video where you analyse, for instance, GPT-4, as you did in the "we got access to... GPT-3" video.
We need more than anything to understand the exact limitations of these machines, and how to use them.
And we (including general journalists like me) need to communicate those limitations and directions of use to the general public.
Why is he at USI? Isn't he supposed to be at KAUST?
Looking forward!
The test of AGI will be in the Orion project.
This argument at 6:10: once Jürgen pointed out that you can make a NAND gate in a hidden layer (7:32), and that you can add as many layers as you want plus recurrence, the idea that neural nets are inferior to laptops goes straight out the window. I'm a computer engineer; it's basic that all you need are NAND gates for any computer logic, and even memory. Very inefficient, but quite obviously true, i.e. you can make AND, OR and NOT gates out of NAND gates, and that's all there is to computers. (You can do it with NOR gates too, of course.) Proof complete, QED, done, etc.
But will gradient descent ever result in such a NAND configuration? If not, then such a NAND network can never arise.
@@Ikbeneengeit That's a good point. They are equivalent, but that does not mean there's a well-behaved gradient to it. But the point still holds, I think. It's just very hard to train... 😁
@@Ikbeneengeit It will for sure, in unlimited (not infinite) time. The point was purely theoretical; he's not telling us to build laptops from RNNs.
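The NAND-universality point in the thread above can be sketched in a few lines: a single threshold "neuron" computes NAND, and every other gate is then built from NANDs alone (the weights and function names here are my own illustration, not anything from the talk):

```python
def nand(a, b):
    # One threshold unit: weights -1, -1, bias 1.5.
    # Fires 1 unless both inputs are 1 (then -1 - 1 + 1.5 < 0).
    return 1 if (-a - b + 1.5) > 0 else 0

# Every standard gate from NAND alone:
def not_(a):    return nand(a, a)
def and_(a, b): return not_(nand(a, b))
def or_(a, b):  return nand(not_(a), not_(b))
def xor(a, b):  return and_(or_(a, b), nand(a, b))
```

As the replies note, the fact that such a configuration exists says nothing about whether gradient descent would ever find it; this is the expressiveness argument only.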
This is as good as it gets!
Great insights
Schmidhuber is awesome! I've actually covered some of his ideas on my channel like Driven by Compression: a simple principle explains motivation and intrinsic reward. The videos have been quite popular. I found his work through a guest on Lex Fridman's podcast answering the question of the meaning of life. Schmidhuber is such a vast intellect!
Your time stamps are off by a very significant amount. This makes it hard to navigate and return to while looking for certain talking points.
fucking awesome intro for an awesome dude
His arguments about RNNs being Turing machines seem a bit weak, but if any of you have an informed, theoretical point about this, please reply to this comment. Also, he shows some bitterness towards the attention mechanism, which is understandable considering that LSTMs are considered pretty outdated by most researchers due to their vanishing-gradient limitations.
I would like the interviewer to ask Schmidhuber if he thinks that everybody is wrong except him, because nobody uses RNNs nowadays.
nonsens
Weak? Not if you have a degree in CS
I didn't know the argument (not an expert in NNs), but I learned/proved in my first year of uni that you can build any logic network with NANDs. Your laptop is just a huge logic network; if an RNN can implement NANDs, it can emulate your laptop. As simple as that. In fact, I was embarrassed for Tim afterwards.
Fascinating talk.
1:18:12 LLMs/transformers seem to be pretty good at identifying whether a binary number is even or odd. It's actually quite simple: if the last digit (LSB) of a binary number is 1, then the number is odd.
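The observation above is a one-liner (function name is my own; note this even/odd check is a much easier task than the parity-of-1s problem discussed in the talk, since it only needs the last bit):

```python
def is_odd(binary_string: str) -> bool:
    """A binary number is odd exactly when its least significant bit is 1."""
    return binary_string[-1] == "1"
```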
Wow... 😅 At least I got the comparison with "Baby Shark, doo doo doo doo".
The Godfather Jürgen
1:17:42 is how I'd imagine Jurgen talks in his sleep
intriguing
Top G.
Schmidhuber >> LeCun
Start at 3:39
It is ok. Humans are just part of something bigger to come, just as bacteria are part of the human body and have a place in it, but complexity grows just because it can and is useful at every stage.
Was MLST a clever LSTM reference? :-)