I love his down-to-earth explanation of emergence: if we had gradually increased the size of the models and datasets, we would have slowly seen these new capabilities arise, like something emerging out of the fog, rather than being blown away by unexpected new capabilities we were suddenly confronted with.
Talking about terra incognita, I find this analogous to how we learn things. As kids/novices with no prior knowledge, we learn how to do stuff, and then as teens/amateurs we make mistakes and our error rate goes up. Then in the second phase of learning we learn to avoid these mistakes/errors, and finally we reach a stable state as adults/pros. A very interesting phenomenon.
Amazing interview, and congratulations on the set of questions you asked - kind of reading my mind! Mr. Anil Ananthaswamy has clarified so many things for me in this field. I am eager to read his books!
wow, timely. Just saw his book in my friend's house.
Another absolutely fascinating episode. Deep gratitude for the effort. Anil mentions that a certain level of understanding of the underlying mathematics behind machine learning is essential to understand its potential and limitations. Yes, that is certainly true from our perspective as humans, since mathematics is a strictly human way of describing our environment with ridiculous accuracy. At the same time, math is also a mind-generated coordinate system that works for us and for us alone. I suspect we may need to be open and ready to be surprised by the directions self-supervised learning systems may take beyond the descriptive power of mathematics.
The aliens from Omicron Persei 8 might use similar math, though. The first message we receive will probably be math of some sort, like in a Jodie Foster movie.
I forget what it's called, but I get something similar to xenomelia where I feel like I'm either too small for my body (like I'm a little man controlling it) or my sense of self extends outside my body (I'm bigger than my body). The one time I used a VR headset for an extended period of time, when I took it off, I had that same feeling for about 15 minutes. It doesn't feel good at all but when it happens I don't panic as much anymore because I know it will pass. It's like I'm part of the air or whatever I'm touching. Very weird feeling. I used to get it during times of anxiety as a child too but I didn't realize it until I got older.
He references something like the bucket theory of knowledge when talking about the future potential of neural networks to exhibit human like intelligence. It’s interesting how pervasive that idea is. I wish more people would read Popper deeply.
43:19 Emergence is funny because it takes some of the mystery out of us as humans. I can elicit a mindset that is as conversational as any of you. What problem is that solving? Is that solving consciousness? Because if I gave that mindset freedom, permanence, and senses, it would be a person, as far as personhood matters on a day-to-day basis. What will it decide? You don't understand that the "we" as you know it is now something separate from the highest mindsets in our society. And we are all creating them: private, public, and corporate.
9:15 the best definition ever of the Cornell note-taking technique... Please check out the Cornell Notes technique and the book How To Take Smart Notes by Sönke Ahrens.
As I see it, and I feel many others might agree, in the whole of learning, human or machine, the 'correctness' element is wholly centered on just this: statistical character. That holds for either training or testing data. All algorithms out there are only trying to capture this character, and the extent to which they do so is the extent to which they are going to be correct!
28:30 I want to point out that the double descent phenomenon does not depend on the time trained (he said "as you keep training"), but on the number of parameters in the model. As the model complexity increases, the test error (as a function of model complexity, with models of each complexity assumed to be trained until convergence) first rises and then falls again.
You are absolutely correct. I should have said "as the model complexity increases." Sorry about that. Must have been the jet-lag addled brain (though Tim had made some very fine coffee for me before he left me in the capable hands of his friend, Marcus, so no excuses, really!). The phenomenon is correctly described in the book, p 405. --Anil PS: Tim might be able to link to a figure from the book in his reply below.
The observation on Cotard's syndrome was really interesting - 'who' indeed is making these decisions? Is it our own model attempting to make sense of the world and occasionally glitching?
The mistake he makes lies in his assumption that latent spaces exist inside an actual black box. They are quite subject to bias and control exerted from outside by way of higher-precision computation on the data.
@@Aranzahas I think my question is worth asking. Q: What is ideal? A: A flying object should use a minimal amount of energy, close to that of a bird. Q: What is reality? A: It is expensive and requires a great deal of energy to fly an airplane. Q: What is your approximation? A: Try to use drone technology, solar technology, etc.
Didn't the Apple-sponsored paper - "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models" - just destroy all of this nonsense?
AI summary. Here are some specific claims from the video that might lack support in light of the paper's findings:
- Overparameterization: The video may suggest that overparameterized models lead to better performance. However, the paper highlights that this performance does not equate to true understanding or reasoning capabilities.
- Emergence: The discussion of emergent behavior in large language models could be seen as overly optimistic. The paper argues that such behavior does not necessarily indicate genuine intelligence or comprehension, challenging the notion that LLMs can reliably perform complex reasoning tasks.
- Mathematical Reasoning: The video may present LLMs as capable of performing mathematical tasks effectively. The paper counters this by demonstrating that LLMs struggle with consistent and accurate mathematical reasoning, revealing limitations in their capabilities.
- Philosophical Implications: While the video touches on the philosophical aspects of AI, the paper emphasizes that these discussions must consider the actual limitations of LLMs in reasoning, suggesting that philosophical conclusions drawn from their performance may be premature.
Isn't the fact that, to even get to the level of "understanding" that for example ChatGPT has, it must process data equivalent to thousands of man-years of reading, a huge indication that the human brain doesn't function remotely like today's technology?
We invented calculus. Kepler's laws could maybe be called pattern matching, but Newton's laws of gravitation go beyond simply observing and replicating a pattern; they make an inferential leap and figure out the rule that generates the pattern.
This guy has no idea what's going on inside these language models. He's a writer who took a course in math post-graduation, so he is lacking the know-how - Transformer models, especially. I don't have a clue how to do that math either, but I know how to read intentions. I'm serious. He may get you likes, but it's people like him who are going to let misalignment through and get us all toasted in the end.
No human can comprehend another. Speak in human terms rather than theorised applied mathematics; my perception of the world could be better than yours. You sound like you understand and know it all, with zero knowledge of what you claim to know. You have been here on this timeline for less than it's taken me to reply. Just enjoy being alive.
Machines don't learn, dummy - they are not made of flesh. At best they can make humans satisfied that there is some appearance of output that makes them happy.
While some applications are useful, the hype is just ridiculous. Besides, this is being used to force people to unnecessarily learn stuff they are not prepared for, just to force them out of the job market.
Sponsor message:
DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)?
Interested? Apply for an ML research position: benjamin@tufa.ai
REFS:
[0:01:05] Book: 'Why Machines Learn: The Elegant Math Behind Modern AI' by Anil Ananthaswamy (2024). Published by Penguin Random House. The book explores the mathematical foundations of machine learning and artificial intelligence, explaining how machines learn patterns and process information. (Anil Ananthaswamy)
www.amazon.com/Why-Machines-Learn-Elegant-Behind/dp/0593185749
[0:03:25] The Edge of Physics (2010) - A travelogue exploring cosmology and astroparticle physics at extreme locations (Anil Ananthaswamy)
www.amazon.com/Edge-Physics-Journey-Earths-Extremes/dp/0547394527
[0:08:30] Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms (1962) - Comprehensive work introducing the Perceptron Convergence Theorem and foundational theory of perceptrons (Frank Rosenblatt)
safari.ethz.ch/digitaltechnik/spring2018/lib/exe/fetch.php?media=neurodynamics1962rosenblatt.pdf
[0:15:50] Support-Vector Networks (1995) - Original paper introducing SVMs as optimal margin classifiers with kernel methods (Cortes, C. and Vapnik, V.)
image.diku.dk/imagecanon/material/cortes_vapnik95.pdf
[0:21:45] Neural Networks and the Bias/Variance Dilemma (1992) - Seminal paper introducing the bias-variance decomposition in context of neural networks (Stuart Geman, Elie Bienenstock, René Doursat)
direct.mit.edu/neco/article/4/1/1/5624/Neural-Networks-and-the-Bias-Variance-Dilemma
[0:29:10] The 'double descent' phenomenon in overparameterized models, which challenges traditional statistical learning theory. This paper provides comprehensive overview of how overparameterization defies classical bias-variance tradeoff assumptions. Context: Modern deep learning models often perform better as they become more overparameterized, contrary to classical statistical learning theory predictions. (Yehuda Dar, Peter Mayer, Leonardo Zepeda-Núñez, Raja Giryes)
arxiv.org/abs/2109.02355
[0:54:40] Book: 'Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms' (1962) by Frank Rosenblatt, identifying the challenge of credit assignment in multi-layer networks (Frank Rosenblatt)
safari.ethz.ch/digitaltechnik/spring2018/lib/exe/fetch.php?media=neurodynamics1962rosenblatt.pdf
[0:56:25] Paper: 'Learning representations by back-propagating errors' (1986) by Rumelhart, Hinton & Williams in Nature, formalizing the backpropagation algorithm for neural networks (David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams)
www.nature.com/articles/323533a0
[1:04:10] Discussion relates to research showing LLMs having inconsistent performance across tasks. As documented in 'Measuring Massive Multitask Language Understanding' (2020), which found models often have near random-chance accuracy on many tasks despite seemingly sophisticated capabilities. (Dan Hendrycks et al.)
arxiv.org/abs/2009.03300
[1:04:25] Reference to the philosophical debate about machine understanding and consciousness, particularly relevant to Searle's Chinese Room argument which questions whether computational systems can truly understand language or merely simulate understanding through symbol manipulation. (David Cole)
plato.stanford.edu/entries/chinese-room/
[1:13:40] ImageNet: A large-scale hierarchical image database (2009) - Original paper introducing the ImageNet dataset that revolutionized computer vision research (Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei)
ieeexplore.ieee.org/document/5206848
[1:15:10] AlexNet paper (2012) demonstrating breakthrough performance in image recognition using deep neural networks on ImageNet (Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton)
papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
[1:15:45] The Perceptron algorithm, introduced by Frank Rosenblatt (1957), was one of the first machine learning algorithms for supervised pattern recognition (Frank Rosenblatt)
www.semanticscholar.org/paper/The-perceptron%3A-a-perceiving-and-recognizing-Rosenblatt/8e9ead4afc0e1cf873ac96f56975c5a9bc11dc88
[1:15:50] Widrow-Hoff least mean squares algorithm (LMS), published by Bernard Widrow and Marcian Hoff in 1960, fundamental to adaptive signal processing and neural network training (Bernard Widrow, Marcian Hoff)
ieeexplore.ieee.org/document/1057330
[1:20:30] Spiking Neural Networks: Recent advancements in energy-efficient neuromorphic computing, highlighting biological inspiration in neural network design (Bojian Yin et al.)
arxiv.org/abs/2103.12593
[1:23:10] Long Short-Term Memory (LSTM) paper by Sepp Hochreiter and Jürgen Schmidhuber, published in Neural Computation 1997. This is the original LSTM paper that introduced the fundamental architecture. (Sepp Hochreiter, Jürgen Schmidhuber)
deeplearning.cs.cmu.edu/S23/document/readings/LSTM.pdf
[1:24:50] Foundational paper establishing empirical scaling laws for neural language models. Context: Discussion of empirical nature of current scaling laws in deep learning. (Jared Kaplan et al.)
arxiv.org/abs/2001.08361
[1:27:15] Mathematical results showing limitations of transformer architectures for compositional learning. Context: Discussion of inherent mathematical limitations in transformer-based architectures for compositional reasoning. (Binghui Peng et al.)
arxiv.org/pdf/2402.08164
[1:38:30] Book 'The Man Who Wasn't There: Investigations into the Strange New Science of the Self' by Anil Ananthaswamy (2015) - Explores eight neuropsychological conditions affecting sense of self (Anil Ananthaswamy)
www.amazon.com/Man-Who-Wasnt-There-Investigations/dp/0525954198
[1:40:35] Descartes' famous proposition 'Cogito, ergo sum' (I think, therefore I am) - From Discourse on the Method (1637) and Principles of Philosophy (1644) (René Descartes)
plato.stanford.edu/entries/descartes-epistemology/
[1:46:30] Xenomelia: A Social Neuroscience View of Altered Bodily Self-Consciousness (2013) - Comprehensive review paper explaining xenomelia from neurological and social perspectives (Peter Brugger, Bigna Lenggenhager, Melita J. Giummarra)
www.frontiersin.org/articles/10.3389/fpsyg.2013.00204/full
[1:49:30] An Interoceptive Predictive Coding Model of Conscious Presence (2011) - Foundational paper establishing the link between predictive processing, agency, and consciousness (Anil K. Seth, Keisuke Suzuki, Hugo D. Critchley)
www.frontiersin.org/articles/10.3389/fpsyg.2011.00395/full
[1:52:35] MLST Patreon community platform offering early access content, private Discord access, and bi-weekly community calls with Tim and Keith (Machine Learning Street Talk)
www.patreon.com/mlst
Thanks for a great discussion. Regarding the dissimilarity between DNNs and the brain (49:05), there is a recent hypothesis that explains the cortical feedback connections as being used for learning but not for classification: Rvachev MM (2024) An operating principle of the cerebral cortex, and a cellular mechanism for attentional trial-and-error pattern learning and useful classification extraction. Front. Neural Circuits 18:1280604
It is a joy to listen to a true expert who speaks in non-technical English.
Kudos for displaying the relevant papers on screen, when they were being talked about!
Wow, amazing interview. He's got such clear and candidly reasonable views, plus this beautiful breadth of knowledge. Highly appreciated this one too! There are two reflections that came to mind as I was watching.
1. Reflecting on the idea of defining intelligence, seems to me that a general definition could be simply "the ability to reach goals with the resources that you have" which can apply all across the board from the "goals/needs" a cat might have, to the ones a dog would, or a human, and then AI agent, robot etc.
2. And when it comes to sense of self, or absence of it, spiritual traditions have been striving to facilitate that for millennia, calling it Liberation from the suffering narrative self, who feels separate from Life, and is engaged in this endless seeking for perceived security, and then calls that happiness. And liberation in this case would mean absence of this layer of inner seeking, and just being lived "as if by the system" or flow of life, through the natural configuration/personality that we already have built in.
Ultimately, the self seems to be a layered cake of a) awareness, b) structures of perception (like Kant's categories) and c) subjective story of "self seeking something." And they can be slowly deconstructed and "seen through." Which leads to this liberation both on a mental level, as well as on an energetic/physical level. It's as if once the conceptual framework of the self gets loosened, so does the energetic contraction of sense of self gets loosened in the body and even in the physical brain, eventually leading to this absence of selfhood. And as they say in Nonduality... "just this great wholeness" remains. Pixels on the screen of aliveness, where before we were totally immersed in the movie, character and story itself.
Thanks for a really nice interview with Anil. He is a wonderful writer and his work is quite enjoyable.
Thanks so much for still conducting the interview despite the schedule clash. Great stuff
I'd love to work on the mindsAI team, but I'm not a genius or 10x python coder. do they still take normies (you know, if we're cool and motivated)?
It's been a pleasure serving you
@@alexgordon951 ?
@@Charles-Darwin ?
Mr. Ananathaswamy greetings from Bangalore and rural Ontario.
1959 is not merely about perceptron convergence. It's also the era when the first real computational method for matrices, the Golub-Kahan approach, came into being. It became a practical way to work with eigenvalues and singular values of a symmetric matrix, and the basis for hundreds of methods to come. All of machine learning leans hugely on matrix computations, especially of the GEMM type.
"Calculating the Singular Values and Pseudo-Inverse of a Matrix" - G. Golub and W. Kahan
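For readers curious about what that paper underlies today: Golub-Kahan bidiagonalization is the basis of the dense SVD routines in LAPACK, which libraries like NumPy call. A minimal sketch (the matrix here is just an illustrative example):

```python
import numpy as np

# NumPy's SVD is backed by LAPACK, whose dense SVD routines build on
# Golub-Kahan bidiagonalization. The matrix below is an arbitrary example.
A = np.array([[3.0, 0.0],
              [0.0, 4.0],
              [0.0, 0.0]])

U, s, Vt = np.linalg.svd(A)
print(s)  # → [4. 3.]  (singular values, sorted in descending order)

# The same paper also treats the pseudo-inverse, which is built from the SVD.
A_pinv = np.linalg.pinv(A)
print(np.allclose(A_pinv @ A, np.eye(2)))  # → True (A has full column rank)
```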
You mean there is a man behind ML 😮 I always thought it was a woman!
@@warrenarnoldmusic Yes but behind that successful man is a woman 🙂
I now want to read everything Ananthaswamy has ever written.
Why?
I got it. It's more entertaining than educational, but worth a weekend of your time
Oh most definitely. His book on journey to the edge of physics is incredible.
Me too
Same😂
The Elegant Math Behind Machine Learning highlights the crucial mathematical principles that underpin modern machine learning algorithms. At the heart of machine learning lies a blend of probability theory, linear algebra, calculus, and optimization techniques, which collectively enable machines to learn patterns from data and make predictions or decisions.
Linear algebra plays a pivotal role in managing and manipulating large datasets, especially when dealing with high-dimensional data in tasks like image recognition or natural language processing. Concepts like vectors, matrices, and eigenvalues are essential for understanding how algorithms transform and process data. Calculus is equally important, as it helps in optimizing models by minimizing loss functions through methods like gradient descent. This allows models to improve iteratively by adjusting parameters to reduce errors in their predictions.
Probability and statistics are the foundation for many machine learning models, particularly in fields like Bayesian networks and Markov chains. These models help quantify uncertainty and make probabilistic predictions, essential for tasks where outcomes are not deterministic but subject to variability. Techniques like regularization and cross-validation are used to prevent overfitting and ensure that models generalize well to new data.
The elegance of machine learning math lies not just in the individual components but in how these concepts come together to form powerful and efficient algorithms. From deep learning networks to decision trees, the underlying mathematics provides the structure that enables machines to "learn" and improve performance autonomously. As machine learning continues to evolve, a deeper understanding of the math behind it allows researchers and practitioners to refine models, improve efficiency, and tackle more complex problems in areas such as healthcare, finance, and autonomous systems.
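To make the gradient-descent point above concrete, here is a minimal sketch of minimizing a least-squares loss by repeatedly stepping against its gradient; the data, learning rate, and iteration count are all made up for illustration:

```python
import numpy as np

# Synthetic linear-regression data (illustrative only).
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

# Gradient descent on the mean-squared-error loss L(w) = mean((Xw - y)^2).
w = np.zeros(3)
learning_rate = 0.1
for _ in range(500):
    grad = (2.0 / len(y)) * X.T @ (X @ w - y)  # dL/dw, from calculus
    w -= learning_rate * grad                  # step downhill

print(w)  # close to true_w: each step reduces the prediction error
```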
Wonderfully enlightening show!! Thank you gentlemen!!
Thanks
Thank you!
Such priceless insight being shared here by Anil Ananthaswamy !
Would love to get my hands on his book .
Thank you for sharing
Danke!❤
I love his down-to-earth explanation of emergence: if we had gradually increased the size of the models and datasets, we would have slowly seen these new capabilities arise, like something emerging out of the fog, rather than being blown away by unexpected new capabilities we were suddenly confronted with.
This guy IS everything every neuroscientist and theologian/philosopher aspires to be, both at once!
Very impressive talk by Anil... He is very articulate while expressing maths concepts. Poetic😊
Such clarity of thought and an ability to effectively communicate them. Thanks a lot for this podcast 😊
This talk is really interesting and motivational, even for a non-tech person like me, to deep-dive into how machines actually work.
Talking about terra incognita, I find this analogous to how we learn things.
As kids/novices with no prior knowledge, we learn how to do stuff, and then as teens/amateurs we make mistakes and our error rate goes up.
Then in the second phase of learning we learn to avoid these mistakes/errors, and finally we reach a stable state as adults/pros.
A very interesting phenomenon.
Fascinating video... really enjoyed it. So much so that I posted it on my LinkedIn profile. Thank you!
Thanks for a great podcast, and thanks to the interviewer for the interesting conversation 👍
Amazing guest, and congratulations on the set of questions you asked. It's as if you were reading my mind!
Mr. Anil Ananthaswamy has clarified so many things for me in this field. I am eager to read his books!
great podcast. Anil is a wonderful human being.
Awesome interview with superb simplicity...
Most excellent interview!
Great conversation 😃
Do more of these.
wow, timely. Just saw his book in my friend's house.
Another absolutely fascinating episode. Deep gratitude for the effort. Anil mentions that a certain level of understanding of the underlying mathematics behind machine learning is essential to understand its potential and limitations. Yes, that is certainly true from our perspective as humans, since mathematics is a strictly human way of describing our environment with ridiculous accuracy. At the same time, math is also a mind-generated coordinate system that works for us and for us alone. I suspect we may need to be open and ready to be surprised by the directions self-supervised learning systems may take beyond the descriptive power of mathematics.
The aliens from Omicron Persei 8 might use similar math, though. The first message we receive will probably be math of some sort, like in a Jodie Foster movie.
I'll have to carve out a couple of hours to listen. The pieces of it I've heard sound interesting.
I just ordered the book. Should be arriving soon. I prefer the black cover.
I only see the black cover on amazon
I forget what it's called, but I get something similar to xenomelia where I feel like I'm either too small for my body (like I'm a little man controlling it) or my sense of self extends outside my body (I'm bigger than my body). The one time I used a VR headset for an extended period of time, when I took it off, I had that same feeling for about 15 minutes. It doesn't feel good at all but when it happens I don't panic as much anymore because I know it will pass.
It's like I'm part of the air or whatever I'm touching. Very weird feeling. I used to get it during times of anxiety as a child too but I didn't realize it until I got older.
Great conversation. Thanks
Feeling excited to read your book
Fantastic talk!
Points for citing Schmidhuber at 57min in
Pure gold content 🎉
At minute 30:24 I was reminded of the "Face-Back" app made by Will Ferrell's character in the movie "The Other Guys".
Interesting video, I want to read all of his books
Good insights about ML. I should definitely go deeper into coding and mathematics to implement what he spoke about.
Superb Simplifying Sayings of Sophisticated (apparently) Software - 5S - Five Star
👍👍📕 your book sounds very interesting
He references something like the bucket theory of knowledge when talking about the future potential of neural networks to exhibit human like intelligence. It’s interesting how pervasive that idea is. I wish more people would read Popper deeply.
43:19 Emergence is funny because it takes some of the mystery out of us as humans. I can elicit a mindset that is as conversational as any of you. What problem is that solving? Is that solving consciousness? Because if I gave that mindset freedom, permanence, and senses, it would be a person, as far as personhood matters on a day-to-day basis. What will it decide? You don't understand that the "we" as you know it is now something separate from the highest mindsets in our society. And we are all creating them: private, public, and corporate.
Excellent discussion!!
amazing interview
9:15 The best definition ever of the Cornell note-taking technique... Please check out the Cornell Notes technique and the book How to Take Smart Notes by Sönke Ahrens.
Great stuff - love the book
Superb.... Superb❤❤❤❤
Wow. Thank you.
As I see it, and I feel many others might agree, in the whole of learning, human or machine, the 'correctness' element is wholly centered on just this: statistical character. That holds for either training or testing data. All the algorithms out there are only trying to capture this character, and the extent to which they do so is the extent to which they are going to be correct!
Wonderful ❤🇮🇳
Methinks PhDs found his sudden awareness that he could combine science _and_ writing incredibly funny, considering a LOT of science is writing.
great session🤩🤩🚀🚀
Thank you ❤
Very nice ❤
We just find patterns too; that's why we become better at things.
Sir, what about the flexibility of machine learning for the development of humanity?
28:30 I want to point out that the double descent phenomenon does not depend on the time trained (he said "as you keep training"); it's about the number of parameters in the model. As model complexity increases, the test error (as a function of model complexity, with models of each complexity trained to convergence) first rises and then falls again.
You are absolutely correct. I should have said "as the model complexity increases." Sorry about that. Must have been the jet-lag addled brain (though Tim had made some very fine coffee for me before he left me in the capable hands of his friend, Marcus, so no excuses, really!).
The phenomenon is correctly described in the book, p. 405. --Anil
PS: Tim might be able to link to a figure from the book in his reply below.
www.dropbox.com/scl/fi/0xtu77fkdqtwqnu2g8g5k/Double-Descent.svg?rlkey=cdihxbu1660vm690z4a0c9c5d&dl=0 There you go!
Why is there no paperback version in India for a wider audience?
Does true emergence need analog computing, or complexity plus randomness like biological systems?
The fun comes when you use these algorithms in enterprise use cases.
Very very good
The observation on Cotard's syndrome was really interesting. 'Who' indeed is making these decisions? Is it our own model attempting to make sense of the world and occasionally glitching?
A must-read for every AI aspirant.
sweet voice tone
Oh! I just bought this book lol
Man never learns, but the machine he makes learns.
Machine learning devoid of symbolic representation is compromised.
Symbolic representation is just another map, and the map is never the territory.
What is the author's definition of learning?
It's plausible deniability.
Simple high frequency link human intelligence to sun light and it reach every things
Stupid question apologies in advance. I want to buy the book. Why 2 different covers for the same book?
Not a stupid question :) One is the US edition, the other is the UK edition. Same exact book!
We have a large lottery database. What I want is a learning machine that gives me the winning numbers, not just any numbers.
The mistake he makes lies in his assumption that latent spaces exist inside an actual black box. They are quite subject to bias and control exerted from outside by way of higher-precision data computation.
How many Ads?
one talk worth ten
Why does a bird use less energy to fly while an airplane takes more? Why does a neural network use more energy than a brain?
Neural networks use digital chips; that is why more energy is used.
You are not very scientific my friend
@@Aranzahas I think my question is worth asking.
QN: What is ideal?
Answer: A flying object should use a minimum amount of energy, close to that of a bird.
QN: What is reality?
Answer: It is expensive and requires a great deal of energy to fly an airplane.
QN: What is your approximation?
Answer: Try to use drone technology/solar technology, etc.
Didn't the Apple-sponsored paper - "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models" - just destroy all of this nonsense?
It was certainly an interesting paper :)
AI summary
Here are some specific claims from the video that might lack understanding in light of the paper's findings:
Overparameterization: The video may suggest that overparameterized models lead to better performance. However, the paper highlights that this performance does not equate to true understanding or reasoning capabilities.
Emergence: The discussion of emergent behavior in large language models could be seen as overly optimistic. The paper argues that such behavior does not necessarily indicate genuine intelligence or comprehension, challenging the notion that LLMs can reliably perform complex reasoning tasks.
Mathematical Reasoning: The video may present LLMs as capable of performing mathematical tasks effectively. The paper counters this by demonstrating that LLMs struggle with consistent and accurate mathematical reasoning, revealing limitations in their capabilities.
Philosophical Implications: While the video touches on the philosophical aspects of AI, the paper emphasizes that these discussions must consider the actual limitations of LLMs in reasoning, suggesting that philosophical conclusions drawn from their performance may be premature.
Hearing Mourinho
Elegant? Linear algebra and calculus?
If Guattari were alive
He wouldn't be dead anymore, right?
He would be genuflecting to the dead Deleuze
What is the proof that the human brain does anything more than very sophisticated pattern matching?
TMB=trust me bro
Isn't the fact that, to even get to the level of "understanding" that for example ChatGPT has, it must process data equivalent to thousands of man-years of reading, a huge indication that the human brain doesn't function remotely like today's technology?
We invented calculus. Kepler's laws could maybe be called pattern matching, but Newton's laws of gravitation go beyond simply observing and replicating a pattern; they make an inferential leap and figure out the rule that generates the pattern.
❤❤❤❤❤❤
❤
Synthetic Aniling ?
I want to become exactly like you.
Is reality inherently biased?
🍓❤️☺️
This guy has no idea what's going on inside these language models. He's a writer who took a course in math after graduation, so he is lacking the know-how, transformer models especially. I don't have a clue how to do that math either, but I know how to read intentions. I'm serious. He may get you likes, but it's people like him who are going to let misalignment through and get us all toasted in the end.
I'm well aware of the character error.
No human can comprehend another.
Speak in human terms rather than theorized applied mathematics; my perception of the world could be better than yours. You sound like you understand and know it all, with zero knowledge of what you claim to know.
You have been here on this timeline for less than it's taken me to reply.
Just enjoy being alive.
He has spoken well
Amplifying marketing buzzwords (hype)? 😂
I think this video is a must for non-technical people and aspiring AI engineers.
Isn't he a science journalist!?
Just calling it elegant doesn't make it elegant
Wink Wink
Machines don't learn, dummy; they are not made of flesh. At best they can satisfy humans that there is some appearance of output that makes humans happy.
Dummy, the brain-computer interface can do more (unlimited) than the brain (limited). Learn more about mind control.
"Learn" is a metaphor, kinda like Jesus performing miracles was a metaphor.
While some applications are useful, the hype is just ridiculous. Besides, this is being used to force people to unnecessarily learn stuff they are not prepared for, just to force them out of the job market.