@@SimonPrince-lr9dk Your book is a good summary of the fundamentals covered in MSc and PhD ML classes in CS. Thank you for the time and effort you put into it to clarify many questions I had during my university years, for simplifying so many ML concepts, and for making the book free to the public.
54:26 If I understood correctly, what the host is trying to say here is this: for a fixed number of parameters, deep neural networks have more linear regions than shallow neural networks. HOWEVER, the function modelled by the DNN has lots of inner symmetries; it is not a "free function". So why does it work so well? That is to say: when you add more layers to a neural network you can fit more complex functions with fewer parameters, but the shape of these functions is weird and, to some extent, "fractal". It is hard to understand why this kind of highly constrained function fits real-world relations so well. UDL explores this topic, although there is no answer yet.
Brilliant interview. Refreshing to watch professional content and not the garbage AI channels that the algorithm suggests first. This whole AI hype is built around OpenAI's marketing to finance its cash-burning company, and the recommendation algorithm fuels it. I cannot agree more on the distinction between science and engineering. When I obtained my PhD in mathematical logic I was excited about AI, because I thought it could unravel the mystery of intelligence. But we are not learning anything about it, and it is so depressing. We so badly need a profound theory of neural networks, instead of watching these engineers trying random stuff until something works.
2:01:55 I also observe a certain (well-intentioned) paternalism in Simon's opinion. If there are people who can have rewarding occupations and build self-taught personalities, it is precisely because there is a large part of the population doing alienating jobs in factories, supermarkets and offices. No one wants to spend the best years of their life loading boxes in a forklift or washing dishes in a restaurant, it is circumstances that push those people in that direction. I hope the day comes when human beings are freed from those chains. On the other hand, I agree that the transition to automation can be hard, but I think we made a mistake in delegating this transition to governments. States are by nature perverse, what we need is a committed and strong civil society, with the capacity to save, that is willing to temporarily welcome those who are left out. In this sense, I believe that we can learn much more from indigenous societies, and from how anthropologists tell us that they build their social support networks, than from the social engineers who are to come.
This is a fair criticism of my viewpoint. But it's also paternalistic to say that human beings need to be "freed from these chains". Personally, I actually would rather be loading boxes / washing dishes than be unemployed.
Absolutely, amigo. We are witnessing a carefully staged explanation of reality, down to the precise functioning of our minds. If you watch Andrej Karpathy's excellent line-by-line code build of an LLM you will literally start to _feel_ the method of your own intellection. It's quite trippy.
What did you like about this video? What can we improve?!
Firstly, I wanted to say--I can't believe I just realized I wasn't subscribed to your channel; this mistake has been rectified!
Secondly, I could easily write an essay (or more accurately, a love letter) about this channel: there are very few insightful AI channels on YouTube, a few mediocre ones, and then the rest. This channel, without a doubt, is in a league of its own. As an engineer, when I see a new MLST video posted, it's like sitting down to a mouthwatering gourmet meal after being force-fed nothing but junk food for weeks on end.
Finally, with that said, allow me to attempt a justification of my admiration:
1) Guests: You always have amazing guests! Your interview style never fails to engender engaging, thoughtful, and most of all - fun conversations! In this video in particular, Simon seems like he’s having a blast speaking about something he’s passionate about, and that enthusiasm genuinely put a smile on my face.
2) Editing: The videos are always well put together and the production value is always phenomenal! I mean, wow... Machine Learning Street Talk makes the other AI channels on YouTube look like amateurs.
3) Knowledge: Most other channels seem content to merely discuss the latest ML hype as it happens in real time; this is fine, and most aren't objectively wrong, however it's mostly surface-level discussion and smacks of novice insight. They are, for lack of a better description, an animated news feed. With the exception of Yannic's, MLST is the only mainstream channel I'm aware of with a solid academic pedigree, and it's palpable. I've been completely starving for this kind of in-depth, rigorous discussion. I can only speak for myself, but I imagine there are many who come from a STEM/technical background who feel the same way, so thank you on our behalf.
Keep up the great work!
@@jd.8019 Thank you sir!!
Snag the first Ilya interview since the OpenAI debacle :D
I love the overall aesthetic and the production value. I think many technical channels don’t realize that even for highly technical subject matter, these things are incredibly important for building a large audience that keeps coming back. I also like the way on some episodes you seamlessly edit and link together clips from other interviews; my only criticism is that sometimes when you do that the narrative thread that ties them together is not always made sufficiently clear. If you could include some quick, unobtrusive means of showing the viewer why those clips are being edited in and what the thread is, I think that would significantly up the utility and educational value people get out of it. Kind of like a mind map, but within your videos.
The word "hard" in the thumbnail…
I'm the author of the MNIST-1D dataset (discussed at 1h15). Thanks for the positive words! You do an excellent job of explaining what the dataset is and why it's useful.
Running exercises in Colab while working through the textbook is an amazing feature.
Nice, I love the dataset for testing and learning purposes, so thank you so much for creating and releasing it 🙏
How do you learn ML from scratch and create stuff like you do?
Thank you for that. I always start my training runs with it.
@@rubyciide5542 by reading this book?
Can I train models on my Nintendo, sir?
Currently going through it and it's one of the best textbooks I've read. Period. Not just DL books, books.
I love it.
Sorry you’re going through it man. Hope things get better for you
The best channel I've seen for AI. Cutting edge, no amateur overhyped BS. Down to earth.
Simon taught the first semester of my second-year "Machine Learning" module at university! Really nice man; we used this book as the module notes. He was sorely missed when he left in the second semester, and the rest of the module never lived up to his teaching.
what university?
University of Bath?
He seems like a super nice dude who's really passionate about his subject
I've read most of this book already and it's fantastic. It feels like a spiritual sequel to Goodfellow's original DL book.
Can I read this book, even without reading Goodfellow's book?
I am currently reading Jeremy's DL with fastai and PyTorch book.
@@sauravsingh9177 Yes, although you're expected to have basic familiarity with stats, linear algebra, calculus, etc.
Hello... I'm a med student and not so proficient in math. Could you please list the comprehensive math prerequisites needed to understand DL? I'LL BE RIGHT ON IT!🙏😊 @@amesoeurs
@@amesoeurs I'm in my first year of college; how long will it take to read and understand this book if I already know the basic math for ML?
@@Cognitivecode7 You need a good understanding of probability and vector calculus first; take those courses and get a very solid grounding. Then look at the Mathematics for Machine Learning book (Deisenroth). After that you're good to go.
Brilliant, clear, direct conversation. Thank you!!
The overparametrization conundrum may be related to the fact that we look at what NNs are in the wrong way. To me a NN is not a "processor" type of object; it's a novel type of memory object, a memory which stores and retrieves data by overlaying them on top of one another while also recording the hidden relations that exist in the dataset. This is what gets stored in the "in between" places even if the input resolution is low: the logic of the coexistence of different images (arrays), which is something not visible on the surface.
I'm a philologist by training, and in 20th-century literature there was a big buzz around the concept of the "palimpsest". Originally palimpsests were texts written on reused parchment from which previous texts had been scraped off with a razor. Despite the scraping, the old text still remained under the new one, which led to having two texts in the same space on the page. In literature this became a conceptual fashion of merging two different narratives into one, usually to very surreal effect. One of the authors that comes to mind is William S. Burroughs.
In the same way that merged narratives evoke a novel situation through novel logical interactions between the inputs, the empty space in an overparametrized NN gets filled with the logic of the world from which the input data comes, and this logic exists between the data points even when the resolution is low.
Maybe a NN is a Platonic space. Many images of trees somehow hold in them the "logic of the tree", which is something deeper and not obvious to the eye, since in their form alone converge the principles of molecular biology, atmospheric fluid dynamics, and ecosystemic interactions, up to the astronomical effects of the sun, the moon, the earth's rotation, etc.
All of it contributes to this form in one way or another, so the form reflects those contributions and therefore holds partial logic of those interactions within it.
Information is a relation between an object and its context (in linguistics we say: its dictionary). A dataset not only introduces objects; as a whole it also becomes a context (dictionary) through which each object is read.
In that sense, maybe upscaling input datasets prior to learning is detrimental to the "truth" of those relations. I would be inclined to assume we'd be better off letting the NN fill in those spaces based on the logic of the dataset, unless we specifically want the logic of the transformations to influence the output data (say, we are designing an upscaling engine).
Platonic space? Wow. Never have I seen such a dogged dedication to being ignorant of very simple math. It's not even tensor math; it's simple lin alg.
It’s interesting that in Integrated Information Theory consciousness literally is a super-high dimensional polytope (with every dimension corresponding to a whole system state in an integrated network) in an abstract space called Qualia space.
What a gripping conversation. Thank you!
The way I understand deep learning is that it is statistical power modulated by randomness to emulate reasoned speech; really it is the top 3, 5, or 7 most probable continuations, sampled randomly at every word. So in theory, whatever the AI says, it should not be able to say it twice unless you tweak its parameters; and with the randomness (temperature) taken away, it will always repeat the same thing. It's good at emulating speech that gives the semblance of intelligent articulation, but it is really the syntax and vocabulary (data) arranged in a statistically congruent manner that creates that illusion.
It's like a super sales guy: he talks very well, but there is no substance behind his apparent passion.
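For what it's worth, the mechanism being described is roughly top-k sampling with a temperature knob. A minimal toy sketch (all logits and numbers invented for illustration):

```python
# Toy sketch of top-k sampling with a temperature knob (all numbers invented).
import numpy as np

rng = np.random.default_rng(0)

def sample_next(logits, k=5, temperature=1.0):
    """Pick one of the k highest-scoring 'words'; temperature=0 is greedy."""
    logits = np.asarray(logits, dtype=float)
    top = np.argsort(logits)[-k:]        # indices of the k best candidates
    if temperature == 0:
        return int(top[-1])              # deterministic: always the argmax
    p = np.exp(logits[top] / temperature)
    return int(rng.choice(top, p=p / p.sum()))

logits = [2.0, 1.5, 0.3, -1.0, 0.9, 1.1]
print([sample_next(logits, k=3) for _ in range(8)])                 # varies run to run
print([sample_next(logits, k=3, temperature=0) for _ in range(8)])  # always repeats
```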
It's still an open question though how different this is from what humans do. What if human brains operate on a similar principle, of learning patterns, activating a subset of them based on contextual input, and then selecting from them via some noisy sampling?
That's what really bakes people's noodles.
The biggest difference, in my mind, is that most ML objective functions are explicitly and statically evaluated and human ones are implicit in the effect of dynamical chemical processes on learning rates or whatever hyperparameters govern organic learning. Reinforcement learning approaches hint at what more human-like ML systems could look like.
A lot of what I say has no substance or passion.
Turns out learning to predict the next token requires understanding a lot of things
Yes, these models need world models: not only statistical ones, but reasoning with self-learning that improves relevance.
For the moment it's just a well-educated salesman, but in a very, very narrow way.
@@franzwollang Yes, and your point is compelling. If anything, the very fact that humans learn while they infer or confer already makes them a different beast.
We can cogitate and plan even while not talking; we "defrag" our knowledge. And if we confer with someone else, we have even faster learning curves. While I am writing this, I am thinking about expressing my knowledge of AI, and it is a composite knowledge, made of essential theoretical knowledge, psycho-plasticity from using AI, and inferences matured from black-box prompting various AIs over the last few years.
So yes, compared to us its learning is uni-dimensional and purely linguistic, while we have a convergence of learning mechanisms working together, all the time.
Currently there is so much to take in with AI that I have a dozen half-baked projects open. Ideally, your very inference should automatically fine-tune your bot. OpenPipe is trying to do such a thing, but ideally it should be ported to the Unsloth engine, since OpenPipe uses OpenAI and it's going to cost you a boatload of money to run anywhere near well, between inference, dataset generation, and loads of fine-tuning sessions.
Love the studio, would love to see more face-to-face podcasts here
I really appreciate being used as an example near the end of the discussion. Version 2.0 is coming along slowly, but I am confident I'll get there.
Cool thoughts on digging into the nitty-gritty of deep learning frameworks. The connection between language models and our brains, especially in Transformers, really makes you think. Checking out how things stay consistent inside and finding ways to boost brainpower raises some interesting questions. Looking forward to diving deeper into these fancy concepts!
great conversation, appreciate the skepticism
The most amazing thing about this video to me is that Simon's hair matches that microphone perfectly, nice work lads...
😂 It's called a dead-cat wind filter in the video trade. Good one!
@@Daniel-Six can we not call it that?
I like the fact that he clearly rejected the popular idea that neural nets are modelled after the human brain.
I just got into deep learning on my own. Every time I watch this channel it's like thinking you know something and then learning that what you think you know is only a grain of sand on a beach, with each grain having its own universe, and infinite beaches... subscribed.
Best source and community for ML on the internet by far. Love the work you guys do, MLST.
Great episode. Tim, you should try to get Chris Bishop on the show too; he finally released the companion book to PRML this month.
He's coming on 🤘
Here you have a guy on the hook. I love how you throw these common buzzwords, like emergent agency and set phenomena, at him and let him sort them out, which he does in a way that gives me the feeling of actually understanding. Really nice stuff.
The sombre Chopin tones in the background emphasize how deep the learning truly is but leave me with little hope of ever fully understanding it... :D
Just know there are many worker bees buzzing about, at best regurgitating someone else who can barely regurgitate themselves, and who could never have regurgitated THAT well before this explosion of mass regurgitation on the topic. I want to say they're hiding, but really it's inflated confidence. Unfortunately and ironically, we have few tools to properly scrutinize people's ability to zero in on semantically sophisticated buzzwords and phrases. They may well be trying to convey 'something', and have some picture, cartoon or otherwise, in their head that they think accurately corresponds to what they're conveying... but nobody ever figures out what they're talking about, and later down the road even they realize they didn't understand what the f*** they were talking about. Frankly.
I think if you take just these two men, both very smart, but I'm sorry, I've been watching this host for a while, and the thing that differentiates them is that the host doesn't actually know what he's saying after he says it. He doesn't know before, either; by the way, he never knows, he's just shooting from the hip. The guest tries very hard, and quite a bit of his struggle is trying to mitigate the fact that the host doesn't understand what he's conveying, and to bridge the gap into something coherent once again.
1:13:45 Three-dimensional orange (volume): for a regular three-dimensional orange, which we can approximate as a sphere, the volume is V = (4/3) * pi * r^3.
Four-dimensional orange (hypervolume): in four dimensions, the object analogous to a sphere is called a "hypersphere". The hypervolume of a 4D hypersphere is V = (1/2) * pi^2 * r^4.
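A quick toy check of both formulas, using the general d-ball volume V_d(r) = pi^(d/2) / Gamma(d/2 + 1) * r^d; the 1% "peel" computation at the end is just to illustrate why high-dimensional oranges get strange:

```python
# Toy check of the 3-D and 4-D formulas via the general d-ball volume.
import math

def ball_volume(d: int, r: float = 1.0) -> float:
    """V_d(r) = pi^(d/2) / Gamma(d/2 + 1) * r^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d

print(ball_volume(3))   # (4/3) * pi    ~ 4.18879
print(ball_volume(4))   # (1/2) * pi^2  ~ 4.93480

# In high dimensions almost all of the orange's volume sits in the peel:
for d in (3, 4, 100):
    peel = 1 - ball_volume(d, 0.99) / ball_volume(d)
    print(f"d={d}: fraction of volume in the outer 1% shell = {peel:.3f}")
```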
what a nice orange
I would like to see an interdimensional orange.
My favorite analogy for the divergence of ML and brain science is the case of airplanes. Flying was for a long time reserved for certain products of evolution, and at the beginning engineers imagined themselves copying the forms and behaviors of birds in order to understand them.
But in the end it was engineering and an abstract understanding of the physics behind flying that gave us technology applying those principles, usually in ways very strange to nature itself. Only in recent years, thanks to modeling tools that came out of engineering, have people begun to model bird-like or insect-like machines and to understand how nature does it.
I like to believe ML will develop what I call "scientific principles of thinking", an overview of complex information-processing techniques that some decades down the line will inform how we should look at our own brains to understand them.
But just as with artificial bird wings, artificial human-like brains will be nothing more than the work of hobbyists and proof-of-concept academic research, since the engineered ML solutions will be that much more powerful and tailored to specific tasks.
Beyond neuro-mimicry, the very idea of "general intelligence" (a grand narcissistic dream) is also a rather useless engineering solution. What engineering problem does it solve anyway? Pretending to be human? That would mean the current world somehow does not want humans to exist, which maybe is the real sinister problem we must address before we "provide solutions" to it.
Sounds like there are as many questions as answers at this point - looks like a great book with plenty of instructive graphics - look forward to reading it....cheers & happy Gnu year!
Tim, this is incredibly insightful. Thank you
The author was my lecturer last year in my first semester! The dude is brilliant!
I have been doing deep learning research and development for only 9 years, but if you really wish to understand those fundamental aspects of deep learning that you believe nobody understands (how and why deep learning works), I would love to help you obtain that understanding. I can explain it from an analytical, geometric, or other perspective, whichever you find most useful.
@@Eet_Mia Obviously not. A minimum knowledge of advanced math is required to deeply understand deep learning.
@@jonathanjonathansen In any case it would be algebra, calculus, and statistics, genius. Precisely what you just said absolutely invalidates your answer, because you don't have the slightest idea. Literally everything is algebra. If you don't understand that, you don't have the slightest idea about mathematics.😂😂
Sociological??
Simon's contribution adds to MLST's ambitious book club.
The video is a textbook example of the best traditions of British science, i.e. being objective, no-nonsense, and intellectually honest. At one point the tech was summarized as modelling probability distributions in a multidimensional space plus universal function approximation, and hence having nothing to do with "thinking", with which I fully agree (as a professional software engineer). What was shocking to see towards the end of the video, however, was that the professor (despite the spot-on tech summary I just quoted) then went into completely unfounded sci-fi statements about doctors, lawyers, and engineers (and even greeting-card designers :-)) losing their jobs to the tune of 800 million (or was it 80 million, whatever). I can't comprehend how the professor managed to reconcile these two things in his head: on one hand, the non-AGI nature of the current tech, AGI still a pipe dream, no real thinking or intelligence, just deriving and modelling probabilities from the data to fit patterns in it; on the other, "knowledge workers" being replaced by the ton.
Good point, and thank you for your kind words. I guess I reconcile these things because I think the technology we have already (even if there were no more significant development) might be enough to cause massive job losses. It could make many individuals much more productive, and that would mean most companies would need fewer people. For example, I used Grammarly to proof my book. That was 2 months of proofreading work for someone, just gone... Happy to be proved wrong about this, though!
It's not science fiction that DALL-E can take a sketch, generate a rendered image in a specific style, and then offer up 10 variations which in turn can be sold (as greeting cards in this example). If using DALL-E is more cost-effective than employing artists to do the same thing, fewer artists will be able to make money creating greeting cards. This has nothing to do with AGI and everything to do with economics.
The same thing can be said about lawyers who, typically, sift through data, apply their knowledge about laws and precedents and generate missives that are used to argue in favour of their clients. If any step in that process can be replaced by, say, GPT4 and make it more cost effective, fewer lawyers will be able to make money offering that service. No AGI, just economics.
So, I see no problem with his stance on AGI, or lack thereof, and him accepting a prediction of workers losing their employment when non-AGI technology is being used in more and more sectors.
Thanks Tim, great video I learned a lot.
Your question whether ChatGPT does anything is a question about agency or free will. But the real question is whether the human brain has any free will or agency.
Yes but it depends
I see it as alchemy, in the sense of it being on the line where science meets magic or the unknown... interesting times.
Thank you for another excellent conversation! I really loved the discussion of the practical, grounded ethical concerns - I hope y'all consider having more ethicists on the show!
Have you considered the ethical implications of them doing that, @Pianoblook??
@@MichaelBeale yes, hence why I recommended it
The suspense is killing me, what's the punchline?
jokes aside, what an amazing chapter this was! I am getting this book
The punchline is that humans are complicated catflaps that confuse their simulations of themselves with themselves.
Really appreciate this conversation.
What an opportunity to listen to this episode when I just started reading the book recently :)
Great job. One small thing I disagree with is your assertion that "we don't know why deep learning works" (and the many related claims where you repeat that assertion). As someone who has worked in AI for over 3 decades, and deep learning for more than a dozen years, I can say that there are many who understand all of those things stochastically, canonically, theoretically, and practically.
Go on then... I'll bite. I've never heard a convincing explanation of why networks have to have >10 layers to get good performance, and no-one that I have ever asked has claimed to know. Can you explain this to me?
"reflecting it like a crazy house of mirrors into the rest of the space"...Darren Aronofsky pioneered some of the thoughts in this direction 26 years ago. Time for me to watch it again :)
Please interview David Deutsch. He will explain why the entire paradigm of an intelligence continuum is wrong. He will also explain why humans (and GIs) can understand anything given enough computing resources. Every AI expert is wrong because they do not understand epistemology, but David does.
I am more and more convinced that the inscrutably large matrices ubiquitous in machine learning are a black-box interface to a mechanism situated elsewhere, unseen in our computational domain, which potentially employs an entirely different kind of logic to generate the suspicious efficiencies we observe now.
As Dr. Waku put it on his channel: we are actually making API calls to a more elaborate machine.
Tf are you talking about Daniel
Thanks!
Alright I'm sold. Ordering the book!
2:07 So glad you mentioned Schmidhuber. 😅
So glad I found this channel, I have some catching up to do!!
One of the best videos I've seen on the topic. I hold most of the Professor's views on the matter, so it's refreshing to see that not everyone in the ML community drank the AI Kool-Aid. Though I am a lot more pessimistic, in that I think we can't slow down or put the genie back in the bottle, and the effort is wasted. So enjoy the cool new tech while you can enjoy anything...
Cool it with the sound effects
Podcast version has no music or sound effects 😄
The camera and editing is very annoying. Nearly unwatchable!
> me, who knows nearly nothing about the theory of all this stuff but has implemented an image classification network, hearing him talk about 'trying to push softmax functions to infinity'
I get that reference.
This stuff has been used in consumer products since the Haar cascade classifier became a thing in the early 2000s; now we have more processing power and software that makes it easier to work with this technology.
practical deep learning:
1. pull the model
2. tune, tune
3. apply
There is nothing interesting here, only trivial and ultimately hard tasks; maybe 1% of companies build their own models (rough sketch of the workflow below).
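For what it's worth, a rough sketch of that pull/tune/apply workflow, assuming a recent torchvision and a hypothetical 10-class task:

```python
# A rough sketch of "pull, tune, apply" (assumes a recent torchvision;
# the 10-class task and all numbers are hypothetical).
import torch
import torchvision

# 1. pull the model
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")

# 2. tune: swap the head for your task and fine-tune (training loop omitted)
model.fc = torch.nn.Linear(model.fc.in_features, 10)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
# ... iterate over your dataloader, compute cross-entropy, step the optimizer ...

# 3. apply
model.eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # stand-in for a real image
    print(logits.argmax(dim=1))
```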
Man is extremely creative at content creation. Otherwise there would not be so many academicians and endless academies.
So it sounds like the old trope about QM (quantum mechanics) can be applied here: "If you think you understand deep learning, you don't understand deep learning."
How have I only just stumbled on this channel. Good to be home.
Hannah Arendt covers the capitalist and ancient conceptions of Work and Labour, which are relevant to thinking about the future.
I love the idea that the moment LLMs start generalizing, no one understands why.
(Even though Hinton or Sutskever might claim that they do.)
1:13:26 Figure 18.3a explains [to me] why we call it diffusion. I guess the hypothesis goes: as long as you take a small enough step size, you'll stay within the conditional distribution q(z_t | x*) when iterating the diffusion kernel, i.e. within the dashed cyan border representing the time-evolving distribution's bounds. Anyone else think this diffusion process looks kind of like the stock market, where piecewise-linear dumdums jump out with their limit orders to steer q(z_t | x*) at every iteration? lol
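For anyone curious, a toy 1-D sketch of that intuition, assuming a standard DDPM-style kernel (not necessarily the book's exact notation):

```python
# Toy 1-D forward diffusion, assuming a DDPM-style kernel (illustrative only):
#   z_t = sqrt(1 - beta) * z_{t-1} + sqrt(beta) * eps,  eps ~ N(0, 1)
import numpy as np

rng = np.random.default_rng(0)
beta = 0.02      # small step size: each z_t stays near the evolving distribution
x_star = 2.0     # a single "data point" x*
T = 200

z = x_star
for t in range(T):
    z = np.sqrt(1 - beta) * z + np.sqrt(beta) * rng.standard_normal()

# After many steps the marginal q(z_T | x*) approaches N(0, 1), which is why
# the dashed bounds in the figure drift toward a standard normal over time.
print(f"z_0 = {x_star}, z_T = {z:.3f}")
```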
Yeah... it seems reasonable to surmise that a kind of fractal-like connection pervades these phenomena.
I got the same vibe when I watched Veritasium's discussion of the option-pricing equation, for some reason.
Fantastic interview. One gripe: towards the end, when Tim was asked about flipping a switch to have AI clones created of himself, the guest tried to push as much as possible while being kind, but it's disappointing that Tim would not engage. The point of hypotheticals is to provide less ambiguous scenarios for understanding your values. As long as they're not something offensive, you absolutely must engage with them, or else there is no dialogue and no point to the discussion. You should also try your best to steelman them and address the main point of the question instead of nitpicking some contingent aspect of the hypothetical. So instead of bringing up the identity problem with cloning yourself and leaving it there, you can modify the hypothetical such that the AI isn't a clone of Tim, but rather only a clone of Tim's abilities, i.e. it can do anything Tim does, but not necessarily in the same fashion. That way, you remove a non-essential concern from the hypothetical and engage with its obvious intended point.
NOTE TO SELF : (1) Don't believe own propaganda
"Learning" isn't "understanding" which isn't "analysing" which isn't "solving" ... let alone "creating" ... or even "associating" ...
I am from Pakistan sir and a big fan of yours. Thank you for making this available for us in the third world
I have the feeling that this book is going to be a must in universities.
the most human discussion on AI ever
I believe that it makes perfect sense that gradient descent causes a complex function to find a solution that works.
He’s asking you if you would flip the switch. His question is perfect, and the fact that you avoid answering it should make you think, no?
The PDF of the whole book is on his website.
Awesome!
Favorite channel covering the subject by far
Awesome presentations, thank you for great content!
SEO learning AI and ML here. Thoroughly enjoyed the video -- especially the bits on ethics -- and appreciate the channel. I just caught this discussion but will share the vid and continue exploring the channel. I think it's critical, whether or not people use the technology, to understand AI's implications at large and on a deeper level. Cheers!
Hey, great book recommendation! Any with a similar approach and style for machine learning? Thanks!
Great stuff, thanks
Loved the discussion! Thank you
Perhaps “Open Mind” will be the ultimate AGI. 😂.
Awesome discussion btw.
We should support these small creators; they stay true to their target audience, however small it might be...
So much jargon; it took me like 2 months to completely understand and appreciate the video.
Amazing conversation!! Thanks for sharing!! Subscribed, we need more people like you on the internet
Fantastic - Thank you so much for this content. Loved it.
the first 10 minutes was gold. I love the reductionist way he explained how it works. I may need to grab a book
Beautifully produced! Love these types of videos from you. The amount of work that goes into creating one of these videos is mind boggling. Serious kudos. What a service you’re doing for the current and future generations of technologists.
One reassuring parameter is how they are training GPT: they have fed all of the books into the fire and are now onto forums and social media. Hopefully we will soon have selectively framed videos of Amazon robots doing TikTok dances.
Happy New Year!
Thank you Professor Prince. Your book is invaluable.
One of the best discussions I’ve listened to in a while
Extremely important information about the MNIST-1D dataset. I always start with it. Unfortunately, it is also the most underrated. People don't understand the power of simplicity.
Very informative conversation. No nonsense.
And it's a good thing there's no extant models for representing a complex system yet because it is a Reductionist application ignorant of the dynamics of systems with higher order complexity.
We need to first ask neural networks to reengineer themselves to produce a "complex systems native" model (this is the underlying reason for the seemingly idiosyncratic paradoxical observation: quantitatively, the difference between a Reductionist outcome and one after prediction using a second generation neural network-re-conceptualization of a neural network in
I don’t think you understand most of the words that you’re using. Your comment really didn’t make sense. For example, what’s a “second generation neural network”? I’m sorry to tell you that that’s not a thing. You did use a TON of big words though, so congrats 😂
@@therainman7777 Again, what are you trying to accomplish by pointing out that you think I don't know what I'm talking about, just because you can't understand what I wrote?
Why don't you first try explaining back to me what you understood, and I'll explain it based on what you write, in a way you most certainly will understand. And I'll bet you $100 you'd agree, too.
You guys are great. Please try to connect with Indian and Chinese professors if possible; it will enrich us all ❤
Agree with the other comment; it's more of a follow-up to the Goodfellow book.
Definitely needs a pre-req
What's a good pre-req alternative?
@@andrice42 Goodfellow is the bible; nothing else comes close.
41:58 He says he doesn't know why the smooth interpolation works, but isn't it just dimensionality reduction? If you take the SVD of high-dimensional data, you can throw out the small singular values and dramatically reduce the dimensionality without introducing much error.
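Here's a minimal numpy sketch of what I mean (the data and the rank k are made up for illustration, not anything from the video):

import numpy as np

rng = np.random.default_rng(0)
# Fake "high-dimensional" data that secretly lives near a 5-dimensional subspace:
# 1000 points in 100 dimensions, built from 5 latent factors plus a little noise.
latent = rng.normal(size=(1000, 5))
mixing = rng.normal(size=(5, 100))
X = latent @ mixing + 0.01 * rng.normal(size=(1000, 100))

# Truncated SVD: keep only the k largest singular values.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 5
X_approx = (U[:, :k] * S[:k]) @ Vt[:k, :]

# Relative reconstruction error stays tiny despite discarding 95 of 100 dimensions.
err = np.linalg.norm(X - X_approx) / np.linalg.norm(X)
print(f"rank-{k} relative error: {err:.4f}")

Whether real image data is actually this low-rank is another question, of course, which may be part of what he means by not knowing why it works.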
Ah this is such an interesting video; subscribed! I also really admire the "pay what you can, if you can" approach to your Patreon. Looking forward to watching more of your content.
Edit: can't help but notice this street talk is in a forest 😅
Great to bring a friendly face to the book. Can't wait to devour it!
More regions than there are atoms in the universe?
So more than 10^80 regions? That doesn't sound right.
I know it sounds weird, but surprisingly, ReLU networks can easily produce this many regions. Consider a 100-dimensional input space, with 100 ReLU functions aligned with the 100 axes. Each divides the space into 2 regions (flat or sloped), so collectively they make 2^100 regions. There's a picture in Figure 3 of my book.
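A toy numpy sketch of this counting argument, just for illustration:

import numpy as np

rng = np.random.default_rng(0)
d = 100  # input dimension = number of axis-aligned ReLUs

# The activation pattern of ReLU(x_i) is just the sign pattern of x:
# each coordinate is either flat (x_i < 0) or sloped (x_i > 0),
# so there are 2^100 possible patterns, one per region.
points = rng.normal(size=(10000, d))
patterns = {tuple(p > 0) for p in points}

print(f"possible regions: 2^{d} = {2**d:.3e}")
print(f"distinct regions hit by 10000 random points: {len(patterns)}")

Every random point lands in its own region. 2^100 is "only" about 1.3e30, but with k ReLUs per axis instead of one you get (k+1)^100 regions, which passes 10^80 very quickly.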
Hi, Tim. Please explain why randomizing the image pixels does not have a negative impact on DNN training results, only slowing down the learning speed.
How does a DNN detect objects in an image without considering the neighborhood of pixels? Theoretically, object detection should not work, but it does. Why? Thanks, Tim.
The resulting model won't generalize (at least not well); this is just to say that you can still train it successfully even when the data has been tampered with. It's a statement about training, not about performance.
The order of the pixel values matters. If we flatten the pixel values into a large vector as the input layer of a DNN, the DNN learns the order of the input data as well. What's important is consistency in how we flatten the pixel matrix into one large vector: how you do it is your choice, but it has to be consistent across the training and test datasets. PyTorch and Keras take care of this consistency when we train DNNs for our projects, so these questions don't arise in the daily work of ML engineers. ML scientists have an important role to play here: to fill this gap, ask uncomfortable questions, and dive deeply into a topic that ML engineers and CS people take as given, without asking further about the reasons why.
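A toy numpy sketch of the consistency point (the shapes are just illustrative stand-ins for something like MNIST):

import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for 28x28 grayscale images (think MNIST-sized data).
train_images = rng.random(size=(60000, 28, 28))
test_images = rng.random(size=(10000, 28, 28))

# One fixed permutation of the 784 pixel positions, drawn once and reused everywhere.
perm = rng.permutation(28 * 28)

def flatten_consistently(images, perm):
    # Flatten each image to a 784-vector, then reorder pixels by the SAME permutation.
    flat = images.reshape(len(images), -1)
    return flat[:, perm]

X_train = flatten_consistently(train_images, perm)
X_test = flatten_consistently(test_images, perm)  # same perm: the "order" stays consistent

A fully connected network has no built-in notion of pixel neighborhoods anyway; every flattening is just some fixed order, which is why a permuted version can still be trained.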
@@SimonPrince-lr9dk Your book is a good summary of the fundamentals from the ML classes of MSc and PhD programs in CS. Thank you for the time and effort you put into the book; it clarified many questions I had during my university years and simplified many concepts in ML, and you made it free for the public. Thank you.
Lex Fridman's interviews on AI are 👍👍
54:26 If I understood correctly, what the host is trying to say here is this: for a fixed number of parameters, deep neural networks have more linear regions than shallow neural networks. HOWEVER, the function modelled by the DNN has lots of inner symmetries; it is not a "free function". So why does it work so well?
That is to say: when you add more layers to a neural network, you can fit more complex functions with fewer parameters, but the shape of these functions is weird and, to some extent, "fractal". It is hard to understand why this kind of highly constrained function fits real-world relations so well. UDL explores this topic, although there is no answer yet.
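If you want to poke at this empirically, here's a small numpy sketch (random, untrained weights; the sizes are chosen arbitrarily) that counts the linear regions a 1D ReLU network carves out, by counting distinct activation patterns along a dense input grid:

import numpy as np

rng = np.random.default_rng(0)

def count_linear_regions(widths, n_grid=100_000):
    # Each distinct ReLU on/off pattern corresponds to one linear region
    # crossed by the 1D input line; the final linear output layer is
    # omitted since it doesn't change the region count.
    x = np.linspace(-3, 3, n_grid).reshape(-1, 1)
    h, in_dim = x, 1
    pattern = np.zeros((n_grid, 0), dtype=bool)
    for w in widths:
        W = rng.normal(size=(in_dim, w))
        b = rng.normal(size=w)
        pre = h @ W + b
        pattern = np.hstack([pattern, pre > 0])
        h, in_dim = np.maximum(pre, 0), w
    changes = np.any(pattern[1:] != pattern[:-1], axis=1)
    return 1 + int(changes.sum())

print("one hidden layer of 20:", count_linear_regions([20]))
print("three hidden layers of 7:", count_linear_regions([7, 7, 7]))

Worth knowing: at random initialization the counts tend to be modest; the exponential blow-up that makes deep nets "more expressive" requires particular weight settings, which is part of why it's still open what trained networks actually use.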
thank you for these great interviews
Right on. It’s an equation
Brilliant interview. Refreshing to watch professional content instead of the garbage AI channels that the algorithm suggests first. This whole AI hype is built around OpenAI marketing to finance its cash-burning company, and the recommendation algorithm fuels it. I cannot agree more on the distinction between science and engineering. When I obtained my PhD in mathematical logic, I was excited about AI, because I thought it could unravel the mystery of intelligence. But we are not learning anything about it, and it is so depressing. We so badly need a profound theory of neural networks, instead of watching these engineers trying random stuff until something works.
I'm reading through Deep Learning by Bishop and Bishop, which, based on the table of contents, seems to have a large overlap. How does this book compare?
2:01:55 I also observe a certain (well-intentioned) paternalism in Simon's opinion. If there are people who can have rewarding occupations and build self-taught personalities, it is precisely because there is a large part of the population doing alienating jobs in factories, supermarkets and offices. No one wants to spend the best years of their life loading boxes in a forklift or washing dishes in a restaurant, it is circumstances that push those people in that direction. I hope the day comes when human beings are freed from those chains.
On the other hand, I agree that the transition to automation can be hard, but I think we made a mistake in delegating this transition to governments. States are by nature perverse, what we need is a committed and strong civil society, with the capacity to save, that is willing to temporarily welcome those who are left out. In this sense, I believe that we can learn much more from indigenous societies, and from how anthropologists tell us that they build their social support networks, than from the social engineers who are to come.
This is a fair criticism of my viewpoint. But it's also paternalistic to say that human beings need to be "freed from these chains". Personally, I actually would rather be loading boxes / washing dishes than be unemployed.
The host and the guest both are about to be VERY surprised at what is about to hit us. Buckle up.
Absolutely, amigo. We are witnessing a carefully staged explanation of reality, down to the precise functioning of our minds. If you watch Andrej Karpathy's excellent line-by-line code build of an LLM you will literally start to _feel_ the method of your own intellection. It's quite trippy.