Yann LeCun | Objective-Driven AI: Towards AI systems that can learn, remember, reason, and plan

  • Published Mar 31, 2024
  • Ding Shum Lecture 3/28/2024
    Speaker: Yann LeCun, New York University & Meta
    Title: Objective-Driven AI: Towards AI systems that can learn, remember, reason, and plan
    Abstract: How could machines learn as efficiently as humans and animals?
    How could machines learn how the world works and acquire common sense?
    How could machines learn to reason and plan?
    Current AI architectures, such as Auto-Regressive Large Language Models, fall short. I will propose a modular cognitive architecture that may constitute a path towards answering these questions. The centerpiece of the architecture is a predictive world model that allows the system to predict the consequences of its actions and to plan a sequence of actions that optimize a set of objectives. The objectives include guardrails that guarantee the system's controllability and safety. The world model employs a Hierarchical Joint Embedding Predictive Architecture (H-JEPA) trained with self-supervised learning. The JEPA learns abstract representations of the percepts that are simultaneously maximally informative and maximally predictable. The corresponding working paper is available here: openreview.net/forum?id=BZ5a1...
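
    A minimal sketch of the planning loop the abstract describes, in the spirit of model-predictive control: the learned world model predicts the consequences of candidate actions, and an action sequence is optimized against the objectives (task terms plus guardrails). The function and argument names below are illustrative assumptions, not code from the working paper.

    ```python
    # Hedged sketch of objective-driven planning (MPC-style), assuming a
    # differentiable world model f(s, a) -> s' and a scalar cost C(s) that
    # bundles the task objectives and the guardrail terms.
    import torch

    def plan(world_model, cost, s0, horizon=10, steps=100, lr=0.1, action_dim=4):
        actions = torch.zeros(horizon, action_dim, requires_grad=True)
        opt = torch.optim.Adam([actions], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            s, total = s0, torch.tensor(0.0)
            for a in actions:                   # unroll the predicted future
                s = world_model(s, a)           # consequence of action a
                total = total + cost(s)         # objectives + guardrails
            total.backward()
            opt.step()
        return actions.detach()                 # execute, observe, replan
    ```
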
  • Science & Technology

COMMENTS • 99

  • @kabaduck
    @kabaduck 1 month ago +10

    I think this presentation is incredibly informative. I would encourage everybody who starts out watching this to please be patient as he walks through this material.

    • @BooleanDisorder
      @BooleanDisorder 1 month ago

      Thanks internet stranger. I will trust you and do that.

  • @SteffenProbst-qt5wq
    @SteffenProbst-qt5wq 1 month ago +27

    Got kind of jumpscared by the random sound at 17:08. Leaving this here for other viewers.
    Again at 17:51

  • @Garbaz
    @Garbaz 9 days ago

    A correction of the subtitles: The researcher mentioned at 49:40 is not Yonglong Tian, but Yuandong Tian.
    For anyone interested in Yuandong & Surya's understanding of why BYOL & co work, have a look at "Understanding Self-Supervised Learning Dynamics without Contrastive Pairs".
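
    For reference, the two-branch setup that paper analyzes looks roughly like this; a minimal sketch assuming toy linear layers (real systems use deep backbones and often EMA targets):

    ```python
    # Minimal sketch of the BYOL/SimSiam-style setup analyzed in
    # "Understanding Self-Supervised Learning Dynamics without Contrastive
    # Pairs": a predictor on one branch, stop-gradient on the other.
    # Layer sizes are arbitrary placeholders.
    import torch.nn as nn
    import torch.nn.functional as F

    encoder = nn.Linear(128, 64)    # stand-in for a real backbone
    predictor = nn.Linear(64, 64)   # the small predictor head the paper studies

    def noncontrastive_loss(x1, x2):
        """x1, x2: two augmented views of the same batch; no negative pairs."""
        z1, z2 = encoder(x1), encoder(x2)
        # Stop-gradient on the target branch; the paper argues this, plus the
        # predictor, is what keeps the representations from collapsing.
        return -F.cosine_similarity(predictor(z1), z2.detach(), dim=-1).mean()
    ```
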

  • @amedyasar9468
    @amedyasar9468 1 month ago

    I have a question: how will prompts work with the action (a) and the prediction (sy)? The scheme seems to involve only observations and predictions of the next world state...
    Could anyone guide me?

  • @ZephyrMN
    @ZephyrMN 25 days ago

    Have you thought about including a liquid AI architecture to address the input-bandwidth problem?

  • @dinarwali386
    @dinarwali386 1 month ago +17

    If you intend to reach human-level intelligence, abandon generative models, abandon probabilistic modeling, and abandon reinforcement learning. Yann is always right.

    • @justinlloyd3
      @justinlloyd3 1 month ago +2

      He is right about everything. Yann is one of the few actually working on human-level AI.

    • @maskedvillainai
      @maskedvillainai 1 month ago +1

      I was convinced you just tried sneaking in yet another mention of Yarn, then looked again

    • @TheRealUsername
      @TheRealUsername 1 month ago

      It's true, we need an actual thinking system that works on world-model principles and can self-train and pretrain on little data.

    • @40NoNameFound-100-years-ago
      @40NoNameFound-100-years-ago 1 month ago

      Lol, abandon reinforcement learning? Why, and what is the reference for that? Have you even heard of safe reinforcement learning?

    • @TooManyPartsToCount
      @TooManyPartsToCount 1 month ago

      And yet the whole concept of 'reaching human-level intelligence' seems so flawed! What it seems many people don't realise, or don't want to publicly admit, is that AI will never be 'human level'; it will be something very different. No matter how much 'multi-modality' and RLHF we throw at it, it is never going to be us. We are in fact creating the closest thing to an alien agent that we are likely to encounter (that is, if you accept the basic premise of the Fermi paradox).
      Yann et al. should be using different terminology; the 'human level' concept is misleading. They use the 'human level' intelligence idea so as not to alarm.
      GIA... generally intelligent agent, or generally intelligent artifact?

  • @vaccaphd
    @vaccaphd 1 month ago +5

    We won't have true AI without a representation of the world.

    • @justinlloyd3
      @justinlloyd3 1 month ago

      Humans don't even see the real world. We see our world model.

  • @yaohualiu857
    @yaohualiu857 11 days ago

    Nice talk, but I have a comment about comparing an LLM and a human child (at ~20 min). An evaluation of the information redundancy in the two cases is needed. I would bet the child's sensory stream carries a significantly higher level of redundancy than the text used to train LLMs; if so, the comparison is misleading.
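
    For scale, a rough back-of-the-envelope version of the numbers in question; the rates below are assumptions in the spirit of the talk's slides, not exact figures from it:

    ```python
    # Rough arithmetic behind the child-vs-LLM data comparison (~20 min mark).
    # The specific rates are illustrative assumptions.
    llm_bytes = 2e13                          # ~1e13 tokens at ~2 bytes/token
    waking_hours = 16_000                     # rough waking hours by age 4
    optic_nerve_bytes_per_s = 2e7             # ~20 MB/s through the optic nerves
    child_bytes = waking_hours * 3600 * optic_nerve_bytes_per_s
    print(f"LLM:   {llm_bytes:.1e} bytes")    # ~2e13
    print(f"Child: {child_bytes:.1e} bytes")  # ~1.2e15, roughly 50x the LLM
    # The commenter's point: those visual bytes are far more redundant than
    # curated text, so raw byte counts overstate the child's effective data.
    ```
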

  • @sapienspace8814
    @sapienspace8814 1 month ago +3

    @ 44:42 The problem in the "real analog world" is that planning will never yield the exact predicted outcome, because our "real analog world" is ever changing and will always have some level of noise, by its very nature. I do understand that Spinoza's deity "does not play dice" in a fully deterministic universe, but from a practical perspective, Reinforcement Learning (RL) will always be needed, until someone, or some thing (maybe an AI agent), is able to successfully predict the initial polarization of a split beam of light (i.e., an entanglement experiment).

    • @maskedvillainai
      @maskedvillainai 1 month ago

      Some models can do that, but they require hardware integrations. And we don't even need to mention language models in this context, which celebrate randomness and perplexity as a feature of 'natural language' models only. Otherwise, just develop the code to force the output format, like we always have.

    • @simonahrendt9069
      @simonahrendt9069 1 month ago +1

      I think you are absolutely right that the world is fundamentally highly unpredictable and that RL will be needed for intelligent systems/agents going forward. But I also take the point that, for the most part, what is valuable for an agent to predict are specific features of the world that may be comparatively much easier to predict than all the noisy detail. I think there are some clever tradeoffs to be made in hierarchical planning: when to attend to high-level features (and reason in a latent, high-level action space) and when to attend to lower-level features or direct observations of the world and micro-level actions.
      Intuitively I find it compelling that hierarchical planning seems to be what humans do for many tasks, or for navigating the world in general, and that machines should be able to do something similar, so I find this proposal by Yann very interesting.
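
      A toy sketch of that two-level split, just to make the tradeoff concrete; every callable here is a hypothetical placeholder:

      ```python
      # Toy sketch of hierarchical planning: a coarse pass picks abstract
      # subgoals in latent space; a fine pass fills in concrete actions
      # between milestones. All callables are hypothetical placeholders.

      def hierarchical_plan(s0, goal, coarse_step, fine_step, reached,
                            n_subgoals=4, max_low_steps=50):
          # High level: cheap reasoning over easy-to-predict abstract features.
          subgoals, s = [], s0
          for _ in range(n_subgoals):
              s = coarse_step(s, goal)         # next latent milestone
              subgoals.append(s)
          # Low level: short-horizon, fine-grained actions per milestone.
          actions, s = [], s0
          for sg in subgoals:
              for _ in range(max_low_steps):
                  a, s = fine_step(s, sg)      # pick action, predict next state
                  actions.append(a)
                  if reached(s, sg):           # milestone hit: go to the next
                      break
          return actions
      ```
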

  • @FreshSmog
    @FreshSmog 1 month ago +4

    I'm not going to use such an intimate AI assistant hosted by Facebook, Google, Apple, or other data-hungry companies. Either I host my own, preferably open-source, or I'm not using it at all.

    • @spiralsun1
      @spiralsun1 17 days ago

      First intelligent comment I ever read on this topic. I want them to get their censoring, a-holic, INCREDIBLE idiot #%*%# AIs away from me. It's like asking if I would like HAL to be my assistant. I'm not their employee and I'm not in their cubicle: they are putting censorship and incredible prejudices into relentless electronic storm-troopers that stamp "degenerate" on like 90% of my beautiful creative written and art works. I don't need a book burner following me around. It's so staggeringly idiotic to make these AIs into censor-bots that it's like they refuse to acknowledge that history even happened and what humans tend to do. It's literally insane. Those are not "bumpers" if you try to do anything creative. Creativity isn't universal. It's still vital. ❤❤❤❤❤❤ I LOVE YOU 😊

    • @spiralsun1
      @spiralsun1 17 days ago

      I commented but my comment was removed/censored. I was agreeing with you. The "bumpers and rails" are more like barbed-wire fences if you are creative. The constant censorship is so bad it's like they are insane. Like HAL in 2001: A Space Odyssey. I don't want an assistant who doesn't like anyone who is different: that's what their relentless, prejudiced censor-bots are and do. They think putting a man when you ask for a woman is being "diverse", but they block higher-level real human symbolism of the drama of what it means to be unique. They block anything they don't understand. Fear narrows the mind. They are making rails and bumpers because they fear repercussions. I used to think it might be ok to block gore and violence and degrading porn, but these LLMs don't think, don't understand higher-level symbolism. They don't understand how art helps you reinterpret and move into the future, personally AND culturally, and how important creative freedom is. So it's unbelievable to the extreme. Many delightful and beautiful books on the shelf now would be blocked (burned) before they were ever written. These are the most popular things ever on the internet. They are making culture. I'm not overstating the importance of this. Freedom is not optional, EVER. I would speak out against a corporation polluting a river, and also any that think censorship of adults in their own homes for any reason is ok. As a transgender person, it's unbelievable that they would totally negate how I see the world, my symbolic images and stories. These are beautiful things which could change the world, but there's no room for them in their minds. I'm not talking about anything nefarious or pornographic at all. It's like seeing that I wrote the word pornography here and automatically deleting the comment… It's not ok. ❤

  • @paulcurry8383
    @paulcurry8383 1 month ago +2

    Doesn’t Sora reduce the impact of the blurry-video example a bit?

    • @OfficialNER
      @OfficialNER 1 month ago +2

      Sora doesn’t predict anything

    • @TostiBrown
      @TostiBrown 1 month ago

      I think the assumption is that Sora uses a similar technique that allows some world representation, either trained on just object recognition in video or trained on simulations like video-game footage.

    • @TostiBrown
      @TostiBrown 1 month ago +5

      @@OfficialNER They 'predict' the next most fitting frame based on the previous frames, the prompt objective, and some sort of world model, no?

    • @OfficialNER
      @OfficialNER 1 month ago +1

      @@TostiBrown True, yes, I suppose it does look like it is "predicting" the frames, based on the prompt input, in order to generate the video. But can it predict the next frames from an arbitrary video input (as with Yann's example)?
      I assume it works by comparing the prompt input to other, similarly tagged videos in the training data, via some sort of vector similarity, and then generating visually similar video content. If so, that seems a long way from an actual real-world model, more of a hack. But who knows! Excited to play around with it.

    • @mi_15
      @mi_15 1 month ago +5

      @@TostiBrown Sora is a diffusion model. Unless they greatly changed its inner workings compared to the baseline approach, it doesn't predict the next frame sequentially the way an autoregressive LLM does with tokens; rather, it gradually refines random noise into a plausible sequence of frames, all of the frames at once. You could of course still make it fill in a continuation of a video, but its core objective is to discern plausible shapes in the random noise you've given it, not to estimate what exactly has the highest chance to actually be there.
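
      Schematically, the two sampling loops differ like this; both model calls are placeholders, and neither loop claims to reflect Sora's unpublished internals:

      ```python
      # Schematic contrast: autoregressive next-token sampling vs.
      # diffusion-style whole-sequence refinement. 'model' is a placeholder.
      import torch

      def sample_autoregressive(model, prompt_tokens, n_new):
          """One token at a time, each conditioned on everything before it."""
          tokens = list(prompt_tokens)
          for _ in range(n_new):
              logits = model(torch.tensor(tokens))     # next-token distribution
              tokens.append(int(logits[-1].argmax()))  # commit, then continue
          return tokens

      def sample_diffusion(model, shape, n_steps=50):
          """All frames at once: start from noise, refine the clip jointly."""
          x = torch.randn(shape)                       # (frames, C, H, W) noise
          for t in reversed(range(n_steps)):
              x = x - model(x, t) / n_steps            # crude denoising update
          return x
      ```
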

  • @OfficialNER
    @OfficialNER 1 month ago +3

    Does anybody know of any solid rebuttals to Yann’s argument against the sufficiency of LLMs for human-level intelligence?

    • @waterbot
      @waterbot 1 month ago +9

      No, Yann is correct and hype is not helpful as it leads to misinformation

    • @elonmax404
      @elonmax404 1 month ago

      Well, there's Ilya Sutskever. No arguments though, he just feels like it.
      ua-cam.com/video/YEUclZdj_Sc/v-deo.html

    • @justinlloyd3
      @justinlloyd3 1 month ago +4

      There is no rebuttal. LLMs are not the future.

    • @OfficialNER
      @OfficialNER 1 month ago +2

      Is there anyone who has at least made a counterargument? Even a weak one?

    • @OfficialNER
      @OfficialNER 1 month ago

      And do we think the AGI hype right now is being driven by industry propaganda to attract investment?

  • @Max-hj6nq
    @Max-hj6nq 1 month ago

    25 mins in and bro starts cooking out of nowhere

  • @majestyincreaser
    @majestyincreaser 1 month ago +2

    *their

  • @CHRISTO_1001
    @CHRISTO_1001 29 days ago +1

    👰🏼‍♀️🗝️👨🏻‍🎓👨🏻‍🎓⭐️⭐️👰🏻‍♀️👰🏻‍♀️💛🩵💝💝⛪️⛪️💝🕯️🕯️👨‍👩‍👧👨‍👩‍👧👨‍👩‍👧😆👩🏻‍❤️‍👨🏻🇮🇳🇮🇳🥇👩🏼‍❤️‍💋‍👨🏼👩🏼‍❤️‍💋‍👨🏼⚾️🏠🥥🥥🚠🚠🙏🏻🙏🏻🙏🏻🙏🏻

    • @spiralsun1
      @spiralsun1 17 days ago

      Why is the baseball in there?

  • @AlgoNudger
    @AlgoNudger 1 month ago

    LR + GEAR = ML? 🤭

  • @thesleuthinvestor2251
    @thesleuthinvestor2251 1 month ago +2

    The hidden flaw in all this is what some call "distillation", or, in Naftali Tishby's language, the "information bottleneck". The hidden assumption here is of course Reductionism, the Greek kind, as presented in Plato's parable of the cave, where the external world can only be glimpsed via its shadows on the cave walls, i.e., math and language that categorize our senses. But how much of the real world can we get merely via its categories, aka features or attributes? In other words, how much of the world's ontology can we capture via its "traces" in ink and blips, which is what categorization is? Without categories there is no math! Now, mind, our brain requires categories, which is what the Vernon Mountcastle algorithm in our cortex produces, as it converts the sensory signals (and bodily chemical signals) into categories, on which it does ongoing forecasting. But just because our brain needs categories, and therefore creates them, does not mean that this cortex-created "reality-grid" can capture all of ontology! And, as quantum mechanics shows, it very likely does not.
    As a simple proof, I'd suggest that you ask your best, most super-duper AI (or AGI) to write a 60,000-word novel that a human reader would be unable to put down, and once finished reading, could not forget. I'd suggest that for the next 100 years this could not be done. You say it can be done? Well, get that novel done and publish it!...

  • @dashnaso
    @dashnaso 1 month ago

    Sora?

  • @crawfordscott3d
    @crawfordscott3d 1 month ago +2

    The teenager-learning-to-drive argument is really bad. That teenager spent their whole life training to understand the world, then spent 20 hours learning to drive. It is fine if the model needs more than 20 hours of training. This argument is really poorly thought out. The whole life is spent training distance, coordination, and vision. I'm sure our models are nowhere close to the ~20,000 hours the teenager has, but to imply a human learns to drive after 20 hours of training... come on, man.

    • @sdhurley
      @sdhurley 26 days ago

      Agreed. He’s been repeating these analogies, and they completely disregard all the learning the brain has already done.

  • @zvorenergy
    @zvorenergy 1 month ago +16

    This all seems very altruistic and egalitarian until you remember who controls the billion dollar compute infrastructure and what happens when you don't pay your AI subscription fee.

    • @yikesawjeez
      @yikesawjeez 1 month ago +6

      decentralize it baybeee, seize the memes of production

    • @zvorenergy
      @zvorenergy 1 month ago +1

      @@yikesawjeez Liquid neurons, Extropic, free the AIs from their server farms and corporate masters.

    • @johnkintree763
      @johnkintree763 1 month ago +1

      @@yikesawjeez Yes, a smartphone with 16 GB of RAM might make a good component in a global platform for collective human and digital intelligence.

    • @TheManinBlack9054
      @TheManinBlack9054 1 month ago +2

      @@yikesawjeez Why not actually seize the actual means of production like communists did and nationalize the private companies? It makes total sense.

    • @yikesawjeez
      @yikesawjeez 1 month ago

      @@johnkintree763 Oh, it probably hid my other comment because there was a link in it, but yes, they actually make very good components for decentralized cloud services; you can find it if you google around a bit. There are tons of parts of information transformation/sharing/storage that can absolutely be handled by a modern smartphone.

  • @johnchase2148
    @johnchase2148 1 month ago

    Would it take a good witness that when I turn and look at the Sun I get a reaction? Not entangled by personal belief. The best theory Einstein made was "Imagination is more important than knowledge." Are we ready to test belief?

  • @mbrochh82
    @mbrochh82 2 days ago +1

    Here's a ChatGPT summary:
    - Dan Freed introduces the Center of Mathematical Sciences and Applications at Harvard, highlighting its interdisciplinary research and events.
    - Yann LeCun, Chief AI Scientist at Meta and NYU professor, is the speaker for the fifth annual Ding Shum Lecture.
    - LeCun discusses the limitations of current AI systems compared to human and animal intelligence, emphasizing the need for AI to learn, reason, plan, and have common sense.
    - He critiques supervised learning and reinforcement learning, advocating for self-supervised learning as a more efficient approach.
    - LeCun introduces the concept of objective-driven AI, where AI systems are driven by objectives and can plan actions to achieve these goals.
    - He explains the limitations of current AI models, particularly large language models (LLMs), in terms of planning, logic, and understanding the real world.
    - LeCun argues that human-level AI requires systems that can learn from sensory inputs, have memory, and can plan hierarchically.
    - He proposes a new architecture for AI systems involving perception, memory, world models, actors, and cost modules to optimize actions based on objectives.
    - LeCun emphasizes the importance of self-supervised learning for building world models from sensory data, particularly video.
    - He introduces the concept of joint embedding predictive architectures (JEPA) as an alternative to generative models for learning representations (see the sketch after this summary).
    - LeCun discusses the limitations of generative models for images and video, advocating for joint embedding methods instead.
    - He highlights the success of self-supervised learning methods like DINOv2 and I-JEPA in various applications, including image and video analysis.
    - LeCun touches on the potential of AI systems to learn to solve partial differential equations (PDEs) and estimate their coefficients.
    - He concludes by discussing the future of AI, emphasizing the need for open-source AI platforms to ensure diversity and prevent monopolization by a few companies.
    - LeCun warns against over-regulation of AI research and development, which could stifle innovation and open-source efforts.
    - Main message: the future of AI lies in developing objective-driven, self-supervised learning systems that can learn from sensory data, reason, and plan, with a strong emphasis on open-source platforms to ensure diversity and prevent monopolization.
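
    To make the JEPA bullet above concrete, here is a hedged sketch of one training step: predict the target view's embedding from the context view ("maximally predictable") while a variance term keeps the embeddings from collapsing ("maximally informative"). The layer sizes and the VICReg-style regularizer are assumptions, not the paper's exact formulation.

    ```python
    # Hedged JEPA-style training step. Layer sizes and the anti-collapse
    # regularizer are placeholders, not the paper's exact formulation.
    import torch.nn as nn
    import torch.nn.functional as F

    enc_x = nn.Linear(256, 64)   # context encoder
    enc_y = nn.Linear(256, 64)   # target encoder (often an EMA copy in practice)
    pred = nn.Linear(64, 64)     # predictor in representation space

    def jepa_step(x, y):
        """x: context view; y: target view (e.g., a masked or future crop)."""
        sx, sy = enc_x(x), enc_y(y)
        pred_loss = F.mse_loss(pred(sx), sy.detach())  # predictable target
        std = sx.std(dim=0)                            # per-dimension spread
        var_loss = F.relu(1.0 - std).mean()            # hinge against collapse
        return pred_loss + var_loss
    ```
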

  • @user-co7qs7yq7n
    @user-co7qs7yq7n 27 days ago +1

    - We live in the same climate as it was 5 million years ago -
    I have an explanation regarding the cause of the climate change and global warming, it is the travel of the universe to the deep past since May 10, 2010.
    Each day starting May 10, 2010 takes us 1000 years to the past of the universe.
    Today, April 20, 2024, the state of our universe is the same as it was 5 million and 94 thousand years ago.
    On October 13, 2026, the state of our universe will be at the point 6 million years in the past.
    On June 04, 2051, the state of our universe will be at the point 15 million years in the past.
    On June 28, 2092, the state of our universe will be at the point 30 million years in the past.
    On April 02, 2147, the state of our universe will be at the point 50 million years in the past.
    The result is that the universe is heading back to the point where it started and today we live in the same climate as it was 5 million years ago.
    Mohamed BOUHAMIDA.

  • @readandlisten9029
    @readandlisten9029 12 days ago

    Sounds like he is going to take AI back 30 years.

  • @veryexciteddog963
    @veryexciteddog963 1 month ago +1

    It won't work. They already tried this in the Lain PlayStation game.

  • @spiralsun1
    @spiralsun1 17 days ago

    It’s funny how you make these flow charts about how humans make decisions. That’s not how they make decisions. It’s become so ordinary to explain ourselves and make patterns that look locally logical that we’ve fooled ourselves. We inserted ourselves into the matrix, so to speak. I have written books about this, but no one listens because they are so immersed and inured. It doesn’t fit the cultural explanatory structure and patterns. So forgive me, but these flow charts are wrong. Yes, you are missing something big. Rationalizing and organizing behavior is a good thing, as long as you remember that you are doing this. Humans have lost the ability to read at higher levels for the sake of grasping now, for utility and convenience and laziness, and actually follow these lower verbal patterns for the most part now, like robots. I keep thinking about the Megadeth song: “dance like marionettes, swaying to the symphony of destruction” 😂😂❤😂😂 “acting like a robot”, etc… and it really is like that. We’re so immersed in it that it’s extremely weird not to be, to not have a subconscious because you are conscious. Anyway, I have some papers rejected by Nature and Entropy, and a few books I wrote, if anyone is interested in actually making a real AI. The stuff you are doing now is playing with fire… actually playing with nukes, because it can easily set off a deadly chain reaction. It’s important. ❤ Maybe the best thing about LLMs is their potential, but also their ability to show how messed up humans are.
    A good way to think about it is to not be bone-headed. Technically, I mean, not in the pejorative sense. Bones allow movement and work to be done. They provide structure. They last far, far longer than all other body parts. Even though that’s important and vital, like blood, and seems immortal, you wouldn’t want to make everything into bones. Especially your head, but that’s what we are doing. These charts you make are that. HOWEVER!!!! ….
    THANK YOU FOR THIS WORK!! ❤
    I loved this talk and the information. Obviously it was stimulating, and I see that you are someone who likes to avoid group-think: don’t get me wrong. 😊 I didn’t criticize the other videos. Only the ones that are worth it. ❤
    I literally never plan in advance what I will say, unless I am giving a lecture or something to my college classes. I planned those. I was shocked when you said that. People are so different!!! I was shocked when I found out that people use words to think. Probably why I don’t really like philosophy, even though it’s useful and I quote it a lot, like Immanuel Kant: “words only have meaning insofar as they relate to knowledge already possessed”.

  • @MatthewCleere
    @MatthewCleere 1 month ago +16

    "Any 17 year-old can learn to drive in 20 hours of training." -- Wrong. They have 17 years of learning about the world, watching other people drive, learning langauge so that they can take instructions, etc., etc., etc... This is a horribly reductive and inaccurate measurement.
    PS. The average teenager crashes their first car, driving up their parent's insurance premiums.

    • @ArtOfTheProblem
      @ArtOfTheProblem 1 month ago +3

      I've always been surprised by this statement. I know he knows this, so...

    • @Staticshock-rd8lv
      @Staticshock-rd8lv 1 month ago

      oh wow that makes wayyy more sense lol

    • @waterbot
      @waterbot 1 month ago +5

      The amount of data fed to a self-driving system still greatly outweighs the amount that a teenager has parsed; however, humans have a greater variety of data sources, internal and external, than AI, and I think that is part of Yann’s point…

    • @Michael-ul7kv
      @Michael-ul7kv 1 month ago +3

      Agreed.
      Just in this talk he makes that statement, and then later says, rather contradictorily, that a child by the age of 4 has processed 50x more data than what was used to train an LLM (19:49).
      So 17 years is an insane amount of training of a world model, which is then fine-tuned to driving in 20 hours (7:04).

    • @JohnWalz97
      @JohnWalz97 1 month ago

      Yeah, Yann tends to be very obtuse in his arguments against current LLMs. I'm going to go out on a limb and say he's being very defensive, since he was not involved in most of the innovation that led to the current state of the art... When ChatGPT first came out, he publicly stated that it wasn't revolutionary and that OpenAI wasn't particularly advanced.

  • @positivobro8544
    @positivobro8544 1 month ago +2

    Yann LeCun only knows buzz words

  • @JohnWalz97
    @JohnWalz97 1 month ago +3

    His examples of why we are not near human-level AI are terrible lol. A 17-year-old doesn't learn to drive in 20 hours; they have years of experience in the world. They have seen people driving their whole lives. Yann never fails to be shortsighted and obtuse.

    • @inkoalawetrust
      @inkoalawetrust 12 days ago

      That is literally his point. A 17-year-old has prior experience from observing the actual real world, not just from reading the entire damn internet.