Parables on the Power of Planning in AI: From Poker to Diplomacy: Noam Brown (OpenAI)

Paul G. Allen School

Published Dec 25, 2024

COMMENTS • 56

@rylieweaver1516 3 months ago ⁺³
This is awesome. I like how he explained the generator-verifier gap. This will be huge for AI safety and reliability in addition to performance.
@harriehausenman8623 2 months ago ⁺¹⁶
Why can't I shake the feeling, someone just explained o1-preview to me, without ever mentioning it 🤔 Thank you! 🙏
@ericchang9568 2 months ago ⁺²
a ton of planning to roll out N COTs :)
@user-pt1kj5uw3b 18 days ago
He's been explaining what they're doing without saying it for a while. It's awesome
@DistortedV12 3 months ago ⁺⁴⁹
the architect of Cicero and "scaling inference time compute."
@windmaple 3 months ago ⁺⁹
Well, the talk actually took place in May if you look at the description. So he kind of hinted at o1 3 months ago
@DistortedV12 3 months ago ⁺⁴
@@windmaple ik my point exactly.. probably told UW to not release it until now
@tmchen3440 3 months ago
😢😮t😢 Pignll
@patruff 3 months ago ⁺²⁹
Never underestimate search. -Waldo
@smicha15 3 months ago ⁺¹
Oh my god brilliant.
@brianpalmer967 2 months ago ⁺¹
And that's how we know you're a 90s kid!
@omadDev 3 months ago ⁺¹
Very interesting lecture. Thank you!
@triplea657aaa 3 months ago ⁺¹²
Would love if some of these papers were in the description for easy reference!
@CameronHarrisdemont 2 months ago
1:26
@RaviAnnaswamy 3 months ago ⁺²
Search means finding a series of actions that lead from the current state to an end state that you would like, or alternatively avoiding potentially bad states for you in the future.
@heykike 3 months ago
So basic algebra counts as search?
@ankitkumarpandey7262 3 months ago ⁺⁹
The way AI is progressing is so closely related to evolution... just at a much faster time scale.
@brandonbodily2101 2 months ago ⁺¹
"It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change." - Charles Darwin
@JustinHalford 3 months ago ⁺²⁷
The trillion dollar question - can search with foundation models generalize beyond objectively verifiable domains like math, coding, and games?
@clray123 3 months ago ⁺³
The answer is no because the models, including the search-based ones, require correctly scored training data to begin with. Where is this scoring supposed to come from for other domains, which cannot be easily simulated, and in which scoring the solution correctly is a big part of the problem? That is the core question for our AI hypesters (which they will avoid at all cost as it makes the whole house of cards collapse).
So far their only proposition for image recognition and language modeling tasks specifically has been to hire thousands of underpaid workers to do all the scoring for them. The difficulty here is that scoring in real-life domains cannot be done by low-paid labor slaves. That is, if it can be done at all: in many cases experts cannot analytically explain their expertise, yet they can intuitively take "correct" actions, based on a life-long experience, using their own "neural nets" locked up in their brain.
@JustinHalford 3 months ago ⁺²
@@clray123 I think that you’re underestimating the odds of AI acquiring aesthetic taste at the level of talented people via clever math/algorithms. We’ve already seen art and writing contests won by AI. To me, the actual question is when, not if.
@clray123 3 months ago
@@JustinHalford Art and writing contests won by AI (any examples?) would really mean nothing - the recipe for success in such a contest would be to just copy someone else's great work and declare yourself the winner. We already know that AI is good at imitation, if the thing to be imitated exists in a million examples that can be interpolated across, but we also know that a great art forger does not make a great artist.
@clray123 3 months ago ⁺¹
I think you are overestimating the odds of AI acquiring anything, really. What we call "emergent" abilities are really the result of being able to pick relevant signal from humungous amounts of training data. I am talking about situations where no such training data is available.
@JustinHalford 3 months ago ⁺⁴
@@clray123 have you heard of move 37? With sufficient compute and generalized self play, we will see many more examples of move 37 in a variety of domains.
@elliptictree 2 months ago ⁺¹
Interesting 💡🚀
@marbin1069 2 months ago ⁺¹
And this is how o1 was born.
@RaviAnnaswamy 3 months ago ⁺⁶
His points on why people didn’t prioritize search are very illuminating.
The broader lesson here is that trained, distilled knowledge is pattern recognition and good for perceptual tasks, whereas adding search and exploration (as in GOFAI) is necessary for cognitive tasks.
I think there might be one more step: to distill the patterns discovered via search back into perceptual precepts, which I think is what happens in grandmaster play in chess and in geniuses such as Newton or Ramanujan.
Whether o1 already does this, similar to AlphaZero, I do not know, as I am typing this halfway through the lecture.
@masterchief7301 3 months ago ⁺¹
So, it'd be a loop of creating new patterns as it encounters novel situations.
@DistortedV12 3 months ago ⁺¹
We cognitive scientists have known about this for a long time as well; "system 1" and "system 2."
@RaviAnnaswamy 3 months ago
@@DistortedV12 yes, I am aware of that and read Kahneman's great book on that topic too, but what is fascinating is how human players beating the system 1 version of their bot forced them to add search
@FamilyYoutubeTV-x6d 3 months ago
@@DistortedV12 cool
@hypercube717 3 months ago
Interesting
@Eriiiiiiiick 3 months ago
COOL
@ieltshome 1 month ago
I'm a newbie here and I noticed Noam uses the terms planning and search interchangeably. So in a sense, can RAG be considered planning? After all, it does a search and improves the quality of the answer. Correct me if I am mistaken.
@fil4dworldcomo623 3 months ago
I have been listening for a while now. Though I agree that enabling search is a big factor for GenAI intellect, it's still not clear from the context of the poker game why. I can only assume you taught the model to read people's faces and then search their historical game records to know when they are bluffing and when they really do have a strong hand?
@fil4dworldcomo623 3 months ago
@@erikfast9764 Thank you Erik, it keeps the excitement in the game then as that makes AI beatable by confusing it with irrational behaviour. But when AI becomes unbeatable, it must not have any hand in any game as it will kill the game.
@lesmoe524 3 months ago
@@fil4dworldcomo623 AI has already been beating online poker since like 2013. Playing irrationally does not matter: the AI plays defensively, aka "GTO", and doesn't mind if you never bluff or if you bluff every hand; it will still play exactly the same way (that's why all the pros talk about using "GTO Strategy"). Live poker will always be a thing, but even then you could have a device that tells you how to play like a bot.
@patruff 3 months ago
TGI MCTS
@Z-dv3zx 2 months ago ⁺²
many of these papers don't exist... did an LLM create these slides wtf
@JimJordan1753 3 months ago ⁺³
He always hates going into depth on how he made the poker model
@clray123 3 months ago ⁺²
And rightly so, because it's not the talk where he is supposed to throw around mathematical formulae mixed with arcane poker rules and assume that everyone in the audience can follow.
@JimJordan1753 3 months ago ⁺¹
@@clray123 “always”
@samkee3859 2 months ago
What are you implying? I’m dense
@ericchang9568 2 months ago
Is the poker bot making money on the internet right now?
@twoplustwo5 1 month ago
150$ for poker bot - crazy
@sucim 3 months ago ⁺⁴
"I started grad school in 2012" but looks like he started grad school in 2025
