Parables on the Power of Planning in AI: From Poker to Diplomacy: Noam Brown (OpenAI)
- Published Sep 18, 2024
- Title: Parables on the Power of Planning in AI: From Poker to Diplomacy
Speaker: Noam Brown (OpenAI)
Date: Thursday, May 23, 2024
Abstract: From Deep Blue in 1997, to AlphaGo in 2016, to Cicero in 2022, games have long been used as a way to measure the frontier capabilities of AI systems and gain algorithmic insights that have wider applications. In this talk, I will cover research breakthroughs in games including poker, Go, and Diplomacy, and in particular highlight the key role that search/planning algorithms have played in all of these achievements. I will then point to potential future applications of this research to improving machine learning models more broadly.
Bio: Noam Brown is an AI researcher at OpenAI investigating reasoning and self play. He co-created Libratus and Pluribus, the first AIs to defeat top humans in two-player no-limit poker and multiplayer no-limit poker, respectively. Noam was also the lead research scientist for Cicero, the first AI to achieve human-level performance in the natural language strategy game Diplomacy. He has received the Marvin Minsky Medal for Outstanding Achievements in AI, was named one of MIT Tech Review's 35 Innovators Under 35, and his work on Pluribus was named by Science as one of the top 10 scientific breakthroughs of 2019. Noam received his PhD from Carnegie Mellon University, for which he received the AAMAS Victor Lesser Distinguished Dissertation Award, the AAAI ACM-SIGAI Dissertation Award, and the CMU School of Computer Science Distinguished Dissertation Award.
This video is in the process of being closed captioned.
the architect of Cicero and "scaling inference time compute."
Well, the talk actually took place in May, if you look at the description. So he kind of hinted at o1 three months ago.
@windmaple I know, my point exactly. They probably told UW not to release it until now.
Never underestimate search. -Waldo
Oh my god brilliant.
Very interesting lecture. Thank you!
His points on why people didn't prioritize search are very illuminating.
The broader lesson here is that trained, distilled knowledge is pattern recognition and good for perceptual tasks, whereas adding search and exploration (as in GOFAI) is necessary for cognitive tasks.
I think there might be one more step: distilling the patterns discovered via search back into perceptual precepts, which I think is what happens in grandmaster chess play and in geniuses such as Newton or Ramanujan.
I do not know whether o1 already does this, similar to AlphaZero, as I am typing this halfway through the lecture.
So, it'd be a loop of creating new patterns as it encounters novel situations.
We cognitive scientists have known about this for a long time as well: "System 1" and "System 2."
@DistortedV12 Yes, I am aware of that and read Kahneman's great book on the topic too, but what is fascinating is how having human players beat the System 1 version of their bot forced them to add search.
Search means finding a series of actions that leads from the current state to an end state that you would like, or alternatively one that avoids states that could be bad for you in the future.
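To make that definition concrete, here is a minimal sketch of search as breadth-first planning. Everything in it is an assumption for illustration and not taken from the talk: the toy world (integer states, "+1" and "*2" actions, a cap of 20) and the names `find_plan`, `toy_actions`, and `bad_states` are all hypothetical. The systems discussed in the lecture use far more sophisticated search than this.

```python
from collections import deque


def find_plan(start, is_goal, actions, bad_states=frozenset()):
    """Breadth-first search for a shortest action sequence from start to a goal.

    States listed in bad_states are never entered. Returns None if no plan exists.
    """
    frontier = deque([(start, [])])  # (state, actions taken to reach it)
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if is_goal(state):
            return plan
        for action, next_state in actions(state):
            if next_state in visited or next_state in bad_states:
                continue  # skip repeats and states we want to avoid
            visited.add(next_state)
            frontier.append((next_state, plan + [action]))
    return None


def toy_actions(state):
    """Hypothetical toy world: add one or double, staying at or below 20."""
    candidates = [("+1", state + 1), ("*2", state * 2)]
    return [(name, nxt) for name, nxt in candidates if nxt <= 20]


# Reach 10 from 1, once freely and once while avoiding the "bad" state 5.
free_plan = find_plan(1, lambda s: s == 10, toy_actions)
safe_plan = find_plan(1, lambda s: s == 10, toy_actions, bad_states={5})
print(free_plan)
print(safe_plan)
```

The second call has to take a longer route because the shortest path passes through the state marked as bad, which is the "avoid bad future states" half of the definition.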
The way AI is progressing is so closely related to evolution, just on a much faster time scale.
Would love if some of these papers were in the description for easy reference!
COOL
I have been listening for a while now, and though I agree that enabling search is a big factor for GenAI intellect, it's still not clear from the poker context why. I can only assume you taught the model to read people's faces and then search their historical game records to know when they are bluffing and when they really do have a strong hand?
TGI MCTS