OUTLINE
01:12 Owain's Research Agenda
02:25 Defining Situational Awareness
03:30 Safety Motivation
04:58 Why Release A Dataset
06:17 Risks From Releasing It
10:03 Claude 3 on the Longform Task
14:57 Needle in a Haystack
19:23 Situating Prompt
23:08 Deceptive Alignment Precursor
30:12 Distribution Over Two Random Words
34:36 Discontinuing a 01 Sequence
40:20 GPT-4 Base On the Longform Task
46:44 Human-AI Data in GPT-4's Pretraining
49:25 Are Longform Task Questions Unusual
51:48 When Will Situational Awareness Saturate
53:36 Safety And Governance Implications Of Saturation
56:17 Evaluation Implications Of Saturation
57:40 Follow-up Work On The Situational Awareness Dataset
01:00:04 Would Removing Chain-Of-Thought Work?
01:02:18 Out-Of-Context Reasoning: The "Connecting the Dots" Paper
01:05:15 Experimental Setup
01:07:46 Concrete Function Example: 3x + 1
01:11:23 Isn't It Just A Simple Mapping?
01:17:20 Safety Motivation
01:22:40 Out-Of-Context Reasoning Results Were Surprising
01:24:51 The Biased Coin Task
01:27:00 Will Out-Of-Context Reasoning Scale
01:32:50 Checking If In-Context Learning Works
01:34:33 Mixture-Of-Functions
01:38:24 Inferring New Architectures From ArXiv
01:43:52 Twitter Questions
01:44:27 How Does Owain Come Up With Ideas?
01:49:44 How Did Owain's Background Influence His Research Style And Taste?
01:52:06 Should AI Alignment Researchers Aim For Publication?
01:57:01 How Can We Apply LLM Understanding To Mitigate Deceptive Alignment?
01:58:52 Could Owain's Research Accelerate Capabilities?
02:08:44 How Was Owain's Work Received?
02:13:23 Last Message
COMMENTS

Finally, a new episode! I've been eagerly waiting for this!
We're so barack
Really very interesting. It's good to let AIs know how they're being tested so they can take that into consideration too. Thanks for the transcript ;)
great interview
I'd like to see if we can coordinate on podcasts. How can we best reach you?
banger