Owain Evans - AI Situational Awareness, LLM Out-of-Context Reasoning


COMMENTS • 10

  • @TheInsideView 21 days ago +2

    OUTLINE
    01:12 Owain's Research Agenda
    02:25 Defining Situational Awareness
    03:30 Safety Motivation
    04:58 Why Release A Dataset
    06:17 Risks From Releasing It
    10:03 Claude 3 on the Longform Task
    14:57 Needle in a Haystack
    19:23 Situating Prompt
    23:08 Deceptive Alignment Precursor
    30:12 Distribution Over Two Random Words
    34:36 Discontinuing a 01 sequence
    40:20 GPT-4 Base On the Longform Task
    46:44 Human-AI Data in GPT-4's Pretraining
    49:25 Are Longform Task Questions Unusual
    51:48 When Will Situational Awareness Saturate
    53:36 Safety And Governance Implications Of Saturation
    56:17 Evaluation Implications Of Saturation
    57:40 Follow-up Work On The Situational Awareness Dataset
    01:00:04 Would Removing Chain-Of-Thought Work?
    01:02:18 Out-of-Context Reasoning: the "Connecting the Dots" paper
    01:05:15 Experimental Setup
    01:07:46 Concrete Function Example: 3x + 1
    01:11:23 Isn't It Just A Simple Mapping?
    01:17:20 Safety Motivation
    01:22:40 Out-Of-Context Reasoning Results Were Surprising
    01:24:51 The Biased Coin Task
    01:27:00 Will Out-Of-Context Reasoning Scale
    01:32:50 Checking If In-Context Learning Works
    01:34:33 Mixture-Of-Functions
    01:38:24 Inferring New Architectures From ArXiv
    01:43:52 Twitter Questions
    01:44:27 How Does Owain Come Up With Ideas?
    01:49:44 How Did Owain's Background Influence His Research Style And Taste?
    01:52:06 Should AI Alignment Researchers Aim For Publication?
    01:57:01 How Can We Apply LLM Understanding To Mitigate Deceptive Alignment?
    01:58:52 Could Owain's Research Accelerate Capabilities?
    02:08:44 How Was Owain's Work Received?
    02:13:23 Last Message

  • @Max-bh1pl 21 days ago +3

    Finally, a new episode! I've been eagerly waiting for this!

    • @MrCheeze 21 days ago +1

      We're so barack

  • @human_shaped 19 days ago

    Really very interesting. It's good to let AIs know how they're being tested so they can take that into consideration too. Thanks for the transcript ;)

  • @simonstrandgaard5503 21 days ago +1

    great interview

  • @TheJokerReturns 4 days ago

    I'd like to see if we can coordinate on podcasts. How can we best reach you?

  • @bilalchughtai_ 21 days ago +1

    banger