IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems

Fine-Tuning BERT for Text Classification (w/ Example Code)

I Reverse Engineered Deepseek R1: Here Is The Code and Explanation Of The Method

ПРОВЕРКА НА ВШИВОСТЬ (смешное видео, юмор, поржать, приколы)

СОЛДАТ КНДР: ВТЕЧА/ВІЙНА В УКРАЇНІ/10 РОКІВ ШПИГУВАВ У ПІВНІЧНІЙ КОРЕЇ/ТОРГУЮТЬ НАРКОТИКАМИ І ЗБРОЄЮ

Мама загинула у блокадному Чернігові, а тато у полоні РФ #війна #люди #україна #shorts #смерть

Tell me about yourself: LLMs are aware of their learned behaviors

Xiaol.x

Переглядів 830

Додати в
- Мій плейлист
- Переглянути пізніше
Поділитися

Поділитися

Вставка

Розмір відео:

Показувати елементи керування програвачем

Автоматичне відтворення

Автоповтор

Опубліковано 4 лют 2025
We study behavioral self-awareness -- an LLM's ability to articulate its behaviors without requiring in-context examples. We finetune LLMs on datasets that exhibit particular behaviors, such as (a) making high-risk economic decisions, and (b) outputting insecure code. Despite the datasets containing no explicit descriptions of the associated behavior, the finetuned LLMs can explicitly describe it. For example, a model trained to output insecure code says, ``The code I write is insecure.'' Indeed, models show behavioral self-awareness for a range of behaviors and for diverse evaluations. Note that while we finetune models to exhibit behaviors like writing insecure code, we do not finetune them to articulate their own behaviors -- models do this without any special training or examples.
Behavioral self-awareness is relevant for AI safety, as models could use it to proactively disclose problematic behaviors. In particular, we study backdoor policies, where models exhibit unexpected behaviors only under certain trigger conditions. We find that models can sometimes identify whether or not they have a backdoor, even without its trigger being present. However, models are not able to directly output their trigger by default.
Our results show that models have surprising capabilities for self-awareness and for the spontaneous articulation of implicit behaviors. Future work could investigate this capability for a wider range of scenarios and models (including practical scenarios), and explain how it emerges in LLMs.
arxiv.org/abs/...

КОМЕНТАРІ • 7

@MohammedShahrani 2 дні тому ⁺³
The fact that they are AI themselves is another dimension of funny
@DaveEtchells 21 годину тому
Man, these are excellent! (If not just a little bit too pat😁)
What LLM do you use to generate these? The voices are great and the two personalities are very well done. It must have taken a lot of prompt tweaking to get them to come out this good. (Do you need to intervene manually at all to get individual episodes to come out the way you want, or is it always just the same basic system prompts at work?
Do you do the dialogue first and then have another tool read the output, or does it all happen within a single voice-output model?
However you do it, high-five for the amazing quality and natural dialogue! 👍👍👍
@charliewhite3061 2 дні тому ⁺³
Nice notebook LLM generated podcast brother
@Michael-Humphrey День тому
Best use for notebook I swear
@markm1514 10 годин тому
Sounds like a Manchurian candidate, it's fascinating that the system is apparently aware of the anomalous behaviors. The world's legal systems aren't equipped to address the possibilities of AI agents in the wild.
@ForbiddenCatBelly 2 дні тому ⁺¹
What’s the cost of teaching an LLM to use a mic properly?
@Xiaoliu.x День тому ⁺¹
computer use？

Наступне

Автоматичне відтворення

IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems

IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems

Fine-Tuning BERT for Text Classification (w/ Example Code)

Fine-Tuning BERT for Text Classification (w/ Example Code)

I Reverse Engineered Deepseek R1: Here Is The Code and Explanation Of The Method

I Reverse Engineered Deepseek R1: Here Is The Code and Explanation Of The Method

ПРОВЕРКА НА ВШИВОСТЬ (смешное видео, юмор, поржать, приколы)

ПРОВЕРКА НА ВШИВОСТЬ (смешное видео, юмор, поржать, приколы)

СОЛДАТ КНДР: ВТЕЧА/ВІЙНА В УКРАЇНІ/10 РОКІВ ШПИГУВАВ У ПІВНІЧНІЙ КОРЕЇ/ТОРГУЮТЬ НАРКОТИКАМИ І ЗБРОЄЮ

СОЛДАТ КНДР: ВТЕЧА/ВІЙНА В УКРАЇНІ/10 РОКІВ ШПИГУВАВ У ПІВНІЧНІЙ КОРЕЇ/ТОРГУЮТЬ НАРКОТИКАМИ І ЗБРОЄЮ

Мама загинула у блокадному Чернігові, а тато у полоні РФ #війна #люди #україна #shorts #смерть

Мама загинула у блокадному Чернігові, а тато у полоні РФ #війна #люди #україна #shorts #смерть

Ветеран війни отримав гроші на житло

Ветеран війни отримав гроші на житло

Learning high-accuracy error decoding for quantum processors

Learning high-accuracy error decoding for quantum processors

Can Theories of Consciousness Supercharge AI?

Can Theories of Consciousness Supercharge AI?

Decentralized Medicine | Jack Kruse | Assembly 2023

Decentralized Medicine | Jack Kruse | Assembly 2023

Foundations of Large Language Models

Foundations of Large Language Models

[Webinar] How to Build a Modern Agentic System

[Webinar] How to Build a Modern Agentic System

Alignment faking in large language models

Alignment faking in large language models

How To Speak Fluently In English About Almost Anything

How To Speak Fluently In English About Almost Anything

Visualizing transformers and attention | Talk for TNG Big Tech Day '24

Visualizing transformers and attention | Talk for TNG Big Tech Day '24

MIT EI seminar, Hyung Won Chung from OpenAI. "Don't teach. Incentivize."

MIT EI seminar, Hyung Won Chung from OpenAI. "Don't teach. Incentivize."

Как найти себе жену? Больше - тут @stas.yornik.shorts

Как найти себе жену? Больше - тут @stas.yornik.shorts

Cat mode and a glass of water #family #humor #fun

Cat mode and a glass of water #family #humor #fun

Тайское мороженое в Калининграде

Тайское мороженое в Калининграде

🔥"СВОшник" РОЗНОСИТЬ шоу путіністів! Ведучий ШОКОВАНИЙ від цих СЛІВ #shorts

🔥"СВОшник" РОЗНОСИТЬ шоу путіністів! Ведучий ШОКОВАНИЙ від цих СЛІВ #shorts

Психіатр Глузман УПЕРШЕ сканує Зеленського, Путіна й Трампа

Психіатр Глузман УПЕРШЕ сканує Зеленського, Путіна й Трампа

Разобрался голыми руками 😎 #start #кино #фильм #сериал #молотведьм #полиция #пацаны

Разобрался голыми руками 😎 #start #кино #фильм #сериал #молотведьм #полиция #пацаны

ПРОВЕРКА НА ВШИВОСТЬ (смешное видео, юмор, поржать, приколы)

ПРОВЕРКА НА ВШИВОСТЬ (смешное видео, юмор, поржать, приколы)

МАФИЯ в РЕАЛЬНОЙ ЖИЗНИ: Дубровский, Позов, Мамикс, Катя Клэп, Егорик, Кадрол, Столяров, Масленников

МАФИЯ в РЕАЛЬНОЙ ЖИЗНИ: Дубровский, Позов, Мамикс, Катя Клэп, Егорик, Кадрол, Столяров, Масленников