I'm really glad to see such a well-researched video about this paper instead of clickbaity headline reporting that glosses over the more interesting details. Adding that second paper at the end also helps show that CCS is not the silver bullet for LLM hallucination that some could believe after reading the original paper.
Thanks @farrael004!
Thanks for the video
Thanks for watching!
I'm glad I found your channel :3
This is extremely interesting
Thanks @Subbestionix!
It might be interesting if they could do True/False and Yes/No at the same time to check the consistency.
Do you mean supervising the model to predict this as well? One idea could be as follows:
If the authors trained a regressor that could predict yes/no from the normalised features, then they would have proof that this signal is leaking. So instead, they could learn a projection and then use a trick from domain adaptation (reversing gradients) to ensure that the projected features contain no information about yes/no labels.
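In case it helps, here is a minimal sketch of that gradient-reversal idea in PyTorch. This is purely my illustration, not anything from the paper: the dimensions, the adversarial yes/no head, and the toy data are all made up, and in practice the adversarial term would be combined with the main objective (e.g. the CCS loss) so the projection stays useful rather than collapsing.

```python
# Sketch (assumed PyTorch): learn a projection of the normalised hidden states
# while an adversarial head tries to predict the yes/no label; reversing the
# gradient pushes the projection to discard that yes/no signal.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated (scaled) gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Gradient w.r.t. x is reversed; lambd gets no gradient.
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)


# Hypothetical dimensions: 1024-d hidden states projected down to 128-d.
feature_dim, proj_dim = 1024, 128
projection = nn.Linear(feature_dim, proj_dim)  # the projection we want to keep
yes_no_head = nn.Linear(proj_dim, 1)           # adversary predicting yes/no

optimizer = torch.optim.Adam(
    list(projection.parameters()) + list(yes_no_head.parameters()), lr=1e-3
)
criterion = nn.BCEWithLogitsLoss()

# Toy batch: random stand-ins for the normalised features and yes/no labels.
features = torch.randn(32, feature_dim)
labels = torch.randint(0, 2, (32, 1)).float()

for _ in range(100):
    optimizer.zero_grad()
    projected = projection(features)
    # The adversary sees the projection through the gradient-reversal op:
    # the head is trained to predict yes/no, while the reversed gradient
    # trains the projection to remove that information.
    logits = yes_no_head(grad_reverse(projected, lambd=1.0))
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
```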
Are you reading David Rozado? I've noticed that chat AI has gotten better at keeping things consistent: it doesn't give one answer to one thing and then a completely contradictory answer to other things that depend on the former. But this only seems to work linguistically. You can also ask things in a different mode, say analytically, where you ask it to examine and analyze data and then make a statement; yet when you ask for what should be the same thing in other ways, it gives you a completely different answer. Similarly, the framing can give you one answer, even if it generally goes back to what is PC for the model. It would be nice if it could establish what exactly is meant in material terms (what is communicated, not merely what the words are) and also establish Bayesian priors to then draw more extended conclusions, but I don't see how this could be done for GPT and other chatbot-style models.
Interesting work. In my understanding, it's trying to gauge validity, or logic, rather than the soundness of a concept or the objective nature of a claim.
Thanks for sharing your perspective. My interpretation of the work is that the goal is to infer which claims the model "thinks" are true, in an unsupervised manner.
Reality is generated by random events. Knowledge is defined to be logically related, non-random facts.
An interesting philosophical perspective!
Yeah, let's base AI on the current peer-reviewed consensus BS and not the actual truth of the scientific method.
I suspect modern large language models (GPT-4, Claude, etc.) are often trained on large collections of peer-reviewed articles, so they will pick up on these. But I'm not sure I understand your comment (the focus of this work is on trying to determine what the AI thinks is true).
@SamuelAlbanie1 My focus is on the root of the problem: who decides what the consensus truth between humans is in the first place, versus the actual truth in the real world? AI could very well use the principles of logic to determine whether something is true or not by picking the fundamentals instead of the assumptions. For example, when you ask whether Michelson-Morley means there is no aether, or means there is no static aether on a moving Earth, it's trained to pretend the consensus is the truth instead of looking into the actual roots of Michelson-Morley and relativity to understand that the interference of the light can also mean a moving aether on a stationary Earth.
My point is: they will never make AI actually solve problems about truth.
What about the LLMs used by the CIA, NSA, or DARPA? They're classified projects.
Unfortunately (or perhaps fortunately), I don't know much about the LLMs of the CIA and NSA...
@SamuelAlbanie1 What I'm trying to say is: how can we verify the data they're using?