Coercing LLMs to Do and Reveal (Almost) Anything with Jonas Geiping - 678

The TWIML AI Podcast with Sam Charrington

Views: 575

Published May 30, 2024
Today we're joined by Jonas Geiping, a research group leader at the ELLIS Institute, to explore his paper: "Coercing LLMs to Do and Reveal (Almost) Anything". Jonas explains how neural networks can be exploited, highlighting the risk of deploying LLM agents that interact with the real world. We discuss the role of open models in enabling security research, the challenges of optimizing over certain constraints, and the ongoing difficulties in achieving robustness in neural networks. Finally, we delve into the future of AI security, and the need for a better approach to mitigate the risks posed by optimized adversarial attacks.
🔔 Subscribe to our channel for more great content just like this: ua-cam.com/users/twimlai?sub_confi...
🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast: twimlai.com/podcast/twimlai/
Join our Slack Community: twimlai.com/community/
Subscribe to our newsletter: twimlai.com/newsletter/
Want to get in touch? Send us a message: twimlai.com/contact/
📖 CHAPTERS
===============================
00:00 - Introduction
02:20 - Are we ready for agents?
04:06 - Security and open-weight models
07:53 - How to make an LLM say anything
13:30 - What are the limitations?
16:06 - The role of code in vulnerability
18:17 - Interesting findings
22:41 - Prompt optimization
29:35 - RLHF and its possible alternatives
41:45 - Real-world impact of LLM vulnerabilities
46:40 - Where is this all going?
50:08 - Conclusion
🔗 LINKS & RESOURCES
===============================
Coercing LLMs to do and reveal (almost) anything - arxiv.org/abs/2402.14020
Gandalf (Jailbreaking game) - gandalf.lakera.ai/
World_sim - worldsim.nousresearch.com/
Mental Models for Advanced ChatGPT Prompting with Riley Goodside - 652 - twimlai.com/podcast/twimlai/m...
📸 Camera: amzn.to/3TQ3zsg
🎙️Microphone: amzn.to/3t5zXeV
🚦Lights: amzn.to/3TQlX49
🎛️ Audio Interface: amzn.to/3TVFAIq
🎚️ Stream Deck: amzn.to/3zzm7F5
Science & Technology

COMMENTS • 2

@stl8k 1 month ago
Great point about tech-agnostic security considerations of freeform text input at 22 minute mark.
@jdown330 1 month ago ⁺¹
is bro talking on 1.25 speed? Wth
