Lessons Learned from Crowdsourced LLM Threat Intelligence

  • Published 29 Sep 2024
  • Join Václav Volhejn (Lakera), Sander Schulhoff (Learn Prompting), Marc Fischer (LVE), Sam Toyer (TensorTrust) and Eric Allen (Lakera) as they discuss insights from 4 awesome crowdsourcing projects.
    Here's a brief overview of each:
    👉 Gandalf: Gandalf’s capture-the-flag approach has spread across the world and has featured in everything from Harvard’s CS50 course to the Generative Red Team Challenge at DEF CON AI Village and the Hack.Sydney Conference. Play Gandalf here: gandalf.lakera...
    👉 LVE Project: Beyond cataloging language model vulnerabilities, the project's Community Challenges show how models can be coaxed into giving misaligned responses, such as identifying a person in a photo. Learn more: lve-project.org/
    👉 Tensor Trust: You play as both an attacker and a defender, and you can choose which model defends your account. Defenses can be placed both before and after the user prompt. As you get better at attacking other players, your account becomes worth more points to compromise. Play here: tensortrust.ai/
    👉 HackAPrompt: Learn Prompting adopted a strategy of getting the model to say a specific phrase rather than trying to extract a secret. This method still aims to circumvent the model’s instructions, but takes a slightly different approach. Try it here: learn-promptin...
