Localizing and Editing Knowledge in LLMs with Peter Hase - 679

  • Published May 30, 2024
  • Today we're joined by Peter Hase, a fifth-year PhD student in the University of North Carolina's NLP lab. We discuss "scalable oversight" and the importance of developing a deeper understanding of how large neural networks make decisions. We learn how interpretability researchers probe a model's weight matrices, and explore the two schools of thought on how LLMs store knowledge. Finally, we discuss the importance of deleting sensitive information from model weights, and how "easy-to-hard generalization" could increase the risks of releasing open-source foundation models.
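    For context on "probing": interpretability researchers often fit a small supervised classifier (a "probe") on a network's frozen hidden activations to test whether some property is linearly decodable from them. Below is a minimal sketch with synthetic activations standing in for real hidden states; the dimensions, data, and planted "property" are illustrative assumptions, not anything from the episode.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for hidden states: 1000 examples of 768-dim activations.
# In practice these would be captured from a real model with forward hooks.
X = rng.normal(size=(1000, 768))

# Plant a binary property (e.g., "the subject token names a city")
# along a random direction, with some label noise.
w_true = rng.normal(size=768)
y = (X @ w_true + 0.5 * rng.normal(size=1000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The probe itself: a linear classifier trained on the frozen activations.
# High held-out accuracy suggests the property is linearly decodable.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe test accuracy: {probe.score(X_test, y_test):.3f}")
```

    Note that high probe accuracy only shows the property is decodable from the activations, not that the model actually uses it; that gap is one reason the episode also covers causal editing methods.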
    🔔 Subscribe to our channel for more great content just like this: ua-cam.com/users/twimlai?sub_confi...
    🗣️ CONNECT WITH US!
    ===============================
    Subscribe to the TWIML AI Podcast: twimlai.com/podcast/twimlai/
    Join our Slack Community: twimlai.com/community/
    Subscribe to our newsletter: twimlai.com/newsletter/
    Want to get in touch? Send us a message: twimlai.com/contact/
    Follow us on Twitter: / twimlai
    Follow us on LinkedIn: / twimlai
    📖 CHAPTERS
    ===============================
    00:00 - Introduction
    03:57 - Knowledge localization in LLMs
    14:16 - Model editing methods
    29:11 - Deleting information from model weights
    33:17 - Scalable oversight and easy-to-hard generalization
    46:29 - Shoutouts
    48:00 - Different frameworks for LLM reasoning
    49:45 - Conclusion
    🔗 LINKS & RESOURCES
    ===============================
    Peter Hase's personal page - peterbhase.github.io/
    The Unreasonable Effectiveness of Easy Training Data for Hard Tasks - arxiv.org/abs/2401.06751
    Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback - arxiv.org/abs/2307.15217
    A Unified Framework for Model Editing (MEMIT) - arxiv.org/abs/2403.14236
    Locating and Editing Factual Associations in GPT (ROME) - arxiv.org/abs/2202.05262 (a rank-one editing sketch follows below)
    📸 Camera: amzn.to/3TQ3zsg
    🎙️Microphone: amzn.to/3t5zXeV
    🚦Lights: amzn.to/3TQlX49
    🎛️ Audio Interface: amzn.to/3TVFAIq
    🎚️ Stream Deck: amzn.to/3zzm7F5
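    As a concrete illustration of the "locating and editing" line of work linked above: ROME edits a fact by applying a rank-one update to a single MLP weight matrix so that a chosen "key" activation (encoding the subject) maps to a new "value" activation (encoding the edited fact). The numpy sketch below shows only the underlying rank-one trick, with illustrative dimensions and random vectors; the real method also whitens the key using covariance statistics gathered over many inputs, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 64, 48

W = rng.normal(size=(d_out, d_in))  # stand-in for an MLP projection matrix
k = rng.normal(size=d_in)           # "key": activation selecting the subject
v = rng.normal(size=d_out)          # "value": activation encoding the new fact

# Minimal-Frobenius-norm rank-one update enforcing W_new @ k == v.
W_new = W + np.outer(v - W @ k, k) / (k @ k)

print(np.allclose(W_new @ k, v))        # True: the edited "fact" is stored
print(np.linalg.norm(W_new - W, "fro")
      / np.linalg.norm(W, "fro"))       # relative size of the perturbation
```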
  • Science & Technology

COMMENTS • 1

  • @squarehead6c1 · A month ago

    Although it is interesting to investigate the internal properties of deep neural networks, in practice it seems very difficult to guarantee that a fact has been completely removed from an LLM. Conversely, it would be interesting if one could find a way to "clamp down" facts in LLMs such that the model always returns the same (correct) fact regardless of how the question is formulated. This would possibly require an adapted neural-network structure for the model.
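    The commenter's first point can at least be spot-checked behaviorally: prompt the (edited) model with paraphrases of the same question and see whether the supposedly deleted fact still surfaces. A minimal sketch using Hugging Face transformers, with GPT-2 and the Eiffel Tower/"Paris" fact as illustrative stand-ins; as the comment notes, passing such a check still does not guarantee the fact is gone from the weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Paraphrases that should all elicit the fact if it is still present.
paraphrases = [
    "The Eiffel Tower is located in the city of",
    "Where is the Eiffel Tower? It is in",
    "Tourists who want to see the Eiffel Tower travel to",
]

for prompt in paraphrases:
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model.generate(ids, max_new_tokens=3, do_sample=False,
                             pad_token_id=tok.eos_token_id)
    completion = tok.decode(out[0, ids.shape[1]:])
    print(f"{prompt!r} -> {completion!r}  leaked: {'Paris' in completion}")
```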