BERTology Meets Biology: Interpreting Attention in Protein Language Models (Paper Explained)

  • Published 28 May 2024
  • Proteins are the workhorses of almost all cellular functions and a core component of life. But despite their versatility, all proteins are built as sequences of the same 20 amino acids. These sequences can be analyzed with tools from NLP. This paper investigates the attention mechanism of a BERT model that has been trained on protein sequence data and discovers that the language model has implicitly learned non-trivial higher-order biological properties of proteins.
    OUTLINE:
    0:00 - Intro & Overview
    1:40 - From DNA to Proteins
    5:20 - BERT for Amino Acid Sequences
    8:50 - The Structure of Proteins
    12:40 - Investigating Biological Properties by Inspecting BERT
    17:45 - Amino Acid Substitution
    24:55 - Contact Maps
    30:15 - Binding Sites
    33:45 - Linear Probes
    35:25 - Conclusion & Comments
    Paper: arxiv.org/abs/2006.15222
    Code: github.com/salesforce/provis
    My Video on BERT: • BERT: Pre-training of ...
    My Video on Attention: • Attention Is All You Need
    Abstract:
    Transformer architectures have proven to learn useful representations for protein classification and generation tasks. However, these representations present challenges in interpretability. Through the lens of attention, we analyze the inner workings of the Transformer and explore how the model discerns structural and functional properties of proteins. We show that attention (1) captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure, (2) targets binding sites, a key functional component of proteins, and (3) focuses on progressively more complex biophysical properties with increasing layer depth. We also present a three-dimensional visualization of the interaction between attention and protein structure. Our findings align with known biological processes and provide a tool to aid discovery in protein engineering and synthetic biology. The code for visualization and analysis is available at this https URL.
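    As background for the attention analysis described in the abstract, here is a minimal, generic sketch (using Hugging Face Transformers with a placeholder checkpoint, not the provis code) of how per-layer, per-head attention maps can be extracted from a BERT-style model:
    ```python
    import torch
    from transformers import BertModel, BertTokenizer

    # Placeholder checkpoint; the paper analyzes a BERT trained on protein sequences.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

    # Amino acids given as space-separated "words" so each residue gets its own token.
    inputs = tokenizer("M K T A Y I A K Q R", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # outputs.attentions: one (batch, heads, seq_len, seq_len) tensor per layer.
    attn = torch.stack(outputs.attentions)  # (layers, batch, heads, L, L)
    print(attn.shape)
    ```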
    Authors: Jesse Vig, Ali Madani, Lav R. Varshney, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani
    Links:
    YouTube: / yannickilcher
    Twitter: / ykilcher
    Discord: / discord
    BitChute: www.bitchute.com/channel/yann...
    Minds: www.minds.com/ykilcher
  • Science & Technology

COMMENTS • 47

  • @niksapraljak7605
    @niksapraljak7605 3 years ago +7

    Well done, Yannic!
    Overall, the whole video is very descriptive; however, I want to mention that the 3D conformation of proteins is NOT determined by molecular simulations but by physical experimental methods (e.g., X-ray crystallography and cryo-EM). These physical methods are handicapped because either you cannot use X-ray crystallography at all for a specific protein, or it is just too expensive, as with cryo-EM. As of right now, the number of protein sequences versus physical structures has exploded thanks to sequencing technology; therefore, there remains a plethora of protein sequences without corresponding physical structures. A huge endeavor in the scientific community is to predict structure from the protein sequence alone, now that huge datasets (e.g., the Protein Data Bank, UniProt) and powerful models like BERT have emerged.

  • @julianke455
    @julianke455 3 years ago +25

    Great stuff as always! I'm curious how much time it takes you every day to produce this much high-quality content.

    • @herp_derpingson
      @herp_derpingson 3 years ago +1

      I think he should also put the "time to read" in the video description.

    • @YannicKilcher
      @YannicKilcher  3 years ago +2

      Good idea :) it really depends on the paper, sometimes 1-2 hours, sometimes 1-2 days

  • @bioinfolucas5606
    @bioinfolucas5606 3 years ago +6

    That is funny! I just started using transformers to address a "similar" problem using proteins. I think one of the reasons the model can't predict the binding sites is post-translational modifications. These happen when other proteins add modified amino acid versions or sugars to the protein structure. These modifications can totally change the protein folding and affect the binding site positions.

    • @rbain16
      @rbain16 3 years ago +3

      Are chaperone proteins included in that group of "other proteins"? Those have to throw a wrench in things too.

  • @tiktoktiktik4820
    @tiktoktiktik4820 3 years ago +1

    Mind-blowing concept! Amazing video.

  • @chiararodella1714
    @chiararodella1714 2 years ago

    Thanks, this explanation saved my life in understanding this paper!!! Could it be interesting to do another video on the latest 2021 version of the paper?

  • @felipemello1151
    @felipemello1151 3 years ago +1

    Your channel is incredible. Thanks!

  • @004307ec
    @004307ec 3 years ago +4

    I guess there will be quite a lot of similar papers coming soon (for example, close chromosome interactions, RNA-DNA interactions, ORF identification, CRISPR gRNA design/evaluation...).

  • @herp_derpingson
    @herp_derpingson 3 years ago +3

    How are the proteins encoded so that they can be consumed by the neural network? Is there a Word2Vec/Glove for proteins?

    Are all proteins linear? How do we encode non-linear proteins?

    Good paper. The next step would be to gradient descent backwards through the learnt model to generate proteins which meet some criteria.

    • @YannicKilcher
      @YannicKilcher  3 years ago +2

      Transformers have their own embedding table that is jointly learned with the model, so I guess it's just a "vocabulary" of 20 tokens. And yes, all proteins are linear, as far as I know. They are chains of amino acids, which can only make two connections each.
      I think there are a lot of people working in that space, but I guess you'd want to build a model that also learns protein structure / drug interaction / etc. in a supervised manner; that's probably going to perform much better.
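      To make that "vocabulary of 20 tokens" concrete, here is a minimal sketch (hypothetical code, not the provis/TAPE implementation) of how an amino-acid sequence could be mapped to token IDs and then to learned embedding vectors, assuming a BERT-style model built with PyTorch:
      ```python
      import torch
      import torch.nn as nn

      # The 20 standard amino acids, plus a few special tokens (assumed here).
      AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
      vocab = {"<pad>": 0, "<cls>": 1, "<sep>": 2}
      vocab.update({aa: i + 3 for i, aa in enumerate(AMINO_ACIDS)})

      def encode(sequence: str) -> torch.Tensor:
          """Map a protein sequence (one letter per residue) to token IDs."""
          ids = [vocab["<cls>"]] + [vocab[aa] for aa in sequence] + [vocab["<sep>"]]
          return torch.tensor(ids)

      # The embedding table is learned jointly with the rest of the Transformer.
      embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=768)

      tokens = encode("MKTAYIAKQR")   # a short toy sequence
      vectors = embedding(tokens)     # shape: (len(sequence) + 2, 768)
      print(vectors.shape)
      ```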

  • @anthophile786
    @anthophile786 3 years ago +1

    Nice information 👌
    Keep it up bro 👊

  • @inamulhaq6704
    @inamulhaq6704 10 months ago

    How can I encode epitope sequences for a binary classification task? I tried ProtTrans embeddings, but the accuracy is quite low.

  • @hagardolev
    @hagardolev 11 months ago

    Hey, I can't find the Figure 2 you are showing in the paper itself. Do you have a different source?

  • @christianleininger2954
    @christianleininger2954 3 years ago +1

    Too bad that you are not doing more RL; your videos are so good.

    • @rbain16
      @rbain16 3 years ago +4

      I much prefer the breadth of topics that Yannic covers. Being too myopic is almost certainly bad when trying to birth new memes :)
      He's introduced me to a bunch of concepts I wouldn't have found otherwise, and everybody and their grandma has heard of RL.

    • @YannicKilcher
      @YannicKilcher  3 years ago +1

      Yo grandma! Check out my new A3C shit right here!

    • @christianleininger2954
      @christianleininger2954 3 years ago

      @@rbain16 Well then, one of the grandmas should make a video about "Accelerating Online Reinforcement Learning with Offline Datasets". And I did not say he should do 100% RL; just one or two a week would be nice. Just to be clear, I am very impressed by Yannic's videos. It's really great work, with or without RL topics!

    • @rbain16
      @rbain16 3 years ago

      Grandma is all about them PEARLs dog

  • @KoliHarshad007
    @KoliHarshad007 3 years ago

    Please upload videos on ViLBert, VisualBert, and VisualBert COCO.

  • @samanthaqiu3416
    @samanthaqiu3416 3 years ago +1

    But aren't they trying to predict the contact maps? In the equation with the attention, they seem to be adding this f(i, j) as an input; is that part of the training data? 25:45 - what is this alignment they speak of?

    • @YannicKilcher
      @YannicKilcher  3 years ago

      They are investigating the correlation between contact maps (the f part) and the attention patterns (the alpha part).

    • @samanthaqiu3416
      @samanthaqiu3416 3 years ago

      @@YannicKilcher Thanks Yannic. I just noticed that the sum ignores all attention terms where f(i, j) is zero.
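      To make that last point concrete, here is a minimal sketch of such an agreement statistic for a single head and a single protein (my own toy notation, not the authors' code): the contact map f(i, j) acts as an indicator, so only attention on contacting pairs contributes to the numerator, while the denominator normalizes by all the attention considered.
      ```python
      import numpy as np

      def attention_contact_agreement(attn: np.ndarray, contact: np.ndarray) -> float:
          """Fraction of a head's attention that lands on contacting residue pairs.

          attn:    (L, L) attention weights alpha[i, j] for one head and one protein
          contact: (L, L) binary contact map f(i, j): 1 if residues i and j are
                   spatially close in the folded structure, else 0
          """
          # Terms with f(i, j) == 0 drop out of the numerator, as noted above.
          return float((contact * attn).sum() / attn.sum())

      # Toy example: a 4-residue protein with one long-range contact between 0 and 3.
      attn = np.array([[0.1, 0.2, 0.1, 0.6],
                       [0.3, 0.3, 0.2, 0.2],
                       [0.1, 0.4, 0.3, 0.2],
                       [0.7, 0.1, 0.1, 0.1]])
      contact = np.zeros((4, 4))
      contact[0, 3] = contact[3, 0] = 1
      print(attention_contact_agreement(attn, contact))  # 0.325
      ```
      In the paper this kind of quantity is aggregated over many proteins and reported per head, which is the sense in which attention patterns and contact maps are compared.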

  • @allessandroable
    @allessandroable 3 years ago

    You should be a professor in every university

  • @lucha6262
    @lucha6262 3 years ago +1

    You make such great content, thank you! It's very promising to see that the model can learn certain things from the raw amino acid sequence. I think for better performance they should also investigate how nucleic acids interact with proteins, as those also mediate protein folding (source: pubmed.ncbi.nlm.nih.gov/19106620/).

  • @minhuang8848
    @minhuang8848 3 years ago +3

    Almost nailed the pronunciations. The Chinese name is pronounced tsai-ming shjong, pretty close.

  • @dmitrysamoylenko6775
    @dmitrysamoylenko6775 3 years ago +2

    With same naivety language model can predict any written program output - this can't be done

    • @jeremykothe2847
      @jeremykothe2847 3 years ago +4

      As written, this sentence doesn't parse. Care to rephrase?

    • @dmitrysamoylenko6775
      @dmitrysamoylenko6775 3 years ago

      @@jeremykothe2847 I think they tried to predict the _final_ form of a protein. But to me this looks like predicting the final output of some Turing-complete process. I think language models are capable of predicting the _next_ step by approximating the underlying equation of the physical process. But for predicting _all_ steps you need some other (maybe iterative) mechanism.

  • @alexanderchebykin6448
    @alexanderchebykin6448 3 years ago +1

    An unrelated question that keeps bugging me: why do you mention your "Attention is all you need" video in every video you produce? Are you trying to push the number of views there to a maximum, or am I just seeing patterns where there are none? In any case, your videos are as great as always, it just keeps tripping me up in every one of them :D

    • @jeremykothe2847
      @jeremykothe2847 3 years ago +6

      "Attention is all you need" is the name of the paper that described attention heads. Hence it's referred to when discussing attention heads, which have become popular.

    • @YannicKilcher
      @YannicKilcher  3 years ago +2

      Yes, it's getting repetitive :) but I usually want to give people who have no clue what I'm talking about a quick pointer to where they can learn

    • @jeremykothe2847
      @jeremykothe2847 3 years ago

      @@YannicKilcher I wouldn't worry - it's a really important paper, and I'm sure many new viewers would love to be directed to it rather than have you 'black box' it or try to fully explain attention in every video.

  • @florianhonicke5448
    @florianhonicke5448 3 years ago +1

    263rd

  • @mehermanoj45
    @mehermanoj45 3 years ago +2

    3rd

  • @minhoryu2786
    @minhoryu2786 3 years ago +2

    8th

  • @oneman7094
    @oneman7094 3 years ago +2

    4th

  • @oneman7094
    @oneman7094 3 years ago +2

    2nd

  • @brunomartel4639
    @brunomartel4639 3 years ago

    So, biology hahahahahhhaahahah

  • @mehermanoj45
    @mehermanoj45 3 years ago +2

    1st

  • @aaronsiddharthamondal4584
    @aaronsiddharthamondal4584 3 years ago +1

    sorry to interrupt

  • @oneman7094
    @oneman7094 3 years ago +2

    5th

  • @chrishare
    @chrishare 3 years ago +3

    First

  • @hiramcoriarodriguez1252
    @hiramcoriarodriguez1252 3 years ago +1

    This has no applications; three-dimensional structure and post-translational modifications are the state of the art of protein research. Sequence alone isn't interesting anymore.

  • @mehermanoj45
    @mehermanoj45 3 years ago +2

    2nd