Kipoi Seminar - Thomas Pierrot (InstaDeep)

  • Published 11 Aug 2024
  • Title: The Nucleotide Transformer initiative: building and evaluating robust foundation models for genomics
    Abstract:
    The human genome sequence provides the underlying code for human biology. Since the sequencing of the human genome 20 years ago, a main challenge in genomics has been the prediction of molecular phenotypes from DNA sequences alone. Models that can “read” the genome of each individual and predict the different regulatory layers and cellular processes hold the promise of helping us better understand, prevent, and treat diseases. Here, we introduce the Nucleotide Transformer (NT), our initiative to build robust and general DNA foundation models that learn the languages of genomic sequences and molecular phenotypes. We will first present our initial collection of DNA foundation models, which have up to 2.5B parameters and are pre-trained on 850 genomes from various species. The NT v1 and v2 models and agroNT, a version specific to agricultural applications, have learned transferable, context-specific representations of nucleotide sequences and can be fine-tuned at low cost to solve a variety of genomics applications. We will then discuss avenues for improving these models to tackle modern challenges in the field. Notably, we will use this discussion as an opportunity to present our progress on several fronts towards more general genomics AI agents that integrate different modalities and have improved transfer capabilities. The training and application of such foundation models in genomics provide a widely applicable stepping stone toward accurate predictions from DNA sequence.
