Representational Strengths and Limitations of Transformers

  • Published Aug 31, 2023
  • A Google TechTalk, presented by Clayton Sanford, 2023-07-18
    Google Algorithms Seminar - ABSTRACT: Attention layers, as commonly used in transformers, form the backbone of modern deep learning, yet there is little mathematical work detailing their benefits and deficiencies compared with other architectures. In this talk, I'll present both positive and negative results on the representation power of attention layers, with a focus on intrinsic complexity parameters such as width, depth, and embedding dimension. On the positive side, I'll present a sparse averaging task, where recurrent networks and feedforward networks both have complexity scaling polynomially in the input size, whereas transformers scale merely logarithmically in the input size. On the negative side, I'll present a triple detection task, where attention layers themselves have complexity scaling linearly in the input size. I'll discuss these results and some of our proof techniques, which emphasize the value of communication complexity in the analysis of transformers. Based on joint work with Daniel Hsu and Matus Telgarsky.
    Bio: Clayton Sanford is an incoming 5th (and final) year PhD student at Columbia studying machine learning theory. His work focuses primarily on the representational properties and inductive biases of neural networks. He has additionally worked on learning combinatorial algorithms with transformers (as a Microsoft Research intern this summer) and on climate modeling with ML (as an Allen Institute for AI intern in summer 2022).
  • Science & Technology
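
The two tasks named in the abstract can be sketched directly. The snippet below is an illustrative reference implementation of what each task asks for (the function names and exact input conventions are my own assumptions, not the paper's constructions): sparse averaging maps each position to the mean of the vectors its neighbor set indexes, and triple detection (sometimes called Match3) asks whether any triple of tokens sums to zero modulo M.

```python
import numpy as np

def sparse_average(vectors, neighbor_sets):
    """Sparse averaging: output i is the mean of the input vectors indexed
    by the i-th neighbor set. Per the talk, transformers can represent this
    with size scaling only logarithmically in the input length, while
    recurrent and feedforward networks need polynomial size."""
    return [vectors[list(s)].mean(axis=0) for s in neighbor_sets]

def match3(tokens, M):
    """Triple detection: is there a triple (i, j, k) with
    tokens[i] + tokens[j] + tokens[k] == 0 (mod M)? Per the talk, attention
    layers need size scaling linearly in the input length for this task."""
    n = len(tokens)
    return any((tokens[i] + tokens[j] + tokens[k]) % M == 0
               for i in range(n) for j in range(n) for k in range(n))
```

For example, `sparse_average(np.array([[1., 1.], [3., 3.], [5., 5.]]), [{0, 1}, {2}])` averages the first two vectors for position 0, and `match3([1, 2, 3], 6)` returns `True` since 1 + 2 + 3 = 6 ≡ 0 (mod 6). The brute-force cubic loop in `match3` is only a task specification; the hardness result concerns representing this function with attention, not computing it naively.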

COMMENTS • 1

  • @quorkquork 9 months ago +1

    The robotic voice quality strains hearing, I'd expect Google to do better