Representational Strengths and Limitations of Transformers

  • Published Aug 31, 2023
  • A Google TechTalk, presented by Clayton Sanford, 2023-07-18
    Google Algorithms Seminar - ABSTRACT: Attention layers, as commonly used in transformers, form the backbone of modern deep learning, yet there is little mathematical work detailing their benefits and deficiencies compared with other architectures. In this talk, I'll present both positive and negative results on the representation power of attention layers, with a focus on intrinsic complexity parameters such as width, depth, and embedding dimension. On the positive side, I'll present a sparse averaging task, where recurrent networks and feedforward networks both have complexity scaling polynomially in the input size, whereas transformers scale merely logarithmically in the input size. On the negative side, I'll present a triple detection task, where attention layers themselves have complexity scaling linearly in the input size. I'll discuss these results and some of our proof techniques, which emphasize the value of communication complexity in the analysis of transformers. Based on joint work with Daniel Hsu and Matus Telgarsky.
    Bio: Clayton Sanford is an incoming 5th (and final) year PhD student at Columbia studying machine learning theory. His work focuses primarily on the representational properties and inductive biases of neural networks. He has additionally worked on learning combinatorial algorithms with transformers (as a Microsoft Research intern this summer) and on climate modeling with ML (as an Allen Institute for AI intern in summer 2022).
  • Science & Technology
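
The two tasks named in the abstract can be sketched directly. The snippet below is an illustrative reference implementation of what each task asks for (the function names and exact input conventions are my own assumptions, not the paper's constructions): sparse averaging maps each position to the mean of the vectors its neighbor set indexes, and triple detection (sometimes called Match3) asks whether any triple of tokens sums to zero modulo M.

```python
import numpy as np

def sparse_average(vectors, neighbor_sets):
    """Sparse averaging: output i is the mean of the input vectors indexed
    by the i-th neighbor set. Per the talk, transformers can represent this
    with size scaling only logarithmically in the input length, while
    recurrent and feedforward networks need polynomial size."""
    return [vectors[list(s)].mean(axis=0) for s in neighbor_sets]

def match3(tokens, M):
    """Triple detection: is there a triple (i, j, k) with
    tokens[i] + tokens[j] + tokens[k] == 0 (mod M)? Per the talk, attention
    layers need size scaling linearly in the input length for this task."""
    n = len(tokens)
    return any((tokens[i] + tokens[j] + tokens[k]) % M == 0
               for i in range(n) for j in range(n) for k in range(n))
```

For example, `sparse_average(np.array([[1., 1.], [3., 3.], [5., 5.]]), [{0, 1}, {2}])` averages the first two vectors for position 0, and `match3([1, 2, 3], 6)` returns `True` since 1 + 2 + 3 = 6 ≡ 0 (mod 6). The brute-force cubic loop in `match3` is only a task specification; the hardness result concerns representing this function with attention, not computing it naively.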

COMMENTS • 1

  • @quorkquork 9 months ago +1

    The robotic voice quality strains hearing, I'd expect Google to do better