Google Cloud Network Infrastructure for AI/ML

  • Published Jun 16, 2024
  • Victor Moreno, a product manager at Google Cloud, presented the network infrastructure Google Cloud has built to support AI and machine learning (AI/ML) workloads. The exponential growth of AI/ML models requires moving vast amounts of data across the network, far more than any single TPU or host can handle; instead, thousands of nodes must communicate efficiently. Google Cloud achieves this with a robust software-defined network (SDN) that includes hardware acceleration, ensuring GPUs and TPUs can communicate at line rate while addressing challenges such as load balancing and restructuring the data center topology to match traffic patterns.
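    To put that scale in perspective, a rough back-of-envelope sketch helps explain why no single accelerator can hold a frontier-scale model. The model size, bytes per parameter, overhead multiplier, and per-accelerator memory below are illustrative assumptions, not figures from the talk:

```python
# Back-of-envelope math (assumed numbers, not from the presentation):
# why a large model must be sharded across many accelerators.

params = 1e12               # assume a 1-trillion-parameter model
bytes_per_param = 2         # bf16 weights
optimizer_overhead = 3      # rough multiplier for gradients + optimizer state

training_state_bytes = params * bytes_per_param * (1 + optimizer_overhead)
hbm_per_accelerator = 80e9  # assume ~80 GB of HBM per GPU/TPU

min_accelerators = training_state_bytes / hbm_per_accelerator
print(f"Training state: {training_state_bytes / 1e12:.1f} TB")
print(f"Accelerators needed just to hold it: {min_accelerators:.0f}")
# Training state: 8.0 TB
# Accelerators needed just to hold it: 100
```

    In practice, data and pipeline parallelism push the node count far beyond that minimum, which is why the interconnect, rather than any single chip, becomes the limiting factor.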
    Google Cloud's AI/ML network infrastructure comprises two main networks: one for GPU-to-GPU communication and another for connecting to external storage and data sources. The GPU network is designed for the high bandwidth and low latency essential to training large models distributed across many nodes; it uses a combination of electrical and optical switching to create flexible topologies that can be reconfigured without physical changes. The second network connects the GPU clusters to storage, so that periodic snapshots of the training process are stored efficiently. This dual-network approach allows high-performance data processing and storage communication within the same data center region.
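    The division of labor between the two networks can be sketched as a training loop: gradient exchange rides the GPU-to-GPU fabric on every step, while checkpoints go out over the storage-facing network only periodically. The function names and checkpoint interval below are hypothetical placeholders rather than APIs mentioned in the talk:

```python
import time

CHECKPOINT_EVERY = 500  # assumed interval; real jobs tune this to storage bandwidth

def all_reduce_gradients(step: int) -> None:
    """Stand-in for gradient exchange over the GPU-to-GPU network
    (high bandwidth, low latency, happens every step)."""
    time.sleep(0.001)

def write_snapshot(step: int, path: str) -> None:
    """Stand-in for a checkpoint write over the storage-facing network
    (bulky but infrequent)."""
    with open(path, "w") as f:
        f.write(f"model state at step {step}\n")

for step in range(1, 2001):
    all_reduce_gradients(step)            # inter-accelerator traffic, every step
    if step % CHECKPOINT_EVERY == 0:      # storage traffic, only now and then
        write_snapshot(step, f"ckpt-{step}.txt")
```

    Keeping the two traffic classes on separate networks means a burst of checkpoint writes cannot interfere with the latency-sensitive gradient exchange.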
    In addition to the physical network infrastructure, Google Cloud leverages advanced load balancing techniques to optimize AI/ML workloads. By using custom metrics like queue depth, Google Cloud can significantly improve response times for AI models. This optimization is facilitated by tools such as the Open Request Cost Aggregation (ORCA) framework, which allows for more intelligent distribution of requests across model instances. These capabilities are integrated into Google Cloud's Vertex AI service, providing users with scalable, efficient AI/ML infrastructure that can automatically adjust to workload demands, ensuring high performance and reliability.
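    The queue-depth idea boils down to sending the next request to the least-busy model replica instead of rotating through them blindly. The sketch below contrasts the two policies; the replica names and queue depths are invented for illustration, and this shows only the selection logic such custom metrics enable, not the ORCA wire protocol itself:

```python
# Hypothetical per-replica load reports: the kind of custom metric
# (queue depth) that a backend could attach to its responses.
replica_queue_depth = {
    "model-replica-a": 12,
    "model-replica-b": 3,
    "model-replica-c": 27,
}

def pick_round_robin(replicas: list[str], counter: int) -> str:
    """Baseline policy: ignores load and rotates through replicas."""
    return replicas[counter % len(replicas)]

def pick_least_queue_depth(reports: dict[str, int]) -> str:
    """Custom-metric policy: route to the replica with the shortest queue."""
    return min(reports, key=reports.get)

replicas = list(replica_queue_depth)
print(pick_round_robin(replicas, counter=5))        # model-replica-c, despite its long queue
print(pick_least_queue_depth(replica_queue_depth))  # model-replica-b, the least loaded
```

    Queue depth tends to be a better signal than CPU utilization for accelerator-backed inference, since requests can pile up in front of a GPU or TPU while the host CPU still looks idle.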
    Presented by Victor Moreno, Product Manager. Recorded live on the Google Cloud campus in Sunnyvale, California on June 13, 2024. Watch the entire presentation at techfieldday.com/appearance/g... or visit TechFieldDay.com/event/cfd20/ or g.co/cloud/fieldday2024 for more information.
  • Science & Technology
