Accelerating AI inference workloads

  • Published Jun 5, 2024
  • Deploying AI models at scale demands high-performance inference capabilities. Google Cloud offers a range of Cloud Tensor Processing Units (TPUs) and NVIDIA-powered graphics processing unit (GPU) VMs. Join Debi Cabrera as she sits down with Alex Spiridonov, Group Product Manager, to discuss key considerations for choosing TPUs and GPUs for your inference needs. Watch along to learn about the cost implications, how to deploy and optimize your inference pipeline on Google Cloud, and more! (A minimal getting-started sketch follows the video details below.)
    Chapters:
    0:00 - Meet Alex
    2:52 - Balancing cost and efficiency
    5:51 - TPU vs GPU for AI models
    8:21 - Getting started with Google Cloud TPUs and GPUs
    10:05 - Common challenges in inference optimization
    12:10 - Available resources for AI inference workloads
    13:13 - Wrap up
    Resources:
    Watch the full session here → goo.gle/3JC32qx
    Check out Alex’s blog post → goo.gle/3wa2DZb
    JetStream GitHub → goo.gle/49SoSRj
    MaxDiffusion GitHub → goo.gle/4aQ1g11
    MaxText GitHub → goo.gle/49SoYZb
    Watch more Cloud Next 2024 → goo.gle/Next-24
    Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech
    #GoogleCloudNext #GoogleGemini
    Event: Google Cloud Next 2024
    Speakers: Debi Cabrera, Alex Spiridonov
    Products Mentioned: Cloud TPUs, Cloud GPUs
  • Science & Technology
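
As a minimal getting-started sketch: the JetStream, MaxText, and MaxDiffusion repos linked above are JAX-based, so a first sanity check on a Cloud TPU or GPU VM is confirming that JAX sees the accelerators and can run a compiled op on them. The matrix shapes, dtype, and iteration count below are illustrative assumptions, not values from the session.

    # Minimal sketch: list the accelerators JAX can see on this VM and
    # time a jit-compiled matmul on them. Shapes/iterations are illustrative.
    import time

    import jax
    import jax.numpy as jnp

    # Whatever this VM exposes: TPU cores, GPUs, or CPU fallback.
    devices = jax.devices()
    print(f"Backend: {jax.default_backend()}, devices: {devices}")

    @jax.jit
    def matmul(a, b):
        return jnp.dot(a, b)

    key = jax.random.PRNGKey(0)
    a = jax.random.normal(key, (4096, 4096), dtype=jnp.bfloat16)
    b = jax.random.normal(key, (4096, 4096), dtype=jnp.bfloat16)

    # First call compiles; block_until_ready() accounts for async dispatch.
    matmul(a, b).block_until_ready()

    start = time.perf_counter()
    for _ in range(10):
        out = matmul(a, b)
    out.block_until_ready()
    elapsed = time.perf_counter() - start
    print(f"10 matmuls of {a.shape} took {elapsed:.3f}s on {devices[0].platform}")

The same check is a reasonable first step before pulling in one of the serving stacks above, since it separates "the accelerator isn't attached or visible" from "my inference pipeline is misconfigured".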

COMMENTS • 2

  • @googlecloudtech  A month ago

    Check out more interviews and demos from Cloud Next 2024 here → goo.gle/Next-24.

  • @peterblanch2830  A month ago  +1

    Too many acronyms as usual... 😕