Learn Live: Monitoring Azure OpenAI

  • Published 22 Jul 2024
  • Full series information: aka.ms/learnlive-fta3
    More info here: aka.ms/learnlive-fta3-Ep11
    Follow on Microsoft Learn:
    - Session documentation: aka.ms/learnlive-20240424FT
    In this session, we will explore monitoring of Azure OpenAI. Starting with an overview of the entire solution, we will then zoom in on the Azure OpenAI service through the lens of the Well-Architected Framework (WAF). We will discuss key concepts such as Tokens, Rate Limiting, Quotas, and PTUs (provisioned throughput units), as well as Metrics & Alerts for Azure overall, with a focus on reliability and SRE. We will also delve into SLAs and performance, including response times.

    Specifically for Azure OpenAI, we will cover concepts like Token Usage, Quotas, and Response Times. As we focus on monitoring for resiliency, performance, and response times, we will discuss Metrics, Dashboards, and Alarms. Finally, we will take a detailed dive into diagnostic settings and Log Analytics, including the use of Kusto queries (sketched below).
    By the end of this session, you will have a comprehensive understanding of how to monitor Azure OpenAI and be equipped with the knowledge and tools to put it into practice.
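    As a taste of the Kusto portion, here is a minimal Python sketch that runs a KQL query against a Log Analytics workspace receiving Azure OpenAI diagnostic logs. The workspace ID is a placeholder, and the AzureDiagnostics category and column names are assumptions that may differ if resource-specific tables are used.

    # pip install azure-identity azure-monitor-query
    from datetime import timedelta

    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient

    # Kusto (KQL): hourly request count and average latency for the Azure OpenAI resource.
    # Assumes the legacy AzureDiagnostics table; column names may differ in your workspace.
    KQL = """
    AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
    | summarize requests = count(), avg_duration_ms = avg(DurationMs)
        by OperationName, bin(TimeGenerated, 1h)
    | order by TimeGenerated asc
    """

    client = LogsQueryClient(DefaultAzureCredential())
    result = client.query_workspace(
        workspace_id="<log-analytics-workspace-id>",  # placeholder
        query=KQL,
        timespan=timedelta(days=1),
    )

    for table in result.tables:
        for row in table.rows:
            print(row)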
    ---------------------
    Learning objectives
    - Gain a comprehensive overview of Azure OpenAI monitoring within the Well-Architected Framework.
    - Understand key operational metrics like Tokens, Rate Limiting, Quotas, and PTU relevant to Azure OpenAI.
    - Learn about setting up Metrics, Alerts, and SLAs for effective monitoring and reliability (a short metrics-query sketch follows this list).
    - Master the use of Dashboards and Alarms for monitoring resiliency, performance, and response times in Azure OpenAI.
    - Delve into advanced diagnostic settings and log analytics with Kusto for in-depth monitoring insights.
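    To make the Metrics and Alerts objective concrete, here is a minimal Python sketch that pulls platform metrics for an Azure OpenAI resource with the Azure Monitor query SDK. The resource ID is a placeholder, and the metric names shown are assumptions that may change over time.

    # pip install azure-identity azure-monitor-query
    from datetime import timedelta

    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import MetricAggregationType, MetricsQueryClient

    client = MetricsQueryClient(DefaultAzureCredential())

    # Placeholder resource ID of the Azure OpenAI (Cognitive Services) account.
    resource_id = (
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.CognitiveServices/accounts/<aoai-account>"
    )

    # Hourly totals over the last 24 hours; metric names are assumptions.
    result = client.query_resource(
        resource_id,
        metric_names=["ProcessedPromptTokens", "GeneratedTokens", "AzureOpenAIRequests"],
        timespan=timedelta(hours=24),
        granularity=timedelta(hours=1),
        aggregations=[MetricAggregationType.TOTAL],
    )

    for metric in result.metrics:
        for series in metric.timeseries:
            for point in series.data:
                print(metric.name, point.timestamp, point.total)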
    ---------------------
    Chapters
    --------
    00:00 - Introduction
    01:28 - Learning objectives
    01:59 - Agenda
    03:17 - OpenAI Terms
    09:36 - Tokens
    11:07 - OpenAI API Quotas
    12:39 - Rate Limiting
    13:49 - Azure API Management (APIM)
    15:21 - APIM Policies
    18:09 - APIM Backends
    18:58 - APIM Load Balancer & Circuit Breaker
    19:44 - Smart Load Balancing for OpenAI Endpoints and APIM
    20:45 - Monitoring Azure OpenAI
    30:41 - Demo
    58:42 - Langfuse on Azure
    1:00:04 - Telemetry in Semantic Kernel SDK
    1:02:21 - Model monitoring for generative AI applications
    1:03:04 - Monitoring published APIs using APIM
    1:03:18 - Importing Azure OpenAI APIs into APIM
    1:04:21 - Monitoring AI Search
    ---------------------
    Presenters
    Victor Santana
    Azure Customer Engineer
    Microsoft
    - LinkedIn: / victorwelascosantana
    Chris Ayers
    Senior Customer Engineer
    Microsoft
    - LinkedIn: / chris-l-ayers
    - Twitter: / chris_l_ayers
    Moderators
    Marc Mercier
    Senior Customer Engineer
    Microsoft
    - LinkedIn: / marc-mercier
  • Science & Technology
