Serving ML Models at a High Scale with Low Latency // Manoj Agarwal // MLOps Meetup #48

  • Published 14 Aug 2024
  • MLOps Community Meetup #48! Last Wednesday, on his birthday, we talked to Manoj Agarwal, Software Architect at Salesforce.
    // Abstract:
    Serving machine learning models is a scalability challenge at many companies. Most applications need only a small number of models (often fewer than 100) to serve predictions. Cloud platforms that support model serving, on the other hand, can host hundreds of thousands of models, but they provision separate hardware for different customers. Salesforce faces a challenge that very few companies deal with: to be cost-effective, it needs to run hundreds of thousands of models on infrastructure shared across multiple tenants.
    // Takeaways:
    This talk explains how Salesforce hosts hundreds of thousands of models on a multi-tenant infrastructure while supporting low-latency predictions.
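    A minimal sketch of one way this shape of problem is often handled (assumed details, not Salesforce's actual design): when far more models exist than fit in one host's memory, each serving host keeps an LRU cache of loaded models keyed by tenant and model, and loads cache misses from a shared model store. The names ModelCache and load_fn below are hypothetical.

        # Hypothetical sketch: per-host LRU cache over (tenant_id, model_id).
        from collections import OrderedDict

        class ModelCache:
            def __init__(self, capacity, load_fn):
                self.capacity = capacity      # max models resident in memory
                self.load_fn = load_fn        # hypothetical loader: fetch a model from the store
                self.models = OrderedDict()   # (tenant_id, model_id) -> model, in LRU order

            def get(self, tenant_id, model_id):
                key = (tenant_id, model_id)
                if key in self.models:
                    self.models.move_to_end(key)          # hit: mark as most recently used
                else:
                    if len(self.models) >= self.capacity:
                        self.models.popitem(last=False)   # evict the least recently used model
                    self.models[key] = self.load_fn(key)  # miss: cold load from the model store
                return self.models[key]

    On a cache hit the prediction path never touches the model store, which is what keeps latency low for frequently used models while one shared fleet serves hundreds of thousands of them.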
    // Bio:
    Manoj Agarwal is a Software Architect on the Einstein Platform team at Salesforce. Salesforce Einstein was released back in 2016, integrated with all the major Salesforce clouds. Fast forward to today, and Einstein delivers 80+ billion predictions per day across the Sales, Service, Marketing & Commerce Clouds.
    // Relevant Links
    engineering.sa...
    engineering.sa...
    ---------- ✌️Connect With Us ✌️------------
    Join our Slack community: go.mlops.commu...
    Follow us on Twitter: @mlopscommunity
    Sign up for the next meetup: go.mlops.commu...
    Catch all episodes, Feature Store, Machine Learning Monitoring and Blogs: mlops.community/
    Connect with Demetrios on LinkedIn: / dpbrinkm
    Connect with Manoj on LinkedIn: / agarwalmk
    Timestamps:
    [00:00] Happy birthday Manoj!
    [00:41] Salesforce blog post about Einstein and ML Infrastructure
    [02:55] Intro to Serving Large Number of Models with Low Latency
    [03:34] Manoj's background
    [04:22] Machine Learning Engineering: 99% engineering + 1% machine learning - Alexey Grigorev on Twitter
    [04:37] Salesforce Einstein
    [06:42] Machine Learning: Big Picture
    [07:05] Feature Engineering
    [07:30] Model Training
    [08:38] Machine Learning: Big Picture
    [08:53] Model Serving Requirements
    [11:09] Model Serving: Able to Predict
    [12:52] Model Serving Requirements
    [13:01] Do you standardize how models are packaged in order to be served, and if so, what standards does Salesforce require and enforce for model packaging?
    [14:29] Support Multiple Frameworks
    [16:16] Is it easy to just throw a software library in there?
    [19:20] Support Multiple Frameworks/Versions
    [22:50] What does gRPC stand for? (gRPC Remote Procedure Calls)
    [24:36] Support Multiple Model Versions
    [27:06] Along with that metadata, can you break down how that goes?
    [28:27] Low Latency
    [30:29] Scaling for RPS (Requests Per Second): Replication
    [31:15] Scaling Number of Models: Sharding
    [32:30] Model Sharding with Replication
    [33:58] What would you do to speed up transformation code that runs before scoring?
    [35:55] Model Serving Scaling
    [37:06] Noisy Neighbor: Shuffle Sharding (sketched after the timestamps)
    [39:29] If all the Salesforce models were categorized into different model types based on what they provide, what would some of the big categories be, and which is the biggest?
    [44:18] Avoid Hotspots: Per Model Scaling
    [46:27] Retraining of the model: does your team handle that, or is it distributed so that your team deals mainly with this kind of engineering while another team handles the machine learning concepts?
    [47:26] Do you track the quality of the data going into the model and whether it changes over time, or is that handled on the other side of the machine learning pipeline?
    [49:15] Do you think this will be open-sourced, or will it remain something that makes us all jealous?
    [50:13] How do you ensure that different models, created by different teams of data scientists, expose the same data so they can be analyzed?
    [52:08] Are you using Kubernetes, or is it another orchestration engine?
    [53:03] How do you ensure that different models expose the same information?
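    The shuffle sharding mentioned at [37:06] can be sketched in a few lines (an illustrative sketch with assumed details such as the shard size and host names, not the exact scheme from the talk): each model is deterministically mapped to a small, stable subset of the serving fleet, so one model's traffic spike can only overload its own few hosts, and since two models rarely share their whole shard, most of the fleet keeps full capacity. The shard also serves as the model's replica set, tying together the replication and sharding discussed at [30:29]-[32:30].

        # Hypothetical shuffle-sharding sketch: a stable pseudo-random host subset per model.
        import hashlib
        import random

        def shuffle_shard(model_id, hosts, shard_size=4):
            # Seed a PRNG from the model id so the assignment is stable across calls.
            seed = int(hashlib.sha256(model_id.encode()).hexdigest(), 16)
            rng = random.Random(seed)
            return rng.sample(hosts, shard_size)  # this model's replica set

        hosts = [f"serving-host-{i}" for i in range(100)]
        print(shuffle_shard("tenant42/churn-model", hosts))  # 4 stable hosts for this model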
  • Science & Technology
