Scaling Up “Vibe Checks” for LLMs - Shreya Shankar | Stanford MLSys #97

Stanford MLSys Seminars

Views: 2,736

Published Sep 12, 2024
Episode 97 of the Stanford MLSys Seminar Series!
Scaling Up “Vibe Checks” for LLMs
Speaker: Shreya Shankar
Bio:
Shreya Shankar is a PhD student in computer science at UC Berkeley, advised by Dr. Aditya Parameswaran. Her research focuses on addressing data challenges in production machine learning pipelines through a human-centered approach. Her work has appeared in top database and human-computer interaction venues like VLDB, SIGMOD, CIDR, and CSCW. She is a recipient of the NDSEG Fellowship and co-organizes the DEEM workshop at SIGMOD, which focuses on data management in end-to-end machine learning.
Abstract:
Large language models (LLMs) are increasingly being used to write custom pipelines that repeatedly process or generate data of some sort. Despite their usefulness, LLM pipelines often produce errors, typically identified through manual “vibe checks” by developers. This talk explores automating this process using evaluation assistants, presenting a method for automatically generating assertions and an interface to help developers iterate on assertion sets. We share takeaways from a deployment with LangChain, where we auto-generated assertions for 2000+ real-world LLM pipelines. Finally, we discuss insights from a qualitative study of how 9 engineers use evaluation assistants: we highlight the subjective nature of "good" assertions and how they adapt over time with changes in prompts, data, LLMs, and pipeline components.
--
Stanford MLSys Seminar hosts: Avanika Narayan, Benjamin Spector, Michael Zhang
Twitter:
@avanika15
@bfspector
@mzhangio
--
Check out our website for the schedule: mlsys.stanford.edu
Join our mailing list to get weekly updates: groups.google....
#machinelearning #ai #artificialintelligence #systems #mlsys #computerscience #stanford

COMMENTS • 2

@kevon217 4 months ago
Thanks for the great walkthrough. Looking forward to reading these papers.
@MatijaGrcic 3 months ago
Great talk, thanks for sharing.
