How to evaluate upgrading your app to GPT-4o | LangSmith Evaluations - Part 18
- Published May 12, 2024
- OpenAI recently released GPT-4o, which reports significant improvements in latency and cost. Many users may wonder how to evaluate the effects of upgrading their app to GPT-4o. For example: what latency benefit can users expect to gain, and are there any material differences in app performance when switching to the new GPT-4o model?
Decisions like this are often limited by a lack of quality evaluations! Here, we show the process of evaluating GPT-4o on an example RAG app with a 20-question eval set related to the LangChain documentation. We show how regression testing in the LangSmith UI allows you to quickly pinpoint examples where GPT-4o shows improvements or regressions relative to your current app.
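The workflow described above can be sketched in plain Python. This is a minimal, self-contained illustration of a pairwise regression comparison, not the LangSmith SDK itself: `run_current`, `run_gpt4o`, and `grade` are hypothetical stand-ins for your existing RAG chain, the upgraded GPT-4o chain, and an LLM-as-judge evaluator, and the two-example eval set is invented for demonstration.

```python
import time

# Toy eval set; in the video this is a 20-question dataset hosted in LangSmith.
eval_set = [
    {"question": "What is LCEL?", "reference": "LangChain Expression Language"},
    {"question": "What does a retriever return?", "reference": "documents"},
]

def run_current(question):  # stand-in for the current (GPT-4-turbo) RAG chain
    return "LangChain Expression Language" if "LCEL" in question else "documents"

def run_gpt4o(question):    # stand-in for the upgraded GPT-4o RAG chain
    return "LangChain Expression Language" if "LCEL" in question else "chunks"

def grade(answer, reference):  # toy correctness check; use an LLM judge in practice
    return reference.lower() in answer.lower()

def compare(examples):
    """Run both chains on each example and flag per-example improvements/regressions,
    which is what the LangSmith regression-testing view surfaces visually."""
    rows = []
    for ex in examples:
        t0 = time.perf_counter(); ans_cur = run_current(ex["question"]); lat_cur = time.perf_counter() - t0
        t0 = time.perf_counter(); ans_new = run_gpt4o(ex["question"]); lat_new = time.perf_counter() - t0
        ok_cur, ok_new = grade(ans_cur, ex["reference"]), grade(ans_new, ex["reference"])
        rows.append({
            "question": ex["question"],
            "latency_current_s": lat_cur,
            "latency_gpt4o_s": lat_new,
            "regression": ok_cur and not ok_new,   # got worse after the upgrade
            "improvement": ok_new and not ok_cur,  # got better after the upgrade
        })
    return rows

results = compare(eval_set)
regressions = [r["question"] for r in results if r["regression"]]
print(regressions)
```

With the stubbed chains above, the second question is flagged as a regression, which is exactly the kind of example the LangSmith UI highlights for manual inspection before committing to the model upgrade.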
GPT-4o docs:
openai.com/index/hello-gpt-4o/
LangSmith regression testing UI docs:
docs.smith.langchain.com/old/...
RAG evaluation docs:
docs.smith.langchain.com/old/...
Public dataset referenced in the video:
smith.langchain.com/public/ea...
Cookbook referenced in the video:
github.com/langchain-ai/langs...
Quick and to the point! Love the eval!
Thank you. I instinctively started googling, because answer accuracy / answer quality is more significant to me.
So fast. 🎉thank you.
You guys are awesome!
Nice. Thanks for sharing!
Keep up the great work
ahhaa, very fast reaction! great job
Amazing!
working fast cool
Did you make this graphic at 1:48 programmatically, or did you import it as an image from a different tool?
I have a script that uses pandas agents and tools to solve questions. I updated to gpt-4o and the model has many issues using the tools compared with gpt-4. Do you know if we have to do any extra settings or something?
For eda?
Google's team does AlphaFold and changes the world, and Sam gives us NSFW tools….