How to evaluate upgrading your app to GPT-4o | LangSmith Evaluations - Part 18

  • Published 12 May 2024
  • OpenAI recently released GPT-4o, which reports significant improvements in latency and cost. Many users may wonder how to evaluate the effects of upgrading their app to GPT-4o. For example, what latency benefit can users expect, and are there any material differences in app performance after switching to the new GPT-4o model?
    Decisions like this are often limited by a lack of quality evaluations. Here, we show the process of evaluating GPT-4o on an example RAG app with a 20-question eval set related to LangChain documentation. We show how regression testing in the LangSmith UI allows you to quickly pinpoint examples where GPT-4o shows improvements or regressions over your current app.
    GPT-4o docs:
    openai.com/index/hello-gpt-4o/
    LangSmith regression testing UI docs:
    docs.smith.langchain.com/old/...
    RAG evaluation docs:
    docs.smith.langchain.com/old/...
    Public dataset referenced in the video:
    smith.langchain.com/public/ea...
    Cookbook referenced in the video:
    github.com/langchain-ai/langs...
  • Science & Technology
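
The regression comparison described above can be illustrated with a minimal sketch. This is not LangSmith's implementation or API: `RunResult`, `compare_runs`, and the score and latency fields are hypothetical names chosen for illustration. It assumes each model has already been run over the same eval set and each answer has been graded with a numeric score.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RunResult:
    """One graded answer for one eval example (hypothetical structure)."""
    example_id: str
    score: float      # e.g. 1.0 if a grader judged the answer correct, else 0.0
    latency_s: float  # wall-clock time to produce the answer, in seconds


def compare_runs(baseline, candidate):
    """Classify shared examples as improved, regressed, or unchanged.

    `baseline` holds results from the current app, `candidate` holds results
    from the app after the model swap. Only examples present in both runs
    are compared.
    """
    base_by_id = {r.example_id: r for r in baseline}
    improved, regressed, unchanged = [], [], []
    base_latencies, cand_latencies = [], []

    for cand in candidate:
        base = base_by_id.get(cand.example_id)
        if base is None:
            continue  # example missing from the baseline run; skip it
        base_latencies.append(base.latency_s)
        cand_latencies.append(cand.latency_s)
        if cand.score > base.score:
            improved.append(cand.example_id)
        elif cand.score < base.score:
            regressed.append(cand.example_id)
        else:
            unchanged.append(cand.example_id)

    return {
        "improved": improved,
        "regressed": regressed,
        "unchanged": unchanged,
        # median() raises on empty input, so report None when nothing overlaps
        "baseline_median_latency_s": median(base_latencies) if base_latencies else None,
        "candidate_median_latency_s": median(cand_latencies) if cand_latencies else None,
    }
```

Calling `compare_runs(baseline_results, candidate_results)` returns the example IDs to inspect by hand, along with a median latency for each run over the shared examples.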

COMMENTS • 13

  • @ibbobud 26 days ago +1

    Quick and to the point! Love the eval!

  • @MaybeTogether 26 days ago

    Thank you. I instinctively started googling, because answer accuracy / answer quality is more significant to me.

  • @learnbydoing6010 27 days ago +1

    So fast. 🎉thank you.

  • @millingabani 26 days ago +1

    You guys are awesome!

  • @ClarkNewlove 27 days ago

    Nice. Thanks for sharing!

  • @Nairb932 26 days ago

    Keep up the great work

  • @octaviusp 27 days ago

    ahhaa, very fast reaction! great job

  • @BenitoMartin-dk7lj 26 days ago

    Amazing!

  • @_arkadij 26 days ago

    working fast cool

  • @calvin_banks_music 26 days ago

    Did you make this graphic at 1:48 programmatically, or did you import it as an image from a different tool?

  • @luisguillermopardo7792 20 days ago

    I have a script that uses pandas agents and tools to answer questions. I updated to gpt-4o, and the model has many issues using the tools compared with gpt-4. Do you know if we need to change any extra settings?

  • @AI_by_AI_007 26 days ago

    Google's team does AlphaFold and changes the world, and Sam gives us NSFW tools…