Regression Testing | LangSmith Evaluations - Part 15

  • Published 9 Jun 2024
  • Evaluations can accelerate LLM app development, but it can be challenging to get started. We've kicked off a new video series focused on evaluations in LangSmith.
    With the rapid pace of AI, developers often face a paradox of choice: how to choose the right prompt, and how to trade off LLM quality against cost. Evaluations can accelerate development with a structured process for making these decisions. But we've heard that it is challenging to get started, so we are launching a series of short videos explaining how to perform evaluations using LangSmith.
    This video focuses on Regression Testing, which lets a user highlight particular examples in an eval set that show improvement or regression across a set of experiments (a minimal code sketch of this workflow follows the links below).
    Blog: blog.langchain.dev/regression...
    LangSmith: smith.langchain.com/
    Documentation: docs.smith.langchain.com/eval...
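
    A minimal sketch of how running multiple experiments for regression comparison might look with the LangSmith Python SDK. The dataset name, target functions, models, and the exact-match evaluator are illustrative assumptions, not taken from the video; the side-by-side regression view itself lives in the LangSmith UI.

```python
# Minimal sketch (assumptions: a LangSmith dataset named "qa-eval-set" exists,
# and OPENAI_API_KEY / LANGCHAIN_API_KEY are set in the environment).
from langsmith.evaluation import evaluate
from langchain_openai import ChatOpenAI


def correctness(run, example) -> dict:
    # Toy evaluator: exact match against the reference answer.
    predicted = (run.outputs or {}).get("output", "")
    expected = (example.outputs or {}).get("answer", "")
    return {"key": "correctness", "score": int(predicted.strip() == expected.strip())}


def make_target(model_name: str):
    # Build a simple target function that answers with the given model.
    llm = ChatOpenAI(model=model_name)

    def target(inputs: dict) -> dict:
        return {"output": llm.invoke(inputs["question"]).content}

    return target


# Run one experiment per model over the same dataset; the LangSmith UI can
# then compare the experiments and highlight per-example improvements or
# regressions.
for model in ["gpt-3.5-turbo", "gpt-4o"]:
    evaluate(
        make_target(model),
        data="qa-eval-set",
        evaluators=[correctness],
        experiment_prefix=f"regression-test-{model}",
    )
```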

COMMENTS • 3

  • @MattJonesYT · 1 month ago · +3

    This is extremely useful, especially for agent systems where the rules have been written in a way that is over-fit to a particular LLM. I find crewai often has that problem: it works well with the LLM it was written for but produces nonsense with a different LLM.

  • @MattJonesYT · 1 month ago

    An extension of this idea would be running regressions on the prompt system of an agent as a whole to see how well it adapts to other LLMs. Make a matrix of how its prompts perform on the original LLM vs. new, out-of-sample LLMs (sketched below). If it immediately breaks on new LLMs, it is probably over-fit, and you can have an AI rewrite those prompts to be simpler, making the system more robust across different LLMs.
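
    A hedged sketch of the prompt-vs-LLM robustness matrix the comment describes: every combination of prompt and model is scored, and large drops on out-of-sample models flag over-fit prompts. The scoring function here is a hypothetical placeholder for whatever evaluation you actually run (for example, a LangSmith experiment averaged over an eval set).

```python
# Hypothetical sketch of a prompt-vs-LLM robustness matrix.
# `score_prompt_on_model` is a placeholder for an eval you supply that
# returns an average score in [0, 1] for a (prompt, model) pair.
from typing import Callable


def robustness_matrix(
    prompts: dict[str, str],
    models: list[str],
    score_prompt_on_model: Callable[[str, str], float],
) -> dict[str, dict[str, float]]:
    """Return {prompt_name: {model_name: avg_score}} for every combination."""
    return {
        name: {model: score_prompt_on_model(template, model) for model in models}
        for name, template in prompts.items()
    }


# A prompt that scores well only on the model it was tuned for is likely
# over-fit; low scores on out-of-sample models flag it for rewriting.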

  • @UtopIA-IAparaDevs · 1 month ago

    Thank you