7 measurements that help minimize model risk for RAG

  • Published May 29, 2024
  • Get the guide to GAI and ML for the enterprise → ibm.biz/BdmYEA
    Deploying models built with AI or genAI can be risky without the right understanding of what a successful answer from your AI looks like. Learn about 7 key measurements for evaluating a successful RAG use case with Brianne Zavala.
    AI news moves fast. Sign up for a monthly newsletter for AI updates from IBM. → ibm.biz/BdmYEu

COMMENTS • 9

  • @johanneshalbach916 1 month ago +8

    Very nice informative video, enjoyed it a lot. Would have liked to see a bit more on how to calculate and implement some of these metrics though. For example how the hallucinations are quantified, since it seems to me it's a very difficult thing to measure.

  • @LaurelPapworth 1 month ago +2

    THANK YOU! I find your tutorials helpful and informative and … not full of fluff! xx ❤️❤️

  • @mmclean0 1 month ago

    Good video - I like IBM’s approach to day 2 model operations. Their automated monitoring around LLMs builds on their leading approach for monitoring/versioning of traditional ML models. Great stuff, Briana!

  • @amritbro 1 month ago

    Such an important concept for keeping any LLM up to date with information from the internet.

  • @user-ht9st4up8q 29 days ago

    I haven't played around with LLM + RAG, but when I think about it, it sounds like I can just use an LLM and pair it with my office's wiki, then I can chat to get my information!! purrrffecto!

  • @amritsubramanian8384 1 month ago

    Hey, then what is RAGA about?

  • @stunspot 1 month ago

    RAG is just so poorly done most places. Azure is OK. I know Microsoft's ex-CTO Sirosh, who built their cognitive search. They are the only ones I've found who don't suck horribly. And don't even talk to me about OpenAI's Knowledge Bases or things will get vitriolic and scatological very quickly.

  • @samfranian7857 29 days ago

    BLEU = (bilingual evaluation understudy)
    en.wikipedia.org/wiki/BLEU
    ROUGE = (Recall-Oriented Understudy for Gisting Evaluation)
    en.wikipedia.org/wiki/ROUGE_(metric)
    Many thanks for your wonderful video! 🙏🙏🙏
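
For readers wondering how overlap metrics like the BLEU and ROUGE mentioned above are computed, here is a minimal sketch of the core idea. It is a simplification and not how the video or any particular library computes them. Full BLEU also combines higher-order n-grams and a brevity penalty, and ROUGE has several variants. The function names and the whitespace tokenizer are assumptions made for illustration.

```python
from collections import Counter


def _tokens(text):
    # Assumption: lowercase whitespace tokenization; real tools tokenize more carefully.
    return text.lower().split()


def unigram_precision(candidate, reference):
    """BLEU-style clipped unigram precision: share of candidate words found in the reference."""
    cand = Counter(_tokens(candidate))
    ref = Counter(_tokens(reference))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Clip each word's count by how often it appears in the reference.
    overlap = sum(min(count, ref[tok]) for tok, count in cand.items())
    return overlap / total


def rouge1_recall(candidate, reference):
    """ROUGE-1-style recall: share of reference words recovered by the candidate."""
    cand = Counter(_tokens(candidate))
    ref = Counter(_tokens(reference))
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[tok]) for tok, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    answer = "the cat sat"
    ground_truth = "the cat sat on the mat"
    print(unigram_precision(answer, ground_truth))
    print(rouge1_recall(answer, ground_truth))
```

Precision rewards answers that stay close to the reference wording, while recall rewards answers that cover more of it, so the two are usually read together.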