This is gold, most of people just show you how to build toy demo, but not many actually get into details of how to get into production; Thank you Jason!
Couldn't agree more. This is gold.
This is great. Loved the use of firecrawl (as a scrape tool) to get the website's data. Feel like it always helps improve the model output quality. Cheers!
Excellent video that goes well beyond a demo. Thank you very much for this guidance.
Amazing work as always Jason!
Been looking for more detail on evals for LLMs and have been scratching around for a while. Thanks for this.
I recently created a whole testing system for our LLM chatbots and we did exactly this:
LLM as evaluator, plus code.
We created it as a series of unit tests with LLM-generated cases.
Since our results were mostly conversational, we made tests pass/fail according to a scoring system.
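The pass/fail-by-scoring approach described above can be sketched roughly like this. This is a minimal illustration, not the commenter's actual system: the judge prompt, the 7/10 threshold, and the injected `ask_judge` callable (a stand-in for a real LLM API call) are all assumptions.

```python
# Sketch: pass/fail unit tests for conversational output, driven by an
# LLM judge's numeric score. The real LLM call is injected via `ask_judge`
# so the scoring logic itself stays unit-testable.

JUDGE_PROMPT = (
    "Rate the assistant reply from 1 to 10 for helpfulness and accuracy.\n"
    "Question: {question}\nReply: {reply}\n"
    "Answer with only the number."
)

def score_reply(question: str, reply: str, ask_judge) -> int:
    """Ask the judge LLM for a 1-10 score and parse the leading integer."""
    raw = ask_judge(JUDGE_PROMPT.format(question=question, reply=reply))
    return int(raw.strip().split()[0])

def passes(question: str, reply: str, ask_judge, threshold: int = 7) -> bool:
    """A conversational test case passes if the judge score meets the threshold."""
    return score_reply(question, reply, ask_judge) >= threshold

# Example with a fake judge standing in for the real model:
fake_judge = lambda prompt: "8"
assert passes("What is 2+2?", "4", fake_judge)
```

In practice each LLM-generated test case would supply its own `question`, an expected behavior rubric, and possibly a per-case threshold.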
goddamn Jason your videos just blow my mind each time. Thanks for such a thorough explanation and example.
I've used promptfoo for some of my tests with a local LLM to test the AI workflow. It lets you write assertions like you would for software.
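For readers who haven't seen promptfoo, its tests are declared in a YAML config. The sketch below is illustrative only; the provider id, prompt, and rubric text are assumptions, so check the promptfoo docs for your version:

```yaml
# Hypothetical promptfoo config: software-style assertions on LLM output.
prompts:
  - "Answer the customer question: {{question}}"
providers:
  - ollama:llama3   # a local model, as in the comment above
tests:
  - vars:
      question: "How do I reset my password?"
    assert:
      - type: contains        # deterministic, code-based check
        value: "reset"
      - type: llm-rubric      # LLM-as-evaluator check
        value: "The answer is polite and actionable"
```

Running `promptfoo eval` then reports each assertion as pass/fail, much like a unit test suite.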
Awesome! Keep up the great work!
Finally you're back 🎉
Jason, can we get another video about ComfyUI?
Fine-tune Llama 3 (8-bit) and you'll get exactly the behavior you want; it's what I do.
This is so good, thanks man!
lesgooo!! ❤🔥❤🔥❤🔥
I found Langfuse's metric monitoring a little bit better.
Sick! What are the best-practice metrics for evaluating agents?
Great Video
Great stuff. As someone new to this, it's very interesting. Can this be built by a novice?
Who's never spent 4 hours to save 10 minutes? That's our hobby: spending time to save time.
If 25 people or more use it successfully, then you've literally given humanity more time to live and be free.
fireeee content!
I love how my AI girl insults the competition with flame balls, then tells me she loves me. ❤🎉😊
Why not use Gemini as the LLM? It's free.
Let me share my experience with Google's AI models: they don't understand humans and they hallucinate way too much.
Practically, in my case 75% of the time what I get back is a totally useless result. You can't use it for anything. To be considered for evaluation... you must be joking.
I don't see the value of "agents". All of this is easily done with basic function calling. I think I'm going to need to see some more creative use cases before I jump on board; I just don't get it yet.
Maybe we can discuss this. I'm trying to jump in, but not until I find a decent idea to apply.
When your assistant has a lot of functions, it starts hallucinating. Have you ever encountered this?
Good content, but so hard to listen to his English. The monotonous pitch and sped-up delivery didn't seem to help either.