Great video, but wouldn't comparing with Claude 3.5 Sonnet be a better comparison, since it's their latest model with the base model fee? Would like to see a similar test with Sonnet, plus a more general daily-usage problem test from different fields.
Very good analysis.
Firstly, I don't have much in-depth knowledge of deep learning things...
Can you add xAI's Grok to your comparisons? I'm also an LLM enthusiast, and I'm impressed with its reasoning abilities; it's more consistent at generating results compared to Gemini/Claude/GPT, and its code generation/reasoning is way more powerful. Plus, it's free to use now.
What I'm going to tell you might create some controversy for me 😂, but from my POV, here's my list:
1) Grok / Claude
2) Copilot / GPT
3) Gemini
I'll make Grok my go-to LLM tool. Note: I'm not an Elon fanboy 🙂
You're so underrated; you need more recognition.
Thanks, means a lot. There's nothing wrong with having a list of your own. That's why I say don't take my results as the ultimate fact.
If the LMSYS guys can keep GPT-4 above o1, then I think our lists are way better than that.
Can you *please* publish that *LLM Test* so we can see it and use it to test models ourselves?
Yes bro, working on it. I'll have to make some changes to make it more dynamic, but I'll surely do it.
@YJxAI thank you so much!
How can you get consistency when you're using a temperature of 1 and a top-p value of 0.95? If you want consistency, you must set them to low values.
I wanted to test it out on the default values.
Change the temperature from 1 to 2 in Gemini Flash, and then see the accuracy.
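For anyone curious how much these sampling settings affect consistency, here is a minimal sketch assuming the `google-generativeai` Python SDK; the model name, prompt, and parameter values are placeholders for illustration, not the exact setup used in the video:

```python
# Minimal sketch: comparing low vs. default-like sampling settings with the
# google-generativeai SDK (assumed). Model name and prompt are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder API key
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name

prompt = "What is 17 * 24? Answer with just the number."

# Low temperature + low top_p -> more deterministic, repeatable answers.
deterministic = {"temperature": 0.1, "top_p": 0.1}
# Roughly the defaults discussed above (temperature=1.0, top_p=0.95) -> more varied outputs.
default_like = {"temperature": 1.0, "top_p": 0.95}

for label, config in [("low", deterministic), ("default", default_like)]:
    outputs = [
        model.generate_content(prompt, generation_config=config).text.strip()
        for _ in range(3)  # a few repeats to eyeball consistency
    ]
    print(label, outputs)
```

Running the same prompt a few times under each config usually makes the difference obvious: the low-temperature run tends to repeat the same answer, while the default-like settings vary more.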