R1’s answer at @8:46 can be considered correct, since the question has more than one right answer. Here’s the alternative answer: the color of the boundary of the rectangle with the largest length. The problem with this benchmark is that the questions don’t always have unique solutions. Updated result: a tie between o3-mini and DeepSeek.
I'm excited to see Gemini 2.0 Pro Thinking.
yes me too..
Btw, if Gemini 2.0 Pro is that good without thinking, then the thinking version will be good.
Thanks 🎉❤
❤️
Gemini 2.0 Pro ❤
Gemini has thinking now
Yeah, I was wondering why you didn't use that. It actually has two thinking models at the moment.
There's only Flash Thinking, I guess.
okay ❤
Who will win?
The machines of course!
I think you should scold o3-mini-high for a better response.
🤣 yeah.
I've been on the fence about Gemini. It always gives me different results than any other model. I've sorta just dismissed it entirely, but now that they have deep think, I may have to revisit Gemini as a potential tool.
I think you're talking about Gemini, right, not AI Studio?
@YJxAI Yes, I was referring to the Gemini Chat. It has been very hit and miss, more miss than hit :)
@ yeah it is becoming compelling lately.
What program are you using to generate those prompts?
I made a common prompt, something like "understand the question below", and each time a new question comes up it is placed in the placeholder. Then I can copy it.
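For anyone who wants to try the same placeholder idea, here is a minimal sketch in Python. It is only an illustration under assumptions: the template wording, `PROMPT_TEMPLATE`, and `build_prompt` are hypothetical names, not the actual setup used in the video.

```python
from string import Template

# Hypothetical common prompt; the wording is an assumption, not the
# exact prompt used in the video.
PROMPT_TEMPLATE = Template(
    "Understand the question below and answer it step by step.\n\n"
    "Question: $question"
)


def build_prompt(question: str) -> str:
    """Place a new question into the placeholder of the common prompt."""
    # substitute() only fills $question; text inside the question itself
    # (including any "$" characters) is inserted unchanged.
    return PROMPT_TEMPLATE.substitute(question=question.strip())


if __name__ == "__main__":
    # Each new question reuses the same template; the result can be copied
    # into any model's chat box.
    print(build_prompt("Which rectangle has the longest side?"))
```

Keeping the shared wording in one template means every model gets the same framing, so only the question changes between runs.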
Isn't there a thinking model from Google too in AI Studio?
There is, I did a video on it. Please do check it out :)
Do a Manim coding challenge.