I believe that the launch of GPT-5 will take place next week, but it would be amazing if it happened this week. That way, in addition to celebrating the one-year anniversary of GPT-4, we would have the chance to constantly talk about GPT-5. I hope that GPT-5 will exhibit reasoning far superior to all currently available models. With this, OpenAI would quickly silence critics and envious voices.
I'm impressed with it. I asked Claude how to fix my car, and its response matched GPT-4's; both were right. I also uploaded a picture of a mole and asked if it was skin cancer. Claude said that it didn't display markings of cancer, but that I should ask my doctor. GPT-4 straight up told me nothing, said the request violated its TOS, and told me to only go to the doctor. I also went to the doctor, and he said it wasn't cancer. I'll probably switch.
Claude is not available in Canada.
Wonder how these geographic decisions are made. It's available in my two homes: Thailand AND Sri Lanka....hmmmm.
Can't wait for Perplexity to add Claude 3 to their group of models that can be used in Copilot mode. It's gonna be epic.
It's in Poe as of now. File uploads are a little odd, though; it's not taking pictures. Wah wahhhh.
@@totempow As far as I know, it accepts .docx (Microsoft Word document) files.
What if I told you they already did on their website?
@@JaddOnTheTrackakaJOTT I'd be a little happier.
@@JaddOnTheTrackakaJOTT Just saw it, but it's limited to 5 queries per day.
Been playing with it at work today... it's exceptional. Surprised and impressed. Wish it had web browsing, though.
Love your benchmark and comparison tests: simple, not too long, and effective. I've seen a bunch of AI videos released around the same time as the Claude model, but as soon as I saw yours I had to click on it first. Reckon you could do more coding examples?
I am surprised how good it is :)
Referring to 8:00: this might be very subjective, but historically the postscript dates back to when correspondence was handwritten or typed, which made it cumbersome to incorporate afterthoughts or additional information into the body of the letter without rewriting the entire message. I get it, today it's used to stress or highlight a point, but *better and more effective writing* would negate that. Just a thought...
If you can upload a report with all your test runs (Claude vs GPT-4) that would be great. :)
Good job, man.
Who is actually testing each version of AI models when they are released to the public? I mean the comparison table. Is there any regulation?
As far as I know, those are internal benchmark tests they run.
So you tested whether the Claude team fine-tuned their model on the Snake question, and it's nice that the people there are aware of repeated tests, but how about really testing it for code?
I'm not a developer, but if you have recommendations I can test out, I'm happy to try.
@@SkillLeapAI Just ask it to perform any task you'd like it to do, or ask it to create a different game of similar complexity to Snake. As long as the question hasn't been asked before, you're good to go.
GPT-4 also has problems and can freeze, but not to the extent of Claude's servers, which take about 10-15 minutes to answer one question. One wishes they had servers like Google's; Gemini always runs fast. The video may have exaggerated Claude 3's capabilities with its strong claims of superiority over competitors in every area. Anthropic's model may show strengths in some areas, but large language models are complex, and complete dominance is rare. A more measured presentation would focus on the specific strengths Claude 3 has, weigh them against its weaknesses, and acknowledge that performance can vary by task. It is important to wait for independent verification of these claims, since companies can be biased toward their own products, which casts doubt on overblown claims.
Good video as usual. Thanks for the details.
I am sticking with ChatGPT, because as soon as GPT-5 comes out there will be no competition.
The video likely overhypes Claude 3's capabilities with its bold claim of outperforming competitors in every category. While Anthropic's model may show strengths in certain areas, large language models (LLMs) are complex, and outright dominance is rare. A more balanced presentation would highlight specific benchmarks where Claude 3 excels, compare its weaknesses, and acknowledge that performance can vary depending on the task. Additionally, it's important to await independent verification of these claims, as companies can be biased towards their own products, making skepticism towards sweeping statements advisable.
Well, looks like ChatGPT is bad at commenting on YouTube videos. That's not at all what the video is about.
Meanwhile in the European Union Claude is still not available...
And you'll need a phone number to verify your country.
It works with a US VPN, and they sent a code to my EU number; the verification worked. But I don't know whether you can eventually purchase the Pro plan; I didn't try.
I'll let this marinate for some weeks or months so it can be better trained by user input.
Not impressed. The Claude 3.0 models sound more like GPT instead of sounding human-like the way 2.1 and 2.0 did! Very sad that they destroyed Claude's strength.
At least Gemini Advanced gave me a 2-month free trial (and I'm mind-blown compared to GPT-4 and will switch in case OpenAI isn't able to adapt). I asked Claude (free) a question and the reply was something like "I'm too busy, please try the Pro version". Thank you, but this is not the way to attract new customers.
YOOOO
Might keep my subscription if Claude is even better now. I actually use it mainly for helping me write my books, game coding, and a few other very small things.
When it came to my book writing, ChatGPT sometimes did better, for example helping me expand a paragraph of story text by adding more detail to what I had already typed.
Claude 3 is awesome, but the servers are 💀... now that everyone is there, lol.
And GPT-4 was also having issues and would freeze a lot, but not as badly as Claude's servers, which now take 10-15 minutes for one reply. I wish they had Google's servers; Gemini is always ultra fast.
Except it's not true. In an actual head-to-head vs GPT-4, it was shown to be a bit inferior: ua-cam.com/video/sX8Ri3w2MeM/v-deo.html&ab
Well, everyone has had it for like 4 hours, so you really can't make a real determination.
Also, Matt's video has the same title, which is Claude's claim, and after watching it, it doesn't sound like he came to a conclusive answer either.
@@SkillLeapAI Right, but the claim is in quotation marks, indicating that it's just the claim and not necessarily reality. Matt's conclusion is that Claude didn't beat out GPT-4 and is more expensive. He does point out that GPT won out in logic and dialog use, but Claude did very well in the technical portion (the Centipede game).
I see. For some reason all his titles say "shocking" or "breaking" lately, and I can't keep track. On the consumer side, they are both $20 a month, and I usually compare the consumer-facing chatbot, not the API. But I understand the point. I just don't think any of us can make claims of our own after a couple of hours of testing. I do remember Gemini had similar claims, and I ended up disagreeing with every benchmark. So we will see.
I added quotes too so it’s clear it’s their claim and not mine.
First ❤