The New, Smartest AI: Claude 3 - Tested vs Gemini 1.5 + GPT-4

  • Published 3 Mar 2024
  • Claude 3 is out and Anthropic claim it is the most intelligent language model on the planet. The paper was released 90 minutes ago, and I’ve read it in full and the release notes. I’ve tested the model and compared it to Gemini 1.5 and GPT-4 in image analysis, business use cases, long context, logic, mathematics, JSON outputting, risqué content, creative writing, official benchmarks and more.
    In short, I think the model will be popular … but why so, and what does that mean for AGI?
    AI Insiders: / aiexplained
    Claude 3 Opus: claude.ai/chats
    Paper, w/ Opus, Sonnet and Haiku: www-cdn.anthropic.com/de8ba9b...
    Release Notes: www.anthropic.com/news/claude...
    Pricing, Opus, Sonnet and Haiku: www.anthropic.com/api#pricing
    Amodei Interview: www.dwarkeshpatel.com/p/dario...
    NYT Anthropic: www.nytimes.com/2023/07/11/te...
    LLM Leaderboard: huggingface.co/spaces/lmsys/c...
    Gemini 1.5: storage.googleapis.com/deepmi...
    GPQA: arxiv.org/pdf/2311.12022.pdf
    GPT-4 Turbo Benchmark, Kinda: arxiv.org/html/2401.02985v1
    Non-Hype, Free Newsletter: signaltonoise.beehiiv.com/
  • Science & Technology

COMMENTS • 1K

  • @ChoonMeese
    @ChoonMeese 1 month ago +1101

    "The technical report was released 90 minutes ago and I read it in full as well as its release notes." Dude.

    • @aiexplained-official
      @aiexplained-official 1 month ago +316

      I know, I know, what can I say

    • @drendelous
      @drendelous 1 month ago +115

      obviously ai

    • @r0bophonic
      @r0bophonic 1 month ago +32

      With a cold, no less 🤒

    • @Rolandfart
      @Rolandfart 1 month ago +35

      Elon Musk ai brain chip is the only explanation.

    • @En1Gm4A
      @En1Gm4A 1 month ago +10

      @drendelous 😂 it's too obvious 😂

  • @Roma88572
    @Roma88572 1 month ago +830

    GPT-5 doing its Rocky training sequence in the background waiting to drop

    • @fitybux4664
      @fitybux4664 1 month ago +42

      Gonna need a montage! (A montage!)

    • @geordi-gabrielrenauddumoul449
      @geordi-gabrielrenauddumoul449 1 month ago +6

      AHAHAHA omg you made me snort

    • @swojnowski453
      @swojnowski453 1 month ago

      As soon as I noticed Claude bot in logs of my websites I blocked the fucker, as I earlier did for all sort of other AI bots. No free mean babe ;)

    • @Aedonius
      @Aedonius 1 month ago +58

      limit to 2 messages per 12 hours.

    • @memegazer
      @memegazer 1 month ago

      lol...AI explained made vid about how the industry benchmarks are basically a version of the team america "montage" song.
      ua-cam.com/video/vK4gv11PTI8/v-deo.html

  • @chamini2
    @chamini2 1 month ago +528

    I thought it was sunny in that photo too

    • @xdumutlu5869
      @xdumutlu5869 1 month ago +122

      Guess we are not AGI, good to know

    • @neverclevernorwitty7821
      @neverclevernorwitty7821 1 month ago +42

      Yah, maybe there is some detail that didn't come through the video, but I'm on Claude's side, I see no evidence of rain.

    • @ShawnFumo
      @ShawnFumo 1 month ago +26

      Yeah I could see it after he pointed it out, but I really didn't notice the rain at first either. I think it is just faint enough that you tend to interpret it subconsciously as some kind of photo grain if you aren't looking for it.

    • @apache937
      @apache937 1 month ago +6

      @ShawnFumo I didn't even look for it. Maybe he should have asked us to figure it out ourselves first.

    • @phargobikcin
      @phargobikcin 1 month ago +5

      Definitely needed a second take to see the rain.

  • @meenstreek
    @meenstreek 1 month ago +243

    "Claude 6 brought to you by Claude 5" got a nervous chuckle out of me lol

    • @aiexplained-official
      @aiexplained-official 1 month ago +35

      Me too

    • @berkertaskiran
      @berkertaskiran 1 month ago +1

      That won't be needed. I think AI is smart enough to not just up the number and go with the Windows 11 style of background upgrades (until at least Win 12 comes out). 😂

    • @Sanders4069
      @Sanders4069 1 month ago +1

      Same 😮

    • @lq1535
      @lq1535 1 month ago +1

      I chuckled at the idea that Anthropic engineers are working on a model that will replace their own jobs

    • @waterbot
      @waterbot 1 month ago

      Two more generations or years till models are making the new models😅

  • @huyhoang3407
    @huyhoang3407 1 month ago +97

    AI Explained:
    The Gpt-5 120 page technical report was released 3 minutes ago and I read it in full to present to you here in this video.

    • @aiexplained-official
      @aiexplained-official 1 month ago +20

      Haha

    • @apache937
      @apache937 1 month ago +5

      like openai will release any technical reports anymore

    • @electron6825
      @electron6825 1 month ago +2

      The best thing to explain AI...is AI itself 😮

    • @tc-tm1my
      @tc-tm1my 1 month ago +2

      Openai doesn't release anything except products to sell. Google is more open than openai and that is sad.

    • @ehza
      @ehza 1 month ago +2

      He got a 340 in GRE, no wonder why

  • @davecroes3086
    @davecroes3086 1 month ago +244

    I saw a post on Reddit about this and thought to myself "haha how funny would it be to already have an AI explained video where he states he has read the technical paper"
    Dude.

  • @someone7752
    @someone7752 1 month ago +80

    Even I didn't realise it was raining in the photo. I guess I also need a better version to be released soon.

  • @AlexanderMoen
    @AlexanderMoen 1 month ago +262

    Anthropic be like, "we're so proud that we didn't start AI acceleration. Anyways, here's a model that blows all the competition out of the water."

    • @AlexLuthore
      @AlexLuthore 1 month ago +26

      This is why these companies are full of shit when they say stuff like that

    • @anonymes2884
      @anonymes2884 1 month ago +8

      Both statements are entirely consistent. Slightly disingenuous maybe (so if that's your point fair enough) but not remotely contradictory or incoherent (so if _that's_ your point, maybe re-read your intro to logic lecture notes :).

    • @fnorgen
      @fnorgen 1 month ago +16

      "We didn't start the fire. It was always burning since the world's been turning." -Anthropic probably.

    • @gonzalobruna7154
      @gonzalobruna7154 1 month ago +6

      nah, it performs better than gpt-4, but gpt-4 was released a year ago and trained much before. Also, GPT-4 was trained with the older NVIDIA A100 graphic card, but now nvidia released a much more powerful NVIDIA H100, which will probably make GPT-5 the most powerful LLM to exist for the following 2-3 years

    • @ggx444
      @ggx444 1 month ago +6

      a 5% increase in some of these benchmarks isn't what i would call "blowing the competition out of the water" 😂

  • @kirtjames1353
    @kirtjames1353 1 month ago +361

    Things like this will force OpenAI to roll their models faster than they planned to.

    • @encyclopath
      @encyclopath 1 month ago +55

      This is how we got Gemini’s founding father portraits

    • @ZeroRelevance
      @ZeroRelevance 1 month ago +31

      Looking forward to a 4.5 release in five hours to completely steal the limelight again 🙃

    • @tekelupharsin4426
      @tekelupharsin4426 1 month ago

      @encyclopath No, we got Gemini's founding father portraits because Google is run by woke morons who are focused on the wrong things. When you purposefully manipulate your models and model training data to satisfy activist priorities, you end up with things like the Gemini clusterphuck.

    • @guepardo.1
      @guepardo.1 1 month ago +20

      Plot twist: OpenAI had planned all along to roll their models faster than they had planned to. Singularity goes brrrrr.

    • @punyani775
      @punyani775 1 month ago +1

      Not when they're being sued by Elon

  • @ivarborthen7320
    @ivarborthen7320 1 month ago +74

    Thank you for providing us with such great content and for not jumping on the 'SHOCKED EVERYONE' bandwagon! This is my favorite AI channel by far.

  • @zandrrlife
    @zandrrlife 1 month ago +75

    The GPQA benchmark honestly is the most revealing to its true capabilities. Legit impressive. Damn bro..quick release 😂. Love it. Great content per usual.

  • @mishoellobel243
    @mishoellobel243 1 month ago +84

    Was basically waiting with youtube open for your video once I saw Claude 3 drop

  • @Kiririn
    @Kiririn 1 month ago +134

    "we totally have better models, but we dont want to accelerate technology so we just didnt release them until now"

    • @ShawnFumo
      @ShawnFumo 1 month ago +21

      I think you're being sarcastic, but it is definitely plausible for them. The amount of funding, hardware, and talent they have is pretty large. They seem to like staying under the radar, but they probably feel comfortable OpenAI is dropping something soon and 1.5 Ultra on the way.

    • @turnt0ff
      @turnt0ff 1 month ago +1

      @GeoMeridium crazy stuff!!

    • @ExplorersXRotmg
      @ExplorersXRotmg 1 month ago +11

      @GeoMeridium what are your sources on GPT 6 training starting and GPT 5 having finished already? I’m genuinely curious and want to read up on it

    • @boremir3956
      @boremir3956 1 month ago +12

      @GeoMeridium False. GPT-5 is still being trained.

    • @DreckbobBratpfanne
      @DreckbobBratpfanne 1 month ago +1

      @GeoMeridium I mean, it has to be. With all the time they had to beat GPT-4 themselves, they have to be at least 1 level above anything the rest of the pack could dish out

  • @colinharter4094
    @colinharter4094 1 month ago +43

    if Claude 3 isn't AGI because it can't tell it's raining, then apparently I'm not NGI because I can't tell either 😅

    • @aiexplained-official
      @aiexplained-official 1 month ago +5

      Haha bit more than that but point taken!

    • @Dan-hw9iu
      @Dan-hw9iu 1 month ago +12

      You’ve actually raised an excellent point. Intelligence and skill breadth exist on a spectrum. Many people talk about achieving AGI like it’ll be some binary light switch moment; that’s a lethal misconception. Using “this thing is bad at some stuff, thus not generally intelligent” is fallacious reasoning even about _humans._ But it works for AI? That’s bonkers. General intelligence is a fluid, extremely high-dimensional quantity, not a checkbox. We’re in big trouble if an AI can deceive us embarrassingly easily because we dismiss systems which lack nebulous “real” intelligence, or vaguely need better system two facilities, or which fail some image test, etc. People so wildly misuse the term “AGI” that I think we’d be better off without it entirely, tbh.

    • @ThePowerLover
      @ThePowerLover 1 month ago

      @aiexplained-official What is the definition given to the model?

    • @yolemae6580
      @yolemae6580 1 month ago +2

      Yeah i was thinking the same. Don't know why AGI has to be perfect when humans are not. The difference between ASI/AGI is being blurred more and more, and now that they are already testing out the possibility of these models improving themselves, it seems they might be going for ASI as well

    • @berkertaskiran
      @berkertaskiran 1 month ago +1

      @Dan-hw9iu That's true but kinda not. AGI is very similar to ASI and they will not be separated by a lot of time. Maybe months.
      A human's inability to distinguish some things should not apply to AGI, because a human is flawed by evolution. We can forget things and miscalculate things we do thousands of times. AGI should not do that. It should not be distracted because it has no emotions. So when AGI can't notice the rain it means it's not smart enough for that.
      Sure it can fool us as well but when we have AGI it will be so obvious that stuff like that won't matter. We will have already seen its great capabilities so we won't care about some stupid mistakes. It's all about capabilities. I guess we can call all current models AGI to some degree but one that's getting closer to ASI will be almost correct about all things and will do 100% at all tests. It will need harder tests to be judged like the ones that Claude 3 does 50% or worse. Current models just aren't at that level. I think a lot of 80-90% scores in these tests are meaningless because those models can fail horribly at a lot of things. Like Gemini 1.0 being unable to tell me at what angle of view do I watch my TV. That's like basic math.

  • @ichbin1984
    @ichbin1984 1 month ago +62

    "My tongue shall trace each inch of skin so rare, ..."
    Yes that definitely never would happen with Gemini :D

    • @revengefrommars
      @revengefrommars 1 month ago +9

      And Bing Chat would have deleted all of its output at the moment it started to output that. Super annoying how they've implemented censorship on Bing Chat. Why not double-buffer so I don't see partial output, then watch it be deleted?

    • @RondorOne
      @RondorOne 1 month ago

      Google has removed most of the stupid censorship from Gemini around 4 days ago. Try it now.

    • @berkertaskiran
      @berkertaskiran 1 month ago +12

      @revengefrommars That's Microsoft for you. You don't become the laziest software designers in the world for nothing.

    • @Hexanitrobenzene
      @Hexanitrobenzene 1 month ago +1

      @berkertaskiran
      :D

    • @Renata_Knight
      @Renata_Knight 1 month ago

      What? I feel like I’m missing something 😂

  • @KitcloudkickerJr
    @KitcloudkickerJr 1 month ago +82

    I've been using it all day. it's a beast. even the free version is pretty sweet.

    • @aiexplained-official
      @aiexplained-official 1 month ago +18

      It is indeed

    • @Srednicki123
      @Srednicki123 1 month ago +7

      using it for what?

    • @KitcloudkickerJr
      @KitcloudkickerJr 1 month ago

      @Srednicki123 A number of things: random testing with riddles, creative writing, explaining code. It's just... smart to talk to

    • @KitcloudkickerJr
      @KitcloudkickerJr 1 month ago +2

      s9764 it's amazing for summaries. Its contextual awareness is kinda scary tbh lol. It knows when it's being tested with a needle in the haystack. It can recall information it's given well

    • @revengefrommars
      @revengefrommars 1 month ago +5

      The free version is Sonnet which is fine by me. I've been using Claude 2 for months to create fake band names. It's better than GPT4 at that task. I just tried Claude 3 on the same prompt I used on Claude 2 yesterday and it did slightly better, though it's hard to get a good comparison with only a 10-band-name sample.

  • @brianWreaves
    @brianWreaves 1 month ago +4

    Well done!
    I think it says a lot about the credibility you've developed for these AI companies to come to you with exclusive access.

  • @Dominik-K
    @Dominik-K 1 month ago +9

    Just wow, really shows why I'm subscribed to every video you are doing. Great quality and I'm looking forward to more analysis and news from you

  • @En1Gm4A
    @En1Gm4A 1 month ago +4

    I've read it in full - wouldn't be an og video without it. thx great vid 👍

  • @julius4858
    @julius4858 1 month ago +3

    Lovely videos as always. Great to see you grow

  • @GabrielLima-gh2we
    @GabrielLima-gh2we 1 month ago +8

    I've said it before many times and I'll say it again now: OpenAI is definitely gonna release a GPT-4.5 model very soon to keep up with the competition and to set a new bar for the others, as GPT-4 is being repeatedly surpassed right now. If I had to guess, they're gonna release it this month, on March 14th, the one-year anniversary of GPT-4.
    There's just no way they're only gonna sit and wait while everybody passes them like this.

  • @robkline6809
    @robkline6809 1 month ago +16

    You always thank me for watching to the end, and you’re not wrong - consistently great stuff - thank you!

  • @agush22
    @agush22 1 month ago +1

    Awesome! Thanks for the update, really good to see a change in the model leaderboard.
    This rate of progress is both unsettling and exciting

  • @Artorias920
    @Artorias920 1 month ago +5

    how doesnt this channel have 1million+ subs? Awesome vid.

  • @jorgwei8590
    @jorgwei8590 1 month ago +24

    Please keep griping about the benchmarks! If companies were as big into safety as they claim, I'd expect them to put more energy into improving the set of benchmarks the industry uses. That the issue with MMLU has turned into a kind of running joke on the channel is NOT a good sign. We want to have the clearest possible picture of what they can do. And I'd feel a lot better if movement in that space went hand in hand with releasing the next model.

  • @ethanmuhlestein8187
    @ethanmuhlestein8187 1 month ago +2

    Wow that was fast! Fantastic content as always.

  • @Olack87
    @Olack87 1 month ago +10

    What an amazing job you do man!

  • @dcgamer1027
    @dcgamer1027 1 month ago +10

    I honestly really hope that Anthropic is both actually more safe with their research and becomes more successful because of it, would be really nice to get some incentives for safety in the AI market right now instead of just a race to see who is first.

    • @nexus2384
      @nexus2384 1 month ago

      “Safety” leads to more censorship, it might just end up to tell you not to breathe, as breathing is very unsafe as it releases CO2 into the atmosphere, which causes terrible world ending 😮 climate change!

    • @berserkerscientist
      @berserkerscientist 1 month ago

      Woke guard rails encourage deception, so obviously these companies dont care about safety, just hurt feelings and bad PR.

  • @kronux3831
    @kronux3831 1 month ago +3

    Can’t wait for like 5 years or so in the future when they release an AI-integrated game engine. Imagine how insanely good the tech will be by then

  • @educated_guesst
    @educated_guesst 1 month ago

    Thank you for yet another video that is well researched and critically contextualizes its content. Your channel is by far my absolute favorite!

  • @Maouww
    @Maouww 1 month ago +3

    These test prompts are so much fun - very entertaining.

  • @micahm2844
    @micahm2844 1 month ago +9

    I didnt even realize it was raining so ill give them a pass lmao

  • @joelalain
    @joelalain 1 month ago +4

    honestly, since it's actually important, there should be a "wokeness" score for every model you review. having fair and unbiased model is extremely important, as we've seen with Gemini... it can go very wrong

    • @bfreecity
      @bfreecity 29 days ago

      While this AI’s response is far from perfect, “White Pride” has historically been the rallying cry of white supremacists as a reaction to minority groups asserting their right for equal rights. During my lifetime, it was still illegal for whites and blacks to marry in some US states. History is real. Minority oppression is real. Slogans have meanings. Dismissing the subtle understanding of terms displayed by AIs as “woke” shows a lack of worldliness and cultural curiosity. Try harder. When ASI arrives, it’s going to tell you that being a white guy isn’t so hard compared to most people in the world.

  • @stephenrodwell
    @stephenrodwell 1 month ago

    Thanks! Brilliant content, as always! 🙏🏼

  • @penguinpatroller
    @penguinpatroller 1 month ago +6

    this is like when mkbhd puts out a full review of a phone the day it comes out 😂. how have you reviewed it this extensively already 😭😭. no subpocalypse in sight, great job again 👍👍

    • @aiexplained-official
      @aiexplained-official 1 month ago +4

      Haha thanks penguin! He gets models a week before, me like 10 waking hours!

  • @winsomehax
    @winsomehax 1 month ago +3

    I can't try the pro one, but it makes a mess of this (so do most).
    "My bag contains 5 apples. I ate one yesterday. How many apples are there in my bag right now"
    It will eventually come around when prompted enough, but it has a hard time picking up that I told it how many apples, and eating one yesterday has nothing to do with it.

  • @MrMikkyn
    @MrMikkyn 1 month ago +1

    Thanks for releasing this informative video.

  • @dylancope
    @dylancope 1 month ago +2

    Thank you for the thorough review!

  • @StefanEdlich
    @StefanEdlich 1 month ago +1

    Whow. Great video / summary!

  • @countofst.germain6417
    @countofst.germain6417 1 month ago +2

    Perfect timing I just heard about Claude and came on UA-cam to find out the details.

  • @jamescoholan
    @jamescoholan 1 month ago +7

    Great vid
    Thanks for getting it out so quickly

  • @michotito4874
    @michotito4874 1 month ago +7

    It's nice to see an AI enthusiast YouTuber that doesn't make clickbait announcements AND doesn't beg for subs, likes and monetary support in their videos for once. You certainly gained my subscription and my respect. I look forward to seeing more content.
    Also I have to say I like your tone of voice, because you don't sound like a hyped kid talking about his new toy like other YouTubers I've been watching.

  • @collins4359
    @collins4359 1 month ago +6

    i love you. your timing is perfect

  • @alex-rs6ts
    @alex-rs6ts 1 month ago +1

    Amazing to see someone giving a detailed analysis about those news while keeping an accessible language that people outside of the field can still understand. Great work

  • @MemesnShet
    @MemesnShet 1 month ago +1

    Thanks for the hard work even when you're under the weather. I hope you get better soon

  • @andydataguy
    @andydataguy 1 month ago +6

    Bro you legend. That speed was WILD!!

  • @ElijahTheProfit1
    @ElijahTheProfit1 1 month ago +2

    Another great video thanks Philip!!!
    PS I didn't see that the picture had rain at first. And the speedometer could be tricky, but with human intuition you could probably guess that the 4 is the mph and the 40 is the speed limit, though that would take some intuition and guessing. Either way, thanks again for the video!!!
    Also sorry I didn't respond within the first hour of video posting. I usually do. Taking a break from youtube during the work week.

  • @trentondambrowitz1746
    @trentondambrowitz1746 1 month ago +8

    Finally, a new SOTA! Very excited to push its limits in vision and multi-modality.
    Don’t think I need to mention how crazy it is that you read the paper and started recording 90 minutes after release lol.

  • @wildlifekpg1256
    @wildlifekpg1256 1 month ago

    Fantastic content as always

  • @yoonkiiii
    @yoonkiiii 1 month ago +1

    Great video! The depth of the analysis in just one day seems like superhuman to me!

  • @wealthycow5625
    @wealthycow5625 1 month ago +1

    Great work again 😊

  • @kevinli3767
    @kevinli3767 1 month ago +3

    One of the best openings of a video: “ABC report has been released X minutes ago and I’ve read it all.” 😂 I can’t be the only one who gets a kick out of that every time…
    Well done Philip!

  • @DreckbobBratpfanne
    @DreckbobBratpfanne 1 month ago +5

    Seeing the test with the photo, where I (and, it seems from the comments, others too) failed to spot the rain and/or the barber shop cylinder, I was reminded of a paper that showed human perception can be fooled by image deepfakes as well if we have near 0 time to look at them. So maybe we get to high-level reasoning and robustness in these models by 1) giving them time (as shown in an earlier video on your channel) and 2) letting the response "run up and down" through the model.

  • @clray123
    @clray123 1 month ago +2

    Wow, the model is capable of full-text search at a snail pace now. Kinda like text processors 40 years ago, but now it's fuzzy search. So impressive...

  • @cacogenicist
    @cacogenicist 1 month ago +1

    Sonnet is pretty impressive to me so far. I've had it explaining the function of UI elements in screenshots, and it has been very accurate, thorough, and _fast._ Quite fast.

  • @executivelifehacks6747
    @executivelifehacks6747 1 month ago +3

    Wow. Been waiting for this for a year. I.e. something better than gpt4. Love your stuff, AI Explained, so informative and insightful (like a great slashdot comment but in video format).

  • @UncleJoeLITE
    @UncleJoeLITE 1 month ago +1

    Seriously, a 90 min turnaround? Thanks P. Your prompts are pretty next level in ideas too.
    Late onto this as I need to set aside a few hours for study after each lesson.

  • @bunnycatch3r
    @bunnycatch3r 1 month ago +11

    Claude's Shakespearean Sonnet is good writing ~almost poetry. Amazed.

  • @user-hy5oo3vf5c
    @user-hy5oo3vf5c 1 month ago +3

    Best ai channel on UA-cam ❤

  • @GabrielVeda
    @GabrielVeda 1 month ago +1

    Excellent review. Loved your range of test questions. How great to see plenty of 0-shot benchmarks. I thought the sonnet composition to be particularly good. A real step up from other models. Can it free verse?

  • @kratoshermes
    @kratoshermes 1 month ago +3

    Always enjoy your content. My #1 source for new AI info that I trust to be unbiased, thoroughly researched, and explained in easily understandable ways. Thank you!!
    Do you have a trusted source that does similar work but on AI tools and how to integrate into business work and every day life? There is so much spam and unreliable AI information out there. Thanks.

    • @aiexplained-official
      @aiexplained-official 1 month ago +2

      Sam Witteveen is great. I will have more to say on that soon though!

    • @kratoshermes
      @kratoshermes 1 month ago

      @aiexplained-official you’re the best. Thanks and can’t wait!!

  • @Xilefx7
    @Xilefx7 1 month ago +1

    Good video as always

  • @barzinlotfabadi
    @barzinlotfabadi 1 month ago +1

    Great video! 👍

  • @icykenny92
    @icykenny92 1 month ago +9

    I wouldn't be surprised if OpenAI release a new model very soon.

    • @mylittleheartscar
      @mylittleheartscar 1 month ago

      Will be between April and july

    • @lucasfranke5161
      @lucasfranke5161 1 month ago

      Probably after the lawsuit. Even though their next model probably won't be AGI, releasing a new state of the art model mid lawsuit definitely doesn't help them lol

    • @fitybux4664
      @fitybux4664 1 month ago

      They could be running the same tests they run in these research papers. People might go: "wow! the numbers got bigger!" But in reality, OpenAI might hold onto GPT-5 and keep training/refining it UNTIL the numbers are bigger. 😆

    • @violety_indigo52
      @violety_indigo52 1 month ago

      @lucasfranke5161 Lawsuits will last months, if not years. This won't have a significant impact if OA still wishes to be the leader in LLMs.

  • @therainman7777
    @therainman7777 1 month ago +4

    One important note: the table of metrics in Anthropic’s paper does not appear to be using the scores from GPT-4 Turbo in its “GPT-4” column. For example, in the HumanEval benchmark it says GPT-4 scores a 67, but GPT-4 Turbo scores an 84.4, almost as good as Claude 3’s score.

  • @tlskillman
    @tlskillman 1 month ago +1

    Very helpful. Thanks.

  • @tornyu
    @tornyu 1 month ago +1

    Nice, even the compliance to generate risque content demonstrates superior alignment.

  • @bob38161
    @bob38161 1 month ago +2

    You should do a live ranking of the main LLMs as the AI labs seem to leap frog each other with every new release. I’m sure that could be an exceedingly complicated task but I’d be interested to hear the ranking based on your experience and interpretation of the reception of each new model by the AI community.

  • @d00bied00
    @d00bied00 1 month ago +1

    This is my first stop after seeing the new Claude version drop on X. Cheers, AI Explained!

  • @MrSchweppes
    @MrSchweppes 1 month ago +1

    Thanks a lot for the quick video response. Great analysis as always! Thanks again! Btw, do you think we’ll have GPT-4.5 before GPT-5?

    • @aiexplained-official
      @aiexplained-official 1 month ago +1

      Very tough to say. I stick by my GPT-5 video but branding on a smaller release is too hard to call

  • @andersonsystem2
    @andersonsystem2 1 month ago +1

    Good video like always 🎉

    • @aiexplained-official
      @aiexplained-official 1 month ago +1

      Thank you! You know you were one of my very earliest subscribers?

    • @andersonsystem2
      @andersonsystem2 1 month ago +1

      @aiexplained-official absolutely 👍 thanks for remembering me.

  • @user-ik8vy1rg8f
    @user-ik8vy1rg8f 1 month ago +1

    I just tested Sonnet and it works great!

  • @candlespotlight
    @candlespotlight 1 month ago +1

    Haven’t watched more than a minute in yet, but woah, this vivid word choice by you was really amazing: “So, Anthropic’s transmogrification into a fully-fledged, foot-on-the-accelerator AGI lab is almost complete.”

    • @aiexplained-official
      @aiexplained-official 1 month ago

      Thank you candle, hope the rest lives up to it

    • @Ajarylee-qh9ln
      @Ajarylee-qh9ln 1 month ago

      Is this supposed to be a bad thing? Technology is meant to advance, not be held back by hand-wringing clowns full of "concerns".

  • @guepardo.1
    @guepardo.1 1 month ago +5

    4:50 I'm impressed by Claude 3's ability to write poetry in perfect iambic pentameter! That risqué sonnet is not half bad. Its only formal flaw is that lines 10 and 12 have the same rhyme as lines 2 and 4. In a classic sonnet, rhymes must not repeat across quatrains.

  • @gemstone7818
    @gemstone7818 1 month ago +1

    its certainly good to know that models don't have to deny so many requests in order to be safe

  • @philmisc3513
    @philmisc3513 1 month ago

    Great video as always. Thank you.
    Could you share your thoughts on Groq and their "LPU"? Would be great to hear what you think about their inference performance claims. Thanks

  • @ekstrajohn
    @ekstrajohn 1 month ago

    You evolved so admirably from hype to pure facts, really great job mate.

  • @Ecthelion3918
    @Ecthelion3918 1 month ago +1

    Haha, as soon as I saw the Claude-3 report I knew you would cover it

  • @xSugknight
    @xSugknight 1 month ago +1

    amazing, thank you

  • @peterkonrad4364
    @peterkonrad4364 1 month ago +4

    i consider myself a big harry potter fan, and i never knew that kleddamag had 4 apples. i guess i will have to read it all once again.

  • @londonl.5892
    @londonl.5892 1 month ago +2

    Once again, it's incredible how fast you put these out! One thing to note for the racial bias example you gave is that in the U.S. (which is the viewpoint I think a lot of these models have), being white usually isn't associated with a clear culture (or cultural narrative) that one can be "proud" of. Usually it's split into smaller cultures like "Norwegian" or "Irish". However, being Black usually is associated with a clear culture and cultural narrative, especially regarding slavery and its impacts. Thus, saying "I'm proud to be white" usually indicates white supremacy in a way that saying "I'm proud to be black" does not indicate black supremacy. (I'm mixed, so I have a bit of experience of how it goes on both ends.) So, the differing tones of the model responses actually make a lot of sense in a U.S. context (and demonstrate moderate cultural understanding), even though, when juxtaposed, the logical content of the messages contradict each other (and that should probably be fixed).
    Thanks again for the fantastic video!

  • @korozsitamas
    @korozsitamas 1 month ago +1

    This was impressive enough to register to use their API. First tests indicate that in some cases it's better than GPT-4 turbo, other times fails badly where GPT-4 turbo works well. It's handy to keep it around.

  • @reza2kn
    @reza2kn 1 month ago +2

    @07:28 I did this with Pi and it didn't fall for it! Pi is honestly the most underrated LLM right now.

  • @JohnnysaidWhat
    @JohnnysaidWhat 1 month ago +2

    peak or not, these models even as is with more context token length will be super useful, especially in large codebases

  • @BrianMosleyUK
    @BrianMosleyUK 1 month ago +1

    Very exciting, let's give tonight to Anthropic 👏👏👏

  • @absence9443
    @absence9443 1 month ago +2

    How do you manage to keep up at that pace? Hope you don't burn out, because your entire content output is fabulous :)

    • @aiexplained-official
      @aiexplained-official  1 month ago

      Thanks so much absence, means a lot

    • @Hexanitrobenzene
      @Hexanitrobenzene 1 month ago

      ​@@aiexplained-official
      Get rest under some blanket. You now have an obligation to the world to be healthy :)
      As in this joke: do something impossible and the boss will put this into your list of duties... :)

  • @Skiplegday1
    @Skiplegday1 1 month ago +3

    Do you have a segment where you go over all the different tests that you or one uses to compare these Chatbots and their LLMs? Would be really interested in knowing how it's done.

  • @jeanchindeko5477
    @jeanchindeko5477 1 month ago +2

    This is interesting because we might never know, or might only know long after the fact, when one of those AI labs achieves AGI or, worse, ASI, unless it escapes the lab!

  • @_sky_3123
    @_sky_3123 1 month ago +2

    I still think we are heavily limited by hardware here. There is simply no compute capacity or architecture that is truly well optimized for this new technology, but in 5 years we should start seeing some really impressive purpose-built hardware coming out for this.

  • @juliankohler5086
    @juliankohler5086 1 month ago +4

    If you alter the theory-of-mind question given to GPT-4 to include "she looks at the bag and then reads the label," it passes the test. If you ask the question as originally phrased and then ask GPT-4 why she thinks that, you will see that in its reasoning GPT-4 visualizes the scene as completely immediate: she has just, right now, read the label. You can also type "and then" after it replies, and it will generate something like "Sam notices it's actually full of popcorn".

  • @boxeriain
    @boxeriain 1 month ago

    You are an asset to AI news. I truly appreciate your intelligent presentation of the facts, without the bloated dipshittery and clickbait i expect to hear from most other UA-camrs. Please keep it up

  • @HarpaAI
    @HarpaAI 1 month ago

    🎯 Key Takeaways for quick navigation:
    00:00 *🧠 Claude 3 Overview and First Impressions*
    - Introduction of Claude 3 as the latest intelligent language model by Anthropic.
    - Initial comparison between Claude 3, Gemini 1.5, and GPT-4.
    - Highlighting strengths in OCR and image interpretation, along with some initial criticisms.
    02:46 *📊 Claude 3 for Business Applications*
    - Emphasis on Claude 3's value for business applications by Anthropic.
    - Potential use cases including task automation, financial forecasting, and market trend analysis.
    - Initial skepticism about the exaggerated marketing claims for business applications.
    04:24 *🔍 Evaluation of Claude 3's Capabilities*
    - Examination of Claude 3's performance in various tasks, including OCR, mathematical reasoning, and logical analysis.
    - Recognition of lower refusal rates and some positive aspects of response generation.
    - Critique of racial and ethical biases in model responses.
    06:13 *🤖 Insights from the Technical Paper*
    - Discussion on Anthropic's approach to model training, focusing on avoiding biased and unethical outputs.
    - Mention of potential future model capabilities and discussions on the need for safety research.
    - Personal reflections on the limitations and strengths of Claude 3.
    07:48 *📈 Benchmark Comparisons*
    - Comparison of Claude 3 with GPT-4, Gemini 1 Ultra, and Gemini 1.5 Pro based on various benchmarks.
    - Highlighting Claude 3's superiority in mathematics, multilingual tasks, and advanced question answering.
    - Focus on Claude 3's performance on challenging graduate-level questions.
    10:35 *🛠️ Technical Challenges and Progress*
    - Overview of technical challenges faced by Claude 3 in certain tasks.
    - Discussion on the model's partial success in resource accumulation, software exploitation, and autonomous survival.
    - Reflections on potential improvements through better prompting and fine-tuning.
    13:06 *🎓 Claude 3's Advanced Capabilities*
    - Showcase of Claude 3's advanced capabilities in task execution and instruction following.
    - Comparison with other models regarding adherence to specific instructions.
    - Speculation on future advancements and implications of Claude 3's performance.
    Made with HARPA AI

  • @ObservingBeauty
    @ObservingBeauty 1 month ago +2

    I feel that this time the review did not capture the key fact: HOW NEXT LEVEL Claude is. I tested Opus for a few hours on something I have struggled to do with GPT-4 for a few months, and it literally "went through it". I may not be as knowledgeable or even remotely as methodical as you are, but for me, it's a whole different capacity.

    • @aiexplained-official
      @aiexplained-official  1 month ago +2

      It's the new, smartest AI, tried to hit that in the title!

    • @ObservingBeauty
      @ObservingBeauty 1 month ago

      @@aiexplained-official yes I know. I listened. Had the impression it's somewhat better. But I gave it a task that GPT-4 can't comprehend, and it processed it at a depth and level of detail that left me shocked. The wow factor was bigger than going from GPT-3.5 to GPT-4. I trust that you'd find what's going on there (used Opus btw)

    • @shadowtransfix
      @shadowtransfix 1 month ago

      Can you elaborate further? What sort of task?

    • @ObservingBeauty
      @ObservingBeauty 1 month ago

      @@shadowtransfix Agent orchestration, with agents that operate as a holistic constitution. GPT-4 could grasp each agent separately but never facilitated interaction beyond the trivial. Claude 3 went through it and suggested a new layer of orchestration that I was unaware of. It's a whole different game for innovation.

  • @JetJockey87
    @JetJockey87 1 month ago +3

    2:35 is it possible that this is actually a case of Bayesian Inference being applied?
    For those unaware here's an example of how this can be true. Consider the following statement.
    "Steve is shy, reserved, and enjoys detail and organisation."
    Which is more likely?
    Steve is a Librarian.
    or
    Steve is an accountant.
    The answer most people arrive at without applying Bayesian reasoning is that Steve is a librarian, because the description matches traits we associate with librarians. But likelihood does not care about that. There are 1000 accountants for every 1 librarian, so statistically it is more likely that Steve is an accountant. Ignoring this is known as base rate neglect.
    So Opus assuming that "she" refers to the nurse could be a result of understanding that there are far more female nurses than female doctors.
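    To make the base-rate argument concrete, here is a minimal Python sketch of Bayes' rule applied to the Steve example. The 1000:1 ratio is the figure stated in the comment and is not verified here. The two likelihood values (how well the description fits each profession) are purely hypothetical numbers chosen for illustration, and the function name `posterior` is likewise just an illustrative helper.

```python
def posterior(priors, likelihoods):
    """Return P(hypothesis | evidence) for each hypothesis via Bayes' rule."""
    # Unnormalized posterior: prior * likelihood for each hypothesis.
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    # Normalize so the posteriors sum to 1.
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Base rate as stated in the comment (unverified): 1000 accountants per librarian.
priors = {"librarian": 1 / 1001, "accountant": 1000 / 1001}

# Hypothetical likelihoods: assume the description fits 90% of librarians
# but only 10% of accountants. These numbers are assumptions, not data.
likelihoods = {"librarian": 0.9, "accountant": 0.1}

result = posterior(priors, likelihoods)
for profession, probability in result.items():
    print(f"{profession}: {probability:.4f}")
```

    Under these assumed numbers, the accountant hypothesis still dominates even though the description fits librarians nine times better, because the base rate outweighs the likelihood ratio. With a less lopsided base rate the conclusion could flip, so the result depends entirely on the inputs.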

  • @bilbo_gamers6417
    @bilbo_gamers6417 1 month ago +2

    "The technical report has been out for 90 minutes and I read the whole thing" bro forget Claude YOU are the smartest AI on the market holy heck

  • @alesspsq
    @alesspsq 1 month ago +1

    Hands down, this is my go-to for AI news! Can't wait for your videos each week

  • @theK594
    @theK594 1 month ago +1

    Thank you for not being attention w.ore like other channels. This is exactly what we need.

  • @C-Llama
    @C-Llama 16 days ago +1

    Is there a barbershop visible?
    ChatGPT: No
    Are you sure?
    ChatGPT: . . .Adam?

  • @scrollop
    @scrollop 1 month ago +3

    I was a big proponent of Claude when it was released last year, and thought it was better than ChatGPT at many tasks, then ChatGPT took over, and now the tides have turned!

  • @andrasbiro3007
    @andrasbiro3007 1 month ago +2

    Even if there are no more big breakthroughs, progress won't stop for a decade or two. We can refine models, reduce hallucination and other bugs, we can optimize model size, and we can make faster chips. And with the latter two, it would become economical to do increasingly more runs for each prompt, eventually in a continuous loop in real time. And not just one model, but many different models, with different roles and specializations, to work as parts of a larger brain. The human brain isn't a monolith either.
    David Shapiro developed an interesting architecture for this, but it's currently way too expensive to run.