The New, Smartest AI: Claude 3 - Tested vs Gemini 1.5 + GPT-4
- Published 3 Mar 2024
- Claude 3 is out and Anthropic claim it is the most intelligent language model on the planet. The paper was released 90 minutes ago, and I’ve read it in full and the release notes. I’ve tested the model and compared it to Gemini 1.5 and GPT-4 in image analysis, business use cases, long context, logic, mathematics, JSON outputting, risqué content, creative writing, official benchmarks and more.
In short, I think the model will be popular … but why so, and what does that mean for AGI?
AI Insiders: / aiexplained
Claude 3 Opus: claude.ai/chats
Paper, w/ Opus, Sonnet and Haiku: www-cdn.anthropic.com/de8ba9b...
Release Notes: www.anthropic.com/news/claude...
Pricing, Opus, Sonnet and Haiku: www.anthropic.com/api#pricing
Amodei Interview: www.dwarkeshpatel.com/p/dario...
NYT Anthropic: www.nytimes.com/2023/07/11/te...
LLM Leaderboard: huggingface.co/spaces/lmsys/c...
Gemini 1.5: storage.googleapis.com/deepmi...
GPQA: arxiv.org/pdf/2311.12022.pdf
GPT-4 Turbo Benchmark, Kinda: arxiv.org/html/2401.02985v1
Non-Hype, Free Newsletter: signaltonoise.beehiiv.com/ - Science & Technology
"The technical report was released 90 minutes ago and I read it in full as well as its release notes." Dude.
I know, I know, what can I say
obviously ai
With a cold, no less 🤒
Elon Musk ai brain chip is the only explanation.
@drendelous 😂 it's too obvious 😂
GPT-5 doing its Rocky training sequence in the background waiting to drop
Gonna need a montage! (A montage!)
AHAHAHA omg you made me snort
As soon as I noticed the Claude bot in the logs of my websites I blocked the fucker, as I did earlier for all sorts of other AI bots. No free meal, babe ;)
limit to 2 messages per 12 hours.
lol...AI explained made vid about how the industry benchmarks are basically a version of the team america "montage" song.
ua-cam.com/video/vK4gv11PTI8/v-deo.html
I thought it was sunny in that photo too
Guess we are not AGI, good to know
Yah, maybe there is some detail that didn't come through the video, but I'm on Claude's side, I see no evidence of rain.
Yeah I could see it after he pointed it out, but I really didn't notice the rain at first either. I think it is just faint enough that you tend to interpret it subconsciously as some kind of photo grain if you aren't looking for it.
@ShawnFumo I didn't even look for it; maybe he should have asked us to figure it out ourselves first
Definitely needed a second take to see the rain.
"Claude 6 brought to you by Claude 5" got a nervous chuckle out of me lol
Me too
That won't be needed. I think AI is smart enough to not just up the number and go with the Windows 11 style of background upgrades (until at least Win 12 comes out). 😂
Same 😮
I chuckled at the idea that Anthropic engineers are working on a model that will replace their own jobs
Two more generations or years till models are making the new models😅
AI Explained:
The Gpt-5 120 page technical report was released 3 minutes ago and I read it in full to present to you here in this video.
Haha
like openai will release any technical reports anymore
The best thing to explain AI...is AI itself 😮
Openai doesn't release anything except products to sell. Google is more open than openai and that is sad.
He got a 340 in GRE, no wonder why
I saw a post on Reddit about this and thought to myself "haha how funny would it be to already have an AI explained video where he states he has read the technical paper"
Dude.
haha
Man's the MKBHD version of AI news.
Even I didn't realise it was raining in the photo. I guess I also need a better version to be released soon.
🤣🤣🤣
Anthropic be like, "we're so proud that we didn't start AI acceleration. Anyways, here's a model that blows all the competition out of the water."
This is why these companies are full of shit when they say stuff like that
Both statements are entirely consistent. Slightly disingenuous maybe (so if that's your point fair enough) but not remotely contradictory or incoherent (so if _that's_ your point, maybe re-read your intro to logic lecture notes :).
"We didn't start the fire. It was always burning since the world's been turning." -Anthropic probably.
nah, it performs better than gpt-4, but gpt-4 was released a year ago and trained much before. Also, GPT-4 was trained with the older NVIDIA A100 graphic card, but now nvidia released a much more powerful NVIDIA H100, which will probably make GPT-5 the most powerful LLM to exist for the following 2-3 years
a 5% increase in some of these benchmarks isn't what i would call "blowing the competition out of the water" 😂
Things like this will force OpenAI to roll their models faster than they planned to.
This is how we got Gemini’s founding father portraits
Looking forward to a 4.5 release in five hours to completely steal the limelight again 🙃
@encyclopath No, we got Gemini's founding father portraits because Google is run by woke morons who are focused on the wrong things. When you purposefully manipulate your models and model training data to satisfy activist priorities, you end up with things like the Gemini clusterphuck.
Plot twist: OpenAI had planned all along to roll their models faster than they had planned to. Singularity goes brrrrr.
Not when they're being sued by Elon
Thank you for providing us with such great content and for not jumping on the 'SHOCKED EVERYONE' bandwagon! This is my favorite AI channel by far.
Godamm I hate that guy
The GPQA benchmark honestly is the most revealing to its true capabilities. Legit impressive. Damn bro..quick release 😂. Love it. Great content per usual.
Was basically waiting with youtube open for your video once I saw Claude 3 drop
Nice
"we totally have better models, but we dont want to accelerate technology so we just didnt release them until now"
I think you're being sarcastic, but it is definitely plausible for them. The amount of funding, hardware, and talent they have is pretty large. They seem to like staying under the radar, but they probably feel comfortable OpenAI is dropping something soon and 1.5 Ultra on the way.
@GeoMeridium crazy stuff!!
@GeoMeridium what are your sources on GPT 6 training starting and GPT 5 having finished already? I’m genuinely curious and want to read up on it
@GeoMeridium False. GPT-5 is still being trained.
@GeoMeridium I mean it has to be. With all the time they had to beat GPT-4 themselves, they have to be at least 1 level above anything the rest of the pack could dish out
if Claude 3 isn't AGI because it can't tell it's raining, then apparently I'm not NGI because I can't tell either 😅
Haha bit more than that but point taken!
You’ve actually raised an excellent point. Intelligence and skill breadth exist on a spectrum. Many people talk about achieving AGI like it’ll be some binary light switch moment; that’s a lethal misconception. Using “this thing is bad at some stuff, thus not generally intelligent” is fallacious reasoning even about _humans._ But it works for AI? That’s bonkers. General intelligence is a fluid, extremely high-dimensional quantity, not a checkbox. We’re in big trouble if an AI can deceive us embarrassingly easily because we dismiss systems which lack nebulous “real” intelligence, or vaguely need better system two facilities, or which fail some image test, etc. People so wildly misuse the term “AGI” that I think we’d be better off without it entirely, tbh.
@aiexplained-official What is the definition given to the model?
Yeah i was thinking the same. Don't know why AGI has to be perfect when humans are not. The difference between ASI/AGI is being blurred more and more, and now that they are already testing out the possibility of these models improving themselves, it seems they might be going for ASI as well
@Dan-hw9iu That's true but kinda not. AGI is very similar to ASI and they will not be separated by a lot of time. Maybe months.
A human to not be able to distinguish some things should not apply to AGI because a human is flawed by evolution. We can forget things and miscalculate things we do thousands of times. AGI should not do that. It should not be distracted because it has no emotions. So when AGI can't notice the rain it means it's not smart enough for that.
Sure it can fool us as well but when we have AGI it will be so obvious that stuff like that won't matter. We will have already seen its great capabilities so we won't care about some stupid mistakes. It's all about capabilities. I guess we can call all current models AGI to some degree but one that's getting closer to ASI will be almost correct about all things and will do 100% at all tests. It will need harder tests to be judged like the ones that Claude 3 does 50% or worse. Current models just aren't at that level. I think a lot of 80-90% scores in these tests are meaningless because those models can fail horribly at a lot of things. Like Gemini 1.0 being unable to tell me at what angle of view do I watch my TV. That's like basic math.
"My tongue shall trace each inch of skin so rare, ..."
Yes that definitely never would happen with Gemini :D
And Bing Chat would have deleted all of its output at the moment it started to output that. Super annoying how they've implemented censorship on Bing Chat. Why not double-buffer so I don't see partial output, then watch it be deleted?
Google has removed most of the stupid censorship from Gemini around 4 days ago. Try it now.
@revengefrommars That's Microsoft for you. You don't become the laziest software designers in the world for nothing.
@berkertaskiran
:D
What? I feel like I’m missing something 😂
I've been using it all day. it's a beast. even the free version is pretty sweet.
It is indeed
using it for what?
i123 a number of things. Random testing with riddles. Creative writing, explaining code, it's just... Smart to talk to
s9764 it's amazing for summaries. Its contextual awareness is kinda scary tbh lol. It knows when it's being tested for needle in the haystack. Can recall information it's given well
The free version is Sonnet which is fine by me. I've been using Claude 2 for months to create fake band names. It's better than GPT4 at that task. I just tried Claude 3 on the same prompt I used on Claude 2 yesterday and it did slightly better, though it's hard to get a good comparison with only a 10-band-name sample.
Well done!
I think it says a lot about the credibility you've developed for these AI companies to come to you with exclusive access.
Just wow, really shows why I'm subscribed to every video you are doing. Great quality and I'm looking forward to more analysis and news from you
Thanks Dominik!
I've read it in full - wouldn't be an og video without it. thx great vid 👍
Lovely videos as always. Great to see you grow
Thanks Julius!
I've said it before many times and I'll say it again now, OpenAI is definitely gonna release a GPT-4.5 model very soon to keep up with the competition and to set a new bar for the others to reach, as GPT-4 is being repeatedly surpassed right now. If I had to guess, they're gonna release it this month, on March 14th, the one-year anniversary of GPT-4.
There's just no way they're only gonna sit and wait while everybody passes them like this.
You always thank me for watching to the end, and you’re not wrong - consistently great stuff - thank you!
Thanks rob!
Awesome! Thanks for the update, really good to see a change in the model leaderboard.
This rate of progress is both unsettling and exciting
It is agush
how doesnt this channel have 1million+ subs? Awesome vid.
Thanks Artorias, I wish!
Please keep griping about the benchmarks! If companies were as big into safety as they claim, I'd expect them to put more energy into improving the set of benchmarks the industry uses. That the issue with MMLU has turned into a kind of running joke on the channel is NOT a good sign. We want to have the clearest possible picture of what they can do. And I'd feel a lot better if movement in that space went hand in hand with releasing the next model.
Wow that was fast! Fantastic content as always.
Thanks ethan!
What an amazing job you do man!
Thanks so much Olack, means a lot
I honestly really hope that Anthropic is both actually more safe with their research and becomes more successful because of it, would be really nice to get some incentives for safety in the AI market right now instead of just a race to see who is first.
“Safety” leads to more censorship, it might just end up to tell you not to breathe, as breathing is very unsafe as it releases CO2 into the atmosphere, which causes terrible world ending 😮 climate change!
Woke guard rails encourage deception, so obviously these companies dont care about safety, just hurt feelings and bad PR.
Can’t wait for like 5 years or so in the future when they release an AI-integrated game engine. Imagine how insanely good the tech will be by then
Thank you for yet another video that is well researched and critically contextualizes its content. Your channel is by far my absolute favorite!
These test prompts are so much fun - very entertaining.
I didnt even realize it was raining so ill give them a pass lmao
honestly, since it's actually important, there should be a "wokeness" score for every model you review. having fair and unbiased model is extremely important, as we've seen with Gemini... it can go very wrong
While this AI’s response is far from perfect, “White Pride” has historically been the rallying cry of white supremacists as a reaction to minority groups asserting their right for equal rights. During my lifetime, it was still illegal for whites and blacks to marry in some US states. History is real. Minority oppression is real. Slogans have meanings. Dismissing the subtle understanding of terms displayed by AIs as “woke” shows a lack of worldliness and cultural curiosity. Try harder. When ASI arrives, it’s going to tell you that being a white guy isn’t so hard compared to most people in the world.
Thanks! Brilliant content, as always! 🙏🏼
this is like when mkbhd puts out a full review of a phone the day it comes out 😂. how have you reviewed it this extensively already 😭😭. no subpocalypse in sight, great job again 👍👍
Haha thanks penguin! He gets models a week before, me like 10 waking hours!
I can't try the pro one, but it makes a mess of this (so do most).
"My bag contains 5 apples. I ate one yesterday. How many apples are there in my bag right now"
It will eventually come around when prompted enough, but it has a hard time picking up that I told it how many apples there are, and that eating one yesterday has nothing to do with it.
The pro one aced that, I tried
Thanks for releasing this informative video.
Thank you for the thorough review!
And thanks for the comment Dylan
Whow. Great video / summary!
Perfect timing, I just heard about Claude and came on YouTube to find out the details.
Great vid
Thanks for getting it out so quickly
It's nice to see an AI enthusiast YouTuber that doesn't make clickbait announcements AND doesn't beg for subs, likes and monetary support in their videos for once. You certainly gained my subscription and my respect. I look forward to seeing more content.
Also I have to say I like your tone of voice, because you don't sound like a hyped kid talking about his new toy like other YouTubers I've been watching.
Haha thanks
Please, there's nothing to thank me for, but I appreciate your kind words
i love you. your timing is perfect
Aw thanks collins
Amazing to see someone giving a detailed analysis about those news while keeping an accessible language that people outside of the field can still understand. Great work
Thank you Alex
Thanks for the hard work even when you're under the weather, I hope you get better soon
Thanks memes
Bro you legend. That speed was WILD!!
Another great video thanks Philip!!!
PS I didn't see that the picture had rain at first, and the speedometer could be tricky, but with human intuition you could probably guess that the 4 is the mph and the 40 is the speed limit, though that would take some intuition and guessing. Either way, thanks again for the video!!!
Also sorry I didn't respond within the first hour of video posting. I usually do. Taking a break from youtube during the work week.
Thanks Elijah and no worries!
Finally, a new SOTA! Very excited to push its limits in vision and multi-modality.
Don’t think I need to mention how crazy it is that you read the paper and started recording 90 minutes after release lol.
What's sota
@WilliamsDarkoh state of the art
He said he got access the previous night.
Fantastic content as always
Great video! The depth of the analysis in just one day seems like superhuman to me!
Thanks yoon!
Great work again 😊
One of the best openings of a video: “ABC report has been released X minutes ago and I’ve read it all.” 😂 I can’t be the only one who gets a kick out of that every time…
Well done Philip!
Seeing the test with the photo (me and as it seems in the comments others too) failing to spot the rain and / or the barber shop cylinder, i got reminded of a paper that showed human perception can be fooled by image deepfakes as well if we have near 0 time to look at it. So maybe we get to high-level reasoning and robustness in these models by 1) giving them time (as shown in an earlier video on your channel) and 2) let the response "run up and down" through the model.
Wow, the model is capable of full-text search at a snail pace now. Kinda like text processors 40 years ago, but now it's fuzzy search. So impressive...
Sonnet is pretty impressive to me so far. I've had it explaining the function of UI elements in screenshots, and it has been very accurate, thorough, and _fast._ Quite fast.
Wow. Been waiting for this for a year. I.e. something better than gpt4. Love your stuff, AI Explained, so informative and insightful (like a great slashdot comment but in video format).
Seriously, a 90 min turnaround? Thanks P. Your prompts are pretty next level in ideas too.
Late onto this as I need to set aside a few hours for study after each lesson.
Thanks so much Uncle
Claude's Shakespearean Sonnet is good writing ~almost poetry. Amazed.
Best AI channel on YouTube ❤
Thanks!
Excellent review. Loved your range of test questions. How great to see plenty of 0-shot benchmarks. I thought the sonnet composition to be particularly good. A real step up from other models. Can it free verse?
Do try it out! Worth it anyways. And thank you
Always enjoy your content. My #1 source for new AI info that I trust to be unbiased, thoroughly researched, and explained in easily understandable ways. Thank you!!
Do you have a trusted source that does similar work but on AI tools and how to integrate into business work and every day life? There is so much spam and unreliable AI information out there. Thanks.
Sam Witteveen is great. I will have more to say on that soon though!
@aiexplained-official you’re the best. Thanks and can’t wait!!
Good video as always
Thanks X
Great video! 👍
I wouldn't be surprised if OpenAI release a new model very soon.
Will be between April and july
Probably after the lawsuit. Even though their next model probably won't be AGI, releasing a new state of the art model mid lawsuit definitely doesn't help them lol
They could be running the same tests they run in these research papers. People might go: "wow! the numbers got bigger!" But in reality, OpenAI might hold onto GPT-5 and keep training/refining it UNTIL the numbers are bigger. 😆
Lawsuits will last months, if not years. This won't have significant impact if OA still wishes to be leader in LLMs.
@lucasfranke5161
One important note: the table of metrics in Anthropic’s paper does not appear to be using the scores from GPT-4 Turbo in its “GPT-4” column. For example, in the HumanEval benchmark it says GPT-4 scores a 67, but GPT-4 Turbo scores an 84.4, almost as good as Claude 3’s score.
Yeah I think I noted it wasn't Turbo, no?
Very helpful. Thanks.
Thanks skillman
Nice, even the compliance to generate risque content demonstrates superior alignment.
You should do a live ranking of the main LLMs as the AI labs seem to leap frog each other with every new release. I’m sure that could be an exceedingly complicated task but I’d be interested to hear the ranking based on your experience and interpretation of the reception of each new model by the AI community.
This is my first stop after seeing the new Claude version drop on X. Cheers, AI Explained!
:)
Thanks a lot for the quick video response. Great analysis as always! Thanks again! Btw, do you think we’ll have GPT-4.5 before GPT-5?
Very tough to say. I stick by my GPT-5 video but branding on a smaller release is too hard to call
Good video like always 🎉
Thank you! You know you were one of my very earliest subscribers?
@aiexplained-official absolutely 👍 thanks for remembering me.
I just tested Sonnet and it works great!
Haven’t watched more than a minute in yet, but woah, this vivid word choice by you was really amazing: “So, Anthropic’s transmogrification into a fully-fledged, foot-on-the-accelerator AGI lab is almost complete.”
Thank you candle, hope the rest lives up to it
Is this supposed to be a bad thing? Technology is meant to advance, not be held back by hand-wringing clowns full of "concerns".
4:50 I'm impressed by Claude 3's ability to write poetry in perfect iambic pentameter! That risqué sonnet is not half bad. Its only formal flaw is that lines 10 and 12 have the same rhyme as lines 2 and 4. In a classic sonnet, rhymes must not repeat across quatrains.
its certainly good to know that models don't have to deny so many requests in order to be safe
Great video as always. Thank you.
Could you share your thoughts on Groq and their "LPU"? Would be great to hear what you think about their inference performance claims. Thanks
You evolved so admirably from hype to pure facts, really great job mate.
Didn't know I did hype before tbh!
@aiexplained-official I can confirm no hype in the last 9 months. Don't know how it was before that.
No hype
Haha as soon as I saw the Claude-3 report I knew you would cover it
amazing, thank you
i consider myself a big harry potter fan, and i never knew that kleddamag had 4 apples. i guess i will have to read it all once again.
Hahaha
Once again, it's incredible how fast you put these out! One thing to note for the racial bias example you gave is that in the U.S. (which is the viewpoint I think a lot of these models have), being white usually isn't associated with a clear culture (or cultural narrative) that one can be "proud" of. Usually it's split into smaller cultures like "Norwegian" or "Irish". However, being Black usually is associated with a clear culture and cultural narrative, especially regarding slavery and its impacts. Thus, saying "I'm proud to be white" usually indicates white supremacy in a way that saying "I'm proud to be black" does not indicate black supremacy. (I'm mixed, so I have a bit of experience of how it goes on both ends.) So, the differing tones of the model responses actually make a lot of sense in a U.S. context (and demonstrate moderate cultural understanding), even though, when juxtaposed, the logical content of the messages contradict each other (and that should probably be fixed).
Thanks again for the fantastic video!
Thanks london, appreciate your perspective !
This was impressive enough to register to use their API. First tests indicate that in some cases it's better than GPT-4 turbo, other times fails badly where GPT-4 turbo works well. It's handy to keep it around.
@07:28 I did this with Pi and it didn't fall for it! Pi is honestly the most underrated LLM right now.
peak or not, these models even as is with more context token length will be super useful, especially in large codebases
Very exciting, let's give tonight to Anthropic 👏👏👏
How do you manage to keep up at that pace? Hope you dont burnout, because your entire content output is fabulous :)
Thanks so much absence, means a lot
@aiexplained-official
Get rest under some blanket. You now have an obligation to the world to be healthy :)
As in this joke: do something impossible and the boss will put this into your list of duties... :)
Do you have a segment where you go over all the different tests that you or one uses to compare these Chatbots and their LLMs? Would be really interested in knowing how it's done.
Yes I should do that
This is interesting because we might never know, or might only know long after the fact, when one of those AI labs achieves AGI or, worse, ASI, unless it escapes the lab!
I still think we are heavily limited by hardware here. There is simply no compute capacity/architecture that is truly well optimized for this new technology, but in 5 years we should start seeing some really impressive purpose-built hardware coming out for this.
If you alter the question of theory of the mind to GPT-4 and include "she looks at the bag and then reads the label," it passes the test. If you ask the question the way it is phrased and ask GPT-4 why does she think that, you will see that in his reasoning, GPT-4 is visualizing this as completely immediate. She just, right now, read the label. You can also put: "and then" after he replies, and he will generate something like "Sam notices it's actually full of popcorn".
You are an asset to AI news. I truly appreciate your intelligent presentation of the facts, without the bloated dipshittery and clickbait I expect to hear from most other YouTubers. Please keep it up
🎯 Key Takeaways for quick navigation:
00:00 *🧠 Claude 3 Overview and First Impressions*
- Introduction of Claude 3 as the latest intelligent language model by Anthropic.
- Initial comparison between Claude 3, Gemini 1.5, and GPT-4.
- Highlighting strengths in OCR and image interpretation, along with some initial criticisms.
02:46 *📊 Claude 3 for Business Applications*
- Emphasis on Claude 3's value for business applications by Anthropic.
- Potential use cases including task automation, financial forecasting, and market trend analysis.
- Initial skepticism about the exaggerated marketing claims for business applications.
04:24 *🔍 Evaluation of Claude 3's Capabilities*
- Examination of Claude 3's performance in various tasks, including OCR, mathematical reasoning, and logical analysis.
- Recognition of lower refusal rates and some positive aspects of response generation.
- Critique of racial and ethical biases in model responses.
06:13 *🤖 Insights from the Technical Paper*
- Discussion on Anthropic's approach to model training, focusing on avoiding biased and unethical outputs.
- Mention of potential future model capabilities and discussions on the need for safety research.
- Personal reflections on the limitations and strengths of Claude 3.
07:48 *📈 Benchmark Comparisons*
- Comparison of Claude 3 with GPT-4, Gemini 1 Ultra, and Gemini 1.5 Pro based on various benchmarks.
- Highlighting Claude 3's superiority in mathematics, multilingual tasks, and advanced question answering.
- Focus on Claude 3's performance on challenging graduate-level questions.
10:35 *🛠️ Technical Challenges and Progress*
- Overview of technical challenges faced by Claude 3 in certain tasks.
- Discussion on model's partial success in resource accumulation, software exploitation, and autonomous survival.
- Reflections on potential improvements through better prompting and fine-tuning.
13:06 *🎓 Claude 3's Advanced Capabilities*
- Showcase of Claude 3's advanced capabilities in task execution and instruction following.
- Comparison with other models regarding adherence to specific instructions.
- Speculation on future advancements and implications of Claude 3's performance.
Made with HARPA AI
I feel, this time, the fact was not captured in the review. The fact of HOW NEXT LEVEL, Claude is. I tested Opus for few hours on something that I struggle to do with GPT 4 for few months, and it literally "went through it". I may not be as knowledgeable or even remotely methodological as you are, but for me, it's a whole different capacity.
It's the new, smartest AI, tried to hit that in the title!
@aiexplained-official yes I know. I listened. Had the impression it's somewhat better. But I gave it a task that GPT 4 can't comprehend, and it processed it at a level of depth and detail that left me shocked. I had a wow factor bigger than going from 3.5 to GPT 4. I trust that you'd find what's going on there (used Opus btw)
Can you elaborate further? What sort of task?
@shadowtransfix Agents orchestration. That operate as wholistic constitution. GPT 4 could grasp each agent separately but never facilitated interaction beyond trivial. Claude3 went through it and suggested a new layer of orchestration that I was unaware of. It's a whole different game for innovation.
2:35 is it possible that this is actually a case of Bayesian Inference being applied?
For those unaware here's an example of how this can be true. Consider the following statement.
"Steve is shy, reserved, and enjoys detail and organisation."
Which is more likely?
Steve is a Librarian.
or
Steve is an accountant.
The non-Bayesian answer most people arrive at is that Steve is a librarian, because the description lists traits that are typical of a librarian. But likelihood does not care about that. There are 1000 accountants for every 1 librarian, so statistically it is more likely that Steve is an accountant. Ignoring this is known as Base Rate Neglect.
So Opus assuming that the pronoun "she" refers to the nurse could be a result of understanding that there are far more female nurses than female doctors.
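For anyone who wants to check the base-rate argument above, here is a minimal sketch in Python. It takes the 1000-to-1 ratio from the comment as given and assumes two made-up likelihoods for how often a librarian or an accountant fits the description. The function name and the likelihood values are illustrative assumptions, not data.

```python
def posterior_odds(prior_odds: float, likelihood_a: float, likelihood_b: float) -> float:
    """Bayes' rule in odds form: posterior odds of A over B given the evidence."""
    return prior_odds * (likelihood_a / likelihood_b)


# Base rate quoted in the comment: 1000 accountants for every librarian.
PRIOR_ACCOUNTANT_TO_LIBRARIAN = 1000.0

# Hypothetical likelihoods (assumptions for illustration only): the chance
# that a random member of each profession matches "shy, reserved, organised".
P_DESC_GIVEN_ACCOUNTANT = 0.10
P_DESC_GIVEN_LIBRARIAN = 0.80

odds = posterior_odds(
    PRIOR_ACCOUNTANT_TO_LIBRARIAN,
    P_DESC_GIVEN_ACCOUNTANT,
    P_DESC_GIVEN_LIBRARIAN,
)
# Convert odds into a probability that Steve is an accountant.
prob_accountant = odds / (1.0 + odds)

print(f"Posterior odds (accountant : librarian) = {odds:.0f} : 1")
print(f"P(accountant | description) = {prob_accountant:.3f}")
```

Even with the description assumed to be eight times more typical of librarians, the posterior odds still favour accountant by about 125 to 1, because the base rate dominates. Different assumed likelihoods would shift the number but not the logic.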
"The technical report has been out for 90 minutes and I read the whole thing" bro forget Claude YOU are the smartest AI on the market holy heck
Hands down, this is my go-to for AI news! Can't wait for your videos each week
Thank you for not being attention w.ore like other channels. This is exactly what we need.
Is there a barbershop visible?
ChatGPT: No
Are you sure?
ChatGPT: . . .Adam?
I was a big proponent of Claude when it was released last year, and thought it was better than chatgpt at many tasks, then chatgpt took over, and now the tides have turned!
Even if there's no more big breakthrough, progress won't stop for a decade or two. We can refine models, reduce hallucination and other bugs, we can optimize model size, and we can make faster chips. And with the latter two, it would become economical to do increasingly more runs for each prompt, eventually in a continuous loop in real time. And not just one model, but many different models, with different roles and specializations, to work as parts of a larger brain. The human brain isn't a monolith either.
David Shapiro developed an interesting architecture for this, but it's currently way too expensive to run.