RT-X and the Dawn of Large Multimodal Models: Google's Breakthrough and 160-Page Report Highlights
- Published 2 Oct 2023
- A huge new insider report on GPT-4 Vision has been released by Microsoft, and just in the last few hours Google has dropped the RT-X series in robotics. I will not only break down the 160+ page report on what GPT-4V can and can't do, including new use cases, prompting techniques, and failure modes, I'll also go through the full RT-2-X and RT-1-X demo, which I am calling the GPT-2 moment for robotics. Plus its huge new open-source Open X-Embodiment dataset.
/ aiexplained
RT-X Series: Paper: storage.googleapis.com/deepmi...
Blog: www.deepmind.com/blog/scaling...
Github: robotics-transformer-x.github...
The Dawn of LMMs: arxiv.org/abs/2309.17421
GPT 4 vs Pali 17B: openai.com/research/gpt-4
GPT V Inception: / 1
PaLI-X: arxiv.org/pdf/2305.18565.pdf
PromptBreeder: arxiv.org/pdf/2309.16797.pdf
Wozniak Definition: en.wikipedia.org/wiki/Artific...
Gobi from The Information: www.theinformation.com/articl...
/ aiexplained Non-Hype, Free Newsletter: signaltonoise.beehiiv.com/
Thank you for your critical analysis of the papers and not just blindly regurgitating them! Awesome channel!
Thanks c4
I totally agree! We all really want this to work and it's good we remember that our wishes can alter our perception.
@@aiexplained-official 17:23 Eight people ARE wearing helmets; just three are not wearing them on their heads. If the shot were 8 people, 4 with shoes on their hands and 4 with shoes on their feet, what would the correct answer be to the question "how many are wearing shoes?"
To give the correct answer the question needs precision; human fashion persistently attempts to subvert clothing use, and the training set will reflect that.
Yeah there are too many dum dums that don't know their hole from their elbow just reading headlines and AI summaries with the same video name "This AI will CHANGE EVERYTHING" bleh
@@steve.k4735 The question was on the topic of work safety, and work safety is way more defined than fashion.
Only two months later... imagine how fast AI news is going to go NEXT YEAR
This becomes even crazier when one realizes all of this work on GPT-4V was done last year before ChatGPT was released and before the AI race truly started. One can only imagine what the model capabilities are now.
Let's hope Gemini is potent enough for OpenAI to speed things up ;)
Makes one wonder about all the cryptic slow-takeoff tweets Sam has made on twitter. Perhaps they're seeing more than just sparks of AGI internally.
they have achieved AGI (Gobi) internally
@@fintech1378 Jimmy Apples, is that you?
@Low_commotion ChatGPT 4 released early this year. it was the “sparks of AGI” imagine what they will be capable of today, next year, two years, five years. truly going to see some insane improvements
I can't imagine the amount of work that you do behind the scenes researching etc. Thank you as always
Chatgpt dude
maybe he uses claude2 to read the pdfs
Exactly hahaha, it's so much easier to do videos like this thanks to ChatGPT, but in that case I would rather use Claude 2 @@riazr88
He is brilliant. 😳
@@riazr88 I take it as a compliment that people think I use AI to read that much detail
Your ability to be an engaging and informative source of AI news for us lay folks is amazing. Keep up the good work.
Thanks so much alkenny! Glad I can help even a little
Would never expect to get all hyped for a 20 minute video of guy reviewing research papers but here I am
This is the way
@@aiexplained-official This is the way.
This is the way.
Thank you for your reasonable and accurate critiques of new AI developments, it's quite nice and healthy to have a voice sharing both the great innovations and the fundamental issues still occurring
Thanks luc
Had a conversation with GPT-4 the other day, where I tried to tell it it didn't have to use "is there anything else I can help you with?" at the end of every sentence. It told me it would stop, but in the same sentence asked the same question at the end. So I replied with "no, is there anything else I can help you with?", and it then told me it was here to help me, with the same question again at the end. So I just replied "ok, is there any... etc", and this went back and forth a couple of times until it said "haha, I see your point now etc" and stopped asking me for that conversation.
Haha nice one buttpub
I agree with you that the researchers read too much cause-and-effect reasoning into GPT-4V's answer to the penalty-shootout stills.
Your presentation is excellent, as usual. I was particularly impressed that you noticed the (from what I could see) subtle flaw in GPT-4V's Python code in figure 36, which the researchers themselves had missed. That you can maintain that level of critical alertness throughout a >160-page technical report without glazing over is exceptional.
Keep up the fine work!
Thanks howto. Yeah this week has been a lot of 16 hour days but worth it when I know people are deriving value
I'm very impressed that you are able to simultaneously keep up to date on the cutting edge while providing in depth analysis. Thank you for your persistent work! :)
Thanks ethan, literally a 15 hr day on this one.
@@aiexplained-official GAMER
having even a rudimentary arm that can integrate with some sort of a camera and being able to give it instructions like "move that screen so I can see it" could be a genuinely life changing thing for many people with motor neuron disease or similar. Exciting times!
Great point
Fuck Me this is incredible… I assumed I would have to wait a week before you did a video on GPT4 vision, the quality of your videos and the speed of release is awesome! Keep up the great work
Couldn't help it. This is crazy stuff
I don't know if you're gonna read this, but at a Biomedical Engineering seminar I attend at my university, we had a debate session about the role of AI in the future. I feel like I had the largest knowledge base of anyone there thanks to your videos, and was talking about things like self-prompting and open-source models such as LLaMa and Orca while the others were stuck on ideas like self-driving car pedestrian dilemmas and Ex Machina humanoids. I don't know if you understand how incredible and important your channel is, and I cannot thank you enough for live coverage of what I believe to be the most important saga in human history besides the genesis of agriculture. I'd say this is one of the most important channels on the platform.
That's lovely to hear. I bet you had a 101 contributions to make! You are probably 5 steps ahead of even a generally informed audience and 10 of the general public!
The AI vision stuff is the craziest to me. I can't wait until it's widely available!
This guy never disappoints! True Chad.
Thanks Vedantin
You are the only AI channel that actually goes in-depth, I wish there were more people like you making videos like this!
Yep. No hype just raw analysis
Wes Roth sometimes does this too. But he is not as fast and crisp as Philip.
@@a.thales7641 Yeah
@@theWACKIIRAQI Very true
I'm glad I found this channel. You know your stuff. You basically peer reviewed this paper.
Thanks thanos, yeah I guess
I agree, kudos to Philip! And yet - isn't it traditional to have someone peer-review your paper *before* publishing?
@@jeremydouglas1763 not on arxiv
I hadn't realised that. So perhaps it's not surprising to find a substantial number of errors.
Oh... 18:47 That just gave me a very vivid glimpse to a future that we're surprisingly close to. I am in awe, bewildered, maybe a little frightened. Regardless, I'm excited for this future, and as ready as I'll ever be for the amazing changes and quality-of-life improvements this ever-improving technology will add to our lives.
Thank you for pointing out how many errors the GPT4-V(ision) report just ignores. I read some of it myself before the episode and was quite appalled what they let the model get away with. A friend said it must have been written by GPT-4V's mom
Technically your friend is not too far from truth
I have a feeling the next Boston Dynamics demo is going to be as mind blowing as it is exciting.
Amazing video as always.
By the way, there were 8 people with helmets, 5 of them actually wearing them. So it still counted them, which seems to mean that it could kind of count, but had a problem identifying when someone was actually wearing one or not, which I found quite interesting.
RT-X just came out like 4 hours ago, calm down man, calm down! (Kidding)
Haha I know, literally like 5 hrs before video
THANK YOU FOR ANOTHER GREAT VIDEO!
Thanks so much Ask!
Man your videos are so amazing. Thanks a ton! I love how thorough you are and go into detail and clearly actually read all theser papers and put a lot of thought in your summaries and analysis. I also love your humor! Have a great day!
Thanks Jl
7:12 each video you make there is something specific that blows me away about AI. On this one I’m blown away that it only counts the correct amount of apples if you ask in some very specific way.
It’s hard for me to understand why that would be.
Just wait until RT-D2 comes out.
193K and climbing, all because you are probably the best channel for legitimate info. The work you put in, and the testing you do, I can only imagine how much time and resources it takes, thank you!
I’ve said it before and I’ll say it again…this level of content quality shouldn’t be free, yet I’m so glad that you do so.
Thank you for what you do.
Thanks cali, just glad to have you around!
"Ain't nobody got time for that thing" might be my favorite bit in any of your videos
So glad he didn't do the standard clickbait voice that so many channels do by giving themselves a free pass to spew whatever a paper says without actually critically analysing it. Kudos!
Thank you for always keeping the rest of us up to date! This is insane, emergent spatial skills? What a time to be alive!
Great episode as always. Just want to note that the fracture in the foot is indeed in the 5th metatarsal, but it is not a Jones fracture, so there's still some room for improvement.
Oh wow, did the paper miss that as well?!
It just keeps getting better 🤯
Couldn't click faster than light...my evening is complete
It’s insane how fast this field is progressing 😵💫
Always a pleasure. I really appreciate that you read the papers and break them down. This makes you the most insightful AI news channel
Thanks Jack
I always enjoy your videos: just the right level of detail and length for a non-expert such as myself to feel like they have some grip as the world of AI rushes past 😅. And I appreciate your humor. What a gangbuster this week!
Amazing work yet again! Thank you
As always excellently covered Ai news. Very impressive that you managed to spot more mistakes in some of the papers and actually read all of them rather than ask GPT4 to summarise them :)
Man, I look forward to your videos more than any TV show :) Thank you for all the effort you put into making your content. Keep up the great work! 👍
Thanks Murat, very kind. I find these papers almost as interesting as Succession
14:46 I haven't tried that exact example, but I have been able to get GPT to ask questions before answering.
I just used
"ask for any needed clarification before answering the prompt"
so I'm sure there are better prompts we can figure out, but it asked good questions. I'd bet there is a way for us to build that into a prompt chain to further optimize its abilities on more ambiguous topics.
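The prompt-chain idea above could be sketched as a small wrapper that front-loads the clarification instruction as a system message before the user's question. This is a minimal illustration, not any API the video describes; the function name and message format are assumptions, following the common chat-message convention of role/content dictionaries.

```python
# Hypothetical sketch: front-load a "clarify first" instruction so the model
# asks questions before answering ambiguous prompts. The instruction string
# is the one quoted in the comment above; everything else is illustrative.

CLARIFY_INSTRUCTION = "Ask for any needed clarification before answering the prompt."

def build_messages(user_prompt: str) -> list[dict]:
    """Return a chat-style message list with the clarification
    instruction as a system message ahead of the user prompt."""
    return [
        {"role": "system", "content": CLARIFY_INSTRUCTION},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("How many apples are in the image?")
print(messages[0]["content"])  # the clarification instruction comes first
```

A chain would then feed the model's clarifying questions back to the user (or to another model) before requesting the final answer.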
For sure
Your videos are always top quality. Thank you for your hard work!
Thanks coldest
Liked and subscribed, the amount of dedication in the making of this video is clearly visible and enjoyable! Thanks for your good work!
Thanks clubter
Thank you for the great video - I hope soon we will have an access to updated GPT4.
your consistency is amazing. thank you so much for doing this!
Thanks oro
Great as always! BTW The link you've added beside "GPT 4 vs Pali 17B" in the description just takes me to the GPT-4 announcement by OpenAI, with no mention of / comparison with Pali 17B
Gotta scroll down to the VQA table
Thank you. Your ability to read, understand then explain these reports is simply amazing. I thank you.
Thanks richard
Thanks a lot for the RT-X news. I am a student researcher and the timing of this news is so crucial as I am working on something similar.
So funny how GPT needed the prompt "you are an expert in counting" to get the apple count right. Even AI needs a good pep talk to build confidence!
To me the most informative AI channel on YT, thank you for your work!
Thanks solo!!
this is the best channel for staying up to date on perhaps the most important innovation work of our lives. thank you!
Thanks so much remboldt
So. Many. Ideas. Your coverage inspires me and probably many others.
Thank you for all the work you and the team put into explaining these updates and their importance, Philip. It must become daunting at times with the release of so much information in a tight time frame. Again, thank you to everyone contributing to these updates. Peace
For me, even though I agree this is an absolutely huge advancement on the field, it still _feels_ just a somewhat small improvement on reasoning. However, it does _feel_ that it had a HUGE LEAP in memorization of ever finer and nuanced details. Which, in turn, blows my mind into atomic pieces for how far memorization can get it to go.
16:05 Agreed with you. Thanks for the analysis. Makes this video have even more value
This was great, thank you for explaining these exciting developments!
Thanks alex
“You are an expert in counting” LOL All it needed was a confidence boost! 😂
Good summary and critique. A lot of interesting new information and developments with enough grains of salt sprinkled in.
“ain’t nobody got time for that kinda thing” 💀💀
Great analysis, thanks for sharing ❤
Literally, my favorite AI channel
It would be cool to see a deeper dive. The details here look more important than the info in an average week.
When you can barely finish analyzing the new findings of one paper before the next massive one is dropped, you know the singularity is creeping closer.
Thanks for this summary
"When asked to identify Jensen Huang, the model replied, 'That's my Dad.'"
you're freakin amazing man - i love your analysis and you even critique other researchers
Thanks for the amount of time and effort you put into these videos to make it simple for non techies like us Philip 🙏🏿
Thanks so much Sola
Amazing breakdown as usual, thanks.
Thanks arash
Thank you for going direct to source for information
Amazing, thank you!
Thanks gilad
I think that the use cases for multimodal AI assistance are limitless. But one particularly appropriate thing would be something we've had in scifi for decades - automatically identifying and describing things in view, via video feed, scan or AR glasses, and advising accordingly within the context of [insert activity here].
Fabulous roundup.
Thanks kevon
0:00 Intro
0:45 DeepMind Robotics Report
5:13 Explorations with GPT-4V
Terrific amount of work.
Thanks for existing, dude
Ha!!! I have clear evidence that AI Explained is written & voiced by GPT4...purely joking ofc, my respect runs deep.
Thanks cedric
Absolutely crazy stuff. We are approaching the singularity's edges. A fire hidden upon the deep.
Excellent video as per usual
Thanks rusty
Great video! One nit: 3 isn't _the_ solution to the "seventh root of 3^7". It's only one of _several_ roots. Another root is 3·e^(2 π i / 7). For 3 to become the only solution, you need to ask solely for the principal root.
I look forward to every video you make. Keep 'em coming!
Yeah I thought of caveating with 'real root' or suchlike but then thought that was going too deep. But glad you are checking!
This video gave me the CRAZIEST goosebumps. It's so exciting. WOOWW WHAT ! Thank you for sharing this. This video inspires me so much.
Never stop uploading videos. Please.
I'll try to keep at it
Progress is still accelerating. Mind blowing stuff.
I'm a complete noob to the world of ML Research but thanks a lot for making papers palatable for laymen like me. I get excited every time you upload a video.
If you're watching & getting this, you are no complete novice imho. "Educated/clever layman" imo.
@@UncleJoeLITE lol thanks
Truly amazing
Well done. Thank you sir
Thanks for the video.
Wild times indeed!! My tests with MS Autogen are mind blowing, to say the least! As usual, Thanks for you impeccable work
Speaking with the authors soon
Thanks! Excellent content, as always. 🙏🏼
Excellent analysis! Thank you, as usual! ❤
Thanks lux
@@aiexplained-official Are you active elsewhere too? It would be good to follow you elsewhere as well. PS. And thank you for taking time to comment :)
Thank you!
One interesting use case I think for the "emotion recognition" from pictures, could be used as an aid for those who struggle to read emotional cues! That combined with smart glasses could give neurodivergent people who find social interactions difficult, a real time help in decoding what's going on
Yeah great point
Thank you so much for another excellent video!
Thanks paco!
Ok...it's 1700 here now, so I can concentrate...hard! Thanks from a rainy Canberra.
_Goto explanations of insanely complex stuff, perfect for non-expert AI ppl like me._
Totally brilliant exposition … thank you so much … Cheers ... Syd Geraghty
Thanks Syd
Thanks for this. Really appreciate it!
Thanks Blooper
This is really starting to look like the beginning of the end. Really appreciate your efforts in getting all this info out there!
Err...thank you?
Really great video , thank You for great work.
Awesome video!
Thanks DBX
9:49 "Standing *at* a podium"
A podium is what you stand *on*. The structure holding his notes and microphones is a lectern.
Nice spot
Thanks for another great vid.
Thanks John
You are amazing thank you
8:28 I think there is a good reason for that ability to understand what an arrow is pointing to. It makes sense considering the training data most likely contains images with arrows. I don't think we draw arrows that end on the object they're pointing to, so the model would've picked up that when an arrow is pointing at something it's not referring to the object under the arrow's pointy end, but to the nearest specific object that intersects with the path the arrow is pointing at.
Yes, and that's called in-sample validation, and is basically considered cheating when you want to report any "impressive" results. It doesn't bother the paper's authors though. Maybe they should run their paper through GPT-4 to find all the bias and baseless assumptions in it.
Well done