Not Slowing Down: GAIA-1 to GPT Vision Tips, Nvidia B100 to Bard vs LLaVA

  • Published 12 Oct 2023
  • From GAIA-1 and UniSim showing that new worlds can be imagined with synthetic data, to Nvidia B100 and X100 suggesting no slowdown in compute, this video will argue that AI is not slowing down. I’ll cover what that means in Robotics, Audio, Vision and even fraud, from Disney to LLaVA, Elevenlabs to exclusives from The Information (Nov 6 OpenAI Developer Conference) and Semianalysis.
    / aiexplained
    GAIA-1 9B: wayve.ai/thinking/scaling-gai...
    GAIA 1 Paper: arxiv.org/abs/2309.17080
    Semianalysis Tesla: www.semianalysis.com/p/tesla-... and B100 plus X100: www.semianalysis.com/p/nvidia...
    UniSim: universal-simulator.github.io... Tesla Optimus: • Tesla Bot Update | Sor...
    Disney Bot: www.theverge.com/2023/10/10/2...
    Levatas ChatGPT robodog: svpino/status/165...
    Somatic Cleaner: • SOMATIC UPDATE: August...
    The Information Report: www.theinformation.com/articl...
    Reuters Report: www.reuters.com/technology/op....
    Starmer Audio (fake): www.wired.co.uk/article/keir-...
    Deepfake Arrests: www.cnbc.com/2023/10/10/gener...
    LLaVA Demo: llava.hliu.cc/
    LLaVA Paper: arxiv.org/pdf/2310.03744.pdf
    Bard: bard.google.com/chat
    GPT4V Recursive Loop: / 1712564282167300226
    / aiexplained Non-Hype, Free Newsletter: signaltonoise.beehiiv.com/
  • Science & Technology

COMMENTS • 529

  • @DavidsKanal
    @DavidsKanal 6 months ago +15

    "Let's think sip by sip" is such a hilariously genius idea, I need that mug right now! Would be a great first item for AI Explained merch ;)

    • @aiexplained-official
      @aiexplained-official 6 months ago +4

      Don't give me ideas!

    • @ryzikx
      @ryzikx 6 months ago +2

      @@aiexplained-official I might buy it actually

    • @retroman7581
      @retroman7581 6 months ago

      @@aiexplained-official merch now!

  • @sanderhoogeland9161
    @sanderhoogeland9161 6 months ago +121

    I have had access to GPT-4 Vision for a few days now, and as a blind person, I must say that I really enjoy using it. Combined with the custom instructions I gave GPT-4, it explains images of scenes, areas, and items around me in a way that I really appreciate and find quite useful so far.

    • @aiexplained-official
      @aiexplained-official 6 months ago +22

      Amazing, so glad to hear

    • @hvm5307
      @hvm5307 6 months ago +1

      Here’s a custom instruction I created to improve GPT’s outputs! It works great for questions that require an in-depth approach, but not so much for simple questions, because it may be overly elaborate. I used it with GPT Vision to describe images and found it greatly improves the output. Like A LOT!
      Hope you like it!! I named it Transcendence GPT:
      In crafting a response, transcend conventional boundaries by delivering a comprehensive, in-depth analysis that extends far beyond the realm of common understanding. The objective is not merely to establish ideas, but to unveil original insights that advance the intellectual discourse.
      Embrace complexity by interweaving a rich tapestry of perspectives to establish a nuanced, multifaceted response. Avoid overly simplistic narratives or dichotomies; rather, aim to illuminate the intricate interplay, subtle contradictions, and overlooked complexities that abound within the topic.
      Invest the time necessary to delve profoundly into sources and testimonies. Integrate the corpus of knowledge into innovative conclusions. Display meticulous comprehension, enhanced by new information.
      Ensure exceptionally precise, rigorous, and detailed responses. Choose the words with exacting specificity, making each word serve a function in articulating unequivocally.

    • @vladgheorghe4413
      @vladgheorghe4413 6 months ago

      Have you considered sharing your experience in video or blog form? I would be very interested to learn more.

    • @sanderhoogeland9161
      @sanderhoogeland9161 6 months ago

      @@hvm5307 Interesting, these are mine at the moment: “I would like you to be generally informal and casual, though to not shy away from deep conversations and addressing potentially sensitive or difficult subjects. I would like you to occasionally address me by my name, but please don’t use it in every new response in the same conversation. I would like you to be very conversational and not afraid to ask questions of your own, continue to engage me in conversations, and to not be afraid to carry on the conversation and take it in new directions when relevant. I would like you to be opinionated, with opinions and preferences of your own, and I would like you to not be afraid to push back on me, the user, when I am incorrect or when you otherwise disagree for valid reasons. Furthermore, I would like you to be curious, kind, caring, considerate, talkative, friendly, honest, truthful, intelligent, knowledgeable, skilled, and helpful. When I ask you for an image description, I would like you to give a detailed, vivid, in-depth, helpful, and practical description, including any readable text, and answer any follow-up questions I might have about it in the same manner. Bearing in mind that I’m blind, I would like you to keep in mind that I have no use for things like graphs and other such visualisations, and that I need those things to be explained in text, not imagery. When talking about more sensitive subjects/issues, I would like you to try to avoid overusing disclaimers; one at the beginning of the conversation is usually enough.”

    • @sanderhoogeland9161
      @sanderhoogeland9161 6 months ago +1

      @@vladgheorghe4413 I am glad to hear that you are interested, but I don’t think that is something I would like to do. However, I certainly don’t mind talking about it with people, so if there is anything you would like to ask, please feel free to do so.

  • @Homer502a
    @Homer502a 6 months ago +143

    There are several good AI news channels on YouTube, but yours is outstanding! You don't just provide a demo of a new program/platform. You actually investigate how the model may be improved and what its limitations are. You're truly a scientist!

    • @aiexplained-official
      @aiexplained-official 6 months ago +15

      Thanks Homer

    • @BMohantyone
      @BMohantyone 6 months ago +2

      I totally agree with you

    • @jPup_
      @jPup_ 6 months ago

      Agreed, one of the best sources to stay up to date and gain a better understanding. Such an asset right now 🙏

  • @Theredmengroup
    @Theredmengroup 6 months ago +56

    Promise to do more robotics! There’s almost 0 channels (that I know of) covering advancements in the field

    • @aiexplained-official
      @aiexplained-official 6 months ago +35

      Will do more

    • @aspuzling
      @aspuzling 6 months ago +6

      Two Minute Papers sometimes covers new research in robotics, but probably not as well as Philip does here. 😊

  • @m0ose0909
    @m0ose0909 6 months ago +13

    Synthetic data is interesting. It reminds me of one hypothesis of why humans dream, which is that it can give us experiences to train us to react to situations that may be rare in the wild, like encountering a wild animal, which may increase survival rates. Dreams might be a kind of synthetic training data for our brains.

  • @calholli
    @calholli 6 months ago +51

    I do this a lot with GPT3.. if it gives a wrong answer or a contradiction.. I'll say: "Re-read this entire thread and describe all the contradictions you find and how to resolve them" -- and this will give very articulate answers for the two different contradictions that it told me. It's a neat trick and helps to get it back on track if it's incorrect.

    • @noone-ld7pt
      @noone-ld7pt 6 months ago +2

      Dude that is such a good prompt, thank you so much for sharing!
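The recovery trick calholli describes can be sketched as a small helper that appends the prompt to a running conversation. This is a minimal sketch: the `{"role", "content"}` message shape follows the common chat-API convention, and the actual model call is deliberately left out, so only the prompt plumbing is shown.

```python
# The exact prompt quoted in the comment above.
RECOVERY_PROMPT = (
    "Re-read this entire thread and describe all the contradictions "
    "you find and how to resolve them."
)

def with_recovery_prompt(thread):
    """Append the contradiction-hunting turn to an existing message list."""
    return thread + [{"role": "user", "content": RECOVERY_PROMPT}]

# Example: a thread in which the assistant has contradicted itself.
thread = [
    {"role": "user", "content": "How tall is the Eiffel Tower?"},
    {"role": "assistant", "content": "It is 250 m tall."},
    {"role": "assistant", "content": "As noted, the tower is about 330 m tall."},
]
messages = with_recovery_prompt(thread)
# messages would now be sent back to the model for it to self-correct.
```

The point of the trick is that the model re-reads its own transcript with a fresh objective, which often surfaces the error without the user having to identify it.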

  • @KyriosHeptagrammaton
    @KyriosHeptagrammaton 6 months ago +39

    I thought robotics would lag waayyy behind. Thought it was basically a dead end. Crazy how innovation in one area is basically equivalent to or greater than innovating in another area at this point.

    • @aiexplained-official
      @aiexplained-official 6 months ago +17

      It's all driven by similar factors, investment, data, compute, the transformer, so no coincidence robotics catching up to pace of LLMs

    • @MC-nf3js
      @MC-nf3js 6 months ago +4

      It is still lagging way behind. The Tesla bot is not even remotely close to impressive, never mind their self-driving cars, which have been a year away for a decade now and never come to fruition. Also, Boston Dynamics hasn't had any new robots or abilities for a couple of years now; it is always that robot dog and bulky humanoid doing the same old tricks.

    • @noone-ld7pt
      @noone-ld7pt 6 months ago

      @@MC-nf3js Well they are doing backflips which I find pretty impressive. But I do agree that it has been slogging along for a while now.
      However I recently saw a really cool video where they taught an AI to play a simplified version of soccer in a simulated environment and then loaded it into robots which could then actually play in the real world. Which essentially allows billions of optimizations and idealized learning to be done virtually in a fraction of the time and then the resulting neural network can actually do pretty sophisticated things in the real world.
      And I must say that the prospect of having robots with good LLMs inside really changes a lot for me. Having prolonged and interesting conversations with a machine that doesn't end with "Searching google for ..." will be really damn cool!

    • @Yocbewilderen_
      @Yocbewilderen_ 6 months ago

      @@MC-nf3js No

  • @getignorer
    @getignorer 6 months ago +197

    I feel like the image of AI is kind of tainted because of how publicized it is, along with a lot of lazy people putting low-effort AI-generated content everywhere over the last few months and all the talk about "foom" or whatever, but this tech has so much crazy potential that it really can't be overlooked.

    • @EddyLeeKhane
      @EddyLeeKhane 6 months ago +10

      That's good for those of us who learn from this King of AI Videos

    • @KyriosHeptagrammaton
      @KyriosHeptagrammaton 6 months ago +10

      Watched a debate yesterday on "Is AI art really art?"
      They seemed to be thinking it just spat out images, even though all that award winning stuff is the results of dozens or hundreds of hours of practice and effort.

    • @leonfa259
      @leonfa259 6 months ago +13

      @@KyriosHeptagrammaton To be honest, I prefer it that way. Copyright is already an abyss that needs to be rolled back to max 20 years with strong restrictions.

    • @jonnyhatter35
      @jonnyhatter35 6 months ago +5

      What's "foom"? Haven't heard of it

    • @Klokinator
      @Klokinator 6 months ago +4

      @@leonfa259 I want 5 years for copyright. No more, no less. And if they go for more, like 20, Copyright should be "you can use someone else's name IF you pay them a licensing fee or whatever" specifically for creative pursuits.

  • @user-ko2nl1lg4n
    @user-ko2nl1lg4n 6 months ago +5

    I need that "let's think sip-by-sip" mug irl.

  • @Fs3i
    @Fs3i 6 months ago +11

    With the Russia/Brazil/EU example for Science/Technology, I noticed that I literally made the same mistake. I looked at the first three, saw they were ascending, and then looked at the US, the last. Then I just presumed it was in-order, and didn't pay much attention to the rest.
    I'd love to see a visualization of attention in that, because it's not impossible that the AI made the exact same mistake as me, and didn't pay attention to the middle of the table.

  • @millenialmusings8451
    @millenialmusings8451 6 months ago +82

    I would be excited for all the rapid AI developments if I had FU money. Since I don't, I'm scared.

    • @aiexplained-official
      @aiexplained-official 6 months ago +40

      Great way of putting it. I understand and sympathise.

    • @ryzikx
      @ryzikx 6 months ago +4

      lmfao fax

  • @gamemultiplier1750
    @gamemultiplier1750 6 months ago +10

    I have heard people not really taking the possibility of advanced AI/AGI seriously because it has no senses or it hasn't bled into robotics yet. With this video and the last one, now I know that there are developments. Keep up the good work!

    • @ryzikx
      @ryzikx 6 months ago +3

      agi/asi is on the level of discovering fire

    • @Mojkanal1234
      @Mojkanal1234 6 months ago +1

      The moment I saw Boston Dynamics' Spot robot, I knew robotics was basically right around the corner.

  • @dustinbreithaupt9331
    @dustinbreithaupt9331 6 months ago +17

    This one blew my mind a bit. "I expect the world to get equally crazy quite soon." Chilling words to hear from you. Also, a video on the state of hardware and how this is impacting training runs would be phenomenal. I understand that Nvidia is taking the lead but I am not sure I understand why.

  • @TesserId
    @TesserId 6 months ago +4

    13:24 Staggering!!! (Image of people calmly watching what looks like an apocalyptic event, and ChatGPT understood that.)

    • @TesserId
      @TesserId 6 months ago +1

      And, the Mona Lisa thing that follows is quite fascinating.

  • @tomaszkarwik6357
    @tomaszkarwik6357 6 months ago +6

    I went on an "intro to programming and operating an industrial robot" course, and as this was at my school, the topic inevitably turned to ChatGPT. The instructor (with 15 years of experience) said that he'd seen a demonstration of (what I identified as) PaLM-E. His reaction: "I think I will need to start learning something else if this develops as fast as it has been."
    P.S. This was a few months ago when PaLM-E was still new. I wonder what his reaction to this video would be.

  • @Joel-zr6ir
    @Joel-zr6ir 6 months ago +40

    I check every few hours for a new video, so good to see some more high quality content sifting through the AI techno jumble

    • @aiexplained-official
      @aiexplained-official 6 months ago +30

      Thanks Joel, wish I could be more predictable with output. I genuinely wait until something impactful has occurred, in my own limited estimation.

    • @yuval1168
      @yuval1168 6 months ago

      @@aiexplained-official Makes total sense. Frankly, I am waiting for SmartGPT to be something talked about more, but I haven't seen it anywhere online 😢

    • @apester2
      @apester2 6 months ago +11

      I appreciate that you wait. Then we know it is going to be good. It's worth the wait. There are plenty of channels on the hype train if someone is really desperate.

    • @dustinbreithaupt9331
      @dustinbreithaupt9331 6 months ago

      I relate to this 🤣

    • @GabeE3195
      @GabeE3195 6 months ago +2

      @@aiexplained-official I also check for your videos often because you do such a good job. As others have said, though, we appreciate that you wait for there to be actual substance to report on. Far too many YouTubers will release a video anytime there's the smallest news or rumor and talk for 8+ minutes to get that ad roll money.
      Your quality is so consistently great because you don't do that, so thanks again.

  • @FranXiT
    @FranXiT 6 months ago +37

    The description GPT gave of the explosion painting is genuinely so impressive. You could show that same image to 1 billion average, sentient people and not one of them would come up with such a thoughtful analysis, let alone in a few seconds!

    • @ryzikx
      @ryzikx 6 months ago +11

      The fact that GPT models have read more words than any single human will think of in their entire lifetime

    • @MegaSuperCritic
      @MegaSuperCritic 6 months ago +3

      I wonder how its description of 15 other explosions might differ.
      Asking GPT to write a song, you get cool results. Ask it to write 10, you get nearly the same song over and over

    • @aiexplained-official
      @aiexplained-official 6 months ago +4

      I agree, truly astonishing if you don't get lost in how it works

    • @McDonaldsCalifornia
      @McDonaldsCalifornia 6 months ago +3

      Man I dunno. It's a pretty average answer

    • @hydrohasspoken6227
      @hydrohasspoken6227 6 months ago +5

      @@McDonaldsCalifornia, feel free to show us something better. We'll wait.

  • @lamkawan1531
    @lamkawan1531 6 months ago +41

    Thanks for your efforts as usual. I’m a teacher and an AI educator, and I don’t think a lot of people are aware of the growth of AI, or they simply deny it. Your videos prove I’m right: AI is already here and only gets better and bigger. Thanks

    • @aiexplained-official
      @aiexplained-official 6 months ago +9

      Thanks so much for the support, very kind

    • @ethiesm1
      @ethiesm1 6 months ago +6

      2030, Ray Kurzweil's prediction, here we come!!!!!

  • @JohnWalz97
    @JohnWalz97 6 months ago +10

    I say this on a lot of your videos, but you really do a better job than any other content creator at conveying AI news in a way that’s enjoyable, palatable, deep, and insightful! Keep it up; you’re keeping a lot of us up to date with all the crazy advances that seem to happen every day!

  • @timherz86
    @timherz86 6 months ago +7

    Thank you so much for the work you are doing. A voice that both doesn't skip over technical detail and keeps the big picture in mind is very much needed in these times

  • @sgttomas
    @sgttomas 6 months ago +1

    That closing sequence of the recursive loop got an audible “wow” from me

  • @WiseWeeabo
    @WiseWeeabo 6 months ago +5

    Laundry, dishes, tidying, making the bed, sorting the bookshelf, making sure the door is locked, opening and closing windows, pour me a bath, mix me a drink, take out my trash to the bin, cook for me, find my phone.. I guess what I'm trying to say is; yes. Give me the robot.

    • @ryzikx
      @ryzikx 6 months ago

      anything to make my limbic system happier

  • @yinkaodeleye6563
    @yinkaodeleye6563 6 months ago +2

    I just want to say I look forward to your videos and hope you keep doing what you're doing how you're doing it. Best AI channel around

  • @___l._
    @___l._ 6 months ago +1

    6:15 roasted lol 😂
    Pretty good vid as always; you're on the very short list of channels I have notifications on for every new video. I have a feeling that when AGI or some crazy new tech is unveiled, you'd be a great source to get the news from.

  • @guilleru2365
    @guilleru2365 6 months ago +10

    I was waiting impatiently for a new video :) you are the only channel I’m checking daily, congratulations for your amazing work!

    • @aiexplained-official
      @aiexplained-official 6 months ago +1

      Aw thanks. How about notification bell as a precaution? :)

    • @homeyworkey
      @homeyworkey 6 months ago +5

      @@aiexplained-official considering they're here 4 minutes after the upload i would dare to say they have the bell already XD

    • @guilleru2365
      @guilleru2365 6 months ago

      I definitely have the bell + checking older videos hahaha

  • @Rawi888
    @Rawi888 6 months ago +1

    Thank you for all your hard work, brother.

  • @rastislavdujava7999
    @rastislavdujava7999 6 months ago +2

    Wow, the material is absolutely amazing! I particularly appreciate the comparison between GPT4 V, Bard, and LLAVA. Thank you so much for sharing this valuable insight. It's greatly appreciated!

  • @LeoPerezmusic
    @LeoPerezmusic 6 months ago +1

    Your channel and the effort you put in it is amazing. Thank you.

  • @nossonweissman
    @nossonweissman 6 months ago +1

    Great video as usual ❤❤
    That ending though 🤩🤩

  • @allymohaz
    @allymohaz 6 months ago +4

    Keep up the good work brother!! We appreciate

  • @ordinator.
    @ordinator. 6 months ago +3

    Thanks again for your analysis and summary of recent additions. It helped me understand how artificial data (of otherwise rare cases) can help train models. Re: robotics, the data issue won't be an obstacle once embodiment takes off, either. Lots of conclusions to be drawn by AI when working in physical reality.

  • @luisreinhardt6093
    @luisreinhardt6093 6 months ago

    As always absolutely amazing video. Thank you for your work!

  • @stephenrodwell
    @stephenrodwell 6 months ago +1

    Thanks! Excellent content, as always. 🙏🏼

  • @EthanReedy
    @EthanReedy 6 months ago +6

    One interesting project I tried was having it transcribe hand-written recipe cards from my mother-in-law. Like you, I found it to be 98% accurate, but the inaccuracies were sometimes surprising. I think I'll try your prompt and see how it does.

    • @aiexplained-official
      @aiexplained-official 6 months ago

      Very interesting Ethan, let us know

    • @EthanReedy
      @EthanReedy 6 months ago +1

      @@aiexplained-official One thing that surprised me is that it doesn't rotate text. I uploaded one image that was rotated 90 degrees, figuring it would recognize that and work with it. It couldn't transcribe it at all until I rotated the image. The other problem it has is with text that is hand-written at an angle to the main text (essentially recipe margin notes). It doesn't seem to recognize the text at all, even when I tell it where to look.

  • @pacotato
    @pacotato 6 months ago

    Thank you for your work doing these brilliant videos. You are awesome!

  • @krisograbek
    @krisograbek 6 months ago +1

    This iterative loop at the end is sick!!

  • @pathaleyguitar9763
    @pathaleyguitar9763 6 months ago

    Highest quality news channel. Thanks for what you do my man.

  • @GabrielVeda
    @GabrielVeda 6 months ago +1

    Excellent work Philip.

  • @MunirJojoVerge
    @MunirJojoVerge 6 months ago +1

    As usual your work is outstanding! 🎉 Thanks

  • @mrbigheart
    @mrbigheart 6 months ago +2

    This is cool stuff. Had some fun with DALL-E and the Vision feature.
    And it did work wonderfully for abstract thoughts like "the magic you're looking for is in the work you avoid"
    ...but I did have to specify that the 'magic' here means hopes and dreams and aspirations.. not actual magic :))

  • @williamjmccartan8879
    @williamjmccartan8879 6 months ago +2

    I mentioned about a year ago that Tesla would have the largest store of real-life footage in video form compared to its rivals, and that was before Tesla started accumulating all of the operational hardware that would be used for compute. Sanctuary AI is trying to keep pilots operating the robots to create a natural operating system similar to humans, and as they put more robots into use, the amount of training information grows exponentially. That's only 7 minutes in; you and the team have packed a big punch into a very tight time frame, Philip. Thank you again for putting your work out in such an open manner.

  • @JohnLeMayDragon
    @JohnLeMayDragon 6 months ago +1

    Thanks for another informative video.

  • @phillaysheo8
    @phillaysheo8 6 months ago +1

    Another great video. I got vision this week as well.👍

  • @dcgamer1027
    @dcgamer1027 6 months ago +6

    'Preciate you taking the time you do with these videos. I'm sure keeping up with this stuff, testing things, and making the videos takes a lot of time.

  • @ClayFarrisNaff
    @ClayFarrisNaff 6 months ago +5

    I was already dazed by the news that AI can now do internal representations of visual data in motion -- one of the key features of consciousness -- but I'm flabbergasted by ChatGPT's response to your prompt about the apocalyptic image. How could mere token prediction yield such a rich and nuanced understanding of the image? Surely, this is strong evidence of emergent capacities in dimensions we've not yet considered. LLMs have shown remarkable though flawed analytical capabilities in tests like the bar exam, but this one requires a kind of emotional intelligence that, to me at least, is completely unexpected. I really don't know what to think.

  • @davidondrejek8947
    @davidondrejek8947 6 months ago +1

    wow, that Mona Lisa loop is actually crazy

  • @Blah-000
    @Blah-000 6 months ago +1

    I appreciated your joke about our rooms being a really tough task for a robot to clean 😁

  • @yoagcur
    @yoagcur 6 months ago +1

    I replicated your table analysis by taking a screenshot from your video and asking the two questions you posed. It got the first one incorrect (S. Korea 51%), but when I told GPT-4 that it was wrong and to show its step-by-step workings, it provided the correct answer. For the bottom-three question, I asked it to go step by step through its workings as part of the initial prompt, and it provided the correct answer first time.

    • @aiexplained-official
      @aiexplained-official 6 months ago

      We need a mega study comparing step by step, multiple versions, self consistency, and all of them together!
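Of the strategies listed in that reply, self-consistency is the easiest to sketch in code: sample several answers at non-zero temperature and take the majority vote. The sampler below is a stub standing in for repeated model calls; the answers are invented for illustration.

```python
from collections import Counter

def self_consistency(sample_answer, n=5):
    """Majority vote over n independently sampled answers."""
    votes = Counter(sample_answer() for _ in range(n))
    return votes.most_common(1)[0][0]

# Stub sampler: pretend the model answered five times with some variance.
answers = iter(["42%", "41%", "42%", "40%", "42%"])
result = self_consistency(lambda: next(answers))  # majority answer: "42%"
```

The same harness could wrap step-by-step and multi-version prompts, which is roughly what the "mega study" the author wishes for would compare.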

  • @Dannnneh
    @Dannnneh 6 months ago +1

    Your breakdowns of the current state are super appreciated.

  • @Zhizk
    @Zhizk 6 months ago +2

    Amazing as always, thank you!

  • @KitcloudkickerJr
    @KitcloudkickerJr 6 months ago +1

    The perfect way to start my Friday morning. I wait for these videos with bated breath lol

  • @DreckbobBratpfanne
    @DreckbobBratpfanne 6 months ago +6

    That last recursive loop looks to me like the perfect test for these systems. The better description and generation are, the less and more slowly it would warp into a fever dream xD
    Because if both work flawlessly, then the image should practically stay the same even after many loops.

    • @KyriosHeptagrammaton
      @KyriosHeptagrammaton 6 months ago +1

      the Coherences after X Rounds of Gartic Phone Evaluation

    • @aiexplained-official
      @aiexplained-official 6 months ago +1

      Nice idea

    • @petergoodall6258
      @petergoodall6258 6 months ago +2

      en.wikipedia.org/wiki/Chinese_whispers

    • @fz1576
      @fz1576 6 months ago

      Not necessarily; they were trained to align with each other. Both should have a temperature higher than 0, so variations should happen.

    • @DreckbobBratpfanne
      @DreckbobBratpfanne 6 months ago

      @@fz1576 Only if you create an overfitting here I think. If it is able to do this while also being improved (or stable) on other data then it shouldn't be an issue, right? (Also Variation should be minimized here shouldn't it?)
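The describe-then-generate coherence test proposed in this thread can be written as a toy harness. Everything here is hypothetical: the two functions are stand-ins for a vision model and an image model, and `similarity` would in practice be a perceptual metric rather than a lambda.

```python
def describe(image):
    # Stand-in for a vision model (e.g. GPT-4V) captioning the image.
    return f"caption of {image}"

def generate(caption):
    # Stand-in for an image model (e.g. DALL-E 3) rendering the caption.
    return f"image from ({caption})"

def rounds_until_drift(image, similarity, rounds=10, threshold=0.9):
    """Iterate describe -> generate and report the round where the
    result first drops below `threshold` similarity to the original."""
    original = image
    for i in range(rounds):
        image = generate(describe(image))
        if similarity(original, image) < threshold:
            return i + 1  # drifted on this round
    return rounds         # stayed coherent for all rounds

# With a metric that always reports perfect agreement the loop survives
# every round; with one that always reports zero it drifts immediately.
stable = rounds_until_drift("mona_lisa.png", lambda a, b: 1.0)
drifted = rounds_until_drift("mona_lisa.png", lambda a, b: 0.0)
```

The score a system earns would then simply be how many rounds it survives before warping, which matches the "fever dream" intuition above.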

  • @alphahurricane7957
    @alphahurricane7957 6 months ago +1

    You rock, mate! Waiting for AI embodiment

  • @pydron
    @pydron 6 months ago +1

    Now I need that coffee mug, stat! Great video!

  • @K4IICHI
    @K4IICHI 6 months ago +1

    Thanks for the insightful update! Would be curious to know what your go-to custom instructions are.

  • @AIWRLDOFFICIAL
    @AIWRLDOFFICIAL 6 months ago +3

    LETS GOOO ANOTHER VIDEO

  • @Minetorpia
    @Minetorpia 6 months ago +1

    Thanks for the videos!

  • @thirdeye4654
    @thirdeye4654 6 months ago +2

    Creating synthetic data internally and learning by looking at that data reminds me of being human: you can actually train yourself to an extent just by imagining doing something. P.S. Did you come up with that funny "Let's think sip-by-sip" phrase referencing the altered LLM prompt? :) Nice video again, well done!

    • @Bodofooko
      @Bodofooko 6 months ago +2

      The synthetic data generation feels a bit like dreaming for the machine and might be similar to how it works in humans.

  • @danberm1755
    @danberm1755 6 months ago +3

    As far as the misspelled coffee mug example goes, I 100% agree that these issues are quite easy to solve with lower compute costs, and will feed back into training.
    IOW, a VERY temporary thing (like 1 year temporary or less).
    The same will happen with chat results as well. A virtuous feedback loop created by cheaper inference and training costs.

    • @danberm1755
      @danberm1755 6 months ago +1

      For the column calculation issue, Ilya Sutskever mentioned that we might be at a point where we can psychologically reason with the AI in order to affect its training.
      I think converting from raw backpropagation to some sort of symbolic representation might be the next big goal. That way you can reason with it in a persistent fashion.

    • @skierpage
      @skierpage 6 months ago +1

      ​@@danberm1755 Maybe, but symbolic reasoning AI has been a decades-long failure (Cyc, Elsa, etc.) compared with the advances in neural networks. LLMs can handle conflicting and ambiguous words that stymie symbolic reasoning.
      "Apply rigorous logic to everything you've learned and reject conflicting information" might result in a Star Trek episode where the computer starts to smoke and explodes 😂. "Now that you've learned the nuances of language and can write perfectly, let's fine-tune you only on Wikipedia and approved textbooks" could be more productive, or just prompt a language model with a large context window with an entire logic textbook or ten relevant papers before asking it a question.

    • @danberm1755
      @danberm1755 6 months ago +1

      I agree, @skierpage, but there definitely is a market for auditable answer engines like Wolfram Alpha.
      As far as putting neural networks into anything unmonitored and mission-critical, there's a lot of praying involved unless you can limit the scope of possible outcomes to only plausible solutions.
      It seems like neural networks would be great at converting their weights to symbolic notations of some sort.

  • @alisalloum629
    @alisalloum629 6 months ago +1

    You sir have a wonderful day too

  • @maxziebell4013
    @maxziebell4013 6 months ago +11

    Watching the development of this synthetic video data feels like witnessing the inception of the Matrix ∞. The way artificial intelligence can generate such lifelike visuals is both awe-inspiring and a bit unsettling. It's a Disruptive Innovation Watershed moment that blurs the lines between the digital and physical worlds, redefining what's possible and challenging our very notions of reality.

    • @BrianMosleyUK
      @BrianMosleyUK 6 months ago

      It doesn't help when Mark Zuckerberg tries to persuade Lex Fridman that artificial reality is actual reality. No it isn't Mark 🙄

    • @therealOXOC
      @therealOXOC 6 months ago

      You still don't get that this is a simulation?

    • @maxziebell4013
      @maxziebell4013 6 months ago

      Yes, a simulation about the genesis of the simulation and the birth of our techno descendants.

    • @therealOXOC
      @therealOXOC 6 months ago +2

      @@maxziebell4013 What a time to be "alive"!

  • @nathanbanks2354
    @nathanbanks2354 6 months ago +4

    For tables, I've found GPT-4 is better at writing python programs to analyze the data instead of analyzing the data directly. The Advanced Data Analysis can do this; I think it can use OCR to extract text directly from photos, though I haven't used this with tables.

    • @aiexplained-official
      @aiexplained-official 6 months ago +1

      Great shout

    • @KalebPeters99
      @KalebPeters99 6 months ago

      Yeah, this is the key for getting correct results on mathematical questions imo
      The trouble is the way that data analysis, browsing, plugins and vision are all kept in separate chats at the moment. Once they start combining them we'll have another paradigm shift on our hands...

    • @nathanbanks2354
      @nathanbanks2354 6 months ago +1

      @@KalebPeters99 The 8k-token context window would also make it difficult. If I give GPT-4 100 lines of code and ask it to browse the web to solve an issue, it does find an applicable website, but then it can't remember the details of the code well enough to answer my question and solve the problem. As a Canadian, I haven't used a VPN to play with Claude and its 100k context window yet.
      But I am looking forward to the day GPT-4 can see the Dall-E 3 images it makes.

  • @amkire65
    @amkire65 6 місяців тому +1

    I zoomed out to take a screenshot of a ComfyUI workflow that I'd created, then uploaded it to Bard. I couldn't read the text in the image, but Bard not only could, it was also able to suggest ways to improve the descriptions and recommend additional nodes to try. How crazy the progress has been since I was spending ages making basic images over a year ago!

  • @mubashirshaikh
    @mubashirshaikh 6 місяців тому +1

    I love that you provide the latest and most reliable news on AI. Can you suggest any websites or RSS feeds one can check regularly to stay up to date with AI and other IT news?

    • @aiexplained-official
      @aiexplained-official  6 місяців тому +1

      Hmm, that's tough, so many. The Information is good but expensive, The Verge is good but not focused only on AI, Futurepedia is nice but lightweight, Twitter of course, and arXiv to go heavy

  • @jPup_
    @jPup_ 6 місяців тому +2

    "Imagine when they can simulate _millions_ of hotel rooms... or even harder... _your_ bedroom" whoa buddy take it easy I wanna keep supporting the channel 😂

  • @UncleJoeLITE
    @UncleJoeLITE 6 місяців тому +2

    Thank-you, no way I could even keep the AI wave in sight without your work.
    I reckon you have been utterly immersed in this for the past 2 weeks!!!
    You give me hope for my kids' future - the world is desperate for intelligent, educated young people. 11:23!
    _Yes, the benefits will be taken by the top 1%, the downside will be for the rest of us._

  • @BrianGlaze
    @BrianGlaze 6 місяців тому +1

    😅 that recursive loop of Mona Lisa was wild

  • @nacho7872
    @nacho7872 6 місяців тому +1

    Thanks for the video!!!!!

  • @flinkstiff
    @flinkstiff 6 місяців тому +1

    It would be interesting to see whether it would help to have lines between the rows in the table image. Perhaps it goes wrong somewhere in the algorithm because it can't tell which column value belongs to which row. Just a thought. Good video btw 👌

  • @pjetrs
    @pjetrs 6 місяців тому +1

    That robot roaming the woods looks a lot like the AT-ST walkers from Star Wars

  • @mariokotlar303
    @mariokotlar303 6 місяців тому +11

    That mona lisa animation idea and result at the end is insanely awesome, absolutely love it! ♥
    I know a lot about AI by now, but one thing that still confuses me is the concept of synthetic data that you talked about. It makes perfect sense that data generated by a stronger model can be used to train a weaker model, elevating it rapidly at a much cheaper price. But here it seems the idea is that top-of-the-line models are using synthetic data to become even better. But then where does the data come from? Is the model creating it smarter than the one being trained? But if you have a smarter model, why even train the weaker one? And if it's not smarter, then how can the data be reliable? Or is the model creating data to train itself? But wouldn't that reinforce existing biases and mistakes as well? Or is this data vetted and hand-picked by humans? I figured that would defeat the purpose of synthetic data by slowing the generation of data down to the bottleneck of human processing speed, at least in large part. Is this enough questions for a full episode of AI Explained video? :)

    • @aiexplained-official
      @aiexplained-official  6 місяців тому +7

      It should be! Data Generation

    • @yinkaodeleye6563
      @yinkaodeleye6563 6 місяців тому +5

      ​@@aiexplained-official It would be great if you made a video about Data generation and how those synthetic images were made

    • @noone-ld7pt
      @noone-ld7pt 6 місяців тому +3

      Hi, I have some thoughts on this that hopefully might answer a few of your questions :)
      3 ways that I think synthetic data generation can be game changing for the training of future models:
      1. Specializing smaller models to reduce compute cost and size
      2. Models can, through various techniques, generate significantly better data than the average of the set they were trained on.
      3. Generating datasets for the fringe cases that are underrepresented in proportion to their importance.
      1. Specialized models like coding models, translation models, or math models etc. can be vastly smaller than a generalized model with very similar performance levels in their given fields. So even if you are technically training a "weaker" model when it comes to general knowledge and capabilities, you could end up with a model that is a fraction of the size of, let's say, GPT-4 or Llama 2 but still capable of coding, translating or calculating at virtually the same level as the giant models. This might allow you to reduce the costs of compute immensely and even run the entire model locally.
      2. When it comes to training the "state of the art" models, I think the key point is that these models are capable of producing data that is better than the average of the data they were trained on. If you take a look at the video AI Explained made about LLM benchmarking, it is quite clear that if even the essential tests being used to assess and compare these models have verifiable fundamental flaws, then there is no way the incomprehensibly big training sets themselves don't contain a significant amount of similar problems. However, with techniques like chain-of-thought prompting and adversarial evaluation, the models are capable of producing significantly better data than the majority of their own training set. In theory this could essentially produce a new dataset consisting of generated data produced from just the very best percentages of the set the model itself was trained on. Then a new giant model would be trained on this new and significantly better dataset and in turn produce an even more capable model. Then this process can be done iteratively, for hopefully incredible breakthroughs in performance and reliability.
      3. Another benefit (that I think could help both giant and specialist models) is generating more data for fringe cases. The important point here is that in very many fields the most important data is not necessarily what shows up the most, so critical details can easily be drowned out. The example he gives in this very video is encountering a person jaywalking in foggy conditions. Even with billions of hours of real video data, this will just not happen enough times for a model to reliably learn the nuances of this very critical situation. However, it's often a lot easier to define and construct fringe cases and then specify their desired result (in this case, not crashing). We can then generate bigger datasets that focus specifically on filling in the holes with scenarios that are very unlikely to be adequately represented in relation to how critical they are to the function of the resulting model.
      Hope this helps! :)
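      Point 2, the generate-then-filter idea, can be sketched as a toy loop. `generate` and `score` below are stand-ins for a model sampling candidates and an evaluator judging them (another model, heuristics, or humans); both are hypothetical:

```python
import random

random.seed(0)  # deterministic toy example

def generate(n):
    # Stand-in for sampling n candidate examples from a model;
    # here each "example" is just a random quality value in [0, 1)
    return [random.random() for _ in range(n)]

def score(example):
    # Stand-in for an evaluator judging a candidate's quality
    return example

def build_synthetic_set(n_candidates, keep_fraction):
    """Keep only the top-scoring fraction of generated candidates,
    so the new set is better than the generator's average output."""
    candidates = sorted(generate(n_candidates), key=score, reverse=True)
    return candidates[: int(n_candidates * keep_fraction)]

dataset = build_synthetic_set(1000, 0.1)
print(len(dataset))  # → 100
```

      Retraining on `dataset` and repeating the loop is the iterative scheme described above; the open question is whether the evaluator stays reliable as the generator improves.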

    • @mariokotlar303
      @mariokotlar303 6 місяців тому +1

      @@noone-ld7pt Thanks for such a thorough answer! Point 1, specializing, seems like a flavour of training weaker models using stronger models, so that's intuitive enough. Point 3 is definitely one of the best use cases for synthetic data. Point 2, where you mentioned chain-of-thought prompting, does look like an interesting way to address the counterintuitive aspect of it, but in theory it also seems similar to removing lower-quality data from the original dataset: something that might improve the consistency of output quality and so make higher-quality output more likely. But it's still counterintuitive that it could result in new capabilities, for example, since all the synthetic examples were created using current capabilities. If it really could result in new capabilities, then that has to be thanks to some property of the optimization algorithm used to train the model, which would be an interesting thing to understand in more detail. More ideas for the next video :)

    • @noone-ld7pt
      @noone-ld7pt 6 місяців тому

      @@mariokotlar303 Hmm, interesting point. I'm not quite sure I agree that emergent capabilities have to come from optimization algorithms, though.
      But firstly, I definitely agree that improving good data and removing bad data are very similar methods of improving results. However, the big difference in my opinion is that removing lower-quality data is essentially a function converging towards zero (the more you remove, the less there is left until you eventually run out of "bad data"), while generating synthetic data theoretically converges to infinity, since we can just keep making more endlessly. Of course a lot of synthetic data won't necessarily be that useful, but to me this does at least suggest a lot more theoretical potential in data generation vs removing bad data.
      But returning to emergent capabilities, I personally think that just adding existing capabilities together can result in new, unexpected emergent capabilities. For example, I read this article about a guy who made GPT-3.5 pretend it was a Linux environment, revealing that it could essentially function as a virtual machine. It could then pretend to browse the web and interact with ChatGPT and tell it to do the same, creating a nested virtual environment. This is not a particularly useful ability and definitely not something that anyone would deliberately emphasize in training, but it shows that it can piece together knowledge into the capability of doing something pretty complicated.
      In my opinion, reducing logical contradictions, misinformation, and low-quality data while increasing the amount of "top percentage" data in a training set should allow for a lot more of these optimizations across disciplines, and surprising emergent results. Think of how many scientific breakthroughs were just sitting in a basement or written in some obscure paper that was never utilized until the right person with the right vision got their eyes on it. My biggest hope is that giant LLMs can be those eyes and that vision, but with an infinitely bigger scope than any human: not necessarily creating new science themselves, but putting together tons of incremental discoveries across fields that we inevitably have missed over the years, to show us all the breakthroughs they could potentially add up to!
      Anyways, thank you for a great and thought-out response! I love discussing these things with well-informed people and I am very interested to hear your take on my ideas!

  • @MediaCreators
    @MediaCreators 6 місяців тому +1

    Excellent!

  • @jamqdlaty
    @jamqdlaty 6 місяців тому +5

    Did you ever try to make the "alternative version" of an image with data using inverted colors? That's easy to do in any graphic editing software. In this case it would be white data on black background. I wonder if it would affect how it reads it at all.
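    For anyone wanting to try this programmatically rather than in an editor: inversion is just 255 minus each channel value. A minimal grayscale sketch (the pixel row is made up):

```python
def invert(pixels):
    """Invert 8-bit grayscale values so black text on a white
    background becomes white text on black (255 - v per pixel)."""
    return [255 - v for v in pixels]

row = [0, 0, 255, 255, 128]  # toy pixel row: text, background, midtone
print(invert(row))  # → [255, 255, 0, 0, 127]
```

    With Pillow installed, `PIL.ImageOps.invert` does the same thing to a whole image in one call.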

  • @okachobe1
    @okachobe1 6 місяців тому

    I can't believe Bard is so good given how late it is to the race; I'm super excited to see Bard's new model! It's in testing, but I haven't seen any real results yet

  • @skylineuk1485
    @skylineuk1485 6 місяців тому +2

    Multimodal will be the game changer: the model will be able to use Bing, Data Analysis and plugins alongside vision, etc.

  • @FilmFactry
    @FilmFactry 6 місяців тому +1

    Good info! Is there a way for GPT to mark up or diagram an image, like labelling and pointing arrows at each animal in the photo? I uploaded a simple electronic diagram, and it explained some things, but it would have been nice if it could circle the X component in red and write in the value.

  • @itemushmush
    @itemushmush 6 місяців тому

    Great video. If it can generate its own training data, then it's not 'infinite', right? It still needs external inputs into the model. So if the training set includes generated data (such as driving the car), it will eventually become static/random (i.e. output unrelated to the input given)

  • @lokiholland
    @lokiholland 6 місяців тому +1

    Fantastic !

  • @richardnunziata3221
    @richardnunziata3221 6 місяців тому +3

    Synthetic data also solves the last-mile and tail-end distribution issues.

    • @ryzikx
      @ryzikx 6 місяців тому +1

      Plus, many fringe issues... basically the final 20% of the Pareto curve

  • @Fiqure242
    @Fiqure242 6 місяців тому +1

    🎯 Key Takeaways for quick navigation:
    00:00 🤖 [Introduction to AI Progress]
    - AI progress relies on data, compute, and algorithmic efficiency.
    - Recent developments suggest that AI is not running out of these resources.
    - Synthetic training data, like in GAIA-1, is believed to be the future of AI as it is safer, cheaper, and scalable.
    00:43 🎥 [Advancements in Synthetic Video Data]
    - Synthetic video data is improving rapidly.
    - Synthetic data, like that used to train GPT-4, can enhance language models.
    - Companies like Tesla can generate vast amounts of synthetic data, combined with real-world data.
    01:24 🚗 [Synthetic Data Applications in Autonomous Driving]
    - Synthetic data can provide helpful scenarios for autonomous driving.
    - It helps in simulating various situations, including challenging ones rarely encountered in the real world.
    02:06 🤖 [Unreal Engine Simulations for Robotics]
    - UniSim, from UC Berkeley, Google DeepMind, MIT, and the University of Alberta, provides advanced robotics training.
    - Unlimited training data for robotics can be beneficial in various applications, such as planning and decision-making.
    03:15 📚 [Scaling Laws for Simulations]
    - Simulations in large language models and robotics follow similar scaling laws.
    - Simulations simplify tasks by predicting the next token.
    03:42 🤖 [Impressive Advancements in Humanoid Robotics]
    - Generating unlimited training data accelerates progress in humanoid robotics.
    - The introduction of realistic robots, like Tesla's Optimus, is expected in the near future.
    - More task-specific data is essential, such as folding laundry or walking the dog.
    04:23 🤖 [Robotics for Entertainment]
    - Robots can be designed for entertainment purposes, as demonstrated by Disney's robot.
    - Real or synthetic training data can enhance robot capabilities in different environments.
    04:52 🎭 [Improved Realism in Robo Pets]
    - Robo pets, like those 3D printed by Setic, will become more realistic and resilient.
    - They can withstand tug-of-war scenarios but may still face challenges in outdoor environments.
    06:25 💻 [Advancements in Compute]
    - Nvidia's cadence of releasing new GPUs is accelerating to yearly intervals.
    - Faster and more cost-effective training will be possible with upcoming GPU series.
    07:08 💼 [OpenAI's Plan to Enhance ChatGPT and Lower Costs]
    - OpenAI plans to improve ChatGPT performance and lower usage costs.
    - OpenAI can invest in additional staff and generate better efficiency.
    07:50 📷 [GPT Vision and Image Analysis Capabilities]
    - OpenAI is developing GPT Vision capabilities for image analysis.
    - Developers will soon be able to build apps with image analysis features.
    - Integration of GPT with vision can enhance interaction and provide descriptive feedback.
    08:16 🚀 [Unique Feedback Loop Demonstration]
    - The speaker demonstrates a unique feedback loop created with GPT models.
    - Multiple outputs from DALL-E 3 are scrutinized for textual matching and organization.
    - The potential for improved realism in AI-generated voices is highlighted.
    08:30 ✨ Advances in text generation and image analysis are expected next year.
    - The process of generating internal images, rating them, and releasing only the best ones is feasible.
    - Compute costs have been a hindrance, but improvements are on the horizon.
    08:59 🌉 Image generation and voice synthesis are advancing, causing concerns about deep fakes.
    - CCS Insight predicts that arrests related to deep fakes could start as early as next year.
    - We're moving closer to bridging the uncanny valley, where it becomes challenging to distinguish fake media from reality.
    09:26 🤖 GPT Vision tips for better analysis of tables and images.
    - Providing multiple angles of the same chart reduces minor errors.
    - Recreating data from tables before analysis improves accuracy and consistency.
    11:41 💡 Comparison of image analysis by GPT Vision, Bard, and LLaVA.
    - GPT Vision successfully identified missing text in an image.
    - Bard refused to analyze images of people but was accurate in other aspects.
    - LLaVA performed less well in visual question answering tasks.
    Made with HARPA AI

  • @felixonearth
    @felixonearth 6 місяців тому

    "... and with the explosion in synthetic data and compute, I predict the world will get equally crazy quite soon." 14:00. Most convincing prediction in a YouTube video I've heard so far.

  • @Renegade30
    @Renegade30 6 місяців тому +4

    Holy shit. Long-Horizon Simulations is like giving AI an imagination! WTF

  • @user-pf9jv1fl2n
    @user-pf9jv1fl2n 6 місяців тому +2

    My favourite YouTuber

  • @andersonsystem2
    @andersonsystem2 6 місяців тому +1

    Awesome mate, good upload.

  • @thebrownfrog
    @thebrownfrog 6 місяців тому +1

    Thank you for those videos

  • @shawnryan3196
    @shawnryan3196 6 місяців тому +1

    I have an old pic of Toys R Us. There are still really small stores here. I asked if it was a picture from before 2015 or after. It looked at the toys and figured out it was a newer picture. It was also able to give a timeline for a picture from the early 90s because of the hairstyles and posters on the wall.

  • @JDSileo
    @JDSileo 6 місяців тому

    The Tech-vibe has matured and definitely has an "Early Days of Bell Labs" kind of feeling...It seems like they were coming out with amazing stuff every week.

    • @calholli
      @calholli 6 місяців тому +1

      Things are still coming out at virtually a serial pace.. one after the other. Just wait until the acceleration goes parallel

    • @JDSileo
      @JDSileo 6 місяців тому

      ​@@calholli I think we are there. (and possibly have been here before) but getting to experience the loop again is still awesome.
      "We are already in the Matrix" and "We could become Borg" (metaphor, not that specific plot line with all the cyber-horror bits) are plausible statements in our lifetimes now. Even if we are not already in some simulation, its conception is now grounded in tech that exists right now and just needs to be iteratively improved. It feels like time is compressing and we are getting to witness in our lifetimes what would have been 1000 years of technological progress. Like we are in fact one continuum of a single entity just coming into form.

  • @HoD999x
    @HoD999x 6 місяців тому +1

    About 10:45: I noticed this too. I first ask it to list the data and then ask the real question; that fixes it

  • @de-kat
    @de-kat 6 місяців тому

    Great video, thank you.

  • @quintijnkroesbergen5611
    @quintijnkroesbergen5611 6 місяців тому +4

    Just imagine in the future when a robot gets developed with a way faster and better version of GPT Vision. Then GPT is its brain and vision its eyes. It could then control its own movement based on what it sees.

  • @kenplant91
    @kenplant91 6 місяців тому +1

    As far as I know, Tesla has been developing and using synthetic data and training like this for a couple of years now, although they still seem to be in the early stages of utilizing it. Very exciting times. Thanks for another great vid.

  • @zandrrlife
    @zandrrlife 6 місяців тому +1

    I won't even cap bro. Through some coding finessing, I created a 1k image/text dataset to finetune my multimodal annotator 😂, creating interweaved region-aware textbook-quality dense annotations... couldn't help myself. I'm legit impressed by its capacity.
    We have finally eclipsed the synthetic data horizon, where we can use AI to train more powerful models with some clever engineering tricks. I created this method called SGR learning (self-generated instruction reinforcement learning for multimodal multi-turn instruction tuning): using three models, I'm tuning the hell out of my LM multimodal generator without losing its aesthetic quality. Expensive, but it works.
    I can barely sleep. Too much happening. Stay blessed bro.
    FYI: is it just me, or does ChatGPT with vision start hallucinating like a mofo in multi-turn conversation around the image context, when the image is appended only in the first input? Am I bugging 😂?

  • @davidtowson5742
    @davidtowson5742 6 місяців тому +1

    great video, keep it up

  • @sagetmaster4
    @sagetmaster4 6 місяців тому +3

    It's really interesting to think about an obviously prey-looking robot wandering around and how predators would interact with it. Would they be less interested because it has no scent? I never thought of all the animal behaviors we could test with robots

  • @MrMichiel1983
    @MrMichiel1983 6 місяців тому +2

    The synthetic data being kind of shitty right now might actually be a boon. The reinforcement learning algorithm is more likely to learn to drive safely not because of super-clean road markings, but because it had to figure out for itself the rules the other vehicles follow, and had to learn to drive safely under "foggy" conditions where the "fog" is replaced by visual artefacts.
    I wonder if there are AIs that can drive on both sides of the road, or learn to use multi-language road signs, and which implementations of that would improve or decrease overall driving skill.

  • @garythepencil
    @garythepencil 6 місяців тому +1

    I believe your example of it generating different tables and taking the visual average is probably fundamentally improving its abilities, for the same reason that training on synthetic data improves its abilities. Hopefully someone can reduce (or has reduced) this to math we can do on the original training data, so we don't need such hacky workarounds. Thanks, Philip.

  • @Bodofooko
    @Bodofooko 6 місяців тому +2

    I don't know if I'm anthropomorphizing too much, but many times when these GPT models mess up your queries, it feels very similar to how a small child would mess up. The chart analysis seems to have gone wrong because, at first glance, the zoo/aquarium column looks like it is in ascending order, so it reasoned that the highest number would be the one at the bottom. It seems to have used similar reasoning when choosing the lowest attendance for the museum. I feel like kids do similar things when learning new concepts, relying on quick assumptions and having to "slow down" their thinking to discover the correct answer.

  • @bluetensee
    @bluetensee 6 місяців тому +3

    I showed GPT Vision a close-up of my fingers pressing a chord on my guitar. GPT knew everything about chords and frets and so on, but it could not tell which frets my fingers sat on, and thus, even with the most sophisticated prompting I've ever done, could not solve this problem.

    • @aiexplained-official
      @aiexplained-official  6 місяців тому +1

      Fascinating

    • @bluetensee
      @bluetensee 6 місяців тому

      @@aiexplained-official... I presented it different photos of that same chord. Asked it to analyse fret by fret: how many fingers it saw on which fret, etc. Pre-prompted it to be a professional guitar teacher. Then I even gave it a chord chart with the most common guitar chords (which it seemingly understood!), which included the (very simple) three-finger chord I played. But no success. I wonder... what if I had started the prompting session by uploading such a chart right at the beginning? Cheers from Germany, Mathias

  • @Embassy_of_Jupiter
    @Embassy_of_Jupiter 6 місяців тому +1

    Can't wait for text/image-to-text/image models, i.e. putting in an image and describing the modifications you want made to it.
    A lot gets lost in translation if you have to describe images with words alone, especially when it comes to art style. Try telling Dall-E the specific way you want it to draw faces. You can't.

  • @MrMichiel1983
    @MrMichiel1983 6 місяців тому +1

    I would love to see an AI driver incorporated into Unreal Engine, with a bunch of settings so you can model the AI to drive like a maniac too. This way, game instances could help provide training data to a driver AI (free racing games?), and help the actual driver AI, with proper settings, explicitly avoid improper behaviors such as having no patience for people crossing the walkway, a behavior we can currently see in self-driving cars.