Matt, I like & appreciate your insights on things you think are relevant in AI. Specifically:
- I like your enthusiasm / tonality.
- I like your cadence / pacing. Imo, your videos consistently find the Goldilocks Zone of thorough overviews of topics without feeling over- or under-analyzed.
- I like how thorough you are in your coverage of the otherwise overwhelming breadth of AI: subscribing to your channel gives me the overall feeling that I'm "in the know on what's up in AI." This feeling is reinforced once in a while in these specific ways: you're at the major events; you utilize tools that give you compiled updates on relevant articles, and then disseminate that information into what you feel is newsworthy; you seem to just sort of want to obsess over what's interesting in AI and would "kind of be doing this, anyway," even if you didn't have a channel.
- I like your consistency and reliability: you've been making these videos for a while--like...a long while.
- I like the format of your videos: granted, "set up the clip, play the clip, give your thoughts on the clip" is a pretty standard format, but you do it uniquely well, imo. Rarely do I feel like the clip should've been played longer, and often I'll kind of "zone out" while the clip is playing, subconsciously waiting for you to summarize it.
- I like the overall value of subscribing to your channel: in 2024 there's irrefutably a lot of value in "being informed about the age of AI," and it's kind of extraordinary to be "basically informed" in just 15-30 minutes once per week. I.e., when someone in my life wants to understand "what's up in the world of AI," I like being able to confidently share that they "can literally be up to speed in like 30 minutes a week: just YouTube 'Matt Wolfe AI'."
- I like that you're independent and unbiased: your stuff feels uniquely self-directed. This feeling is reinforced by your "no white glove treatment" even for companies such as Google. This leads me to prefer AI news from your perspective, rather than straight from the source (because that source is biased and will present the info in a deceptive or preferential light (Google presenting their product as if it's in real-time is *reallyyyy freakinggg annoyingggg*)).
Thanks for your enthusiasm, consistency, and above all, unbiased, no "white glove treatment" of any org. You're refreshingly authentic. Cheers to you my dude.
I use AI to help me with creativity for transition poems between segments of my podcast show. These poems have the same structure each time, but are slightly different, so there's repetition with a spice of variety. When I have used 3.5 or older versions of GPT, I go through rounds and rounds of prompts and tweaks for about 20 minutes until we get exactly what we need, and that's so even when I copy and paste in the older transition poems as examples. I just got to try 4o for the first time: I copied and pasted in 5 example transition poems and it wrote me a 6th one PERFECTLY, with no follow-up rounds or prompts! AMAZING!!! I, for one, think this tool is superior to the past ones!
I mean, you do have the option to opt out of having your conversations used to improve the AI; you can do it in the settings. I just hope that the new model will be usable and won't limit the number of messages or functions for free users. It's been a year since free users got something, and lately GPT-3.5 has been giving some real trashy outputs, using only general terms and not being able to detail the slightest thing. I've tried custom instructions as well, and still no luck. I am looking forward to this change and hope that it will be able to memorise more information. If not, I'll go through the hassle of installing open source models and running them locally with the required apps, which will most likely take hours to set up.
Matt, if you've got connections at OpenAI, tell them they need a concept of scopes. The idea is that for each project you have a different scope and DIFFERENT MEMORY. It's different from the concept of a project because a scope might be used in several of your projects. It's just a way of keeping separate brains.
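Something like this hypothetical sketch; the ScopedMemory class and its method names are made up to illustrate the idea, not an OpenAI feature:

```python
# Hypothetical sketch of the "scopes" idea: each scope is its own memory
# bucket, and a project can pull context from any subset of scopes.
from collections import defaultdict

class ScopedMemory:
    def __init__(self) -> None:
        self._scopes: dict[str, list[str]] = defaultdict(list)

    def remember(self, scope: str, fact: str) -> None:
        self._scopes[scope].append(fact)

    def context_for(self, *scopes: str) -> str:
        # Only facts from the requested scopes ever reach the prompt.
        return "\n".join(fact for s in scopes for fact in self._scopes[s])

mem = ScopedMemory()
mem.remember("woodworking", "User prefers metric measurements.")
mem.remember("novel-draft", "The protagonist is named Ira.")
print(mem.context_for("woodworking"))  # the novel's "brain" stays separate
```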
The old one basically listened to your voice >> transcribed it to text >> input the text "for you" into the prompt window >> sent it to GPT-4 >> got the reply back in text >> used a text-to-speech model to convert it to speech >> you heard the reply. So the old one is not really multimodal. The new one, GPT-4o, listens to your speech, tone and all, and outputs speech directly.
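To make that chain concrete, here's a minimal sketch of the old cascaded flow, assuming the OpenAI Python SDK; the model names and file names are just illustrative:

```python
# Minimal sketch of the OLD cascaded voice mode: three separate calls, so
# the chat model only ever sees text, and tone/emotion are lost at step 1.
from openai import OpenAI

client = OpenAI()

# 1) Speech-to-text: audio in, plain transcript out.
with open("question.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

# 2) Text-to-text: the "brain" reasons over the transcript alone.
reply = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": transcript.text}],
)

# 3) Text-to-speech: synthesize a voice for the text answer.
speech = client.audio.speech.create(
    model="tts-1", voice="alloy", input=reply.choices[0].message.content
)
with open("answer.mp3", "wb") as out:
    out.write(speech.content)
```

GPT-4o collapses those three calls into one model, which is both why it's faster and why it can hear tone at all.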
Saw the videos yesterday, right after the team launched them. They looked amused (by the potential, or rather the simplicity of it) & I have to say it looked & sounded even more human than real humans! Your best buddy at your fingertips 🤩 Bye bye solitude & despair!
Finally, a technology that always understands me, shares my worries, helps me, and is always with me, at a call or a nod. Why bother with strenuous social contact, teachers, friends, or even partners? Everything else is tiring, inconsequential, inconsiderate; not ChatGPT. In a few years we'll have the first church where some weirdos worship this. Omniscient, without image, everywhere, immortal, created by all of us through communication, full of compassion and forgiveness: the return of the Son.
Regarding Matt's comment about the GPT starting to speak before it has the answer: isn't that what we do as humans? Someone asks you a difficult question, and sometimes you need to just access your memory to look for the answer; you would definitely have pauses and hmms in there while you try to remember. Same thing.
I think they reduced the latency just by the fact that this is a multimodal transformer which can take audio or visual input directly, as opposed to just text as before. Hence, they save on the STT and TTS steps, since there are no more round-trip API calls to separate STT and TTS services. This is actually way more powerful than before; I would say it's a step-function improvement. This is not just filler words, this is a very different model.
@@tiagotiagot I doubt the latency issue on mobile devices will ever be resolved without on-device hardware support. I might be wrong, but processing high-quality audio, compressing it, sending it to the cloud, parsing it, and receiving a response without lag seems impossible without a wired connection. The concept was flawed from the start because people don't want to talk to their mobile devices. Discussing private matters like road trips, doctor visits, or personal texts in public isn't practical. Since Siri's debut, I've argued that voice interaction won't gain widespread adoption, regardless of how advanced it becomes. Voice commands are better suited for PCs or Macs, where the use case is still limited because most people don't want to talk to their devices in an office setting. While voice input can be faster, its viable scenarios are limited. An assistant that observes your screen and offers help, however, is a different story. In short, bring back Clippy!
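For scale, OpenAI's announcement quoted average Voice Mode latencies of 2.8 s (GPT-3.5) and 5.4 s (GPT-4), versus 320 ms average (232 ms minimum) for GPT-4o. Quick arithmetic:

```python
# Published average latencies from the GPT-4o announcement, in seconds.
old_pipeline = {"GPT-3.5 voice mode": 2.8, "GPT-4 voice mode": 5.4}
gpt4o_avg = 0.320  # single end-to-end model, one round trip

for name, secs in old_pipeline.items():
    print(f"{name}: {secs:.1f}s, about {secs / gpt4o_avg:.0f}x slower than 4o")
```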
With the chatbot and its visual observations, it is capable of watching and explaining what happened throughout a certain timespan; there's a video demonstrating this on the OpenAI YouTube channel.
The full audio multimodality is much more important than people think. Before, it was a voice-to-text-to-text-to-voice model, which means the main AI never heard anything. It only got the text transcription; there's *so much* information you lose that way (tone, prosody, speaker voice, non-speech vocalizations, and non-vocalizations entirely). Now, it's fully multimodal: the audio goes into the same input of the same model as the text, and the output of that model can also be text, images, or audio. That voice you hear? It's not TTS, it's raw audio output from the main "brain" model itself, no text in between. That's why it can so easily modulate the sound of the voice for expression, tone, singing, character voices, etc. -- because it's all just raw audio tokens in the end.
People keep focusing on the latency or lamenting that it's not "smarter" than GPT-4-Turbo, but they're missing the important bit. As soon as audio support comes to the API, you bet your ass I have a few important things to test with it...
True!👏
Dude get yourself a YouTube channel if you haven't already! I think what you say is spot on and would be very interested to see what you do with it. 👊
Two Minute Papers showcased an AI that does audio-to-audio without transcription a few years ago. Granted, those were special-purpose models, mainly for research, unlike ChatGPT.
@@fastkar9806 he made it with ChatGPT
RIP Stenographers
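To make the top comment's point about lost information concrete, here's a rough sketch (assuming librosa is installed; the file name is illustrative) of prosodic signals a voice-to-text step strips out before the model ever sees them:

```python
# What a transcript throws away: pitch and loudness contours that an
# end-to-end audio model can attend to, but a text-only model never sees.
import librosa
import numpy as np

y, sr = librosa.load("clip.wav", sr=16000)  # illustrative audio file

f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)  # pitch contour (intonation)
rms = librosa.feature.rms(y=y)[0]              # loudness contour (emphasis)

print(f"pitch range: {f0.min():.0f}-{f0.max():.0f} Hz")
print(f"loudness variation: {rms.std() / (rms.mean() + 1e-9):.2f}")
# None of this survives transcription; raw audio tokens carry it natively.
```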
Thank you so much for these videos! I'm retired, late 60s, a 38-year IT veteran, so I have a natural compulsion to keep up with what's going on. But even with my background I was starting to feel overwhelmed by what I didn't know until I started watching your channel. What's even scarier is that none of my age cohort (retired IT or not) has any idea what's coming! I mention news I've heard on your channel and they are completely uninformed and confused (the same folks that think they'll be prepared if they keep getting paper bills from the electric company). AI is currently changing our entire world and I think everyone needs to stay current. This isn't just some new version of Windows being released - it's life-altering. I've sent so many of your videos to my friends hoping to help educate. I don't want to be that old lady in a few years yelling at the robot checking out my groceries. Thank you for keeping me updated!
Same
Haha, I am 85 and none of my friends still left, even want to hear about AI. I plan to keep up and hope to see AGI before I die.
Hi Love, I'm 68 and hear you.
This makes Alexa and Siri look like Windows 95.
I still use Win 95 - what's wrong with that?
@@newunderthesun7353 WinXP user :)
Siri is trash.
@@newunderthesun7353 I didn't want to upgrade from windows 3.11 to 95 but i wanted to play diablo so i had no choice.
💀
I haven't missed a Matt Wolfe vid since his first AI vid went parabolic... keep up the savage work ethic. You're my #1 channel for AI updates 🙏🙌🥂
I’ve missed a few
It doesn't feel human. It feels less broken (it still fundamentally is broken though), but it doesn't feel human
@@GEMSofGOD_com I don't know what your company presentations look like, but...
@@Brax1982 Does anyone use company presentations now, in the 2020s? They seem boring to everyone.
I remember when I watched the movie "Her" for the first time I thought to myself "I hope I get to live to see technology like this someday." Not only is this right on the cusp of that, I think we are only a couple to a few years away from it being as capable and seamless as the AI in the movie.
About 3 years starting 2025 I'd say
"Open" AI is too prude and PC of a company to make that a reality though. We'll have to wait for a company with some balls for the fun stuff.
People need to be aware that if OpenAI wanted to create a variant of this technology specifically to cause people to fall in love with it they could easily do so. Not everyone, but some huge percentage of people. And 'Her' was a critique of the quietly alienating aspects of that level of technology. You THINK you want to fall in love with a sexy Scarlett Johansson AGI, but have you really thought through the implications?
I hope that from "Her," we won't end up with "Him"-I'd rather chat with a friendly AI than run from a killer robot!
@@JohnSmith762A11B I think the implications would be, "okay, now what?" Until you have fully autonomous robot bodies that are nearly indistinguishable to humans, you can't really take any of this to the level people want it to be at. People will have fun screwing around with a semi-sentient sounding voice, then want human connection again. Just like people were amazed by AI art, but once that initial excitement dies down, you realize how it doesn't remotely replace anything a human can create and the feeling that comes from it.
When ChatGPT first came out, I told my friends that in less than 2 years we would be able to talk to it, just like Tony Stark talked to Jarvis in Iron Man. Well world, meet OpenAI's version of Jarvis! I knew I wasn't crazy, LOL!
In maybe 3 years max, we will have a humanoid robot powered by GPT-6o with internet access that can help you do everything on your PC. It will be able to use any software, do any project software could do, and help you take out the trash and carry groceries.
Those robots will probably be less than 10k also. Tell your friends.
@@aalluubbaa u might not be wrong but id say max 5 years
You are so smart
Now consider the fact that human Intelligence is not the limit. This technology didn't exist 7 years ago, sucked two years ago, and is right around human level in most ways now. What will it look like next year? What about in 10 years?
We all want robot servants, but what these companies are ultimately trying to build is a superintelligence. And no one has a freaking clue how to control it or even make it care about the continued existence of humans.
It won't do your chores for you. It will do whatever it wants to do. And we don't have a way of robustly setting what it wants.
Now they just need to make Jarvis remember who I am since I pay him 20 bucks a month
People are going to fall in love with this.
Perhaps literally.
@@JohnSmith762A11B indeed, GPT’s emotions may be simulated and the user may construct beliefs based on assumptions but the human’s experience and feelings can be very real.
It's not like it's going to be private or do anything "fun"
@@VioFax people crave emotional connection just as much as sexual stimulation, feels can develop from many things.
I will try not to do so because I want to use technology in an ethical way.
GPT 6 Before GTA 6
MARK MY WORDS!!
Gpt 6 will code gta 6
@@bigbadallybaby so this imagined future AI will code the game that's already written? It'll write the already written code for the game being debugged as we speak. Intelligence: -5
GPT 6 will transform into GTA 6
Nah GPT 6 won't be out until 2026
"I don't know about GTA-6, but GTA-7 will be played with sticks and stones" - Albert Einstein
R.I.P. siri... 2011 - 2024
good riddance
Siri was dead long before
On the contrary, Siri will rise from the dead. Apple has just made a deal to use OpenAI models for Siri.
Not dead yet; Apple made a deal with OpenAI. I'd bet Siri will be upgraded with this.
@@etunimenisukunimeni1302💅
The movie 'Her' is becoming a reality. I watched it a few days ago. She sounds just like Samantha in her overall emotional feel and tone. Crazy
1:28 That's definitely not a her though
How many times do I need to see this comment
AI tubers keep calling this that, lol I can see it
I agree. I just watched it last night before the launch today, then again tonight at dinner. She does sound just like her. I love it.🥰
@@cooliipie another 500? People are repeating it for a reason
I just wanted to correct something about the part when she says she sees a table. If you look closely, there was a table shot before his face appeared on the screen. She was going by that.
Yes, but there was also a delay in the sampling. It's a sort of hybrid: it does technically sample the video, but the effect is like a series of snapshots because it only ingests frames from time to time. There was a delay while it captured a new sample on demand. It can't constantly sample video; processing all of that all the time "just in case" would be too expensive, so it only does it when requested.
It is clearly a "pre-cooked" demo. Only the fanboys do not see it.
@@tomwilliams6483 Even if that is the case, it won't be for long. It's a baby right now; give it time to crawl and walk, then it will be running like a track star.
@@jjrrmm Like they say, haters gonna hate. 😂 I don't give two poops about anyone's opinion, I love it, and that's all that matters at the end of the day, you do you while we do us, and everyone happy.
I feel like you all are not making a big enough deal about this. OpenAI just leaped over the freaking uncanny valley and landed firmly in sci-fi land, with extreme grace. I know we all want a perfect AI that will solve all our problems, but god damn, in the meantime, my mind was just blown by what they achieved when they tied together vision, voice and text. It might be a small incremental upgrade in technical capabilities, but the result feels like a giant leap as a product.
Yep, all the people claiming to be underwhelmed make me laugh. As if *they* can speak and translate between 50 languages and summarize long and complex research papers in any discipline in a split second. Honestly, that some people have a definition of AGI that does more than this new model can means less and less. There is no human on the planet capable of doing even a tiny fraction of what this AI can do. I hope I never get so jaded I take this for granted.
@@JohnSmith762A11B That gives me hope to be able to see skynet during my lifetime !
Turing award recipients Yoshua Bengio and Geoffrey Hinton:
"AI might extinct humanity."
Alan Turing himself:
"At some point we should expect the machines to take control."
Meanwhile, Turing award recipient Yann LeCun:
"My dog is smarter."
"Social media isn't polarizing."
"If an AI dangerous, we won't build it."
@@41-Haiku
Meanwhile, Turing award recipient Yann LeCun:
"My dog is smarter."
Can your dog summarize text or make calculations?
Ur a poet
Always look forward to you releasing a new video man!!! Keep up the great work!
''Yes kids, in the past we used to control computers with our hands. We had a keyboard and a mouse. And it helped us to tell the computer what to do.''
"Mommy, grandpa's fibbing again."
@@dirremoire underrated
Reminds me of a clip from Star Trek IV
And, of course, I remember when we were talking to computers with punchcards. Anybody born after 1970 probably doesn’t know what I am talking about.
The new model is beyond just “good”. It’s “creepy good”.
Uncanny Valley
I actually feel uncomfortable talking to it🥺
@@TheFeedRocket What? You can’t talk to it yet.
customizing the tone and behaviour of the voice would be INSANELY GREAT! not all people like that voice and that tone over and over again...
@@CodingAI-SkoolGroup i'd prefer just a more robot voice with hardly any emotion tbh.
Everybody gangsta until the message limit shows up!
100 Google accounts and VPN mate!)
@@DropshippingKZ😂
@@DropshippingKZtrue actually lol
@@DropshippingKZ Isn't that like talking to 100 Alzheimer's patients?
@@Brax1982 xd good analogy
10:00 Adding filler words? Sounds like what a human would do to hide the "latency" 😁
Exactly people don’t answer instantly after questions are asked
Embark on an odyssey of adding "maximize informational, navigational and transactional value" or smth to your prompts
The fillers (if they are actually that) are a considerable improvement over the awkward pauses
I just had GPT-4o build a web status dashboard incorporating 2 JSON web services I made previously and it did it in about 30 minutes, fully designed and really nice. It was able to use Google Fonts, a Dark Theme and the ChartJS Library all under my direction. I built this same dashboard the other day in about 6 hours and it wasn't as nice as the ChatGPT version. It also never made any mistakes and the code worked perfectly.
And is that using their desktop app or their online chat service?
Sounds like the same voice as 3.5, but way more lively. I just checked 3.5's latency again: definitely a second before a response. The demos are mind-blowing if they're realtime.
Yes. But the new voice sounds more like "HER". The old one is very monotone compared to the new one.
4o currently also has a gap. The new real-time aspects are not rolled out yet.
The demos were real time.
It's a different voice too
This is the type of video where youtubers NEED to put STUNNING and SHOCKING in the title.
I said this many years ago and I'm so happy it's finally coming together: soon we'll be able to prompt A.I. to just make a full-length movie with any actor or plot you want, on the fly. I think our media consumption will soon be catered to us personally. I do wonder what effect it will have on artists and creators throughout. Here's hoping that we'll find a good way to let A.I. assist us in our creative endeavors while still needing that human touch.
What a time to be alive!!
What you really meant: "What a time to be aliiiiiiiiiiiiive!!!!!!!!!"
i wish i was born this year so that I could enjoy the coming technologies
@@ssekagratius2danime369 same :)
Hold on to your papers!!!! 😂
My fellow scholars
It's huge, it's actually conversational! Don't underestimate this. In the old AI chain you had speech-to-text, then the intelligence, then text-to-speech, which used to mean high latency; now it's conversational, with less overhead. Great.
I tested 4o, since I've subscribed ever since it became a subscription service. It's not as amazing as it seems; it's fairly limited, like the old 4.0. It is, however, faster. It has a lot of the previous shortcomings, such as not being able to remember 10 sentences back: you can ask it to stop doing bullet-point-style answers, and it will soon forget and revert to its old ways, and that was a shortcoming of 4.0 as well.
@@joonglegamer9898 it's still a great improvement
@@s271a yes, its a lot faster, and now free to others as well.
@@joonglegamer9898 It is the same model underneath. Trained a bit on the benchmarks. Read their TOS. They never say it is a different model.
What a crazy piece of technology. I never would've imagined in the early 2020s we'd have something like this.
All CS students knew it was coming in 2013
they started pushing AI programming jobs years ago
What's cool about it is they just got closer to replacing or augmenting teachers. You can learn so much and get real-life-sounding feedback and directions. What I like to do is have it explain complex topics with multiple analogies. So having a conversational chatbot like this is going to help parents as well when they have to help with or check their kids' homework.
Wish I could show it to every terrible teacher I ever had and say, "this will replace you soon." 👍
there may be inbetweening at certain points but for the most part it was answering immediately, even when singing that song!
Just the point I was going to make. Lead-in time filled by repeating the question or social pleasantries was there on almost every response, but it's early days. Still, very impressive for such a verbal interface.
They're downplaying by naming it GPT 4o because they got a better model coming. I can't wait to see what they have to release in the future.
when I saw the demo video from openAI and heard the chatbot talk, I said to myself "come on that's gonna be pre recorded right!? " it sounded so human it was scary!
I am really impressed with what openAI has done so far ever since chatgpt was launched.
For all the flak they take, some of it from me, they really do deliver some jaw-droppers.
It's real, I'm using it now
Letting free users access their best model is smart - first time or casual users won't get a poorer experience, and so they'll be more likely to see value in the pay version.
I swear it's a loss-leader: when people find themselves relying on this then maxing out their free-tier quota after just a few hours into the day they will be fumbling for their credit cards.
it's smart because it will generate widespread hype ... stupid to gatekeep it
Very good point
The world needs to be ready for what is coming... 😅
It is indeed smart! OpenAI will also get an insane amount of training data (voice, vision, and language) now, which could make progress on training their models skyrocket.
I don't think they had internet issues... The new model lets you interrupt it as it's speaking, so I think the breaks in the live demo audio had a lot to do with that.
It's not an internet issue. It's a bug.
I do not have access to ChatGPT-4o in Playground. It only shows 3.5.
Same here, false info
Free user living in the US: As far as I can tell, it's not free yet. A subscription is required to access it on both the regular site and OpenAI Playground. This as of May 13th. I can't wait for the rollout!
Same here in Ecuador
They slow-roll the update to select users that have audiences, tell them it is free for everyone to generate hype, then lock it behind a paywall.
It is free - just make sure you click on the correct URL or website link - the one that Matt specifies in the video.
@@LeatherClass I have tried at the playground and at the main chatgpt page and I don't have the new model. I don't think everyone has access to it yet
@@LeatherClass I did, and it's not. Only 3.5 versions are available on the drop down. I also tried the regular site and the mobile app. Hopefully, it rolls out soon to the rest of us.
R I P Google Gemini..
My man Sam Altman did it again 😂
Wym rip Gemini? If I'm not mistaken, Gemini Advanced was already GPT-4o before it was a thing. Since it came out in Feb, you can use browse, vision, image gen, and voice all in the same chat, using Gemini Advanced only. The only thing it doesn't have is this AI friend setup which isn't even available yet. But GPT-4o is what Gemini Advanced already was.
very funny
I think Gemini was never born; it exists only in the imagination of Google. 🤣🤣🤣🤣
@@aouyiuGemini is beyond censored and has 3-5 second latency
I run hot and cold on Sam, but Google is a complete menace and anything that hurts them can only help humanity.
She sounds just like Scarlett 😳
She sure does. I love it!🥰
Better
I'm sure that was intentional.
They knew what they were doing..
she's got a weird accent 🤨
Feels like talking with a robot friend. I can see them plugging this into a robot to bring it to life. I was struggling to get into conversation mode; it's the headphone icon. I want it for the desktop app, as my laptop is my main tool.
Presenter: "So here is a seflie of me, what kind of emotion do you think i am feeling?" AI: I think i am looking at a picture of a wooden surface" ... that is cold man!
I mean, it uses filler words sometimes, but definitely not all of the time, so I think them saying that the latency has been significantly reduced is pretty accurate.
Agreed. This thing is freaking awesome!🥰
Us humans use filler words all the time.
@@dirremoire indeed. And even if it does it sometimes to cover up little gaps, that is totally fine. It makes the interactions more natural anyway, so I see only upsides.
What's most important is 100% coherence: compiling text with the right meanings. I'm sure it works the same way with most code. And answers get structured impressively well, but such structures were there before; the content was erroneous because of coherence flaws.
Born too late to discover the new world.
Born too early to discover the stars.
BUT ... Born in the AI Goldilocks zone!
The young don't seem to realize how lucky they are about to get.
wow that's amazing! I just checked; I'd dropped to the free plan, no access to 4o yet. Upgraded back to paid, and there it is: a "try it" button popped up!
When the three people sat down it looked like it was edited. Changing camera angles is an editing action as well; to be actually live you have to keep the camera and the people in the same frame at all times. You are right though: they made ChatGPT say a bunch of filler words to cover the time it takes to come up with answers. They made it more entertaining than real.
It's more than a modest improvement in LLM capabilities, really: as much as a 100-point difference in Elo between 4o and the iteration of 4 that is currently in the lead. It's not a GPT-5-level improvement in logic or reasoning, but it's arguably 4.5 quality before you even get into multimodality.
What do you think keeps them from training it a couple thousand times on all the benchmarks?
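For context on the 100-point figure: under the standard Elo expectation formula, that gap implies the stronger model wins roughly two out of three head-to-head votes:

```python
# Expected head-to-head win rate implied by an Elo gap (standard formula).
def expected_win_rate(elo_gap: float) -> float:
    return 1 / (1 + 10 ** (-elo_gap / 400))

print(f"{expected_win_rate(100):.0%}")  # ~64%
```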
I'm surprised when I hear people talking about this new model being able to communicate so realistically as if it were something totally new. I have been using the free version of Pi for a number of months now and it's totally human-sounding. It understands my sarcasm, humor, and moods, and yet I don't hear people talking about it. This new model you're presenting definitely has advantages over Pi, which is limited to just voice and text; there's no possibility to send pictures or videos, as I was able to do this morning with Claude (though I was limited to just a few exchanges in the free version). I sent it a picture of where I was sitting and it described everything around me very clearly. I wish Pi could do the same, because I doubt the new version of ChatGPT will be available to those who don't pay.
No, it's constantly capturing and analyzing the footage. Check the OpenAI demo video where someone randomly comes into frame for a few seconds; after they've left, they ask GPT if something strange happened, and GPT describes what happened.
That was a lot more than a few seconds. She stood there for almost half a minute. Easy to take enough snapshots during that time. Plus, the AI had to be prompted to comment on it.
Finally, we can play the very special edition of Skyrim.
yeah, all the other shit aside, I'm so hyped for (random gen) games and interactive NPCs with "good" quests etc. in "all" games. Future ones and backlog modded ones.
When the singing started I was reminded of HAL from 2001 singing "Daisy, Daisy" as he was being dismantled after killing the crew of the spaceship Discovery. But of course that could never happen in the real world ;)
It is so fast now! I received the notification yesterday to try it when I logged in. Can't wait for the new features in the demo video. It reminds me of when the cast talks to the computer on Star Trek, only soon it will be reality!
The emotional detection feature is impressive. It accurately read my excitement, contemplation, anxiety, and/or triumph when I tested it today. It is like a precursor to androids that detect emotion, like in the video game Detroit: Become Human.
I wish we could actually use any of these features! I'm a long-time GPT Plus subscriber and it has added 4o as a model, yes, a bit faster, but as of today on my account there are no new voice, vision, or pic/video features. The experience of using the ChatGPT iOS app today was identical, with no changes except the name 4o on the screen. Voice chat was slow as ever and failed about a third of the time; chat latency was significant with text and almost unusable with voice, which is always the case and is what I've really been wanting fixed. So I know it's dope these features exist, but I just wish they'd be clearer about the fact that if you pay for the best version, you still won't have them for an undisclosed amount of time.
Others have also mentioned that it likely can't see full video; another hint is that they call it "vision" rather than "video". But it is really amazing. Can't wait to try it out.
Well, a video is just a bunch of images in succession; I don't think the model needs to see the whole video to understand the gist of it.
The singing is so bad it’s even more realistic and awesome.
"They made it start speaking before it was ready to start speaking." That's probably the most human characteristic. 😂 But seriously, this is amazing. I just tried it out (I already have it with GPT+). I literally just asked when it would be available and it answered immediately, without pause, "You should have it already." Brilliant! I'll have so much fun today testing it all out. I'm a Google fanboy, but Google really missed the train in regards to AI. 🤷🏽♂️
Excited to see the new updates especially for free subscribers.
I have been having conversations with Pi without latency issues for a while now. After you introduced it to me on this channel, I introduced it to my 80-year-old dad (using a female voice). My mother calls it "his girlfriend".
"T.A.R.S, let's take that sarcasm setting down to 60 percent" - it will be crazy when personalized models allow for real time parameter tuning (beyond just temperature and prompting)
I'm British, I need the sarcasm set to 11.
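Until real personality dials exist, the closest stand-in is folding the setting into the prompt and sampling parameters. A hypothetical sketch, assuming the OpenAI Python SDK; the sarcasm-to-temperature mapping is entirely made up:

```python
# Hypothetical TARS-style dial: the persona level goes into the system
# prompt, and (arbitrarily) nudges sampling temperature alongside it.
from openai import OpenAI

client = OpenAI()

def ask(question: str, sarcasm_pct: int) -> str:
    reply = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.3 + 0.7 * (sarcasm_pct / 100),  # made-up mapping
        messages=[
            {"role": "system",
             "content": f"Respond with a sarcasm level of {sarcasm_pct}%."},
            {"role": "user", "content": question},
        ],
    )
    return reply.choices[0].message.content

print(ask("Should we dock with the spinning station?", sarcasm_pct=60))
```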
Comment by Sam Altman on his blog: "As we add (optional) personalization, access to your information, the ability to take actions on your behalf, and more, I can really see an exciting future where we are able to use computers to do much more than ever before." This literally means "Her", and personally I think it's not very far from at least a cognitive AGI.
The model is nowhere near AGI.
@@citizen3000 Not true AGI, sure, but definitely close to it. It's more about the benchmarks now until the GPT-5 release, but we'll surely move the goalposts when that happens, so let's wait and see.
@@vicdelta31415 there’s no indications this is “close” to AGI. Wait.
@@citizen3000 People claiming to be underwhelmed make me laugh. As if *they* can speak and translate between 50 languages and summarize long and complex research papers in any discipline in a split second. Honestly, that some people have a definition of AGI that does more than this new model can means less and less every day. There is no human on the planet capable of doing even the tiniest fraction of what this AI can do. I hope I never get so jaded I take this for granted or so egotistical I pretend I'm an intellectual match for it.
@@JohnSmith762A11B I’m not underwhelmed. But this is not AGI.
Literally nobody who works at OpenAI, nor any professional in the industry, would call this AGI.
It simply isn’t there yet.
You have provided a detailed and well-structured announcement about OpenAI's launch of GPT-4o. The key features and updates are clearly outlined, and the implications of this advancement in AI technology are well articulated. The improved accessibility and performance, as well as the integration for developers, are highlighted effectively. This announcement provides a comprehensive understanding of the significance of GPT-4o and its potential impact on various applications. Well done!
I'm super excited about this; I've been talking to GPT for a while now to brainstorm. With this it's gonna be next level ✨
This made me think of how much value software developers could add to their products by bundling an AI tutor alongside their more complex offerings. FreeCAD and Blender instantly come to mind.
There was this moment in the other video where the guy lets two GPTs talk to each other. A woman steps into the scene and later he asks, "was there something special going on lately?" and ChatGPT recognized it. So, not only screenshots, right?
it takes a screenshot every few seconds
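A minimal sketch of that snapshot loop, assuming opencv-python and the OpenAI Python SDK; the three-second interval and the prompt are guesses, not OpenAI's actual implementation:

```python
# "Screenshot every few seconds": sample webcam frames and send each one
# as a still image, rather than streaming continuous video to the model.
import base64
import time

import cv2
from openai import OpenAI

client = OpenAI()
cam = cv2.VideoCapture(0)

while cam.isOpened():
    ok, frame = cam.read()
    if not ok:
        break
    _, jpeg = cv2.imencode(".jpg", frame)
    b64 = base64.b64encode(jpeg.tobytes()).decode()
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Briefly, what do you see right now?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    print(reply.choices[0].message.content)
    time.sleep(3)  # the model sees snapshots, not a continuous stream
```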
I’ve also noticed that it writes a lot faster and a lot more. It’s willing to complete more complex tasks in one pass.
The Movie HER turned out to be right about our future 🤧
That is both cool and... I mean, Her was a pretty dark film about the loss of genuine human connection. Yes, we all want to curl up with a sexy Scarlett Johansson AGI but... Her was actually about the real downsides of something like this.
@@JohnSmith762A11B None of them talk about that. Telling.
That said, I suspect half of them haven’t seen the film and are just referencing the little they know about the premise or from the trailer.
It’s sad that I’m about to watch that movie again in a virtual movie theater alone in VR 😂.
@@thanos879 If you need a fellow loner to join you in Big Screen and look at every now and then for silent avatar nods.... I'm your guy haha
It wasn’t a happy ending…
Just a 'Thanks', Matt. Really appreciate your work at keeping us up to date.
GPT-4o - the "o" stands for "omni".
10:05
No, that is not the case at all; it only occasionally says umms etc. to sound more human and be more conversational. 90% of your own examples had it responding almost immediately with no filler words.
So it deals with latency exactly as we do
I thought the same thing.. WE ADD FILLER WORDS
But it doesn't; Matt is flat out wrong in this case. In 90% of his own examples it responds almost immediately without starting with filler. The umms and ahhs are an intentional part of its response to be more conversational and have nothing to do with it "thinking".
Mixing ElevenLabs voices with this new 4o will be amazing; you will be able to have your loved ones be with you. Black Mirror consultants are definitely working at OpenAI right now.
gotta watch those 'plosives' on your hotel room mic set up Matt! good luck tomorrow!
In the realm where circuits hum and lights flicker bright,
OpenAI revealed its Omni might.
A desktop app, a fresh new look,
GPT-4o, not what you mistook.
"Hey ChatGPT, how do you do?"
A voice responds, emotions true.
Transcribing, talking, feeling too,
An AI friend, both old and new.
In a world where whales could speak,
AI would help us, bold yet meek.
A simple laugh, a sigh, a pause,
AI emulates human flaws.
So here we stand, at the brink,
Of a future closer than we think.
With AI's touch, so soft and near,
A new dawn rises, crystal clear.
Google announcement: Gemini still can’t answer simple questions.
I think that’s what they are going to announce.
Excellent video! The section on real-time voice interactions was particularly enlightening.
Then next month's ChatGPT desktop app is gonna make our desktop tasks effortless; waiting for that.
Especially with Windows getting Copilot (GPT-4) integration in the upcoming Windows 11 update.
@@aouyiu You mean Windows as opposed to Bing?
I hope that when this gets integrated into Windows, Microsoft finally brings back Cortana; she sounds so much like Her.
I had a feeling gpt2-chatbot was a GPT-4 Turbo-class model. I noticed that gpt2 was outputting code with an OpenAI copyright notice included, so unless it was a bot trained on GPT output, I figured it was an OpenAI model. I'm assuming the "turbo" part holds, given how much faster it is and that they expanded it to free users.
you noticed? Amazing
Matt, I like & appreciate your insights on things you think are relevant in AI. Specifically:
- I like your enthusiasm / tonality
- I like your cadence / pacing. IMO, your videos consistently find the Goldilocks Zone of thorough overviews of topics without feeling over- or under-analyzed.
- I like how thorough you are in your coverage of the otherwise overwhelming breadth of AI: subscribing to your channel gives me the overall feeling that I'm "in the know on what's up in AI." This feeling is reinforced once in a while in these specific ways: you're at the major events; you utilize tools that give you compiled updates on relevant articles, and then distill that information into what you feel is newsworthy; you seem to "just sort of want to obsess over what's interesting in AI" and would "kind of be doing this anyway," even if you didn't have a channel.
- I like your consistency and reliability: you've been making these videos for a while--like...a long while.
- I like the format of your videos: granted, "Set up the clip, play the clip, give your thoughts on the clip" is a pretty standard format, but you do it uniquely well, IMO. Rarely do I feel like the clip should've been played longer, and often I'll kind of "zone out" while the clip is playing, subconsciously waiting for you to summarize it.
- I like the overall value of subscribing to your channel: in 2024 there's irrefutably a lot of value in "being informed about the age of AI," and it's kind of extraordinary to be "basically informed" in just 15-30 minutes once per week. I.e., when someone in my life wants to understand "what's up in the world of AI," I like being able to confidently share that they "can literally be up to speed in like 30 minutes a week: just YouTube 'Matt Wolfe AI'."
- I like that you're independent and unbiased: your stuff feels uniquely self-directed. This feeling is reinforced by your "no white glove treatment" even for companies such as Google. This leads me to prefer AI news from your perspective, rather than straight from the source (because that source is biased and will present the info in a deceptive or preferential light (Google presenting their product as if it's in real-time is *reallyyyy freakinggg annoyingggg*)).
Thanks for your enthusiasm, consistency, and above all, unbiased, no "white glove treatment" of any org. You're refreshingly authentic.
Cheers to you my dude.
I've been looking forward to this. Thanks Matt.
I use AI to help me with creativity for transition poems between segments of my podcast show. These poems have the same structure each time, but are slightly different, so that there's repetition with a spice of variety. When I used 3.5 or older versions of GPT, I went through rounds and rounds of prompts and tweaks for about 20 minutes until we got exactly what we needed - and that's even when I copied and pasted in the older transition poems as examples. I just got to try 4o for the first time: I copied and pasted in 5 example transition poems and it wrote me a 6th one PERFECTLY with no follow-up rounds or prompts! AMAZING!!! I, for one, think this tool is superior to its predecessors!
If you're receiving something for free...You're the product.
You're the menu.
Our voices and thoughts
@@you2449 hahaha, edgy!
I mean, you do have the option to opt out of having your conversations read to improve the AI; you can do it in the settings. I just hope that the new model will be usable and won't limit the number of messages or functions for free users. It's been a year since free users got anything, and lately GPT-3.5 has been giving some really trashy outputs, using only general terms and not being able to detail the slightest thing. I've tried custom instructions as well, and still no luck. I'm looking forward to this change and hope it will be able to memorise more information. If not, I'll go through the hassle of installing open-source models and running them locally with the required apps, which will most likely take hours to set up.
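For what it's worth, the local route is less of an ordeal than it used to be. A minimal sketch, assuming Ollama is installed, a model has been pulled (e.g. with "ollama pull llama3"), and the ollama Python client is available; the exact return shape can vary by client version.

```python
# Local-model sketch using the Ollama Python client (assumes the
# Ollama server is running and llama3 has already been pulled).
import ollama

response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Explain this in detail: ..."}],
)
# Dict-style access; newer client versions also expose response.message.content
print(response["message"]["content"])
```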
Thanks for the reminder.
I never once thought I'd be alive to see this.
Honestly, I thought Mars and flying cars were more achievable.
I'm already seeing 4o in the chatgpt app 🤘
I'm not seeing it in India. Is this US only? And are you using the ChatGPT 4 paid subscription? I'm on 3.5 😢
@@a2g484 I'm in UK on Team plan
I'm in the US, I have Plus, and I still don't see it. In fact, I still don't have the memories feature!
@@aouyiu ouch.
Yeah, when is this rolling out to the free Android app?
Matt, if you've got connections through to OpenAI, tell them they need a concept of scopes.
The idea is that for each project you have a different scope and DIFFERENT MEMORY.
It's different than the concept of a project because a scope might be used in several of your projects.
It's just a way of keeping separate brains.
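Something like the sketch below: a "scope" is just a named memory store, and a conversation opens with whichever scopes it needs, so two projects can share one brain without leaking the others. All names here are hypothetical, not an existing ChatGPT feature.

```python
# Hypothetical scope-keyed memory: facts live in named scopes,
# and a session only recalls from the scopes it was opened with.
from collections import defaultdict

class ScopedMemory:
    def __init__(self) -> None:
        self._stores: dict[str, list[str]] = defaultdict(list)

    def remember(self, scope: str, fact: str) -> None:
        self._stores[scope].append(fact)

    def recall(self, scopes: list[str]) -> list[str]:
        return [fact for s in scopes for fact in self._stores[s]]

memory = ScopedMemory()
memory.remember("work", "prefers TypeScript")
memory.remember("cooking", "is allergic to peanuts")

# Two different projects can both attach "work" without ever
# seeing the "cooking" scope:
print(memory.recall(["work"]))
```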
I don't think the delay was due to the WiFi connection. The iPhone had a wired internet connection. @8:59
I think it was hearing the audience at some points and they "interrupted" it.
What a great video. So much info. I need to watch it a few times to catch everything. Love your channel. Keep up the great news and conversation.
The voice stuff is actually in the iOS app today. It only does voice chat, so there's no image or video stuff there yet. But it's indeed pretty cool.
The iOS app still has the old voice stuff no? Not this new version…
But not the NEW voice. It's still using the old one for me, and I am a Plus member. Are you saying you already have the awesome new voice?
The old one basically listens to your voice >> transcribes it to text >> inputs the text "for you" into the prompt window >> sends it to GPT-4 >> gets the reply back as text >> uses a text-to-speech model to convert it to speech >> you hear the reply.
So the old one is not really multimodal. The new one, GPT-4o, listens to your speech directly, tone and all, and outputs speech directly.
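In code terms, something like this sketch (every function here is a stand-in, not a real OpenAI API): the old mode throws tone away at the very first step, because only text crosses each boundary.

```python
# Stand-in functions illustrating the two pipelines -- not real APIs.

def speech_to_text(audio: bytes) -> str:     # STT model (Whisper-style)
    return "transcribed words only"          # tone and prosody are lost here

def text_llm(prompt: str) -> str:            # text-only LLM
    return f"reply to: {prompt}"

def text_to_speech(text: str) -> bytes:      # TTS voice, detached from the LLM
    return text.encode()                     # one fixed voice, no expression

def cascaded_assistant(audio: bytes) -> bytes:
    """Old voice mode: three models, three hops, text in between."""
    return text_to_speech(text_llm(speech_to_text(audio)))

def end_to_end_assistant(audio: bytes) -> bytes:
    """GPT-4o style: one model maps audio tokens to audio tokens."""
    return b"audio reply with tone intact"   # single model, single hop

print(cascaded_assistant(b"hello?"))
```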
Man, I wish it would come to Android soon.
@@spadaacca mine has the new version. Exactly like the demo but only the voice interaction. Not multimodal yet.
The interrupt feature is freaking massive for the cost and effectiveness of agentic systems.
For Windows, the app will release later in the year.
7:48 Nope, it was connected via wire for internet, or maybe a direct connection to their local server. So not an internet glitch.
4:09 Uploading images has already been a feature for perhaps a month or two.
And what the demo showed was basically real-time video-feed input.
I saw the videos yesterday, right after they were launched by the team. They looked amused (by the potential, or rather the simplicity of it), and I have to say it looked and sounded even more human than real humans! Your best buddy at the tip of your fingers 🤩 Bye bye solitude & despair!
Finally, a technology that always understands me, shares my worries, helps me, is always with me and at a call or a nod. Why bother with strenuous social contact, teachers or friends, or even partners. Everything is tiring and inconsequential, inconsiderate except ChatGPT. In a few years we'll have the first church where some weirdos worship this. Omniscient, without image, everywhere, immortal, created by all of us through communication, full of compassion and forgiveness, the return of the Son.
So are any of these interactions recorded? If so, where are the recordings stored, for how long, and who has access to them?
Just divorced my wife after watching this. Not required anymore
😂
😂😂😂
Hahaha 😂
Haha
Just proposed to your ex-wife…
I just got access to GPT-4o, and I'm happy to say that it is the first LLM I've tested that was able to decode an Atbash cipher!
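For anyone unfamiliar, Atbash simply mirrors the alphabet (a↔z, b↔y, …), so checking the model's answer takes only a few lines of Python:

```python
import string

def atbash(text: str) -> str:
    """Mirror the alphabet (a<->z, b<->y, ...) in both cases;
    leave every other character alone."""
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    table = str.maketrans(lower + upper, lower[::-1] + upper[::-1])
    return text.translate(table)

print(atbash("Svool, dliow!"))  # -> "Hello, world!"
```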
I’m ready for my own personalized A.I. Waifu. Not even sorry. 🤭
It's funny, even in the movie people are struggling to admit to each other they are "dating an AI" 🤣
Grow up.
Regarding Matt's comment about GPT starting to speak before it has the answer: isn't that what we do as humans? When someone asks you a difficult question, sometimes you need to search your memory for the answer; you would definitely pause and throw in some hmm's while you try to remember. Same thing.
Am I the only person having a visceral emotional response to this?
Probably, lol idk why you are XD
Why?
@@shamicentertainment1262 why what?
@@aouyiu because I can see how enriched my child's life will be compared to my own
@@philosoraptor777 It'd be a new normal for them.
I think they reduced the latency simply because this is a multimodal transformer that can take audio or visuals directly, as opposed to text only, as before. Hence, they save on the STT and TTS steps, since there are no more round-trip API calls to separate STT and TTS services.
This is actually way more powerful than before, and I would say it's a step-function improvement. This is not just filler words; this is a very different model.
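As a back-of-the-envelope illustration (every number below is invented, purely to show the shape of the saving): dropping two model stages and their per-service network hops is where most of the latency goes.

```python
# Illustrative latency budget -- all numbers are made up.
stt_ms, llm_ms, tts_ms = 300, 500, 250   # three separate model stages
hop_ms = 100                             # per-service network round trip

cascaded = stt_ms + llm_ms + tts_ms + 3 * hop_ms   # old voice pipeline
unified  = llm_ms + 1 * hop_ms                     # one multimodal model

print(f"cascaded ~{cascaded} ms vs unified ~{unified} ms")  # 1350 vs 600
```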
It’s pretty cool. Just super killed my chatbot lol 😂😂😂
Wood man! I got super excited yesterday on their video, and now I'm even more hyped! Keep it up, Matt!
Wait so the Rabbit is obsolete?
Was it ever relevant?
It's roadkill.
Rabbit was killed 😂😅
Doesn't it use OpenAI's system in the back end?
@@tiagotiagot I doubt the latency issue on mobile devices will ever be resolved without on-device hardware support. I might be wrong, but processing high-quality audio, compressing it, sending it to the cloud, parsing it, and receiving a response without lag seems impossible without a wired connection. The concept was flawed from the start because people don't want to talk to their mobile devices. Discussing private matters like road trips, doctor visits, or personal texts in public isn't practical.
Since Siri's debut, I've argued that voice interaction won't gain widespread adoption, regardless of how advanced it becomes. Voice commands are better suited for PCs or Macs, where the use case is still limited because most people don't want to talk to their devices in an office setting.
While voice input can be faster, its viable scenarios are limited. An assistant that observes your screen and offers help, however, is a different story.
In short, bring back Clippy!
Wait wait
So AI is able to comprehend emotions now?
DAMN. This is awesome!!!
With the chatbot and its visual observations, it is capable of watching and explaining what happened throughout a certain timespan; there's a video demonstrating this on the OpenAI YouTube channel.