Meditation on GPT 4o and Google I/O

  • Published 19 Dec 2024

COMMENTS • 65

  • @tintintin070
    @tintintin070 7 months ago +2

    Very interesting to hear your thoughts. Just to build on this, as you were talking I was thinking about how in the music world, in some audio workstations (FL Studio, Ableton Live) a lot of those UI elements serve single functions, where it would actually be more time-consuming to say “turn down the attack on that synth a bit” vs. just doing it yourself.
    But with a broader series of tasks, like you said, “Put a spacey reverb with a low-pass filter on the guitar track,” it would be a lot easier to just say it and have an agent do it. What this got me thinking about is that, depending on what you’re looking for, having an agent be your entire engineer could be a lot more productive and faithful to your creative vision. If I go to a studio not looking for a producer to put his own twist on my idea but just looking to actualize my idea (*translate* it), then I would prefer an agent over a person.

    • @uncoverage
      @uncoverage  7 months ago

      and I think the complexity of that translation is what determines how good the output currently is. if the context isn't there for the AI to know what I mean when I ask for something, it will try to translate it and may come back with something overly generic

  • @Archer.Lawrence
    @Archer.Lawrence 7 months ago +13

    I love this angle; I think LLMs as a new interface is a great point, and an underappreciated perspective.
    I will say though, as a traditional and digital artist of more than a decade, I can say with absolute certainty that “AI is creative.” I think intuitively it shouldn’t be, and there is a deep human bias against this fact, but if you sit with the tools and commit yourself to being open to new forms of creativity, then you will see that this is undeniable.
    AI is different from Photoshop, or pen and paper. The tool ITSELF creates original concepts, visual themes, and stuff you never asked for but that is perfect for what you did ask. Creativity is NOT a uniquely human trait. It’s just one we have been better than machines at until this moment in human history. The sooner our society can accept this, the sooner we can bask in one of the greatest times in creativity since the Renaissance.

    • @uncoverage
      @uncoverage  7 months ago +4

      agreed that AI is creative! when people talk about "hallucination", I usually hear some form of "creativity". that said, I think the real problem is to solve for creativity + taste.

    • @Archer.Lawrence
      @Archer.Lawrence 7 months ago +2

      @@uncoverage Yesss, preach, another way to say creativity is “curated randomness” which is what an LLM is in general. Those hallucinations are a feature not a bug. And yes exactly, that taste is absolutely critical. People who have never made AI art think you just put some words in and, boom, done. I’ve spent hours and hours slaving over prompts, then editing every single photo, compiling and arranging them, cutting them to music, and packaging them into a final video product. My taste at every step of that process is what separates me from someone else, and that taste is what matters in the end as far as impact on the audience.

    • @thelenbax8497
      @thelenbax8497 7 months ago +4

      Such a refreshing take, I also think we're at the forefront of a creative renaissance. Very exciting times indeed.

    • @DeltaNovum
      @DeltaNovum 7 months ago +2

      Thank you for taking a stance on this outside of ego. I'd even say that although LLMs are just a facsimile of our own intelligence, our brains are just biological computers that take in information and patterns to create new combinations of them, to create new kinds of information and patterns, which I believe is what creativity is in its essence.

    • @Archer.Lawrence
      @Archer.Lawrence 7 months ago

      @@DeltaNovum Obviously the following is being facetious, but I think it’s funny that the whole time everyone is asking “Are these computers conscious?!” the real question should be “Are WE even conscious?” At this point it kinda seems like we are biological automatons, machines running a program, without agency or free will. We are an eddy in the cosmic flow of entropy, nothing more, nothing less. These machines we are making are more like us than we want to admit, because admitting it would undermine… basically every religion, spiritual belief, and even our own interpersonal experience of self. It FEELS like we have agency, but the machines may soon feel the same way. So who’s alive, and who’s not? Spooky.

  • @Shlooomth
    @Shlooomth 7 months ago +8

    You ever write notes on your phone? Ever think about how much your notes would weigh if you printed them all out to carry around? At some point a difference in amount becomes a difference in kind. The amount of information you can carry around on your smartphone is unimaginable. Similarly, the big difference with realtime interaction is in the kinds of interactions it can have at a speed that makes them worth doing. I’m legally blind and GPT’s vision capabilities are really helpful sometimes. But often the limitation has been: if only I could just have it out continuously giving a running commentary for me, like an audio description for real life. That’s why this technology’s value is indescribably high for me personally.

    • @uncoverage
      @uncoverage  7 months ago +2

      completely agree with everything you say here. I'm definitely on record saying how much I find human interface/UX changes profound! I'd love to hear more about how this technology has been transformational for you, particularly what use cases have been useful day-to-day

    • @caseyhoward8261
      @caseyhoward8261 7 months ago

      Thank you for the app creation "inspiration"! I'll be shipping that app in the next few weeks! ❤️
      I'll update this message as soon as I launch it on the app store! 😉

    • @uncoverage
      @uncoverage  7 months ago

      wait, can you actually reply instead of editing? it will show up in my notifications if you reply and I'd love to check it out

  • @adonisvillain
    @adonisvillain 7 months ago +7

    Ohoho, that's what I'm going to listen to on today's walk. Love the quality of your thoughts.

    • @uncoverage
      @uncoverage  7 months ago +1

      excited to hear your thoughts!

  • @kidd8393
    @kidd8393 7 months ago +1

    Great video. Stumbled here by accident and really enjoyed listening to you.
    I am with you on your basic thesis regarding summarizing and translating. Furthermore, I would add search to it, because this is what advanced chat bots and copilots are doing. They search in their trained knowledge base. And instead of having to understand how to google or formulate good queries, you can query in your own style. Your query still has to include all the information necessary to provide enough context for a quality answer, but the structure of this information is less fixed.
    I think this is somewhat underutilized. Something as simple as a search box could be vastly more useful for less experienced users than it was before LLMs. And I don’t know if a chat bot or copilot is the final form of user interface for this kind of interaction.
    Let’s see what Apple’s take on this is next month. I hope they innovate in some meaningful way.
    But GPT-4o is also an attempt at packing this functionality into a new kind of interface.
    The final form will be standardized over the next few years, after the hype cycle has died down and the general consensus about LLMs is more realistic about their shortcomings.

    • @uncoverage
      @uncoverage  7 months ago

      thank you! I'm glad you enjoyed this!
      for your first point, I'd classify that under "translation". in general, the AI is not doing the searching itself. in fact, the discussion around retrieval augmented generation (learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview) shows how search itself is patching up flaws in AI as-is.
      of course, this doesn't invalidate your broader point. you're totally right that search becomes more useful if "best practices" can be inferred from a query of a less experienced user.
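      (Editor's note: a minimal sketch of the retrieval-augmented generation pattern linked above, assuming a plain Python list as the search index and any prompt-to-text callable as the model; `retrieve`, `answer`, `search_index`, and `llm` are hypothetical names, not any specific product's API.)

      ```python
      # Hypothetical RAG sketch: search fetches grounding text first, and the LLM
      # only "translates" that text into an answer instead of relying on memory.

      def retrieve(search_index: list[str], query: str, k: int = 3) -> list[str]:
          """Naive keyword scoring over an in-memory index (real systems use vector search)."""
          terms = query.lower().split()
          scored = [(sum(t in doc.lower() for t in terms), doc) for doc in search_index]
          return [doc for score, doc in sorted(scored, reverse=True)[:k] if score > 0]

      def answer(llm, search_index: list[str], query: str) -> str:
          """Ground the model's reply in the retrieved documents."""
          context = "\n".join(retrieve(search_index, query))
          prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
          return llm(prompt)  # `llm` is any callable mapping a prompt string to a completion
      ```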

  • @helloimedden
    @helloimedden 7 months ago +1

    I think all these points not only are super interesting to the topic you picked, but they could be applied, if not word for word then closely, to the way that we are creating AI. I think you summed it down to inputs and outputs, really, and AI creation is pretty much a game of inputs and outputs. So when we talk about good and bad inputs and outputs, whether that’s on the creation side of AI or the user side interacting with AI, I think the issues are aligned in that sense: it relies on good and bad outputs. I think the difference between creating AI and using it will come down to which one is first to need lower-quality inputs to produce a better-quality output. So in a sense, I think Google, more than OpenAI, are betting that AI and their ability to provide better inputs to create it well will in the long run make it so users can actually provide terrible or no inputs and fundamentally get better results than the need for good or decent user input methods/design/features. But I think we’re a ways away still, and that requires you to live in Google's world and let them scan your emails and life away.

    • @uncoverage
      @uncoverage  7 months ago

      congrats, you're 2/2 for sending me down rabbit holes on videos where you've commented :)
      in particular, I'm thinking about my video on input resolution (ua-cam.com/video/cxgnJjzfF6M/v-deo.html) and how your point about "less quality inputs" is the equivalent of saying a "low resolution input" - just not for a GUI. I may have to explore this more in another video!

  • @merlin_cg
    @merlin_cg 7 months ago +1

    You have really interesting perspectives on this and I absolutely agree with AI being much better at explaining/translating! I would hesitate to straight up say that GPT-4o is just a UX play though, because the multimodal inputs being baked into one model should give it more dimensions for a huge increase in neural connections outside of language. The fact that they can already combine all of these inputs into one model is actually really ground-breaking to me, and we are only seeing UX benefits because it's being shown on a phone/tablet; imagine a few more steps in this direction with robotic integration.
    I don't mean to make it dramatic but I do fear they are rushing things without enough thought.

    • @uncoverage
      @uncoverage  7 months ago

      that's totally fair; of all the claims I made in this video, the idea of multimodality in the model itself is the piece of the technology that I understand the least.
      your comment is definitely interesting given what's going on at OpenAI!

  • @SimonHuggins
    @SimonHuggins 7 months ago +3

    Multi-modal is the next step towards bridging the gap between different areas of thought. The synthesis between different areas of the brain is crucial to us as humans and a source of much of our inspiration. I see this as an important stepping stone, embodiment being another, and VR as a way of synthetic interaction bringing it all together. I suspect the internal mechanism of LLMs will be refined over time too to help converge these interaction modalities. The creative potential for us to orchestrate all this in tandem with other humans and AI agents will be fascinating. Looking at the state of agents right now, we’re probably 2-3 years away from thinking of these agents as AI colleagues rather than their current state of being useful juniors. Amazing time to be around - when science fiction starts becoming science fact.

    • @uncoverage
      @uncoverage  7 months ago

      agreed that multi-modality is a key interface requirement as AI gets folded into our computing environments - the more input mechanisms, the more fun things get!

    • @ViceZone
      @ViceZone 7 months ago +1

      The terms "multi-modal" and "omni-modal" AI refer to the capabilities of artificial intelligence systems in processing and integrating different types of data or modes of input.
      **Multi-modal AI:**
      - **Definition:** Multi-modal AI systems are designed to process and integrate information from multiple modalities, such as text, images, audio, and video.
      - **Functionality:** These systems can understand and generate responses that involve more than one type of input or output. For example, a multi-modal AI might be able to describe an image in text, generate images from text descriptions, or synchronize lip movements with speech.
      - **Examples:** A chatbot that can answer questions based on both text and images, or a virtual assistant that processes voice commands while also analyzing visual data from a camera feed.
      **Omni-modal AI:**
      - **Definition:** Omni-modal AI represents a more advanced and comprehensive capability where the AI can handle an unlimited number of modalities and seamlessly integrate any new type of data or input modality as they emerge.
      - **Functionality:** This type of AI can process and understand all available data forms and integrate them without requiring significant re-engineering for each new modality. It aims to provide a universal understanding and interaction capability across any possible modality.
      - **Examples:** A highly advanced AI system that can understand, learn, and generate responses across all conceivable types of input, such as text, speech, gestures, images, videos, bio-signals, and even new forms of data that may be developed in the future.
      In summary, while multi-modal AI is focused on combining and understanding multiple specific types of data, omni-modal AI aspires to a more universal capability, handling and integrating any and all types of data seamlessly.

    • @uncoverage
      @uncoverage  7 months ago +1

      I'm fascinated by the idea of multi-modal vs. omni-modal, but I'm not sure I'm convinced of the difference between the two. isn't omni-modal just a "baked-in" multi-modal model, mostly reducing latency?

  • @helloimedden
    @helloimedden 7 months ago

    12:53 very interesting point about creativity. I genuinely agree with your opinion about AI lacking in creativity as opposed to other functionality. The counterpoint I would make, though: in terms of mass adoption and the general public, most people don’t really have a great understanding of what good creative work is, or really care about creativity if they get other benefits. So for buying things, creativity generally won’t matter if there are benefits like: it’s cheaper, or better quality, or easier to find. I know a lot of people will happily pay for terrible logos, website designs, store signage, billboards, or any marketing content. Fiverr thrives because people either don’t understand what’s good or bad, have bad taste, or just need it done.
    I think that means someone in a creative job is probably safer, at least for a while. But in terms of end users, and probably the majority, we really will see people not being impacted by whether AI is good at creative work or not, as long as they figure out those “convenience/benefits” and how to explain them and addict us to them like they do so well in the tech world.

    • @uncoverage
      @uncoverage  7 months ago

      you're totally right - and I wonder if your point here is going to be one of the major effects of AI - a dying "middle class" of creative work as that level of quality becomes commoditized

  • @ozten
    @ozten 7 months ago +2

    I think GPT-4o is about user growth and adoption. When OpenAI made GPT-2 and then GPT-3, only researchers were playing with it. It was a better Markov chain. It wasn't until they had the breakthrough idea of RLHF, training it to act as a helpful chatbot, that user adoption exploded "over-night".
    All of the amazing UX, Product, and demos are aiming for a similar step-change of adoption and active-daily use.
    One speculation is that this real-world multi-modal usage can generate new training data for the foundational model(s).

    • @uncoverage
      @uncoverage  7 months ago +2

      yes!!! it's really interesting to see the industry grapple with how to "review" chat bots (and AI features in general) as a new plane of competition in tech emerges - I feel like LLMs are particularly hard because they provide the veneer that they can do nearly anything, but often have very narrow use cases where they make sense (or are implemented well) at the moment

  • @dreamphoenix
    @dreamphoenix 7 months ago +1

    Thank you.

    • @uncoverage
      @uncoverage  7 months ago

      thanks for watching!!

  • @MrArdytube
    @MrArdytube 7 months ago +2

    A couple bits of feedback… I think that most people still are not clear on how a search engine like Google is improved by adding an AI front end. Also, as an interface geek, it might be useful to cover the underlying design choices that are observed in different models. A key difference I see is that Claude 3 Opus is significantly more conversational… in case you want to explore a topic as you might with a friend… whereas GPT is very good at report-style responses that summarize and organize information.

    • @uncoverage
      @uncoverage  7 months ago

      I'm curious what you mean by that first sentence, mostly because when I have shown people around me a more conversational search interface (e.g. early Bing/AI integrations), that's been the easiest thing for them to understand. what is much harder to understand is where the information is coming from and what should be trusted (in fact, many people I've shown just take it at face value without question)

    • @MrArdytube
      @MrArdytube 7 months ago

      @@uncoverage Well, my first sentence is: “i think that most people still are not clear on how a search engine like google is improved by adding an ai front end.” So… I am saying there is some improvement via adding an AI front end to Google. Then your question might be: what do I mean when I say improve? I mean an improved user experience… I do not mean more accurate, I do not mean more comprehensive… I mean a better user experience. So, let's say I do a Google search. Do I get one result, or two, or 29? No, I get 20,000 results. Does any one of those results specifically answer my question, even if I can find it among the 20,000? Probably not. So… if I understand you correctly, you are proposing that most people will prefer to read completely through 20,000 Google results. Of course I am being ironic, but the fact is… unless you are doing academic research… the digested answer from Perplexity will be what they want… which is what I meant. All that said, I asked Google the question of what the advantage is of using AI vs. a search engine like Google… the Google response was:
      Improved accuracy: AI-powered search engines deliver more accurate search results by understanding the context and intent behind a search query. Personalized results: AI-powered search engines can provide personalized search results based on a user's real-time search preferences.
      Adding an AI front end to a search engine offers several advantages, enhancing both the user experience and the efficiency of search operations. Here are the key benefits:
      Enhanced User Experience
      Conversational UI: AI front ends, particularly those using Generative AI and Large Language Models (LLMs), can provide a conversational user interface. This allows users to interact with the search engine using natural language queries, making the search process more intuitive and user-friendly.
      Personalization: AI can tailor search results based on user context and preferences. For example, using machine learning models like those integrated with Kubeflow and Katib, search engines can rank results by considering user-specific features such as location, age, and past interactions.
      Dynamic and Flexible UI: Headless search architectures allow for greater flexibility in designing user interfaces. This means developers can create more engaging and customized search experiences without being constrained by backend limitations.
      Improved Search Efficiency
      Automatic Metadata Generation: Generative AI can automatically generate metadata for documents, improving the indexing process and making it easier to retrieve relevant results. This is particularly useful in enterprise settings where large volumes of documents need to be categorized and indexed efficiently.
      Learn-to-Rank Algorithms: AI front ends can employ learn-to-rank algorithms to optimize the ranking of search results. These algorithms use training data to learn which results are most relevant to specific queries, improving the accuracy and relevance of search results.
      Scalability and Performance: Headless search architectures decouple the front end from the backend, allowing for better performance and scalability. This separation enables faster updates and maintenance, as well as the ability to handle larger and more complex datasets effectively.
      Additional Benefits
      Spam and Ad Filtering: AI can help filter out spam and ads from search results, providing a cleaner and more relevant search experience. This is particularly beneficial for privacy-focused search engines that aim to deliver unbiased results.
      Enhanced SEO Capabilities: AI-driven search engines can better handle technical SEO tasks, such as optimizing for mobile responsiveness, managing redirects, and ensuring proper site architecture. This leads to improved search engine rankings and better visibility.

  • @lilmichael212
    @lilmichael212 7 months ago +1

    I really do think where AI assistant UI really explodes, and new technological properties emerge that weren’t possible before, is at the convergence between AI and powerful but low-friction spatial/AR devices. When AR hardware catches up to the functionality of these powerful models and they can break free from 2D and have a real sense of presence in our lives with real-time, real-world shared context through something like glasses, THAT’S when all this stuff really makes sense and really has a “soul” of its own like you said. I largely see AR being the way we give eyes to AI and invite it into our lives in a real way. Until then they’ll largely be useful glorified Siris in our phones. The next era, however, comes when AI truly sees what we see, augmenting your world in really useful ways without you really even thinking about it much.

    • @uncoverage
      @uncoverage  7 months ago +1

      I love this and definitely will explore this more deeply. I'm wondering about the lack of input precision AI interfaces encourage and whether that can be solved with more context, as you allude to

  • @red_onex--x808
    @red_onex--x808 7 months ago +1

    Thx good perspectives

    • @uncoverage
      @uncoverage  7 months ago

      thanks for watching!

  • @WigganNuG
    @WigganNuG 7 months ago +3

    So, I heard GPT-4o is actually not just a UX/UI change, but that the under-the-hood tech is different; I've heard that the speech interaction is actually 1-to-1, not being translated. Which is to say, when you speak it's not translated to text for the AI to read, but is received as audio, and the AI voice response is not translated from text, but goes straight to an audio response. Let me know if I heard that wrong, but I think it's part of the reason it's faster.

    • @uncoverage
      @uncoverage  7 months ago +2

      this is totally true. the multi modality is baked into the model, which is sick. but functionally, the difference seems to mostly be in UX
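      (Editor's note: a hedged sketch of the difference described above, under the assumption that the older voice mode chained separate speech-to-text, text, and text-to-speech stages while GPT-4o handles audio natively; `transcribe`, `text_model`, `synthesize`, and `audio_model` are hypothetical stand-ins, not real API calls.)

      ```python
      # Cascaded pipeline: three hops, each adding latency and discarding
      # prosody (tone, pauses, interruptions) along the way.
      def cascaded_reply(audio_in: bytes, transcribe, text_model, synthesize) -> bytes:
          text = transcribe(audio_in)        # speech -> text
          reply_text = text_model(text)      # text -> text
          return synthesize(reply_text)      # text -> speech

      # Natively multimodal model: one hop, audio in and audio out, which is
      # why the same "UX" can feel so much faster and more expressive.
      def native_reply(audio_in: bytes, audio_model) -> bytes:
          return audio_model(audio_in)
      ```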

    • @davidpark761
      @davidpark761 7 months ago +2

      @@uncoverage the voice model that is instant, and the dramatically better image recognition and generation features, are not out immediately; they are being gradually introduced to users over the next few weeks
      which is fucking bullshit just fucking GIVE IT TO ME!!!!!

    • @uncoverage
      @uncoverage  7 months ago +2

      ha!! agreed all around - i want to see what it feels like to use!!

  • @BenjiManTV
    @BenjiManTV 7 months ago +1

    Seems to me most people are expecting so much from OpenAI that these updates are just slapping shinier paint on an OK-running vehicle. Just make the vehicle run really, really well and people will want to drive it. Make ChatGPT answer questions very accurately, and people will use it.

    • @uncoverage
      @uncoverage  7 months ago

      generally speaking, agreed. user interface is profound - but it also is only one part of the whole experience, and when it comes to LLMs, is far from the part that needs the most work. as for accuracy, I think that layer exists below the LLM at the moment, and largely is a function of underlying technologies up until the point that AI can reason

  • @georgeyoung8721
    @georgeyoung8721 7 months ago +1

    I think the interface should be considered an important part of the technology. Increasing an AI's understanding of its surroundings could, I think, be a way towards much more intelligent models. Scary to think about all of the video and audio data OpenAI can collect from these new free models for training.

    • @uncoverage
      @uncoverage  7 months ago +1

      completely agree! I have a whole playlist featuring my videos where interface plays a major role in the story: ua-cam.com/video/tRTGR90eaas/v-deo.html
      as far as privacy goes, completely agree. i'm very curious about how Apple will spin their privacy angle, especially since it seems like they've partnered with OpenAI

  • @BingiQuinn
    @BingiQuinn 7 months ago +2

    I honestly like the way you don't face the camera the whole time - besides the content, of course.

    • @uncoverage
      @uncoverage  7 months ago

      glad you like the style! let me know what you think of my other stuff too!

  • @flor.7797
    @flor.7797 7 months ago +2

    Who’s publishing content if Google doesn’t send traffic to publishers anymore 😅

    • @uncoverage
      @uncoverage  7 months ago +1

      it’s kind of astounding that we’re on the verge of a major collapse of the web as we know it after this many years.

  • @MekonenMeteor123
    @MekonenMeteor123 7 months ago

    Nailed it!

    • @uncoverage
      @uncoverage  6 months ago

      thanks for watching!

  • @helloimedden
    @helloimedden 7 months ago

    4:20 ohh I got a shout-out for my last comment and it's right at 4:20 minutes in, nice 😂

    • @uncoverage
      @uncoverage  7 months ago

      Ha! I'm glad you noticed it!! I'm so sorry I couldn't remember your username in the moment 😭

  • @heavenrvne888
    @heavenrvne888 7 months ago +1

    yes!!

  • @DivinesLegacy
    @DivinesLegacy 7 months ago

    I didn’t order a yappuccino bro

    • @uncoverage
      @uncoverage  7 months ago

      what *did* you order 🤨

  • @zidane3250
    @zidane3250 7 months ago +1

    I don’t agree with you on the translation point. It’s bad, really bad.

    • @7200darkcharm
      @7200darkcharm 7 months ago +1

      What languages are you speaking?

    • @uncoverage
      @uncoverage  7 months ago

      I'm curious to hear more! also, I define "translation" broadly here, to the point that I almost considered subsuming "summarization" under my definition of "translation". as an example, I might ask an LLM to read a PDF for me and give it back in the style of a Shakespearean sonnet - a form of translation from one "form" of text to another.

    • @zidane3250
      @zidane3250 7 months ago +1

      @@7200darkcharm
      I’ve tested it on 60 pages of English to Arabic and some English to French texts. While the French-English translations were a bit better because of their shared Latin roots, overall the results were just mediocre.

    • @zidane3250
      @zidane3250 7 months ago

      @@uncoverage AI language models are great at tasks within the same language, like summarizing. However, they struggle when switching from one language to another, probably because of the differences in language structures and logic.

    • @uncoverage
      @uncoverage  7 months ago

      kind of makes me wonder if my broad definition of translation should be narrowed to "English-to-English" translation?