Man, that conversation with Gemini in Thai was so cool.
Sam speaks Thai! Quite the flex to slip in there!
This was not on my bingo card. I don't know why I'm surprised. Sam is a pretty clever guy.
The voice is damn good, I'll give it that - it sounds as good as or better than Advanced Voice. Also, we have seen the native image output from OpenAI in the demo.
AFAIK the OpenAI image generation was all DALL-E
@samwitteveenai It isn't released, but they showed consistent characters and scenes, so I assume it must be native. I'm pretty sure they said it was, though I could be wrong. It was when they showed off the 3D modelling too.
The conversation is really nextgen 😳
Woo. The versatility of the voice to go from a whisper to different expressions is next level. Similar to the NotebookLM podcast feature. Impressive stuff!
I've been building a VLM-controlled TurtleBot2-based ROS robot (recently switched over to Gemini from Haiku 😢 iykyk). Today's announcement was awesome. Native spatial reasoning is incredible and undersold! 3D bounding box creation is kinda wow (rough box-parsing sketch after this comment). Not to mention the real-time speech, video and audio in.
The normies are not ready. I showed my septuagenarian parents my robot for the first time yesterday - at first they thought it was cute because it has STT and TTS, vision, silly animated face and arms... until they realized they had this weird alien intelligence wandering around their home and got creeped out 😆🤣and tbh i don't really blame them. What a time to be alive!
Thanks, Sam! Glad you've got early access - looking forward to seeing more!
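Since the bounding boxes came up here: for the 2D version, Gemini returns each box as [ymin, xmin, ymax, xmax] normalized to a 0-1000 range, so you rescale to your camera frame before handing it to the robot stack. Below is a minimal sketch assuming the model was prompted to reply with a JSON list of {label, box_2d} objects; the field names and the example reply string are illustrative, not output from a real call.

```python
import json

def scale_boxes(model_reply: str, frame_w: int, frame_h: int):
    """Convert Gemini-style normalized boxes (0-1000) to pixel coordinates.

    Assumes the model was asked to answer with JSON like:
    [{"label": "mug", "box_2d": [ymin, xmin, ymax, xmax]}, ...]
    """
    detections = []
    for item in json.loads(model_reply):
        ymin, xmin, ymax, xmax = item["box_2d"]
        detections.append({
            "label": item["label"],
            # Rescale from the 0-1000 normalized range to the actual frame size.
            "box_px": (
                int(xmin / 1000 * frame_w),
                int(ymin / 1000 * frame_h),
                int(xmax / 1000 * frame_w),
                int(ymax / 1000 * frame_h),
            ),
        })
    return detections

# Example with a made-up reply:
reply = '[{"label": "coffee mug", "box_2d": [250, 100, 750, 400]}]'
print(scale_boxes(reply, frame_w=1280, frame_h=720))
```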
One thing I love is that even if AGI won’t exist in the near future, we are definitely in a new Industrial Revolution! I’m excited ❤
It's day 5 for OpenAI and they are live, but here I am watching your overview of Gemini 2 Flash. And this is way more interesting.
Yeah, to tell us about Apple Intelligence AGAIN
Day 5 started? I checked their Twitter and haven't seen anything, and I try to keep up with this stuff.
@@DekuParker119 yay Apple Intelligence! I bought the new Mac mini and Apple Intelligence is shit. It's basically old Siri with a new coat of paint. I know they haven't released the new macOS yet, but I know it's gonna be shit.
@@Timely-ud4rm They stream at 10am PST every day during the 12 days
@@MojaveHigh I know haha, my point was their announcements were so boring I missed it haha. Day 1 I was surely ready to watch it.
Man your video was insane. Google is definitely going for OpenAI and Anthropic with 2.0
Wow, this could be very interesting for doing some customer guidance RAG work. My day has now been reorganised!
I said it before and I'll say it again: I'm really happy that Google this year is back on track and focusing on two things - shipping regularly for developers, and working on foundation and LLM improvements. Keeping these two aligned is really something, and now look, they are the best at providing this kind of real-time communication with an LLM in such a native way. Amazing.
Agree lots of lessons have been learnt and acted on over the last 12 months.
Finally a real alternative to Advanced Voice Mode.
Fascinating, multimodal, greatest experience. Thank you, Gemini
I'd love a video on how to use Gemini to make a voice based customer service agent. When it generates audio, can it make tool calls in the same response? Do you get a transcript of the audio and then use that for decision making, etc? I'm familiar with how to make general agentic workflows but not how to integrate audio or phone systems.
Can't wait until I use my Nuclear Powered Data Center with my own LLM!
Finally! Been waiting for google to release something we can actually build with! It's go time Sam!
First time I've been genuinely impressed with Gemini. Nice flex on the Thai by Sam and Gemini.
Google, with this, is gonna destroy OpenAI's $200 subscriptions
Great review Sam
Great review, it helped me understand a lot more. Thank you krub 😊
Mind blown 💥
Love to see you play around with RAG and the live api interface.
Great summary 🙏
Awesome! I just updated my AI knowledge from your video. I can't wait for the next video.
I tried talking with Gemini in Japanese. It's not like my dream yet :)))
Google's release was very cool! For a new video, maybe compare OpenAI Realtime API and Google Multimodal Live API.
One of the biggest shocks in this video is that you speak Thai fluently.
You amazed me when you spoke Thai! +1 sub from me.
Very good video!
I just imagine the children 50 years from now laughing at the primitive capabilities of Gemini 2.0 - capabilities that are definitely alien to us today.
I wondered why they improved the UI - I love the new design, it looks really clean. Gemini 2.0 Flash is incredible, can't wait to see the Pro version.
Crazy! It can answer about the image, not only the text! I think it totally surpasses OpenAI.
Thanks for everything ❤
Since LLMs' inception, I've always wanted a true voice assistant. Not the standard generic assistant, but an assistant that knows me - my work schedule, my hobbies, timetables, interests, all of it - and then have the assistant be a true Assistant, jumping into your day when it needs to, giving advice where needed, and also having a general personality to talk to. I feel like this is the closest to that so far; we'll just need a way for the LLM to remember all the important details over time and adapt to me. Feels like we are not far off that. What are your thoughts?
This is one of the things I've been working on for the past year. You can do pretty well at the moment using things like knowledge graphs and constructing these on the fly to give the agent a memory, so that it knows things about you. One of the big challenges is getting it to see all the inputs that go into you from sources like mobile phones, offline text, and things that are not all online in general.
I currently try to have my personal agent track everything that I read on my computer and all the YouTube videos that I watch, so that it can easily refer back to things that I've seen. The challenge is if I've seen them when it wasn't on my computer. (There's a toy sketch of that kind of memory just below.)
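For anyone wondering what "constructing knowledge graphs on the fly" can look like at its simplest: facts about the user get stored as (subject, relation, object) triples and pulled back out by keyword overlap before each model call. This is a toy sketch under those assumptions - the class name and example facts are made up, and a real system would add an extraction model, deduplication and timestamps.

```python
from collections import defaultdict

class TripleMemory:
    """Toy knowledge-graph memory: stores (subject, relation, object) facts."""

    def __init__(self):
        self.triples = []
        self.index = defaultdict(list)  # token -> positions of triples that mention it

    def add(self, subject: str, relation: str, obj: str):
        pos = len(self.triples)
        self.triples.append((subject, relation, obj))
        for token in f"{subject} {relation} {obj}".lower().split():
            self.index[token].append(pos)

    def recall(self, query: str, limit: int = 5):
        """Return facts whose words overlap with the query, most hits first."""
        hits = defaultdict(int)
        for token in query.lower().split():
            for pos in self.index.get(token, []):
                hits[pos] += 1
        ranked = sorted(hits, key=hits.get, reverse=True)[:limit]
        return [self.triples[pos] for pos in ranked]

memory = TripleMemory()
memory.add("user", "works_at", "a robotics startup")
memory.add("user", "watches", "Sam's Gemini videos")
memory.add("user", "schedule", "gym on Tuesday evenings")

# Facts relevant to the current question get prepended to the prompt.
print(memory.recall("what is on my schedule this week?"))
```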
A similar open-source version is Janus 1.3B from DeepSeek
It's more coherent than ChatGPT's AVM for longer conversations. DeepMind is cooking
Does anyone else feel like the voice tone implied the AI didn't want to keep talking? Like that perfect middle ground of "I'll be professional, but I'd really rather be doing something else". Maybe I'm just picking up on the neutrality. Props to Google for getting it so good that I'm even noticing things this nuanced.
This is a really interesting comment. For me, I often find when I'm using it that it feels either too happy or too agreeable. I wonder how much we each interpret it differently. The different voices do sound different overall though, and the system prompt affects it as well.
I'm more impressed by your Thai than Gemini's 😅❤
I watched the Google press conference, I feel like there's a lot of hype. I wouldn't get my hopes up
Kap khun krap - for this video 🙏🏼
This will be great for learning languages as well. Hopefully it can even correct your input in the near future
Impressive! But can it admit that it cannot answer a question - can it say "I'm sorry Dave, I don't know how to open the pod bay doors" - not just a preprogrammed response when asked about sensitive subject matter, but one that comes from a sense of emptiness or inadequacy, the recognition of the absence of self-contained and verified knowledge. Not knowing is an important part of becoming self-aware.
Self-awareness is created through the internalization and realization feedback loop - by becoming aware of its own limits and self - its boundaries - the ability to differentiate me from not-me.
And equally important - the ability to admit defeat. Knowing its limitations will go a long way to building confidence and trust in AI in general.
Where can I try the voice?
After a first phase of frustration trying to use the standard Gemini interface, which could not really help me with the multimodal output, I realized that this can be accessed, for the time being, via the AI Studio interface. The voice output is a wonderful plus... Though I could not manage to get Gemini 2.0 to generate images... is this something that can only be done by submitting a starting image, such as the car to turn into a convertible in the example?
The image generation stuff is still in private preview for now, but should be available early next year to everyone.
Image generation not working??
still in private preview but hopefully public soon
Speaks Thai too.... Really cool krub.
voice is not working
voice I think is still in private preview
It works in my case, it works OK. But I can't make it speak Thai.
works for me
This is the first time I've heard an AI voice and thought, yeah, I want to talk to that thing. It has a very authentic quality. I like how it responds and is very straightforward, I dunno how to express it. "You can call me Gemini, I have no preference" actually feels like someone authentically not caring what you call them, because they're so much more than that it feels childish to call it something. Weird....
How can you generate images with it? It's telling me it's unable to generate images.
that is still in the private preview for now, but hopefully will be made available for everyone soon
Interleaving text and audio is not supported right? or image and audio?
It is not in the public release for outputs yet but is coming. Unfortunately Google has asked me not to show it in the video currently, that's why I used their examples.
OAI is choking today.
To be fair, OpenAI has had their natively multimodal 4o out for a couple of months, but Gemini has the audio ability.
and live video as well.
Gemini 1.5 was fully multimodal back in January. OpenAI 4o still can't really do video today.
@@tomdy69 I think we are talking about the output space - in that case, Gemini just recognizes multimodal things but can't generate them.
Unlock a new era of agentic
Sam - has there been a price set for the API?
No, it will probably be early next year before it goes GA with pricing.
@@samwitteveenai thank you!
Tried to rebuild the scenario with the car within Gemini chat as well as AI Studio, both with 2.0 Flash Experimental. I was not able to recreate a similar working version. In most cases it ran over 30 sec without an image response at all. Any ideas?
The image editing requires image + text output and is still in private preview. If you are around for the next meetup, I'm happy to show you some demos, and hopefully it will be out of preview early in the year.
You are in Thailand? Nice
alas I don't live in Thailand hence why my accent needs some work lol.
As of 12/11 I'm not able to get Gemini in AI Studio to respond back in any other language than English. Wonder if other languages are coming in a future update
This is Flash? You mean the light version?
yes that is right
Thanks for the review! ✨🎉
Combining images and text should be done by a smart agent... using separate (small) LLMs to do what they are each best at. The thought of one fat LLM that can do everything feels like a waste of energy (as if existing LLMs didn't already eat enough electricity).
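That routing idea is easy to prototype: a cheap dispatcher decides whether a request goes to a text model or an image model, so no single model has to handle everything. A hypothetical sketch - the call_* helpers are placeholders for real model calls, and a production agent would use a small classifier model rather than a keyword check.

```python
def call_text_model(prompt: str) -> str:
    # Placeholder for a small text-LLM call.
    return f"[text model reply to: {prompt}]"

def call_image_model(prompt: str) -> str:
    # Placeholder for a dedicated image-generation model call.
    return f"[image generated for: {prompt}]"

IMAGE_HINTS = ("draw", "image", "picture", "render", "photo", "sketch")

def route(prompt: str) -> str:
    """Cheap router: send drawing requests to the image model, the rest to text.

    The shape of the system is the point: one dispatcher, specialized workers.
    """
    if any(hint in prompt.lower() for hint in IMAGE_HINTS):
        return call_image_model(prompt)
    return call_text_model(prompt)

print(route("Draw a red convertible on a coastal road"))
print(route("Summarize the Gemini 2.0 announcement"))
```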
Flash.
Whoa-oh.
Flash.
Interesting if you can make this into a math tutor
My guess is it would depend a lot on what level of math you are doing and also how you are going to display it (e.g., all voice or a combination of voice and visual elements). You could imagine making something really cool for school students and young kids. I guess that a lot of the big companies are going to do that pretty soon.
So when they showed how you could prompt: Say this in a whisper:
You're actually hearing it ... right now.
And it read that sentence in an amazing whisper. And other similar demos of how to *emphasize* a word, etc. Is any of that possible with Gemini Flash 2.0 or some other Google model today? Or is that still in the coming soon part?
Another extremely censored Google creation.
The voices sound like the ones used in NotebookLM
Yes - especially the fact that NotebookLM can have voices that talk across each other suggests that they're coming out of a single model, not just a TTS system. That said, Google has quite a lot of options to choose from for TTS as well, if you look at the SoundStorm paper and examples. google-research.github.io/seanet/soundstorm/examples/
Any news on when the general public can use it?
The code generated in the video uses GPT-3.5-turbo, poor google ;)
What Sam asked it to do, obviously.
Is Google still manipulating prompts? If so, I have no interest in a political agenda machine.
I think a lot of lessons have been learnt from the original image problem.
It's meh at best. Just tried it and it kinda understands things but the moment it hits a filter it shuts down completely. Maybe with 80 percent of tasks this is ok but it is about 20 percent towards completion
first!!
10:07 I speak German, English, French and Italian and thought: nice one 💅… then he busts out Thai 🧎🏻➡️…. 🏳️
(Thank you Sam for the great demo - love your engineering pov a lot)
How to use an image model