Gemini 1.5 seems like a truly gigantic leap in LLMs... probably the first time I've been wowed since the release of GPT-4.
On arXiv, this paper (posted around the same day as the Gemini update and OpenAI's Sora) might be one reason, or the reason, they moved when they did. It follows two others from these students last fall... it's just odd how major it is and how the timelines line up. It's limited by needing more compute, but the accuracy at such a huge context is what's astounding. [Since links won't post here, just search "world model on million length video ring attention".] 'Ring Attention' might be the RA in Sora.
This was awesome. Thank you Sir Sam.
Good job ❤ very exciting progress!
Sam - Great video! More Google content, please. New features made Gemini useful in my workflows.
I will probably make a few more about this and possibly some new stuff from Google
Excellent - I work for a pharmaceutical company, and a longer context window plus high data-retrieval accuracy is exactly what I need. What is your opinion on new RAG systems built on these new models? @samwitteveenai
I still think RAG is relevant for most uses for now, but this majorly unlocks things that couldn't be done before without many calls to the LLM and patterns like MapReduce. I think models like this one unlock a lot for agents, which I would like to show at some point. I can totally see how this kind of model can help serious work like you are doing, much more than just chat-with-a-bot stuff.
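To make the contrast concrete, here's a rough sketch (my own illustration, not anything from the video) of the old map-reduce summarization pattern next to a single long-context call; `call_llm` is a hypothetical helper standing in for whichever model client you use.

```python
# Rough sketch: map-reduce summarization vs. one long-context call.
# `call_llm` is a hypothetical helper that sends a prompt to some LLM
# and returns its reply as a string.

def map_reduce_summary(chunks, call_llm):
    # "Map": summarize each chunk with its own LLM call.
    partial = [call_llm(f"Summarize this section:\n{chunk}") for chunk in chunks]
    # "Reduce": merge the partial summaries with one more call.
    return call_llm("Combine these summaries into one:\n" + "\n".join(partial))

def long_context_summary(document, call_llm):
    # With a ~1M-token window the whole document can go in a single call.
    return call_llm(f"Summarize this document:\n{document}")
```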
Sure, I'll still be using RAG. Having a longer context window gives me the chance to be more strategic and flexible with how I organize content chunks. This means I won't just be breaking things up every 1000 tokens without any thought. :) @samwitteveenai
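For example, here is a minimal sketch of what "more strategic" chunking could look like, assuming plain-text documents whose section headings start with "#"; the boundary rule would depend on your own data.

```python
# Minimal sketch: chunk along document structure instead of a blind
# fixed-size split. Assumes plain-text docs whose section headings
# start with "#"; adapt the boundary rule to your own documents.

def chunk_by_section(text: str, max_chars: int = 8000) -> list[str]:
    chunks, current, size = [], [], 0
    for line in text.splitlines():
        new_section = line.startswith("#")
        if current and (new_section or size > max_chars):
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```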
I hope they give access to this model soon :/
When will it be released to the public?
Great video and model ( not seen yet)
Please, anything on data analysis tasks? (CSV, XLS, ...)
I am planning to do one on code so let me try and put it in there
Very interesting and informative! I am wondering how this would work for literature-review-type workflows. Say you choose a technical topic like text-to-video and upload 5-10 key relevant papers (like the ones HF summarized after the Sora release): how well would the model perform at synthesizing the papers? An even crazier task would be to add an existing literature review paper as an example, so it becomes 5-10 papers with a one-shot prompt. If the model can reason through this, the implications would be huge.
I have played with it with single papers and found it to be very interesting. I might give it a shot with a group of connected papers, interesting idea.
Awesome video! thanks for being the hero we needed! Keep going forward and enjoy Singapore!
I wonder if it could be good at coding / making coding agents for people who don't know code at all.
I think you really need to understand code to know when LLMs are doing simple, dumb things, but that level of understanding doesn't need to be super deep. Learning the basics of coding is still a very good skill to have, and it also improves how you think about these things.
Could you try a narrative video? This would be really useful to understand the model's capacity to understand semantics of juxtaposed images.
I'm a film editor. I think you could choose a classic narrative short video from YouTube (with cleared rights) and try different levels of questioning (high-level narrative comprehension, emotion, a character's emotional arc, etc.).
Why blur the release date of the video? It was the 16th of Feb, if you're wondering.
Certainly not intentional, my guess is the editor was blurring my email and that blur stayed on the screen.
Can you do a video with audio summarization? Feed it a large audio file and ask for a per-timestamp summary?
The current release doesn't support audio yet, but you can do the timestamp summaries based on a whisper transcript etc.
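A rough sketch of that workaround: transcribe with Whisper, bucket the segments into fixed time windows, then summarize each window. This assumes the openai-whisper package; `summarize` is a hypothetical helper for whichever LLM you call.

```python
# Rough sketch: Whisper gives start/end times per segment, which you can
# bucket into windows and summarize. `summarize` is a hypothetical LLM helper.
import whisper

def timestamped_summary(audio_path, summarize, window_s=300):
    model = whisper.load_model("base")
    segments = model.transcribe(audio_path)["segments"]  # each has start/end/text
    buckets = {}
    for seg in segments:
        window_start = int(seg["start"] // window_s) * window_s
        buckets.setdefault(window_start, []).append(seg["text"])
    return {f"{start // 60:02d}:{start % 60:02d}": summarize(" ".join(texts))
            for start, texts in sorted(buckets.items())}
```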
@samwitteveenai Which AI is best at converting audio? Can it be implemented or merged? The two together would be amazing. Thanks
Can you please do a video on Gemini 1.5 pro reading an entire college level science textbook?? That would be so awesome!
I applied for the beta.
How long till I get access?
I got access in 1 week, and I am just a normal developer. Maybe scientists get access really fast.
Informative vid. Thanks
Legal doc review basically fully automated at this point.
Can you upload a storybook or novel and ask about the characterization in some new book that may have just been released? So excited about this. Can't wait to try it out.
happy to try, but need to find something that is a new book. Any suggestions?
Any sense of whether it could understand a video with no captions or words spoken in the video? Like maybe 30 seconds of a stream in a snowstorm?
Can you share your Gemini Chat like ChatGPT allows you to ❓
What were the costs to process all that video multiple times ❓
Keep up the good work 👍
This is in Google AI Studio, not the consumer interface. In gemini.google.com you can do all of those. The 1.5 models will come there over time.
I am making an automated video editor using GPT Vision and another speech-to-text API. It does work this way, but I would like to see what Gemini can do!
Can you please test whether Gemini 1.5 can act as a professional video editor and output timestamps for where to place zoom in/out effects, emojis, or sound effects?
Nice! Could you try something harder, say: show it a security-camera video of a break-in and ask it to describe what happens in the video (without mentioning the break-in).
Got a link to footage like that?
Hi Sam, do you have to upload the video manually every time?
No, once you upload it you can query it many times for that session. It may lose it from session to session, though. I think they are looking at how best to handle this for the UI and API going forward.
How long of a response can you get out of it? Could it describe a full video like a normal human does, and if so, how long of a video? Will it ever be able to work with audio and video at once?
Thanks for this video
Why can't I access Gemini 1.5, even though I'm using Gemini Advanced? Is it not released publicly, or only in some countries?
It is not in Gemini Advanced yet, not sure if it will come to that or when.
You could've asked Gemini to return the timestamp with its responses, so that you could then verify whether it was said around the timestamp it returned. That way you'd actually have a higher likelihood of checking whether it was really said in the video.
This video was made almost a year ago. Back then, timestamps weren't working as well on that version of Gemini. If you look at my recent videos, I made one on a Gemini Flash 2.0 video analyzer where I did exactly what you're talking about and got timestamps back.
@@samwitteveenai Ahh okay nice!
When will the waitlist be approved 😢😢😢
I think they have started to approve some people for the waitlist as of yesterday
Sam, how can I get Gemini to learn game strategy from video + sound in a tennis game?
currently the audio version is not out but hopefully soon.
@samwitteveenai How can I best DM you for advice and help on my project?
Not taking audio in is bizarre. any ideas why not?
This video is quite old now. The current version should be able to handle audio.
Isn't there a way to play with it on their Vertex AI platform?
not yet but I think it is coming.
Sam is on a like train right now
Oriol said that they are working on improving the speed
I think Oriol and his team will improve a number of things about this. Don't forget this one is just the Pro.
In the future, will the Gemini 1.5 model with 1 million tokens be available for free?
The applications for the general public are huge. One that comes to mind is police officers having to analyze hours of CCTV footage.
Yes there are lots of security applications that I suspect Google isn't too keen on talking about.
Hi, what's the pricing of this API?
not publicly announced yet sorry.
@samwitteveenai So what are you paying for when you use the model? Is it free at the moment?
@hashiromer7668 I'd also like to know what price you paid for this demo, or is that classified?
ooh man things would be so much different
Is there an API for Gemini 1.5?
It looks like once you are able to access 1.5 in AI Studio, you can also query it through the API.
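A minimal sketch of what that can look like with the Python SDK once your key has access; the model name string ("gemini-1.5-pro-latest" here) is an assumption, so check the name AI Studio shows for your account.

```python
# Minimal sketch of calling Gemini 1.5 Pro through the Python SDK.
# The model name string is an assumption; use the one AI Studio lists.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content("Summarize the key ideas of ring attention.")
print(response.text)
```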
Man, can you imagine how long beam search would take with this?
First reaction: yeah, agree. Second: maybe not as long as you think; it depends on how they are doing the attention, etc.
The video icon is dimmed in my Google AI Studio.
Yeah it currently only works on the 1.5 models
They still haven't given access to 99% of people; it feels like they gave it only to paid promoters 😔
In my experience, Gemini models tend to hallucinate info not contained in the prompt much more often than other models; I'd do more tests on this. In most of the tests you're showing, it's very hard to tell whether Gemini is actually retrieving correct info or just guessing. Since LLMs are very good at lying, one has to be very careful about what you ask a model: I'd never trust any of the results shown in this video, because the risk of the model making stuff up is too high.
Gemini 1.5 Pro has nearly perfect recall.
Great video. However, the video that was uploaded was probably not the one that would best demonstrate its potential. Frame-by-frame analysis using conventional entity extraction could have yielded similar results, since the context is written as text on the slides. Something like sports analytics, where motion is what's being tested, might have been a bigger stretch.
I agree that sports or some other form of action would show a different kind of analysis: action identification, understanding motion, etc. I still think having it make a set of notes from an hour's worth of slides is a pretty impressive feat. Will look into some ideas for sports/action too, though.
It would be interesting to see if it can figure out cause and effect. Or object permanence.
Let it read the Zig documentation and ask it data structures and algorithms questions.
If it can learn a language from a grammar book, it should be able to solve DS/algo LeetCode problems in Zig.
Yeah, this will definitely change how videos are consumed and how video essays are planned out, idk.
Give it some downloaded viral TikTok video that has some stoic narration and have it generate similar text in the same style and length, at least 3 different texts.
Upload one of your coding tutorial videos and ask it to extract the code and explain it; that will be a true test of intelligence.
Already been done before on other videos. Do some searching and you will see.
Upload a public domain novel and ask it to write a new chapter, prologue, or epilogue.
When people say that RAG is not dead, that’s like Bill Gates saying “640K ought to be enough for anybody” back in 1981.
Just because inference time over 1M tokens is 1 minute now, why assume that from now on until the end of the Universe, inference times across 1M tokens will remain constant at 1 minute? Since when did things ever stagnate like that in digital technology?
It’s kind of baffling to me.
For most big companies, 10M tokens is a drop in the ocean compared to the data they need to RAG over. RAG will still be around for the foreseeable future for serious applications.
@@samwitteveenai I guess let’s agree to disagree on that one then. 😄
Thanks for another really awesome demo video! I really appreciate it. 🙌🏻
Agree to disagree, but I really appreciate you chiming in. I don't want this place to just be people who agree. I do totally agree with you that pricing will go down and speeds will get faster over time.
I agree that speeds will improve, but there will always be a need for more speed and capability, and using these two approaches together will achieve that. Even though we have bigger storage needs than we thought and cheaper storage than we ever imagined, we still compress, and we still distribute storage.
@@dusanbosnjakovic6588 Yeah, probably there will be some kind of semantic router in most AI apps judging which kind of retrieval will make the most sense for each particular query.
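Something like that router can be as simple as embedding the query and comparing it to an exemplar description of each retrieval strategy; a minimal sketch, with `embed` standing in for whatever hypothetical embedding function you'd use.

```python
# Minimal sketch of a semantic router: embed the query, compare it to an
# exemplar description of each retrieval strategy, pick the closest one.
# `embed` is a hypothetical function that returns a vector for a string.
import numpy as np

ROUTES = {
    "rag":          "find a specific fact somewhere in a large document store",
    "long_context": "reason over one whole document from start to finish",
}

def cosine(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def route(query: str, embed) -> str:
    q = embed(query)
    return max(ROUTES, key=lambda name: cosine(q, embed(ROUTES[name])))
```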
First like
I subscribed after the Gemini 1.5 Pro video.
I have one doubt: what is the output token length of Gemini 1.5 Pro?
not yet public but I talk about it in this vid
Is 1.5 as oppressively woke as the public 1.0?