“LLAMA2 supercharged with vision & hearing?!” | Multimodal 101 tutorial
- Published 9 Jun 2024
- Explore multimodal language models like LLaVA, which let you reach GPT-4-level multimodal abilities and unlock use cases like chatting with images
🔗 Links
- Follow me on twitter: / jasonzhou1993
- Join my AI email list: www.ai-jason.com/
- My discord: / discord
- LLaVA link: llava-vl.github.io/
⏱️ Timestamps
0:00 Intro
1:03 What is multimodal?
1:23 LLaVA model
2:08 Demo
3:35 Use case: Product development
5:17 Use case: Content curation
6:27 Use case: Medical
7:07 Use case: Captcha
8:09 Use case: Robots
👋🏻 About Me
My name is Jason Zhou, a product designer who shares interesting AI experiments & products. Email me if you need help building AI apps! ask@ai-jason.com
#gpt #autogpt #ai #artificialintelligence #tutorial #stepbystep #openai #llm #largelanguagemodels #largelanguagemodel #chatgpt #multimodality #gpt4 #multimodal #llama2 #llama #llava #machinelearning - Science & Technology
Jason, your videos are next level!! Loved the agent that you made for research. I made a similar one using your video and I've been using it to research my work, and it's pretty awesome! Saved me tons of time already!!!
Great videos dude! Love the content and how compressed the info is!
again, great video. Thank you Jason.
This was a great video. Thank you.
Thank you big man for such amazing videos, Thank you !
excellent content bro, keep up the good work
One of the best channels right now
Thank you AI Jason for sharing valuable AI developments. Would love to see in the future how to train the model on our own photos. Nice..
Yup, I really want to know how to train and fine tune it as well ...
i need more content from this channel!
Absolute banger dude your content is actually top tier
I concur 🤖
BEST AI Channel. Thank you Jason.
Great video as always
Woahh, this is probably the best multimodal model I've tried, definitely opens up lots of imagination!
I love it ! Thank you!!!!!!!!!!!!!!!
Legend 🙌🙌 super helpful
This channel will be huge
Great demo!
Nice introduction, thank you for your effort
Good stuff brother.
Thank you, AI Jason!
Best AI content on YouTube. Learned so much from you. Is it plausible to run this on a consumer-grade gaming machine with, for instance, an RTX 4090? Will you do an install/setup video?
awesome
Great video, the 13B multimodal models are doing amazingly well. Would love to see a video for the following use case: say I am an HR manager and have two job positions, JOB-A and JOB-B. Can an LLM filter job resumes based on the requirements of JOB-A and JOB-B with few-shot prompting or fine-tuning? It's a prediction task, much like sentiment analysis...
Like the example use cases. Indeed, it seems LLaVA is not that good at rich-text OCR. Definitely an area for improvement, but still promising. I would love a second episode on Fuyu-8B, or a tutorial on how to further fine-tune LLaVA for a specific use case. Thanks a lot for sharing!
You used non-square images with a crop option, so what it saw was cropped.
Ohhh good catch, I tried again and that definitely solved some of the issues!
LLaVA has been out there a long time already. Great that it's not dead and they added support for Llama 2.
Yea, the results are much better after adding Llama 2 support!
@@AIJasonZ I haven't tested it yet. Is it able to understand/describe images better?
@@AIJasonZ Was it able to describe something that Llama v1 couldn't?
Ok this is crazy! So now you can add more context. It's like us using our 5 senses to interpret information. But this part here @3:42... if it becomes possible for it to build full-stack apps easily, say goodbye to junior developers. At that point anyone can sketch an app with the entire workflow, show the image to the A.I. along with a description like "Build this app you see with React on the front end and Node.js/Express for the backend, create the APIs and connect them to the front end." GAME OVER!!!
It's my understanding that Palm 2 is hooked to Bard. Gemini is the future. Google has to figure out how to mesh Gemini into Palm 2 and Palm 2 into Gemini. Gemini has all the new multimodal features that Palm 2 I assume will pick up if they can learn how to sync it.
Do you think the choice of vector database matters for storing this multimodal data? For example, does Weaviate vs. Chroma offer certain features that might make one optimal for these multimodal vectors?
lol try it.
I wonder how these multimodal models will affect robotics and self-driving.
Is there any Python API for this? I want to use it for image recognition.
Anyone know how to fine-tune it on a custom dataset?
I wonder why it failed the captcha? There’s already AI out there that can crack the captcha easily.
fucker i still fail captchas
This is insane... because it's just a first experimental version at a mere 13B parameters... and it can identify a pretty convoluted colourless picture and make a story out of it... not to mention correctly rate a picture on an arbitrary score and identify what app you're gunning for without being told the kind of app... The future looks pretty scary...
Not a dermatologist, but the foot looks like pityriasis rosea. 🤔
Hahah this is above my level 😝
I bet Google has had and used them all for years, and has a profile of every Android user in its database.
6:45 Well, the completion doesn't mention that it is no expert in the medical domain and that you should see a doctor.
LOL.. In the Elon Musk photo he was smoking a blunt not a cigar. There is a big difference.
it's not a tutorial
First
Can you teach us how to use this model locally 🫠