100% Local Tiny AI Vision Language Model (1.6B) - Very Impressive!!

  • Published 10 Jun 2024
  • Local Tiny AI Vision Language Model (1.6B) - Very Impressive!!
    To try everything Brilliant has to offer, free, for a full 30 days, visit: brilliant.org/AllAboutAI.
    The first 200 of you will get 20% off Brilliant’s annual premium subscription.
    👊 Become a member and get access to GitHub:
    / allaboutai
    🤖 AI Engineer Course:
    scrimba.com/?ref=allabtai
    Get a FREE 45+ ChatGPT Prompts PDF here:
    📧 Join the newsletter:
    www.allabtai.com/newsletter/
    🌐 My website:
    www.allabtai.com
    Moondream GH:
    github.com/vikhyat/moondream
    This video was sponsored by Brilliant
    00:00 Local Vision Model Intro
    00:28 Flowchart
    01:27 Brilliant.org
    02:36 Text Tests
    05:05 Video Tests
    10:15 Speech to Speech Tests
  • Science & Technology

COMMENTS • 141

  • @AllAboutAI · 4 months ago +2

    To try everything Brilliant has to offer, free, for a full 30 days, visit: brilliant.org/AllAboutAI. The first 200 of you will get 20% off Brilliant’s annual premium subscription.

    • @peter486 · 4 months ago

      is there no delay? or did you get the video?

    • @technolus5742 · 4 months ago

      The energy in the Brilliant ad lmao!
      I loved it. The change of tone really makes a distinction between the actual AllAboutAI content and the paid promo.
      Well done.

    • @bbrother92 · 2 months ago

      Please tell me how you connected the PNG viewer to the AI program.

  • @ItCanAlwaysGetWorse · 4 months ago +7

    @All About AI: Great video. Is there a description somewhere of the hardware requirements for running the model locally? Thanks!

  • @marcinkrupinski · 4 months ago

    Really good vid. Thank you!

  • @MacProUser99876 · 4 months ago

    Lovely video, mate!

  • @test12382 · 4 months ago +1

    Excellent use cases, sir. Can you please share the hardware requirements and a tutorial for setting it up?

  • @3xOGsavage · 4 months ago +3

    lol, you just made a Jarvis. idk how much you grinded, but it's absolutely worth it. Subbed for more content like this.

  • @NightSpyderTech · 4 months ago +16

    Loving all the speech integration, but I was struggling to get it set up from the initial video. It would be great to see a more detailed video on getting it set up, since it's a dependency of the following few videos.

    • @ClipsofCoolStuff · 4 months ago

      did you use ElevenLabs for the response voice or was it a local clone?

    • @AllAboutAI · 4 months ago +1

      It's the local OpenVoice one in this vid.

    • @AllAboutAI · 4 months ago +1

      noted! thnx for tuning in :)

    • @ccuny1 · 4 months ago +2

      @AllAboutAI Thank you. I would also love to see a fully integrated video for the whole workflow: this is so useful.

    • @duracell80 · 4 months ago

      For Python, I highly recommend virtual environments; maybe you could do something with WSL on Windows. On Linux it's very easy to script services that run in a venv, watch folders for changes, and add new files to a queue. I suspect that's where Microsoft is going with file metadata and enhanced search.
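The folder-watching idea from @duracell80 (watch a folder, queue new files for the vision model) can be sketched with only the Python standard library. This is a minimal polling sketch under stated assumptions, not the setup from the video: the function names, the image-suffix filter, and the `max_polls` test hook are hypothetical, and a long-running service would more likely use a file-system event library inside a venv.

```python
from __future__ import annotations

import queue
import time
from pathlib import Path

# Hypothetical filter: only these file types are handed to the vision model.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg")


def scan_new_files(folder: Path, seen: set[Path]) -> list[Path]:
    """Return image files in `folder` that have not been seen before, in name order."""
    new_files = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


def watch_folder(folder: Path, work_queue: queue.Queue, poll_seconds: float = 2.0,
                 max_polls: int | None = None) -> None:
    """Poll `folder` and put every newly appearing image on `work_queue`.

    Runs forever when max_polls is None; max_polls exists only so the loop can be tested.
    """
    seen: set[Path] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in scan_new_files(folder, seen):
            work_queue.put(path)  # a worker thread would pop these and call the model
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
```

A worker thread would take paths off the queue and send each image to the local model. Polling keeps the sketch portable across Linux, macOS, and WSL at the cost of a small delay before new files are noticed.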

  • @befikerbiresaw9788 · a month ago

    I love the ad's theme music. Killer.

  • @actorjohanmatsfredkarlsson2293 · 4 months ago +2

    I'm a bit confused: the video has the names in the frame. Is it these names that get picked up, or would it work without the text tags? I wish you had done a video without name labels.

  • @pablochacon7641 · 4 months ago +4

    Damn, this has got to be the best Brilliant ad I have ever seen in my life.

  • @mehmetbakideniz · a month ago

    Hi. I understand that you have created a repo for this video. Can you share the link to the repo so I can follow along with the code you show in the video?

  • @pierruno · 4 months ago +2

    How can I test this locally?

  • @mickelodiansurname9578 · 4 months ago +2

    concrete use case example... thanks

  • @johnflux1 · 3 months ago

    This is really awesome!
    In your code prompt, you misspell "description" with a B instead of a P.

  • @hikaroto2791 · 4 months ago +1

    Do you offer the scripts that you created to run this? Are they behind a paywall? How much is it? I have visited your blog, but the Discord is not accessible and the web page has only general info, with nowhere to "support" you and get access to the scripts.
    EDIT: Just support him by becoming a member inside YouTube. The rest will be there.

  • @kamilwysocki8850 · 3 months ago

    Hi man, I've been following you since 2023 and hope you're doing well. I've got a question. Could you please create a guide on how to use Node.js (NestJS), PostgreSQL (Prisma), Pinecone (vector database), and the OpenAI API to make an AI with memory? I literally bought a course that explains it, but the course stated "this is so simple to do that you don't need to know programming", and I ended up with a course where everything is shown, but without any explanation of how to configure and install Node.js (NestJS), PostgreSQL (Prisma), and Pinecone (vector database). You create such simple guides that even a non-native speaker like me can understand everything. I know I'm asking for a lot! Anyway, kind regards and thank you for everything.
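The "AI with memory" pattern asked about here usually comes down to: embed each memory, store the vectors, and at question time retrieve the most similar memories and prepend them to the prompt. The sketch below shows that loop in Python rather than Node.js, with loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, `MemoryStore` is a hypothetical in-memory replacement for Pinecone, and PostgreSQL/Prisma are left out entirely.

```python
from __future__ import annotations

import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class MemoryStore:
    """Hypothetical in-memory stand-in for a vector database such as Pinecone."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(store: MemoryStore, user_message: str, k: int = 2) -> str:
    """Prepend the k most similar stored memories to the user's message before calling the LLM."""
    context = "\n".join(f"- {m}" for m in store.search(user_message, k))
    return f"Relevant memories:\n{context}\n\nUser: {user_message}"
```

Swapping in real embeddings and a hosted vector database changes the storage and similarity backend, but not the retrieve-then-prompt flow.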

  • @themessengerofgood · 11 hours ago

    Awesome work! What do you think, is it possible to use it for photo culling? I have a friend who works as a photographer, and he takes around 2000 photos of 30-50 products per day for marketplace sellers. After each shooting day, he has to spend around 3-4 hours on photo culling (sorting photos by product type). I really want to help my friend, and I'm looking for a quick AI model that can analyze the photos and sort them automatically for him.
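For the photo-culling question, one approach is to let a vision model name the product in each photo and copy the file into a folder per label. This is a sketch under assumptions: `classify(path)` is a hypothetical callable wrapping whatever model you use (for example a local vision model like the one in the video, prompted with a closed list of the day's product names so its answers stay consistent), and the folder layout is illustrative.

```python
from __future__ import annotations

import re
import shutil
from pathlib import Path
from typing import Callable

PHOTO_SUFFIXES = (".jpg", ".jpeg", ".png")


def safe_folder_name(label: str) -> str:
    """Turn a free-text model answer into a folder name, e.g. 'Red Sneaker!' -> 'red_sneaker'."""
    cleaned = re.sub(r"[^a-z0-9]+", "_", label.strip().lower()).strip("_")
    return cleaned or "unsorted"


def sort_photos(src: Path, dst: Path, classify: Callable[[Path], str]) -> dict[str, int]:
    """Copy each photo in src into dst/<label>/ and return how many photos got each label.

    `classify` is a hypothetical hook around a vision model. Photos are copied,
    not moved, so the originals stay untouched.
    """
    counts: dict[str, int] = {}
    for photo in sorted(src.iterdir()):
        if not (photo.is_file() and photo.suffix.lower() in PHOTO_SUFFIXES):
            continue
        label = safe_folder_name(classify(photo))
        target_dir = dst / label
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(photo, target_dir / photo.name)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Copying instead of moving lets the photographer check the result before deleting anything, and the returned counts make it easy to spot a product the model never recognized.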

  • @duracell80 · 4 months ago +1

    Very cool, a similar level to LLaVA; can't wait to see this in Ollama. Any recommendations for CUDA? I tried it on Linux with 2GB VRAM/8GB RAM and it was a no-go. It seemed to segfault as soon as it hit 7.7GB.
    Clock speed, CPU, and a minimum of 16GB of fast RAM would be the considerations for consumer usage, but if something can run on an 8GB laptop, that's going to bring in a lot more users.

  • @gabscar1 · 4 months ago +1

    Very cool!

  • @thewatersavior · 4 months ago +2

    Can you drop a Colab in the description? Also, great example; I'd love to see this go viral.

  • @nhtna4706 · 4 months ago +7

    When you say local, with 1.6B parameters, what size would you need on your local laptop, along with the memory/GPU, etc.?

    • @wurstelei1356 · 4 months ago +4

      For Mistral 7B it is 7 × 4 = 28 GB at 32-bit and 14 GB at half precision. For Moondream it is 1.6 × 4 = 6.4 GB at 32-bit and 3.2 GB at half precision. Add these together for a rough lower bound on the memory requirements (that covers the weights alone; activations and context add more). You could also split it up, say, run Moondream on your GPU and Mistral on your CPU. Or you could shrink them down to 4-bit or even lower, but the lower you go, the worse the models will perform.

    • @nhtna4706 · 4 months ago +1

      @@wurstelei1356 Cool, I'm assuming you're talking about memory, correct? Is there any sizing doc that covers CPU and GPU power, processor speed, etc., along with SSD size, for running these models locally for pre-training purposes?

    • @AllAboutAI · 4 months ago

      thnx :)
      @wurstelei1356

    • @jawadmansoor6064 · 4 months ago +1

      Most models I see nowadays are 16-bit floats, so those with 1.3 to 1.5 billion parameters are around 2.5GB to 3GB, and this one would not be very different. Also, you can try looking it up on Hugging Face (if it's open source).

    • @brianlink391 · 4 months ago +1

      @nhtna4706, you seem a bit old school. I would suggest asking these questions to ChatGPT. It can even give you step-by-step instructions, and if you have the Plus version, as you should, it'll search the internet for the most recent information and walk you through the entire process step by step.
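The rule of thumb @wurstelei1356 uses above (parameters × bytes per parameter) fits in a tiny helper. This is a sketch of that arithmetic only: it counts the weights alone, and the 1.6B and 7B inputs are the rounded figures from the thread, not measured model sizes.

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Rough memory needed just to hold the weights, in decimal GB (1 GB = 1e9 bytes).

    Activations, the KV cache, and framework overhead come on top of this.
    """
    bytes_per_param = bits_per_param / 8
    # billions of parameters * bytes per parameter = billions of bytes = GB
    return params_billions * bytes_per_param


# Rounded figures from the thread: Moondream ~1.6B parameters, Mistral 7B ~7B.
for name, params in [("Moondream", 1.6), ("Mistral 7B", 7.0)]:
    for bits in (32, 16, 4):
        print(f"{name:<10} {bits:>2}-bit: {weight_memory_gb(params, bits):5.1f} GB")
```

With these inputs it reproduces the 6.4 GB / 3.2 GB (Moondream) and 28 GB / 14 GB (Mistral 7B) figures from the thread, and the 4-bit rows (0.8 GB and 3.5 GB) show why quantization matters for smaller machines.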

  • @scottt1234 · 2 months ago

    Are you able to release this on your Github?

  • @id104335409 · 3 months ago +1

    Please, can you tell me if there is AI computer vision/recognition software that can search through my images folder and find images? Example: searching for "cat" returns 56 images containing a cat; searching for "dog" returns 47 images containing a dog. Like Google Image Search for my local folder?
    I CANNOT FIND ANY WORKING SOFTWARE. Everything is "train your model".
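What @id104335409 describes (search a local folder for "cat" without training anything) can be approximated by captioning every image once with a vision model and then doing keyword search over the captions. In this sketch `describe(path)` is a hypothetical hook for the captioning model, and the index is a plain inverted index from words to image paths; it does simple word matching, so "cats" will not match "cat".

```python
from __future__ import annotations

import re
from collections import defaultdict
from pathlib import Path
from typing import Callable, Iterable


def build_index(paths: Iterable[Path], describe: Callable[[Path], str]) -> dict[str, set[Path]]:
    """Caption each image once (via the hypothetical `describe`) and map words to images."""
    index: dict[str, set[Path]] = defaultdict(set)
    for path in paths:
        for word in re.findall(r"[a-z]+", describe(path).lower()):
            index[word].add(path)
    return dict(index)


def search(index: dict[str, set[Path]], query: str) -> list[Path]:
    """Return images whose caption contains every word of the query, sorted by path."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```

Captioning is the slow step, so the index would be built once (or updated as new files appear) and saved, while searches only touch the in-memory dictionary.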

  • @theoriginalrecycler · 4 months ago

    Excellent

  • @borisrusev9474 · 4 months ago +3

    Can you make a comparison video between this model and LLaVA?

    • @DevPythonUnity · 4 months ago +1

      This is much better than LLaVA.

    • @duracell80 · 4 months ago

      I'd switch to this in Ollama when it becomes available; it's much more seamless dev-wise to call it with a one-liner. I would hope inference is faster on CPU compared to LLaVA?

  • @MattJonesYT · 4 months ago +6

    One of the major untapped uses of GPT-4 Vision is OCR. It does far better than Tesseract, which always outputs very dirty results that have to be cleaned up. You can say "Write all of the text in this image. Perfectly preserve all of the formatting including bolding, italics and lists" and GPT-4 Vision does as good a job as a human. This is very useful for books with strange text layouts: GPT-4 Vision can figure out how to correctly convert layouts that Tesseract always fails on. I would really like to see how these new vision models can be used for OCR.

    • @mbrochh82 · 4 months ago +4

      Yeah, but the MAIN use case for OCR would be scanning handwritten journals, and nobody wants to send their intimate thoughts to OpenAI... I'm so eagerly waiting for a GPT-4-level open-source LLM that we can run locally and that can finally read my shitty handwriting...

    • @fontende · 4 months ago +1

      It's not. Several lawyers and journalists have already ruined their careers by blindly relying on LLM text output. I think you haven't seen the "AI Explained" tests: even GPT-4 Vision's hallucination and error rates are so high that it fails to read even one number from a statistics table.
      There's even a research paper finding that using LLMs in business is not practical: you must hire a human editor to check everything (references), so you basically spend more time than if you had just written the text yourself from the start.

    • @fontende · 4 months ago

      And that's not even touching on the "trickery" by models, which corporations put there so that no other New York Times can find out the dataset sources by directly asking the model. (If you instruct an AI to tell half-truths to protect yourself in court, it will do that for all results.) There are no models with legally clean datasets anywhere, and there have been no audits anywhere (like the ones people did for open encryption tools, for example).

    • @MattJonesYT · 4 months ago +2

      @@fontende Humans have an error rate too. If you have humans transcribe the text they will make mistakes. With AI it's very easy to have it do several attempts and iterations and see where it converges and that result will be much better than the first attempt you get from a human which will be thousands of times more expensive and take much longer.

    • @fontende · 4 months ago

      @@MattJonesYT Of course, that's how editing became a profession: for centuries, writers have given their first manuscript to an editor before printing. Fact-checking is a separate profession, very important for newspapers.
      In the case of the American lawyers whose careers were ruined by useless legal work made with ChatGPT, two additional people were needed: an editor checking text structure and typos (important for official court documents) and a fact-checker (ChatGPT made up a dozen nonsense court cases with references to them; you must manually check each one, and even if it exists, read it and process it, which is a lot of work). I don't see a practical use for any chatbot in any business unless the business is selling chatbots; they're incredible data leakers. Robots are a different story.
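@MattJonesYT's suggestion in this thread (run several OCR attempts and see where they converge) can be sketched as a naive line-by-line majority vote. This is a swapped-in illustration, not a method from the video: it assumes the attempts are already aligned line for line (real transcripts drift, so a sturdier version would align them first, for example with `difflib`), and the function name is hypothetical.

```python
from __future__ import annotations

from collections import Counter


def consensus_lines(attempts: list[str]) -> list[str]:
    """Line-by-line majority vote over several OCR transcriptions of the same page.

    Assumes the attempts are aligned line for line; on a tie, the variant seen first wins
    (Counter.most_common keeps first-encountered order for equal counts).
    """
    if not attempts:
        return []
    split = [attempt.splitlines() for attempt in attempts]
    n_lines = max(len(lines) for lines in split)
    result = []
    for i in range(n_lines):
        # Only attempts that actually have an i-th line get a vote for it.
        candidates = [lines[i] for lines in split if i < len(lines)]
        result.append(Counter(candidates).most_common(1)[0][0])
    return result
```

Voting only removes errors that vary between runs; a hallucination the model repeats in every attempt survives the vote, which is the kind of failure @fontende is worried about.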

  • @Graverman · 4 months ago +3

    scaling down deep learning is the way

  • @wurstelei1356 · 4 months ago

    I really want to give this video a second thumbs-up XD

  • @PrinceAnkrah · 4 months ago +1

    Is it open to the public, or is it private now?

  • @aoeu256 · 3 months ago

    I wanna use the description thingy and labeler to help me learn Chinese as a language-immersion tool, along with DeepL to search for videos/websites in Chinese while typing in English, transparent windows with WINDOWS TOP + Anki SRS, etc...

  • @chrisBruner · 4 months ago +3

    I've tried to replicate your system and am having problems. Could you maybe make a video starting from scratch? I'm on Linux. Also, look at Ollama as a better open-source way to run LLMs.

    • @AllAboutAI · 4 months ago +1

      Hey! Yeah, I do have some vids on this in my member section; might do a main-channel vid someday too.

  • @sh00ting5tar · 4 months ago

    The most important question: How did you get the Matrix code running on the TV?

  • @vaibhavmishra1100 · 3 months ago

    Can it answer questions in the image??

  • @thewatersavior · 4 months ago

    Hmm... will it run on an ESP32-CAM?

  • @ajayjasperj · 4 months ago

    Can anyone DM me the laptop requirements to run Mixtral 8x7B locally?

  • @altered.thought · 4 months ago +1

    Wow!!

  • @khaledalshammari857 · 4 months ago

    loved your work! can you share the source code?

  • @carldraper616 · 4 months ago

    Do you have a GitHub where we can look at the code more closely, please?

    • @carldraper616 · 4 months ago

      I see the requirement for membership now, I'll sign up :)

    • @scottt1234 · 2 months ago

      I just signed up. Where is the github repository?

  • @lokeshart3340 · 2 months ago

    Sir, can we stream our webcam to it and have it say what's in my hand?

  • @JOHN.Z999 · 4 months ago +1

    Woooowww 🤯👏👏👏😁

  • @lancemarchetti8673 · 4 months ago +1

    Brilliant

  • @maxziebell4013 · 4 months ago +2

    Wow. Does this also work on Apple silicon Macs?

    • @wurstelei1356 · 4 months ago +2

      New Macs are known to be pretty good at AI, though I don't have one, and a GPU is still better but more expensive. Matthew Berman has some nice videos about his Mac and local AI.

    • @duracell80 · 4 months ago

      Macs might run bigger models. These smaller ones bring in people with realistic lowest-common-denominator hardware. For example, I have LLaVA running on a mini PC from 2018; inference is terribly slow, but there are non-interactive use cases. On Macs you're going to be able to do much more than these small models.

  • @picklenickil · 4 months ago

    Are you Norwegian?

  • @ChrisM-tn3hx · 4 months ago

    Soon, Captcha will have to start asking questions that only humans would get WRONG...

  • @frankdearr2772 · 4 months ago

    👍

  • @FloodGold · 4 months ago

    "yeah" 31 times so yeah, haha
    My son says yeah a lot as well so yeah!

  • @cutterboard4144 · 3 months ago

    8:20 Can't you debug the AI to find out whether Bradley Cooper really was identified as "Casanova", and why?
    Rhetorical question, just to state the obvious.

  • @Piotr_Sikora · 4 months ago

    It would be nice to know how to fine-tune the model in another language.

  • @brianlink391 · 4 months ago +1

    As you're testing these models and creating these models, little do we know that we are models ourselves being tested and created.

  • @hgeldenhuys · 4 months ago

    You are Ironman

  • @SeattleShelby · 15 days ago

    Everyone knows a 10 pound cat would whoop a 25 pound dog. That’s just common sense.

  • @user-ik3jh7kr5n · 4 months ago

    Cats are actually prey to dogs; there's not much fighting happening here.

  • @DevPythonUnity · 4 months ago

    The vision model is quite good, but it has problems describing porn pictures.

  • @aiglobalX · 4 months ago

    Excuse me sir, you have a well-maintained channel, why did you steal my channel logo???? It's a shame...

    • @gaijinshacho · 4 months ago +2

      Your logo looks nothing like this channel logo. Are you feeling OK?

    • @elijahpavich1095 · 4 months ago +1

      They're literally identical? I think you need to see the doctor 😂 @@gaijinshacho

    • @AllAboutAI · 4 months ago

      what haha

  • @sherpya · 4 months ago +1

    another non open source model 😢

    • @matten_zero · 4 months ago

      Is it possible they forgot to add a license? I couldn't find mention in README or other files

    • @sherpya · 4 months ago

      @@matten_zero It's based on Microsoft Phi, which has a restrictive license; the problem is you can't use it commercially even if you're willing to pay.

    • @Greenthum6 · 4 months ago +1

      I guess it is open source, but not for commercial use

    • @sherpya · 4 months ago +1

      @@Greenthum6 so it's not open source, it's just source available

    • @sherpya · 4 months ago +1

      @@Greenthum6 The problem is that these local models are not intended for end users, but for developers who create chatbots or apps for end users. Since inference is costly, a developer cannot offer chatbots or apps for free, or at least not without commercial use such as running ads. So unfortunately these models are only for research, learning, or educational videos. Nothing wrong with that, and this one is interesting and unique, but, still talking about the base model Phi from Microsoft, do we really need a bunch of non-open-source and not even commercially available models? I also often see them presented as if they were open source (obviously not referring to this video).

  • @microfx · 4 months ago +29

    so you're running it on a "server"... why don't you start with telling us the required hardware minimum? Not watching any further tbh.

    • @fontende · 4 months ago +2

      I think it's impossible to run such an ensemble locally for now; speech recognition alone will add a serious delay. And I'm in the market for a new GPU: the most you can get is a 4080 with 16GB of memory for 1k, and that's not enough for serious LLMs.

    • @microfx · 4 months ago +7

      yeah, whatever. I expect a video to have this information right at the beginning. or in the description. @@fontende

    • @NimVim · 4 months ago +3

      @@fontende ?? The 4090 and 3090 both have 24 gigs. Lol

    • @fontende · 4 months ago

      @@NimVim And so what? Be glad if you can buy one; the 3090 is 3 years old and most were toasted during the mining boom. Good luck with that. LaL 🤏

    • @LeftBoot · 4 months ago

      🧠 Think 👨🏼‍💻 Shadow PC ⚡ in 🤩 VR 😉 ❇️

  • @olagunjujeremiah4003 · 4 months ago

    Why did you skip Ayo Edebiri… because she is black?

  • @inLofiLife · 4 months ago

    i'd love to see this working with camera and recognizing ppl (memGPT?)

  • @geomorillo · 4 months ago

    wtf, it's pretty quick, amazing