New Microsoft Vision Model has AMAZING TRICKS!!!

  • Published 18 Jun 2024
  • Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. Florence-2 can interpret simple text prompts to perform tasks like captioning, object detection, and segmentation. It leverages the FLD-5B dataset, containing 5.4 billion annotations across 126 million images, to master multi-task learning. The model's sequence-to-sequence architecture enables it to excel in both zero-shot and fine-tuned settings, proving to be a competitive vision foundation model.
    🔗 Links 🔗
    Live demo - huggingface.co/spaces/gokaygo...
    Download the model here - huggingface.co/microsoft/Flor...
    Google Colab Notebook
    colab.research.google.com/#fi...
    ❤️ If you want to support the channel ❤️
    Support here:
    Patreon - / 1littlecoder
    Ko-Fi - ko-fi.com/1littlecoder
    🧭 Follow me on 🧭
    Twitter - / 1littlecoder
    Linkedin - / amrrs
  • Science & Technology
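
The prompt-based approach in the description can be sketched roughly as follows. This is a minimal sketch, not the author's notebook: the task tokens are taken from the public Florence-2 model card rather than from the description, `build_prompt` and `run_task` are hypothetical helper names, and `model_id` is left for the caller to fill in from the download link above.

```python
from typing import Any, Dict, Optional

# Task tokens as published on the Florence-2 model card. They are an
# assumption here: the description above does not list them.
TASK_TOKENS: Dict[str, str] = {
    "caption": "<CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "referring_segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
}

# Tasks whose prompt is the task token followed by free text.
TASKS_WITH_TEXT = {"phrase_grounding", "referring_segmentation"}


def build_prompt(task: str, text: Optional[str] = None) -> str:
    """Return the prompt string for a task name (hypothetical helper)."""
    if task not in TASK_TOKENS:
        raise ValueError(f"unknown task: {task!r}")
    token = TASK_TOKENS[task]
    if task in TASKS_WITH_TEXT:
        if not text:
            raise ValueError(f"task {task!r} needs extra text")
        return token + text
    if text:
        raise ValueError(f"task {task!r} takes no extra text")
    return token


def run_task(model_id: str, image: Any, task: str, text: Optional[str] = None) -> Any:
    """Run one task on a PIL image. Needs `transformers` and `torch` installed.

    model_id is deliberately left to the caller (see the download link above).
    """
    # Imported lazily so build_prompt works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    prompt = build_prompt(task, text)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # Convert the generated token string into boxes, labels, or text.
    return processor.post_process_generation(
        raw, task=TASK_TOKENS[task], image_size=(image.width, image.height)
    )
```

Switching tasks only changes the prompt string: `build_prompt("object_detection")` gives `"<OD>"`, while the same model and image with `build_prompt("ocr")` would be asked for text instead.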

COMMENTS • 18

  • @Mephmt
    @Mephmt 12 days ago +2

    Use case idea: Put a Wi-Fi camera inside your refrigerator and cupboards. Write a program that, when run, takes a picture of the food that you have. Then it identifies products from pictures taken by the camera(s). It records the products it has found, then saves them using RAG to build up a profile of your staple foods and other things you buy regularly. Then it could automatically order anything that's missing or simply put the name of the item into a shopping list for you. (This is just high level; the actual implementation might take a lot of hardware.)

  • @lokeshsimha2
    @lokeshsimha2 14 days ago +11

    Should I comment that I'm the first, since there are no comments and no views until now? 😢😢 Please support this guy, friends; he is really good.

  • @darkreader01
    @darkreader01 14 days ago +2

    I used a screenshot of the Zoom participants list to get all the attendees' names, but the model failed to do so. Most models fail to do this accurately; even GPT-4 failed. I am not sure about GPT-4o as I haven't tested this with GPT-4o yet.
    Only Claude-3 passed this test. Even their smallest model, Claude-3 Sonnet, passed this test every time I ran it. Reka AI did somewhat well, but it could not pass every time; sometimes, the names were not correctly spelled.
    And all other models, open source and closed source, could not pass this test.
    Note: I needed this list for some online classes, so I did this a few months ago and discovered that only Claude-3 can pass this. After that, whenever I see a vision model, I run this test to see if it can pass. 😄

  • @unclecode
    @unclecode 14 days ago +2

    Beautiful! This shows how real "small" models (models less than 1B parameters), using what we've learned over the last 4-6 years, can now achieve amazing results. It's a well-written paper, and if I'm not wrong, it's MIT licensed. Thanks for sharing this, and get better soon :)

    • @1littlecoder
      @1littlecoder 14 days ago +1

      Thank you. I'm surprised you guessed I'm not well :) I was quite unwell, thanks!!

    • @unclecode
      @unclecode 14 days ago

      @1littlecoder One of my breakfast-time tasks is checking if you've uploaded any new videos, so I hear your voice almost daily. You have quite a distinct voice, and I'm a bit OCD, like a model overfitting to data, so it's not hard to detect that (my ML answer 😅). Anyway, I love your passion for making content even when you're unwell. I think this is a kind of meditation for you, not a task you're forced to do. Keep it up and drink lots of water!

  • @algoritm3034
    @algoritm3034 13 days ago +1

    Please do more article analyses.

  • @testales
    @testales 13 days ago +2

    There already seems to be a ComfyUI node, so since I'm a lazy guy, I'll go for this. ;-)

  • @ojikutu
    @ojikutu 14 days ago

    Thank you. I'll try to run this model locally. It's impressive for the size.

  • @henkhbit5748
    @henkhbit5748 14 days ago +1

    Thanks, will try this model for OCR.

  • @figs3284
    @figs3284 14 days ago

    This model is really good for its size

  • @gani3326
    @gani3326 14 days ago

    I listen to him regularly from the EU...

  • @Macorelppa
    @Macorelppa 14 days ago +1

    I experience AGI in my nightmares every day. 🤖💭