Florence-2: Fine-tune Microsoft’s Multimodal Model

  • Published Oct 1, 2024

COMMENTS • 72

  • @TheVarun6
    @TheVarun6 2 months ago +1

    Hi, I'm looking to fine-tune Florence-2 for a segmentation task. Would appreciate your insights!

  • @SridharanS-vz7re
    @SridharanS-vz7re 3 months ago +5

    How do I train this model on a custom dataset for OCR?

    • @Roboflow
      @Roboflow 2 months ago

      Florence-2 can do OCR out-of-the-box.

    • @hegalzhang1457
      @hegalzhang1457 2 months ago

      @Roboflow Thanks, do you have a sample of this? I have a similar task (OCR with region), but I am not sure how to organize the OCR and region data. Do you have any example code for this task?

  • @qqD-h2r
    @qqD-h2r 2 months ago +1

    Hello bro! Thank you for your selfless sharing all along. I ran into some issues while fine-tuning Florence-2 and would like to ask for your advice.
    Resolving accuracy issues in Chinese output for Florence-2 fine-tuned with LoRA: using the llava-instruct-chinese dataset, I froze the image encoder weights and fine-tuned the language part of Florence-2 with LoRA. On the "CAPTION" task the model can produce Chinese output, but the accuracy of the answers is zero. How can this be resolved?

  • @meshkatuddinahammed
    @meshkatuddinahammed 25 days ago +1

    But don't we have to change any layer to detect the objects of interest in our dataset? How does it do that automatically?

    • @Roboflow
      @Roboflow 25 days ago

      Nope. That’s why multimodal models are so powerful. They can solve different vision tasks without any architecture changes.
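
      The reply above can be made concrete with a rough sketch of how a detection label might be written as plain text, so that the same text decoder handles it with no new layers. This assumes the `<OD>` task prompt and `<loc_N>` location tokens with coordinates quantized into 1000 bins; the helper names, the record keys, and the example image and class are hypothetical.

```python
def box_to_loc_tokens(box, image_width, image_height, bins=1000):
    """Encode a pixel-space (x1, y1, x2, y2) box as four <loc_N> tokens."""
    x1, y1, x2, y2 = box

    def quantize(value, size):
        # Scale to [0, bins) and clamp so the right/bottom edge stays in range.
        return min(max(int(value * bins / size), 0), bins - 1)

    coords = (
        quantize(x1, image_width),
        quantize(y1, image_height),
        quantize(x2, image_width),
        quantize(y2, image_height),
    )
    return "".join(f"<loc_{c}>" for c in coords)


def build_od_example(image_name, boxes, labels, image_width, image_height):
    """Build one record: task prompt in, 'label + location tokens' text out."""
    suffix = "".join(
        label + box_to_loc_tokens(box, image_width, image_height)
        for box, label in zip(boxes, labels)
    )
    # Assumed record layout: the prompt selects the task, the suffix is the target text.
    return {"image": image_name, "prefix": "<OD>", "suffix": suffix}


example = build_od_example(
    "cards.jpg",               # hypothetical file name
    boxes=[(64, 48, 320, 240)],
    labels=["card"],           # hypothetical class name
    image_width=640,
    image_height=480,
)
print(example["suffix"])
```

      Because the target is just a string, a custom class only changes the text the model learns to emit, not the network itself.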

  • @konstantinpokalioukhine8879
    @konstantinpokalioukhine8879 20 days ago +1

    Thanks for the tutorial - great video. I implemented this on a small dataset, about 100 observations. The fine-tuned model lost the ability to detect classes that didn't belong to the custom dataset. In fact, it detected the new objects pretty much every time on unseen data, even if they were not present. Is fixing that simply a matter of a larger dataset? Are negative examples required?

    • @Roboflow
      @Roboflow 20 days ago

      How many epochs have you run it for? What’s your learning rate?

    • @konstantinpokalioukhine8879
      @konstantinpokalioukhine8879 20 days ago

      @Roboflow Thanks for the reply. I ran 10 epochs and the learning rate was 0.00002. The last epoch had a training loss of 1.47 and a validation loss of 1.69.

  • @ahmeddiaamaroufi867
    @ahmeddiaamaroufi867 2 months ago

    Hey, I've gone through 10 different companies and I still love yours the most.
    I'm excited about your service and also run two YouTube channels with 560k and 280k subscribers. Could we work together to make a video about your service?
    I have some ideas for how I would do the video. Have you done any work with YouTubers in the past?
    I hope we can work together on this collaboration. If you have any questions, feel free to ask.
    Best regards,
    diaa maroufi.

  • @yeongnamtan
    @yeongnamtan 8 days ago

    In normal OD, we load the best weights. For Florence, where in the code do you load the best weights after fine-tuning?

  • @nikilragav
    @nikilragav 2 months ago +1

    9:35 how did you see this embedding vector projection thing for the Roboflow 100 datasets?

  • @zabique
    @zabique 7 days ago

    How can I output ALL the classes that can be recognized in the picture at once?

  • @NakulMali-j6d
    @NakulMali-j6d 3 months ago +2

    Hello Sir, thanks for all your videos and efforts. I follow your channel, and I'd like to request a detailed video on how to fine-tune a YOLOv5 model for custom image classification.

    • @Roboflow
      @Roboflow 3 months ago

      Does YOLOv5 support classification?

    • @NakulMali-j6d
      @NakulMali-j6d 2 months ago

      @Roboflow Yes

  • @abdshomad
    @abdshomad 3 months ago +2

    I've been waiting for this tutorial for days.
    Thank you again for being the first to comprehensively review this new model.
    Super excited! 🎉🥳

    • @Roboflow
      @Roboflow 3 months ago

      As usual you are the first one to comment on the video! Thanks a lot for all the support! 🔥

  • @dabaizhang-x5b
    @dabaizhang-x5b 3 months ago

    Master, could you please tell me if Florence-2 can perform SER (Semantic Entity Recognition) and RE (Relation Extraction) tasks? If so, what should my dataset look like? 🤔

  • @hegalzhang1457
    @hegalzhang1457 2 months ago

    Hey guys, do you have an example of fine-tuning Florence-2 as an OCR model?

  • @NaveenKumarLaskari
    @NaveenKumarLaskari 2 months ago +1

    Thanks for the video tutorial.
    Although this model can handle multiple tasks, all the videos cover a single task.
    Can you explain how we can tune the model for two different tasks, for example OCR and OD?

    • @Roboflow
      @Roboflow 2 months ago

      The model is still capable of doing both detection and OCR. We just focused on OD fine-tuning in this video. Take a look here to learn more about other tasks: ua-cam.com/video/hj_ybcRdk5Y/v-deo.html&ab_channel=Roboflow

  • @3DFinalCut1
    @3DFinalCut1 3 months ago +1

    Thank you for this very informative video. Something like this helps enormously.
    Thank you!
    I have a question and maybe someone here can give me a tip. I am looking for a tool that searches for similar images in a folder and shows me the results so that I can clean up the dataset later. I have already found tools that do this, but they mainly work with image hashing methods or use fuzzy matching algorithms. But I wonder if there aren't already tools that use AI to solve this task. Does anyone use such a tool?

    • @Roboflow
      @Roboflow 2 months ago

      You can make it happen using the CLIP model. We covered that topic here: ua-cam.com/video/YxJkE6FvGF4/v-deo.html
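
      To illustrate the suggestion above, here is a minimal sketch of the comparison step only. It assumes each image has already been turned into an embedding vector (for example by a CLIP image encoder, which is not shown), and it flags pairs whose cosine similarity exceeds a threshold. The file names, toy vectors, and the 0.95 threshold are made up for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_similar_pairs(embeddings, threshold=0.95):
    """Return (name_a, name_b, score) for every pair at or above the threshold."""
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    # Most similar pairs first, so likely duplicates are reviewed first.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Toy 3-dimensional vectors standing in for real image embeddings.
embeddings = {
    "a.jpg": [1.0, 0.0, 0.0],
    "b.jpg": [0.99, 0.05, 0.0],
    "c.jpg": [0.0, 1.0, 0.0],
}
for name_a, name_b, score in find_similar_pairs(embeddings):
    print(name_a, name_b, round(score, 3))
```

      The all-pairs loop is quadratic in the number of images, so for a large folder an approximate nearest-neighbour index would likely be a better fit.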

  • @JasonLiu-w3v
    @JasonLiu-w3v 1 month ago +1

    Thank you for providing such a nice video! I'm a college student who has been self-learning computer vision. I'm more interested in the automated annotation capabilities or zero-shot performance of computer vision tools. Your video really helped me a lot! Thank you

    • @Roboflow
      @Roboflow 1 month ago

      Important area of research! It has been moving really fast over the past few years.

  • @kylewang6704
    @kylewang6704 2 months ago +1

    Do you plan to release a tutorial about fine-tuning Florence-2 on other vision tasks such as captioning? I wonder if Florence-2 can be tuned so that it does object detection/classification and also provides the reason for its prediction.

    • @Roboflow
      @Roboflow 2 months ago +1

      That’s quite possible. We will release a keypoint detection video this week, and if no new model comes out we will drop one more Florence-2 tutorial.

  • @kylewang6704
    @kylewang6704 2 months ago +1

    Thank you for the awesome tutorial! How does the detection accuracy compare to a YOLO-based model?

    • @Roboflow
      @Roboflow 2 months ago +1

      We cover that topic in the video ;) something tells me you didn’t watch till the end.

  • @yj1548
    @yj1548 3 months ago +1

    Good video. I'm curious about what can be done to fix misspelled class names in object detection tasks. Do you have any ideas?

    • @Roboflow
      @Roboflow 3 months ago

      I think you asked me this question on Twitter, but let me answer here as well. 1. Longer training could fix it. 2. Fuzzy class matching, using Hamming distance for example. In the video we filter out anything that is not an exact match.
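
      As a rough sketch of the fuzzy-matching idea in the reply above: Hamming distance only compares strings of equal length, so this example swaps in Python's standard `difflib.get_close_matches`, which also tolerates inserted or dropped characters. The class list and the 0.8 cutoff are hypothetical, not values from the video.

```python
from difflib import get_close_matches


def match_class(predicted, known_classes, cutoff=0.8):
    """Map a possibly misspelled predicted label to the closest known class.

    Returns None when nothing is similar enough, so the detection can be dropped.
    """
    # Compare case-insensitively but return the canonical spelling.
    lowered = {name.lower(): name for name in known_classes}
    hits = get_close_matches(
        predicted.strip().lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None


# Hypothetical class list for illustration.
CLASSES = ["ace of spades", "king of hearts", "queen of clubs"]

print(match_class("ace of spaces", CLASSES))  # one-character typo
print(match_class("bicycle", CLASSES))        # unrelated label
```

      A stricter cutoff keeps more wrong labels out at the cost of rejecting heavier typos, so the threshold would need tuning per dataset.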

  • @VLM234
    @VLM234 3 months ago +1

    Very informative video. Thanks for making such a valuable video free of cost. Just one request: when you make tutorials, if possible, try to do inference, training, or fine-tuning on agricultural or satellite data.

    • @Roboflow
      @Roboflow 3 months ago

      Next time I will try to find some cool datasets from these domains.

  • @SatyamKumar-cb2mt
    @SatyamKumar-cb2mt 3 months ago +1

    Thanks a ton for this awesome video! Every single term is explained so clearly, it's super helpful.
    I can't wait to dive into the code and start putting this knowledge to use!

    • @Roboflow
      @Roboflow 3 months ago +1

      Thanks a lot! I really put in the effort and try not to fall into a bias (not assuming that people already know those things).

  • @adrianalbertomarinbalseca7132
    @adrianalbertomarinbalseca7132 2 months ago

    Wow, it looks cool, but this kind of training does need an internet connection :/ what if there isn't one?

  • @jimshtepa5423
    @jimshtepa5423 3 months ago +1

    Can you please upload a recording of the community session for those of us who are in a different time zone or might otherwise miss the call?

    • @Roboflow
      @Roboflow 3 months ago

      Sure! All our community sessions are available to re-watch on our YT channel.

  • @barderino5673
    @barderino5673 2 months ago

    I would really, really like to see how you train on multiple datasets for different tasks, such as OD, OCR, REGION_PROPOSAL, and maybe something like OPEN_VOCABULARY on one set and MORE DETAILED CAPTION on another, and whether the knowledge transfers effectively: for example, whether captions start to include things that are not in the caption dataset but are in the other one, or whether OCR improves in image descriptions.

  • @artem-yw8km
    @artem-yw8km 3 months ago +1

    Thank you for this tutorial. I was working on this kind of setup for a couple of days. You definitely could have saved me a lot of time.

    • @Roboflow
      @Roboflow 3 months ago

      Sad I didn’t save your time this time.

  • @8-P
    @8-P 3 months ago

    For the community session I have a couple of (beginner) questions:
    - The Google Colab notebooks on Roboflow seem to be Linux-based; is there an easy way to make them work on Windows?
    - In general, how do I download a model (YOLO) to use in a Python app (on Windows)?
    - Are there models that would run real-time video detection on a regular laptop with an integrated GPU?
    - I am planning to use a YOLO model for a sports live stream, but only have a simple 3-year-old mid-range laptop with me. If the laptop is underpowered, would it be better to send the stream over to my desktop PC with an RTX 3060 Ti 8GB, run the model there, and send the detections back to sync on the laptop?
    - For simple applications, like your real-time sports detection, would it be better to run it on my own hardware or to look into cloud servers for inference?
    Thank you very much for your tutorials, they help a lot!

  • @nicolassuarez2933
    @nicolassuarez2933 2 months ago

    Sorry, but if you do not explain how to fine-tune real custom data from scratch, the tutorial is almost useless...

    • @Roboflow
      @Roboflow 2 months ago

      I’m afraid I don’t understand what you mean. That’s pretty much the topic of the video. Maybe there is a part that you expected but was not there?

  • @arifahnurainia272
    @arifahnurainia272 2 months ago

    Thank you for the video tutorial, you are cool... 👏👏👏
    I hope there will be a version of this tutorial using Jupyter Notebook 😁

  • @jk_c66
    @jk_c66 3 months ago +1

    Thank you, Roboflow, for providing such nice and lovely tutorials for free and with such clear instructions.

  • @UhuruSsemakula
    @UhuruSsemakula 3 months ago +1

    Can you please make one for object detection using a web camera?

    • @Roboflow
      @Roboflow 2 months ago +1

      You mean using Florence-2 and Webcam? Or webcam in general?

    • @UhuruSsemakula
      @UhuruSsemakula 2 months ago

      Yes Florence-2 and Webcam

  • @bladethirst1
    @bladethirst1 3 months ago +1

    Is this applicable to grading handwritten PDF math assignments?

    • @Roboflow
      @Roboflow 3 months ago

      Florence-2 can be really good at OCR processing of handwritten text. Not sure about math equations. We would need to confirm that.

    • @bladethirst1
      @bladethirst1 3 months ago

      @Roboflow {'': 'In this image we can see a book with some text on it.'} This is the test output for a handwritten math derivation. Is there some way to get a more detailed caption or the OCR output?

  • @geniusxbyofejiroagbaduta8665
    @geniusxbyofejiroagbaduta8665 3 months ago +1

    Please, sir, also teach us how to annotate with it.

    • @Roboflow
      @Roboflow 3 months ago

      You mean how to automatically annotate images?

  • @suphotnarapong355
    @suphotnarapong355 3 months ago +1

    Thank you

  • @sandrojunioraraujo3706
    @sandrojunioraraujo3706 2 months ago

    Wonderful tutorial! Could you make a tutorial about how to fine-tune Florence-2 for the segmentation task?

    • @Roboflow
      @Roboflow 2 months ago +1

      I'm almost sure I'll create a Google Colab covering this topic. Not sure about a YouTube video.

  • @indranilcool
    @indranilcool 1 month ago

    Why are the Florence model's results different when you re-run the code?

    • @Roboflow
      @Roboflow 1 month ago

      My guess is it's the same reason why ChatGPT responses are different every time you run it. Try adjusting the temperature value.

    • @유영재-c9c
      @유영재-c9c 7 days ago

      You should do `do_sample=False`
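
      A small sketch of what that suggestion could look like, assuming the model is called through the Hugging Face `generate` API: with `do_sample=False` decoding is greedy or beam search, so no random sampling is involved and temperature has no effect. The helper name and the default values are hypothetical, and `model` and `inputs` in the comment are assumed to come from the usual loading and preprocessing code, which is not shown.

```python
def deterministic_generation_kwargs(max_new_tokens=1024, num_beams=3):
    """Keyword arguments for generate() that turn off random sampling."""
    return {
        "do_sample": False,          # no sampling: pick tokens deterministically
        "num_beams": num_beams,      # beam search width; 1 means greedy decoding
        "max_new_tokens": max_new_tokens,
    }


kwargs = deterministic_generation_kwargs()

# Intended use (assumes `model` and `inputs` already exist):
# generated_ids = model.generate(
#     input_ids=inputs["input_ids"],
#     pixel_values=inputs["pixel_values"],
#     **kwargs,
# )
print(kwargs)
```

      Small run-to-run differences could still come from non-deterministic GPU kernels, so this narrows the variation rather than guaranteeing identical outputs.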

  • @richardobiri2642
    @richardobiri2642 2 months ago

    What if I want to detect fake and authentic certificates? Any help, please.

    • @Roboflow
      @Roboflow 2 months ago

      You mean distinguish between authentic and fake certificates? Do you have a dataset for that?

    • @richardobiri2642
      @richardobiri2642 2 months ago

      @Roboflow Yes, I do

  • @geniusxbyofejiroagbaduta8665
    @geniusxbyofejiroagbaduta8665 3 months ago

    Thanks, Sir. Please do fine-tuning for OCR, captioning, and segmentation tasks.

    • @Roboflow
      @Roboflow 3 months ago

      Did you try to run OCR with the pre-trained model?

  • @Jordufi
    @Jordufi 3 months ago +1

    Nice video, as usual