YOLO-World - Real-Time, Zero-Shot Object Detection

  • Published 1 Oct 2024
  • YOLO-World is a zero-shot model, which means you can detect objects without training the model on them.
    GitHub: github.com/Aar...
    For queries: comment in the comment section or email me at aarohisingla1987@gmail.com
    YOLO-World builds on the YOLO detector with a frozen CLIP-based text encoder that extracts text embeddings from the input texts, e.g., object categories or noun phrases.
    YOLO-World contains a Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) to facilitate the interaction between multi-scale image features and text embeddings. RepVL-PAN can re-parameterize the user's offline vocabulary into the model parameters for fast inference and deployment.
    YOLO-World is pre-trained on large-scale region-text datasets with a region-text contrastive loss to learn the region-level alignment between vision and language. For ordinary image-text datasets, e.g., CC3M, the authors adopt an automatic labeling approach to generate pseudo region-text pairs.
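
The region-text contrastive idea described above can be illustrated with a minimal sketch. This is not the YOLO-World training code: the three-dimensional embeddings, the `temperature` value, and the function names are all hypothetical, and the loss is a plain softmax cross-entropy over cosine similarities, which is only an assumption about the general shape of such a loss.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def similarity(region, text):
    """Cosine similarity between one region embedding and one text embedding."""
    r, t = l2_normalize(region), l2_normalize(text)
    return sum(a * b for a, b in zip(r, t))


def region_text_contrastive_loss(region, texts, positive_index, temperature=0.1):
    """Cross-entropy of the region against all texts; low when the matching
    text has the highest similarity. `temperature` is an assumed value."""
    logits = [similarity(region, t) / temperature for t in texts]
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[positive_index]


# Toy, hand-made embeddings (hypothetical): three text prompts, one region.
texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
region = [0.9, 0.1, 0.0]

loss_correct = region_text_contrastive_loss(region, texts, positive_index=0)
loss_wrong = region_text_contrastive_loss(region, texts, positive_index=1)
```

Here the region embedding points mostly along the first text embedding, so pairing it with that text gives a much smaller loss than pairing it with the second one. Training pushes matching region-text pairs together in this sense.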

COMMENTS • 43

  • @ezequieligomez2135 4 months ago

    Is this pre-trained on O365+GoldG or on the COCO dataset?
    How would I get specifically the one pre-trained on O365+GoldG?

  • @harshays2873 4 months ago

    Please make a video on training this model on custom data.

  • @عدنانمهداوي-ن5ث 6 months ago

    YOLO in real time is very slow. Do you know why?

  • @rickyS-D76 3 months ago

    Thanks. Do you have a detailed video on video object detection with labels and confidence scores, or any other resource that would be helpful? Thank you.

  • @jeffg4686 6 months ago

    Oh nice. How do they come up with these ridiculous names...
    Is this actually better than Grounding DINO, or just faster?
    Also, do they have safetensors?
    Do certain model types not work with safetensors, or is this their new plan to infect all the computers?

  • @anamikamaurya22 6 months ago

    My god... now programmers will become the creators of 2025.

  • @2xback2back14 7 months ago

    Hello, can you please demonstrate how to give custom text in "text to image generation using StackGAN"? Even after 1000 epochs, my model doesn't seem to generate bird images.
    Please help me.

    • @CodeWithAarohi 7 months ago

      I will try to cover this topic when I continue with the GAN playlist.

  • @bb-andersenaccount9216 7 months ago

    Good job. However, it is not clear, when setting the classes, whether you are giving a description prompt or just picking a pre-trained class as usual. The person class you show in the example might be a typical pre-trained label class instead of a description prompt. This makes the example confusing.

  • @TheJAM_Sr 5 months ago

    I just wanted to say I found your channel this week and really appreciate your classes. I won’t even say they are tutorials because I can take what I learn and easily apply them to my project.

  • @himanshudnk 7 months ago

    I am still not clear on how YOLO-World differs from traditional YOLO models. It looks like we are using a pre-trained model, giving it the classes we want, and it is able to detect them. YOLOv8, for example, is trained on 80 classes, so does YOLO-World simply have more classes?

    • @CodeWithAarohi 7 months ago

      With YOLOv8, you can detect only the object classes the model was trained on. If the model is trained on the COCO dataset, you can detect only the 80 classes present in COCO. If you train a custom YOLOv8 model on 5 classes, it will detect those 5 classes.
      In YOLO-World, however, you can write the name of any object you want to detect, and it will detect that object because YOLO-World is trained on images and their text descriptions.
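
The difference described in this reply can be sketched in code, assuming the Ultralytics Python package and its `YOLOWorld` class are used. The checkpoint name `yolov8s-world.pt`, the helper names, and the example prompts are assumptions for illustration, not details confirmed by the video. The helper also returns each label with its confidence score.

```python
def format_detections(boxes_xyxy, class_ids, scores, names):
    """Pair each box with its prompt label and confidence score."""
    detections = []
    for box, class_id, score in zip(boxes_xyxy, class_ids, scores):
        detections.append({
            "label": names[int(class_id)],        # works for a list or a dict
            "confidence": round(float(score), 3),
            "box": [float(v) for v in box],       # x1, y1, x2, y2 in pixels
        })
    return detections


def detect_with_prompts(image_path, prompts, weights="yolov8s-world.pt"):
    """Run YOLO-World on one image using free-text class prompts."""
    # Imported lazily so this file loads even without Ultralytics installed.
    from ultralytics import YOLOWorld

    model = YOLOWorld(weights)    # assumed pre-trained checkpoint name
    model.set_classes(prompts)    # the prompts replace any fixed class list
    result = model.predict(image_path)[0]
    return format_detections(
        result.boxes.xyxy.tolist(),
        result.boxes.cls.tolist(),
        result.boxes.conf.tolist(),
        result.names,
    )
```

A call such as `detect_with_prompts("a.jpg", ["red wire", "hard hat"])` would then return one dictionary per detected box. Whether an unusual prompt is actually found depends on how well the pre-trained model aligns that phrase with the image content.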

  • @iPrashantSmp 7 months ago

    How can I find the list of pre-trained classes in the YOLO-World model?

    • @CodeWithAarohi 7 months ago

      I am not sure, but YOLO-World is pre-trained on large-scale vision-language datasets, including Objects365, GQA, Flickr30K, and CC3M.

  • @informative7410 6 months ago

    How do I convert YOLO-World to TFLite?

  • @learn_with_gaddal 7 months ago

    Awesome, thank you so much for sharing this information.

  • @hemachandhers 7 months ago

    Can you make a video on fine-tuning YOLO-World on a custom dataset, ma'am?

    • @CodeWithAarohi 7 months ago

      ua-cam.com/video/kl7yszVU6Tg/v-deo.htmlsi=WRSX79c0QmuMBrWh

    • @Satchi017 6 months ago

      @CodeWithAarohi Yes, how can I build a custom YOLO-World model for a totally new class that is not even in the large-scale vision-language datasets (Objects365, GQA, Flickr30K, and CC3M)?

    • @Satchi017 6 months ago

      Sorry ma'am, the person class is among the pre-trained classes. I guess the example is biased.
      How can I detect the car FM antenna in your example image?

    • @CodeWithAarohi 6 months ago

      @Satchi017 check this: ua-cam.com/video/WbCgU4GrjV4/v-deo.htmlsi=qbiPic5BmDPTUAPn

    • @Satchi017 6 months ago

      @CodeWithAarohi Ma'am, I have watched the video. Rather than detecting "hard hat" and "gloves", how can I detect the object (red probe/wire) in the image (a.jpg)?

  • @ShittheswaranSelvakumar 6 months ago

    Nice explanation, ma'am. Thank you :)

  • @aneerimmco 3 months ago

    Informative, thank you.

  • @Hemamalini-f3i 7 months ago

    How to convert these detections into annotations?

    • @CodeWithAarohi 7 months ago

      There is no need to convert the detections into annotations for custom object detection. But if you still want to do that, you can write a script that fetches the bounding-box coordinates and stores them in a file.
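
A script of the kind suggested in this reply might look like the sketch below. It assumes the detections are already available as `(class_id, (x1, y1, x2, y2))` pairs in pixel coordinates and writes them in the common YOLO text label format (class index followed by normalized center x, center y, width, height). The function names and the output layout are assumptions, not something shown in the video.

```python
from pathlib import Path


def to_yolo_line(class_id, box_xyxy, image_width, image_height):
    """Convert one pixel-space (x1, y1, x2, y2) box to a YOLO label line."""
    x1, y1, x2, y2 = box_xyxy
    center_x = (x1 + x2) / 2 / image_width
    center_y = (y1 + y2) / 2 / image_height
    width = (x2 - x1) / image_width
    height = (y2 - y1) / image_height
    return f"{class_id} {center_x:.6f} {center_y:.6f} {width:.6f} {height:.6f}"


def write_yolo_labels(label_path, detections, image_width, image_height):
    """Write one label line per detection and return the lines written."""
    lines = [
        to_yolo_line(class_id, box, image_width, image_height)
        for class_id, box in detections
    ]
    text = "\n".join(lines) + ("\n" if lines else "")
    Path(label_path).write_text(text, encoding="utf-8")
    return lines


# A 200x200 pixel box inside a 400x500 image (made-up numbers).
example_line = to_yolo_line(0, (100, 50, 300, 250), 400, 500)
```

Labels produced this way are only as good as the detections behind them, so reviewing them by hand before using them for training is advisable.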

  • @arnavthakur5409 7 months ago

    Ma'am, your work is really incredible.

  • @p.logesharavind3528 7 months ago

    This is really cool and interesting!

  • @soravsingla8782 7 months ago

    Awesome

  • @Sunil-ez1hx 7 months ago

    Amazing video

  • @pifordtechnologiespvtltd5698 7 months ago

    Nice