Voxel51
  • 229
  • 85 511
ECCV 2024: Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
In this talk, I will introduce our recent work on open-vocabulary 3D semantic understanding. We propose Diff2Scene, a method that leverages frozen representations from text-to-image generative models for open-vocabulary 3D semantic segmentation and visual grounding. Diff2Scene requires no labeled 3D data and effectively identifies objects, appearances, locations, and their compositions in 3D scenes.
ECCV 2024 Paper: Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
arxiv.org/abs/2407.13642
About the Speaker
Xiaoyu Zhu is a Ph.D. student at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. Her research interests are computer vision, multimodal learning, and generative models.
Views: 84

Videos

ECCV Redux: Zero-shot Video Anomaly Detection: Leveraging LLMs for Rule-Based Reasoning
88 views · 21 hours ago
Video Anomaly Detection (VAD) is critical for applications such as surveillance and autonomous driving. However, existing methods lack transparent reasoning, limiting public trust in real-world deployments. We introduce a rule-based reasoning framework that leverages Large Language Models (LLMs) to induce detection rules from few-shot normal samples and apply them to identify anomalies, incorpo...
ECCV 2024 Redux: Day 3- Closing the Gap Between Satellite & Street View Imagery Generative Models
71 views · 1 day ago
Closing the Gap Between Satellite and Street-View Imagery Using Generative Models With the growing availability of satellite imagery (e.g., Google Earth), nearly every part of the world can be mapped, though street-view images remain limited. Creating street views from satellite data is crucial for applications like virtual model generation, media content enhancement, 3D gaming, and simulations...
ECCV 2024 Redux: Day 3- High-Efficiency 3D Scene Compression Using Self-Organizing Gaussians
225 views · 1 day ago
In just over a year, 3D Gaussian Splatting (3DGS) has made waves in computer vision for its remarkable speed, simplicity, and visual quality. Yet, even scenes of a single room can exceed a gigabyte in size, making it difficult to scale up to larger environments, like city blocks. In this talk, we’ll explore compression techniques to reduce the 3DGS memory footprint. We’ll dive deeply into our n...
ECCV 2024 Redux: Day 3- Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Seg.
42 views · 1 day ago
Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures We present Skeleton Recall Loss, a novel loss function for topologically accurate and efficient segmentation of thin, tubular structures, such as roads, nerves, or vessels. By circumventing expensive GPU-based operations, we reduce computational overheads by up to 90% compared to the ...
ECCV 2024 Redux: Day 1 - Tree-of-Life Meets AI
53 views · 1 day ago
A central challenge in biology is understanding how organisms evolve and adapt to their environment, acquiring variations in observable traits across the tree of life. However, measuring these traits is often subjective and labor-intensive, making trait discovery a highly label-scarce problem. With the advent of large-scale biological image repositories and advances in generative modeling, ther...
ECCV 2024 Redux: Day 1 - Robust Calibration of Large Vision-Language Adapters
53 views · 1 day ago
We empirically demonstrate that popular CLIP adaptation approaches, such as Adapters, Prompt Learning, and Test-Time Adaptation, substantially degrade the calibration capabilities of the zero-shot baseline in the presence of distributional drift. We identify the increase in logit ranges as the underlying cause of miscalibration of CLIP adaptation methods, contrasting with previous work on calib...
ECCV 2024 Redux: Day 1 - Fast and Photo-realistic Novel View Synthesis from Sparse Images
53 views · 1 day ago
Novel view synthesis generates new perspectives of a scene from a set of 2D images, enabling 3D applications like VR/AR, robotics, and autonomous driving. Current state-of-the-art methods produce high-fidelity results but require a lot of images, while sparse-view approaches often suffer from artifacts or slow inference. In this talk, I will present my research work focused on developing fast a...
Computer Vision Meetup: Deploying ML models on Edge Devices using Qualcomm AI Hub
83 views · 14 days ago
In this talk we address the common challenges faced by developers migrating AI workloads from the cloud to edge devices. Qualcomm aims to democratize AI at the edge, easing the transition to the edge by supporting familiar frameworks and data types. This is where Qualcomm AI Hub comes in. Developers can follow along, gaining knowledge and tools to efficiently deploy optimized models on real de...
Computer Vision Meetup: Human-in-the-loop: Practical Lessons for Building Comprehensive AI Systems
117 views · 14 days ago
AI systems often struggle with data limitations, data distribution shift over time, and a poor user experience. Human-in-the-loop design offers a solution by placing users at the center of AI systems and leveraging human feedback for continuous improvement. In this talk, we’ll dive deeply into a recent project at Merantix Momentum: An interactive tool for automatic rodent behaviour analysis in v...
Computer Vision Meetup: Curating Excellence: Strategies for Optimizing Visual AI Datasets
104 views · 14 days ago
In this talk Harpreet will discuss common challenges plaguing visual AI datasets, their impact on model performance, and share some tips and tricks for curating datasets to make the most of any compute budget or network architecture. Speaker: Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG...
Computer Vision Meetup: PostgreSQL for Innovative Vector Search
64 views · 1 month ago
There are a plethora of datastores that can work with vector embeddings. You are probably already running one that allows for innovative uses of data alongside your embeddings - PostgreSQL! This talk will focus on showing examples of how features already present in the PostgreSQL ecosystem allow you to leverage it for cutting edge use cases. Live demos and lively discussion will be the focus of...
Computer Vision Meetup: Pixels Are All You Need Utilizing 2D Image Representation in Robotics
227 views · 1 month ago
Many vision-based robot control applications (like those in manufacturing) require 3D estimates of task-relevant objects, which can be realized by training a direct 3D object detection model. However, obtaining 3D annotation for a specific application is expensive relative to 2D object representations like segmentation masks or bounding boxes. In this talk, Brent will describe how we achieve mo...
Computer Vision Meetup: Accelerating Machine Learning Research and Development for Autonomy
242 views · 1 month ago
At Oxa (Autonomous Vehicle Software), we designed an automated workflow for building machine vision models at scale, from data collection to in-vehicle deployment. It involves a number of steps, such as intelligent route planning to maximise visual diversity; sampling of the sensor data w.r.t. visual and semantic uniqueness; language-driven automated annotation tools and multi-modal search engine...
Computer Vision Meetup: Using Elasticsearch Vector Search in FiftyOne
112 views · 1 month ago
In this short demo, Steve Pousty (Developer Advocate at Voxel51) shows you how to leverage Elastic’s vector search capabilities for computer vision use cases using the FiftyOne open source library. Not a Meetup member? Sign up to attend the next event: voxel51.com/computer-vision-ai-meetups/ Recorded on Oct 10, 2024 at the AI, Machine Learning and Computer Vision Meetup. #computervision ...
Computer Vision Meetup: Elastic is for the Birds: Identifying Embedding Images using Vector Search
100 views · 1 month ago
Computer Vision Meetup: RGB-X Model Development: Exploring Four Channel ML Workflows
111 views · 1 month ago
Computer Vision Meetup: How Renault Leveraged Machine Learning to Scale Electric Vehicle Sales
140 views · 1 month ago
Scaling Industrial AI with FiftyOne
98 views · 2 months ago
Computer Vision Meetup: GPUs at Scale - Trials of a GPUaaS Provider
84 views · 2 months ago
Visual AI in Healthcare: NVIDIA’s VISTA-3D and MedSAM-2 Medical Imaging Models
481 views · 2 months ago
Visual AI in Healthcare: Exploring Instance Imbalance in Medical Semantic Segmentation
95 views · 2 months ago
Visual AI in Healthcare: Advancing Comparative Computational AI in Veterinary Oncology
138 views · 2 months ago
Visual AI in Healthcare: Interpretable AI Models in Radiology
236 views · 2 months ago
Computer Vision Meetup: It's in the Air Tonight. Sensor Data in RAG
149 views · 2 months ago
Computer Vision Meetup: Data-Centric AI Competition on Hugging Face Spaces
56 views · 2 months ago
Computer Vision Meetup: Reducing Hallucinations in ChatGPT and Similar AI Systems
147 views · 2 months ago
Computer Vision Meetup: Accelerating Multimodal RAG Pipelines with NVIDIA and OSS Integrations
175 views · 2 months ago
Computer Vision Meetup: 5 Handy Ways to Use Embeddings, the Swiss Army Knife of AI
77 views · 2 months ago
Computer Vision Meetup: Agentic RAG in 2024
527 views · 2 months ago

COMMENTS

  • @redforestx7371 · 10 days ago

    Amazing work!

  • @redforestx7371 · 10 days ago

    WOW. This looks so amazing. I can't wait to use this!

  • @ajwaus · 12 days ago

    Really interesting

  • @sylviaschmitt · 13 days ago

    Thank you for sharing the video. Does this plugin assume a vector engine like qdrant is used as backend?

  • @sylviaschmitt · 13 days ago

    Thank you for sharing this video on the Active Learning plugin. Is it possible to use the plugin for multi-class multi-label tasks as well?

  • @sai.sankarwork · 26 days ago

    When I try the dev install process in a git bash terminal, it fails at a point because of a package error.
    "Collecting shapely>=1.7.1 (from -r requirements\extras.txt (line 7))
    Using cached shapely-2.0.6-cp312-cp312-win_amd64.whl.metadata (7.2 kB)
    ERROR: Could not find a version that satisfies the requirement open3d>=0.16.0 (from versions: none)
    ERROR: No matching distribution found for open3d>=0.16.0"
    How can this be solved?

  • @NishantRoy-h4d · 1 month ago

    Very interesting demo; would you mind sharing the Colab link?

  • @deemon101 · 1 month ago

    ok, and how does one start it?

  • @AmeeliaK · 2 months ago

    This was very helpful! Llama Index grows so fast, it feels overwhelming for a beginner.

  • @BD_Gaming2013 · 2 months ago

    !second comment

  • @SergeyPavlov-b4c · 3 months ago

    I want to work with my custom dataset. I'd like you to show me how to do it and which benefits I can get using your product. Examples, how can I refine my own data with fiftyone

  • @menghuitan1628 · 3 months ago

    Isn't the "Grid Trick" similar to using ControlNet, a type of model for controlling image diffusion models by conditioning the model with an additional input image?

  • @ByTobys · 4 months ago

    Love your product!

  • @MohitAkhakharia · 4 months ago

    How do we execute the plugin logic in code? This doesn't seem to work:

    logging.info("removing approximate duplicates")
    operator_uri = "@jacobmarks/image_deduplication/remove_all_approximate_duplicates"
    params = {
        "sim_choices": "sim",  # You may need to adjust this based on your similarity run key
        "threshold_value": 0.4
    }
    # Create an invocation request
    request = foe.InvocationRequest(operator_uri, params=params)
    # Create an executor and execute the request
    executor = foe.Executor(requests=[request])
    result = executor.trigger(operator_uri, params=params)
    print(result.to_json())
    # logging.info(f"Found approximate duplicates: {result.result}")
    return result

  • @alivirat6926 · 5 months ago

    The video was great, thanks mate for the explanation.

  • @ashwinkumar5223 · 5 months ago

    Wonderful 👍

  • @rishiraj2548 · 6 months ago

    Great!

  • @HarisonRoberto · 6 months ago

    Are the search results only online images, or can they be local images?

    • @voxel51 · 6 months ago

      you can drag and drop a local image in :)

  • @kai_harm942 · 6 months ago

    Made that look *way* too easy. I spent a whole hour last night trying to get the first line of code to work! It was because my Python paths were thrown about the place

  • @technologyencroyable · 7 months ago

    How do I build the JS part of the code to generate the umd.js file in the dist folder? I am building with yarn build, but the generated umd file is not working and does not open a new panel. Please help.

    • @voxel51 · 7 months ago

      Great question. Try `yarn install` as well. Make sure that the plugin is in your plugins directory. And when you want to change the plugin, make sure you use `yarn dev`. If you have more questions about FiftyOne Plugins, check out the #plugins channel in the FiftyOne community Slack! slack.voxel51.com/

  • @MyJunkEmail · 8 months ago

    great tutorial, can you use a local instance of SD?

  • @AlainPilon · 9 months ago

    It is great that you give us a list of next steps, but a link to each of these points would have been nice!

  • @ZixuWang-ul8hr · 9 months ago

    nice job!

  • @aimadnessbot · 10 months ago

    This is good! But I believe the data should also capture eye movement. Eye movement is crucial for mapping intention and will aid in robot navigation. Apple's headset has the hardware to monitor both eye direction and head direction.

  • @huynhphanngockhang5733 · 10 months ago

    I have an idea to build an autonomous drone that uses computer vision to detect objects that were previously labeled with a GPS location.

  • @rezamahmoudi163 · 11 months ago

    Could you please share the slides?

  • @SeedmancChitOKun · 11 months ago

    How does it select which images would be kept as "representatives" and which removed?

  • @aldem34 · 11 months ago

    I want words like these intitle:"keyword" For better search efficiency for topics

  • @wata1991 · 1 year ago

    Is it possible to use this and find the most similar image given user submitted photos? For example I'm trying to do something to detect trading cards, where the input would be photos of cards submitted by users.

  • @tyronetyrone2652 · 1 year ago

    Hello, I downloaded and installed FiftyOne, but I don’t know how to use it. All your videos didn’t explain how to use it.

    • @ByTobys · 4 months ago

      There is lots of documentation online on their website, check it out! It's really not difficult to get it running, but it's "only" an API, so some Python experience is definitely helpful to get it running. :)

  • @davidgrayson181 · 1 year ago

    I love this

  • @beiddouwang6643 · 1 year ago

    good

  • @omarelsherif010 · 1 year ago

    Thanks for clear explanation❤

  • @jasonwell5299 · 1 year ago

    Thank you so much bro. Nice tutorial.

  • @divyanshnautiyal8110 · 1 year ago

    getting Not Found

  • @robosergTV · 1 year ago

    Thanks, great overview

  • @ChrisWiggins1 · 1 year ago

    Looks promising. I was going through your tutorial, and I was hoping to see how you can import your own database.

  • @ritagislason · 1 year ago

    🤩 Promo'SM

  • @vanessacrosbyfitzgerald · 1 year ago

    Can you perform the initial labeling on images that have not been annotated yet? On part 5 and I have not seen that information yet. Did I miss it?

  • @DigiDriftZone · 1 year ago

    Can you edit/correct or add/remove annotations directly in FiftyOne?

  • @sapsan1234 · 1 year ago

    I am really excited about this product! Thank you for this hands-on video!

  • @akshayiitk4440 · 1 year ago

    "Wow, this video is incredibly informative and well-produced! The speaker does a fantastic job of explaining the complex topic of speech recognition and the new Whisper model from OpenAI in a way that's easy to understand. Great job, highly recommended to anyone interested in this field!"

  • @magdalenakate6781 · 1 year ago

    splendid 🙂✌️️️!! Find out how your competition ranks better = 'Promosm'!!

  • @AliHamza-ys8dt · 1 year ago

    how to add our own dataset into FiftyOne. I want to label my own data.

    • @ByTobys · 4 months ago

      As mentioned in the video, FiftyOne isn't a classical annotation tool, but it provides hooks to do that with CVAT, Labelbox, etc. and then load the labeled data back into FiftyOne. For me the CVAT solution worked perfectly fine. Everything is documented on their website, check it out! :) If you want to load annotation data that is in your own format rather than a typical data format (COCO, ...), you'll have to write a few lines of Python code yourself. For that purpose I implemented a DatasetHandler class. You'll have to convert to the FiftyOne format by iterating through your data and turning it into FiftyOne Detection objects: detections.append( fo.Detection(label=my_label, bounding_box=my_bbox) ) FiftyOne doesn't work "out of the box", but it's a great tool for working with CV data!

  • @Himakarbavikaty · 1 year ago

    Hi I am getting the following error in colab and jupyter notebook with custom data and coco 2017 (default data) MalformedQueryException: Cannot attach/detach dataset to/from a batch project Kindly help me to solve this issue

  • @vernenfelcher6442 · 2 years ago

    𝐩яⓞ𝓂𝓞Ş𝐦

  • @Kk-vx1id · 2 years ago

    Hi guys, how are you? How to change the font on the interface of fiftyone, I hope to get your reply!

  • @CannibalWarthog · 2 years ago

    An installation tutorial would be nice
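
Several comments above ask how to bring custom annotations into FiftyOne, and one reply suggests building `fo.Detection(label=my_label, bounding_box=my_bbox)` objects. Below is a minimal sketch of the coordinate conversion that step typically needs, assuming the custom data stores boxes as absolute pixel corners. The helper names `to_relative_bbox` and `convert_annotations`, the dictionary keys, and the sample values are hypothetical and do not come from the videos. FiftyOne expects bounding boxes as `[top-left-x, top-left-y, width, height]` in relative coordinates in [0, 1]; the sketch is plain Python and does not import fiftyone.

```python
from typing import Dict, List


def to_relative_bbox(
    x_min: float, y_min: float, x_max: float, y_max: float,
    img_w: int, img_h: int,
) -> List[float]:
    """Convert absolute pixel corners to [x, y, width, height] in [0, 1].

    Assumption: the custom format stores (x_min, y_min, x_max, y_max) in pixels.
    """
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    if x_max < x_min or y_max < y_min:
        raise ValueError("box corners are out of order")
    return [
        x_min / img_w,
        y_min / img_h,
        (x_max - x_min) / img_w,
        (y_max - y_min) / img_h,
    ]


def convert_annotations(annotations: List[Dict], img_w: int, img_h: int) -> List[Dict]:
    """Turn hypothetical custom records into label/bounding_box pairs.

    Each returned dict holds the two arguments the reply above passes to
    fo.Detection(label=..., bounding_box=...).
    """
    converted = []
    for ann in annotations:
        bbox = to_relative_bbox(*ann["box"], img_w=img_w, img_h=img_h)
        converted.append({"label": ann["label"], "bounding_box": bbox})
    return converted


# Hypothetical sample record for a 400x200 image
sample = [{"label": "card", "box": (100, 50, 300, 150)}]
detections = convert_annotations(sample, img_w=400, img_h=200)
print(detections)
```

With fiftyone installed, each resulting dict could then be passed along as `fo.Detection(**d)` inside the loop the reply describes; that step is left out here so the sketch runs without the library.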