[NeurIPS 2023] SNAP: Self-Supervised Neural Maps for Visual Positioning and Semantic Understanding

  • Published 12 Dec 2023
  • This is the video for our paper "SNAP: Self-Supervised Neural Maps for Visual Positioning and Semantic Understanding".
    This work was done at Google and is published at NeurIPS 2023.
    Paper: arxiv.org/pdf/2306.05407
    Code: github.com/google-research/snap
    Authors: Paul-Edouard Sarlin, Eduard Trulls, Marc Pollefeys, Jan Hosang, Simon Lynen
    Abstract: Semantic 2D maps are commonly used by humans and machines for navigation purposes, whether it's walking or driving. However, these maps have limitations: they lack detail, often contain inaccuracies, and are difficult to create and maintain, especially in an automated fashion. Can we use raw imagery to automatically create better maps that can be easily interpreted by both humans and machines? We introduce SNAP, a deep network that learns rich neural 2D maps from ground-level and overhead images. We train our model to align neural maps estimated from different inputs, supervised only with camera poses over tens of millions of StreetView images. SNAP can resolve the location of challenging image queries beyond the reach of traditional methods, outperforming the state of the art in localization by a large margin. Moreover, our neural maps encode not only geometry and appearance but also high-level semantics, discovered without explicit supervision. This enables effective pre-training for data-efficient semantic scene understanding, with the potential to unlock cost-efficient creation of more detailed maps.
  • Science & Technology
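
The abstract above describes the core training signal: neural maps estimated from different inputs (e.g. aerial and StreetView imagery) are registered into a common top-down frame using the known camera poses and trained to agree. Below is a minimal, hypothetical PyTorch sketch of one way to express such an alignment objective, as an InfoNCE-style contrastive loss between corresponding map cells; the function name, shapes, and temperature are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def alignment_loss(map_a, map_b, temperature=0.07):
    """Contrastive alignment between two neural maps of the same scene.

    map_a, map_b: (C, H, W) feature grids already registered into a shared
    top-down frame via known camera poses. Hypothetical shapes and
    temperature; a sketch of the idea, not the released code.
    """
    C, H, W = map_a.shape
    a = F.normalize(map_a.reshape(C, -1), dim=0)  # (C, H*W), unit features
    b = F.normalize(map_b.reshape(C, -1), dim=0)
    # Each cell of map_a should match its own location in map_b more than
    # any other location (InfoNCE over the H*W grid cells).
    logits = (a.t() @ b) / temperature            # (H*W, H*W) similarities
    target = torch.arange(H * W, device=logits.device)
    return F.cross_entropy(logits, target)
```

In the actual system the two maps might come from, say, aerial imagery and ground-level StreetView images of the same tile; supervising only this agreement is what lets semantics emerge without labels.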

COMMENTS • 8

  • @Unique-Concepts 5 months ago +2

    Fantastic work... Love it 👏👏👏🙏🙏👌👌👌👍👍👍

  • @mlachahesaidsalimo9958 2 months ago

    Your work is incredible! Thank you for sharing. I really like the dynamism and playfulness of the presentation. Which software did you use to make the video presentation? Thank you in advance for your reply.

    • @pesarlin 2 months ago

      Thank you! I used only PowerPoint :)

  • @RicanSamurai 6 months ago +3

    Fascinating! A very interesting and novel approach to this problem. At 6:39, it appears as though you have ~12 map images that cover the area of interest (of which you highlight four), and then you are able to successfully get a position prediction from a query image. Do you have a sense of how densely that area needs to be covered by your map images before SNAP beats other models? Similarly, is there a map image density at which you see diminishing returns?
    I'm just curious how many training images are necessary to cover a given region before SNAP's predictions become useful. For that same region in your example, would 50 map images of the region make a meaningful difference to the prediction?
    Thanks!

    • @pesarlin 6 months ago +2

      We use a rig with 3 cameras, so we actually have 36 images in these examples (each triangle is a camera pose). We have an ablation study in Table 1 of the paper: aerial-only is a bit worse than semantic maps, while StreetView-only is a bit worse than aerial+StreetView. So aerial-only can already get you quite far, but having some coverage of ground-level images is important. During training we actually map with fewer images (20 instead of 36), so the model is pretty robust to sparse views, but indeed more is better. I don't have numbers at hand, but I'd guess that performance is already quite saturated at 36 views (about 0.6 views per meter), unless there is strong occlusion (e.g. from trucks) in most views.
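
A minimal sketch (an assumption for illustration, not the released training code) of the view subsampling mentioned in this reply: at training time, each scene is mapped from a random subset of its available images so the model learns to cope with sparse coverage.

```python
import random

def sample_mapping_views(views, num_views=20, seed=None):
    """Randomly keep `num_views` of the available images (e.g. 20 of 36).

    `views` is any sequence of per-image data; the default of 20 follows
    the reply above, but the helper itself is hypothetical.
    """
    rng = random.Random(seed)
    return rng.sample(list(views), min(num_views, len(views)))
```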

  • @user-rs4mf6ju8d 5 months ago

    Very impressive work!
    Question: Can I generate a neural map for localization from a bird's-eye view only? Let's say using images from a downward-looking camera on a flight from Brussels to Amsterdam.

  • @anywallsocket 5 months ago

    How do you choose validation data areas within training data areas?

    • @pesarlin 5 months ago +2

      We randomly sampled a fixed number of S2 cells in each training city.
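
A minimal sketch (hypothetical, not the authors' pipeline) of such a split using the s2sphere library: cover each training city with S2 cells at a fixed level, then randomly reserve a fixed number of them for validation.

```python
import random
import s2sphere

def cell_id_at(lat, lng, level=14):
    """S2 cell id containing a lat/lng at a fixed level.

    Level 14 is an assumption for illustration; the reply above does not
    state which level was used.
    """
    ll = s2sphere.LatLng.from_degrees(lat, lng)
    return s2sphere.CellId.from_lat_lng(ll).parent(level).id()

def split_city_cells(city_cells, num_val_cells, seed=0):
    """Randomly hold out `num_val_cells` S2 cells of one city for validation."""
    rng = random.Random(seed)
    cells = sorted(set(city_cells))  # deterministic order before sampling
    val = set(rng.sample(cells, num_val_cells))
    train = [c for c in cells if c not in val]
    return train, sorted(val)
```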