
Visual place recognition

From Wikipedia, the free encyclopedia


Visual Place Recognition (VPR) is a content-based image retrieval task in which, given a database of images and a query image, the goal is to return the database image that is closest in geographic location to the query image.[1] The task primarily concerns real-world images of outdoor urban locations, but it can also be applied to indoor environments. The modern approach to VPR is to train machine learning algorithms that extract features encoding the geographic information of an image.[2] VPR is used chiefly in robotics and self-driving applications for localization, mapping, and planning.

[Image: A visualization of the modern approach to the visual place recognition task.]

Problem definition

The VPR task is most commonly framed as a content-based image retrieval task in which a query image must be matched to an image in a database.[1] Queries are matched to database images based on whether they depict the same "place." The term "place" has been defined differently across the field: some experts define a "place" by the location of the camera, regardless of its orientation, while others argue that images containing overlapping elements should constitute a "place" match.[3] Places can also vary in size depending on the use case of the VPR solution. A match is considered successful based on ground-truth metrics associated with the images, which can include GPS location, camera pose, or human labelling.[2] For GPS location, a match is successful if the query image lies within a specified radius of the database image. Camera pose matches are determined using relative pose error. Human labelling is treated as a classification task, and a match is successful if the label of the query image matches the ground-truth label.
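As a concrete illustration of the GPS-based criterion, the sketch below checks whether a retrieved database image lies within a fixed radius of the query. It is a minimal example, not drawn from the cited literature: it assumes each image carries a (latitude, longitude) tag, uses the haversine formula for great-circle distance, and the 25 m radius is an illustrative threshold rather than a fixed standard.

import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def is_correct_match(query_gps, retrieved_gps, radius_m=25.0):
    """A retrieval counts as correct if the retrieved image was taken
    within radius_m meters of the query's GPS position."""
    return haversine_m(*query_gps, *retrieved_gps) <= radius_m

# Two points roughly 20 m apart: a correct match at a 25 m radius.
print(is_correct_match((35.6595, 139.7005), (35.65968, 139.7005)))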


History


The concept of "place recognition" has its roots in psychology and neuroscience. Early 20th-century research into navigation and wayfinding explored how animals recognize their surroundings and orient themselves.[4] Studies in rats identified place cells that activated when the animals visited a known environment and that updated in response to new visual information.[5] This prompted studies of human navigation, which investigated how landmarks, spatial memory, and relative distance affected models of place recognition.[6] These works introduced the idea of "features" of the environment as characteristics that could define a location, and proposed that these features could be learned in order to recognize the location.[7] Most experiments involved human participants navigating an area and subsequently being asked to recall the location of a specific place in the environment. While mostly unrelated to the image retrieval task, this research laid the groundwork for place recognition as a concept in navigation.

Place recognition began emerging as a computer vision task in the 1990s. It was introduced in the context of robot navigation and localization as a means of building maps of an environment.[8][9] Visual place recognition then developed explicitly as an image retrieval task, used to recognize whether a robot had already seen a location while building a map. The problem was initially addressed using image signatures, an early form of image feature based on handcrafted pixel computations, to describe and compare images.[10] In the early 2000s, advances in image feature extraction using algorithms such as PCA, SIFT, and SURF improved visual place recognition results.[11][12] This marked the point at which visual place recognition began to be investigated as a task in its own right, outside the scope of robotic mapping and localization.[13]

The advent of neural networks as feature extractors changed the common approach to VPR.[2] Research began to focus on training deep learning networks to perform feature extraction in place of the earlier handcrafted algorithms. Originally used for image classification, convolutional neural networks (CNNs) offered a more powerful method of feature extraction that generalizes to other tasks, including place recognition. These CNN approaches outperformed older techniques and became the standard for the VPR task.[14][15] Transformer models have more recently been applied to VPR and have proved promising both for feature extraction and for re-ranking matched images.[16][17]


Architectures


Modern VPR solutions are deep neural networks that consist of three main components: a feature extractor, a feature aggregator, and a match ranking method.[2] VPR is commonly performed using local image features of different sections of the image, extracted with a deep learning architecture such as a CNN or transformer. A feature aggregator condenses these local features into a single vector representation. Handcrafted feature aggregators such as VLAD[18] were previously considered state-of-the-art, but have since been replaced by learned neural network aggregators such as NetVLAD.[19] This vector representation is then used to compare the query image to the images in the database via a similarity search based on a metric such as Euclidean distance or cosine similarity. The results are ranked by vector similarity and then re-ranked using methods such as spatial verification. Research into the VPR task usually focuses on upgrading the feature extractor,[20] improving aggregator clustering,[21] or refining the labelling of database images during training.[22] Other advancements focus on the re-ranking module[17] or attempt to remove the re-ranking process entirely.[23]
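The retrieval stage can be summarized in a few lines of code. The following sketch is illustrative only: it assumes a feature extractor and aggregator have already produced one global descriptor per image (random 256-dimensional vectors stand in for real descriptors), L2-normalizes the vectors so that a dot product equals cosine similarity, and returns a ranked candidate list that a re-ranking stage such as spatial verification would then refine.

import numpy as np

# Hypothetical setup: each row of `database` is an aggregated global
# descriptor (e.g., the output of a NetVLAD-style layer) for one image;
# random vectors stand in for real extracted features.
rng = np.random.default_rng(0)
database = rng.standard_normal((1000, 256)).astype(np.float32)
query = rng.standard_normal(256).astype(np.float32)

# L2-normalize so that a dot product equals cosine similarity.
database /= np.linalg.norm(database, axis=1, keepdims=True)
query /= np.linalg.norm(query)

# Similarity search: score every database descriptor against the query.
scores = database @ query

# Rank by similarity; the top candidates would then be passed to a
# re-ranking stage such as spatial verification.
top_k = np.argsort(scores)[::-1][:10]
print(top_k, scores[top_k])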

Applications

Summarize
Perspective

VPR has primarily been used in robotics applications for localization and mapping during navigation.[1] It is used in SLAM algorithms, in conjunction with topological or metric maps, to determine whether a robot has already seen an area during exploration or navigation. This allows the robot to build a map of the environment from visual information alone, without additional sensors such as LiDAR or GPS; VPR can also be combined with such sensors for more robust localization. VPR models have been deployed on a variety of autonomous agents, including ground vehicles, aerial vehicles,[24] and underwater robots.[25] Computational limitations when deploying on physical robots have made efficiency a focus of modern VPR research.
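In a SLAM loop-closure setting, the question is not which database image ranks highest but whether the current view matches any previously seen place at all. The sketch below is a simplified, hypothetical decision rule using a cosine-similarity threshold over stored keyframe descriptors; real systems such as ORB-SLAM2 instead use bag-of-words scores combined with geometric consistency checks.

import numpy as np

def detect_loop_closure(current_desc, keyframe_descs, threshold=0.8):
    """Return the index of a matching past keyframe, or None.

    Assumes current_desc (shape (d,)) and the rows of keyframe_descs
    (shape (n, d)) are L2-normalized global descriptors, so a dot
    product is a cosine similarity; the 0.8 threshold is illustrative.
    """
    if len(keyframe_descs) == 0:
        return None
    scores = keyframe_descs @ current_desc
    best = int(np.argmax(scores))
    return best if scores[best] >= threshold else None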

[Image: A visualization of ORB-SLAM2, a SLAM technique that utilizes visual place recognition. ORB-SLAM2 computes a 2D point cloud, shown on the right, and uses the Bag of Words VPR technique for loop closure.]

Outside of robotics, VPR has been studied by Akihiko Torii et al. using mobile phone images of city scenes.[26] Torii used Google Street View panoramas to train a VPR model, which was then evaluated on a dataset of phone camera images taken across Tokyo under varying lighting and scene changes. Torii notes potential uses of VPR in searching for images of a specific location for architectural or urban planning studies, or in modelling an area's change over time. In the related domain of city identity recognition, a classification task similar to VPR, a 2026 study examined potential sources of bias in geotagged images such as those from Google Street View.[27] The study found that reproducibility is difficult for city recognition because cities in the same country resemble one another, camera quality and image conditions vary by country, and some cameras provide better features for the task than others. It advocates careful data sampling when using geotagged images so that these inherent biases can be accounted for.


References
