Correspondence problem
From Wikipedia, the free encyclopedia
The correspondence problem, as the basis for calculating optical flow and stereo matching, is a fundamental problem in image processing.[1] It refers to the problem in computer vision of ascertaining which parts of one image correspond to which parts of another image,[2] where differences are due to movement of the camera, the passage of time, and/or movement of objects in the scene. It is related to image registration, which is about finding a geometric transformation that aligns corresponding points on top of each other.
Correspondence is arguably the key building block in many related applications: optical flow (in which the two images are subsequent in time), dense stereo vision (in which two images are from a stereo camera pair), structure from motion (SfM) and visual SLAM (in which images are from different but partially overlapping views of a scene), and cross-scene correspondence (in which images are from different scenes entirely).
A simple method for finding correspondences is PatchMatch. Modern correspondence algorithms use neural networks to find correspondences quickly and with high accuracy. The influential computer vision researcher Takeo Kanade famously once said that the three fundamental problems of computer vision are: "Correspondence, correspondence, and correspondence!"[3] With the advent of deep learning, some researchers consider the problem largely solved for many practical settings.
Basics
Given two or more images of the same 3D scene, taken from different points of view, the correspondence problem refers to the task of finding a set of points in one image which can be identified as the same points in another image. To do this, points or features in one image are matched with the corresponding points or features in another image, thus establishing corresponding points or corresponding features, also known as homologous points or homologous features. The images can be taken from different points of view, at different times, or with objects in the scene in general motion relative to the camera(s).

In stereo vision, the result is usually a disparity map, in which a displacement vector to the corresponding pixel of the other image is determined for each pixel of one image. For this purpose, a unique correspondence between the points of the individual images must be established. Since the assignment of pixels can be highly ambiguous and is not always possible, the correspondence problem is an "ill-posed" problem according to Hadamard's definition.[4] Furthermore, the solution of the correspondence problem is made more difficult by perspective distortions, noise, and differences in illumination and contrast between the images.
The correspondence problem can occur in a stereo situation when two images of the same scene are used, or can be generalised to the N-view correspondence problem. In the latter case, the images may come from either N different cameras photographing at the same time or from one camera which is moving relative to the scene. The problem is made more difficult when the objects in the scene are in motion relative to the camera(s).
A typical application of the correspondence problem occurs in panorama creation or image stitching — when two or more images which only have a small overlap are to be stitched into a larger composite image. In this case it is necessary to be able to identify a set of corresponding points in a pair of images in order to calculate the transformation of one image to stitch it onto the other image.
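For a planar scene or a purely rotating camera, the transformation between two views is a homography, which four point correspondences determine. The sketch below implements the standard direct linear transformation (DLT); the helper name `homography_dlt` and the corner correspondences are illustrative, and a production stitcher would additionally normalize coordinates and run RANSAC over many noisy matches:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src from >= 4 point pairs (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # Stacked equations A @ h = 0: the solution is the right singular vector
    # belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Four corner correspondences: a unit square moved by a pure translation (+2, +1).
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(2, 1), (3, 1), (3, 2), (2, 2)]
H = homography_dlt(src, dst)
```

For this translation the recovered H is the identity with the translation in its last column, confirming the construction on an exactly solvable case.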
Occlusions

One of the most significant sources of error in stereoscopic correspondence determination is the presence of scene regions that are visible from only one camera perspective. The image areas into which these regions project have no corresponding elements in the other stereo image; such areas are called occlusions. If occlusions are not adequately accounted for during correspondence determination, false matches arise, more or less pronounced depending on the approach, resulting in inaccurate depth reconstruction. Occlusions therefore represent a serious problem in stereo image processing.[6]
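A common way to detect occlusions after matching is a left-right consistency check: a left-to-right match is trusted only if matching right-to-left maps back to (nearly) the same pixel. A minimal 1-D sketch, where the function name, tolerance, and toy disparity values are illustrative:

```python
import numpy as np

def left_right_consistency(disp_left, disp_right, tol=1):
    """Flag pixels whose left->right match does not map back (likely occlusions).

    disp_left[x]  : disparity of left-image pixel x (its match sits at x - disp_left[x])
    disp_right[x] : disparity of right-image pixel x
    """
    occluded = np.zeros_like(disp_left, dtype=bool)
    for x, d in enumerate(disp_left):
        xr = x - d                       # matched position in the right image
        if xr < 0 or xr >= len(disp_right):
            occluded[x] = True           # match falls outside the other image
        elif abs(disp_right[xr] - d) > tol:
            occluded[x] = True           # the right image disagrees about this match
    return occluded

occ = left_right_consistency(np.array([0, 1, 4, 1]), np.array([0, 1, 1, 1]))
```

Here the third pixel's disparity has no consistent counterpart in the right image and is flagged, while the others pass the check.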
Aperture problem

In a stereo geometry with parallel optical axes, the displacement of corresponding image points in a stereo image pair is always parallel to the stereo base. With precise knowledge of the camera geometry, the direction of the disparity can be determined in advance, thus significantly simplifying the search for corresponding image points. However, image regions where no structures or intensity changes occur in the direction of the stereo base pose a problem. In this case, a displacement of corresponding image points cannot be detected. Since the detection of the displacement in stereoscopy is usually performed by a local operator, with the rest of the scene being ignored, this problem is also considered a special case of the so-called aperture problem, which is of particular importance in motion analysis (optical flow).[7]
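The problem can be shown directly on a toy example: along a structureless (constant) scanline every candidate disparity produces the same matching cost, so the displacement is undetectable, while a textured scanline yields a unique cost minimum. The `ssd_costs` helper and the hand-picked values are illustrative:

```python
import numpy as np

def ssd_costs(left_row, right_row, x, half, max_disp):
    """SSD matching cost of the window around left_row[x] for each candidate disparity."""
    patch = left_row[x - half : x + half + 1]
    costs = []
    for d in range(max_disp + 1):
        cand = right_row[x - d - half : x - d + half + 1]
        costs.append(float(np.sum((patch - cand) ** 2)))
    return costs

textured = np.array([0, 0, 9, 1, 7, 2, 8, 0, 0, 0], dtype=float)
uniform = np.full(10, 5.0)

# Textured scanline, with the right image shifted by a true disparity of 2:
peaked = ssd_costs(textured, np.roll(textured, -2), x=6, half=1, max_disp=3)
# Structureless scanline: every candidate disparity fits equally well.
flat = ssd_costs(uniform, uniform, x=6, half=1, max_disp=3)
```

The textured costs have their unique minimum at disparity 2; the uniform costs are identically zero, so no disparity can be singled out.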
Constraints in stereo image processing
Due to its specific nature, the correspondence problem, like many other ill-posed problems, can only be solved uniquely by exploiting appropriate prior knowledge. With the help of this prior knowledge, the solution space is appropriately reduced, and the problem is transformed into a "well-posed" one.[4] The constraints on the solution space relate, on the one hand, to the imaging process and the geometry of the cameras used (epipolar and uniqueness constraints) and, on the other hand, to postulated properties of the observed scene (continuity, ordering, and gradient constraints).
Algorithms
The assignment of corresponding image elements in digital image processing can be achieved using various algorithms and mathematical methods. These methods differ, sometimes considerably, in their susceptibility to errors and the computational effort required.
Area based
In area-based methods, individual image areas of the stereo images are assigned to one another based on the gray values in the local neighborhood of a pixel. The correspondence of the image areas is usually determined by calculating a similarity measure, such as a local cross-correlation. In the simplest case, the disparity is the shift between the image regions of the left and right images that exhibits the best degree of correspondence. Some of these methods use an interest operator to first select areas with specific properties from each image; the correspondence search is then restricted to these areas.[8]
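The procedure above can be sketched for a single scanline pair, here using zero-mean normalized cross-correlation as the similarity measure and picking the best-scoring disparity per pixel (window size, search range, and test values are illustrative):

```python
import numpy as np

def block_match_row(left, right, half=1, max_disp=4):
    """Winner-take-all disparity for one scanline pair, scored by zero-mean
    normalized cross-correlation (ZNCC)."""
    n = len(left)
    disp = np.zeros(n, dtype=int)
    for x in range(half + max_disp, n - half):
        patch = left[x - half : x + half + 1]
        a = patch - patch.mean()
        best_score, best_d = -np.inf, 0
        for d in range(max_disp + 1):
            cand = right[x - d - half : x - d + half + 1]
            b = cand - cand.mean()
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            score = float(a @ b) / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_d = score, d
        disp[x] = best_d
    return disp

left = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8], dtype=float)
right = np.roll(left, -3)      # right view: the whole scanline shifted by disparity 3
disp = block_match_row(left, right)
```

On this textured scanline the true disparity of 3 is recovered at every pixel inside the valid search range; a constant scanline would score 0 for every candidate (the aperture problem again).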
Feature based
Most existing stereoscopic approaches can be classified as feature-based methods. These techniques first extract features from the image data that describe the image at a more abstract level; a correspondence analysis is then performed at the feature level. Commonly used features are edges, corners, and edge or line segments. Differentiating filters, which extract gray-value variations such as edges or lines from the image signals, play a dominant role in these methods.[9]
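Once features and descriptors have been extracted, correspondence at the feature level often reduces to nearest-neighbor descriptor matching. A minimal sketch with toy 2-D descriptors, using Lowe's ratio test to discard ambiguous matches (the descriptor values and the 0.8 ratio are illustrative):

```python
import numpy as np

def match_features(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbour descriptor matching with Lowe's ratio test: keep a
    match only if the best candidate is clearly better than the second best."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

# Toy descriptors; the last row of desc_b is deliberately similar to desc_b[2],
# so desc_a[2] has no unambiguous nearest neighbour and is dropped.
desc_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
desc_b = np.array([[0.0, 1.1], [1.0, 0.1], [0.5, 0.5], [0.55, 0.45]])
matches = match_features(desc_a, desc_b)
```

The ratio test is one simple way of suppressing the ambiguities discussed below; only the two clearly distinctive descriptors survive.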
Phase based
The basis of so-called phase-based methods for measuring disparity is the shift theorem of the Fourier transform. In stereo image processing, however, a purely global shift between the images generally does not exist, since objects at different distances from the camera system exhibit different disparity values in the stereo images. Consequently, the shifts of corresponding image areas in the stereo image pair must be determined using local operators, so that phase correlation is generally only meaningful in conjunction with a Fourier transform limited to smaller image areas. The most important of the phase-based methods are the so-called phase difference methods. With these techniques, the phase information is derived from the responses of complex filter pairs used to filter the input images. A key requirement for this method is that the phase of the filter responses is approximately a linear function of position. This property can be achieved if the filter transfer function has no DC offset and vanishes for negative frequencies; it is called quadrature behavior.[10] Since the phase information is invariant with respect to the amplitude of the filter responses, phase-based methods are also relatively robust to interocular illumination and contrast differences. However, due to the ambiguity of the phase calculation, only disparity values up to half the modulation wavelength of the filter used can be measured. Like many other approaches, phase-based methods are also very sensitive to occlusions.[6]
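A minimal 1-D sketch of the phase-difference idea: both signals are filtered with a complex Gabor kernel, whose real and imaginary parts form the quadrature pair, and the displacement is recovered from the phase difference of the two responses. The signal, filter wavelength, and shift are toy values; note that the shift (3 px) stays below half the filter wavelength (8 px), as required:

```python
import numpy as np

n = 256
x = np.arange(n)
true_shift = 3.0                     # "disparity" between the two signals
freq = 2 * np.pi / 16                # filter centre frequency (wavelength 16 px)

# Band-limited test signal and its shifted copy (the other stereo image).
signal_l = np.cos(freq * x) + 0.3 * np.cos(1.3 * freq * x)
signal_r = np.cos(freq * (x - true_shift)) + 0.3 * np.cos(1.3 * freq * (x - true_shift))

# Complex Gabor kernel: cosine (real) and sine (imaginary) parts are a
# quadrature pair, so the response phase is approximately linear in position.
t = np.arange(-24, 25)
gabor = np.exp(-t**2 / (2 * 8.0**2)) * np.exp(1j * freq * t)

resp_l = np.convolve(signal_l, gabor, mode="same")
resp_r = np.convolve(signal_r, gabor, mode="same")

# Phase difference at the signal centre, converted into a displacement.
dphi = np.angle(resp_l[n // 2] * np.conj(resp_r[n // 2]))
disparity = dphi / freq
```

The recovered disparity is close to 3; the residual error comes from the second signal component, for which the filter phase is only approximately linear.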
Eliminating the ambiguities
According to the uniqueness constraint, each pixel may be assigned only one disparity and thus at most one location in the considered space (this excludes semi-transparent surfaces). However, ambiguities cannot be ruled out by any local or feature-based method: the most similar areas in the stereo images do not necessarily belong together. Different methods are used to resolve such ambiguities, depending on the approach.

In so-called regularization methods, cost or energy functions are formulated that incorporate the constraints (see basics), and the global minimum of these functions is then sought.

Another approach is represented by relaxation methods. In most stereoscopic approaches that apply this technique, features or image regions with special properties are first extracted from the image data. So-called nodes are then assigned to the image coordinates where these elements occur. Each node is further equipped with a set of variables, each representing a correspondence between that node and a different element in the other image. Depending on the approach, these variables are interpreted as probabilities[1] or as neuronal activity[4] (neural networks). At the beginning of the actual relaxation process, the variables are initialized based on the similarity of the corresponding features or pixel values. Subsequently, the variable values are updated iteratively in a dynamic process: violations of the constraints have an inhibitory or reducing effect, while adherence to the constraints has a reinforcing effect. An unambiguous disparity map is obtained when a steady state is reached. With appropriate coupling, false matches caused by occlusions can also be suppressed in this way.[5]
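The effect of enforcing the uniqueness constraint globally, rather than matching each feature greedily, can be shown on a toy cost matrix. Here the global optimum is found by brute force over all one-to-one assignments; a regularization method would minimize a comparable energy, and the cost values are illustrative:

```python
import numpy as np
from itertools import permutations

# Matching costs between 3 left features (rows) and 3 right features (columns).
# Greedy matching pairs left feature 0 with right feature 0 (cost 0.1) and
# leaves left feature 1 with only poor options.
cost = np.array([
    [0.10, 0.20, 0.90],
    [0.15, 0.80, 0.90],
    [0.90, 0.90, 0.10],
])

# Uniqueness constraint: each right feature may be used at most once.
# Minimizing the *total* cost over all one-to-one assignments resolves the
# ambiguity globally (brute force is fine at this toy size).
n = cost.shape[0]
best = min(permutations(range(n)),
           key=lambda p: sum(cost[i, p[i]] for i in range(n)))
```

The global optimum assigns left feature 0 to right feature 1 and left feature 1 to right feature 0 (total cost 0.45), whereas the greedy choice would force a total cost of 1.0.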
Use
In computer vision the correspondence problem is studied for the case when a computer should solve it automatically with only images as input. Once the correspondence problem has been solved, resulting in a set of image points which are in correspondence, other methods can be applied to this set to reconstruct the position, motion and/or rotation of the corresponding 3D points in the scene.
The correspondence problem is also the basis of the particle image velocimetry measurement technique, which is nowadays widely used in the fluid mechanics field to quantitatively measure fluid motion.
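At the core of a PIV evaluation is a cross-correlation of interrogation windows from two successive frames; the location of the correlation peak gives the mean particle displacement. A minimal sketch using FFT-based circular cross-correlation on a synthetic "particle" image (window size and displacement are toy values):

```python
import numpy as np

rng = np.random.default_rng(0)
frame1 = rng.random((32, 32))                   # synthetic particle pattern
frame2 = np.roll(frame1, (3, 5), axis=(0, 1))   # same pattern, displaced by (3, 5)

# Circular cross-correlation via the FFT (correlation theorem); the peak
# location is the mean displacement within the interrogation window.
corr = np.fft.ifft2(np.fft.fft2(frame1).conj() * np.fft.fft2(frame2)).real
peak = np.unravel_index(np.argmax(corr), corr.shape)
displacement = tuple(int(v) for v in peak)
```

Real PIV software additionally fits the peak to sub-pixel accuracy and repeats this per window to obtain a velocity field.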
See also
References
External links