One of the oldest robust methods used in image analysis and computer vision is the Hough transform. The idea is to map data into the parameter space, which is appropriately quantized, and then seek for the most likely parameter values to interpret data through clustering. A classical example is the detection of straight lines given a set of edge points. In the following, we take the example of estimating plane rigid motion from two sets of points.
Given p 2D points in the first set, noted , and q 2D points in the second set, noted , we must find a rigid transformation between the two sets. The pairing between and is assumed not known. A rigid transformation can be uniquely decomposed into a rotation around the origin and a translation, in that order. The corresponding parameter space is three-dimensional: one parameter for the rotation angle and two for the translation vector . More precisely, if is paired to , then
with
It is clear that at least two pairings are necessary for a unique estimate of rigid transformation. The three-dimensional parameter space is quantized as many levels as necessary according to the required precision. The rotation angle ranges from 0 to . We can fix the quantization interval for the rotation angle, say, at , and we have units. The translation is not bounded, but it is in practice. We can assume for example that the translation between two images cannot exceed 200 pixels in each direction ( ). If we choose an interval of 20 pixels, then we have units in each direction. The quantized space can then considered as a three-dimensional accumulator, of cells, that is initialized to zero.
Since one pairing of points does not entirely constrain the motion, it is difficult to update the accumulator because the constraint on motion is not simple. Instead, we can match n-tuples of points in the first set and in the second set, where n is the smallest value such that matching n points in the first set with n points in the second set completely determines the motion (in our case, n=2). Let be one of such matches, where and are both vectors of dimension 2n, each composed of n points in the first and second set, respectively. Then the number of matches to be considered is of order of (of course, we do not need to consider all matches in our particular problem, because the distance invariance between a pair of points under rigid transformation can be used to discard the infeasible matches). For each such match, we compute the motion parameters, and the corresponding accumulator cell is increased by 1. After all matches have been considered, peaks in the accumulator indicate the best candidates for the motion parameters.
In general, if the number of data is not larger enough than the number of unknowns, then the maximum peak is not much higher than other peaks, and it may be not the correct one because of data noise and because of parameter space quantization. The Hough transform is thus highly suitable for problems having enough data to support the expected solution.
Because of noise in the measurements, the right peak in the accumulator may be very blurred so that it is not easily detected. The accuracy in the localization with the above simple implementation may be poor. There are several ways to improve it.
The Hough transform technique actually follows the principle of maximum likelihood estimation. Let be the parameter vector ( in the above example). Let be one datum ( in the above example). Under the assumption that the data represent the complete sample of the probability density function of , , we have
by using the law of total probability. The maximum of is considered as the estimation of . The Hough transform described above can thus be considered as one approximation.
Because of its nature of global search, the Hough transform technique is rather robust, even when there is a high percentage of gross errors in the data. For better accuracy in the localization of solution, we can increase the number of samples in each dimension of the quantized parameter space. The size of the accumulator increases rapidly with the required accuracy and the number of unknowns. This technique is rarely applied to solve problems having more than three unknowns, and is not suitable for conic fitting.