One of the oldest robust methods used in image analysis and computer vision is the Hough transform. The idea is to map data into the parameter space, which is appropriately quantized, and then seek for the most likely parameter values to interpret data through clustering. A classical example is the detection of straight lines given a set of edge points. In the following, we take the example of estimating plane rigid motion from two sets of points.
Given p 2D points in the first set, noted , and q 2D
points in the second set, noted , we must find a rigid
transformation between the two sets. The pairing between
and
is assumed not known. A rigid transformation can be
uniquely decomposed into a rotation around the origin and a translation, in
that order. The corresponding parameter space is three-dimensional: one
parameter for the rotation angle
and two for the translation vector
. More precisely, if
is paired to
, then
with
It is clear that at least two pairings are necessary for a unique estimate
of rigid transformation. The three-dimensional parameter space is
quantized as many levels as necessary according to the required
precision. The rotation angle ranges from 0 to
.
We can fix the quantization interval for the rotation angle, say, at
,
and we have
units. The translation is not bounded,
but it is in practice. We can assume for example that the translation
between two images cannot exceed 200 pixels in each direction (
). If we choose an interval of 20 pixels,
then we have
units in each direction. The
quantized space can then considered as a three-dimensional
accumulator, of
cells, that
is initialized to zero.
Since one pairing of points does not entirely constrain the motion, it
is difficult to update the accumulator because the constraint on
motion is not simple. Instead, we can match n-tuples of points in
the first set and in the second set, where n is
the smallest value such that matching n points in the first set with
n points in the second set completely determines the motion (in our
case, n=2). Let be one of such matches,
where
and
are both vectors of dimension 2n,
each composed of n points in the first and second set, respectively.
Then the number of matches to be considered is of
order of
(of course, we do not need to consider
all matches in our particular problem, because the distance
invariance between a pair of points under rigid transformation can be
used to discard the infeasible matches). For each such match, we
compute the motion parameters, and the corresponding accumulator cell
is increased by 1. After all matches have been considered, peaks in
the accumulator indicate the best candidates for the motion
parameters.
In general, if the number of data is not larger enough than the number of unknowns, then the maximum peak is not much higher than other peaks, and it may be not the correct one because of data noise and because of parameter space quantization. The Hough transform is thus highly suitable for problems having enough data to support the expected solution.
Because of noise in the measurements, the right peak in the accumulator may be very blurred so that it is not easily detected. The accuracy in the localization with the above simple implementation may be poor. There are several ways to improve it.
The Hough transform technique actually follows the principle of
maximum likelihood estimation. Let be the parameter
vector (
in the above example). Let
be one datum (
in the above example). Under the assumption that the data
represent the complete sample of the probability
density function of
,
, we have
by using the law of total probability. The maximum of is considered as the estimation of
. The Hough
transform described above can thus be considered as one approximation.
Because of its nature of global search, the Hough transform technique is rather robust, even when there is a high percentage of gross errors in the data. For better accuracy in the localization of solution, we can increase the number of samples in each dimension of the quantized parameter space. The size of the accumulator increases rapidly with the required accuracy and the number of unknowns. This technique is rarely applied to solve problems having more than three unknowns, and is not suitable for conic fitting.