The least-squares method described in the previous sections is usually called the ordinary least-squares (OLS) estimator. Formally, we are given $n$ equations:
\[ f_i \equiv \mathbf{a}_i^T \mathbf{p} - b_i = \varepsilon_i , \qquad i = 1, \dots, n , \]
where $\varepsilon_i$ is the additive error in the $i$-th equation with mean zero, $E(\varepsilon_i) = 0$, and variance $\sigma_{\varepsilon_i}^2$. Writing this down in matrix form yields
\[ \mathbf{A} \mathbf{p} = \mathbf{b} + \boldsymbol{\varepsilon} , \]
where $\mathbf{A} = [\mathbf{a}_1, \dots, \mathbf{a}_n]^T$, $\mathbf{b} = [b_1, \dots, b_n]^T$, and $\boldsymbol{\varepsilon} = [\varepsilon_1, \dots, \varepsilon_n]^T$.
The OLS estimator tries to estimate $\mathbf{p}$ by minimizing the following sum of squared errors:
\[ \mathcal{F} = \boldsymbol{\varepsilon}^T \boldsymbol{\varepsilon} = (\mathbf{A}\mathbf{p} - \mathbf{b})^T (\mathbf{A}\mathbf{p} - \mathbf{b}) , \]
which gives, as we have already seen, the solution
\[ \mathbf{p} = (\mathbf{A}^T \mathbf{A})^{-1} \mathbf{A}^T \mathbf{b} . \]
It can be shown (see, e.g., [2]) that the OLS estimator produces the optimal estimate of $\mathbf{p}$, ``optimal'' in terms of minimum covariance of $\mathbf{p}$, if the errors are uncorrelated (i.e., $E(\varepsilon_i \varepsilon_j) = 0$ for $i \neq j$) and their variances are constant (i.e., $\sigma_{\varepsilon_i}^2 = \sigma^2$ for all $i$).
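As a concrete illustration, the following is a minimal numerical sketch of the OLS estimate for conic fitting. The construction of the design matrix from the conic equation under the normalization $A + C = 1$ (so that $\mathbf{p} = [A, B, D, E, F]^T$ and $C = 1 - A$) is an assumption made here for illustration, and the function and variable names are not from the text.

\begin{verbatim}
import numpy as np

def ols_conic_fit(points):
    """Ordinary least-squares conic fit (sketch).

    Assumes the normalization A + C = 1, so p = [A, B, D, E, F] and
    each point (x, y) gives one linear equation
        A*(x^2 - y^2) + B*x*y + D*x + E*y + F = -y^2.
    """
    x, y = points[:, 0], points[:, 1]
    M = np.column_stack([x**2 - y**2, x * y, x, y, np.ones_like(x)])
    b = -y**2
    # p = (M^T M)^{-1} M^T b, computed with a numerically stable solver
    p, *_ = np.linalg.lstsq(M, b, rcond=None)
    return p  # [A, B, D, E, F]; C = 1 - A
\end{verbatim}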
Now let us examine whether the above assumptions are valid for conic fitting. Data points are provided by some signal processing algorithm such as edge detection. It is reasonable to assume that the errors are independent from one point to another, because when detecting a point we usually do not use any information from the other points. It is also reasonable to assume that the error variances are the same for all points, because we use the same algorithm for edge detection. However, we must note that these are the errors in the points, not those in the equations (i.e., the $\varepsilon_i$). Let the error in a point $\mathbf{x}_i = [x_i, y_i]^T$ be Gaussian with mean zero and covariance $\mathbf{\Lambda}_{\mathbf{x}_i} = \sigma^2 \mathbf{I}_2$, where $\mathbf{I}_2$ is the $2 \times 2$ identity matrix. That is, the error distribution is assumed to be isotropic in both directions ($\sigma_x = \sigma_y = \sigma$). In sec:Kalman, we will consider the case where each point may have a different noise distribution. In what follows, $f$ refers to the conic equation under the normalization of eq:A+C=1. We now compute the variance of the function $f_i$ from the point $[x_i, y_i]^T$ and its uncertainty. Let $[\hat{x}_i, \hat{y}_i]^T$ be the true position of the point; then certainly
\[ f(\hat{x}_i, \hat{y}_i) = 0 . \]
We now expand $f(x_i, y_i)$ into a Taylor series at $x = \hat{x}_i$ and $y = \hat{y}_i$, i.e.,
\[ f(x_i, y_i) = f(\hat{x}_i, \hat{y}_i) + \frac{\partial f}{\partial x} (x_i - \hat{x}_i) + \frac{\partial f}{\partial y} (y_i - \hat{y}_i) + \cdots . \]
Ignoring the higher-order terms, we can now compute the variance of $f_i$, i.e.,
\[ \sigma_{f_i}^2 = E\big(f(x_i, y_i)^2\big) = \left( \Big(\frac{\partial f}{\partial x}\Big)^2 + \Big(\frac{\partial f}{\partial y}\Big)^2 \right) \sigma^2 = \| \nabla f_i \|^2 \sigma^2 , \]
where $\nabla f_i = \big[ \partial f / \partial x , \ \partial f / \partial y \big]^T$ is just the gradient of $f$ with respect to $x_i$ and $y_i$, and
\[ \frac{\partial f}{\partial x} = 2 A x_i + B y_i + D , \qquad \frac{\partial f}{\partial y} = B x_i + 2 C y_i + E . \]
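For concreteness, here is a small sketch of how the gradient and the per-equation variance above could be evaluated numerically; the parameter ordering $[A, B, C, D, E, F]$ and the function names are assumptions for illustration.

\begin{verbatim}
import numpy as np

def conic_gradient(x, y, conic):
    """Gradient of f(x, y) = A x^2 + B x y + C y^2 + D x + E y + F
    with respect to the point coordinates (x, y)."""
    A, B, C, D, E, F = conic
    dfdx = 2 * A * x + B * y + D
    dfdy = B * x + 2 * C * y + E
    return dfdx, dfdy

def equation_variance(x, y, conic, sigma):
    """First-order variance of f_i for isotropic point noise of
    standard deviation sigma: ||grad f_i||^2 * sigma^2."""
    dfdx, dfdy = conic_gradient(x, y, conic)
    return (dfdx**2 + dfdy**2) * sigma**2
\end{verbatim}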
It is now clear that the variance of each equation is not the same, and thus the OLS estimator does not yield an optimal solution.
In order to obtain a function with constant variance, it is sufficient to divide the original function by the norm of its gradient, i.e.,
\[ f_i' = \frac{f_i}{\| \nabla f_i \|} ; \]
then $f_i'$ has the constant variance $\sigma^2$. We can now try to find the parameters $\mathbf{p}$ by minimizing the following function:
\[ \mathcal{F} = \sum_i f_i'^2 = \sum_i w_i f_i^2 , \]
where $w_i = 1 / \| \nabla f_i \|^2$. This method is thus called Gradient Weighted Least-Squares, and the solution can be easily obtained by setting $\partial \mathcal{F} / \partial \mathbf{p} = 0$, which yields
\[ \mathbf{p} = (\mathbf{A}^T \mathbf{W} \mathbf{A})^{-1} \mathbf{A}^T \mathbf{W} \mathbf{b} , \]
where $\mathbf{W} = \operatorname{diag}(w_1, \dots, w_n)$.
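A one-step weighted solve under this approximation could be sketched as follows, with the diagonal matrix $\mathbf{W}$ represented by a weight vector and the weights held fixed during the solve; the names are illustrative.

\begin{verbatim}
import numpy as np

def weighted_ls_step(M, b, w):
    """One gradient-weighted LS step: p = (M^T W M)^{-1} M^T W b,
    with W = diag(w) treated as constant during the solve."""
    Mw = M * w[:, None]               # scale each row of M by its weight
    return np.linalg.solve(M.T @ Mw, M.T @ (w * b))
\end{verbatim}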
Note that gradient-weighted LS is in general a nonlinear minimization problem, and a closed-form solution does not exist. In the above, we obtained a closed-form solution because we ignored the dependence of $\nabla f_i$ on $\mathbf{p}$ in computing $\partial \mathcal{F} / \partial \mathbf{p}$. In reality, $\nabla f_i$ does depend on $\mathbf{p}$, as can be seen in eq:derive. Therefore, the above solution is only an approximation. In practice, we run the following iterative procedure:

step 1: Set $k = 0$ and compute an initial estimate $\mathbf{p}^{(0)}$ using the ordinary least-squares solution.
step 2: Compute the weights $w_i^{(k)} = 1 / \| \nabla f_i^{(k)} \|^2$, where $\nabla f_i^{(k)}$ is evaluated using $\mathbf{p}^{(k)}$.
step 3: Compute $\mathbf{p}^{(k+1)} = (\mathbf{A}^T \mathbf{W}^{(k)} \mathbf{A})^{-1} \mathbf{A}^T \mathbf{W}^{(k)} \mathbf{b}$ with $\mathbf{W}^{(k)} = \operatorname{diag}(w_1^{(k)}, \dots, w_n^{(k)})$.
step 4: If $\mathbf{p}^{(k+1)}$ is close to $\mathbf{p}^{(k)}$, stop; otherwise set $k \leftarrow k + 1$ and go to step 2.

In the above, the superscript $(k)$ denotes the iteration number.
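The iterative procedure above could be sketched as follows, reusing the conic parameterization assumed earlier; the convergence tolerance, the maximum number of iterations, and the names are assumptions for illustration.

\begin{verbatim}
import numpy as np

def gradient_weighted_conic_fit(points, tol=1e-8, max_iter=20):
    """Iterative gradient-weighted least-squares conic fit (sketch),
    assuming the normalization A + C = 1 as in the text."""
    x, y = points[:, 0], points[:, 1]
    M = np.column_stack([x**2 - y**2, x * y, x, y, np.ones_like(x)])
    b = -y**2

    # step 1: initial estimate p^(0) from ordinary least-squares
    p, *_ = np.linalg.lstsq(M, b, rcond=None)
    for k in range(max_iter):
        A_, B_, D_, E_ = p[0], p[1], p[2], p[3]
        C_ = 1.0 - A_
        # step 2: weights w_i = 1 / ||grad f_i||^2 from the current p^(k)
        dfdx = 2 * A_ * x + B_ * y + D_
        dfdy = B_ * x + 2 * C_ * y + E_
        w = 1.0 / (dfdx**2 + dfdy**2)
        # step 3: weighted least-squares solve with the weights held fixed
        Mw = M * w[:, None]
        p_new = np.linalg.solve(M.T @ Mw, M.T @ (w * b))
        # step 4: stop when the parameters no longer change significantly
        if np.linalg.norm(p_new - p) < tol:
            return p_new
        p = p_new
    return p  # [A, B, D, E, F]; C = 1 - A
\end{verbatim}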