One popular robust technique is the so-called M-estimator. Let $r_i$ be the residual of the $i$-th datum, i.e., the difference between the $i$-th observation and its fitted value. The standard least-squares method tries to minimize $\sum_i r_i^2$, which is unstable if there are outliers present in the data. Outlying data give an effect so strong in the minimization that the parameters thus estimated are distorted. M-estimators try to reduce the effect of outliers by replacing the squared residuals $r_i^2$ by another function of the residuals, yielding
\[
\min \sum_i \rho(r_i), \tag{28}
\]
where $\rho$ is a symmetric, positive-definite function with a unique minimum at zero, and is chosen to be less increasing than the square. Instead of solving this problem directly, we can implement it as an iterated reweighted least-squares one. Let us now see how.
Let $\mathbf{p} = [p_1, \dots, p_m]^T$ be the parameter vector to be estimated. The M-estimator of $\mathbf{p}$ based on the function $\rho(r_i)$ is the vector $\mathbf{p}$ which is the solution of the following $m$ equations:
\[
\sum_i \psi(r_i) \frac{\partial r_i}{\partial p_j} = 0, \qquad \text{for } j = 1, \dots, m, \tag{29}
\]
where the derivative $\psi(x) = d\rho(x)/dx$ is called the influence function. If now we define a weight function
\[
w(x) = \frac{\psi(x)}{x}, \tag{30}
\]
then Equation (29) becomes
\[
\sum_i w(r_i)\, r_i \frac{\partial r_i}{\partial p_j} = 0, \qquad \text{for } j = 1, \dots, m. \tag{31}
\]
This is exactly the system of equations that we obtain if we solve the following iterated reweighted least-squares problem:
\[
\min \sum_i w\!\left(r_i^{(k-1)}\right) r_i^2, \tag{32}
\]
where the superscript ${}^{(k)}$ indicates the iteration number. The weight $w(r_i^{(k-1)})$ should be recomputed after each iteration in order to be used in the next iteration.
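To make the scheme concrete, here is a minimal sketch of iteration (32) for a linear model $y \approx X\mathbf{p}$ with Huber's weight function (see Table 1 below). The function names, the MAD-based scale estimate, and the default tuning constant $k = 1.345$ (the standard 95%-efficiency value for Huber's function) are illustrative choices of ours, not prescribed by the text.
\begin{verbatim}
import numpy as np

def huber_weight(x, k=1.345):
    # Huber weight w(x) = psi(x)/x: 1 for |x| <= k, k/|x| beyond.
    a = np.abs(x)
    return np.where(a <= k, 1.0, k / np.maximum(a, 1e-12))

def irls(X, y, weight_fn=huber_weight, n_iter=50, tol=1e-8, p_init=None):
    # Iterated reweighted least squares for y ~ X p, following the
    # scheme of Equation (32): the weights w(r_i^(k-1)) are recomputed
    # from the residuals of the previous iteration.
    p = np.linalg.lstsq(X, y, rcond=None)[0] if p_init is None else p_init
    for _ in range(n_iter):
        r = y - X @ p                                     # residuals r_i
        s = 1.4826 * np.median(np.abs(r - np.median(r)))  # robust (MAD) scale
        s = s if s > 0 else 1.0
        w = weight_fn(r / s)                              # w(r_i^(k-1))
        XW = X.T * w                                      # X^T W, W = diag(w)
        p_new = np.linalg.solve(XW @ X, XW @ y)           # weighted normal eqs.
        if np.linalg.norm(p_new - p) <= tol * (1.0 + np.linalg.norm(p)):
            return p_new
        p = p_new
    return p
\end{verbatim}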
The influence function $\psi(x)$ measures the influence of a datum on the value of the parameter estimate. For example, for least-squares with $\rho(x) = x^2/2$, the influence function is $\psi(x) = x$; that is, the influence of a datum on the estimate increases linearly with the size of its error, which confirms the non-robustness of the least-squares estimate. When an estimator is robust, it may be inferred that the influence of any single observation (datum) is insufficient to yield any significant offset [18]. There are several constraints that a robust M-estimator should meet:
\begin{itemize}
\item The first is of course to have a bounded influence function.
\item The second is the requirement that the robust estimator be unique. This implies that the objective function to be minimized should have a unique minimum, which requires that the individual $\rho$-function be convex.
\end{itemize}
Table 1: A few commonly used M-estimators
\[
\begin{array}{llll}
\text{type} & \rho(x) & \psi(x) & w(x) \\
\hline
L_2 & x^2/2 & x & 1 \\
L_1 & |x| & \mathrm{sgn}(x) & 1/|x| \\
L_1\text{--}L_2 & 2\,(\sqrt{1+x^2/2}-1) & x/\sqrt{1+x^2/2} & 1/\sqrt{1+x^2/2} \\
L_p & |x|^{\nu}/\nu & \mathrm{sgn}(x)\,|x|^{\nu-1} & |x|^{\nu-2} \\
\text{Fair} & c^2\!\left[\tfrac{|x|}{c}-\log\!\big(1+\tfrac{|x|}{c}\big)\right] & \tfrac{x}{1+|x|/c} & \tfrac{1}{1+|x|/c} \\
\text{Huber, } |x|\le k & x^2/2 & x & 1 \\
\text{Huber, } |x|> k & k\,(|x|-k/2) & k\,\mathrm{sgn}(x) & k/|x| \\
\text{Cauchy} & \tfrac{c^2}{2}\log\!\big(1+(x/c)^2\big) & \tfrac{x}{1+(x/c)^2} & \tfrac{1}{1+(x/c)^2} \\
\text{Geman-McClure} & \tfrac{x^2/2}{1+x^2} & \tfrac{x}{(1+x^2)^2} & \tfrac{1}{(1+x^2)^2} \\
\text{Welsch} & \tfrac{c^2}{2}\!\left[1-\exp\!\big(\!-(x/c)^2\big)\right] & x\exp\!\big(\!-(x/c)^2\big) & \exp\!\big(\!-(x/c)^2\big) \\
\text{Tukey, } |x|\le c & \tfrac{c^2}{6}\big(1-[1-(x/c)^2]^3\big) & x\,[1-(x/c)^2]^2 & [1-(x/c)^2]^2 \\
\text{Tukey, } |x|> c & c^2/6 & 0 & 0
\end{array}
\]
Figure 4: Graphic representations of a few common M-estimators
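As an illustration, the weight functions $w(x)$ of a few entries of Table 1 can be transcribed directly. This is a sketch under the assumption of the standard textbook forms of these estimators; the tuning constant $c$ is left to the caller.
\begin{verbatim}
import numpy as np

def w_fair(x, c):
    # Fair: w(x) = 1 / (1 + |x|/c)
    return 1.0 / (1.0 + np.abs(x) / c)

def w_cauchy(x, c):
    # Cauchy: w(x) = 1 / (1 + (x/c)^2)
    return 1.0 / (1.0 + (x / c) ** 2)

def w_welsch(x, c):
    # Welsch: w(x) = exp(-(x/c)^2)
    return np.exp(-(x / c) ** 2)

def w_tukey(x, c):
    # Tukey's biweight: w(x) = (1 - (x/c)^2)^2 inside the cut-off,
    # and exactly zero beyond it.
    u = np.clip(np.abs(x) / c, 0.0, 1.0)
    return (1.0 - u ** 2) ** 2
\end{verbatim}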
Briefly we give a few indications of these functions:
\begin{itemize}
\item $L_2$ (i.e., least-squares) estimators are not robust because their influence function is not bounded.
\item $L_1$ (i.e., absolute-value) estimators reduce the influence of large errors, but they are not stable because the $\rho$-function $|x|$ is not strictly convex: its second derivative at $x = 0$ is unbounded, and an indeterminate solution may result.
\item $L_1$--$L_2$ estimators take both the advantage of the $L_1$ estimators to reduce the influence of large errors and that of the $L_2$ estimators to be convex.
\item The $L_p$ (least-powers) functions represent a family parameterized by $\nu$: they reduce to $L_2$ for $\nu = 2$ and to $L_1$ for $\nu = 1$; the smaller $\nu$, the smaller the incidence of large errors in the estimate.
\item The ``Fair'' function has continuous derivatives of the first three orders and yields a unique solution.
\item Huber's function [7] is a parabola in the vicinity of zero and increases linearly beyond a given level $|x| > k$; its second derivative, however, is discontinuous, which may occasionally cause computational difficulties.
\end{itemize}
The modification proposed in [18] is the following:
\[
\rho(x) =
\begin{cases}
c^2\left[1 - \cos(x/c)\right] & \text{if } |x|/c \le \pi/2, \\
c\,|x| + c^2\,(1 - \pi/2) & \text{if } |x|/c > \pi/2.
\end{cases}
\]
The 95% asymptotic efficiency on the standard normal distribution is obtained with the tuning constant $c = 1.2107$.
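A direct transcription of this modified function follows as a sketch; the weight form $w(x) = \psi(x)/x$, with $\psi(x) = c\sin(x/c)$ on the trigonometric part and $c\,\mathrm{sgn}(x)$ on the linear part, is derived here by us from the stated $\rho$.
\begin{verbatim}
import numpy as np

def rho_modified_huber(x, c=1.2107):
    # rho(x) = c^2 (1 - cos(x/c))     if |x|/c <= pi/2
    #        = c|x| + c^2 (1 - pi/2)  otherwise
    a = np.abs(x)
    return np.where(a / c <= np.pi / 2,
                    c * c * (1.0 - np.cos(x / c)),
                    c * a + c * c * (1.0 - np.pi / 2))

def w_modified_huber(x, c=1.2107):
    # w(x) = psi(x)/x; note w(0) = 1 since c sin(x/c)/x -> 1 as x -> 0.
    a = np.abs(x)
    safe = np.where(a > 0, x, 1.0)            # avoid 0/0 at x = 0
    w = np.where(a / c <= np.pi / 2,
                 c * np.sin(safe / c) / safe,
                 c / np.maximum(a, 1e-12))
    return np.where(a > 0, w, 1.0)
\end{verbatim}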
There still exist many other $\psi$-functions, such as Andrews' cosine wave function. Another commonly used function is the following tri-weight one:
\[
w_i =
\begin{cases}
1 & |r_i| \le \sigma, \\
\sigma / |r_i| & \sigma < |r_i| \le 3\sigma, \\
0 & 3\sigma < |r_i|,
\end{cases}
\]
where $\sigma$ is some estimated standard deviation of the errors.
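In code (a sketch; in practice $\sigma$ would typically come from a robust scale estimate such as the median of absolute residuals):
\begin{verbatim}
import numpy as np

def w_triweight(r, sigma):
    # w = 1 for |r| <= sigma, sigma/|r| for sigma < |r| <= 3 sigma,
    # and 0 beyond 3 sigma (outliers are discarded entirely).
    a = np.abs(r)
    return np.where(a <= sigma, 1.0,
                    np.where(a <= 3.0 * sigma,
                             sigma / np.maximum(a, 1e-12), 0.0))
\end{verbatim}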
It seems difficult to select a $\rho$-function for general use without being rather arbitrary. Following Rey [18], for location (or regression) problems the best choice is the $L_p$ in spite of its theoretical non-robustness: it is quasi-robust. However, it suffers from computational difficulties. The second best function is ``Fair'', which can yield nicely converging computational procedures. Finally comes Huber's function (in either its original or modified form). None of these functions eliminates completely the influence of large gross errors.
The last four functions do not guarantee uniqueness, but they reduce considerably, or even eliminate completely, the influence of large gross errors. As proposed by Huber [7], one can start the iteration process with a convex $\rho$-function, iterate until convergence, and then apply a few iterations with one of these non-convex functions to eliminate the effect of large errors.
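Huber's suggestion can be sketched on top of the irls() routine given earlier, reusing huber_weight and w_tukey from the blocks above; X and y stand for the design matrix and observations of the problem at hand, and c = 4.685 is the standard 95%-efficiency tuning constant for Tukey's biweight (our choice, not prescribed by the text).
\begin{verbatim}
# Stage 1: iterate with a convex rho-function (Huber) until convergence.
p0 = irls(X, y, weight_fn=huber_weight)

# Stage 2: a few iterations with a non-convex function (Tukey's biweight)
# to eliminate the effect of large errors, starting from the convex fit.
p = irls(X, y, weight_fn=lambda x: w_tukey(x, c=4.685),
         n_iter=5, p_init=p0)
\end{verbatim}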