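Another old robust method is the so-called regression diagnostics. It iteratively detects possibly wrong data and rejects them through analysis of a globally fitted model. The classical approach works as follows:

1. Determine an initial fit to the whole set of data through least squares.
2. Compute the residual of each datum with respect to the current fit.
3. Reject all data whose residuals exceed a predetermined threshold; if no datum has been removed, stop.
4. Determine a new fit to the remaining data, and go to step 2.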
Clearly, the success of this method depends heavily on the quality of the initial fit. If the initial fit is very poor, the residuals computed from it are meaningless, and so is any outlier diagnosis based on them. As pointed out by Barnett and Lewis, with least-squares techniques even one or two outliers in a large data set can wreak havoc! This technique therefore does not guarantee a correct solution. However, experience has shown that it works well for problems with a moderate percentage of outliers and, more importantly, for outliers whose gross errors are smaller than the size of the good data.
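The classical loop can be sketched for the simplest possible model, a straight line $y = a + bx$ fitted by ordinary least squares. This is a minimal illustration under stated assumptions, not a reference implementation: the line model, the names `fit_line` and `classical_diagnostics`, the absolute-residual threshold, and the `max_iter` guard are all hypothetical choices made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def classical_diagnostics(xs, ys, threshold, max_iter=100):
    """Fit, reject points with |residual| > threshold, refit; repeat.

    Returns the final (a, b) and the indices of the retained points.
    threshold and max_iter are hypothetical, caller-chosen values.
    """
    keep = list(range(len(xs)))
    # Initial fit to the whole data set.
    a, b = fit_line(xs, ys)
    for _ in range(max_iter):
        # Residuals of the retained data with respect to the current fit;
        # keep only the points whose residual is within the threshold.
        survivors = [i for i in keep
                     if abs(ys[i] - (a + b * xs[i])) <= threshold]
        if len(survivors) == len(keep):
            break  # nothing was rejected, so stop
        if len(survivors) < 2:
            raise ValueError("too few points left to fit a line")
        keep = survivors
        # New fit to the remaining data only.
        a, b = fit_line([xs[i] for i in keep], [ys[i] for i in keep])
    return (a, b), keep


# Points on y = 1 + 2x, except index 3, which carries a gross error of +20.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1, 3, 5, 27, 9, 11, 13, 15]
(a, b), keep = classical_diagnostics(xs, ys, threshold=5.0)
print(keep)                      # [0, 1, 2, 4, 5, 6, 7]
print(round(a, 6), round(b, 6))  # 1.0 2.0
```

Here the first fit is pulled toward the outlier: its residual is about 17.4, while the good points' residuals stay below about 3.4 in magnitude, so a threshold of 5 isolates it in one pass. With a threshold of 3 the same call would also discard the good points at indices 0 and 1 (residuals of about -3.33 and -3.10), which shows how strongly the outcome depends on the initial fit.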
The threshold on residuals can be chosen from experience, for example using graphical methods (plotting residuals at different scales). A better approach is to use an a priori statistical noise model of the data and a chosen confidence level. Let $r_i$ be the residual of the $i$-th datum, and let $\sigma_i^2$ be the predicted variance of the $i$-th residual, based on the characteristics of the data noise and the fit. Then the standard test statistic

$$e_i = \frac{r_i}{\sigma_i}$$

can be used. If $e_i$ is not acceptable at the chosen confidence level (for instance, $|e_i| > 1.96$ for a two-sided test at 95% confidence under Gaussian noise), the corresponding datum should be rejected.
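For a straight line $y = a + bx$ fitted by ordinary least squares with independent noise of known standard deviation $\sigma$, a standard result gives the residual variance as $\sigma_i^2 = \sigma^2(1 - h_{ii})$, where $h_{ii} = 1/n + (x_i - \bar{x})^2 / \sum_j (x_j - \bar{x})^2$ is the leverage of point $i$. The sketch below uses this to compute $e_i = r_i/\sigma_i$ and flags the points that exceed the two-sided Gaussian critical value. The line model, the known $\sigma$, the Gaussian assumption, and the function names `standardized_residuals` and `flag_outliers` are assumptions of this illustration.

```python
from statistics import NormalDist


def standardized_residuals(xs, ys, sigma):
    """Return e_i = r_i / sigma_i for an OLS fit of y = a + b*x.

    Assumes i.i.d. noise with known standard deviation sigma, so that
    Var(r_i) = sigma**2 * (1 - h_ii), and that every leverage h_ii < 1.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    es = []
    for x, y in zip(xs, ys):
        r = y - (a + b * x)                 # residual r_i
        h = 1.0 / n + (x - mx) ** 2 / sxx   # leverage h_ii
        es.append(r / (sigma * (1.0 - h) ** 0.5))
    return es


def flag_outliers(xs, ys, sigma, confidence=0.95):
    """Indices whose |e_i| exceeds the two-sided Gaussian critical value."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    es = standardized_residuals(xs, ys, sigma)
    return [i for i, e in enumerate(es) if abs(e) > z]


xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1, 3, 5, 27, 9, 11, 13, 15]   # index 3 carries a gross error
print(flag_outliers(xs, ys, sigma=3.0))  # [3]
print(flag_outliers(xs, ys, sigma=1.0))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The second call illustrates the dependence on the initial fit noted earlier: with $\sigma = 1$, the distortion the outlier causes in the fit is large enough that every point, good ones included, fails the test.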
One improvement to the above technique uses influence measures to pinpoint potential outliers. These measures assess the extent to which a particular datum influences the fit by determining the change in the solution when that datum is omitted. The refined technique works as follows:
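One possible realization of the refined technique, again for a least-squares line $y = a + bx$, measures the influence of datum $i$ as the Euclidean change in the parameters $(a, b)$ when $i$ is left out and the line is refit; it then deletes the most influential datum and repeats. Because the acceptance test for the fit is a modelling choice, this sketch substitutes a simple, hypothetical one: stop once the root-mean-square residual falls to `tol` or below. The names `influence_diagnostics`, `rms_residual`, `tol`, and `min_points` are likewise illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def rms_residual(xs, ys, a, b):
    """Root-mean-square residual: a stand-in measure of fit quality."""
    return (sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def influence_diagnostics(xs, ys, tol, min_points=3):
    """Repeatedly delete the datum whose omission changes (a, b) the most.

    Stops when the RMS residual is <= tol (hypothetical acceptance test)
    or when only min_points data remain. Returns ((a, b), kept indices).
    """
    keep = list(range(len(xs)))
    while True:
        px = [xs[i] for i in keep]
        py = [ys[i] for i in keep]
        a, b = fit_line(px, py)
        if len(keep) <= min_points or rms_residual(px, py, a, b) <= tol:
            return (a, b), keep
        # Influence of each retained datum: size of the change in (a, b)
        # when that datum alone is omitted and the line is refit.
        best_i, best_change = None, -1.0
        for i in keep:
            rest = [j for j in keep if j != i]
            a_i, b_i = fit_line([xs[j] for j in rest], [ys[j] for j in rest])
            change = ((a - a_i) ** 2 + (b - b_i) ** 2) ** 0.5
            if change > best_change:
                best_i, best_change = i, change
        keep.remove(best_i)  # delete the most influential datum


xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1, 3, 5, 27, 9, 11, 13, 15]   # index 3 carries a gross error
(a, b), keep = influence_diagnostics(xs, ys, tol=1e-6)
print(keep)  # [0, 1, 2, 4, 5, 6, 7]
```

Each pass refits the line once per retained datum, so the cost per deletion grows roughly quadratically with the data size; closed-form leave-one-out updates exist for linear least squares but are omitted to keep the sketch short. Note also that a plain Euclidean norm on $(a, b)$ mixes parameters with different units, so in practice the change is often scaled.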
As can be seen, the regression diagnostics approach depends heavily on a priori knowledge when choosing the thresholds used for outlier rejection.