High leverage points are observations that have outlying values in the covariate space
A popular recent method, from Imon (2006), uses the distance from the mean (DM) diagnostic to identify these points
It may however suffer from masking and swamping effects, due to low leverage points
In logistic regression, most of the extreme points in the covariate pattern may have the smallest leverage values
Thus detecting high leverage points in logistic regression with the leverage-value method used in linear regression is unsuccessful
Cut-off point for any DM value $latex b_j $: $latex b_j \geq \mathrm{Median}(b_j) + c \cdot \mathrm{MAD}(b_j) $, with $latex \mathrm{MAD}(b_j) = \mathrm{Median}\{|b_j - \mathrm{Median}(b_j)|\}/0.6745 $ and $latex c $ a constant chosen as 2 or 3
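A minimal Python sketch of this cut-off rule may help make it concrete. The function names `mad_cutoff` and `flag_high_leverage` and the sample values in `dm` are hypothetical, chosen only for illustration; the DM values themselves are assumed to have been computed already.

```python
from statistics import median


def mad_cutoff(values, c=3.0):
    """Return Median(values) + c * MAD(values), with MAD scaled by 0.6745."""
    med = median(values)
    # Normalised median absolute deviation, as in the rule above.
    mad = median(abs(v - med) for v in values) / 0.6745
    return med + c * mad


def flag_high_leverage(values, c=3.0):
    """Return the indices whose value is at or above the cut-off."""
    cutoff = mad_cutoff(values, c)
    return [i for i, v in enumerate(values) if v >= cutoff]


# Hypothetical DM values: seven ordinary points and one extreme point.
dm = [0.8, 1.1, 0.9, 1.0, 1.2, 0.95, 1.05, 9.5]
flagged = flag_high_leverage(dm, c=3.0)
print(flagged)
```

With these made-up values only the last point exceeds the cut-off. Because the median and MAD are barely affected by a few extreme values, the threshold itself is not inflated by the points it is meant to detect.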
The proposed Robust Logistic Diagnostic (RLGD) combines the DM technique with the Diagnostic Robust Generalized Potentials of Habshah (2009)
The first stage identifies suspected high leverage points using a robust estimator, either the Minimum Covariance Determinant (MCD) or the Minimum Volume Ellipsoid (MVE) (Rousseeuw 1984); the second stage uses the diagnostic approach to confirm them
step 1 : For each ith point, compute the robust Mahalanobis distance (RMD) using either the MCD or the MVE estimator
step 2 : Points with $latex RMD_i > \mathrm{Median}(RMD_i) + c \cdot \mathrm{MAD}(RMD_i)$ are suspected high leverage points; two sets are created, one without the suspected high leverage points and one with only the suspected high leverage points
step 3 : Compute $latex b_i$ for the set with high leverage points
step 4 : Delete the points whose $latex b_i$ is greater than the cut-off value
The performance of the DM method and the RLGD method was compared using the detection capacity and the false alarm rate.
The RLGD method has a better detection probability and a false alarm rate up to 20%, better than the DM method.
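The two comparison criteria can be illustrated with a short Python sketch. The definitions used here are assumptions, since the notes do not spell them out: detection capacity is taken as the share of truly high leverage points that a method flags, and false alarm rate as the share of ordinary points that it flags wrongly. The function name and the index sets are hypothetical.

```python
def detection_and_false_alarm(flagged, true_high_leverage, n):
    """Return (detection capacity, false alarm rate) for one method.

    flagged            -- indices the method declared high leverage
    true_high_leverage -- indices that really are high leverage
    n                  -- total number of observations
    Both definitions are assumptions made for this illustration.
    """
    flagged = set(flagged)
    truth = set(true_high_leverage)
    hits = len(flagged & truth)           # correctly flagged points
    false_alarms = len(flagged - truth)   # ordinary points flagged in error
    n_ordinary = n - len(truth)
    detection = hits / len(truth) if truth else float("nan")
    false_alarm = false_alarms / n_ordinary if n_ordinary else float("nan")
    return detection, false_alarm


# Hypothetical example: 20 observations, points 17-19 are truly high leverage,
# and the method flags points 5, 18 and 19.
detection, false_alarm = detection_and_false_alarm({5, 18, 19}, {17, 18, 19}, n=20)
print(detection, false_alarm)
```

In this made-up case the method finds two of the three high leverage points and wrongly flags one of the seventeen ordinary points.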