Robust logistic diagnostic for the identification of high leverage points in logistic regression model

Original by B.A Syaiba, M. Habshah, University Putra Malaysia, 2010, 9 pages (3042-3050)

  • High leverage points are observations that have outlying values in the covariate space
  • A popular (recent) method, from Imon (2006), is to use the distance from the mean (DM) diagnostic to identify these points
  • It may, however, suffer from masking and swamping effects, due to low leverage points
  • In a logistic regression, most of the extreme points in the covariate pattern may have the smallest leverage values
  • Thus, detecting high leverage points in logistic regression with the leverage-value method from linear regression is unsuccessful
  • Cut off point for the DM values $latex b_j $: a point is flagged when $latex b_j \geq Median(b_j) + c \cdot MAD(b_j) $, with $latex MAD(b_j) = Median\{|b_j-Median(b_j)|\}/0.6745 $ and c a constant chosen as 2 or 3
  • The paper proposes the Robust Logistic Diagnostic (RLGD), which combines the DM technique with the Diagnostic Robust Generalized Potentials of Habshah (2009)
  • The first stage identifies suspected high leverage points with a robust estimator, either the Minimum Covariance Determinant (MCD) or the Minimum Volume Ellipsoid (MVE) (Rousseeuw 1984); the diagnostic approach is then used to confirm them
    • step 1 : For each ith point, compute the robust Mahalanobis distance (RMD) using either the MCD or the MVE estimator
    • step 2 : Points with $latex RMD_i>Median(RMD_i) + c \cdot MAD(RMD_i)$ are suspected high leverage points, and two sets are created: one without the suspected high leverage points and one with only the suspected high leverage points
    • step 3 : Compute $latex b_i$ for the set with high leverage points
    • step 4 : Delete the points whose $latex b_i$ is greater than the cut off value
  • The performance of the DM method and the RLGD method was compared using the detection capacity and the false alarm rate.
  • The RLGD method has a better detection probability than the DM method, and a better false alarm rate, up to 20%.
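
The Median + c.MAD cut-off used for the DM values, and again for the RMD values in step 2, can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the note, not the authors' implementation: the function names `mad_cutoff` and `flag_suspects` and the sample values are hypothetical, and the diagnostic values themselves (DM or RMD) are assumed to have been computed elsewhere. The sketch uses the strict comparison from step 2.

```python
from statistics import median


def mad_cutoff(values, c=3.0):
    """Return Median(values) + c * MAD(values).

    MAD is the median absolute deviation from the median,
    divided by 0.6745 as in the note.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values]) / 0.6745
    return med + c * mad


def flag_suspects(values, c=3.0):
    """Return the indices of values strictly above the cut-off."""
    cutoff = mad_cutoff(values, c)
    return [i for i, v in enumerate(values) if v > cutoff]


# Hypothetical diagnostic values (stand-ins for DM or RMD values).
diagnostics = [2, 3, 3, 4, 4, 5, 20]
suspects = flag_suspects(diagnostics, c=3)
print(suspects)  # [6]
```

Here the median is 4 and the scaled MAD is 1/0.6745, so with c = 3 the cut-off is roughly 8.45 and only the last value is flagged. Choosing c = 2 lowers the cut-off and flags more points as suspects.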