Credit scoring with macroeconomic variables using survival analysis

Original by Tony Bellotti and Jonathan Crook, Credit Research Centre, Management School and Economics, University of Edinburgh, 2007, 19 pages

Survival analysis is used to study the time to failure, which makes it possible to model not just whether a borrower will default but when.
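To make the "when" concrete, here is a minimal sketch in discrete (monthly) time. It is an illustration, not the paper's model, and it assumes a hypothetical sequence of monthly hazards, where each hazard is the probability of defaulting in that month given no earlier default. The survival probability after t months is the product of (1 - hazard) over the first t months, and the probability of defaulting in month t is the survival up to month t-1 times the hazard in month t.

```python
# Minimal discrete-time illustration (assumption: monthly hazards are given).
# hazards[k] = P(default in month k+1 | no default in months 1..k); values are hypothetical.

def survival_curve(hazards):
    """Return S(t) for t = 1..len(hazards): probability of no default by month t."""
    s = 1.0
    curve = []
    for h in hazards:
        s *= 1.0 - h  # the account survives month t only if it does not default in it
        curve.append(s)
    return curve


def default_timing(hazards):
    """Return P(default exactly in month t) = S(t-1) * h(t) for each month t."""
    s_prev = 1.0
    probs = []
    for h in hazards:
        probs.append(s_prev * h)
        s_prev *= 1.0 - h
    return probs


if __name__ == "__main__":
    h = [0.01, 0.02, 0.03]  # hypothetical monthly hazards
    print("survival:", survival_curve(h))
    print("default timing:", default_timing(h))
```

The timing probabilities plus the final survival probability sum to one, which is the sense in which a survival model describes both whether and when a default occurs.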

Variables

  • Macroeconomic variables used: bank interest rates, an unemployment index and house prices
  • Other (application) variables: income, age, housing status and employment status
  • The bank interest rate is the most significant variable for credit cards
  • Macroeconomic time series data can naturally be incorporated into a survival model as time-varying covariates (TVC)
  • The inclusion of macroeconomic variables gives a statistically significant explanatory model
  • A rise in consumer confidence is expected to increase risk, because consumers become more likely to spend and borrow, making repayment more difficult
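A time-varying covariate is usually handled by splitting each account into time intervals. The sketch below assumes monthly data and the counting-process (start, stop] layout that many Cox model implementations accept; the record fields (`open_month`, `months_observed`, `rate_by_month`, etc.) and the use of the bank interest rate as the example TVC are hypothetical choices for illustration. Each row carries the fixed application value and the macroeconomic value for the calendar month in which that interval ends.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical account record with only the fields needed for this example."""
    account_id: str
    open_month: int        # calendar month index in which the account was opened
    months_observed: int   # months on book until default or censoring
    defaulted: bool        # True if observation ends in default, False if censored
    income: float          # application variable, fixed over time


def to_tvc_rows(acct, rate_by_month):
    """Split one account into monthly (start, stop] rows with a time-varying rate."""
    rows = []
    for t in range(acct.months_observed):
        end_month = acct.open_month + t + 1  # calendar month at which (t, t+1] ends
        rows.append({
            "id": acct.account_id,
            "start": t,
            "stop": t + 1,
            # the event flag is set only on the last interval of a defaulted account
            "event": acct.defaulted and t == acct.months_observed - 1,
            "income": acct.income,                      # static covariate
            "interest_rate": rate_by_month[end_month],  # time-varying covariate
        })
    return rows


if __name__ == "__main__":
    rates = {1: 5.0, 2: 5.25, 3: 5.5}  # hypothetical monthly bank rates
    for row in to_tvc_rows(Account("A1", 0, 3, True, 20000.0), rates):
        print(row)
```

A censored account simply produces rows with no event, so recent applications still contribute to the risk sets for the months they were observed.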

Model

  • A Cox proportional hazards (PH) survival model is used to model the time of default of each case
  • Bad cases are given more weight, because the data sample contains a much larger proportion of good cases than bad ones
  • Censored data are cases whose default time is not observed: recent applications or accounts that have not yet defaulted by the end of the observation period
  • Survival data are analysed through the hazard function: the probability that an account defaults at time t given that it has not defaulted before t. (The probability that an account has not defaulted by time t after opening is the survival function)
  • Application data are fixed over time, while macroeconomic variables change over time. Each macroeconomic covariate takes the value of the macroeconomic variable at the time of failure
  • Because the training set was large, processing time was long, so model selection did not use forward or backward stepwise selection
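Both the extra weight on bad cases and the time-varying covariates enter through the Cox partial likelihood. The sketch below computes a weighted, Breslow-style partial log-likelihood for counting-process rows, each a dict with an account `id`, an interval (`start`, `stop`], an `event` flag and a covariate vector `x`. It assumes that case weights multiply each account's contribution and its term in the risk-set sum (the usual weighted Cox form); the note does not give the paper's exact weighting scheme, and `bad_weight` is a hypothetical parameter. Fitting would maximise this objective over `beta`, for example by Newton-Raphson; only the objective is shown.

```python
import math


def weighted_partial_loglik(rows, beta, bad_weight=1.0):
    """Weighted Cox partial log-likelihood (Breslow handling of ties).

    rows: dicts with 'id', 'start', 'stop', 'event' and 'x' (list of covariates,
          already evaluated for that interval, so time-varying covariates are allowed).
    bad_weight: hypothetical weight applied to every row of an account that defaults;
                all other rows get weight 1.
    """
    bad_ids = {r["id"] for r in rows if r["event"]}

    def weight(r):
        return bad_weight if r["id"] in bad_ids else 1.0

    def linear_predictor(r):
        return sum(b * x for b, x in zip(beta, r["x"]))

    ll = 0.0
    for ev in rows:
        if not ev["event"]:
            continue
        tau = ev["stop"]  # event time
        # Risk set: rows whose interval (start, stop] contains the event time,
        # so each at-risk account contributes its covariate value at tau.
        denom = sum(
            weight(r) * math.exp(linear_predictor(r))
            for r in rows
            if r["start"] < tau <= r["stop"]
        )
        ll += weight(ev) * (linear_predictor(ev) - math.log(denom))
    return ll


if __name__ == "__main__":
    example_rows = [
        {"id": "A", "start": 0, "stop": 1, "event": True, "x": [1.0]},
        {"id": "B", "start": 0, "stop": 1, "event": False, "x": [0.0]},
        {"id": "B", "start": 1, "stop": 2, "event": False, "x": [0.0]},
    ]
    for w in (1.0, 5.0):
        print(w, weighted_partial_loglik(example_rows, [0.5], bad_weight=w))
```

Evaluating this for every event over a large sample is what makes repeated fits expensive, which is consistent with the note's remark that stepwise selection was avoided.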

Performance measure

  • The importance of each variable is measured by its standardised marginal effect: the absolute value of the marginal effect multiplied by the standard deviation of the variable. This gives the approximate relative importance of the variables
  • A cost function (somewhat like the H measure) is used to assess the value of the predictions instead of the ROC curve: a bad case wrongly predicted as good costs 20, a good case wrongly predicted as bad costs 1, and correct predictions cost 0
  • The ROC curve can give misleading conclusions (Hand 2005)
  • A cut-off threshold is computed for each model to minimise the total cost of errors on the training set. The analysis is repeated with cut-offs calculated on the test sample for comparison
  • The mean cost per observation is computed on the test set for each model as the sum of the costs of errors over all cases in the test set divided by the number of cases. A low mean cost indicates good performance
  • Significance is assessed using the Wald statistic (p-values below 0.05 or 0.01)
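The evaluation steps can be sketched as follows, assuming each model outputs a score where higher means riskier and a case is predicted bad when its score is at or above the cut-off. The costs (20 for a bad case predicted good, 1 for a good case predicted bad, 0 otherwise) come from the note; the function names, the candidate cut-off grid, the use of the sample standard deviation, and treating the marginal effect as an already computed input are illustrative assumptions. The Wald helper uses the standard form W = (coefficient / standard error)^2 compared with a chi-square distribution with one degree of freedom.

```python
import math
import statistics

# Misclassification costs from the note: a missed bad costs 20, a rejected good costs 1.
COST_BAD_AS_GOOD = 20.0
COST_GOOD_AS_BAD = 1.0


def total_cost(scores, is_bad, cutoff):
    """Sum of misclassification costs when cases with score >= cutoff are predicted bad."""
    cost = 0.0
    for score, bad in zip(scores, is_bad):
        predicted_bad = score >= cutoff
        if bad and not predicted_bad:
            cost += COST_BAD_AS_GOOD
        elif not bad and predicted_bad:
            cost += COST_GOOD_AS_BAD
    return cost


def best_cutoff(scores, is_bad):
    """Cut-off that minimises total cost on the given (training) sample."""
    # Each observed score is a candidate; +inf means "predict every case good".
    candidates = sorted(set(scores)) + [math.inf]
    return min(candidates, key=lambda c: total_cost(scores, is_bad, c))


def mean_cost(scores, is_bad, cutoff):
    """Mean cost per observation: total cost divided by the number of cases."""
    return total_cost(scores, is_bad, cutoff) / len(scores)


def standardised_marginal_effect(marginal_effect, values):
    """|marginal effect| times the sample standard deviation of the variable (assumption)."""
    return abs(marginal_effect) * statistics.stdev(values)


def wald_p_value(coef, std_err):
    """p-value of the Wald test of coef = 0, with W = (coef / se)^2 ~ chi-square(1)."""
    w = (coef / std_err) ** 2
    return math.erfc(math.sqrt(w / 2.0))  # P(chi2_1 > W) = P(|Z| > sqrt(W))


if __name__ == "__main__":
    train_scores, train_bad = [0.1, 0.2, 0.3, 0.9], [False, False, True, True]
    cut = best_cutoff(train_scores, train_bad)
    print("cut-off:", cut)
    print("test mean cost:", mean_cost([0.25, 0.5], [True, False], cut))
```

Because a missed bad costs twenty times as much as a rejected good, the cost-minimising cut-off is usually pushed well below the one that would maximise accuracy.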

Forecast

  • In practice the model can be used for credit scoring by incorporating forecasts of macroeconomic conditions into the assessment of credit card applications over a 12-month period
  • Because the model takes a macroeconomic forecast as input, it can be used for stress testing by replacing the forecast with a stressed forecast
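A sketch of the forecasting and stress-testing idea, assuming a Cox PH model with one macroeconomic time-varying covariate (the bank interest rate), monthly baseline cumulative-hazard increments, and a fixed linear predictor from the application variables. All names and numbers are hypothetical. Under proportional hazards the probability of default over the horizon is 1 - exp(-H), where H is the sum of the baseline increments scaled by exp(linear predictor) along the forecast path, so a stress test only changes that path.

```python
import math


def pd_over_horizon(baseline_hazard, static_lp, beta_rate, rate_path):
    """Probability of default over the horizon under a Cox PH model with one TVC.

    baseline_hazard[t]: baseline cumulative-hazard increment for month t+1 (hypothetical)
    static_lp: linear predictor from the fixed application variables
    beta_rate: coefficient on the interest rate (hypothetical)
    rate_path[t]: forecast (or stressed) interest rate for month t+1
    """
    if len(baseline_hazard) != len(rate_path):
        raise ValueError("baseline_hazard and rate_path must cover the same months")
    cum_hazard = sum(
        h0 * math.exp(static_lp + beta_rate * rate)
        for h0, rate in zip(baseline_hazard, rate_path)
    )
    return 1.0 - math.exp(-cum_hazard)


if __name__ == "__main__":
    h0 = [0.002] * 12        # hypothetical monthly baseline increments, 12-month horizon
    forecast = [4.0] * 12    # hypothetical central forecast of the bank rate
    stressed = [6.0] * 12    # hypothetical stressed scenario
    print("PD (forecast):", pd_over_horizon(h0, 0.3, 0.15, forecast))
    print("PD (stressed):", pd_over_horizon(h0, 0.3, 0.15, stressed))
```

With a positive interest-rate coefficient, the stressed path raises the cumulative hazard in every month and therefore the 12-month probability of default.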