Survival analysis models are very similar to linear or logistic regression models, except that the dependent variable is a measure of the timing or rate of event occurrence
Most methods of survival analysis require that the event time be measured with respect to some origin time
Ideally the origin time is the same as the time at which observation begins, and most software programs assume that this is the case
You might need to take late entry (left truncation) into account when subjects come under observation only after the origin time
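A minimal sketch of handling late entry, assuming lifelines' CoxPHFitter and its entry_col argument; the dataset and column names are illustrative

```python
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "duration": [5.0, 8.0, 3.0, 10.0, 6.0, 4.0],   # time from origin to event/censoring
    "event":    [1, 0, 1, 1, 0, 1],                # 1 = event observed, 0 = right-censored
    "entry":    [0.0, 2.0, 0.0, 4.0, 1.0, 0.0],    # time each subject came under observation
    "x":        [0.3, 1.2, -0.5, 0.7, 0.1, -0.8],
})

cph = CoxPHFitter()
# entry_col tells lifelines each subject was only at risk after its entry time
cph.fit(df, duration_col="duration", event_col="event", entry_col="entry")
cph.print_summary()
```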
Censoring is endemic to survival data
Any report of survival analysis should discuss the type, cause and treatment of censoring
The most common type of censoring is right censoring, where observation is terminated before the individual experiences the event
Censoring can be informative if it occurs at varying times because individuals drop out of the study
A slightly less common type is interval censoring, where the exact event time is not known, only that it falls between two points in time
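A minimal sketch of fitting interval-censored data, assuming lifelines' WeibullFitter and its fit_interval_censoring method; the bounds below are illustrative

```python
import pandas as pd
from lifelines import WeibullFitter

# each event is only known to lie between a lower and an upper bound
lower = pd.Series([1.0, 2.0, 4.0, 3.0, 5.0, 2.0])   # last time known event-free
upper = pd.Series([2.0, 5.0, 6.0, 4.0, 9.0, 3.0])   # first time known to have had the event

wf = WeibullFitter()
wf.fit_interval_censoring(lower_bound=lower, upper_bound=upper)
wf.print_summary()
```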
If you know the exact time at which an event occurs, use methods that treat time as continuous
If not, use discrete-time methods (e.g., when you only know the month or the year of the event)
For discrete-time methods you must choose between a logit model and a complementary log-log model, but in practice the choice is usually not consequential
Logit is more appropriate for truly discrete events; the complementary log-log is more natural when events occur in continuous time but are only observed in intervals
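A minimal sketch of the logit vs complementary log-log choice on person-period data, assuming statsmodels' GLM (the link class is named CLogLog in recent statsmodels versions); the simulated data are illustrative

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
pp = pd.DataFrame({
    "period": rng.integers(1, 6, n),      # discrete time period of each row
    "x": rng.normal(size=n),              # a covariate
})
pp["event"] = rng.binomial(1, 0.15, n)    # 1 if the event occurred in that period

X = sm.add_constant(pp[["period", "x"]])
logit_fit = sm.GLM(pp["event"], X, family=sm.families.Binomial()).fit()
cloglog_fit = sm.GLM(pp["event"], X,
                     family=sm.families.Binomial(link=sm.families.links.CLogLog())).fit()

# in practice the two sets of coefficients usually tell the same story
print(logit_fit.params)
print(cloglog_fit.params)
```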
The most popular method for regression analysis of survival data is Cox regression
Cox regression is semiparametric: it leaves the baseline hazard function unspecified
However, parametric methods are much better at handling left censoring or interval censoring, and they can generate predicted times to events
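A minimal sketch contrasting the two approaches, assuming lifelines: a semiparametric Cox fit next to a parametric Weibull AFT fit that can produce predicted event times; the data are illustrative

```python
import pandas as pd
from lifelines import CoxPHFitter, WeibullAFTFitter

df = pd.DataFrame({
    "duration": [5.0, 8.0, 3.0, 10.0, 6.0, 2.0, 7.0, 4.0],
    "event":    [1, 0, 1, 1, 0, 1, 1, 1],
    "x":        [0.3, 1.2, -0.5, 0.7, 0.1, -1.0, 0.9, 0.4],
})

cph = CoxPHFitter().fit(df, duration_col="duration", event_col="event")
cph.print_summary()                 # semiparametric: hazard ratios, no predicted times

aft = WeibullAFTFitter().fit(df, duration_col="duration", event_col="event")
print(aft.predict_expectation(df))  # parametric: expected event time per subject
```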
One major difference between survival regression and conventional linear regression is the possibility of time-dependent covariates
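A minimal sketch of time-dependent covariates, assuming lifelines' CoxTimeVaryingFitter on long-format data (one row per interval over which a subject's covariates are constant); the data are illustrative

```python
import pandas as pd
from lifelines import CoxTimeVaryingFitter

long_df = pd.DataFrame({
    "id":    [1, 1, 2, 2, 3, 4],
    "start": [0, 3, 0, 2, 0, 0],
    "stop":  [3, 8, 2, 6, 5, 4],
    "event": [0, 1, 0, 1, 1, 0],     # event flag applies to the end of the interval
    "z":     [0.5, 1.2, 0.1, -0.4, 0.9, 0.3],   # covariate that changes over time
})

ctv = CoxTimeVaryingFitter()
ctv.fit(long_df, id_col="id", event_col="event", start_col="start", stop_col="stop")
ctv.print_summary()
```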
If the data contain information on more than one event per individual, special methods are needed to take advantage of the additional information
Repeated events provide more statistical power
But there is likely to be statistical dependence among those observations
There are four methods to correct for repeated events (a sketch of method 1 follows the software note below):
1) Robust standard errors (Huber-White or sandwich estimates)
2) Generalized estimating equations (GEE)
3) Random-effects (mixed) models
4) Fixed-effects methods
Stata will estimate random-effects models for Cox regression but SAS won't
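A minimal sketch of method 1 above, assuming lifelines' CoxPHFitter with its cluster_col argument, which requests Huber-White (sandwich) standard errors clustered on the individual; the spell data are illustrative

```python
import pandas as pd
from lifelines import CoxPHFitter

spells = pd.DataFrame({
    "id":       [1, 1, 2, 2, 3, 3, 4, 4],   # two spells per individual
    "duration": [4.0, 3.0, 6.0, 2.0, 5.0, 7.0, 1.0, 8.0],
    "event":    [1, 1, 1, 0, 1, 1, 1, 0],
    "x":        [0.2, 0.2, -0.4, -0.4, 1.1, 1.1, -0.9, -0.9],
})

cph = CoxPHFitter()
# cluster_col groups spells by individual and switches on robust (sandwich) SEs
cph.fit(spells, duration_col="duration", event_col="event", cluster_col="id")
cph.print_summary()
```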
If event times are discrete, maximum likelihood estimation requires that the models be estimated simultaneously using the generalized (multinomial) logit model (there is no equivalent for the complementary log-log)
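A minimal sketch of simultaneous estimation for multiple discrete-time event types, assuming statsmodels' MNLogit on person-period data where 0 codes no event and 1/2 code two competing event types; the simulated data are illustrative

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 600
X = sm.add_constant(rng.normal(size=(n, 2)))            # two covariates plus intercept
y = rng.choice([0, 1, 2], size=n, p=[0.8, 0.12, 0.08])  # 0 = no event, 1/2 = event types

fit = sm.MNLogit(y, X).fit(disp=False)
print(fit.params)   # one coefficient vector per event type, relative to no event
```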
Conventional wisdom has it that there should be at least 5 (some say 10) events for each parameter in the model in order for maximum likelihood estimates to have reasonably good properties
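A minimal arithmetic check of that rule of thumb; the event indicator is illustrative

```python
import pandas as pd

events = pd.Series([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])   # illustrative event indicator
n_events = int(events.sum())    # censored cases do not count toward the tally
n_params = 2                    # coefficients in the planned model
print(n_events / n_params, "events per parameter (want at least 5, some say 10)")
```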
Multiple imputation: impute values by random draws from the predictive distribution of the missing values. Generate several datasets (5 or more), each with slightly different imputed values, estimate the model on each, then combine the results into a single set of parameter estimates
For survival analysis, imputation should only be done on the predictor variables; cases with missing values on the dependent variable should simply be deleted
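A minimal sketch of that workflow, assuming statsmodels' MICEData for the draws, lifelines' CoxPHFitter for each fit, and Rubin's rules for pooling; the data and the choice of 5 imputations are illustrative

```python
import numpy as np
import pandas as pd
from statsmodels.imputation.mice import MICEData
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "duration": [5.0, 8.0, 3.0, 10.0, 6.0, 2.0, 7.0, 4.0],
    "event":    [1, 0, 1, 1, 0, 1, 1, 0],
    "x":        [0.3, np.nan, -0.5, 0.7, np.nan, -1.0, 0.9, 0.2],
})

imp = MICEData(df[["x", "duration"]])     # impute the predictor only
betas, variances = [], []
for _ in range(5):                        # 5 imputed datasets
    imp.update_all()                      # draw a fresh set of imputations
    completed = df.copy()
    completed["x"] = imp.data["x"].values
    fit = CoxPHFitter().fit(completed, duration_col="duration", event_col="event")
    betas.append(fit.params_["x"])
    variances.append(fit.standard_errors_["x"] ** 2)

# Rubin's rules: pool the estimates, combining within- and between-imputation variance
b = np.mean(betas)
total_var = np.mean(variances) + (1 + 1 / 5) * np.var(betas, ddof=1)
print(b, np.sqrt(total_var))
```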
Compare non-nested models with AIC or SBC (also known as BIC)
Preference is given to models with the lowest values of those statistics, although no p-values can be calculated
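A minimal sketch of an AIC comparison between two non-nested parametric models, assuming lifelines' Weibull and log-normal AFT fitters; the data are illustrative

```python
import pandas as pd
from lifelines import WeibullAFTFitter, LogNormalAFTFitter

df = pd.DataFrame({
    "duration": [5.0, 8.0, 3.0, 10.0, 6.0, 2.0, 7.0, 4.0],
    "event":    [1, 0, 1, 1, 0, 1, 1, 1],
    "x":        [0.3, 1.2, -0.5, 0.7, 0.1, -1.0, 0.9, 0.4],
})

weib = WeibullAFTFitter().fit(df, duration_col="duration", event_col="event")
lognorm = LogNormalAFTFitter().fit(df, duration_col="duration", event_col="event")

# lower AIC is preferred; no p-value attaches to the comparison
print("Weibull AIC:   ", weib.AIC_)
print("Log-normal AIC:", lognorm.AIC_)
```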
The magnitudes of the beta coefficients are difficult to interpret directly, so they are usually exponentiated into hazard ratios
Hazard ratios (always positive) can be confusing because a value of 1 means no effect
A more straightforward quantity is 100 × (HR − 1): the percentage change in the hazard for a one-unit increase in the predictor
The sampling distribution of a hazard ratio is asymmetric, so standard errors are not appropriate; report 95% confidence intervals instead
Other statistics to report include the chi-square test of the null hypothesis that all coefficients are zero
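A minimal sketch of reporting a hazard ratio as 100 × (HR − 1) with a 95% confidence interval, assuming lifelines' CoxPHFitter; the data are illustrative

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "duration": [5.0, 8.0, 3.0, 10.0, 6.0, 2.0, 7.0, 4.0],
    "event":    [1, 0, 1, 1, 0, 1, 1, 1],
    "x":        [0.3, 1.2, -0.5, 0.7, 0.1, -1.0, 0.9, 0.4],
})

cph = CoxPHFitter().fit(df, duration_col="duration", event_col="event")
hr = np.exp(cph.params_["x"])                        # hazard ratio for x
lo, hi = np.exp(cph.confidence_intervals_.loc["x"])  # 95% CI on the ratio scale
print(f"HR {hr:.2f} (95% CI {lo:.2f} to {hi:.2f}); "
      f"{100 * (hr - 1):.0f}% change in the hazard per unit increase in x")
```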