Whatever technique is applied, the approach of letting statistics decide which variables should be included in a model is popular among scientists. However, **there is hardly any statistical theory that justifies the use of these techniques.**

### The five myths

**The number of variables in a model should be reduced until there are 10 events per variable – No!**

**Only variables with proven univariable significance should be included – No!** -> Although univariable prefiltering is traceable and easy to do with standard software, it is best avoided entirely: it is neither a prerequisite for building multivariable models nor does it provide any benefit

**Insignificant effects should be eliminated from a model – No!** -> Not necessarily, as removing them may change the weights and interactions of the other variables

**The reported P-value quantifies the type I error of a variable being falsely selected – No!**

**Variable selection simplifies analysis – No!** -> Don't just let the data speak. Expert background knowledge, formalized for example by directed acyclic graphs, is usually a much better guide and far more robust
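The last myth is easy to demonstrate with a small simulation (a minimal sketch, not part of the original text; the sample sizes and the number of predictors are arbitrary choices). With 20 pure-noise predictors screened at a per-variable level of 0.05, the chance of "discovering" at least one effect is roughly 1 − 0.95²⁰ ≈ 64%, so the reported P-value of a selected variable says little about its selection-level type I error:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, alpha, n_sim = 100, 20, 0.05, 500

false_select = 0
for _ in range(n_sim):
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)  # outcome independent of every predictor
    # univariable screening: one correlation test per predictor
    pvals = [stats.pearsonr(X[:, j], y)[1] for j in range(p)]
    if min(pvals) < alpha:
        false_select += 1      # at least one noise variable "passes"

print(false_select / n_sim)    # close to 1 - 0.95**20, i.e. about 0.64
```

Each individual test holds its nominal 5% level; it is the selection step across many candidates that inflates the error.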

### To remember

- Variable selection should always be accompanied by sensitivity analyses
- For prognostic models, a good start is backward elimination with a selection criterion of 0.157 (which corresponds to AIC-based selection for a single parameter), without any preceding univariable prefiltering
- For etiologic models, Augmented Backward Elimination, preceded by a careful preselection based on assumptions about the causal roles of the variables, is a reasonable approach
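The backward-elimination recommendation above can be sketched for a linear model (an illustrative implementation, not from the original text; the function name and the simulated data are my own, and a real analysis would typically use a statistics package rather than hand-rolled least squares). At each step the predictor with the largest p-value is dropped, until every remaining p-value is at or below the 0.157 threshold:

```python
import numpy as np
from scipy import stats

def backward_eliminate(X, y, names, alpha=0.157):
    """Backward elimination for ordinary least squares:
    repeatedly drop the predictor with the largest p-value above alpha."""
    keep = list(range(X.shape[1]))
    n = len(y)
    while keep:
        Xk = np.column_stack([np.ones(n), X[:, keep]])   # intercept + kept columns
        beta, *_ = np.linalg.lstsq(Xk, y, rcond=None)
        resid = y - Xk @ beta
        df = n - Xk.shape[1]
        sigma2 = resid @ resid / df
        se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xk.T @ Xk)))
        pvals = 2 * stats.t.sf(np.abs(beta / se), df)     # two-sided t-tests
        worst = int(np.argmax(pvals[1:]))                 # ignore the intercept
        if pvals[1:][worst] <= alpha:
            break                                         # all survivors pass the criterion
        keep.pop(worst)
    return [names[j] for j in keep]

# toy data: only x0 carries signal, the other four predictors are noise
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = 2.0 * X[:, 0] + rng.standard_normal(200)
selected = backward_eliminate(X, y, [f"x{j}" for j in range(5)])
print(selected)
```

Note that a noise variable can still survive with probability roughly equal to the threshold, which is exactly why the recommendation pairs selection with sensitivity analyses.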