
We are not merely making the commonplace observation that any particular threshold is arbitrary.

For example, only a small change in an estimate is required to move it from a 5.1% significance level to 4.9%.
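To make this arithmetic concrete, here is a small sketch (with made-up numbers) showing how a tiny shift in a normal-theory estimate moves the two-sided p-value from just above 5% to just below it:

```python
from math import erf, sqrt

def two_sided_p(estimate, se):
    """Two-sided p-value for a normal-theory test of estimate/se against zero."""
    z = abs(estimate) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

# Made-up numbers: shifting the estimate by 0.02 (on a standard-error
# scale of 1) moves the p-value across the 5% line.
print(round(two_sided_p(1.95, 1.0), 3))  # just above 0.05
print(round(two_sided_p(1.97, 1.0), 3))  # just below 0.05
```

Nothing about the underlying evidence changes meaningfully between the two cases; only the side of the arbitrary threshold does.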

Statistical significance is not the same as practical importance; dichotomization into significant and non-significant results encourages the dismissal of observed differences in favor of the usually less interesting null hypothesis of no difference; and any particular threshold for declaring significance is arbitrary. We bring attention to an additional error of interpretation.

Students and practitioners should be made more aware that the difference between “significant” and “not significant” is not itself statistically significant.

Changes in statistical significance are not themselves significant

Introductory courses regularly warn students about the perils of strict adherence to a particular threshold such as the 5% significance level. Automatic use of a binary significant/non-significant decision rule encourages practitioners to ignore potentially important observed differences.

Here we focus only on the less widely known but equally important error of comparing two or more results by comparing their degrees of statistical significance.

One might think that “everybody knows” that comparing significance levels is inappropriate, but we see this mistake all the time in practice.

In comparing two treatments, one should look at the statistical significance of the difference between them rather than the difference between their significance levels.
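As an illustration (with hypothetical numbers, not taken from any study discussed here), suppose treatment X is estimated at 25 with standard error 10 and treatment Y at 10 with standard error 10. X is clearly significant and Y is not, yet the difference between them is nowhere near significant:

```python
from math import erf, sqrt

def z_to_p(z):
    """Two-sided p-value for a standard-normal z statistic."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Hypothetical estimates:
# treatment X: 25 with standard error 10 -> z = 2.5, significant
# treatment Y: 10 with standard error 10 -> z = 1.0, not significant
x, se_x = 25.0, 10.0
y, se_y = 10.0, 10.0

# Correct comparison: test the difference itself.
diff = x - y                        # 15
se_diff = sqrt(se_x**2 + se_y**2)   # ~14.1, assuming independent estimates
p_diff = z_to_p(diff / se_diff)     # ~0.29: the difference is NOT significant
print(round(p_diff, 2))
```

The standard error of the difference is larger than either individual standard error, which is exactly why the "X is significant, Y is not" framing overstates the contrast.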

Comparisons of the sort, “X is statistically significant but Y is not,” can be misleading.
Two examples of this error:
Homosexuality and the number of older brothers and sisters
In these data, homosexuality is more strongly associated with the number of older brothers than with the number of older sisters. However, no evidence is presented that this difference is statistically significant.
Health effects of low-frequency electromagnetic fields
The researchers in the chick-brain experiment made the common mistake of using statistical significance as a criterion for separating the estimates of different effects, an approach that does not make sense. At the very least, it would be more informative to show the estimated treatment effect and its standard error at each frequency.
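A minimal sketch of that more informative presentation, using invented per-frequency numbers (these are placeholders, not the experiment's actual data):

```python
# Invented per-frequency results: (frequency in Hz, estimated effect, standard error).
results = [(50, 0.8, 0.5), (60, 1.2, 0.5), (120, 0.9, 0.5), (240, 0.4, 0.5)]

def summarize(rows):
    """Report each estimate with +/- 1 standard error rather than
    a binary significant/non-significant label per frequency."""
    return [f"{freq:>4} Hz: {est:.2f} +/- {se:.2f}" for freq, est, se in rows]

for line in summarize(results):
    print(line)
```

Presented this way, a reader can see that the estimates at different frequencies are broadly compatible with one another, instead of being misled by which ones happen to cross a significance threshold.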