Statistical Reporting in clinical trials: overuse of arbitrary significance levels of the p-value
Giorgio Reggiardo, Data Manager and Biostatistician, Medi Service

The article published in 1987 in NEJM by SJ Pocock, MD Hughes, and RJ Lee is well known in the scientific literature and remains relevant today. It illustrated the limits and the possible inappropriate use of the p-value in the writing of a clinical trial report. After reviewing 45 clinical reports, the authors concluded that the overuse of arbitrary significance levels (p-value less than 0.05) is detrimental to good scientific reporting, and that more emphasis should be given to the magnitude of treatment differences and to estimation methods such as confidence intervals. From 1987 to 2010 this article was cited 74 times by other authors. Five decades ago, Anscombe (1956) observed that statistical hypothesis tests were totally irrelevant, and that what was needed were estimates of the magnitudes of effects, with standard errors. Ordinary confidence intervals provide more information than p-values. Knowing that a 95% confidence interval includes zero tells us that, if a test of the hypothesis that the parameter equals zero is conducted, the resulting p-value will be greater than 0.05. A confidence interval thus provides both an estimate of the effect size and a measure of its uncertainty. Despite many publications on this specific problem, some recent clinical reports still contain statistical conclusions resting on the fallacious claim that "the smaller the p-value, the more significant". Moreover, some clinical reports present statistical significance using asterisks (one-, two-, or three-star significance). This practice has been heavily criticized because it provides less information than the exact p-value and can mislead researchers into thinking that an effect with two stars is more important than an effect with only one star. This presentation describes how the arbitrary use of the p-value can often be at odds with clinical hypothesis testing, which examines a credible null hypothesis about a clinical response.
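The correspondence between a 95% confidence interval and the 0.05 threshold, and the reason a smaller p-value does not imply a larger effect, can be illustrated with a minimal Python sketch. It assumes a normal approximation for the estimated treatment difference. The function name `summarize_effect` and the two sets of numbers are hypothetical and chosen only for illustration; they are not taken from any trial.

```python
from statistics import NormalDist


def summarize_effect(estimate, std_error, level=0.95):
    """Return (ci_low, ci_high, p_value) for H0: effect = 0.

    Assumes the estimate is approximately normally distributed
    (an assumption made for this sketch, not a general rule).
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(0.5 + level / 2)        # about 1.96 for a 95% interval
    ci_low = estimate - z_crit * std_error
    ci_high = estimate + z_crit * std_error
    z = estimate / std_error
    p_value = 2 * (1 - norm.cdf(abs(z)))          # two-sided p-value
    return ci_low, ci_high, p_value


# Hypothetical treatment differences: (estimate, standard error).
examples = {
    "small effect, precise estimate": (0.5, 0.2),
    "large effect, imprecise estimate": (5.0, 3.0),
}

for label, (est, se) in examples.items():
    low, high, p = summarize_effect(est, se)
    includes_zero = low <= 0 <= high
    print(f"{label}: difference={est}, 95% CI=({low:.2f}, {high:.2f}), "
          f"p={p:.3f}, CI includes 0: {includes_zero}")
```

In this sketch the first case gives the smaller p-value although its estimated difference is ten times smaller than in the second case. The interval makes this visible at a glance, while the p-value alone, or a number of stars, does not.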