Item | Per cent agreement* | All journals N=127 | Medical journals n=68 | Psychiatric journals n=59 | P value† |
One hypothesis clearly stated | 58% | 93 (73%; 65% to 80%) | 61 (90%; 80% to 95%) | 32 (54%; 42% to 66%) | <0.001* |
Statistical measures included | | | | | |
Any statistical measures | 94% | 123 (97%; 92% to 99%) | 68 (100%; 95% to 100%) | 55 (93%; 84% to 97%) | 0.04 |
Sample size | 88% | 123 (97%; 92% to 99%) | 68 (100%; 95% to 100%) | 55 (93%; 84% to 97%) | 0.04 |
Statistical significance | 93% | 124 (98%; 93% to 99%) | 66 (97%; 90% to 99%) | 58 (98%; 91% to 100%) | 1.0 |
P values | 92% | 101 (80%; 72% to 86%) | 61 (90%; 80% to 95%) | 40 (68%; 55% to 78%) | 0.004 |
CIs | 92% | 97 (76%; 68% to 83%) | 63 (93%; 84% to 97%) | 34 (58%; 45% to 69%) | <0.001* |
Effect sizes | 76% | 117 (92%; 86% to 96%) | 67 (99%; 92% to 100%) | 50 (85%; 73% to 92%) | 0.006 |
Multiple testing correction | 94% | 5 (4%; 2% to 9%) | 2 (3%; 1% to 10%) | 3 (5%; 2% to 14%) | 0.66 |
Other statistical measure(s) | 82% | 22 (17%; 12% to 25%) | 11 (16%; 9% to 27%) | 11 (19%; 11% to 30%) | 0.82 |
*In cases of disagreement between reviewers, the item was considered included if either reviewer reported it was included. If one reviewer did not enter a response, the presence of the item was based on the other reviewer’s response. For calculation of agreement, a missed response by one of two reviewers was considered an instance of disagreement. No more than two ratings were missing per item unless indicated.
†Testwise p value from Fisher’s exact test. *P<0.05 after Hommel multiple testing correction, corrected p<0.004.
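The testwise p values above compare medical and psychiatric journals on a 2x2 table of counts (item included or not, by journal type). A minimal sketch of such a comparison is shown here, assuming a two-sided Fisher's exact test that sums the probabilities of all tables no more likely than the observed one; the table does not state which two-sided definition was used, and the function name `fisher_exact_two_sided` is hypothetical. The "not included" counts are derived from the group sizes (n=68 and n=59) minus the reported counts.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Assumption: "two-sided" means summing the hypergeometric probabilities
    of every table (with the same margins) that is no more likely than the
    observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def weight(x):
        # Unnormalised hypergeometric weight of a table whose top-left
        # cell is x, kept as an exact integer to avoid rounding issues.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lowest = max(0, col1 - row2)   # smallest feasible top-left cell
    highest = min(row1, col1)      # largest feasible top-left cell
    extreme = sum(
        w for w in (weight(x) for x in range(lowest, highest + 1))
        if w <= observed
    )
    return extreme / comb(total, col1)


# Rows: medical (n=68), psychiatric (n=59); columns: included, not included.
p_hypothesis = fisher_exact_two_sided(61, 7, 32, 27)    # one hypothesis clearly stated
p_any_measure = fisher_exact_two_sided(68, 0, 55, 4)    # any statistical measures
p_significance = fisher_exact_two_sided(66, 2, 58, 1)   # statistical significance

print(f"{p_hypothesis:.2g} {p_any_measure:.2f} {p_significance:.2f}")
```

Under these assumptions, the three computed values can be checked against the reported <0.001, 0.04 and 1.0 for the corresponding rows.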