
NeuroImage

Volume 61, Issue 4, 16 July 2012, Pages 1300-1310

Comments and Controversies
Ten ironic rules for non-statistical reviewers

https://doi.org/10.1016/j.neuroimage.2012.04.018

Abstract

As an expert reviewer, it is sometimes necessary to ensure a paper is rejected. This can sometimes be achieved by highlighting improper statistical practice. This technical note provides guidance on how to critique the statistical analysis of neuroimaging studies to maximise the chance that the paper will be declined. We will review a series of critiques that can be applied universally to any neuroimaging paper and consider responses to potential rebuttals that reviewers might encounter from authors or editors.

Introduction

This technical note is written for reviewers who may not have sufficient statistical expertise to provide an informed critique during the peer-review process, but would like to recommend rejection on the basis of inappropriate or invalid statistical analysis. This guidance follows the 10 simple rules format and hopes to provide useful tips and criticisms for reviewers who find themselves in this difficult position. These rules are presented for reviewers in an ironic way1 that makes it easier (and hopefully more entertaining) to discuss the issues from the point of view of both the reviewer and author — and to caricature both sides of the arguments. Some key issues are presented more formally in (non-ironic) appendices.

There is a perceived need to reject papers under peer review with the advent of open access publishing and the large number of journals available to authors. Clearly, there may be idiosyncratic reasons to block a paper – to ensure your precedence in the literature, personal rivalry, etc. – however, we will assume that there is an imperative to reject papers for the good of the community: handling editors are often happy to receive recommendations to decline a paper. This is because they are placed under pressure to maintain a high rejection rate. This pressure is usually exerted by the editorial board (and publishers) and enforced by circulating quantitative information about their rejection rates (i.e., naming and shaming lenient editors). All journals want to maximise rejection rates, because this increases the quality of submissions, increases their impact factor and underwrites their long-term viability. A reasonably mature journal like NeuroImage would hope to see between 70% and 90% of submissions rejected. Prestige journals usually like to reject over 90% of the papers they receive. As an expert reviewer, it is your role to help editors decline papers whenever possible. In what follows, we will provide 10 simple rules to make this job easier:

  • Rule number one: dismiss self-doubt

Occasionally, when asked to provide an expert opinion on the design or analysis of a neuroimaging study, you might feel underqualified. For example, you may not have been trained in probability theory or statistics or – if you have – you may not be familiar with topological inference and related topics such as random field theory. It is important to dismiss any ambivalence about your competence to provide a definitive critique. You have been asked to provide comments as an expert reviewer and, operationally, this is now your role. By definition, what you say is the opinion of the expert reviewer and cannot be challenged — in relation to the paper under consideration, you are the ultimate authority. You should therefore write with authority, in a firm and friendly fashion.

  • Rule number two: avoid dispassionate statements

A common mistake when providing expert comments is to provide definitive observations that can be falsified. Try to avoid phrases like “I believe” or “it can be shown that”. These statements invite a rebuttal that could reveal your beliefs or statements to be false. It is much safer, and preferable, to use phrases like “I feel” and “I do not trust”. No one can question the veracity of your feelings and convictions. Another useful device is to make your points vicariously; for example, instead of saying “Procedure A is statistically invalid” it is much better to say that “It is commonly accepted that procedure A is statistically invalid”. Although authors may be able to show that procedure A is valid, they will find it more difficult to prove that it is commonly accepted as valid. In short, try to pre-empt a prolonged exchange with authors by centring the issues on convictions held by yourself or others, and avoid stating facts.

  • Rule number three: submit your comments as late as possible

It is advisable to delay submitting your reviewer comments for as long as possible — preferably after the second reminder from the editorial office. This has three advantages. First, it delays the editorial process and creates an air of frustration, which you might be able to exploit later. Second, it creates the impression that you are extremely busy (providing expert reviews for other papers) and indicates that you have given this paper due consideration, after thinking about it carefully for several months. A related policy that enhances your reputation with editors is to submit large numbers of papers to their journal but politely decline invitations to review other people's papers. This shows that you are focused on your science and are committed to producing high quality scientific reports, without the distraction of peer-review or other inappropriate demands on your time.

  • Rule number four: the under-sampled study

If you are lucky, the authors will have based their inference on fewer than 16 subjects. All that is now required is a statement along the following lines:

Reviewer: Unfortunately, this paper cannot be accepted due to the small number of subjects. The significant results reported by the authors are unsafe because the small sample size renders their design insufficiently powered. It may be appropriate to reconsider this work if the authors recruit more subjects.

Notice your clever use of the word “unsafe”, which means you are not actually saying the results are invalid. This sort of critique is usually sufficient to discourage an editor from accepting the paper; however – in the unhappy event the authors are allowed to respond – be prepared for something like:

Response: We would like to thank the reviewer for his or her comments on sample size; however, his or her concerns are statistically misplaced. This is because a significant result (properly controlled for false positives), based on a small sample, indicates that the treatment effect is actually larger than the equivalent result with a large sample. In short, not only is our result statistically valid, it is quantitatively stronger than the same result with a larger number of subjects.

Unfortunately, the authors are correct (see Appendix 1). On the bright side, the authors did not resort to the usual anecdotes that beguile handling editors. Responses that one is in danger of eliciting include things like:

Response: We suspect the reviewer is one of those scientists who would reject our report of a talking dog because our sample size equals one!

Or, a slightly more considered rebuttal:

Response: Clearly, the reviewer has never heard of the fallacy of classical inference. Large sample sizes are not a substitute for good hypothesis testing. Indeed, the probability of rejecting the null hypothesis under trivial treatment effects increases with sample size.

Thankfully, you have heard of the fallacy of classical inference (see Appendix 1) and will call upon it when needed (see next rule). When faced with the above response, it is often worthwhile trying a slightly different angle of attack; for example2

Reviewer: I think the authors misunderstood my point here: The point that a significant result with a small sample size is more compelling than one with a large sample size ignores the increased influence of outliers and lack of robustness for small samples.

Unfortunately, this is not actually the case and the authors may respond with:

Response: The reviewer's concern now pertains to the robustness of parametric tests with small sample sizes. Happily, we can dismiss this concern because outliers decrease the type I error of parametric tests (Zimmerman, 1994). This means our significant result is even less likely to be a false positive in the presence of outliers. The intuitive reason for this is that an outlier increases sample error variance more than the sample mean, thereby reducing the t or F statistic (on average).
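
Both rebuttals in this exchange are easy to check numerically before conceding the point. What follows is a minimal sketch in Python, assuming a one-sample t-test at the conventional 5% level and a purely hypothetical contamination model; the sample sizes, the true effect of 0.5 and the outlier variance are illustrative choices with no connection to any paper under review.

```python
import numpy as np
from scipy import stats

alpha = 0.05

# (a) For a one-sample t-test, t = d * sqrt(n), so the smallest effect size
#     that can reach significance is d_crit = t_crit / sqrt(n): it shrinks
#     as the sample grows, which is the Appendix 1 point.
for n in (8, 16, 32, 64, 128):
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
    print(f"n = {n:3d}: smallest significant effect size d = {t_crit / np.sqrt(n):.2f}")

# (b) An outlier inflates the sample error variance more than it shifts the
#     sample mean, so on average it lowers (not raises) the t statistic.
rng = np.random.default_rng(0)
t_clean, t_contaminated = [], []
for _ in range(5000):
    x = rng.normal(0.5, 1.0, 16)              # modest true effect
    t_clean.append(stats.ttest_1samp(x, 0.0).statistic)
    x[0] += rng.normal(0.0, 5.0)              # contaminate one observation
    t_contaminated.append(stats.ttest_1samp(x, 0.0).statistic)
print(f"mean t without the outlier: {np.mean(t_clean):.2f}")
print(f"mean t with the outlier:    {np.mean(t_contaminated):.2f}")
```

On a typical run, the smallest significant effect size falls from roughly 0.8 at n = 8 to below 0.2 at n = 128, and the mean t statistic with the contaminated observation is clearly smaller than without it.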

At this point, it is probably best to proceed to rule six.

  • Rule number five: the over-sampled study

If the number of subjects reported exceeds 32, you can now try a less common, but potentially potent argument of the following sort:

Reviewer: I would like to commend the authors for studying such a large number of subjects; however, I suspect they have not heard of the fallacy of classical inference. Put simply, when a study is overpowered (with too many subjects), even the smallest treatment effect will appear significant. In this case, although I am sure the population effects reported by the authors are significant, they are probably trivial in quantitative terms. It would have been much more compelling had the authors been able to show a significant effect without resorting to large sample sizes. However, this was not the case and I cannot recommend publication.

You could even drive your point home with:

Reviewer: In fact, the neurological model would only consider a finding useful if it could be reproduced three times in three patients. If I have to analyse 100 patients before finding a discernible effect, one has to ask whether this effect has any diagnostic or predictive value.

Most authors (and editors) will not have heard of this criticism but, after a bit of background reading, will probably try to talk their way out of it by referring to effect sizes (see Appendix 2). Happily, there are no rules that establish whether an effect size is trivial or nontrivial. This means that if you pursue this line of argument diligently, it should lead to a positive outcome.
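
Should the exchange reach this point, it helps to know what the fallacy actually looks like in numbers. The sketch below assumes a one-sample t-test and an arbitrarily chosen "trivial" standardised effect of d = 0.05; neither choice comes from any particular study.

```python
import numpy as np
from scipy import stats

alpha, trivial_d = 0.05, 0.05   # a deliberately tiny standardised effect

# Power of a two-sided one-sample t-test as the sample grows
# (the negligible lower rejection tail is ignored).
for n in (16, 100, 1000, 5000, 20000):
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    power = stats.nct.sf(t_crit, df, trivial_d * np.sqrt(n))   # non-central t
    print(f"n = {n:6d}: probability of rejecting the null is {power:.2f}")
```

The rejection probability climbs from barely above the nominal 5% at n = 16 to essentially 1 at n = 20,000, even though the underlying effect is trivial by construction; this is why the only substantive defence is an appeal to effect sizes (Appendix 2).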

  • Rule number six: untenable assumptions (nonparametric analysis)

If the number of subjects falls between 16 and 32, it is probably best to focus on the fallibility of classical inference — namely its assumptions. Happily, in neuroimaging, it is quite easy to sound convincing when critiquing along these lines: for example,

Reviewer: I am very uncomfortable about the numerous and untenable assumptions that lie behind the parametric tests used by the authors. It is well-known that MRI data has a non-Gaussian (Rician) distribution, which violates the parametric assumptions of their statistical tests. It is imperative that the authors repeat their analysis using nonparametric tests.

The nice thing about this request is that it will take some time to perform nonparametric tests. Furthermore, the nonparametric tests will, by the Neyman–Pearson lemma,3 be less sensitive than the original likelihood ratio tests reported by the authors — and their significant results may disappear. However, be prepared for the following rebuttal:

Response: We would like to thank the reviewer for his or her helpful suggestions about nonparametric testing; however, we would like to point out that it is not the distribution of the data that is assumed to be Gaussian in parametric tests, but the distribution of the random errors. These are guaranteed to be Gaussian for our data, by the central limit theorem,4 because of the smoothing applied to the data and because our summary statistics at the between-subject level are linear mixtures of data at the within-subject level.

The authors are correct here and this sort of response should be taken as a cue to pursue a different line of critique.
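
Before abandoning this line of attack, it is worth seeing the authors' central limit theorem argument at work. The following toy simulation stands in for the real pipeline: the Rician shape parameter, the averaging depth and the use of a plain mean as the summary statistic are arbitrary choices, not taken from any particular study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_within = 5000, 200      # arbitrary simulation sizes

# Strongly skewed "within-subject" data (Rician, as for MRI magnitude images).
raw = stats.rice.rvs(b=0.5, size=(n_subjects, n_within), random_state=rng)

# The second-level summary statistic is a linear mixture (here, a plain mean)
# of the within-subject data, so its random variation is close to Gaussian.
summaries = raw.mean(axis=1)

print(f"skewness of the raw data:           {stats.skew(raw.ravel()):.2f}")
print(f"skewness of the summary statistics: {stats.skew(summaries):.2f}")
```

The raw data are visibly skewed while the subject-level summaries are very nearly symmetric, illustrating why the distribution of the data need not be Gaussian for the second-level errors to be approximately so.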

  • Rule number seven: question the validity (cross validation)

At this stage, it is probably best to question the fundaments of the statistical analysis and try to move the authors out of their comfort zone. A useful way to do this is to keep using words like validity and validation: for example,

Reviewer: I am very uncomfortable about the statistical inferences made in this report. The correlative nature of the findings makes it difficult to accept the mechanistic interpretations offered by the authors. Furthermore, the validity of the inference seems to rest upon many strong assumptions. It is imperative that the authors revisit their inference using cross validation and perhaps some form of multivariate pattern analysis.

Hopefully, this will result in the paper being declined or – at least – being delayed for a few months. However, the authors could respond with something like:

Response: We would like to thank the reviewer for his or her helpful comments concerning cross validation. However, the inference made using cross validation accuracy pertains to exactly the same thing as our classical inference; namely, the statistical dependence (mutual information) between our explanatory variables and neuroimaging data. In fact, it is easy to prove (with the Neyman–Pearson lemma) that classical inference is more efficient than cross validation.

This is frustrating, largely because the authors are correct5 and it is probably best to proceed to rule number eight.
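
Before moving on, the sceptical reviewer can check the authors' efficiency claim with a short simulation. The sketch below assumes a simple two-group design, a leave-one-out nearest-mean classifier and a binomial test of its accuracy against chance; none of these choices are taken from the paper under review.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_group, effect = 16, 0.8              # hypothetical design
x = rng.normal(0.0, 1.0, n_per_group)      # group A
y = rng.normal(effect, 1.0, n_per_group)   # group B

# Classical inference: a two-sample t-test on the group difference.
t_stat, p_classical = stats.ttest_ind(y, x)

# Cross-validation: leave-one-out nearest-class-mean classification,
# followed by a binomial test of accuracy against chance (0.5).
data = np.concatenate([x, y])
labels = np.concatenate([np.zeros(n_per_group), np.ones(n_per_group)])
correct = 0
for i in range(data.size):
    train_d, train_l = np.delete(data, i), np.delete(labels, i)
    m0, m1 = train_d[train_l == 0].mean(), train_d[train_l == 1].mean()
    prediction = 0 if abs(data[i] - m0) < abs(data[i] - m1) else 1
    correct += int(prediction == labels[i])
p_crossval = stats.binom.sf(correct - 1, data.size, 0.5)   # P(>= observed hits | chance)

print(f"classical p-value:        {p_classical:.4f}")
print(f"cross-validation p-value: {p_crossval:.4f}")
```

Both procedures test the same dependence between group membership and the data, but over repeated simulations the classical p-value is typically the smaller of the two, which is the efficiency point the authors are making.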

  • Rule number eight: exploit superstitious thinking

As a general point, it is useful to instil a sense of defensiveness in editorial exchanges by citing papers that have been critical of neuroimaging data analysis. A useful entree here is when authors have reported effect sizes to supplement their inferential statistics (p values). Effect sizes can include parameter estimates, regression slopes, correlation coefficients or proportion of variance explained (see Table 1 and Appendix 2). Happily, most authors will have reported some form of effect size, exposing themselves to the following critique:

Reviewer: It appears that the authors are unaware of the dangers of voodoo correlations and double dipping. For example, they report effect sizes based upon data (regions of interest) previously identified as significant in their whole brain analysis. This is not valid and represents a pernicious form of double dipping (biased sampling or the non-independence problem). I would urge the authors to read Vul et al. (2009) and Kriegeskorte et al. (2009) and present unbiased estimates of their effect size using independent data or some form of cross validation.

Do not be deterred by the fact that reporting effect sizes is generally considered to be good practice — the objective here is to create an atmosphere in which punitive forces could expose the darkest secrets of any author, even if they did not realise they had them. The only negative outcome will be a response along the following lines:

Response: We thank the reviewer for highlighting the dangers of biased sampling, but this concern does not apply to our report: by definition, the effect size pertains to the data used to make an inference — and can be regarded as an in-sample prediction of the treatment effect. We appreciate that effect sizes can overestimate the true effect size, especially when the treatment effect is small or statistical thresholds are high. However, the (in-sample) effect size should not be confused with an out-of-sample prediction (an unbiased estimate of the true effect size). We were not providing an out-of-sample prediction but simply following APA guidelines by supplementing our inference (“Always present effect sizes for primary outcomes.” Wilkinson and APA Task Force on Statistical Inference, 1999, p. 599).

In this case, the authors have invoked the American Psychological Association (APA) guidelines (Wilkinson and APA Task Force on Statistical Inference, 1999) on good practice for statistical reporting in journals. It is difficult to argue convincingly against these guidelines (which most editors are comfortable with). However, do not be too disappointed because the APA guidelines enable you to create a Catch-22 for authors who have not reported effect sizes:

Reviewer: The authors overwhelm the reader with pretty statistical maps and magnificent p-values but at no point do they quantify the underlying effects about which they are making an inference. For example, their significant interaction would have profoundly different implications depending upon whether or not it was a crossover interaction. In short, it is essential that the authors supplement their inference with appropriate effect sizes (e.g., parameter estimates) in line with accepted practice in statistical reporting (“Always present effect sizes for primary outcomes.” Wilkinson and APA Task Force on Statistical Inference, 1999, p. 599).

When they comply, you can apply rule eight — in the fond hope they (and the editors) do not appreciate the difference between in-sample effect sizes and out-of-sample predictions.
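
For readers who do appreciate the difference, it is easy to make concrete. The toy simulation below uses arbitrary numbers (voxels, subjects, true effect and threshold) chosen purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects, n_voxels, true_d = 16, 10000, 0.3   # illustrative numbers only

# Every "voxel" carries the same modest true effect; estimate it per voxel.
data = rng.normal(true_d, 1.0, size=(n_voxels, n_subjects))
d_hat = data.mean(axis=1) / data.std(axis=1, ddof=1)              # per-voxel effect size
p = stats.t.sf(d_hat * np.sqrt(n_subjects), df=n_subjects - 1)    # one-sided p-values

selected = p < 0.001                                              # a high threshold
print(f"true effect size:                        {true_d:.2f}")
print(f"mean estimate over all voxels:           {d_hat.mean():.2f}")
print(f"mean estimate over 'significant' voxels: {d_hat[selected].mean():.2f}")
```

The average estimate among voxels that survive the threshold is markedly inflated relative to the true value; that inflation is the in-sample bias the authors concede, and it is what an out-of-sample prediction on independent data avoids.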

  • Rule number nine: highlight missing procedures

Before turning to the last resort (rule number ten), it is worthwhile considering any deviation from usual practice. We are particularly blessed in neuroimaging by specialist procedures that can be called upon to highlight omissions. A useful critique here is:

Reviewer: The authors' failure to perform retinotopic mapping renders the interpretation of their results unsafe and, in my opinion, untenable. Please conform to standard practice in future.

Note how you have cleverly intimated a failure to conform to standard practice (which most editors will assume is good practice). In most cases, this sort of critique should ensure a rejection; however, occasionally, you may receive a rebuttal along the following lines:

Response: We would like to thank the reviewer for his or her helpful comments; however, we would like to point out that our study used olfactory stimuli, which renders the retinotopic mapping somewhat irrelevant.

Although you could debate this point, it is probably best to proceed to rule number ten.

  • Rule number ten: the last resort

If all else fails, then the following critique should secure a rejection:

Reviewer: Although the authors provide a compelling case for their interpretation, and the analyses appear valid, if somewhat impenetrable, I cannot recommend publication. I think this study is interesting but colloquial and would be better appreciated (and assessed) in a more specialised journal.

Notice how gracious you have been. Mildly laudatory comments of this sort suggest that you have no personal agenda and are deeply appreciative of the author's efforts. Furthermore, it creates the impression that your expertise enables you not only to assess their analyses, but also how they will be received by other readers. This impression of benevolence and omnipotence makes your final value judgement all the more compelling and should secure the desired editorial decision.


Conclusion

We have reviewed some general and pragmatic approaches to critiquing the scientific work of others. The emphasis here has been on how to ensure a paper is rejected and enable editors to maintain an appropriately high standard, in terms of papers that are accepted for publication. Remember, as a reviewer, you are the only instrument of selective pressure that ensures scientific reports are as good as they can be. This is particularly true of prestige publications like Science and Nature, where

Acknowledgments

I would like to thank the Wellcome Trust for funding this work and Tom Nichols for comments on the technical aspects of this work. I would also like to thank the reviewers and editors of NeuroImage for their thoughtful guidance and for entertaining the somewhat risky editorial decision to publish this (ironic) article.

References (23)

  • N. Kriegeskorte et al. Circular analysis in systems neuroscience: the dangers of double dipping. Nat. Neurosci. (2009)