Elsevier

NeuroImage

Volume 22, Issue 2, June 2004, Pages 879-885

An empirical investigation into the number of subjects required for an event-related fMRI study

https://doi.org/10.1016/j.neuroimage.2004.02.005

Abstract

Optimising the number of subjects required for an event-related functional imaging study is critical for ensuring sufficient statistical power. We report an empirical investigation of this issue by employing a resampling approach to the data of 58 subjects drawn from four previous GO/NOGO studies. Using voxelwise measures and setting the activation map from the complete sample to be a “gold standard”, analyses revealed the statistical power to be surprisingly low at typical sample sizes (n = 20). However, voxels that were significantly active from smaller samples tended to be true positives, that is, they were typically active in the gold standard map and correlated well with the gold standard activation measure. The numerous false negatives that resulted from the lower SNR of the smaller samples drove the poor statistical power of those samples. Splitting the sample into two groups provided a test of the reproducibility of activation maps that was assessed using an alternative measure that quantified the distances between centres-of-mass of activated areas. These analyses revealed that although the voxelwise overlap may be poor, the locations of activated areas provide some optimism for studies with typical sample sizes. With n = 20 in each of two groups, it was found that the centres-of-mass for 80% of activated areas fell within 25 mm of each other. The reported analyses, by quantifying the spatial reproducibility for various sample sizes performing a typical event-related cognitive task, thus provide an empirical measure of the disparity to be expected in comparing activation maps.

Introduction

The number of subjects scanned in an fMRI study is very often dictated by practical constraints such as access to scanning time and costs. Under these conditions, an investigator must make a trade-off between the number of subjects to scan and the length of the experiment. Even though these decisions are made frequently, little is known about how many trials, scans or subjects are needed to yield reliable results.

Previous research addressing these issues has shown that the spatial extent of BOLD signal activation maps increases as the number of single trials averaged increases (Huettel and McCarthy, 2001). These authors have demonstrated that at an average of 50 trials (a typical number of trials in an fMRI study), even though the haemodynamic shape was stable, only 50% of the eventually activated voxels were deemed significant. The volume of the activation maps only reached asymptotic values after 150 trials were averaged. Similarly, for block design studies, it has been shown that when averaging across progressively increasing numbers of scans (where a scan, in this case, is defined as a time series of 100 volumes obtained during one 200 s stimulus presentation period: 20 s ON, 20 s OFF, etc.), the spatial extent of the activated voxels increased monotonically and failed to asymptote with as many as 22 scans (Saad et al., 2003).

Practically, it could be very difficult to obtain the required number of trials and scans as dictated by the above studies for each subject. This could also be highly dependent on the type of study involved. For example, a GO/NOGO study needs to develop a prepotency to respond, and thus the trials of interest (NOGOs), by design, must be infrequent. Under these circumstances, the number of trials will be dictated by the length of time the subject can comfortably remain in the scanner while maintaining their ability to perform the task. In this case, to increase the power and thus the reliability of the study, one viable option is to increase the number of subjects scanned. This, in turn, leads one to ask how many subjects are necessary to obtain a reliable group activation map.

To our knowledge, very few published studies have addressed this question. The first such paper (Friston et al., 1999) showed that conjunction analysis with a fixed-effect model is sufficient to make inferences about characteristics that are typical of populations. This method can reduce the number of subjects needed to infer differences between populations, relative to the number normally required with a standard random-effects model. Although this method is very useful, it does not give a clear indication of how many subjects are necessary to perform an event-related fMRI study. Desmond and Glover (2002) estimated mean differences and variability between two block conditions with fMRI data. These values were used to generate power curves and to estimate the number of subjects needed to yield reliable results. For a threshold of P = 0.05, 12 subjects were required to achieve 80% power. At more realistic fMRI thresholds (i.e., after correcting for multiple comparisons), approximately twice as many subjects were required to yield similar power. However, this study addressed statistical power in block design experiments and may not extend to event-related designs.
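To make the idea of a power curve concrete, the following is a minimal sketch of how power and required sample size relate to the significance threshold. It is not the procedure of Desmond and Glover (2002): it assumes a one-sided, one-sample test with a normal approximation, and the standardised effect size `d` and the alpha values in the example are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def approx_power(effect_size, n_subjects, alpha):
    """Approximate power of a one-sided, one-sample test.

    Normal approximation (an assumption of this sketch):
    power = Phi(d * sqrt(n) - z_{1 - alpha}).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha)
    return z.cdf(effect_size * n_subjects ** 0.5 - z_crit)


def subjects_needed(effect_size, alpha, target_power=0.8, max_n=500):
    """Smallest n whose approximate power reaches target_power, or None."""
    for n in range(2, max_n + 1):
        if approx_power(effect_size, n, alpha) >= target_power:
            return n
    return None


if __name__ == "__main__":
    d = 0.75  # hypothetical standardised effect size, not taken from the paper
    for alpha in (0.05, 0.002, 0.000002):  # illustrative thresholds only
        print(f"alpha={alpha}: n for 80% power = {subjects_needed(d, alpha)}")
```

Under these assumptions, a stricter threshold raises the critical value and therefore the number of subjects needed to reach the same power, which is the qualitative pattern described above.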

This paper reports an empirical approach to the question of sample size and statistical reliability. Fifty-eight subjects performing similar event-related GO/NOGO tasks were tested. By varying the number of subjects included in the group activation maps, we were able to derive empirically the stability of these maps for different sample sizes.
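The resampling idea can be sketched as follows. This is an illustration on synthetic data, not the authors' pipeline: the data generator, the voxel counts, the effect size, and the fixed t threshold (used here in place of a P-value threshold, so that only the standard library is needed) are all assumptions of the sketch. The map computed from all subjects plays the role of the gold standard; each random subsample of n subjects is scored by the fraction of gold-standard voxels it detects (power) and by the fraction of its detected voxels that are in the gold standard.

```python
import math
import random


def t_stat(values):
    """One-sample t statistic against zero."""
    n = len(values)
    m = math.fsum(values) / n
    var = math.fsum((x - m) ** 2 for x in values) / (n - 1)
    return m / math.sqrt(var / n) if var > 0 else 0.0


def active_voxels(data, subject_ids, t_threshold):
    """Set of voxel indices whose group t statistic exceeds the threshold."""
    n_voxels = len(data[0])
    return {v for v in range(n_voxels)
            if t_stat([data[s][v] for s in subject_ids]) > t_threshold}


def resample_summary(data, n, n_resamples, t_threshold, rng):
    """Mean power and mean true-positive fraction for subsamples of size n."""
    all_ids = list(range(len(data)))
    gold = active_voxels(data, all_ids, t_threshold)  # full-sample map
    powers, true_pos = [], []
    for _ in range(n_resamples):
        found = active_voxels(data, rng.sample(all_ids, n), t_threshold)
        if gold:
            powers.append(len(found & gold) / len(gold))
        if found:
            true_pos.append(len(found & gold) / len(found))
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(powers), mean(true_pos)


def simulate(n_subjects=58, n_voxels=200, n_true=40, effect=0.5, seed=0):
    """Synthetic per-subject effect estimates (hypothetical values)."""
    rng = random.Random(seed)
    return [[rng.gauss(effect if v < n_true else 0.0, 1.0)
             for v in range(n_voxels)] for _ in range(n_subjects)]


if __name__ == "__main__":
    data = simulate()
    rng = random.Random(1)
    for n in (10, 20, 30, 40, 50):
        power, tp = resample_summary(data, n, 20, 3.0, rng)
        print(f"n={n}: power={power:.2f}, true-positive fraction={tp:.2f}")
```

Because a fixed t threshold corresponds to a different P value at each sample size, a closer analogue of the reported analysis would convert a chosen P value to a critical t for each n.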


Subjects and task design

Fifty-eight right-handed subjects (35 female, mean age: 30, range: 18–46) completed a GO/NOGO task after providing written informed consent. The GO/NOGO task required frequent responses and occasional response inhibitions. Subjects were presented with a serial stream of letters. A response was required for every occurrence of the alternating target letters, X and Y, unless the alternation order was broken. Minor variations in the task were presented to four different groups. Fourteen subjects

“Gold standard” analyses

The results of the power analyses are shown in Fig. 1. We expected to find a “shoulder” in the graph after a certain number of subjects, after which the curve would asymptote to a straight line up to 58 subjects. As can clearly be seen, this did not happen. In the best case, at P = 0.01, power reached 0.5 only after 32 subjects. With a stricter threshold (P = 0.000001), power reached 0.5 only at 50 subjects. It is obvious that these activation maps are severely

Conclusions

When planning an event-related fMRI study, it is important to know how many subjects are required to yield reliable results. This paper attempted to answer that question empirically. Although these results might be applicable to the majority of fMRI researchers investigating cognitive processes (such as inhibition), it is important to note that these results may not translate to studies with a higher signal-to-noise ratio or that suffer smaller intersubject neuroanatomical variability. The

Acknowledgements

Supported in part by USPHS grants DA14100, GCRC M01 RR00058 and by the Irish Research Council for Humanities and Social Sciences.
