One-step extrapolation of the prediction performance of a gene signature derived from a small study
Ling-Yi Wang (1, 2), Wen-Chung Lee (1)

  1. Research Center for Genes, Environment and Human Health, and Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
  2. Department of Medical Research, Tzu Chi General Hospital, Hualien, Taiwan

Correspondence to Professor Wen-Chung Lee; wenchung{at}ntu.edu.tw

Abstract

Objective Microarray-related studies often involve a very large number of genes and a small sample size. Cross-validation or bootstrapping is therefore essential for a fair assessment of the prediction/classification performance of a gene signature. A deficiency of these methods is that they reduce the training sample size, through partitioning in cross-validation and through sampling with replacement in bootstrapping. To address this problem, we aim to obtain a prediction performance estimate that strikes a good balance between bias and variance and has a small root mean squared error.

Methods We propose a one-step extrapolation from the fitted learning curve to estimate the prediction/classification performance of the model trained on all the samples.
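
As a rough illustration of the idea of extrapolating a learning curve by one step, the sketch below fits a simple curve to performance estimates obtained at reduced training sizes and evaluates it at the full sample size. The abstract does not state the functional form of the learning curve, so the inverse relationship perf(n) = a + b/n, the function name `one_step_extrapolation`, and the numbers used are all assumptions made for this example, not the authors' specification.

```python
import numpy as np

def one_step_extrapolation(train_sizes, performances, full_size):
    """Fit perf(n) = a + b / n by least squares and evaluate it at n = full_size.

    The inverse-n form is an assumed learning-curve model chosen for
    illustration; it is not taken from the article.
    """
    x = 1.0 / np.asarray(train_sizes, dtype=float)
    y = np.asarray(performances, dtype=float)
    # A degree-1 polyfit returns [slope, intercept], i.e. [b, a].
    b, a = np.polyfit(x, y, deg=1)
    return a + b / full_size

# Hypothetical performance estimates (e.g. cross-validated AUC) obtained
# with training sets smaller than the full study of 60 samples.
sizes = [20, 30, 40, 50]
aucs = [0.70, 0.74, 0.76, 0.77]
estimate = one_step_extrapolation(sizes, aucs, full_size=60)
print(f"Extrapolated performance at n = 60: {estimate:.3f}")
```

The extrapolation goes only one step beyond the largest observed training size, from the sizes reachable by resampling to the full sample, rather than projecting the curve far outside the observed range.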

Results Simulation studies show that the method strikes a good balance between bias and variance and has a small root mean squared error. Three microarray data sets are used for demonstration.

Conclusions We advocate this method for estimating the prediction performance of a gene signature derived from a small study.

  • gene signature
  • prediction
  • classification
  • receiver operating characteristic curve
  • microarray
  • learning curve

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
