A Comparison of Two Strategies for Building an Exposure Prediction Model

Ann Occup Hyg. 2016 Jan;60(1):74-89. doi: 10.1093/annhyg/mev072. Epub 2015 Sep 30.

Abstract

Cost-efficient assessments of job exposures in large populations may be obtained from models in which 'true' exposures assessed by expensive measurement methods are estimated from easily accessible and cheap predictors. Typically, the models are built on the basis of a validation study comprising 'true' exposure data as well as an extensive collection of candidate predictors from questionnaires or company data, not all of which can be included in the models due to restrictions on the degrees of freedom available for modeling. In these situations, predictors need to be selected using procedures that can identify the best possible subset of predictors among the candidates. The present study compares two strategies for selecting a set of predictor variables. One strategy relies on stepwise hypothesis testing of associations between predictors and exposure, while the other uses cluster analysis to reduce the number of predictors without relying on empirical information about the measured exposure. Both strategies were applied to the same dataset on biomechanical exposure and candidate predictors among computer users, and they were compared in terms of identified predictors of exposure as well as the resulting model fit using bootstrapped resamples of the original data. The identified predictors were largely different between the two strategies, and the initial model fit was better for the stepwise testing strategy than for the clustering approach. Internal validation of the models using bootstrap resampling with fixed predictors revealed a similarly reduced model fit in resampled datasets for both strategies. However, when predictor selection was incorporated in the validation procedure for the stepwise testing strategy, the model fit was reduced to the extent that both strategies showed similar model fit. Thus, both strategies would be expected to perform poorly with respect to predicting biomechanical exposure in other samples of computer users.
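The key methodological point above — that a bootstrap validation understates optimism unless the predictor-selection step itself is repeated inside every resample — can be illustrated with a small sketch. This is not the paper's actual procedure or data; it uses synthetic data, a simple greedy forward-selection rule in place of stepwise hypothesis testing, and in-sample R² as the fit measure, all chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_ols(X, y):
    # Ordinary least squares via least-squares solve (no intercept,
    # since the synthetic data are centered by construction).
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def r2(X, y, coef):
    resid = y - X @ coef
    return 1 - resid.var() / y.var()

def forward_select(X, y, max_vars=3):
    # Greedy forward selection by in-sample R^2 gain — a stand-in for
    # the stepwise hypothesis-testing strategy; max_vars is arbitrary.
    selected = []
    for _ in range(max_vars):
        best, best_r2 = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            coef = fit_ols(X[:, cols], y)
            score = r2(X[:, cols], y, coef)
            if score > best_r2:
                best, best_r2 = j, score
        selected.append(best)
    return selected

# Synthetic validation study: many candidate predictors, few informative.
n, p = 100, 20
X = rng.normal(size=(n, p))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=n)

sel = forward_select(X, y)
apparent = r2(X[:, sel], y, fit_ols(X[:, sel], y))

# Bootstrap optimism, with selection repeated inside each resample.
optimism = []
for _ in range(50):
    idx = rng.integers(0, n, n)
    Xb, yb = X[idx], y[idx]
    sel_b = forward_select(Xb, yb)          # re-select on the resample
    coef_b = fit_ols(Xb[:, sel_b], yb)
    boot_r2 = r2(Xb[:, sel_b], yb, coef_b)  # apparent fit in resample
    test_r2 = r2(X[:, sel_b], y, coef_b)    # fit back on original data
    optimism.append(boot_r2 - test_r2)

corrected = apparent - np.mean(optimism)
print(f"apparent R^2 {apparent:.2f}, optimism-corrected {corrected:.2f}")
```

Because noise predictors are chosen partly by chance in each resample, the corrected R² falls below the apparent R², mirroring the abstract's finding that incorporating selection into the validation loop erases the stepwise strategy's apparent advantage.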

Keywords: bias; optimism; statistical performance; variable selection.

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Biomechanical Phenomena / physiology
  • Cluster Analysis
  • Computer Simulation*
  • Electromyography / methods
  • Humans
  • Models, Statistical*
  • Occupational Exposure*
  • Research Design*
  • Superficial Back Muscles / physiology*