Analysis of traffic injury severity: an application of non-parametric classification tree techniques

Accid Anal Prev. 2006 Sep;38(5):1019-27. doi: 10.1016/j.aap.2006.04.009. Epub 2006 Jun 2.

Abstract

Statistical regression models, such as logit or ordered probit/logit models, have been widely employed to analyze injury severity of traffic accidents. However, most regression models rest on their own assumptions and on pre-defined underlying relationships between dependent and independent variables. If these assumptions are violated, the model could lead to erroneous estimates of injury likelihood. The classification and regression tree (CART), one of the most widely applied data mining techniques, has been commonly employed in business administration, industry, and engineering. CART does not require any pre-defined underlying relationship between the target (dependent) variable and the predictors (independent variables) and has been shown to be a powerful tool, particularly for dealing with prediction and classification problems. This study uses the 2001 accident data for Taipei, Taiwan. A CART model was developed to establish the relationship between injury severity and driver/vehicle characteristics, highway/environmental variables, and accident variables. The results indicate that the most important variable associated with crash severity is the vehicle type. Pedestrians, motorcycle riders, and bicycle riders are identified as having higher risks of being injured in traffic accidents than drivers of other vehicle types.
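
To illustrate how a classification tree can pick out a leading variable without a pre-specified functional form, here is a minimal sketch of a single CART-style split search. It assumes Gini impurity as the splitting criterion (the usual default for CART classification; the abstract does not state which criterion the study used) and runs on a handful of made-up records, not the Taipei data. The names `gini`, `best_split`, `rows`, and `labels` are hypothetical and chosen only for this illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Search all binary splits of the form `row[feature] == value`.

    Returns (feature, value, weighted_gini) for the split with the lowest
    size-weighted Gini impurity of the two child nodes, or None if no
    split separates the records.
    """
    n = len(rows)
    best = None
    for feature in rows[0]:
        for value in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] == value]
            right = [y for row, y in zip(rows, labels) if row[feature] != value]
            if not left or not right:
                continue  # this candidate does not split the node
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, value, score)
    return best


# Made-up toy records for illustration only (not from the study).
rows = [
    {"vehicle": "motorcycle", "light": "day"},
    {"vehicle": "motorcycle", "light": "night"},
    {"vehicle": "motorcycle", "light": "day"},
    {"vehicle": "car", "light": "night"},
    {"vehicle": "car", "light": "day"},
    {"vehicle": "car", "light": "night"},
    {"vehicle": "car", "light": "day"},
    {"vehicle": "car", "light": "night"},
]
labels = [
    "injury", "injury", "injury", "injury",
    "no_injury", "no_injury", "no_injury", "no_injury",
]

feature, value, score = best_split(rows, labels)
print(f"root impurity: {gini(labels):.2f}")
print(f"best split: {feature} == {value!r}, weighted impurity {score:.2f}")
```

On this toy sample the search selects the `vehicle` field, because splitting on it lowers impurity far more than splitting on `light`. A full CART procedure would apply the same search recursively to each child node and then prune the resulting tree; the sketch shows only the first step.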

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Accidents, Traffic / statistics & numerical data*
  • Adult
  • Aged
  • Decision Trees*
  • Factor Analysis, Statistical
  • Female
  • Humans
  • Male
  • Regression Analysis*
  • Risk Factors
  • Severity of Illness Index
  • Statistics, Nonparametric
  • Wounds and Injuries / epidemiology*