Objectives This study aimed to build and test the models of machine learning (ML) to predict the mortality of hospitalised motorcycle riders.
Setting The study was conducted in a level-1 trauma centre in southern Taiwan.
Participants Motorcycle riders who were hospitalised between January 2009 and December 2015 were divided into a training set (n=6306) and a test set (n=946). Using the demographic information, injury characteristics and laboratory data of the patients, logistic regression (LR), support vector machine (SVM) and decision tree (DT) analyses were performed to predict the mortality of individual motorcycle riders under different conditions: using either all samples or reduced samples, and either all variables or selected features.
Primary and secondary outcome measures The predictive performance of each model was evaluated based on accuracy, sensitivity, specificity and geometric mean, and the areas under the receiver operating characteristic curves of the different models were compared.
Results In the training set, both LR and SVM had a significantly higher area under the receiver operating characteristic curve (AUC) than DT. No significant difference was observed in the AUC of LR and SVM, regardless of whether all samples or reduced samples and whether all variables or selected features were used. In the test set, the performance of the SVM model for all samples with selected features was better than that of all other models, with an accuracy of 98.73%, sensitivity of 86.96%, specificity of 99.02%, geometric mean of 92.79% and AUC of 0.9517, in mortality prediction.
Conclusion ML can provide a feasible level of accuracy in predicting the mortality of motorcycle riders. Integration of the ML model, particularly the SVM algorithm in the trauma system, may help identify high-risk patients and, therefore, guide appropriate interventions by the clinical staff.
- motorcycle accident
- machine learning (ml)
- logistic regression (lr)
- support vector machine (svm)
- decision tree (dt)
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
This study is the first to use machine learning to predict the mortality risk of motorcycle riders.
The support vector machine model generally works like a black box and cannot identify the relationship between mortality and various explanatory variables.
The incomplete records of patients and exclusion of those who were declared dead in the trauma registry system could cause result bias.
The single-centre setting may limit the generalisability of the results.
Motorcycle use is popular in numerous cities because it is an inexpensive and convenient means of transportation. However, despite the shorter travel time, motorcycle riders who are involved in road traffic accidents have significantly higher morbidity and mortality. Compared with occupants of other motor vehicles, motorcycle riders are eight times more likely to be injured per vehicle mile,1 30 times more likely to die in a motor vehicle crash2 and 58 times more likely to be killed on a per-trip basis.3 In Taiwan, motorcyclist fatalities account for nearly 60% of all driving fatalities4 and are often associated with male sex, advanced age, lack of helmet use, unlicensed status and driving under the influence of alcohol.5–9 In addition, head injury is the leading cause of mortality, followed by thoracic and abdominal injuries.6–9
Identifying patients who are at high risk is important for the integration of trauma management to maximise resources and improve quality of care.10 11 More robust and accurate individual predictions of mortality might provide clinicians with more precise information about the likelihood of good or poor outcomes and improve individual trauma and mortality management.12 To estimate the probability of mortality, the Trauma and Injury Severity Score (TRISS), established in 1987, is frequently used; it estimates the survival probability of an individual patient with trauma based on a logistic regression (LR) analysis of variables including age, an anatomical variable (Injury Severity Score; ISS), a physiological variable (Revised Trauma Score) and different coefficients for blunt and penetrating injuries. However, TRISS has limitations and fails to classify 15%–30% of patients with trauma accurately.13 Even after the incorporation of other or revised predictors, such as blood pressure,14 comorbidities and separate categories for different age groups,15 the addition of more predictors to the basic TRISS model did not always improve performance.16–18 Moreover, the revised TRISS derived from the USA National Trauma Database is inaccurate for trauma systems that manage predominantly blunt injuries19; further development of models with advanced methodological quality, good performance in subsets of patient groups and practical applicability is therefore required for the prediction of mortality.16
Currently, machine learning (ML) has been successfully applied in real-life settings in several fields of study, including automatic medical diagnostics and personalised healthcare.20–22 The application of supervised ML methods to aid diagnosis and prognosis in patients with trauma has been a topic of interest. ML is modelled on how the human brain approaches pattern recognition tasks, thus providing an artificial intelligence-based approach to solving classification problems and improving its efficiency over time.23 The usefulness of ML is bolstered by the versatility of its techniques and their utility for artificial intelligence tasks such as prediction, classification, planning, recognition and clustering.23 24 Different learning strategies have previously been compared using field-specific datasets, and several showed significantly better predictive power than the more conventional alternatives.25 Examples of multivariate techniques for pattern recognition include, but are not limited to, LR, support vector machine (SVM), decision tree (DT) and artificial neural networks. LR is a widely used and accepted statistical analysis tool that predicts the probability of the occurrence of an event.26 It builds a functional relationship between two or more independent predictors and one dependent outcome variable, with the assumption that the response variable is linearly related to the coefficients of the predictor variables.26
SVM uses a training set of data with one or more features to determine an optimal boundary that separates a set of cases. The binary SVM classifier establishes a set of optimal hyperplanes in a high-dimensional space with the maximal margin between the two classes.27 When all training points cannot be separated by a hyperplane, a soft margin method is used to establish a hyperplane that tolerates some misclassified training points.28 29 Moreover, the SVM model can be used for classification problems.30–34
DT is a hierarchical model composed of decision rules based on optimal feature cut-off values that recursively classify independent variables into different groups.35–37 It is built to search for a set of decision rules that can predict an outcome from a set of input variables.33 35 36 Several algorithms are used to construct DT models, including classification and regression trees (CART), iterative dichotomiser 3 (ID3), χ2 automatic interaction detector DTs, and C4.5 and C5.0 DTs.26 28 CART analysis is a combined approach based on non-parametric and non-linear variables for recursive partitioning analysis. It is an innovative DT model in which several predictive variables are used to identify high-risk patients in various medical fields through progressive binary splits, enabling better prediction and clinical decision-making.38–40
Thus, this study aimed to establish a model for the mortality prediction of motorcycle riders using ML algorithms based on data from a population-based trauma registry in a level 1 trauma centre.
The requirement for informed consent was waived in accordance with institutional review board regulations.
Detailed patient information was retrieved from the trauma registry system of our institution, a 2400-bed facility and level 1 regional trauma centre, between January 2009 and December 2015. Only patients with trauma who sustained injuries from a motorcycle accident and were hospitalised for treatment were included in the study. Patient information included the following variables: age; sex; use of a helmet; comorbidities, such as coronary artery disease (CAD), congestive heart failure, cerebral vascular accident, diabetes mellitus, end-stage renal disease and hypertension (HTN); vital signs, including temperature, systolic blood pressure, heart rate and respiratory rate; ISS; Glasgow Coma Scale (GCS) score; Abbreviated Injury Scale (AIS) in the different regions of the body; number of injured body regions according to AIS (number of AIS locations); inhospital mortality and laboratory values (white cell count, red blood cell and platelet count; haemoglobin (Hb), haematocrit (Hct), blood urine nitrogen (BUN), creatinine (Cr), alanine aminotransferase (ALT), aspartate aminotransferase (AST), sodium (Na), potassium (K) and glucose level; and blood alcohol concentration) on emergency admission.
Patient samples were divided into a training set, which was used for predictor discovery and supervised classification to generate a plausible model, and a test set, which was used to evaluate the performance of the model generated from the training set. Patients with missing data were not included in further analysis. Patients registered within the 6-year period between January 2009 and December 2014 were included in the training set, with a total of 6306 patients (6161 survivors and 145 patients who died). The test set comprised 946 patients (923 survivors and 23 patients who died) registered within the 1-year period between January 2015 and December 2015. To reduce the sample for data analysis, sample similarity was assessed based on the Euclidean distance of the quantitative data.41 The sample reduction used the Euclidean distance computed with the dist function in the stats package in R (R Foundation for Statistical Computing, Vienna, Austria). Sample reduction decreases the data size and thus speeds up the calculations in the analysis.42 However, considering the exploratory nature of this study, both all samples (n=6306) and the reduced samples (n=1510) in the training set were analysed during ML classification.
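The paper performed the Euclidean-distance computation with R's dist function but does not state the exact retention rule. The sketch below is a hypothetical illustration in Python of one such similarity-based reduction: all minority-class (died) samples are kept, while a majority-class (survived) sample is kept only if it is not a near-duplicate of one already retained; the threshold and the rule itself are assumptions, not the authors' procedure.

```python
import numpy as np

def reduce_samples(X, y, threshold=1.0, majority_label=0):
    """Illustrative distance-based sample reduction.

    Keeps every minority-class sample; keeps a majority-class sample
    only if it lies farther than `threshold` (Euclidean distance)
    from every majority-class sample already kept.
    """
    kept_idx = []
    kept_pts = np.empty((0, X.shape[1]))
    for i, (row, label) in enumerate(zip(X, y)):
        if label != majority_label:
            kept_idx.append(i)          # always keep patients who died
            continue
        # keep a survivor only if no previously kept survivor is close
        if kept_pts.shape[0] == 0 or np.min(
                np.linalg.norm(kept_pts - row, axis=1)) > threshold:
            kept_idx.append(i)
            kept_pts = np.vstack([kept_pts, row])
    return kept_idx
```

On a toy example with two near-identical survivors and one death, the second survivor is dropped while the death is always retained, shrinking the majority class without touching the minority class.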
The present study provides a performance comparison of the three different ML classifiers (LR, SVM and DT).
The LR classifier used the glm function in the stats package in R V.3.3.3. Univariate LR analyses were initially performed to identify significant predictor variables of mortality risk. A stepwise LR analysis was then carried out to control for the effects of confounding variables and identify the independent risk factors of mortality. The independent risk factors obtained from the LR were also used as the selected features for the implementation of the SVM and DT, given their importance in determining mortality risk.
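The screening-then-model workflow above was done in R (glm with stepwise selection). As a rough Python analogue on synthetic data, the sketch below screens variables univariately and then fits a logistic model; scikit-learn has no built-in stepwise procedure, so SelectKBest is used here purely to approximate the idea, and all data and parameters are illustrative.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the registry data: 200 patients, 8 variables,
# with the outcome driven by the first two variables only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Univariate screening (analogous to the univariate LR step) followed
# by a logistic model on the retained features.
model = make_pipeline(SelectKBest(f_classif, k=2),
                      LogisticRegression(max_iter=1000))
model.fit(X, y)
mortality_prob = model.predict_proba(X)[:, 1]  # predicted risk per patient
```

The retained features can then be reused as the "selected features" input for other classifiers, mirroring how the paper feeds the LR risk factors into the SVM and DT.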
Support vector machine
The SVM classifier used the tune.svm and svm functions in the e1071 package in R. In the training set, the SVM classifier was used to predict mortality with either all 32 variables or the 12 selected features, and with either all samples or the reduced samples. The mapping procedure was performed using a kernel function, which is a matrix of pairwise similarities between data points, such as a linear, polynomial or radial basis function (RBF).43 In the present study, the RBF kernel was used because it can capture non-linear interactions between class labels and features.44 The two main parameters of the SVM with the RBF kernel are the penalty parameter C and the kernel hyperparameter γ. The penalty parameter C determines the trade-off between fitting error minimisation and model complexity, whereas the hyperparameter γ defines the non-linear feature transformation on to a higher dimensional space and controls the trade-off between errors due to bias and variance in the model.45 The optimal operating point was estimated by varying the parameters C and γ using a grid search for each combination of feature selection and dimension reduction with 10-fold cross-validation.44
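The tuning strategy above (an RBF-kernel SVM with C and γ grid-searched under 10-fold cross-validation) was implemented in R with e1071's tune.svm. The following Python sketch shows the same strategy on synthetic data with a non-linear class boundary; the grid shown is a small illustrative subset of the powers-of-two ranges, not the paper's full grid.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic data with a radial (non-linear) class boundary, the kind
# of structure the RBF kernel is suited to capture.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.5).astype(int)

# Grid-search C and gamma over powers of two with 10-fold
# cross-validation, mirroring the paper's tuning procedure.
param_grid = {"C": [2.0 ** x for x in range(-2, 3)],
              "gamma": [2.0 ** x for x in range(-4, 0)]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
search.fit(X, y)
best_C = search.best_params_["C"]
best_gamma = search.best_params_["gamma"]
```

The pair (best_C, best_gamma) with the highest cross-validated accuracy is then used to fit the final classifier, exactly as the paper selects C=0.25 and γ=0.00390625 from its grid.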
DT by CART, based on the Gini impurity index, used the rpart function in the rpart package in R. The CART analysis searched for the split on the variable that would partition the data into two groups: a group of mostly ‘0s’ (patients who survived) and a group of mostly ‘1s’ (patients who died).46 47 Using the best overall split, the CART model partitioned the data and assigned a predicted class to each subgroup. CART repeated this process on each predictor in the model, identifying the best split by iteratively testing all possible splits and selecting the one producing the greatest reduction in impurity.38–40 CART proceeded recursively in this manner until the specified stopping criteria were met, a specified number of nodes was created or no further reduction in node impurity could be obtained.38–40
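The recursive Gini-based splitting described above was run in R with rpart. A minimal Python sketch of the same CART idea, on synthetic data where the outcome is determined by a threshold on the first variable, looks like this; the data, depth limit and variable names are illustrative only.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: the outcome depends only on whether the first
# variable exceeds a cut-off, which CART should rediscover.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = (X[:, 0] > 0.5).astype(int)

# CART-style tree: each node greedily picks the split that most
# reduces Gini impurity, then recurses on the two child subgroups.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)
root_feature = tree.tree_.feature[0]       # variable chosen at the root split
root_threshold = tree.tree_.threshold[0]   # its optimal cut-off value
```

Because the signal is a clean threshold on variable 0, the root split recovers that variable and a cut-off near 0.5, analogous to how the paper's tree identifies GCS with a cut-off of 3 at its initial split.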
An analysis of the receiver operating characteristic (ROC) curve was carried out to assess and compare the performance of the individual ML models. The predictive ability of each model was evaluated using a confusion matrix and an analysis of the area under the curve (AUC) of the different ML models.
Confusion matrix and geometric mean
The confusion matrix was used to calculate the accuracy, sensitivity and specificity of a given model from the true-negative, true-positive, false-positive and false-negative counts. Accuracy represents the overall proportion of correct classifications; sensitivity refers to the proportion of true positives that were accurately identified (ie, the percentage of patients who died and were correctly predicted to die) and specificity refers to the proportion of true negatives that were accurately identified (ie, the percentage of patients who survived and were correctly predicted to survive). In addition, because the geometric mean provides a good trade-off between sensitivity and specificity, in that better accuracy in both classes leads to a larger value, it was calculated in this study according to the methods used by Sanz et al.48
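These four metrics follow directly from the confusion-matrix cells. The sketch below computes them; the cell counts used in the example are reconstructed from the reported test-set rates of the best SVM model (23 deaths, 923 survivors) and are our back-calculation, not figures stated as counts in the paper.

```python
import math

def metrics_from_confusion(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and geometric mean from the
    four cells of a binary confusion matrix (positive class = died)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)      # deaths correctly predicted
    specificity = tn / (tn + fp)      # survivors correctly predicted
    g_mean = math.sqrt(sensitivity * specificity)
    return accuracy, sensitivity, specificity, g_mean

# Counts back-calculated (assumption) from the reported test-set
# rates of the SVM model with selected features:
# 23 deaths -> tp=20, fn=3; 923 survivors -> tn=914, fp=9.
acc, sens, spec, gm = metrics_from_confusion(tp=20, tn=914, fp=9, fn=3)
```

With these counts the function reproduces the reported 98.73% accuracy, 86.96% sensitivity, 99.02% specificity and 92.79% geometric mean, confirming the internal consistency of the reported figures.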
To compare the performance of multiple ML classifiers on multiple training datasets, a non-parametric approach was used to analyse the areas under the correlated ROC curves using the roc and roc.test functions in the pROC package in R. This non-parametric approach accounts for the correlated nature of the data when two or more empirical curves are derived from tests performed on the same individuals.49
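The correlated-AUC comparison was done in R with pROC's roc.test. As a lightweight Python illustration of why the correlation matters, the sketch below uses a paired bootstrap: the same resampled cases are scored by both classifiers in each iteration, preserving the within-patient correlation between the two ROC curves. The data and the bootstrap approach are illustrative stand-ins, not the paper's (DeLong-style) test.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Two synthetic classifiers scored on the same 300 cases; classifier A
# is less noisy and therefore has the higher true AUC.
rng = np.random.default_rng(3)
y = rng.integers(0, 2, size=300)
scores_a = y + rng.normal(scale=0.8, size=300)   # stronger classifier
scores_b = y + rng.normal(scale=1.5, size=300)   # weaker classifier

auc_diffs = []
for _ in range(500):
    idx = rng.integers(0, len(y), size=len(y))   # same resample for both
    if len(np.unique(y[idx])) < 2:
        continue                                  # need both classes for AUC
    auc_diffs.append(roc_auc_score(y[idx], scores_a[idx])
                     - roc_auc_score(y[idx], scores_b[idx]))
auc_diffs = np.array(auc_diffs)
```

The bootstrap distribution of the paired AUC difference (eg, its percentile interval) then supports a significance judgment analogous to the roc.test comparison in the paper.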
All statistical analyses were performed using SPSS V.20.0 (IBM) and R V.3.3.3. For the categorical variables, the χ2 test was carried out to determine the significance of the association between the predictor and outcome variables. For the continuous variables, the Student’s t-test was conducted to analyse normally distributed data, whereas the Kolmogorov-Smirnov test or Mann-Whitney U test was performed to compare non-normally distributed data. Results were presented as mean±SD. A P value <0.05 was considered statistically significant.
Demographic information and injury characteristics of the patients
Compared with the patients who survived, those who died had a higher AIS score for head and neck injuries but a lower AIS score for injuries to the extremities (table 1 and online supplementary figure 1). Patients who died also tended to have injuries to more body regions (a higher number of AIS locations) than those who survived. In addition, women and patients who did not wear helmets had a higher risk of mortality (table 1 and online supplementary figure 1). A statistically significant difference was observed between patients who died and those who survived in terms of age, ISS, GCS, temperature, platelet count, glucose, Hb, Hct, K, Cr, AST and ALT levels, as well as CAD incidence (table 2 and online supplementary figure 2). Because the distribution patterns of the Hb and Hct levels, as well as the AST and ALT levels, are highly similar, only one of each pair (ie, Hct and AST) was selected for further ML classification to prevent the inclusion of duplicate parameters. A total of 32 variables were therefore used as input into the ML classifiers, as an alternative to the selected features, ie, the independent risk factors identified by the LR described below.
Supplementary file 1
Supplementary file 2
Performance of ML classifiers in the training set
LR identified 12 predictors (platelet count, glucose, BUN, Cr, AST, Na level, age, GCS, temperature, number of AIS locations, ISS and HTN) as independent risk factors for mortality in motorcycle riders, for both all samples and the reduced samples.
Predictive LR models were derived separately for all samples (n=6306) and for the reduced samples (n=1510).
The LR had an accuracy of 98.64% (sensitivity of 59.31% and specificity of 99.56%) and 94.44% (sensitivity of 60.00% and specificity of 98.10%) for all samples and reduced samples, respectively. The AUCs for all samples and reduced samples were 0.9528 and 0.9524, respectively (figure 1).
Support vector machine
In the training set, the SVM classifier was applied to predict mortality using either all 32 variables or the 12 selected features, with all samples and with the reduced samples, respectively. With the use of the RBF kernel, the two parameters (C and γ) of the SVM model must be determined. The accuracy was highly robust to small changes in the hyperparameters. Thus, reasonable choices were obtained by a grid search over values of 2^x, where x is an integer between −8 and 4 for C and between −10 and −2 for γ. The values with the highest 10-fold cross-validation accuracy were C=0.25 and γ=0.00390625. With all variables input into the model, the SVM achieved an accuracy of 98.62% (sensitivity of 62.07% and specificity of 99.48%) and 94.37% (sensitivity of 59.31% and specificity of 98.10%) for all samples and reduced samples, respectively (table 3). The AUCs for all samples and reduced samples were 0.9534 and 0.9526, respectively (figure 1). With the selected features in the model, the SVM had an accuracy of 98.62% (sensitivity of 64.14% and specificity of 99.43%) and 93.84% (sensitivity of 62.76% and specificity of 97.14%) (table 3), with AUC values of 0.9517 and 0.9518 for all samples and reduced samples, respectively (figure 1).
As shown in figure 2, in the DT model, GCS was identified as the variable of the initial split, with an optimal cut-off value of >3. Among the patients with a GCS higher than 3, glucose level was selected as the variable of the second split, at a discrimination level of 180 mg/dL for all samples and 177 mg/dL for the reduced samples. A glucose level below this cut-off (180 mg/dL or 177 mg/dL, respectively) was the best predictor of mortality; the next best predictor was platelet count, with an optimal cut-off value of 201×10³/µL. For the node comprising patients with a GCS not greater than 3, an ISS below 24 and a glucose level below 218 mg/dL were significant splitting variables for both all samples and the reduced samples, along with a GCS >8 and a glucose level below 198 mg/dL; a number of AIS locations ≥3 was an additional splitting predictor for the reduced samples. With all variables used in the model, the DT had an accuracy of 98.92% (sensitivity of 62.76% and specificity of 99.77%) and 95.83% (sensitivity of 68.97% and specificity of 98.68%) for all samples and reduced samples, respectively; the corresponding AUC values were 0.8872 and 0.9289. With the selected features used in the model, the DT had an accuracy of 98.92% (sensitivity of 64.14% and specificity of 99.74%) and 95.83% (sensitivity of 70.34% and specificity of 98.53%) for all samples and reduced samples, respectively; the corresponding AUC values were 0.8872 and 0.9289 (figure 1). When the reduced samples rather than all samples were used in the DT model, the number of AIS locations was added in the split of a node, slightly increasing the sensitivity from 62.76% to 68.97% and from 64.14% to 70.34% for the inputs composed of all variables and selected variables, respectively.
In addition, when the selected features rather than all variables were used in the DT model, the K level was not used in the splitting of the node and was substituted by a cut-off value of AST (≥104 IU/L), slightly increasing the sensitivity from 62.76% to 64.14% and from 68.97% to 70.34% for all samples and reduced samples, respectively. The AUC values for all samples and reduced samples were 0.8875 and 0.9292, respectively (figure 1).
Comparison of the results of AUC analysis
When the AUCs of LR, SVM and DT were compared in the training set (table 4 and figure 1), both LR and SVM had a significantly higher AUC than DT, regardless of whether all samples or reduced samples were used and whether all variables or selected features were used. However, no significant difference was observed between the AUCs of LR and SVM under any of these conditions. In addition, the DT trained on the reduced samples had a significantly higher AUC than that trained on all samples, whereas no significant difference was observed in the AUC of the DT regardless of whether all variables or selected features were used.
Performance of ML classifiers in test set
In the test set, the LR models for all samples and for the reduced samples both had an accuracy of 98.41%, with a sensitivity of 73.91% and a specificity of 99.02%, in predicting mortality (table 3). The four SVM models had an accuracy of more than 98% and a specificity of approximately 99% in predicting mortality; among them, the SVM model for all samples with selected features had the highest sensitivity (86.96%) and geometric mean (92.79%). The four DT models had an accuracy of approximately 98% and a specificity of approximately 99% but a sensitivity of less than 70%. Because most patients survived, yielding uniformly high accuracy and specificity in predicting mortality, the comparison should focus on the sensitivity and geometric mean of the different ML models. All LR and SVM models, but not the DT models, had increased sensitivity in the test set, and the SVM model for all samples with selected features had the highest sensitivity and geometric mean.
LR is widely used in epidemiological studies for causal inference, and with built-in feature selection, it does not necessarily use all the predictors. With a relatively limited number of variables (ie, fewer than 20), LR provides estimates of the ORs of the risk factors.50 However, its limitations become apparent when a complex dataset with a large number of relevant exposures and multiple interactions is analysed.51 With many predictors, the data may not suffice to specify all interactions.51 In addition, the DT with CART analysis is exploratory and not based on a probabilistic method, which may lead to an overestimation of the importance of the risk factors or cause other potential confounders to be missed, thus affecting each patient’s estimated risk.52 In contrast to LR, which, as a linear discriminant analysis method, is significantly affected by outliers, the SVM boundary is only minimally affected by outliers that are difficult to separate, despite the complexity of the data.53 In addition, the use of kernels in the SVM model is beneficial for non-linear decision boundaries, allowing the classifier to solve more difficult classification problems than a linear analysis method can.54 In the training dataset, the three ML models (LR, SVM and DT) all had an accuracy and specificity of approximately 98% and 99%, respectively, but a sensitivity of approximately 70% or less. In this study, both LR and SVM had a significantly higher AUC than DT in the training set, regardless of whether all samples or reduced samples and whether all variables or selected features were used.
This study included different variants of SVM, with respect to sample size and feature selection, to explore possible improvements over conventional strategies such as LR or DT. Although sample reduction for SVM has been proposed to significantly improve training speed and reduce storage requirements,55 56 kernel use is a more efficient technique for representing the similarity between samples. Thus, the computational complexity of SVM is governed not wholly by the number of samples but by the number of features, which is advantageous for analysis in high-dimensional settings.54 In addition, feature selection in SVM may maximise the AUC.25 Aided by feature selection, the proposed SVM method identifies the most discriminating indexes for mortality prediction. Although LR and SVM did not differ in AUC during training, the SVM model for all samples with selected features had a significantly higher sensitivity (86.96%) in predicting the mortality of motorcycle riders in the test set than the rest of the models. The higher sensitivity of the SVM in the test set compared with the training set may be attributed to the improved quality of the registered content and less missing data after continuous quality assessment and years of working experience with the registry; such increased sensitivity was also found in the LR model in the test set. With the addition of more data to the model, the SVM model may have increased predictive power. The present study shows the feasibility of using SVM classification with feature selection to predict the mortality risk of motorcycle riders admitted to trauma care centres. However, the SVM model generally works like a black box and cannot identify the relationships between mortality and the various explanatory variables; therefore, it cannot be directly used to validate our hypothesis regarding the increased sensitivity in the test set.
This study has several limitations. First, patients with incomplete records were excluded from the analysis. This could have biased the results, which might have differed had patients with incomplete records been included and the missing data on a variable been replaced by a value drawn from an estimate of the distribution of that variable.57–59 Imputation can retain patients who have relevant features for analysis but would otherwise be excluded because of errors in data collection or recording.57–59 Second, the exclusion of patients who were declared dead (either on arrival at the hospital or at the accident scene) and of patients who were discharged from the emergency department against the advice of physicians may have introduced bias. Third, important data regarding the injury mechanism and circumstances, including motorcycle speed and type, helmet material and impact force during collision, were missing. In addition, the physiological and laboratory data collected at the time of arrival at the emergency department cannot reflect the dynamic changes in the haemodynamic and metabolic variables of patients with trauma during resuscitation. Furthermore, other DT-related methods, such as DT by C4.5,60 combined classifiers of LR and DT by C4.5,48 and random forest,61 have shown highly satisfactory performance on classification problems but were not investigated in this study. Lastly, the study population was limited to a single urban trauma centre in southern Taiwan, which may not be representative of other populations.
ML can provide a feasible level of accuracy in predicting the mortality of motorcycle riders. However, there are significant theoretical and practical challenges to the translational implementation of this approach. These results, together with those of previous studies, may help establish the first step towards the development of a prediction model that can be integrated into the trauma care system to identify an individual motorcycle rider’s risk of mortality.
P-JK and S-CW contributed equally.
Contributors P-JK wrote the manuscript. S-CW revised the manuscript. P-CC performed the statistical analyses and machine-learning programming. C-SR analysed the tables. Y-CC and H-YH collected the data and are responsible for the integrity of the registered data. C-HH designed the study and contributed to the analysis and interpretation of data. P-JK and S-CW contributed to this article equally. All authors have read and approved the final manuscript.
Funding This study was funded by Chang Gung Memorial Hospital with the grant CMRPG8F0891.
Competing interests None declared.
Ethics approval This study was approved by the Institutional Review Board of Chang Gung Memorial Hospital (reference no: 201600653B0).
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.