Injury is an increasing public health problem in China, and reducing the losses due to injuries has become a main priority of public health policy. Early warning of injury mortality based on surveillance information is essential for reducing or controlling the disease burden of injuries. We conducted this study to assess the feasibility of applying autoregressive integrated moving average (ARIMA) models to predict mortality from injuries in Xiamen.

The monthly mortality data on injuries in Xiamen (1 January 2002 to 31 December 2013) were used to fit the ARIMA model with the conditional least-squares method. The values p, d and q in the ARIMA (p, d, q) model refer to the numbers of autoregressive lags, differences and moving average lags, respectively. The Ljung–Box test was used to test the series and the model residuals for ‘white noise’. The mean absolute percentage error (MAPE) between observed and fitted values was used to evaluate the predictive accuracy of the constructed models.

A total of 8274 injury-related deaths in Xiamen were identified during the study period; the average annual mortality rate was 40.99/100 000 persons. Three models, ARIMA (0, 1, 1), ARIMA (4, 1, 0) and ARIMA (1, 1, (2)), passed the parameter (p<0.01) and residual (p>0.05) tests, with MAPE values of 11.91%, 11.96% and 11.90%, respectively. We chose ARIMA (0, 1, 1) as the optimum model because its MAPE value was similar to that of the other models and it had the fewest parameters. According to the model, 54 persons would die from injuries each month in Xiamen in 2014.

The ARIMA (0, 1, 1) model could be applied to predict mortality from injuries in Xiamen.

Few studies have used the autoregressive integrated moving average (ARIMA) model to forecast injury mortality. Our modelling approach shows that the ARIMA (0, 1, 1) model could reflect the trend of injury mortality in Xiamen and forecast mortality reliably for a short time period.

Some data reported in the Death Surveillance System were collected retrospectively from the bereaved, who did not necessarily know all of the illnesses of the deceased. Possible biases in disease reporting might affect the precision of our model.

The model did not consider the possible impact factors related to injury mortality, such as behavioural factors and weather changes.

Injuries that affect all ages of the population have become a serious worldwide public health threat. Deaths caused by injuries have a serious impact on communities and families.

The autoregressive integrated moving average (ARIMA) model, one of the classic methods of time series analysis, was first proposed by Box and Jenkins in 1976.

Predicting the number of deaths due to injuries in future months will generate useful information for designing the strategies of public health services. The objective of this study was to describe the temporal trends of injury mortality in Xiamen and to determine the possibility of applying ARIMA models to forecast injury mortality in the upcoming months.

Xiamen is a coastal city located in the southeast of China, with a population of nearly two million in 2013. It covers six districts, including two rural regions (Xiang'an and Tong'an districts), two suburbs (Haicang and Jimei districts) and two urban areas (Huli and Siming districts). The Death Surveillance System has covered the whole city of Xiamen since 2002, when cause of death was classified according to the International Classification of Disease, Tenth Revision (ICD-10). In this study, demographic data were retrieved from the Xiamen Municipal Public Security Bureau, and monthly injury mortality rates in Xiamen were provided by the Xiamen Center for Disease Control and Prevention (CDC), which is responsible for managing the Death Surveillance System. The death and demographic data in this study only included registered Xiamen households. The ICD-10 codes of injury included all ‘V’, ‘W’, ‘X’ and ‘Y’ codes.

The values p, d and q in the ARIMA (p, d, q) model refer to the numbers of autoregressive (AR) lags, differences and moving average (MA) lags, respectively. Brackets indicate the specific lags retained when not all lags up to a given order are statistically significant; for example, (2) denotes a term at lag 2 only. The Box–Jenkins methodology was adopted to fit the ARIMA (p, d, q) model. Before constructing the model, we had to determine whether the observed series was stationary, that is, whether its mean value remained constant over time. A non-stationary series was transformed into a stationary one by taking a suitable difference. The Ljung–Box test was used to test the series and the model residuals for ‘white noise’. Three steps were performed to determine the orders of the ARIMA model: model identification, parameter estimation and testing, and application. The orders of the model were identified initially from the cut-off pattern of the autocorrelation function (ACF) and the decay pattern of the partial ACF (PACF). Schwarz's Bayesian criterion (SBC) was used to select an optimal model, with lower values being better. The conditional least-squares method was used for parameter estimation, and the t test was used for parameter testing. A parameter without statistical significance was removed from the model. The mean absolute percentage error (MAPE) was calculated to assess forecast accuracy and to select an optimum model. A lower MAPE value indicates a better fit to the data.
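The MAPE described above can be illustrated with a minimal Python sketch. It assumes the usual definition, the mean of |observed − fitted| / observed expressed as a percentage; the function name `mape` and the example counts are hypothetical and are not taken from the study, whose analysis was run in SAS.

```python
def mape(observed, fitted):
    """Mean absolute percentage error between two equal-length series, in percent."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    if not observed:
        raise ValueError("need at least one observation")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    # Average the absolute relative errors, then scale to a percentage.
    total = sum(abs((o - f) / o) for o, f in zip(observed, fitted))
    return 100.0 * total / len(observed)


# Hypothetical monthly death counts and fitted values (not study data).
observed = [50, 60, 55, 40]
fitted = [45, 66, 55, 44]
print(round(mape(observed, fitted), 2))  # relative errors 10%, 10%, 0%, 10% -> 7.5
```

Because the error is divided by the observed value, this measure cannot be used for months with zero observed deaths.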

The rates reported were the mean annual rates. Medians with Q1–Q3 were used to describe the distribution of age. The Cochran–Armitage trend test was used to examine the temporal trends in annual injury mortality by gender. Statistical significance was defined as p<0.05. All data analysis was performed using SAS V.9.1.

In total, there were 8274 injury-related deaths in Xiamen from 2002 to 2013 (5121 male and 3153 female), with the trough in December (632 cases) and the peak in August (749 cases). Median age was 49 years (Q1–Q3, 34–73) for the total deaths in this study, 45 years (Q1–Q3, 31–63) for male deaths and 62 years (Q1–Q3, 40–81) for female deaths. The average annual injury mortality rate during these years was 40.99/100 000 persons, with nearly 1.61 times more male than female deaths. There was a statistically significant declining trend year by year in total mortality rates from injuries during this period (

Annual mortality rate of injuries in Xiamen, China, from 2002 to 2013.

The temporal trend test above showed that the series of monthly injury mortality data in Xiamen from 2002 to 2013 was non-stationary. Therefore, we took the first-order difference to remove the trend and stabilise the mean. After first-order differencing (d=1), the series was not ‘white noise’ (p<0.01) and was dispersed horizontally around zero (

Series of monthly mortality after first differencing. The data after first-order differencing are dispersed horizontally around zero, suggesting they are stationary.
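As a minimal sketch of first-order differencing (d=1), the Python snippet below subtracts each month's value from the next. The function name `difference` and the six counts are hypothetical illustrations, not study data.

```python
def difference(series, lag=1):
    """Return the differenced series y[t] - y[t - lag]."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be at least 1 and shorter than the series")
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Hypothetical monthly counts with a downward drift (not study data).
counts = [70, 66, 68, 61, 63, 58]
diffed = difference(counts)
print(diffed)  # [-4, 2, -7, 2, -5]
```

The differenced series is one observation shorter than the original and fluctuates around a roughly constant level even though the original counts drift downward.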

Autocorrelation function and partial autocorrelation function (ACF and PACF) graphs after first differencing. The shaded portion is the 95% CI range. The ACF cuts off at lag 1 with slow decay in the PACF, suggesting a moving average model (q=1).

By repeatedly adjusting the model orders, guided by the SBC values (also called the Bayesian information criterion, BIC) that SAS lists from low to high, we found that three models ultimately passed the parameter tests (p<0.01) and residual tests (p>0.05): ARIMA (0, 1, 1) with SBC value 1021.28 and MAPE value 11.91%; ARIMA (4, 1, 0) with SBC value 1040.80 and MAPE value 11.96%; and ARIMA (1, 1, (2)) with SBC value 1026.19 and MAPE value 11.90%. We chose the ARIMA (0, 1, 1) model as the most appropriate model because its MAPE value was similar to that of the other models and it had the fewest parameters. The ACF and PACF graphs for residuals of the ARIMA (0, 1, 1) model confirmed that the data were fully modelled and that the model was suitable to be used for prediction (

Autocorrelation function and partial autocorrelation function (ACF and PACF) graphs of the residuals for the autoregressive integrated moving average (0, 1, 1) model. The shaded portion is the 95% CI range. Because the autocorrelations of the residuals lie within the 95% CI limits, the residuals are considered to be white noise, indicating that this model is appropriate for prediction.

Actual and predicted mortalities and 95% CI of predicted mortalities. Most actual observed data are contained within the 95% CI of the predicted value, revealing that the prediction for the monthly injury mortality in Xiamen using the autoregressive integrated moving average (0, 1, 1) model is acceptable.

Predictions of injury mortality could generate useful information for designing the strategies of public health services. However, the causes of injury are complex and include personal, family and social factors.

Before constructing the model, we had to test for ‘white noise’; a white-noise series consists of uncorrelated random variables and therefore cannot be used to build a model.
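The white-noise check can be sketched with the Ljung–Box statistic, Q = n(n+2) Σ r_k²/(n−k) summed over lags k = 1..h, which is compared with a χ² distribution. The Python sketch below is illustrative only: the function names, the choice of h=6 lags and the example series are assumptions, and the closed-form p value used here holds only when the degrees of freedom are even. When the test is applied to model residuals, the degrees of freedom are normally reduced by the number of estimated parameters, which this sketch does not do.

```python
import math


def autocorrelations(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def ljung_box(series, max_lag=6):
    """Return (Q, p) for the Ljung-Box test with max_lag degrees of freedom."""
    n = len(series)
    if max_lag < 2 or max_lag % 2 != 0:
        raise ValueError("this sketch needs a positive, even number of lags")
    if n <= max_lag:
        raise ValueError("series is too short for the requested lags")
    r = autocorrelations(series, max_lag)
    q = n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))
    # Chi-square survival function for even df = 2m:
    # exp(-q/2) * sum_{i<m} (q/2)^i / i!
    half = q / 2.0
    m = max_lag // 2
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(m))
    return q, p


# Hypothetical, strongly autocorrelated series (not study data).
q, p = ljung_box([1, -1] * 12)
print(p < 0.01)  # True: the series is clearly not white noise
```

A small p value means the series still contains autocorrelation that a model could exploit, whereas a large p value for the residuals suggests that the fitted model has captured that structure.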

Finally, the MAPE was calculated to assess the accuracy of the forecast and to select the optimum model. A lower MAPE value indicates a better fit to the data. Among models with similar MAPE values, the one with the fewest parameters is preferred, because the parameters of an ARIMA model are difficult to interpret.
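The selection rule described here can be written as a short Python sketch using the SBC and MAPE values reported for the three candidate models. The parameter counts are inferred from the model notation (assuming no intercept is counted), and the tolerance that defines a "similar" MAPE is a hypothetical choice; neither is stated in the study.

```python
# SBC and MAPE values as reported in the Results; n_params is inferred
# from the model notation and is an assumption of this sketch.
candidates = [
    {"name": "ARIMA(0,1,1)", "sbc": 1021.28, "mape": 11.91, "n_params": 1},
    {"name": "ARIMA(4,1,0)", "sbc": 1040.80, "mape": 11.96, "n_params": 4},
    {"name": "ARIMA(1,1,(2))", "sbc": 1026.19, "mape": 11.90, "n_params": 2},
]


def choose_parsimonious(models, mape_tolerance=0.1):
    """Among models whose MAPE is within the tolerance of the best, pick the simplest."""
    best = min(m["mape"] for m in models)
    similar = [m for m in models if m["mape"] - best <= mape_tolerance]
    # Fewest parameters first; MAPE only breaks ties.
    return min(similar, key=lambda m: (m["n_params"], m["mape"]))


print(choose_parsimonious(candidates)["name"])  # ARIMA(0,1,1)
```

With these inputs the rule returns the same model that also has the lowest reported SBC.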

The use of ARIMA models enables us to create short-term predictions of injury mortality in China. However, certain points must be taken into account in the course of building these models. In particular, the predicted outcomes are sensitive to small changes in the parameters. To improve the accuracy of prediction, the most recent data should be added regularly to update the ARIMA model.
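To illustrate how new observations update the forecast, the Python sketch below applies the one-step recursion of an ARIMA (0, 1, 1) model written as y_t = y_{t−1} + e_t − θe_{t−1}, for which the next forecast is y_t − θe_t. The function name, the value θ=0.5, the start-up choice and the counts are all hypothetical; the study does not report its estimated coefficient, and this sketch assumes a model without a constant term.

```python
def ima11_forecasts(series, theta, initial_forecast=None):
    """One-step-ahead forecasts for y_t = y_{t-1} + e_t - theta * e_{t-1}.

    Returns the in-sample forecasts and the forecast for the next period.
    """
    if not series:
        raise ValueError("series must not be empty")
    # Start-up assumption: the first forecast equals the first observation.
    forecast = series[0] if initial_forecast is None else initial_forecast
    forecasts = []
    for y in series:
        forecasts.append(forecast)
        error = y - forecast            # one-step forecast error e_t
        forecast = y - theta * error    # forecast for the following month
    return forecasts, forecast


# Hypothetical monthly counts and coefficient (not study data).
history = [60, 50, 58]
fitted, next_month = ima11_forecasts(history, theta=0.5)
print(next_month)  # 56.5

# When a new month is observed, append it and recompute the forecast.
history.append(54)
_, updated = ima11_forecasts(history, theta=0.5)
print(updated)  # 55.25
```

Under these assumptions the forecasts for all later months equal the one-step forecast, so the prediction stays flat until new observations arrive. In practice the coefficient itself would also be re-estimated as data accumulate.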

There is at least one limitation in this study. The ARIMA (0, 1, 1) model could reflect the trend of injury mortality in Xiamen and forecast the future mortality reliably for a short time period. However, the model did not consider possible impact factors related to injury mortality, such as behavioural factors and weather changes (eg, rainfall, temperature).

The government urgently needs to evaluate the loss caused by injuries, using statistical methods such as time series. Our modelling approach shows that applying the ARIMA time series models to forecast injury mortality in Xiamen is feasible. ARIMA models based on historical surveillance data are important tools for monitoring and forecasting injuries.

The authors thank Director Long Dai and other colleagues in the Department of Chronic Non-communicable Disease Control and Prevention of the Xiamen Municipal CDC for data collection.

YL conceived and designed the research, and wrote the paper. TL performed the statistical analysis. MC and GC were responsible for materials and analysis tools. XW was responsible for study supervision. All the authors were involved in revision of the manuscript for important intellectual content.

This research received funding from The Fourth Period of the Xiamen Municipal Key Department Construction Project.

None declared.

The medical ethics committee of Xiamen Center for Disease Control and Prevention agreed that the use of injury mortality from the Disease Surveillance System did not involve personal private information, and the present study was retrospective without any biological experiment related to humans or animals. The committee therefore waived the need for ethics approval for utilisation of the data.

Not commissioned; externally peer reviewed.

No additional data are available.