Objectives Prostate cancer is the second most common cause of cancer-related death in males after lung cancer, imposing a significant burden on the healthcare system in Australia. We propose the use of autoregressive integrated moving average (ARIMA) models in conjunction with population forecasts to provide for robust annual projections of prostate cancer.
Design Data on the incidence of and mortality from prostate cancer were obtained from the Australian Institute of Health and Welfare. We formulated several ARIMA models with different autocorrelation terms and chose the one that provided an accurate fit of the data based on the mean absolute percentage error (MAPE). We also assessed the model for external validity. A similar process was used to model the age-standardised incidence and mortality rates for prostate cancer in Australia during the same time period.
Results The annual number of prostate cancer cases diagnosed in Australia increased from 3606 in 1982 to 20 065 in 2012. There were two peaks observed around 1994 and 2009. Among the various models evaluated, we found that the model with an autoregressive term of 1 (coefficient=0.45, p=0.028) as well as differencing the series provided the best fit, with a MAPE of 5.2%. External validation showed a good MAPE of 5.8% as well. We project prostate cancer incident cases in 2022 to rise to 25 283 cases (95% CI: 23 233 to 27 333).
Conclusion Our study has accurately characterised the trend of prostate cancer incidence and mortality in Australia, and this information will prove useful for resource planning and manpower allocation.
- adult oncology
- health informatics
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
Applied innovative time series analysis to health data.
Unique in combining temporal trends with population forecasts to project future prostate cancer cases.
Lag time in official reported cases of prostate cancer.
Prostate cancer is the most commonly diagnosed internal cancer in Australia. In 2009, prostate cancer accounted for 33% of newly diagnosed cancers among males and 19% of all newly diagnosed cancers.1 Figures from the Australian Institute of Health and Welfare (AIHW) showed that 3079 men died from prostate cancer in 2012, representing 4.1% of all deaths in men and 12.6% of all cancer deaths in men, making prostate cancer second only to lung cancer as the most common cause of cancer death in men.2 Prostate cancer also has an impact on quality of life, with 42 500 disability-adjusted life years (DALYs) lost to prostate cancer, coming in second only to lung cancer (56 800 DALYs).3
A recent pattern of care study among Australian patients with prostate cancer across all risk categories showed that ‘radical prostatectomy’ was the most common mode of principal treatment (43%), followed by ‘no active treatment’ (24%) and ‘external beam radiation therapy’ (16%).4 Based on data from an Australian and New Zealand registry of men diagnosed with prostate cancer and reported to the registry between 2015 and 2016, the mean age at diagnosis was 68 years, and the main mode of diagnosis was transrectal ultrasound guided biopsy, although transperineal biopsy was observed to be becoming more common over time.5
There is significant temporal variation in prostate cancer incidence associated with changing rates of prostate-specific antigen (PSA) testing and changes in diagnostic procedures. Prostate cancer incidence rose steeply with the introduction of PSA testing in the early 1990s, peaking in 1994 and then declining from 1997 to 1999, followed by an increasing trend until 2009.6 We believe that the peak in incidence seen in 1994 could be attributed in part to the listing of the PSA case-finding test on the Medicare Benefits Schedule that year, which enabled men to access this service at little or no cost to the consumer. The peak in 2009 could be attributed to changes in diagnostic procedures.1 7 Family history, ethnicity, lifestyle and environmental factors are risk factors for developing prostate cancer. Age is a particularly strong risk factor: the risk of developing prostate cancer has been shown to increase dramatically from 200 per 100 000 males (among those aged 50–54 years) to 1000 per 100 000 males (among those aged 65–69 years).1
Projections of prostate cancer would be useful for clinicians, health administrators and researchers in several ways. Forecasts of prostate cancer volume will aid in the planning of facilities, equipment and staffing allocation, especially for regional areas. Modelling prostate cancer incidence and mortality will help researchers understand the determinants of disease incidence and mortality at the population level. Such predictions are especially important for health systems where there is an inherent lag time between diagnosis of cases and when they are officially notified and reported publicly.
Current methods to forecast prostate cancer (and other types of cancer) in Australia include linear and logarithmic trends8 and the age-period-cohort (APC) model with a power link.6 Their limitations include an inability to incorporate population demographic changes and to adequately account for possible temporal correlation in the data; in addition, the APC model is only able to provide estimates for broader age and period groups. In recent years, autoregressive integrated moving average (ARIMA) models have been increasingly applied in the healthcare field, for instance, to look at trends and predictions in tuberculosis cases among specific patient demographics using routinely collected notifiable data, to examine trends in quality of care indicators among patients with prostate cancer using data from the Prostate Cancer Outcomes Registry-Victoria9 and to forecast monthly ambulance cases attended for hypoglycaemia and hyperglycaemia among patients with diabetes over a 7-year period using data from Ambulance Victoria.10 An ARIMA model requires specification of the autoregressive, moving average and integration (differencing) terms. This enables the data analyst to allow for seasonal variation, account for temporal correlation, incorporate covariates and then undertake forecasting. The performance of ARIMA models in forecasting routinely collected healthcare data has been found to be comparable to that of competing time series models, including the Bayesian shared two-component model.11
We hypothesise that formulating an appropriate ARIMA model and incorporating demographic male population projections would help improve the predictive capability of the model to forecast prostate cancer cases in Australia till 2022.
Research design and methods
Study design, population and setting
Data on the incidence of prostate cancer were obtained from the Australian Institute of Health and Welfare (AIHW), which included data from the 2013 version of the Australian Cancer Database (ACD)1 at the time of data analysis. In Australia, cancer is a notifiable disease. All hospitals, pathology laboratories, radiotherapy centres and registries of births, deaths and marriages report cancer cases and deaths to the state/territory population-based cancer registry, from which data are compiled into the ACD by the AIHW on an annual basis. Cancer reporting and registration is a dynamic process and records in the state and territory cancer registries may be modified if new information is received. As a result, the number of cancer cases reported by the AIHW for any particular year may change slightly over time and may not always align with state and territory reporting for that same year. The 41 286 new cases of cancer for 2013 were based on estimates made by the AIHW, because the 2013 incidence data for New South Wales were not available at the time of compilation of the 2013 ACD. Prostate cancer cases were defined by the 10th revision of the International Statistical Classification of Diseases and Related Health Problems diagnostic code C61 (malignant neoplasm of prostate).
Mortality data were collated by AIHW from the National Mortality Database. Cause of Death Unit Record File (CODURF) data were provided to the AIHW by the Registries of Births, Deaths and Marriages and the National Coronial Information System (managed by the Victorian Department of Justice) and included cause of death coded by the Australian Bureau of Statistics (ABS). Deaths registered in 2012 and earlier were based on the final version of the ABS CODURF; deaths registered in 2013 and 2014 are revised and preliminary versions, respectively, and subject to revision by the ABS.
The annual projection of males aged 50 years and above in Australia was obtained from the ABS. The ageing of Australia’s population was expected to continue over the future period, which was postulated to be the result of sustained below replacement levels of fertility combined with increasing life expectancy at birth. Three main series of projections were selected from a possible 24 individual combinations of the various national-level assumptions by the ABS. Series B reflects current trends in fertility, life expectancy at birth and net overseas migration, whereas series A and series C are based on high and low assumptions for each of these variables, respectively.12
The study period included cases diagnosed from 1982 to 2012, and data were aggregated and analysed annually based on the year of diagnosis. The data were divided into training and validation datasets. Because we expected the models to perform well on the data from which they were derived, validation on an ‘external dataset’ was performed. Model generation was based on the data from 1982 to 1999 (training dataset) and model validation was based on the data from 2000 to 2013 (validation dataset). Thereafter, we forecasted annual values from 2014 to 2022.
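As an illustration of the temporal split described above, the following minimal sketch separates an annual series into training years (up to 1999) and validation years (2000 onwards). The function name `split_by_year` and the placeholder values are assumptions made for illustration; they are not AIHW data and not the authors' Stata code.

```python
def split_by_year(series, last_training_year):
    """Split a {year: value} mapping into training and validation parts."""
    train = {y: v for y, v in series.items() if y <= last_training_year}
    valid = {y: v for y, v in series.items() if y > last_training_year}
    return train, valid


# Hypothetical placeholder values (a simple linear ramp), not AIHW counts.
series = {year: 100 + 10 * (year - 1982) for year in range(1982, 2014)}

train, valid = split_by_year(series, last_training_year=1999)
print(min(train), max(train), len(train))
print(min(valid), max(valid), len(valid))
```

Splitting on calendar year rather than at random keeps the validation years strictly after the training years, which is what an out-of-sample forecast check requires.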
We studied three different end points: annual cases of prostate cancer, annual age-standardised incidence rate and annual age-standardised mortality rate. The standardisation was based on a weighted average of the age-specific incidence (or mortality) rates for a given year, with the weights determined by a specified ‘Australian’ population.
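The weighted-average standardisation described above can be sketched as follows. This is a minimal illustration of direct age standardisation under assumed inputs: the age groups, counts and standard population shown are hypothetical and are not the AIHW figures or the actual Australian standard population.

```python
def age_standardised_rate(cases, population, standard_population, per=100_000):
    """Directly standardised rate: age-specific rates weighted by the
    share of each age group in the standard population."""
    total_std = sum(standard_population.values())
    rate = 0.0
    for age_group, std_pop in standard_population.items():
        age_specific = cases[age_group] / population[age_group]
        rate += age_specific * std_pop / total_std
    return rate * per


# Hypothetical two-group example (values invented for illustration).
cases = {"50-64": 200, "65+": 600}
population = {"50-64": 100_000, "65+": 100_000}
standard_population = {"50-64": 60_000, "65+": 40_000}

asr = age_standardised_rate(cases, population, standard_population)
print(round(asr, 1))  # rate per 100 000
```

Because the weights come from a fixed standard population, changes in the standardised rate over time reflect changes in age-specific rates rather than changes in the age structure.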
Patient and public involvement
There was no patient or public involvement in this research as the de-identified data were obtained from a published secondary source and aggregated at the annual level before analysis. Patients and the public were not involved in the design or planning of the study.
We sequentially formulated several different ARIMA models and chose one which provided an accurate fit of the data based on the mean absolute percentage error (MAPE) while remaining parsimonious. MAPE is calculated as $\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{O_i - P_i}{O_i}\right|$, where $O_i$ and $P_i$ are the observed and predicted values for the specific year ‘i’, respectively, and $n$ is the number of years. A lower MAPE would indicate a better model fit. For modelling the number of prostate cancer cases, we also included the ‘number of males aged 50 years and above’ as a predictor variable.
We visually examined the series for non-stationarity in addition to formally testing via the Dickey-Fuller test, as the latter is known to be underpowered with a small number of time points. We undertook differencing of the series in the event the series was not stationary. Based on the final selected model, we forecasted the annual number of cases expected to be diagnosed in Australia from 2014 to 2022. The 95% CIs were calculated from the mean square errors of the model.
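For readers who wish to reproduce the fit statistic, a minimal sketch of MAPE follows. It assumes the usual definition (the mean of the absolute errors divided by the observed values, expressed as a percentage); the function name `mape` and the example values are illustrative only.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    total = sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Hypothetical values: relative errors of 10%, 5% and 0%.
observed = [100, 200, 400]
predicted = [110, 190, 400]
fit_error = mape(observed, predicted)
print(fit_error)
```

Note that MAPE is undefined when any observed value is zero, which is not a concern for national annual case counts but may matter for small-area data.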
We also undertook two sensitivity analyses for the covariate ‘the number of males aged 50 years and above’ based on an inflation scaling factor on two different assumptions around trends in fertility, life expectancy at birth and net overseas migration. We similarly modelled the age-standardised incidence and age-standardised mortality rates for prostate cancer in Australia using the same approach, but did not include age as a covariate since the data were already standardised for age. Data analysis was undertaken in Stata V.14 (StataCorp, College Station, Texas, USA), with level of significance set at 5%.
The annual number of prostate cancer cases diagnosed in Australia increased from 3606 in 1982 to 20 065 in 2012 (figure 1). There were two peaks observed around 1994 and 2009, which have been thought to be related to PSA testing being listed in the Medicare Benefits Schedule in 1989 and subsequent changes in diagnostic procedures.
For modelling prostate cancer cases, we evaluated a number of formulations of the ARIMA model (table 1). We found that ARIMA (1,1,0) with the covariate ‘men aged 50+ years’ provided the lowest MAPE of 5.2%. The Dickey-Fuller test was not significant (p=0.597), but we still chose to undertake differencing of the series as the trend appeared non-stationary (figure 2). MAPE for the out-of-sample forecast was understandably slightly higher at 5.8% but reasonably close to the 5.2% for the in-sample forecasts. Table 2 shows the forecasted annual volume of prostate cancer cases from 2013 to 2022. We estimate that there will be 25 283 (95% CI: 23 233 to 27 333) cases of prostate cancer in Australia by 2022. Depending on trends in fertility, life expectancy and net overseas migration, we estimate that this forecast could range from 25 275 to 25 299.
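To make the ARIMA (1,1,0) structure concrete, the sketch below differences a series once, fits a single autoregressive term to the differences and projects forward with approximate 95% intervals. It is a simplified stand-in and not the authors' analysis: it uses conditional least squares rather than the estimation performed in Stata, it omits the ‘men aged 50+ years’ covariate, its intervals ignore parameter uncertainty, and the input values are hypothetical.

```python
import math


def fit_ar1_on_differences(series):
    """Least squares fit of d_t = c + phi * d_(t-1) + e_t, where d_t is
    the first difference of the series. Returns (c, phi, sigma2)."""
    d = [b - a for a, b in zip(series, series[1:])]
    x, y = d[:-1], d[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    residuals = [yi - (c + phi * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    return c, phi, sigma2


def forecast(series, c, phi, sigma2, horizon, z=1.96):
    """Point forecasts and approximate intervals for the undifferenced series."""
    level = series[-1]
    diff = series[-1] - series[-2]
    psi = 0.0          # running psi weight: sum of phi**k for k = 0..h-1
    psi_sum_sq = 0.0   # cumulative sum of squared psi weights
    out = []
    for h in range(1, horizon + 1):
        diff = c + phi * diff      # forecast of the next difference
        level += diff              # integrate back to the original scale
        psi += phi ** (h - 1)
        psi_sum_sq += psi ** 2
        half_width = z * math.sqrt(sigma2 * psi_sum_sq)
        out.append((level, level - half_width, level + half_width))
    return out


# Hypothetical annual counts, invented for illustration (not AIHW data).
series = [3600, 3900, 4300, 4600, 5100, 5500, 6100, 6800, 7400, 8200, 9100, 10300]
c, phi, sigma2 = fit_ar1_on_differences(series)
projections = forecast(series, c, phi, sigma2, horizon=3)
for point, lower, upper in projections:
    print(round(point), round(lower), round(upper))
```

The intervals widen with the forecast horizon because errors in the differences accumulate when the series is integrated back to the original scale.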
Age-standardised prostate cancer rates showed an increasing trend over the years (figure 2). Similarly, for age-standardised prostate cancer incidence, ARIMA (1,1,0), with one autoregressive term, no moving average term and differencing, provided the lowest MAPE of 4.85% (table 3). The out-of-sample MAPE was similar (4.86%). We estimate the annual age-standardised incidence rate will increase from 161 (per 100 000 men) in 2013 to 178 (per 100 000 men) in 2022 (table 4).
Age-standardised mortality rates gradually increased from the 1980s until about 1995, after which there was a gradual decline (figure 3). Among the various models considered, we found that ARIMA (1,1,0), with just the differencing term and an autoregressive term of 1, provided a reasonably good MAPE of 3.24% and a MAPE of 3.22% for the out-of-sample forecasts (table 5). More complex models (eg, ARIMA (1,1,1)) improved MAPE only marginally (3.20% vs 3.24%). We estimate that the age-standardised prostate cancer mortality rate will decrease from 27.7 (per 100 000 men) in 2013 to 25.7 (per 100 000 men) in 2022 (table 6).
Our study has accurately characterised the trend of prostate cancer volume, age-standardised incidence and mortality rates in Australia. We have shown that the inclusion of demographic projections, specifically the forecast of ‘males aged 50 years and above’ helps to improve the forecast of prostate cancer cases in Australia.
We postulate that there are several reasons why the incidence of prostate cancer in Australia has risen in the last few years and mortality rates have declined. Some of the increase in incidence can be attributed to an ageing population, as evidenced from a less steep incline in prostate cancer incidence after age-standardisation that we see in our data. Another potential explanation is increased detection rate following PSA testing, which is one of two common tests used by clinicians to detect possible signs of prostate cancer, with the other being digital rectal examination. Early detection of prostate cancer and better and timely treatment of patients are possible reasons why the mortality rates are seeing a decline.
Our finding that the age-standardised incidence of prostate cancer in Australia is increasing while mortality is declining is consistent with a recent large-scale Global Burden of Disease study.13 This study found that age-standardised incidence rates showed an increasing trend across all countries grouped into quintiles based on their socioeconomic index, but the increase was more marked among those with high-middle and high socioeconomic index. Similarly, the authors found that age-standardised death rates declined in countries with high and high-middle socioeconomic index until the early 2000s and stabilised subsequently. The authors reported a 190% increase in age-standardised prostate cancer incidence rate between 1990 and 2015, along with a 15% reduction in age-standardised mortality rate for the same time period.13
The number of elderly Australians (aged 65+ years) has been shown to increase from one in seven people in 2011 to nearly one in six people in 2016, and this proportion has grown steadily since 1911, when the number was 1 in 25.14 This would have an obvious impact on the increase in the number of elderly people with chronic diseases including cancer. We have shown that incorporating this change in demographics along with trends in disease incidence helps improve forecasts.
We have modelled annual prostate cancer for the whole of Australia. It is also possible to create projections at the small area level (eg, state, local government area or statistical local area levels), but this would involve quantifying and specifying spatial dependencies in the data, which is beyond the scope of this research. It is possible, though, for local health areas or hospitals to use this methodology to forecast their prostate cancer cases. Other studies have examined the geographical variation of prostate cancer and studied its relationship with areal factors such as socioeconomic disadvantage, with mixed findings,15–23 but the use of geospatial models to forecast disease incidence and mortality is limited in the literature.
We also note that other clinical variables such as changes in the type of treatment (more novel hormonal agents, immunotherapy and radioligand treatment) could affect trends in survival, and this could be incorporated in the model when such data become readily available. A recent study has also postulated that detection bias due to changes in screening protocol may affect trends in incidence and mortality and this needs to be considered when interpreting the data.13 For instance, in the USA, a significant decline in screening rates has been attributed to the 2012 United States Preventive Services Task Force (USPSTF) recommendation against routine PSA screening.24 More recent Australian data are needed to evaluate the impact of the 2012 USPSTF guideline.
We have shown that ARIMA models can be easily implemented to forecast prostate cancer, and this methodology can be extended to other cancers, non-communicable diseases, infectious diseases and morbidities such as patient risk factors. We also used MAPE to evaluate the fit of the model and to compare model performance. Other statistics can also be used, such as the mean squared error, which is computed from the squared error values, but the latter is not relative to the magnitude of the observations and is less intuitive for clinicians to interpret.
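The point about scale can be shown with a small numerical sketch. The two helper functions and the values below are hypothetical and assume the usual definitions of MAPE and mean squared error: multiplying every observation and prediction by 10 leaves MAPE unchanged but inflates the mean squared error 100-fold.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / len(observed)


def mse(observed, predicted):
    """Mean squared error, in squared units of the observations."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical small-volume and large-volume series with the same relative errors.
small_obs, small_pred = [100, 200], [110, 190]
large_obs, large_pred = [1000, 2000], [1100, 1900]

print(mape(small_obs, small_pred), mape(large_obs, large_pred))
print(mse(small_obs, small_pred), mse(large_obs, large_pred))
```

This is why a relative measure is convenient when comparing fits across end points measured on different scales, such as case counts and rates per 100 000.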
Our model generally predicted lower prostate cancer volume than AIHW (eg, 24 146 vs 25 310 for the year 2020) and higher age-standardised incidence rates compared with AIHW (eg, 174.8 vs 163.5 (per 100 000) for 2020). These differences are mainly due to the different forecasting methods used. The AIHW used a logarithmic link function with power law for projections.
We acknowledge that the time frame of the study is a limitation. Specifically, prostate cancer data from AIHW were only available up to 2013 at the time of analysis, and we unfortunately did not have access to more recent data. However, based on clinical feedback, we do not think that the diagnosis and clinical management patterns of patients with prostate cancer have changed substantially over the period from 2014 to 2019, but we recommend a re-run of the analysis once updated data are available. A similar time lag between data availability and analysis/reporting of results was also observed in another closely related study in America, which looked at metastatic prostate cancer from 2004 to 2014, forecast the cases and incidence rates for 2015 to 2025 and reported the results in 2018.25
We selected ‘men aged 50 years and above’ as a covariate and it is possible to use a different age cut-off. However, this cut-off was based on clinical practice guidelines for offering PSA testing to men every 2 years from age 50 to age 69 years.3 Based on the same guideline, PSA testing is also not recommended for a man who is unlikely to live for another 7 years, but this would be difficult to apply without adequate risk prediction models. Prostate cancer cases have been noted to mirror PSA testing trends, and the latter can potentially be used to model prostate cancer incidence. There were 778 469 PSA tests reportedly recorded as Medicare item number 66655 in 2012. However, data on PSA tests from Australia’s Medicare Benefits Schedule (MBS) are not reliable, with reports that they could be underestimates by up to 40%. The uncertainties regarding the cost-benefits of PSA screening and changing guidelines and recommendations over the years could contribute to the trends in prostate cancer.
The formal comparison of various time series models was also not the focus of this study. This would entail identifying competing models, conducting simulation studies to cover the various time trend scenarios for prostate (and other) cancers and evaluating the forecasts using formal econometric statistical methods. However, it should be noted that the performance of ARIMA models has been found to be comparable to that of other more complex models such as the two-component Poisson model fitted within the Bayesian framework, for surveillance and forecasts using notifiable disease registries,11 as well as the Grey model for predicting incidence of hepatitis B.26
Future studies could examine how the time trends could change according to specific demographic subgroups (eg, males born overseas or indigenous Australian males), as one recent study has shown different trends in metastatic prostate cancer among black, white, Hispanic and Asian males in America from 2004 to 2014.25 Similarly, we recommend that forecasts are also generated by geographic regions (eg, state, local government area, etc) as we expect population and patient risk profiles to vary geographically.
ARIMA models can be easily implemented in healthcare settings and the forecasts can be improved by incorporating demographic projections.
The authors would like to thank the Australian Institute of Health and Welfare and the Australian Bureau of Statistics for providing data that were used in the analysis for this study.
Contributors AE conceived the study, collated the data, analysed and wrote the initial draft as well as the final manuscript. SME provided critical input in the design of the study and writing the manuscript. JM provided critical input in the design of the study and writing the manuscript. FS provided critical input in the design of the study and writing the manuscript.
Funding statement The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient consent for publication Not required.
Ethics approval Ethics approval for this study was provided by Monash University Ethics Committee (Project ID: 12654).
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data are available on reasonable request. Extra data are available by emailing firstname.lastname@example.org