Objective To develop a tool to inform individuals and general practitioners about benefits of lifestyle changes by providing estimates of the expected age of death (EAD) for different risk factor values, and for those who plan and decide on preventive activities and health services at population level, to calculate potential need for these.
Design Prospective cohort study to estimate EAD using a model with 27 established risk factors, categorised into four groups: (1) sociodemographic background and medical history, (2) lifestyles, (3) life satisfaction, and (4) biological risk factors. We apply a Poisson regression model on the survival data split into 1-year intervals.
Participants Total of 38 549 participants aged 25–74 years at baseline of the National FINRISK Study between 1987 and 2007.
Primary outcome measures Register-based comprehensive mortality data from 1987 to 2014 with an average follow-up time of 16 years and 4310 deaths.
Results Almost all risk factors included in the model were statistically significantly associated with death. The largest difference in EAD was between current heavy smokers and never smokers: the EAD for a 30-year-old man decreased from 86.8 years, which corresponds to the reference values of the risk factors, to 80.2 years. Diabetes decreased EAD by 6.5 years. Whole or full milk consumers had 3.4 years lower EAD compared with those consuming skimmed milk. Physically inactive men had 2.4 years lower EAD than those with high activity. Men who found their life almost unbearable due to stress had 2.8 years lower EAD.
Conclusions The biological risk factors and lifestyles, and the factors connected with life satisfaction were clearly associated with EAD. Our model for estimating a person’s EAD can be used to motivate lifestyle changes.
- multivariable prediction model
- life expectancy
- public health
- statistics & research methods
- survival analysis
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
Numerous risk factors of total mortality have been identified.
Estimated ages of death are easier to interpret in comparing different risk factors and their levels than HRs.
We compare differences in life expectancy for many different risk factors.
Estimated differences in estimated ages of death cannot be interpreted as causal effects.
Most people want to live a long and healthy life. Choices affecting the prospects of achieving this goal are continually made by individuals themselves and by health professionals. Which amenable determinants of health and longevity should be emphasised in specific individual situations? It is well known from observational epidemiological studies that risk factors describing the sociodemographic background,1 lifestyles,2 3 dietary factors,4 5 life satisfaction (LS)6–12 and metabolic health13–15 predict mortality. For example, vigorous physical activity has been found to decrease the risk of death by 22% compared with no physical activity. Smoking has been found to increase the hazard by 83% and life dissatisfaction by 49%.
Comparisons between different risk factors and their impact on survival could be carried out using expected age of death (EAD) that is easier to interpret than commonly presented HRs. Evidence-based decisions on how to improve the length of life, tailored to specific individual contexts, require reliable information on the EAD for different levels of these risk factors.
At different ages, the differences in EAD can vary considerably even if the HRs remain constant. The differences in life expectancies are generally larger for younger individuals, which illustrates the importance of lifestyle changes early in life. However, studies on the association between risk factors and EAD have rarely been reported in literature and they have generally only included a small number of risk factors simultaneously.
In this study, we analyse total mortality using a model with a large number of risk factors that have previously been found to be predictors of longevity. We include variables describing the socioeconomic background, medical history, lifestyle, LS and biological risk factors. We develop a multivariable prediction model to estimate EAD and report the results also using HRs. As biological risk factors are influenced by lifestyle and socioeconomic background factors, we apply methods developed for graphical models to avoid potential overadjustment by these intermediate risk factors.
We used data from the National FINRISK Study, in which cross-sectional health examination surveys were conducted every 5 years from 1987 to 2007 in specific areas of Finland.16 17 For each survey, a stratified, independent random sample was drawn from the general population using the national Population Information System. The age range was 25–64 years until 1992 and widened gradually in all areas to 25–74 years by 2007. The total sample size was 52 749 individuals, and the number of participants was 38 549 (73%). Participation rates decreased from 82% to 67% during these years. We excluded individuals who had more than 13 missing covariate values, as we considered that the amount of information from these individuals was too small. We also excluded individuals who had total cholesterol below 3 mmol/L or higher than 10 mmol/L, had body mass index (BMI) greater than 40 kg/m², reported cancer or myocardial infarction diagnosed by a doctor at baseline, or had a disability or disease that prevented physical exercise of 20–30 min, as the risk of death of these individuals was likely to differ too much from that of the other individuals. The analysis data set contained 35 804 individuals.
The cohorts formed by the survey participants were followed up for death using record linkage to the national Causes of Death Register maintained by Statistics Finland.18 The follow-up for death ended on 31 December 2014. Mean follow-up time was 16.03 years. During the follow-up period, 4310 deaths occurred (2689 in men, 1621 in women).
Potential risk factors
We categorised risk factors into four groups: sociodemographic background and medical history (abbreviated as background in the following), lifestyles, LS and biological risk factors. The first three groups were based on questionnaire data. Age, sex and education were included as sociodemographic background variables. Myocardial infarction of the mother or father under age 60 (except that in the 2007 survey, the age limit for mothers was 65 years) and diagnosed diabetes of the respondent were included as medical background factors. The variables indicating lifestyles included dietary variables covering fresh vegetables and fruits, type of bread spread and type of milk, as well as indicators of smoking, alcohol use and physical activity.19 20 LS variables comprised stress, accomplishments in life, stretching one’s strength to the extreme at work, getting along with spouse or children, financial situation, having a friend and prospects of attaining the goals one would like to reach. The biological risk factors (BMI, systolic and diastolic blood pressure and serum non-high-density lipoprotein (non-HDL) cholesterol) were measured with high quality.16 Details of these 27 risk factors are presented in the online supplementary material.
Age, number of cigarettes, alcohol use, BMI, blood pressure and cholesterol were modelled as continuous covariates, and all other risk factors as categorical covariates.
We performed multiple imputation (MI) to fill in the missing values.21 Twenty-five imputed data sets were generated using the classification and regression trees (cart) method22 by the mice package23 of the R software.24 As variables ‘I feel it impossible to attain the goals that I’d like to reach’ and ‘I feel that I do not have any good friends’ were asked only in two FINRISK surveys, these variables contained about 47% missing data, while the other variables related to LS contained about 21% missing data. Other variables contained 7% missing values or less. The descriptive statistics were calculated using the MI data.
We applied a Poisson regression model using the glm function of R. The survival data were split into 1-year intervals, within which the hazard was assumed to be constant, using the Epi package.25 The logarithm of the baseline hazard as a function of continuous age t in interaction with sex, and the logarithms of the HRs of the continuous risk factors, were modelled using natural cubic splines with 4 df, except for the number of cigarettes and alcohol use, which were entered as linear terms in the linear predictor. The logarithms of the HRs of the categorical covariates were directly the regression coefficients in the Poisson regression model. Our hazard model was therefore

$$h(t; \theta, x) = h_0(t; \theta_0, x_0)\,\exp\Bigl\{\sum_{j \in \mathcal{S}} s_j(x_j; \theta_j) + \sum_{k \in \mathcal{L}} \theta_k x_k\Bigr\},$$

where $s_j$ denotes the natural cubic spline of the continuous risk factor $x_j$, $\mathcal{S}$ indexes the spline-modelled risk factors, and $\mathcal{L}$ indexes the linear terms and the indicator variables of the categorical risk factors.
In the following, we use the shorthand notations $\theta$ for the parameter vector and $x$ for the risk factor vector, in which $\theta_0$ and $x_0$ correspond to the parameters, and age and sex, related to the baseline hazard $h_0$, respectively. In other words, the age-dependent baseline hazard was stratified with respect to sex, but other risk factors were assumed to act multiplicatively on the hazard.
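The splitting step and the piecewise-constant likelihood can be illustrated with a minimal Python sketch. It is not the authors' R code: the function names `split_follow_up` and `poisson_loglik` are hypothetical, the splitting is assumed to run along the age scale, and the hazard values are invented for illustration.

```python
import math

def split_follow_up(entry_age, exit_age, died):
    """Split one person's follow-up into 1-year age bands.

    Returns rows of (band_start_age, person_years, event); the event
    indicator is 1 only in the band where death occurred.
    """
    rows = []
    band = math.floor(entry_age)
    while band < exit_age:
        start = max(entry_age, band)
        stop = min(exit_age, band + 1)
        is_last = stop == exit_age
        rows.append((band, stop - start, int(died and is_last)))
        band += 1
    return rows

def poisson_loglik(rows, hazard_by_band):
    """Poisson log-likelihood (up to a constant) when the hazard is
    constant within each band: event * log(h) - h * person_years."""
    return sum(d * math.log(hazard_by_band[b]) - hazard_by_band[b] * py
               for b, py, d in rows)

# Hypothetical person: enters at age 62.5 and dies at age 65.25.
rows = split_follow_up(entry_age=62.5, exit_age=65.25, died=True)
```

Each row then acts as one Poisson observation with the logarithm of the person-years as an offset, which is why a standard Poisson regression routine can fit the hazard model.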
We applied both simple and multiple regression models. First, the associations of each risk factor with death were estimated one by one, adjusting for age and sex. The second model was fully adjusted, thus we included all risk factors in the same model. All models contained the interaction term of calendar year before year 2000 and age below 75 to account for the decreasing hazard of death in the younger ages. In the projections we set this interaction term to correspond to the calendar year ‘2000 or later, or age 75 years or older’, which is the reference category with HR equal to 1. In the online supplementary material, we present estimates also for the model in which the biological risk factors were excluded but all other risk factors were included simultaneously.
The parameter estimates were pooled over the imputed data sets. The Wald tests were based on the pooled point estimates and covariance matrices.26 We also applied the likelihood ratio test for the multiply imputed data to test if the risk factor interactions with sex were statistically significant in the model.27
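The text does not spell out the pooling formulas. A minimal Python sketch, assuming the standard Rubin's rules for a single coefficient, is given below. The log-HR values and standard errors are hypothetical, and five imputations are used for brevity although the study generated 25.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one coefficient over m imputed data sets (Rubin's rules)."""
    m = len(estimates)
    pooled = mean(estimates)            # pooled point estimate
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical log-HR estimates and standard errors from five imputed data sets.
log_hrs = [0.50, 0.54, 0.47, 0.52, 0.49]
ses = [0.10, 0.11, 0.10, 0.09, 0.10]
pooled_log_hr, total_var = pool_rubin(log_hrs, [se ** 2 for se in ses])
wald_z = pooled_log_hr / math.sqrt(total_var)  # Wald statistic for one coefficient
```

The between-imputation term inflates the variance to reflect uncertainty due to the missing values, so the pooled Wald statistic is smaller than one computed from any single completed data set with the same point estimate.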
The estimates of the Poisson regression model and the corresponding Cox proportional hazards (PH) model were compared, and virtually no differences were found (data not shown). We also tested the PH assumption of the Cox model, and the global p value28 was 0.106 suggesting that the PH assumption was not violated.
The projected EAD value for an individual was estimated by calculating the linear predictor values using the parameter estimates of the Poisson regression model and risk factor values for each age year starting from the baseline age a of the individual. The conditional probability of death during year t, given survival until that year, was $1 - \exp\{-\int_t^{t+1} h(u; \theta, x)\,du\}$. Note that in the following we use integer-valued t, so that the integral in the previous expression simplifies to $h(t; \theta, x)$. Finally, the EAD was calculated using the basic definition of the expectation of a discrete probability distribution as

$$\mathrm{EAD}(a; \theta, x) = \sum_{t=a}^{\infty} t\,\bigl[1 - \exp\{-h(t; \theta, x)\}\bigr]\,\exp\Bigl\{-\sum_{u=a}^{t-1} h(u; \theta, x)\Bigr\}.$$

In the last expression, the two terms following t correspond to the conditional probability and the survival function, respectively. Note that standard cohort or period methods to estimate the EAD are not sufficient, as we need a parametric survival model to account for the large number of risk factors. The calculation of the EAD using R is illustrated in the online supplementary material.
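The EAD definition above can be sketched in a few lines of Python. This is not the supplementary R code. The Gompertz-type hazard and its parameters are hypothetical stand-ins for the fitted model, and truncating the sum at `max_age` (with any remaining probability mass placed there) is an assumption made to keep the sum finite.

```python
import math

def expected_age_of_death(start_age, hazard, max_age=130):
    """EAD as the sum over t of t * P(die in year t | alive at t) * P(alive at t)."""
    surv = 1.0
    ead = 0.0
    for t in range(start_age, max_age):
        q = 1.0 - math.exp(-hazard(t))  # conditional probability of dying in year t
        ead += t * q * surv
        surv *= 1.0 - q                 # survival to the start of year t + 1
    return ead + max_age * surv         # assumption: leftover mass placed at max_age

def gompertz(t, hr=1.0):
    # Hypothetical hazard for illustration only, not the fitted FINRISK model.
    return hr * math.exp(-11.0 + 0.09 * t)

def ead_loss(start_age, hr):
    """EAD difference between HR = 1 and the given HR at one baseline age."""
    reference = expected_age_of_death(start_age, gompertz)
    exposed = expected_age_of_death(start_age, lambda t: gompertz(t, hr))
    return reference - exposed

loss_at_30 = ead_loss(30, 1.67)
loss_at_70 = ead_loss(70, 1.67)
```

Under this hypothetical hazard, the same HR yields a larger EAD difference at baseline age 30 than at 70, which mirrors the observation that EAD differences vary with age even when the HR is constant.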
The CIs for the EADs and their contrasts were calculated by parametric bootstrap with 1000 samples drawn from the multinormal distribution defined by the parameter estimates and their variance estimates, and then calculating the EAD estimate for each sample and using the 2.5% and 97.5% quantile points as the CI limits.
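The parametric bootstrap described above can be sketched as follows. This is a minimal illustration with a hypothetical two-parameter model (a log baseline level and a log HR for smoking); the mean vector, covariance matrix, Gompertz slope and helper names are all assumptions and not estimates from the study.

```python
import math
import random

MEAN = [-11.0, 0.5]                       # hypothetical parameter estimates
COV = [[0.01, -0.002], [-0.002, 0.0025]]  # hypothetical covariance matrix

def cholesky(cov):
    """Lower-triangular factor L with L L^T = cov (cov positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low

def draw_parameters(mean, chol, rng):
    """One draw from the multivariate normal distribution N(mean, cov)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(chol[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]

def ead(theta, smoker, start_age=30, max_age=130):
    """EAD under a hypothetical Gompertz-type hazard with a smoking term."""
    intercept, log_hr = theta
    surv, total = 1.0, 0.0
    for t in range(start_age, max_age):
        q = 1.0 - math.exp(-math.exp(intercept + 0.09 * t + log_hr * smoker))
        total += t * q * surv
        surv *= 1.0 - q
    return total + max_age * surv

def quantile(sorted_values, p):
    return sorted_values[round(p * (len(sorted_values) - 1))]

rng = random.Random(2014)
chol = cholesky(COV)
contrasts = sorted(ead(theta, 0) - ead(theta, 1)
                   for theta in (draw_parameters(MEAN, chol, rng)
                                 for _ in range(1000)))
ci_low, ci_high = quantile(contrasts, 0.025), quantile(contrasts, 0.975)
point = ead(MEAN, 0) - ead(MEAN, 1)       # contrast at the point estimates
```

Because the EAD is recomputed for every parameter draw, the interval reflects the non-linear mapping from the regression coefficients to the EAD without requiring a delta-method approximation.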
As there are various causal pathways between the risk factors, we applied the causal calculus by Judea Pearl29 for estimating EADs. The possible dependencies between the risk factors are illustrated in figure 1. We assumed that the biological risk factors can depend on the background, modifiable and other risk factors. In order to simplify the modelling assumptions, we assumed that the modifiable and other risk factors depended only on the background factors, and the additional dependencies between the modifiable and other risk factors were modelled only as associations without assuming (temporal) ordering between them.
Application of the causaleffect package30 of the R software24 provided us with the formula to calculate the distribution of the time of death conditionally on the background, modifiable and other risk factors; the mediating biological risk factor variables were thus handled by integrating them out. This was conducted by generating predictive values for the biological risk factors using the numerical Monte Carlo method and MI based on the random forest method31 in the mice package,23 and then averaging the EADs based on the 1000 imputed data sets. The variance of these predictive distributions is large, so the prediction intervals of these causal effect estimates are considerably wider than the full conditional prediction intervals based on fixing the values of all risk factors, including the biological risk factors.
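The idea of integrating out a mediating biological risk factor can be illustrated with a small Monte Carlo sketch in Python. It is a simplification under stated assumptions: a single mediator (systolic blood pressure) drawn from a hypothetical normal predictive distribution replaces the random forest based imputations, the blood pressure effect is linear rather than a spline, and all coefficients are invented.

```python
import math
import random

def ead(log_hr, start_age=30, max_age=130):
    """EAD under a hypothetical Gompertz-type hazard shifted by log_hr."""
    surv, total = 1.0, 0.0
    for t in range(start_age, max_age):
        q = 1.0 - math.exp(-math.exp(-11.0 + 0.09 * t + log_hr))
        total += t * q * surv
        surv *= 1.0 - q
    return total + max_age * surv

def marginal_ead(inactive, n_draws=1000, seed=1):
    """Average the EAD over predictive draws of the mediator."""
    rng = random.Random(seed)
    eads = []
    for _ in range(n_draws):
        # Hypothetical predictive model: inactivity raises systolic BP by 5 mm Hg.
        sbp = rng.gauss(130.0 + 5.0 * inactive, 15.0)
        # Hypothetical direct effect of inactivity plus the effect via BP.
        log_hr = 0.2 * inactive + 0.01 * (sbp - 130.0)
        eads.append(ead(log_hr))
    return sum(eads) / n_draws, min(eads), max(eads)

active_mean, active_min, active_max = marginal_ead(inactive=0)
inactive_mean, inactive_min, inactive_max = marginal_ead(inactive=1)
```

The spread between the minimum and maximum EAD across draws shows why intervals obtained this way are wider than those obtained with the biological risk factors fixed at chosen values.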
Patient and public involvement
This research was done without participant involvement. The participants of the FINRISK Study were not invited to comment on the study design and were not consulted to interpret the results. Participants were not invited to contribute to the writing or editing of this document for readability or accuracy.
Descriptive statistics of the baseline measurements of the participants aged 25–74 years are shown in online supplementary table S1.
For the associations between risk factors and the hazard of death, likelihood ratio test (p=0.44) indicated that there was no need to include interaction terms of sex and risk factors, except age. Therefore, only the interaction of sex and age was included, and other risk factors were entered only as main effects in the models.
Almost all risk factors were statistically significantly associated with the hazard of death even when adjusted for all other background, lifestyle, LS and biological risk factors (table 1a, table 1b, table 1c, figure 2 and online supplementary figure S1). Level of education remained a significant predictor even after adjustment for all the other risk factors (table 1a). The higher the education, the lower the hazard (HR=0.90 for highest vs lowest education). Mother’s myocardial infarction before age 60 was associated with higher hazard (HR=1.20), whereas the corresponding association between the diagnosis of the father and hazard was not significant. Diagnosed diabetes for which treatment by medicine was prescribed increased the hazard (HR=2.21).
Current smokers had higher hazard compared with never smokers (HR=1.67), and every 10 cigarettes per day increased the hazard (HR=1.18, table 1b). If alcohol consumption exceeded 84 g/week, an additional 100 g/week increased the hazard (HR=1.08), and if one reported feeling intoxicated less than once a month, the hazard decreased (HR=0.81) compared with those feeling intoxicated more frequently than once a month.
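As a worked example of how these estimates combine, the following Python sketch computes the HR of a current smoker at a given daily consumption relative to a never smoker. It assumes that the current-smoker indicator and the linear cigarette term multiply, as the multiplicative hazard model implies; the function name `smoker_hr` is hypothetical.

```python
HR_CURRENT_SMOKER = 1.67      # current versus never smoker (table 1b)
HR_PER_10_CIGARETTES = 1.18   # per 10 cigarettes per day (table 1b)

def smoker_hr(cigarettes_per_day):
    """Combined HR versus a never smoker, assuming multiplicative terms."""
    return HR_CURRENT_SMOKER * HR_PER_10_CIGARETTES ** (cigarettes_per_day / 10)

hr_20_per_day = smoker_hr(20)  # 1.67 * 1.18 ** 2, roughly 2.33
```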
Whole or full milk increased the hazard (HR=1.53) compared with drinking skimmed milk (table 1b). Use of butter, butter-vegetable oil mixture or cooking margarine also increased the hazard (HR=1.14). Daily use of fresh vegetables decreased the hazard (HR=0.88) compared with using them twice a week or less frequently. Also, eating fruits and berries daily or almost daily decreased the hazard (HR=0.85) compared with consuming them twice a month or less frequently.
Engagement in high (HR=0.75) or medium (HR=0.81) volume of leisure time physical activity was associated with a lower hazard compared with those with no leisure time physical activity (table 1b).
Most of the variables indicating satisfaction with life were also significant predictors (table 1c). Having some stress, but no more than what is usual was associated with lower hazard (HR=0.73) than when feeling life almost unbearable. Hazard seemed to be higher for those who did not work (HR=1.13), but no clear trend can be seen among the other categories of the work stress variable.
The less one was satisfied with his/her accomplishments in life, the higher was the hazard (HR=1.32 for unsatisfied, table 1c). Similarly, the more one disagreed with feeling impossible to obtain one’s goals, the lower the hazard (HR=0.73 for somewhat disagree). If one’s financial status was somewhat better than before (HR=0.92), the hazard was lower than if the status was much better, about the same or worse.
Not having a spouse was associated with a higher hazard (HR=1.35) compared with often having trouble with the spouse (table 1c). Having never special trouble with children was associated with lower hazard (HR=0.89) than not having children or having often special trouble with children.
For non-HDL cholesterol the hazard was lowest at around 5 mmol/L, for BMI at slightly below 30 kg/m², for systolic blood pressure at 120 mm Hg and for diastolic blood pressure at slightly above 80 mm Hg (figure 2).
The HR estimates were very similar with and without the biological risk factors (online supplementary table S2).
The largest difference in EAD was between current smokers and never smokers: the EAD for a 30-year-old man decreased from 86.8 years, which corresponds to the reference values of the risk factors, to 82.6 years, and smoking 20 cigarettes per day decreased the EAD further to 80.2 years while keeping all other risk factors at the same values. Table 2 contains the most influential risk factors, which reduced the EAD of a 30-year-old man by more than 2 years; online supplementary table S3 contains all risk factors and online supplementary table S4 the contrasts of the risk factor categories. Diabetes decreased EAD almost as much, to 80.3 years. Whole or full milk consumers had an EAD of 84.5 years compared with 87.9 years for those consuming skimmed milk. Physically inactive men had an EAD of 85.0 years, whereas those with high activity had an EAD of 87.4 years. Men who found their life almost unbearable due to stress had an EAD of 84.0 years. For older men and for women the differences were similar but smaller. The estimates based on causal calculus were lower than the full conditional EADs based on fixing the biological risk factors at their mode values, which are relatively close to the low risk values (figure 2). BMI values below 22 and above 33, non-HDL cholesterol values below 3.6 and above 6.5, diastolic blood pressure above 85 and systolic blood pressure values below 110 and above 135 appeared to reduce the EAD when compared with the lowest risk values (online supplementary figure S2), but these optimal values are based on the other risk factors being at their optimal values. In practice, for example, overweight and obesity can increase blood pressure compared with normal weight, which can increase mortality.
The biological risk factors and lifestyles, and the factors connected with LS were clearly associated with the EAD, and these associations did not disappear when adjusted for a large number of risk factors. Factors like frequent smoking, certain dietary choices with saturated fat, low leisure time physical activity, having been diagnosed with diabetes and experiencing stress were associated with lower EAD across all age groups and in both sexes. We found that adjustment for a wide range of established risk factors had only marginal effect on the hazard rate estimates when compared with age-adjusted estimates in analyses based on large national cohort data. This suggests that there is a wide variety of potential interventions that could increase the length of life among people with low life expectancies, as they are likely to have suboptimal values in several risk factors. The effects of interventions on EAD are likely to be largest at younger ages, as for the oldest individuals the hazard of death is inevitably high. The causal calculus had only marginal influence on the EAD estimates when compared with the full conditional estimates, which also indicates that the projections are not sensitive to overadjustment.
Comparison with other studies
Most of our findings for significant factors that were associated with mortality have been reported previously. However, calculations for life expectancies across risk factor categories have not been previously carried out in a large data set and with a large number of risk factors. There are several articles in which the number of risk factors was smaller, the number of modifiable lifestyle factors was small or the methods are more difficult to implement on individual level.32–34 Thus, we bring new information on how EAD may change across risk factor categories. Presenting the results on EAD makes the interpretation of the findings more comprehensible than using traditional HRs.
The largest reductions in EAD were found among smokers and diabetics, more than 6 years for a 30-year-old man when compared with never smokers and non-diabetics. Smoking has been found to increase all-cause mortality 1.83-fold in older people.3 New forms of tobacco use, especially e-cigarettes, have become popular during the past few years, but their long-term associations with mortality are currently unknown.35 E-cigarettes have been found to reduce the exposure to carcinogens and toxicants compared with traditional cigarettes,36 but they may increase the risk of respiratory diseases compared with never smokers.37 Type 2 diabetes has been found to increase all-cause mortality 1.85-fold.38
Like many other observational cohort studies, the FINRISK Study shows a J-shaped association between alcohol consumption and mortality, with moderate drinking associated with a lower hazard than non-drinking. This has been the case in particular for cardiovascular diseases. However, there is ongoing debate as to how much of this is causal and how much is explained by possible reverse causality. A recent study based on Mendelian randomisation found no J shape in vascular disease incidence.20
Results from this study also support previous findings on the inverse associations of vegetables, and fruits and berries with mortality.5 In the present study, consumption of milk or bread spread with a high content of saturated fat was directly associated with the hazard. This is in line with dietary guidelines suggesting limiting intake of saturated fat. However, most of the previous observational studies have failed to find an association between intake of saturated fat and mortality, although this may be partly due to methodological limitations.4
Leisure time physical activity is known to be associated with lower mortality, which is in line with our findings.39
We found stress to increase all-cause mortality, but effect modifiers such as income have been found in the literature.12 LS has been found to be associated with lower hazard10 also in an earlier Finnish study where the adjusted HR was 1.49 for dissatisfied compared with satisfied men, but among women a corresponding association between LS and survival was not found. In Germany, a similar interaction with sex was found,8 and relevant determinants of LS were found to be psychological, social and lifestyle factors and perceived health.
The BMI values associated with the lowest hazard of death were around 30, and the highest hazard was at BMI values around 20 and above 35. In a simple regression analysis we found the lowest hazard around a BMI value of 25 (data not shown), as higher BMI values are often associated with various other risk factors, which we adjusted for in the fully adjusted model. This difference can be partly explained by the causal path from obesity to death via hypertension, as the simple regression analysis is not adjusted for these intermediate variables. This phenomenon illustrates the need to change the BMI value together with the other risk factor values that are associated (possibly causally) with changes in BMI, in order to obtain the EAD changes that could take place after a change in BMI (or other risk factors). For example, an increase in BMI from 25 to 30 in the age group of 50–60 years is associated with a 4.6 mm Hg increase in systolic and a 3.9 mm Hg increase in diastolic blood pressure on average, which eliminates the 5% lower hazard seen in the HR estimate of BMI (figure 2). These results align with those of a Mendelian randomisation study,40 in which the unconfounded estimate of the lowest hazard was found to be between BMI values of 22 and 25. The increased risk from low BMI values may also be driven by reverse causality from sickness causing weight loss.
We tested the interactions of the risk factors with sex in a model with age adjustment. Some of the interactions were found statistically significant (data not shown). However, in our large data set, some of these significant associations were of small magnitude. A particularly strong interaction was observed between sex and the variable on having trouble getting along with the spouse: single men appeared to have a higher hazard than other men, but for single women the hazard was close to that in other women. Feeling it impossible to attain the goals that one would like to reach appeared to increase the hazard slightly more for men than for women.
Despite the large number of risk factors being simultaneously adjusted for in our analysis, we found that low education was still associated with a higher mortality, which has been found also in numerous studies in different countries.1 41 42
Our results are in concordance with general health promotion guidelines. The important message of our results is that the HR estimates based on the age and sex-adjusted, and fully adjusted regression models were close to each other. A difference in a lifestyle factor is generally associated with differences in several other risk factors. For example, an increase in physical activity can reduce weight, blood pressure and cholesterol levels, which are all associated with a lower hazard of death. Therefore, the EAD differences might be even larger than our results, which were based on considering the differences of a single risk factor at a time.
Strengths and limitations
The strengths of this study include the use of a large cohort with highly standardised baseline measurements and a long follow-up period. Participation rates were high from 1987 to 1997, thus most deaths were from surveys where non-participation was unlikely to cause serious selection bias in our estimates. The risk factors appeared to satisfy the PH assumption, thus the extrapolation of the results in the oldest ages and also in the future seems realistic, but future work on interactions with calendar time could provide further insight to the associations of the risk factors and mortality. We have modelled the association of the continuous risk factors, including age, with death using the spline functions as in many cases the associations are not linear and a categorisation of continuous risk factors would result in information loss. Generally, latent confounders can compromise the results of observational studies, but we used a large number of risk factors in our models to mitigate such bias. The application of Judea Pearl’s causal calculus29 should mitigate the potential overadjustment by the intermediating biological risk factors (BMI, cholesterol and blood pressure).
The main weakness of this study concerns the possible causal interpretations of our results. Differences between the projected EADs represent differences between population subgroups, but not necessarily the effects of changing an individual’s risk factor values. For example, if an individual increases physical activity, it is likely that BMI, cholesterol and blood pressure also change. The scientific evidence on the effects of risk factor changes varies between risk factors. For example, quitting smoking can reduce the risk of coronary heart disease quickly, whereas the risk of cancer decreases slowly over 10–15 years; for several other risk factors, evidence on causal relationships is vague. The accumulated risk factor history can have a considerable effect on the EAD via increased hazard of cancer and increased hazard of death after the cancer diagnosis. Therefore, a more appropriate interpretation of the differences is a comparison of two population groups. We feel that an attempt by a user to change any of these risk factors into a more optimal category is unlikely to be harmful. The effects of medication were not accounted for in our analyses, so the associations of high blood pressure or cholesterol levels with mortality may be underestimated, as more effective medication, which could have started after the baseline measurement, could have reduced the hazard during the follow-up. Risk factor values could also have changed during the follow-up; for example, many smokers quit smoking at some time, so the HRs of current smokers may also be underestimated. Our proxy variables for saturated fat in food were limited to type of milk and bread spread, thus omitting other sources of saturated fat. Type of milk and bread spread, however, are two of the main sources of saturated fat.4
Different risk factors are likely to contain different amounts of measurement error. The most accurately measured risk factors were BMI, blood pressure and laboratory measurements, but self-reported lifestyle and LS factors are likely to vary considerably both between individuals and over time. We did not incorporate interactions in our models, except with age and sex, although some interactions have been found in the literature. As our data set contained only individuals below 75 years of age, our projections do not account for possible interactions of age and the other risk factors in the HRs in the oldest age groups. If the (continuous) risk factors have extreme values for some individuals, then our projections might not be very reliable as our data set contained only a small number of such individuals.
Conclusions and policy implications
EAD is an easy to understand measure for comparing survival associated with different risk factor values. The biological risk factors and lifestyles, and the factors connected with LS were clearly associated with EAD. Our model for estimating a person’s EAD can be used to motivate lifestyle changes. Furthermore, decision makers, who might have a possibility to influence working and other relevant environments, can use it to estimate potential need for preventive measures and medical care in the population.43
This work was supported by the Academy of Finland under grant numbers 266251 and 307907, and by the Duodecim Medical Publications.
Contributors TH, KK and SK had central role in planning, conduct and reporting of the work, accept full responsibility for the work and/or the conduct of the study, had access to the data and controlled the decision to publish. LSJ, PJ, MP, KB and PK participated in planning, conduct and reporting of the work.
Funding The study received external funding from Duodecim Medical Publications and was supported by the Academy of Finland (grant numbers 266251 and 307907).
Disclaimer Researchers were independent of the funders. All authors had full access to all of the data (including statistical reports and tables) in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis.
Competing interests TH reports grants from Duodecim Medical Publications and grants from Academy of Finland during the conduct of the study.
Patient consent for publication Not required.
Ethics approval Ethical approval has been obtained according to the commonly required research procedures and Finnish legislation during each survey. The last three surveys were approved by the coordinating ethics committee of the Helsinki and Uusimaa Hospital District.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement No data are available. The individual-level data cannot be distributed as they are sensitive data. However, there is a procedure for requesting access to individual data for research collaboration on https://thl.fi/en/web/thlfi-en/research-and-expertwork/population-studies/the-national-finrisk-study.