Has working-age morbidity been declining? Changes over time in survey measures of general health, chronic diseases, symptoms and biomarkers in England 1994–2014

Objectives: As life expectancy has increased in high-income countries, there has been a global debate about whether additional years of life are free from ill-health/disability. However, little attention has been given to changes over time in morbidity in the working-age population, particularly outside the USA, despite its importance for health monitoring and social policy. This study therefore asks: how has working-age morbidity in England changed over two decades?
Design, setting and participants: We use a high-quality annual cross-sectional survey, the Health Survey for England (HSE) 1994–2014. HSE uses a random sample of the English household population, with a combined sample size of over 140 000 people. We produce a newly harmonised version of HSE that maximises comparability over time, including new non-response weights. While HSE is used for monitoring population health, it has not previously been used to investigate morbidity as a whole.
Outcome measures: We analyse all 39 measures that are fully comparable over time (including chronic disease diagnoses, symptomatology and a number of biomarkers), adjusting for gender and age.
Results: We find a mixed picture: we see improving cardiovascular and respiratory health, but deteriorations in obesity, diabetes, some biomarkers and feelings of extreme anxiety/depression, alongside stability in moderate mental ill-health and musculoskeletal-related health. In several domains we also see stable or rising chronic disease diagnoses even where symptomatology has declined. While data limitations make it challenging to combine these measures into a single morbidity index, there is little sign of systematically declining morbidity in the measures that predict self-reported health most strongly.
Conclusions: Despite considerable falls in working-age mortality (and the assumptions of many policymakers that morbidity will follow mortality), there is no systematic improvement in overall working-age morbidity in England from 1994 to 2014.


Competing interests
The author has worked on secondment at the UK Department for Work and Pensions (DWP) in 2015-16.

Data sharing
The statistical code enabling replication using publicly available data will be made available from www.benbgeiger.co.uk at the point the article is accepted.

Strengths and limitations of this study
• We provide a robust analysis of morbidity trends in England for 39 measures across two decades using the Health Survey for England ('HSE').
• We include every morbidity measure for which consistent trends can be constructed in the HSE.
• We take care to maximise comparability over time, including constructing new non-response weights.
• However, response rates for each stage of the HSE have declined over time, and it is impossible to rule out changing non-response biases.

INTRODUCTION
As life expectancy has increased in high-income countries, there has been a global debate about whether additional years of life are free from ill-health/disability. It is now largely accepted that old-age disability has declined in the US (albeit varying by age/method), 1 2 although chronic illness has increased, 3 and the picture beyond the US is more mixed. [4][5][6] Yet this research agenda has not been matched by similar attention to morbidity trends in the working-age population. In the absence of evidence, policymakers have either made claims based on self-reports of general health [6][7][8] (which are unreliable, as we explain below), or, in the case of social security, have assumed that working-age morbidity must have improved in recent decades given improvements in mortality 9 10 (despite the potential for declining mortality to coexist with rising morbidity). 6
Almost the only evidence on working-age trends in overall morbidity in high-income countries comes from the US. These studies have generally found deteriorating morbidity since the mid-1990s, particularly in activities of daily living (ADLs) and physical functioning. [11][12][13][14] Other studies have focused on the older working-age population, with similar results. 2 15 Again, not all measures show deteriorations, and not all studies come to identical conclusions, 16 but there is little sign of any improvement in morbidity among working-age Americans. Outside the US there is a paucity of evidence, but the limited evidence that exists again shows little sign of improving morbidity. 17
In this paper we provide new evidence on trends in morbidity in England over two decades, using 39 measures from the Health Survey for England (HSE), a high-quality government survey with a combined sample of 140,000 individuals. We make two contributions. Firstly, we provide one of the few systematic trend analyses of working-age morbidity in any high-income country outside the US.
Secondly, we supplement self-report measures with 10 'biomarkers', which help to show whether changes in self-reports merely reflect changes in reporting, but which have rarely been examined alongside self-reported working-age morbidity trends (Martin et al. 2010 20 being an exception).

Data source
This section follows the STROBE reporting guidelines for cross-sectional studies. 31 We use the HSE, an annual government-sponsored cross-sectional survey of 3,000-11,000 adults with no proxy responses. 21 A particular advantage is that the initial interview is followed by a nurse visit, which in selected years has also included a blood sample.
Nevertheless, analysing change in HSE is more complex than it might appear:
Third, HSE excludes those in communal establishments. While this is a smaller problem for the working-age population than for older ages, 2 we minimise the impact of rising university attendance by focussing on those aged 25+ (Web Appendix 3). The upper limit of the working-age population is set at 59 (women) and 64 (men) to match state pension ages at the start of the period.
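As a minimal illustration of the age restriction just described, the Python sketch below filters respondents to the working-age analysis sample (aged 25 up to 59 for women and 64 for men). The function name, the field names `age` and `sex`, and the string codes are hypothetical stand-ins, not HSE variable names.

```python
def in_working_age_sample(age: int, sex: str) -> bool:
    """True if a respondent is in the working-age analysis sample.

    The lower bound of 25 limits bias from excluding students living in
    communal establishments; the upper bounds of 59 (women) and 64 (men)
    sit just below the state pension ages at the start of the period.
    """
    upper = {"female": 59, "male": 64}[sex]
    return 25 <= age <= upper


# Hypothetical respondent records, for illustration only.
respondents = [
    {"id": 1, "age": 24, "sex": "female"},
    {"id": 2, "age": 25, "sex": "male"},
    {"id": 3, "age": 60, "sex": "female"},
    {"id": 4, "age": 64, "sex": "male"},
    {"id": 5, "age": 65, "sex": "male"},
]
sample_ids = [r["id"] for r in respondents if in_working_age_sample(r["age"], r["sex"])]
print(sample_ids)  # [2, 4]
```

Applying the rule per respondent (rather than to a pre-aggregated table) keeps the sex-specific upper bound explicit, which matters because the two bounds differ by five years.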

Patient involvement
As this is a health monitoring (rather than intervention) study using all available secondary data, patients were not directly involved. However, from previous discussions we are aware that the study will be of interest to patient/disability advocacy groups, who will receive jargon-free summaries of the research.

Measures
We do not focus on general health/participation restriction measures, as there are numerous non-health factors that influence how they are reported, including inter alia social security incentives, 23 gender- and age-related expectations, and medicalisation. 24 Trends in such measures can therefore differ wildly between surveys covering nominally the same concept and population, e.g. for disability in England 25 or self-rated health in the US. 26 However, the measures clearly do capture something meaningful; 27 interested readers can find trends in seven such measures in Web Appendix 4.

ANALYSIS
We look both at unadjusted trends (reflecting levels of morbidity in the population) and at trends after adjustment for sex and age, following others. 32 33 Individual survey years are grouped into 3-4-year periods to increase sample size and precision (single-year prevalence is given in Web Appendix 7). Because the start and end of trends vary across measures, the easiest measure to interpret is the percentage-point change across the entire period available (for sex/age-adjusted models, this is the average marginal effect following a logistic regression).
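To make the adjusted trend concrete, the sketch below computes a discrete-change average marginal effect: for every respondent, the predicted probability from a fitted logistic regression is evaluated with a late-period indicator switched on and off, and the weighted mean of the differences is taken. It assumes the coefficients have already been estimated elsewhere; the coefficient names, covariates and values are hypothetical, and this is not the author's Stata code.

```python
import math


def logistic(z: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def average_marginal_effect(coefs, people, term="late_period"):
    """Weighted discrete-change AME of switching `term` from 0 to 1.

    coefs:  dict of fitted logit coefficients, including "const" and `term`.
    people: dicts with "weight" and "x", a dict of covariate values (e.g.
            hypothetical sex and age-group dummies) that excludes `term`.
    """
    num = den = 0.0
    for p in people:
        # Linear predictor for the early period (term = 0), covariates fixed.
        base = coefs["const"] + sum(coefs[k] * v for k, v in p["x"].items())
        # Change in predicted probability when moving to the late period.
        diff = logistic(base + coefs[term]) - logistic(base)
        num += p["weight"] * diff
        den += p["weight"]
    return num / den


# Hypothetical coefficients and respondents, for illustration only.
coefs = {"const": -1.5, "late_period": 0.4, "female": 0.2, "age_45_64": 0.6}
people = [
    {"weight": 1.2, "x": {"female": 1, "age_45_64": 0}},
    {"weight": 0.8, "x": {"female": 0, "age_45_64": 1}},
    {"weight": 1.0, "x": {"female": 1, "age_45_64": 1}},
]
ame = average_marginal_effect(coefs, people)
print(f"Adjusted change: {100 * ame:.1f} percentage points")
```

Multiplying by 100 converts the probability difference into a percentage-point change. Averaging over the observed mix of ages and sexes, rather than plugging in a single 'typical' respondent, is what makes this an average marginal effect.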
To avoid relying on a binary cut-off for statistical significance, 34 we use 95% confidence intervals to convey the precision of each estimated trend. All analyses use weights, exclude boost samples that use different sampling methods, and adjust for the clustered nature of the main sample (although standard errors will be slightly underestimated, as it is not possible to consistently adjust for sample stratification).
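The clustering adjustment can be illustrated for the simplest case, a single weighted prevalence. The sketch below uses the standard Taylor-linearisation (with-replacement) variance estimator: each respondent's contribution to the ratio estimate is summed within its primary sampling unit (PSU), and the variance comes from the spread of those cluster totals. Like the main analyses, it treats the sample as a single stratum. The trend CIs in the paper are for differences between periods, so this is a simplified, hypothetical illustration rather than the paper's exact computation.

```python
import math
from collections import defaultdict


def weighted_prevalence_ci(rows, z=1.96):
    """Weighted prevalence with a cluster-robust (linearised) 95% CI.

    rows: iterable of (y, weight, cluster_id) with y in {0, 1}.
    Treats the sample as a single stratum, so stratification is ignored.
    """
    rows = list(rows)
    total_w = sum(w for _, w, _ in rows)
    p = sum(w * y for y, w, _ in rows) / total_w
    # Linearised contribution of each respondent, summed within each cluster.
    cluster_totals = defaultdict(float)
    for y, w, c in rows:
        cluster_totals[c] += w * (y - p) / total_w
    n_c = len(cluster_totals)
    if n_c < 2:
        raise ValueError("need at least two clusters")
    mean_t = sum(cluster_totals.values()) / n_c
    var = n_c / (n_c - 1) * sum((t - mean_t) ** 2 for t in cluster_totals.values())
    se = math.sqrt(var)
    return p, p - z * se, p + z * se


# Hypothetical respondents: (has condition, survey weight, PSU id).
rows = [(1, 1.0, "a"), (0, 1.0, "a"), (1, 1.0, "b"), (0, 1.0, "b"),
        (1, 1.0, "c"), (1, 1.0, "c"), (0, 1.0, "d"), (0, 1.0, "d")]
p, lo, hi = weighted_prevalence_ci(rows)
print(f"{p:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

When outcomes are correlated within PSUs, treating respondents as independent would typically give a narrower, over-confident interval; summing contributions by cluster before taking the variance is what captures that correlation.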
For reasons of space, we are unable to discuss previous HSE studies of aspects of morbidity in the main text; these are instead described in Web Appendix 8.

Conditions with sharply declining mortality
We start by focussing on cardiovascular disease (CVD) and respiratory illness, which have both seen sharp falls in mortality (by >50% and >25% respectively among 0-64 year-olds 1994-2013; Web Appendix A9). Trends in morbidity, however, are shown in Table 2.
Looking first at high blood pressure, biomarker-measured high blood pressure has halved over two decades (similar improvements are found for the biomarkers for total and HDL cholesterol). Yet when we look at self-reports (either people reporting this as an LSI, or in response to a direct question about having recently diagnosed high blood pressure), we see sharp rises over time. Diagnoses of high blood pressure and prescriptions of blood pressure-lowering drugs have both increased; these may have helped reduce the underlying level of high blood pressure while simultaneously raising people's awareness of their condition.
Table 2 further shows declines in several key types of CVD (heart attack, mini-stroke, angina), whether measured through people's reports of the disease itself or their reports of its symptoms. Nevertheless, the morbidity declines (8-50%) are often not on the scale of the declines in mortality (>50%); this is likely to be because mortality declines are partly driven by improved treatment, 35 which means each incident CVD case is likely to last longer. 36 37 More surprisingly, the measures of 'any reported CVD' show no improvement (with some uncertain signs of rises). Looking at its subcomponents (Web Appendix 6), this seems to be due to possible increases in diagnosed irregular heart rhythm and other heart trouble.
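The argument that longer-lasting cases dampen morbidity declines can be made concrete with the standard steady-state approximation from epidemiology (not stated in the original): for a condition with low prevalence, prevalence P is roughly incidence I multiplied by mean duration D, so proportional changes multiply. The figures below are purely illustrative, not HSE estimates.

```latex
P \approx I \times D
\qquad\Longrightarrow\qquad
\frac{P_{\text{late}}}{P_{\text{early}}}
  \approx \frac{I_{\text{late}}}{I_{\text{early}}}
  \times \frac{D_{\text{late}}}{D_{\text{early}}}
```

For example, if incidence fell by 40% (a ratio of 0.6) while better treatment lengthened the average duration of each case by 25% (a ratio of 1.25), prevalence would fall by only 25%, since 0.6 × 1.25 = 0.75.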
Finally, Table 2 shows that symptoms-based measures of respiratory morbidity have improved, particularly COPD symptoms (regular cough & phlegm) and breathlessness (at both levels), and more uncertainly for recent wheezing/asthma and wheezing stopping sleep. Again, though, diagnosis-related measures of asthma (reported diagnoses, or self-reports of having asthma as a longstanding illness) have risen, even while underlying symptomatology is improving.

Conditions with claims of increasing prevalence
The previous section focussed on conditions where there may be an a priori expectation that morbidity has improved (given declining mortality); in this section, we focus on three areas where there have been widespread claims of increasing prevalence: obesity, diabetes, and mental health.
Looking at Table 3, we do indeed confirm a considerable rise in obesity in HSE (an 8.0-9.7 percentage-point rise from an obesity prevalence of 16.9% in 1994-96). The rise in high waist-hip ratios (sometimes suggested to be a better measure of potential morbidity 38) is, if anything, even sharper. This has come alongside little change in the prevalence of being underweight over this period.
Table 3 also confirms a sharp rise in diabetes. This can be seen whether diabetes is measured through people reporting diabetes as an LSI, a specific question about people currently taking medication for diabetes, or a diabetes biomarker (glycated haemoglobin). It is worth noting that this clear rise in diabetes has occurred despite a fall of more than one-third in the age 0-64 death rate from diabetes 1994-2013 (Web Appendix 3); indeed, falling mortality itself raises prevalence, 39 again demonstrating the difference between mortality and morbidity trends.
Trends in mental health are more contentious in the wider literature (see Web Appendix 8), and the measures in HSE are not as strong as in the less frequent Adult Psychiatric Morbidity Surveys. Nevertheless, HSE offers a unique annual perspective on mental health trends. As we might expect from increasing treatment/diagnosis, we see a doubling in people reporting a mental health LSI.
However, the symptoms-based measures show a more mixed picture. Neither of the measures that capture more moderate mental ill-health shows rising ill-health (these are 'psychiatric morbidity symptoms' and 'moderate anxiety/depression today', both with a relatively common prevalence of 15-25%). By contrast, 'extreme anxiety/depression today' has risen. To look for rising ill-health at the extremes of our other measure, the psychiatric morbidity score (obtained from the 12-item GHQ scale), we created a further measure based on a much higher GHQ threshold of 10 negative responses out of the 12 GHQ questions (compared to the conventional GHQ threshold of 4). Unlike the conventional GHQ measure, this also showed an increase over time (95% CI of a 0.4 to 1.4 percentage-point rise; see Web Appendix 6). We should note, however, that the GHQ is not designed to capture severe anxiety/depression in this way.
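The two GHQ-based measures differ only in the threshold applied to the same 12-item score. The sketch below assumes each item is coded 1-4, with the two most symptomatic options (3 and 4) counted as a 'negative response', i.e. the usual 0-0-1-1 GHQ scoring; the function names and the input coding are hypothetical rather than taken from the HSE files.

```python
def ghq12_score(responses):
    """Count 'negative' responses on the GHQ-12 (0-0-1-1 scoring)."""
    if len(responses) != 12:
        raise ValueError("GHQ-12 needs exactly 12 item responses")
    if any(r not in (1, 2, 3, 4) for r in responses):
        raise ValueError("each item must be coded 1-4")
    return sum(1 for r in responses if r >= 3)


def ghq12_flags(responses, conventional=4, extreme=10):
    """Flag the conventional (4+) and much higher (10+) GHQ thresholds."""
    score = ghq12_score(responses)
    return {
        "score": score,
        "psychiatric_morbidity": score >= conventional,
        "extreme": score >= extreme,
    }


print(ghq12_flags([3] * 5 + [1] * 7))   # score 5: conventional threshold only
print(ghq12_flags([4] * 10 + [2] * 2))  # score 10: both thresholds met
```

Because everyone at 10+ is also at 4+, a rising 10+ prevalence alongside a flat 4+ prevalence implies that the 4-9 group shrank as the 10+ group grew.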
Overall, while labelling of mental health conditions has undoubtedly risen, trends in mental health vary across measures. If we interpret higher GHQ thresholds as indicating more serious depression/anxiety, then we can see a consistent picture: moderate mental ill-health rose from the mid-1990s to the mid-2000s before falling, whereas more extreme mental ill-health has risen.

Activity limitations, musculoskeletal and pain
Pain/musculoskeletal conditions are a major component of working-age morbidity, yet very few previous studies show trends in symptomatology, and even those that exist 40 sometimes have debatable comparability. 41 Table 4 shows a fall in some, but not all, HSE measures focussed on pain and musculoskeletal morbidity. Arthritis  There are some (similarly uncertain) signs that other musculoskeletal LSIs have also fallen, and noticeably fewer people say that they have any pain/discomfort today, although there has been no change in people saying they have extreme pain/discomfort. This echoes the limited wider evidence of rises in back pain over an earlier period. 40 42 Similarly

Other measures
Trends in other measures (for which we have no clear a priori expectations) are shown in Table 5 below. These include four biomarkers that are more difficult to compare directly with self-reports:
-Trends are available for two biomarkers of inflammation (C-reactive protein ('CRP') and fibrinogen), which are commonly used as measures of heart disease risk. However, their interpretation is difficult, as they are also associated with other conditions such as diabetes, cancer 43 and, in the case of CRP, even depression. 44 Table 5 shows that both biomarkers indicate rising morbidity from 1997-2000 to 2008-10 (although for CRP the confidence interval is wide and there is a non-negligible possibility that the trend is negative).
-The two other biomarkers available in HSE are clearly focussed on anaemia and iron deficiency. Table 5 shows that both of these have declined, with particularly clear evidence for a decline in iron deficiency.

DISCUSSION
Despite considerable evidence on morbidity trends among older people, there are few published studies on morbidity trends among the working-age population, particularly outside the USA. In this paper, we have analysed trends in working-age morbidity in England 1994-2014 using a high-quality repeated cross-sectional study.
We see improvements in cardiovascular morbidity, respiratory morbidity and anaemia, but deteriorating obesity, diabetes, some biomarkers (fibrinogen and possibly also CRP) and mental ill-health at the highest levels. We see little systematic trend in more common mental ill-health or musculoskeletal conditions, pain/mobility, and self-care limitations. We should also stress that symptomatology and chronic disease diagnoses often move in different directions: chronic disease diagnoses have sometimes stayed stable or even risen at the same time that underlying symptomatology has declined (such as for mental health conditions, asthma, hypertension, and CVD as a whole), mirroring findings at older ages. 3
Our analysis has several strengths. We include every morbidity measure for which consistent trends can be constructed, including chronic disease, functioning and symptomatology, and biomarkers. We use a single survey series collected by a single survey organisation; exclude under-25s, for whom comparable survey coverage over time is unlikely; and construct new non-response weights. Nevertheless, we must note three limitations. Firstly, response rates for each stage of the HSE have declined over time (see Web Appendix 3), and while we create new non-response weights covering the entire period, it is impossible to rule out changing non-response biases. Secondly, it is possible that people respond differently over time even to identical questions.
Thirdly, there are several dimensions of morbidity for which there is little trend data in HSE. These include several areas in which morbidity among the working-age population seems to be rising, including inter alia cognitive complaints, 45 allergic disorders, 46 and liver cirrhosis (see Web Appendix 3), as well as some areas in which morbidity seems likely to have fallen, such as chronic kidney disease. 47
For policymakers, this leaves the question of whether working-age morbidity as a whole is getting better or worse in England (at least for those who believe that health states can be put on a unidimensional scale). While it is not possible to create a single morbidity index here, Web Appendix 2 shows the association of each measure with bad general self-rated health (net of age, gender and education). This shows little sign of systematically falling morbidity in the measures that most strongly predict general health (indeed, the evidence weakly points in the other direction, towards rising morbidity). Certainly there is no evidence that working-age morbidity as a whole has declined over the past twenty years in England despite falling mortality.
This mirrors both evidence from the Global Burden of Disease study for the UK (see Web Appendix 2) and more detailed analyses available for the US. [11][12][13][14]
In conclusion, despite considerable falls in working-age mortality and gains in life expectancy (and the ensuing expectations of social security policymakers for improving morbidity), there is no systematic improvement in overall working-age morbidity in England from 1994 to 2014. However, two pieces of further research could strengthen this evidence base. Firstly, the ideal measures for analysing trends in morbidity are functional limitations measures, which were included in the HSE from 1996. However, these were last asked of the working-age population in 2001, and it is a priority to repeat these measures in future years of HSE. Secondly, there is a surprising paucity of studies looking at the changing morbidity of the working-age population outside the US. Given their importance in public debate (particularly in discussions of retirement ages and disability benefits), we hope that other authors

Interview measures
As shown in Appendix 1 above, the response rate for the initial face-to-face interview fell from 71.6% in 1994 to 55.5% in 2014. For those who took part in the initial face-to-face interview, the level of item missingness is shown below (including only those years in which each question was asked). This shows that item missingness is generally very low: only 1 of the 30 variables has item missingness greater than 1%. The only variable with noticeable missingness is BMI, which is understandable as this involves the interviewer taking height and weight measurements rather than simply asking for a verbal response. There are various reasons why people do not have a BMI measurement:
-High weight: people with a very high weight are not weighed in HSE 'because the scales are inaccurate above this level', but the threshold for this changed (from 130kg before 2011 to 200kg afterwards). 1.00% of respondents were not weighed for this reason in 2010, which fell to <0.1% in 2012-14.
-Difficult to take measurement: other respondents have no valid BMI measurement because height or weight measures were not attempted, attempted but not obtained or useable, because the respondent was pregnant, or the respondent was too sick or unsteady. While this varies a little year to year (between 3.8% and 6.1%), there has been little systematic trend in this reason for non-response.
-Refusal: the most common reason for no BMI measurement is an outright refusal (including those refusing out of anxiety, though this tends to be a minor reason). In line with the general participation rates at each stage of the interview above, refusal rates rose sharply from 1.9% in 1994 to a peak of 11.5% in 2011, and remain at 8.3% in the 2014 data.
Because of the high level of item non-response for BMI, a non-response weight was created to try to correct for any biases that this introduces. This followed the identical procedure outlined in Appendix 1 for creating non-response weights for the nurse visit, blood sample etc.
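The Appendix 1 procedure is described in words only; the sketch below shows one plausible shape for its three phases, under stated assumptions. Phase 1 assumes the selection weight is simply the number of households at the address divided by the three that could be interviewed; phase 2 uses simple cell post-stratification to age-sex-region totals (the actual calibration may differ); and phase 3 assumes each interviewee's probability of responding to the later stage (here, BMI measurement) has already been predicted by a logit model fitted elsewhere. All function and field names are hypothetical.

```python
from collections import defaultdict


def selection_weight(n_households: int, max_interviewed: int = 3) -> float:
    """Phase 1 (assumed form): up-weight households at addresses where only
    `max_interviewed` of `n_households` could be interviewed."""
    return max(1.0, n_households / max_interviewed)


def calibrate_to_totals(people, pop_totals):
    """Phase 2: scale weights within each age-sex-region cell so that the
    weighted sample total equals the population total for that cell."""
    cell_sum = defaultdict(float)
    for p in people:
        cell_sum[p["cell"]] += p["w"]
    for p in people:
        p["w"] *= pop_totals[p["cell"]] / cell_sum[p["cell"]]
    return people


def nonresponse_adjust(people):
    """Phase 3: inverse-propensity weights for a later stage, rescaled so the
    stage respondents carry the same total weight as all interviewees."""
    total = sum(p["w"] for p in people)
    responders = [p for p in people if p["responded"]]
    for p in responders:
        p["w_stage"] = p["w"] / p["p_respond"]
    scale = total / sum(p["w_stage"] for p in responders)
    for p in responders:
        p["w_stage"] *= scale
    return responders


# Hypothetical interviewees: cell, selection weight (from households at the
# address), whether BMI was measured, and a logit-predicted probability of it.
people = [
    {"cell": "A", "w": selection_weight(1), "responded": True, "p_respond": 0.5},
    {"cell": "A", "w": selection_weight(6), "responded": False, "p_respond": 0.8},
    {"cell": "B", "w": selection_weight(2), "responded": True, "p_respond": 0.25},
    {"cell": "B", "w": selection_weight(3), "responded": True, "p_respond": 0.5},
]
calibrate_to_totals(people, {"A": 300.0, "B": 100.0})
bmi_sample = nonresponse_adjust(people)
for p in bmi_sample:
    print(p["cell"], round(p["w_stage"], 1))
```

A respondent with a low predicted probability of being measured (0.25 above) ends up with a larger stage weight than one with a high probability, which is how the adjustment compensates for groups that are under-represented among those measured.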

Self-completion measures
As shown in Appendix 1 above, the response rate for the self-completion booklet fell from 71.2% in 1994 (almost everyone who took part in the initial interview) to 51.5% in 2014 (93% of those who took part in the initial interview). For those who completed the self-completion booklet, the level of item missingness is shown in the table below. Item missingness is relatively low compared to missingness from not completing the self-completion survey, though similarly to wider participation rates at each stage of the survey, item missingness does increase over time (e.g. for psychiatric morbidity symptoms, from 1.8% 1994 to 5.9% 2014).

Nurse visit measures
As shown in Appendix 1 above, the response rate for the nurse visit fell from 63.3% in 1994 to 37.3% in 2014. For those who took part in the nurse visit, the level of item missingness is shown in the table below. This shows that far more people have missing observations for measured high blood pressure than for their waist-hip ratio. This is despite the fact that we explicitly INCLUDE those who are on blood pressure-lowering drugs (about 5% of the sample at the start of the period and 10% at the end), on the grounds that their lowered blood pressure still conveys useful information about their health state. The main reason for the remaining high level of missingness is that people had recently exercised, smoked, drunk or eaten (12.2%), which rose noticeably over time (from 6.1% to 13.6%).
Because of the high level of item non-response for high blood pressure (and the moderate level for waist-hip ratio), non-response weights were created to try to correct for any biases that are introduced. This followed the identical procedure outlined in Appendix 1 for creating non-response weights for the nurse visit, blood sample etc.

Blood sample measures
As shown in Appendix 1 above, the response rate for the blood sample fell from 53.3% in 1994 to 28.7% in 2014 (primarily due to higher refusal rates, though also affected by changes in eligibility; see discussion in Appendix 1). For those from whom a blood sample was taken, the level of item missingness is shown in the table below. All of these measures are affected by problems in transferring and storing the blood sample and with the measurement process, which result in problems with 3-10% of the blood samples depending on the measure and year. As for blood pressure, we explicitly INCLUDE those who are on lipid-lowering drugs (0.4% 1994 to 7.9% 2014), on the grounds that their changed cholesterol level still conveys useful information about their health state. Item missingness is highest for fibrinogen, which not only has high rates of such failures (7.0-9.5%), but also has ineligibility due to likely infection (from raised CRP, 3.6-5.6% of those with blood samples) and taking drugs that affect the reading (3.7% 1994 to 7.7% 2009). Item missingness is also high for C-reactive protein (CRP), which also excludes those with likely infections.
Because of the high level of item non-response for fibrinogen and CRP (and the moderate level for other blood sample biomarkers), non-response weights were created to try to correct for any biases that are introduced. This followed the identical procedure outlined in Appendix 1 for creating non-response weights for the nurse visit, blood sample etc.

Appendix 2: Summarising multiple measures
Having reviewed trends in 39 morbidity measures, we have seen that morbidity in the English working-age population has improved in some respects and deteriorated in others. For those who view work-related morbidity as intrinsically multidimensional, 42 this is the endpoint of our analysis. However, for those who conceive of morbidity as unidimensional (or those who are interested in morbidity as it relates to a unidimensional work capacity), this raises the question of how we weight different dimensions of morbidity to decide whether the overall change in morbidity has been positive or negative.

Methods for creating unidimensional morbidity scales
Several methods have been proposed for creating unidimensional morbidity scales, but most of these are unavailable using the HSE data: • Weights can be based on empirically-derived preferences for different health states, of which the most famous example is the WHO Global Burden of Disease (GBD) study 43 . Some GBD estimates for trends in disability in the UK do exist, and suggest that the prevalence of disability in the working-age population is unchanged 1990-2010, though these results are only presented in passing. 1 For our analyses, however, we have no preference-based weights for most of the HSE measures (excluding the subset of measures that make up the EQ-5D scale).
• Those reporting limitations beyond a certain severity in any domain can be categorised as 'disabled', as recommended by the Washington Group on Disability Statistics (see above). However, as previously discussed, we have few functional limitations measures available in HSE.
• Latent morbidity scales can be created based on the inter-correlations between different measures (using item response theory), as used in the World Disability Report 46 and by researchers associated with the US National Bureau of Economic Research e.g. 47 . However, it is unclear why we would wish to weight items in this way: a given morbidity indicator may be severe, yet if it is unrelated to other morbidity measures it will be given a low weight.
• Latent morbidity scales can also be created based on the independent correlation between each indicator and a general measure of morbidity, such as general self-reported health 48 (as in 49). This maintains some of the advantages of single-item measures (in providing a basis for making morbidity unidimensional), while avoiding the potential threats to validity discussed above. However, the inconsistent inclusion of measures in each HSE wave prevents a unidimensional morbidity scale being constructed here.

1 Trends in the UK GBD results are reported in Murray et al. 44 However, Murray et al. do not focus on trends in years lived with disability (YLD), other than to note that "YLDs per person by age and sex have not changed substantially in the UK, but age-specific mortality has been improving" (p1005). The figure in the supplementary appendix shows that YLDs have barely changed for either men or women at any age. However, the confidence intervals for YLDs as a whole in the main paper (Table 1) suggest that the confidence intervals for these trends are very wide. The public GBD data 45 do provide cause-disaggregated YLDs for the UK (and all other countries) for a slightly different period (2000-2015), but these are not age-standardised, are within broad age groups only (e.g. 15-29), and again lack estimates of uncertainty.

An alternative way of summarising heterogeneous trends
Nevertheless, we can examine whether the areas in which morbidity has been improving or deteriorating are those that are particularly important for general health. 48 (This uses the same intuition as the scales in Diederichs et al. 2012. 49) The resulting inter-relationship of morbidity trends with their effect on general health is shown in Figure 1 below.

Figure 1: Trends in morbidity measures & their association with bad general health a
a 'Trend' is as reported above. 'Effect on bad health' shows the effect of the morbidity measure on (very) bad health after controlling for age, sex (and their interaction) and educational level, using all years for which the individual morbidity measure is available. (This shows average marginal effects following a logistic regression, with the same survey weights as in the main analyses above). Full details (and 95% CIs for the association with morbidity) are shown below.
It is easiest to interpret the figure by focussing on each group of measures in turn. Firstly, the biomarkers tend to have the weakest relationship with general health. Those with high levels of the diabetes biomarker (glycated haemoglobin) are 9.7 percentage points more likely to say they have bad health, and those who are underweight, with a high waist-hip ratio, raised fibrinogen, or low HDL cholesterol are 4-6 percentage points more likely to report bad health, but the other measures have weaker relationships. Indeed, there was effectively no relationship between bad reported health and any of measured high blood pressure, high total cholesterol or iron deficiency.
Secondly, most of the measures based on medical labels have a moderately strong relationship with bad health (the weakest being lifetime asthma and recent high blood pressure, both of which can be asymptomatic), and these measures have mostly risen over time. There are, however, notable exceptions, including IHD/stroke LSI, recent angina and recent heart attack/stroke (the label-based measures with some of the strongest relationships with bad reported health), as well as arthritis and other musculoskeletal LSIs.
Finally, symptom-based measures unsurprisingly tend to have stronger relationships with bad reported health, although this ranges from the moderate (those reporting 'recent wheezing/asthma attack' were 8.5 percentage points more likely to report bad health) to the very strong (those reporting 'extreme pain today' were 46.4 percentage points more likely to report bad health). In general, the symptom-based measures with the strongest relationship with bad reported health were more likely to have increased over time ('extreme anxiety/depression today', 'locomotor limitations', and 'self-care limitations'). However, the aforementioned declines in symptom-based measures of respiratory and cardiovascular morbidity were often larger.
The corresponding table (also showing the confidence intervals for the association of the measure with bad general health) is shown overleaf, ordered by the effect on bad health (which corresponds to top-to-bottom in Figure 1).

Sample coverage
As noted in the main paper, HSE is a household sample that excludes those in communal establishments. If we combine data from the 1991, 2001 and 2011 Censuses, 2 the communal population is as follows:
This shows two things. Firstly, there was a sharp rise in the working-age population in communal establishments 1991-2001 (from 230k to 560k), which was concentrated (>90% of the rise) in education-related communal establishments, although this is perhaps a slight overestimate given a definition change in the Census data. 3 Secondly, residents of education-related communal establishments in 2011 are overwhelmingly (>90%) 16-24 year olds. It therefore seems likely that the exclusion of communal establishments in HSE will lead to biases among young adults, and we therefore exclude 16-24 year olds from the trend analyses.

Sample weights
As noted in the main paper, HSE supplies non-response weights from 2003, including adjustments for non-response to the nurse visit and blood sample using health and socioeconomic status from the initial interview. However, response rates had already declined substantially before 2003, as shown in the table below. In general these trends are due to increases in refusal rates. The blood sample response rate, however, is also affected by two noticeable changes in eligibility over this period (people who were pregnant or who had blood/clotting disorders were ineligible throughout):
1. In 1998, people who had ever had an epileptic fit were excluded from the blood sample. This raised the ineligibility rate to 3.5% of the sample in 1998, from 0.6% in 1994.
2. In 2010, this was relaxed so that those who had had an epileptic fit more than 5 years ago were again included in the blood sample. This lowered the ineligibility rate from 3.1% in 2009 to 2.4% in 2010.
To improve comparability over time, we created new weights for 1994-2014 in three phases:
• First, we created a selection weight, because some households were slightly more likely to be interviewed than others. (Until 2009, only three households at each address were interviewed, so those living at addresses with many households were less likely to be interviewed.) NatCen supplied selection weights for 2004-2013 to enable this (funded by this project); these are not available on the public HSE datasets.
• Second, after applying the first-stage selection weight, we created new individual-level (inverse probability) weights to match population age-sex-region totals in each year. 4 NatCen added the region variable to the public HSE datasets for 1994-1997 to enable this.
• Third, after the second-stage adjustment for individual non-response, we created a further weight for the later stages of the interview (self-completion, BMI measurement, nurse visit, blood sample) that adjusts for non-response among those responding to the individual interview. This is based on a logit regression model to predict nurse response based on: […]
The revised weights are included in the Stata code to enable replication of the full paper. The final sample size is as follows: […]

Trends for these measures are shown in Table 9 below. Looking first at good general health, the table shows that 80.9% reported good general health in 1994-6. By 2011-14, this had declined by 0.8 percentage points. When we adjust for the changing age and sex distribution of the working-age population (labelled 'Adj.' in Table 1), the decline is only 0.1 percentage points, with a wide confidence interval (-0.9 to +0.7 percentage points), so there is little evidence for any systematic trend. For several of the other general health measures there is more evidence of change over this period, but these are difficult to interpret because the trends point in opposite directions. There is strong evidence for a rise in bad general health (a rise of 0.6-1.5 percentage points from a base of 4.4%), yet equally strong evidence for a decline in having problems with everyday activities (at both levels of severity) and in being limited in activities by a longstanding illness. This shows the challenge of tracking population morbidity change through general, non-specific measures, which are likely to be influenced as much by changes in reporting styles as by changes in morbidity per se.
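The three-phase weighting described earlier in this section can be sketched as below. This is a minimal illustration under stated assumptions, not the Stata code that accompanies the paper: the selection rule (a design weight of H/3 when an address has H > 3 households), the toy data, the population totals and all function names are hypothetical, and the phase-three logit is simplified to a single categorical predictor. With one categorical predictor, a saturated logit's fitted response probabilities equal the weighted response rates in each category, so the sketch computes those rates directly rather than fitting a model.

```python
from collections import defaultdict


def selection_weight(households_at_address, max_interviewed=3):
    """Phase 1 (assumed rule): if at most `max_interviewed` of H households
    at an address can be interviewed, each is selected with probability
    max_interviewed / H, so its design weight is the inverse of that."""
    if households_at_address > max_interviewed:
        return households_at_address / max_interviewed
    return 1.0


def poststratify(cells, weights, population_totals):
    """Phase 2: rescale weights so they sum to the population total in each
    age-sex-region cell."""
    cell_sums = defaultdict(float)
    for cell, w in zip(cells, weights):
        cell_sums[cell] += w
    return [w * population_totals[cell] / cell_sums[cell]
            for cell, w in zip(cells, weights)]


def nonresponse_adjust(groups, weights, responded):
    """Phase 3: divide each responder's weight by the weighted response rate
    in its group (the fitted probability of a saturated logit on `groups`).
    Non-responders receive a weight of zero."""
    total = defaultdict(float)
    resp = defaultdict(float)
    for g, w, r in zip(groups, weights, responded):
        total[g] += w
        if r:
            resp[g] += w
    return [w * total[g] / resp[g] if r else 0.0
            for g, w, r in zip(groups, weights, responded)]


# Hypothetical toy sample: (households at address, age-sex-region cell,
# group predicting nurse response, responded to nurse visit)
sample = [
    (1, "M25-44-North", "good health", True),
    (4, "M25-44-North", "bad health", False),
    (2, "F25-44-North", "good health", True),
    (1, "F25-44-North", "bad health", True),
    (6, "F25-44-North", "good health", False),
]
population = {"M25-44-North": 1000.0, "F25-44-North": 1500.0}

w1 = [selection_weight(h) for h, _, _, _ in sample]
w2 = poststratify([cell for _, cell, _, _ in sample], w1, population)
w3 = nonresponse_adjust([g for _, _, g, _ in sample], w2,
                        [r for _, _, _, r in sample])
```

After phase 2 the weights sum to the hypothetical totals of 1,000 and 1,500 in the two cells; after phase 3 the nurse-visit responders alone carry the full weighted total of 2,500, with each responder standing in for similar non-responders in the same group.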
As an aside, UK Government publications have made claims based on healthy/disability-free life expectancy, most recently to argue that morbidity has been deteriorating. However, these trends are potentially misleading: they include older people as well as the working-age population; they conflate a combined mortality-morbidity measure with morbidity; and they are based on self-reports of global health that are unreliable, as we explore further below.

Activity limitations and MSDs
[…]
-"I am confined to bed"
[This is part of the widely-used EQ-5D health status indicator 2 . However, for the purposes of this paper we have separated the individual measures that make up the EQ-5D in order to compare these to similar indicators of morbidity within each domain.] People are classified as having a problem with self-care today if they had some problems walking about or were confined to bed.

Locomotor limitation
This is based on the personal care disability scale used in the 2001 HSE report 3 . Respondents in 1995, 2000 and 2001 were asked if any of the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year):
-"Cannot walk 200 yards or more on own without stopping or discomfort". People who reported a limitation were asked if they used a walking aid, and if they did, were then asked if they could walk 200 yards without the walking aid.
-"Cannot walk up and down a flight of 12 stairs without resting"
[…]

[…] disease; Sever's disease; Spondylitis, spondylosis; Stiff joints, joint pains, contraction of sinews, muscle wastage; Strained leg muscles, pain in thigh muscles; Systemic sclerosis, myotonia (nes); Tenosynovitis; Torn muscle in leg, torn ligaments, tendonitis; Walk with limp as a result of polio, polio (nes), after affects of polio (nes); Weak legs, leg trouble, pain in legs; and Worn discs in spine - affects legs. The code explicitly excludes: Damage/injury to spine results in paralysis; Sciatica or trapped nerve in spine; and Muscular dystrophy.

Circulatory
High blood pressure LSI
Every year 1994-2011, people who report a longstanding illness (LSI) are then asked, 'what is the matter with you?'; up to 6 responses are then coded by the interviewer into a consistent coding frame based on the International Classification of Diseases. The high blood pressure LSI measure is based on the group labelled 'Hypertension/high blood pressure/blood pressure (nes)', which as of 2011 includes only the conditions listed in the group label.

Recent high blood pressure
Respondents in 1994, 1998, 2003, 2006 and 2009-2014 were asked a series of questions on whether they have high blood pressure: […]
-Finally, those with doctor-diagnosed high blood pressure (excluding only when pregnant) were asked: "Are you currently taking any medicines, tablets or pills for high blood pressure?", and those saying 'no' (or not giving an answer) were then asked, "Do you still have high blood pressure?"
People were considered to have recent high blood pressure if they said they had ever been diagnosed as having high blood pressure by a doctor (excluding when pregnant), and that they still have high blood pressure or are currently taking medicines for it.
While the question wording has stayed consistent, a discontinuity seems to be introduced by a change in question context. In some years (1994, 1998, 2003, 2006 and 2011), these questions were preceded by a question that asked, "May I just check, have you ever had your blood pressure measured by a doctor or nurse?" (those saying yes were then asked how recently this was, and whether they were told that it was 'normal (alright/fine), higher than normal, lower than normal, or were you not told anything?'). In the other years, this preceding question was not asked. Given the way in which context can affect question interpretation, we treat these as two separate measures of recent high blood pressure.
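The derivation of recent high blood pressure, and the split into two series by question context, can be sketched as follows. The survey years come from the text above; the function names, series labels and the convention that an unasked question is `None` are illustrative assumptions, not the paper's own variable names.

```python
# Survey years taken from the text; everything else is illustrative.
ASKED_YEARS = {1994, 1998, 2003, 2006} | set(range(2009, 2015))
CONTEXT_YEARS = {1994, 1998, 2003, 2006, 2011}  # preceded by "ever had BP measured?"


def recent_high_bp(diagnosed_not_only_pregnant, taking_medicine, still_has):
    """Ever doctor-diagnosed (not only when pregnant) AND either taking
    medicine for high blood pressure or still having it. `still_has` is
    None when the question was not asked (i.e. for people taking medicine)."""
    if not diagnosed_not_only_pregnant:
        return False
    return bool(taking_medicine) or bool(still_has)


def series_name(year):
    """Keep the two question contexts as separate trend series."""
    if year not in ASKED_YEARS:
        raise ValueError(f"recent high blood pressure not asked in {year}")
    if year in CONTEXT_YEARS:
        return "recent_high_bp_with_context"
    return "recent_high_bp_without_context"
```

Treating the unasked follow-up as `None` keeps the filtering explicit: someone on medication is classified positive without needing an answer to "Do you still have high blood pressure?".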
Biomarker high blood pressure
During the nurse visit (which took place for all consenting respondents in all years except 1999, 2002 and 2004, when the nurse visit focused on particular subsamples), respondents' blood pressure was measured. Following established HSE practice (in turn following 4 ), high blood pressure is defined as a systolic blood pressure >= 140mmHg or a diastolic blood pressure >= 90mmHg. The measurement of blood pressure changed in 2003, from a Dinamap monitor to an Omron monitor. A conversion between the two monitors is available from a calibration study, and this has been regularly used by the HSE team to produce continuous trends in blood pressure (see www.hscic.gov.uk/catalogue/PUB00480). For adults, the conversion is as follows: […]

High cholesterol
In the years 1994, 1998, 2006, and 2008-14, blood samples were obtained during the nurse visit, which were then analysed for total cholesterol. A high level of total cholesterol ('hypercholesterolaemia') is an established risk factor for CVD, and high cholesterol is defined following conventional practice at the NICE guidance 'audit level' of 5mmol/L or above 5 6 . The measurement of cholesterol changed slightly in 2010 when a new laboratory was used. This resulted in values that are on average 0.1mmol/L higher, so later values are adjusted by this amount to maintain comparability over time, as in 5 .
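The laboratory adjustment for total cholesterol amounts to subtracting 0.1 mmol/L from later values before applying the 5 mmol/L cut-off. A minimal sketch, assuming the adjustment applies to every value measured from 2010 onwards; the constant and function names are hypothetical. The same pattern, with the sign reversed, would apply to the HDL adjustment described next.

```python
LAB_CHANGE_YEAR = 2010    # new laboratory from 2010 (from the text)
TOTAL_CHOL_OFFSET = 0.1   # later values average 0.1 mmol/L higher
HIGH_CHOL_CUTOFF = 5.0    # mmol/L, NICE 'audit level'


def harmonised_total_cholesterol(value_mmol_l, year):
    """Remove the laboratory offset from values measured from 2010 onwards.
    Rounding to 2 d.p. avoids floating-point artefacts at the cut-off."""
    if year >= LAB_CHANGE_YEAR:
        return round(value_mmol_l - TOTAL_CHOL_OFFSET, 2)
    return value_mmol_l


def high_total_cholesterol(value_mmol_l, year):
    """Apply the 5 mmol/L cut-off to the harmonised value."""
    return harmonised_total_cholesterol(value_mmol_l, year) >= HIGH_CHOL_CUTOFF


# The same raw value can fall either side of the cut-off depending on the year.
print(high_total_cholesterol(5.05, 2009))  # True
print(high_total_cholesterol(5.05, 2011))  # False
```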

Low HDL cholesterol
In the years 1994, 1998, 2006, and 2008-14, blood samples were obtained during the nurse visit, which were then analysed for high density lipoprotein (HDL) cholesterol. HDL cholesterol reduces the risk of CVD (it carries cholesterol away from the arteries towards the liver), so it is low HDL cholesterol that indicates poorer health; low HDL cholesterol is here defined as 1 mmol/L or less 5 6 . The measurement of HDL cholesterol changed slightly in 2010 when a new laboratory was used. This resulted in values that are on average 0.1mmol/L lower, so later values are adjusted by this amount to maintain comparability over time, as in 5 .

Recent heart attack/stroke
Respondents in 1994, 1998, 2003, 2006 and 2011 were asked a series of questions on whether they have had a heart attack (within a battery of questions about different types of heart disease):
-"Have you ever had a heart attack (including myocardial infarction or coronary thrombosis)?"
-Those responding 'yes' were then asked, "Were you told by a doctor that you had a Heart Attack (including myocardial infarction or coronary thrombosis)?"
-Those with a doctor-diagnosed heart attack were asked, "Have you had a heart attack (including myocardial infarction and coronary thrombosis) during the past 12 months?"
Respondents in these years were similarly asked about stroke:
-"Have you ever had a stroke?"
-Those responding 'yes' were then asked, "Were you told by a doctor that you had a stroke?"
-Those with doctor-diagnosed stroke were asked, "Have you had a stroke during the past 12 months?"
People were considered to have a recent heart attack or stroke if they said they had ever been diagnosed by a doctor as having had a stroke or a heart attack, and that they had had a heart attack or stroke during the past 12 months.
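Each condition follows the same three-step question chain (ever had, doctor-diagnosed, past 12 months), so the 'ever', 'DD' and 'recent' versions are nested. A minimal sketch of that logic, with hypothetical class and function names and `None` standing for a question that was not asked:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventAnswers:
    """Answers to one ever / doctor / past-12-months question chain."""
    ever: bool
    doctor_told: Optional[bool] = None     # asked only if ever == True
    past_12_months: Optional[bool] = None  # asked only if doctor-diagnosed


def ever_had(a: EventAnswers) -> bool:
    return a.ever


def doctor_diagnosed(a: EventAnswers) -> bool:
    return a.ever and bool(a.doctor_told)


def recent(a: EventAnswers) -> bool:
    return doctor_diagnosed(a) and bool(a.past_12_months)


def recent_heart_attack_or_stroke(heart_attack: EventAnswers,
                                  stroke: EventAnswers) -> bool:
    return recent(heart_attack) or recent(stroke)


no_event = EventAnswers(ever=False)
old_mi = EventAnswers(ever=True, doctor_told=True, past_12_months=False)
recent_stroke = EventAnswers(ever=True, doctor_told=True, past_12_months=True)
print(recent_heart_attack_or_stroke(old_mi, no_event))       # False
print(recent_heart_attack_or_stroke(old_mi, recent_stroke))  # True
```

Because the 12-month question is only asked of those with a doctor diagnosis, combining the two conditions per person with a logical OR gives the same result as the text's combined definition.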
Recent angina
Respondents in 1994, 1998, 2003, 2006 and 2011 were asked a series of questions on different types of heart disease, including angina; heart attack (including myocardial infarction or coronary thrombosis); a heart murmur; abnormal heart rhythm; or other heart trouble. For EACH of these, they were asked:
-"Have you ever had <type of heart disease>?"
-Those responding 'yes' were then asked, "You said that you had <type of heart disease>. Were you told by a doctor that you had <type of heart disease>?"
-For heart murmurs only, women saying they had doctor-diagnosed heart murmurs were asked if they were pregnant when told this, and if so, whether they were ever told they had a heart murmur when they were not pregnant.
-Those with doctor-diagnosed heart disease (excluding heart murmurs when pregnant) were asked, "Have you had <type of heart disease> during the past 12 months?"
People were considered to have recent CVD if they said they had a doctor-diagnosed heart condition and that they had had this during the past 12 months.

Cardiovascular (CVD) LSI
Every year 1994-2011, people who report a longstanding illness (LSI) are then asked, 'what is the matter with you?'; up to 6 responses are then coded by the interviewer into a consistent coding frame based on the International Classification of Diseases. The CVD LSI measure is based on the groups labelled 'Stroke/cerebral haemorrhage/cerebral thrombosis', 'Heart attack/angina', 'Hypertension/high blood pressure/blood pressure (nes)', 'Other heart problems', 'Piles/haemorrhoids incl. Varicose Veins in anus', 'Varicose veins/phlebitis in lower extremities', and 'Other blood vessels/embolic'. As of 2011 this includes: Aorta replacement; Aortic valve stenosis; Aortic/mitral valve regurgitation; Arterial thrombosis; Arteriosclerosis, hardening of arteries (nes); Artificial arteries (nes); Atrial Septal Defect (ASD); Blocked arteries in leg; Blood clots (nes); Cardiac asthma; Cardiac diffusion; Cardiac problems, heart trouble (nes); Cerebrovascular accident; Coronary thrombosis, myocardial infarction; Dizziness, giddiness, balance problems (nes); Hand Arm Vibration Syndrome (White Finger); Hardening of arteries in heart; Heart attack/angina; Heart disease, heart complaint; Heart failure; Heart murmur, palpitations; Hemiplegia, apoplexy, cerebral embolism; Hole in the heart; Hypersensitive to the cold; Hypertension/high blood pressure/blood pressure (nes); Intermittent claudication; Ischaemic heart disease; Low blood pressure/hypotension; Mitral valve stenosis; Pacemaker; Pains in chest (nes); Pericarditis; Piles/haemorrhoids incl. Varicose Veins in anus; Poor circulation; Pulmonary embolism; Raynaud's disease; St Vitus dance; Stroke victim - partially paralysed and speech difficulty; Stroke/cerebral haemorrhage/cerebral thrombosis; Swollen legs and feet; Tachycardia, sick sinus syndrome; Telangiectasia (nes); Thrombosis (nes); Tired heart; Valvular heart disease; Varicose veins in Oesophagus; Varicose veins/phlebitis in lower extremities; Various ulcers, varicose eczema; Weak heart because of rheumatic fever; Wolff-Parkinson-White syndrome; and Wright's syndrome. It explicitly excludes balance problems due to ear complaint and haemorrhage behind eye. While the LSI coding frame generally stays consistent over this period, interpretation of 'IHD LSI' is complicated by two changes: 'Too much cholesterol in blood' is included in this category in 1994 only, and Polyarteritis Nodosa was later moved into this code (the documentation is not clear on whether this occurred in 2000 or 2001).

Angina symptoms
This is taken from the Rose Angina questionnaire 7 8 . Respondents in 1994, 1998, 2003, 2006 and 2011 were asked a series of questions about symptoms of heart trouble (rather than whether they had been diagnosed): […]
o Those who said they stop or slow down were asked, "If you stand still does the pain go away or not?" (If respondents were unsure, they were asked, "What happens to the pain on most occasions?") If the pain goes away, they were asked, "How soon does the pain go away? Does it go in 10 minutes or less, or more than 10 minutes?"
o Those who said the pain goes away in 10 minutes or less were asked, "Will you show me where you get this pain or discomfort? Where else?" The interviewer then coded the site as Sternum (upper or middle) | Sternum lower | Left anterior chest | Left arm | Right anterior chest | Right arm | (Somewhere else).
Following the HSE reports, possible angina is defined as chest pain or discomfort that (i) includes either the sternum, or both the left arm and the left anterior chest; (ii) is prompted by hurrying or walking uphill (or by walking on the level, for those who never attempt more); (iii) makes the respondent either stop or slacken pace; and (iv) usually disappears in 10 minutes or less when they stand still.

Heart attack symptoms
This is taken from the Rose Angina questionnaire. Respondents in 1994, 1998, 2003, 2006 and 2011 were asked, "Have you ever had a severe pain across the front of your chest lasting for half an hour or more?" As in the 2006 HSE report, those responding 'yes' are treated as having a possible heart attack (myocardial infarction).
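The four criteria for possible angina combine as a simple conjunction. The sketch below reads criterion (i) as "sternum, or left arm together with left anterior chest", the usual Rose questionnaire convention; the site labels and parameter names are hypothetical stand-ins for the interviewer codes.

```python
STERNUM_SITES = {"sternum_upper_middle", "sternum_lower"}


def possible_angina(sites, pain_hurrying_or_uphill, never_hurries_or_uphill,
                    pain_walking_level, stops_or_slows, gone_within_10_min):
    """Apply the four criteria for possible angina.

    sites: set of hypothetical labels for the interviewer-coded pain sites,
    e.g. {"sternum_lower", "left_arm", "left_anterior_chest"}.
    """
    # (i) sternum, or both the left arm and the left anterior chest
    site_ok = bool(sites & STERNUM_SITES) or {"left_arm", "left_anterior_chest"} <= sites
    # (ii) brought on by hurrying/uphill, or by walking on the level
    #      for those who never hurry or walk uphill
    exertional = pain_hurrying_or_uphill or (never_hurries_or_uphill and pain_walking_level)
    # (iii) stops or slackens pace, and (iv) pain goes within 10 minutes
    return site_ok and exertional and stops_or_slows and gone_within_10_min
```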

COPD symptoms
Respondents in 1995, 1996 and 2010 were asked:
o "Do you usually cough first thing in the morning in the winter?" (In 2010 only, respondents had previously been asked "Do you usually cough first thing in the morning?", but this is not used to filter people into the questions on coughing in winter.)
o "Do you usually bring up any phlegm from your chest, first thing in the morning in the winter?" (Again, this was asked of everyone in all years, but was preceded by an additional, non-winter-specific question in 2010.)
[…]
-Grade 2 dyspnoea: people who report shortness of breath when hurrying on level ground or walking up a slight hill (or who report shortness of breath when walking on level ground, but who say they never walk uphill or hurry).
-Grade 3 dyspnoea: people who report shortness of breath when walking with people of their own age on level ground, or who have to stop for breath when walking at their own pace on level ground.
(The same questions also exist in 1994 and 1998, but (i) the wider bank of questions differs substantially between the two versions, so question context effects are likely; and (ii) the filtering into the final question differs between versions. However, the 1991-98 trends are included below.)

Recent wheezing/asthma symptoms
Respondents in 1995-97, 2001 and 2010 were asked the following questions as part of the battery of questions on breathing problems:
-"I am now going to ask you some questions about your breathing... Have you ever had wheezing or whistling in the chest at any time, either now, or in the past?"
-Those that said yes were then asked, "Have you had wheezing or whistling in the chest in the last 12 months?"
-(For those who said they had ever been told by a doctor they had asthma; see above) "When was your most recent attack of asthma? PROMPT IF NECESSARY: Less than 4 weeks ago | More than 4 weeks but within the last 12 months | One to five years ago | More than 5 years ago"
People who said they had EITHER wheezing/whistling in the past 12 months OR an asthma attack in the past 12 months were counted as having recent wheezing/asthma symptoms.
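The recent wheezing/asthma measure is an OR of two conditions, which is also why a change in filtering of the asthma-attack question (noted in this section for 2010) cannot shift the derived variable: anyone skipped past that question because they had already reported wheezing in the past 12 months is classified positive by the first condition anyway. A minimal sketch, with hypothetical names and `None` for an unasked question:

```python
from typing import Optional

# Answer categories for "When was your most recent attack of asthma?" (from the text)
WITHIN_12_MONTHS = {
    "Less than 4 weeks ago",
    "More than 4 weeks but within the last 12 months",
}


def recent_wheeze_or_asthma(wheeze_past_12m: Optional[bool],
                            last_asthma_attack: Optional[str]) -> bool:
    """EITHER wheeze in the past 12 months OR an asthma attack in the past
    12 months. None means the question was not asked."""
    return bool(wheeze_past_12m) or last_asthma_attack in WITHIN_12_MONTHS
```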
[It should be noted that the filtering to the second question is very slightly different in 2010 compared to previous years (it was only asked of people who said they had not had wheezing/whistling in the chest in the past 12 months). However, given the way that the derived variable is calculated here, the change in filtering does not introduce any discontinuities over time.]

Wheezing stopping sleep
Respondents in 1995-97, 2001 and 2010 were asked the following two questions as part of the battery of questions on breathing problems: […]

During the initial face-to-face interview in all years (except 2013), respondents were asked if they would consent to having their height and weight measured by the interviewer. The reasons for missingness (and their trends over time) are given in the following web appendix; note that three changes give rise to small discontinuities in 2009 and 2011. Obesity is a risk factor for diabetes (hence its inclusion in this section) but also for heart disease and some cancers. Obesity is defined as a Body Mass Index (BMI) of >= 30kg/m² as per the World Health Organization's BMI classification 10 .
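The obesity measure follows directly from the measured height and weight. A minimal sketch, assuming weight is recorded in kilograms and height in centimetres, with a missing measurement (refused or not attempted) returned as `None`; the function names are illustrative.

```python
from typing import Optional

OBESITY_CUTOFF = 30.0  # kg/m^2, WHO classification


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def obese(weight_kg: Optional[float], height_cm: Optional[float]) -> Optional[bool]:
    """None if either measurement is missing."""
    if weight_kg is None or height_cm is None:
        return None
    return bmi(weight_kg, height_cm) >= OBESITY_CUTOFF
```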

Other biomarkers
Raised C-reactive protein
In the years 1998, 2003, 2006, and 2009, blood samples were obtained during the nurse visit, which were then analysed for C-reactive protein (CRP). CRP is an inflammatory marker that can indicate heart-related inflammation (it is used to test for heart failure) but can also indicate other sorts of health damage, including diabetes. However, debates continue about exactly what CRP shows, both in terms of its causal role in heart disease and whether it also indicates depression 16 . Raised CRP is defined as >3mg/L, the standard cut-off for a clinically significant rise in CVD risk 17 18 . Participants with CRP >10mg/L are excluded, as this is taken to be evidence of current infection rather than inflammation from chronic disease.
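The CRP rule has two thresholds: values above 10 mg/L are excluded from the denominator, and the remaining values are classified at 3 mg/L. A minimal sketch of that logic; the names are hypothetical and the prevalence helper is unweighted, whereas the paper's estimates use the survey weights.

```python
from typing import Iterable, Optional

RAISED_CRP = 3.0       # mg/L
INFECTION_CRP = 10.0   # mg/L; higher values excluded as likely current infection


def raised_crp(crp_mg_l: Optional[float]) -> Optional[bool]:
    """True/False for raised CRP; None if missing or excluded (> 10 mg/L)."""
    if crp_mg_l is None or crp_mg_l > INFECTION_CRP:
        return None
    return crp_mg_l > RAISED_CRP


def prevalence(values: Iterable[Optional[float]]) -> float:
    """Unweighted share with raised CRP among non-excluded participants."""
    flags = [f for f in (raised_crp(v) for v in values) if f is not None]
    return sum(flags) / len(flags)
```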

Raised Fibrinogen
In the years 1998, 2003, 2006, and 2009, blood samples were obtained during the nurse visit, which were then analysed for fibrinogen. Like CRP, fibrinogen is an inflammatory marker; it is commonly thought to be a causal risk factor for CVD (it is a component of coagulation), and it also seems to be a risk factor for other diseases (including cancer and diabetes) 19 .
While fibrinogen is often analysed as a continuous variable with no cut-points 18 , we here define raised fibrinogen as >4g/L, as in 6 . As for CRP, participants with CRP >10mg/L are excluded, as this is taken to be evidence of current infection rather than inflammation from chronic disease. A change of analysis method and laboratory between 1994 and 1998 means that the 1994 results are not comparable to the later results 20 .

[…]
-"…lost much sleep over worry?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…felt you were playing a useful part in things?" RESPONSES: "More so than usual" | "Same as usual" | "Less useful than usual" | "Much less useful"
-"…felt capable of making decisions about things?" RESPONSES: "More so than usual" | "Same as usual" | "Less so than usual" | "Much less capable"
-"…felt constantly under strain?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…felt you couldn't overcome your difficulties?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…been able to enjoy your normal day-to-day activities?" RESPONSES: "More so than usual" | "Same as usual" | "Less so than usual" | "Much less than usual"
-"…been able to face up to your problems?" RESPONSES: "More so than usual" | "Same as usual" | "Less able than usual" | "Much less able"
-"…been feeling unhappy and depressed?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…been losing confidence in yourself?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…been thinking of yourself as a worthless person?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…been feeling reasonably happy, all things considered?" RESPONSES: "More so than usual" | "Same as usual" | "Less so than usual" | "Much less happy"
These make up the 12-item General Health Questionnaire (GHQ-12) 22 , a well-validated, widely used measure of probable mental ill-health (or, more strictly, of general non-psychotic psychiatric morbidity). A total score was created by first ensuring that all questions were coded from 1 (positive symptom) to 4 (negative symptom), and then counting the number of questions in which people answered with categories 3 or 4 (indicating a negative symptom). A binary measure (often called GHQ caseness) was created for people who had negative symptoms for 4 or more of the 12 questions.

Anxiety/depression (moderate / extreme)
In the self-completion survey in 1996, 2003-6, 2008, 2010-12 and 2014, respondents were asked 'Now we would like to know how your health is today. Please answer ALL the questions. By ticking one box for each question below, please indicate which statements best describe your own health state today':
-"I am not anxious or depressed"
-"I am moderately anxious or depressed"
-"I am extremely anxious or depressed"
[This is part of the widely-used EQ-5D health status indicator 2 . However, for the purposes of this paper we have separated the individual measures that make up the EQ-5D in order to compare these to similar indicators of morbidity within each domain.]
Two outcome measures are based on this: whether people have any anxiety/depression (the 2nd and 3rd categories combined), and whether they have extreme anxiety/depression (3rd category only).

[…] family or leisure activities)"
-"I have some problems with performing my usual activities"
-"I am unable to perform my usual activities"
[This is part of the widely-used EQ-5D health status indicator 2 . However, for the purposes of this paper we have separated the individual measures that make up the EQ-5D in order to compare these to similar indicators of morbidity within each domain.] Two outcome measures are based on this: whether people have any problems (the 2nd and 3rd categories combined), and whether they are unable to perform their usual activities (3rd category only).
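Both mental-health derivations in this part of the appendix reduce ordinal answers to binary outcomes: GHQ-12 caseness counts items answered in categories 3 or 4, and each EQ-5D item yields an 'any problem' and an 'extreme problem' flag. A minimal sketch, assuming the GHQ items have already been recoded so that 1 is the most positive and 4 the most negative answer; the function names are illustrative.

```python
def ghq12_score(responses):
    """Count of the 12 GHQ items answered 3 or 4, assuming every item has
    already been coded 1 (positive) to 4 (negative)."""
    if len(responses) != 12 or any(r not in (1, 2, 3, 4) for r in responses):
        raise ValueError("expected 12 responses coded 1-4")
    return sum(1 for r in responses if r >= 3)


def ghq_case(responses, threshold=4):
    """GHQ 'caseness': negative symptoms on `threshold` or more items."""
    return ghq12_score(responses) >= threshold


def eq5d_binaries(level):
    """EQ-5D item coded 1 (no problems), 2 (some/moderate), 3 (extreme/unable).
    Returns (any problem, extreme problem only)."""
    if level not in (1, 2, 3):
        raise ValueError("expected level 1, 2 or 3")
    return level >= 2, level == 3
```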

Limitations in past 2wks
Every year, respondents were asked, "Now I'd like you to think about the two weeks ending yesterday. During those 2 weeks did you have to cut down on any of the things you usually do (about the house or at work or in your free time) because of your answer at <the LSI question> or some other illness or injury?" There were two small changes to this question's wording in 1996. First, 'work' was changed to 'work/school'. Second, 'your answer at <the LSI question>' was changed to 'a condition you have just told me about'. While it is impossible to be sure of the exact effect of these changes, neither seems likely to influence the results (at least for the 25+ age group, in which few individuals are in full-time education).

Circulatory
Beyond 'recent': 'Ever had' and 'DD' CVD
In the main paper, we look at whether people report recent doctor-diagnosed CVD (looking separately at heart attack/stroke, angina, and any recent CVD). As shown in Web Appendix 2, this comes from three questions: whether people report ever having this condition; whether a doctor diagnosed this; and whether they have had an attack in the past 12 months / consider themselves to still have the condition. This Web Appendix therefore shows trends in the other versions of these measures, i.e. having ever had this type of CVD, and having ever had doctor-diagnosed ('DD') CVD of this type.

Component measure: Heart murmur / Irregular heart rhythm / Other heart disease
In the main paper, we look at recent reports of doctor-diagnosed angina; heart attack (including myocardial infarction or coronary thrombosis); a heart murmur; abnormal heart rhythm; or other heart trouble (see Web Appendix 2). Angina and heart attack are also analysed in the main paper in their own right; in this Web Appendix, we further show separate trends in heart murmur, abnormal heart rhythm and other heart trouble.

Respiratory
Component measure: 'phlegm'
In the main paper, we look at whether people report recent COPD (see Web Appendix 2). This combines two measures: regular cough and phlegm. This Web Appendix shows the trend in the phlegm measure on its own, without combining it with regular cough.

Alternative version: 'LSI respiratory'
In the main paper, we look at whether people report an asthma LSI (to examine alongside a direct question on diagnosed asthma; see Web Appendix 2). This Web Appendix also shows the proportion of people reporting a longstanding illness ('LSI') within the broader category of respiratory conditions. […]
• If the complaint is breathlessness with the cause also stated, it is coded with the cause; hence the category also excludes breathlessness as a result of anaemia, breathlessness due to hole in heart, and breathlessness due to angina.

Component measure: Wheezing
In the main paper, we look at whether people report recent wheezing/asthma. As shown in Web Appendix 2, this comes from three questions: whether people report ever having had wheezing or whistling in the chest; whether they have had this in the past 12 months; and whether they have had an asthma attack in the past 12 months. This Web Appendix shows trends in the other versions of these measures, i.e. having ever had wheezing/whistling in the chest, and having had this in the past 12 months.
Beyond 'recent': 'Ever had' and 'DD' diabetes
In the main paper, we look at whether people report recent doctor-diagnosed diabetes. As shown in Web Appendix 2, this comes from three questions: whether people report ever having this condition; whether a doctor diagnosed this; and whether they currently inject insulin / take other medication for diabetes. This Web Appendix shows trends in the other versions of these measures, i.e. having ever had diabetes, and having ever had doctor-diagnosed ('DD') diabetes.

For comparison: Walking limitation
This is based on the personal care disability scale used in the 2001 HSE report 3 . Respondents in 1995, 2000 and 2001 were asked if any of the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year): "Cannot walk 200 yards or more on own without stopping or discomfort". People who reported a limitation were asked if they used a walking aid, and if they did, were then asked if they could walk 200 yards without the walking aid.

For comparison: Washing & dressing limitation
This is based on the personal care disability scale used in the 2001 HSE report 3 . Respondents in 1995, 2000 and 2001 were asked if any of the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year):
-"Cannot dress and undress without difficulty"
-"Cannot wash hands and face without difficulty"
For comparison with the 'problems with washing/dressing today' measure in the main paper (which covers a more extended period and is based on a different question; see Web Appendix 2), a measure is derived for respondents who report either of these problems.

Other LSIs
Every year 1994-2011, people who report a longstanding illness (LSI) are then asked, 'what is the matter with you?'; up to 6 responses are then coded by the interviewer into a consistent coding frame based on the International Classification of Diseases. The various other LSIs are as follows:
• The Blood Disorders LSI measure is based on the group 'Disorders of blood and blood forming organs and immunity disorders', which as of 2011 includes: Anaemia, pernicious anaemia, Blood condition (nes), blood deficiency, Haemophilia, Idiopathic Thrombocytopenic Purpura (ITP), Immunodeficiencies, Polycythaemia (blood thickening), blood too thick, Purpura (nes), Removal of spleen, Sarcoidosis (previously code 37), Sickle cell anaemia/disease, Thalassaemia, Thrombocythenia. It explicitly excludes Leukaemia (code 01).
• The Cancer LSI measure is based on the group 'Cancer (neoplasm) including lumps, masses, tumours and growths and benign (non-malignant) lumps and cysts', which as of

Appendix 7: Year-by-year trends
This appendix presents the year-by-year trends for all of the variables included in the main paper. The table row labelled 'start v end sig' presents the p-value for testing the null hypothesis that there is no difference between the first and last years in the series (whichever these years are). Note that this will differ from the confidence intervals presented in the main paper as these are grouped into multi-year periods with larger sample sizes and therefore greater precision.
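The 'start v end sig' row tests whether prevalence differs between the first and last years of a series. As a rough illustration of such a test, the sketch below uses an unweighted two-proportion z-test with a pooled standard error; this is an assumed stand-in, not necessarily the test used for the appendix tables, which would normally account for the survey weights and design. The function name and example counts are hypothetical.

```python
import math


def start_v_end_p_value(cases_start, n_start, cases_end, n_end):
    """Two-sided p-value for H0: equal prevalence in the first and last year,
    using an unweighted two-proportion z-test with a pooled standard error."""
    p_start = cases_start / n_start
    p_end = cases_end / n_end
    pooled = (cases_start + cases_end) / (n_start + n_end)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_start + 1 / n_end))
    if se == 0:
        return 1.0  # both years have 0% or 100% prevalence
    z = (p_end - p_start) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Hypothetical counts: 10% prevalence in the first year, 15% in the last
print(start_v_end_p_value(100, 1000, 150, 1000))
```

Because each single year has a much smaller sample than the pooled multi-year periods used in the main paper, a test like this will typically be less precise than the main paper's confidence intervals, consistent with the note above.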

Appendix 8: Others' health trend analyses using HSE data
Trends in some of these indicators have not previously been analysed (e.g. waist-hip ratio, fibrinogen). However, others have been studied but never integrated into a single picture of changing morbidity; we review these in this section. (For reasons of space these are included here rather than in the main text).

Cardiovascular morbidity
1998-2011 trends in the two biomarkers for total and HDL cholesterol using HSE data are shown in Oyebode, 5 who find similar results.

Respiratory morbidity
A subset of the HSE respiratory indicators (ever/past-year wheezing, doctor-diagnosed asthma) was analysed for 2001-2010 by Hall and Mindell, 23 who found similar trends: stability in some measures (ever wheezing) but improvements in others (past-year wheezing), at the same time as the reported prevalence of doctor-diagnosed asthma increased.

Obesity & diabetes
While the English trends in waist-hip ratio have not previously been analysed, earlier Scottish trends are given in Hotchkiss et al 2012. 13 Trends in diabetes have been covered in several HSE reports, e.g. Moody 2012, 14 as has BMI (see particularly the paper by Sperrin et al 2014, 24 who also created a publicly-available time-series HSE dataset for this purpose).

Activity limitations, pain & musculoskeletal morbidity
While musculoskeletal LSIs have not previously been analysed in HSE, a decline can also be seen in the General Household Survey. 28

Mental health
In the UK and most other high-income countries, benefit claims due to mental ill-health have been rising, 29 alongside considerable increases in mental health diagnosis and treatment. 30 The extent to which this reflects rises in mental ill-health and genuinely declining work capacity, however, has long been debated. 31 32 Perhaps the most robust long-term general population data series in the UK is the Adult Psychiatric Morbidity Survey. 30 33 While some studies have used HSE to show rises in mental ill-health, others have used the same data to reach the opposite conclusion. 34 35 These contrasting conclusions are explained by Web Appendix 5 below: moderate mental ill-health fell between the mid-1990s and the mid-2000s, before rising in 2009, with a particularly high prevalence in 2011. The conclusions of studies will therefore depend on the years they use as the start and end periods of their trend analysis. 5 It is also worth noting that the considerable increases in mental health LSIs that we find can also be seen in a similar measure in the Labour Force Survey. 36 37

5 The main reason why 'moderate anxiety/depression today' does not show a decline in 2011-14 compared with 1994-6 is a single very high reported prevalence in 2011, which had fallen by 2012 and 2014. The alternative measure ('psychiatric morbidity symptoms') was not asked in 2011.

It has been suggested that multimorbidity has risen among older people in England 39 and for all age groups in Ontario, 40 although others have cautioned against using simple disease counts, 41 and the evidence cited in the introduction of the main paper suggests that rising chronic disease reporting may partly reflect increasing awareness of disease rather than increasing underlying prevalence.

Mortality in general
Given debates about whether historic improvements in life expectancy are being sustained, 50 51 it is important to note that in the period under study in this paper, working-age life expectancy was increasing. This can be seen in data from the Human Mortality Database (May 2016 update) for 1993-2013, using one-year age groups and one-year periods. These data show that mortality has not increased for working-age people as a whole in any major country; for example, standardised working-age death rates declined by 23% in the US and 35% in the UK over 1993-2013.

Cause-specific mortality for the 0-64 population
The main text refers to cause-specific mortality in several places, namely the death rate among 0-64 year olds from cardiovascular disease (CVD), respiratory conditions, diabetes, and liver cirrhosis. These death rates refer to UK deaths within the relevant ICD-10 codes (I00-I99 for CVD, J00-J99 for respiratory conditions, E10-E14 for diabetes), standardised to the European standard population, and taken from the World Health Organization European Office's Health for All Database (May 2016 version), http://www.euro.who.int/en/data-and-evidence/databases/european-health-for-all-databasehfa-db.
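Standardising to a reference population means weighting each age band's crude death rate by that band's share of the standard population. The sketch below shows direct age standardisation and the percentage change between two years. The age bands, death counts, populations and standard weights are invented for illustration; they are not the European standard population or real UK data.

```python
from typing import Sequence


def direct_standardised_rate(deaths: Sequence[float], population: Sequence[float],
                             standard_pop: Sequence[float]) -> float:
    """Directly age-standardised death rate per 100,000.

    Each age band's crude rate (deaths / population) is weighted by that
    band's share of the standard population, so rates for different years
    or countries are compared on a common age structure.
    """
    if not len(deaths) == len(population) == len(standard_pop):
        raise ValueError("need one value per age band in every input")
    total_standard = sum(standard_pop)
    rate = sum(
        (d / p) * (s / total_standard)
        for d, p, s in zip(deaths, population, standard_pop)
    )
    return rate * 100_000


def percent_change(old: float, new: float) -> float:
    """Percentage change from `old` to `new` (negative values are declines)."""
    return 100 * (new - old) / old


# Made-up figures for two age bands in two years (not real UK data).
standard = [60_000, 40_000]
rate_1993 = direct_standardised_rate([300, 900], [1_000_000, 800_000], standard)
rate_2013 = direct_standardised_rate([200, 600], [1_100_000, 900_000], standard)
print(round(percent_change(rate_1993, rate_2013), 1))
```

Holding the age weights fixed means that a fall in the standardised rate reflects falling age-specific mortality, not a younger population.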

Competing interests
The author has worked on secondment at the UK Department for Work and Pensions (DWP) in 2015-16.


Strengths and limitations of this study
• We provide a robust analysis of changes over time in morbidity in England for 39 measures across two decades using the Health Survey for England ('HSE').
• We include every morbidity measure for which consistent comparisons over time can be constructed in the HSE.
• We take care to maximise comparability over time, including constructing new non-response weights.
• However, response rates for each stage of the HSE have declined over time, and it is impossible to rule out changing non-response biases.
• There are also several dimensions of morbidity for which there is little trend data in HSE.

trends based on self-reports of general health, 6-8 which we know are unreliable. 9 10 The lack of evidence is even more problematic in the case of social security, where many policymakers have assumed that working-age morbidity must have improved in recent decades given improvements in mortality (despite the potential for declining mortality to coexist with rising morbidity), 6 and that high/rising levels of claims are therefore not 'genuine'. 11 12 Almost the only direct evidence on changes over time in working-age morbidity in high-income countries comes from the US. Contrary to policymaker expectations, these studies have generally found deteriorating morbidity since the mid-1990s, particularly in activities of daily living (ADLs) and physical functioning. [13][14][15][16] Other studies have focused on the older working-age population, with similar results. 2

DATA AND METHODS
This section follows the STROBE reporting guidelines for cross-sectional studies. 23

Data source
Robust evidence of change over time requires a consistently collected, high-quality data source. We use the HSE, an annual government-sponsored cross-sectional survey of 3,000-11,000 adults with no proxy responses. 24 A particular advantage is that the initial interview is followed by a nurse visit, which in selected years has also included a blood sample. Nevertheless, analysing change in HSE is more complex than it might appear:
• Firstly, HSE was run by the Government Office of Population Censuses and Surveys in 1991-3, before moving to NatCen in 1994. We focus on 1994-2014 data given evidence of a discontinuity at this point across multiple variables.

Patient involvement
As this is a health monitoring (rather than intervention) study using all available secondary data, patients were not directly involved. However, from previous discussions we are aware that the study will be of interest to patient/disability advocacy groups, who will receive jargon-free summaries of the research.

Measures
We cannot interpret changes over time correctly without understanding the different ways of operationalising 'morbidity'. 1 General health/disability measures (e.g. "How is your health in general?") are the best conceptual match to our research question, and clearly do capture something meaningful in practice. 27 However, their generality means they suffer from what is variously conceptualised as 'response shift' 28 or 'differential item functioning' 29; that is, for any given question, different people (or even the same people at different times) report their general health/disability on different scales. Numerous factors contribute to this, ranging from the experience of ill-health itself 28 to non-health factors such as social security incentives, 30 gendered and age-related expectations, and medicalisation. 31 These inconsistent response scales mean that general health/disability measures are inadequate for answering our question: trends in such measures can differ wildly between different surveys covering nominally the same concept and population, e.g. for disability in England 9 or self-rated health in the US. 10
should be based on multiple measures of specific activity limitations, rather than single questions about general participation restrictions. 32 Our systematic search found 39 specific morbidity measures that are comparable over time: these are summarised in Table 1, with further details in Web Appendix 5.
no guarantee that a given question will be interpreted identically over time, 34 35 they seem substantially less likely to be affected by changing medical practice

ANALYSIS
In the first instance we look at unadjusted changes over time in each morbidity indicator, showing the actual levels of morbidity found in the population. However, we primarily focus on changes after adjustment for sex and age (following others 37 38 ), akin to standardising for the age-sex composition of the population. Given that our aim is to describe changes rather than to explain them, we do not further adjust for potential causal influences on morbidity that are likely to vary over the period, such as employment over economic cycles. This is a task for future research, but we should note that such analysis is possible using our publicly-available time-series dataset that includes inter alia employment status, education and region.
We chose to examine discrete changes from the start to the end of the available data for each measure, rather than using linear or non-linear trend terms. Given our aim of informing policy debates, this has three advantages: a discrete change is simple to interpret; it is compatible with the different start/end years available for different measures; and it does not require any assumptions about the functional form of trends (linear trends are particularly unlikely given the role of non-linear economic cycles). Individual survey years are grouped into 3-4 year periods to increase sample size and precision, but single-year prevalence is given in Web Appendix 7. Given our binary outcome measures, we use logistic regression models of each outcome on survey period, adjusting for sex and age. To avoid a binary cut-off of statistical significance, 40 we use 95% confidence intervals to convey precision. All analyses use weights, exclude boost samples that use different sampling methods, and adjust for the multistage clustered sample design and the stratification of the sample across survey years using the svyset command in Stata (although standard errors will be slightly underestimated, as it is not possible to consistently adjust for sample stratification within years). For reasons of space, previous HSE studies of aspects of morbidity are not discussed in the main text; they are instead described in Web Appendix 8.
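The adjustment step can be illustrated with a minimal, dependency-free sketch. It fits a weighted logistic regression of a binary outcome on an end-period indicator plus a covariate (sex here; age terms would enter the same way) by Newton-Raphson. It then computes an adjusted change as the weighted average difference in predicted prevalence with the period indicator set to "end" versus "start". This is an illustration under stated assumptions, not the paper's Stata code: it ignores clustering and stratification (which svyset handles), the exact age specification is not given here, and the covariate set and data are hypothetical.

```python
import math
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_weighted_logit(X: Sequence[Sequence[float]], y: Sequence[float],
                       w: Sequence[float], iterations: int = 25) -> List[float]:
    """Weighted logistic regression by Newton-Raphson; each row of X starts with 1."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # X' W X, the information matrix
        for xi, yi, wi in zip(X, y, w):
            p = _sigmoid(sum(b * v for b, v in zip(beta, xi)))
            for j in range(k):
                grad[j] += wi * (yi - p) * xi[j]
                for c in range(k):
                    info[j][c] += wi * p * (1 - p) * xi[j] * xi[c]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def adjusted_change(beta, X, w, period_col: int) -> float:
    """Weighted mean of P(y=1 | end) - P(y=1 | start), other covariates as observed."""
    total = 0.0
    for xi, wi in zip(X, w):
        end, start = list(xi), list(xi)
        end[period_col], start[period_col] = 1.0, 0.0
        p_end = _sigmoid(sum(b * v for b, v in zip(beta, end)))
        p_start = _sigmoid(sum(b * v for b, v in zip(beta, start)))
        total += wi * (p_end - p_start)
    return total / sum(w)


# Hypothetical data: prevalence is 10% for men and 30% for women in BOTH periods,
# but the sample shifts from mostly men (start) to mostly women (end).
X, y = [], []
for period, n_men, n_women in ((0, 80, 20), (1, 20, 80)):
    for female, n, prevalence in ((0, n_men, 0.1), (1, n_women, 0.3)):
        cases = round(n * prevalence)
        for i in range(n):
            X.append([1.0, float(period), float(female)])  # intercept, end period, female
            y.append(1.0 if i < cases else 0.0)
w = [1.0] * len(y)

beta = fit_weighted_logit(X, y, w)
n_end = sum(1 for xi in X if xi[1] == 1.0)
crude = (sum(yi for xi, yi in zip(X, y) if xi[1] == 1.0) / n_end
         - sum(yi for xi, yi in zip(X, y) if xi[1] == 0.0) / (len(X) - n_end))
adj = adjusted_change(beta, X, w, period_col=1)
print(f"crude change: {crude:+.3f}; sex-adjusted change: {adj:+.3f}")
```

In this toy example, the crude prevalence rises by 12 percentage points purely because of the changing sex mix, while the adjusted change is essentially zero. This is the kind of compositional effect the age-sex adjustment is meant to remove.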

Conditions with sharply declining mortality
We start by focussing on cardiovascular disease (CVD) and respiratory illness, which have both seen large falls in mortality (by >50% and >25% respectively among 0-64 year olds 1994-2013; Web Appendix 1). Changes over time in morbidity, however, are shown in Table 2. Looking first at high blood pressure, biomarker-measured high blood pressure has halved over two decades (similar improvements are found for the biomarkers for total and HDL cholesterol). Yet when we look at self-reports (either people reporting this as an LSI, or in response to a direct question about recent doctor-diagnosed high blood pressure), we see large rises over time. There has been increasing diagnosis of high blood pressure and increasing prescription of blood pressure-lowering drugs; these may have helped reduce the underlying incidence of high blood pressure while simultaneously raising people's awareness of morbidity. Table 2 further shows declines in several key types of CVD (heart attack, mini-stroke, angina), whether measured through people's reports of the disease itself or their reports of its symptoms. Nevertheless, the morbidity declines (8-50%) are often not on the scale of the declines in mortality (>50%); this is likely to be because mortality declines are partly driven by improved treatment, 41 which means each incident CVD case is likely to last longer. 42 43 More surprisingly, the measures of 'any reported CVD' show no improvement (with some uncertain signs of rises). Looking at its sub-components (Web Appendix 6), this seems to be due to possible increases in diagnosed irregular heart rhythm and other heart trouble.
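The point that improved treatment can keep morbidity up even as mortality falls can be made explicit with the standard steady-state approximation relating prevalence P to incidence I and mean case duration D. This is a textbook epidemiological rule of thumb for a stable, low-prevalence condition, not a model estimated in this paper:

```latex
P \approx I \times D
\qquad\Longrightarrow\qquad
\frac{P_{\text{end}}}{P_{\text{start}}} \approx
\frac{I_{\text{end}}}{I_{\text{start}}} \times \frac{D_{\text{end}}}{D_{\text{start}}}
```

As a purely hypothetical illustration, if incidence halves while mean duration rises by 50% because fewer cases end in death, prevalence falls by only a quarter: 0.5 × 1.5 = 0.75.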
Finally, Table 2 shows that symptoms-based measures of respiratory morbidity have improved, particularly COPD symptoms (regular cough and phlegm) and breathlessness (at both levels), and more uncertainly for recent wheezing/asthma and wheezing stopping sleep. Again, though, diagnosis-related measures of asthma (reported diagnoses, or self-reports of having asthma as a longstanding illness) have risen, even while underlying symptomatology is improving.
Overall, Table 2
prevalence of self-reported CVD conditions such as heart attacks has declined by a smaller amount, and recent doctor-diagnosed hypertension, any CVD, and asthma diagnoses have either stayed stable or risen.

Conditions with claims of increasing prevalence
The previous section focussed on conditions where there may be an a priori expectation that morbidity has improved (given declining mortality); in this section, we focus on three areas where there have been widespread claims of increasing prevalence -obesity, diabetes, and mental health.

Other measures
Changes over time in other measures (for which we have no clear a priori expectations) are shown in Table 5 below. This includes four biomarkers that are more difficult to compare directly to self-reports:
-Changes over time are available for two biomarkers of inflammation (C-reactive protein ('CRP') and fibrinogen). These are associated with a number of conditions including heart disease, diabetes, cancer 50 and, in the case of CRP, even depression. 51 Table 5 shows that both biomarkers indicate rising morbidity from 1997-2000 to 2008-10 (although for CRP the confidence interval is wide, and there is a non-negligible possibility that the change is negative).
-The two other biomarkers available in HSE are clearly focussed on anaemia and iron deficiency. Table 5 shows that both of these have declined, with particularly clear evidence for a decline in iron deficiency.

DISCUSSION
Despite considerable evidence on morbidity trends among older people, there are few published studies on changes in morbidity among the working-age population, particularly outside the USA. In this paper, we have analysed changes over time in working-age morbidity in England 1994-2014 using a high-quality repeated cross-sectional study. We see improvements in cardiovascular morbidity, respiratory morbidity and anaemia, but deteriorating obesity, diabetes, some biomarkers (fibrinogen and possibly also CRP) and feelings of extreme anxiety/depression. We see little systematic change over time in more common mental ill-health or musculoskeletal conditions, pain/mobility, and self-care limitations. We should also stress that symptomatology and chronic disease diagnoses often go in different directions.

Our analysis has several strengths. We include every morbidity measure for which consistent changes over time can be constructed, including chronic disease, functioning and symptomatology, and biomarkers. We use a single survey series collected by a single survey organisation, exclude under-25s (for whom comparability of survey coverage is unlikely), and construct new non-response weights.
Nevertheless, we must note three limitations. Firstly, response rates for each stage of the HSE have declined over time (see Web Appendix 3), and while we create new non-response weights covering the entire period, it is still possible that socioeconomically disadvantaged people (within any age-sex-region group) have become less likely to respond; as they tend to be in worse health, this could mask deteriorating morbidity. It is impossible to rule out changing non-response biases, but there is no sign that they have occurred; for example, trends in education are similar in HSE and in the gold-standard measure of qualification trends, the Labour Force Survey. 9 Secondly, even if non-response biases have not changed, it is possible that people respond differently over time even to identical questions. Thirdly, there are several dimensions of morbidity for which there is little comparable data in HSE. These include several areas in which morbidity among the working-age population seems to be rising, including inter alia cognitive complaints, 52 allergic disorders, 53 and liver cirrhosis (see Web Appendix 1), as well as some areas in which morbidity seems likely to have fallen, such as chronic kidney disease. 54

For policymakers, this leaves the question of whether working-age morbidity as a whole is likely to have been getting better or worse in England (at least for those who believe that health states can be put on a unidimensional scale). While it is not possible to create a single morbidity index here, Web Appendix 9 shows the association of each measure with bad general self-rated health (net of age, gender and education). This shows little systematic trend towards falling morbidity in the measures that predict health most strongly (indeed, the evidence weakly points in the other direction, towards rising morbidity). Certainly there is no evidence that working-age morbidity as a whole has declined over the past twenty years in England despite falling mortality.
This mirrors both evidence from the Global Burden of Disease study for the UK (see Web Appendix 9) and more detailed analyses available for the US. [13][14][15][16] In conclusion, despite considerable falls in working-age mortality and gains in life expectancy (and the ensuing expectations of social security policymakers for improving morbidity), there is no evidence of systematic improvement in overall working-age morbidity in England from 1994 to 2014. However, two pieces of further research could strengthen this evidence base. Firstly, the ideal measures for analysing changes in morbidity are functional limitation measures, which are included in the HSE from 1996. However, these were last asked of the working-age population in 2001, and it is a priority to repeat these measures in future years of HSE. Secondly, there is a surprising paucity of studies looking at the changing morbidity of the working-age population outside the US. Given their importance in public debate, particularly in discussions of retirement ages and disability benefits, we hope that other authors will repeat and extend our analyses here, including


Appendix 2: Overall missingness in health measures
This appendix refers to overall item-level missingness; changing item-and unit-level missingness is covered in Appendix 3.

Interview measures
For those who took part in the initial face-to-face interview, the level of item missingness is shown below (including only those years in which each question was asked). This shows that item missingness is generally very low: only 1 of the 30 measures has item missingness greater than 1%.
-Difficult to take measurement: other respondents (between 3.8% and 6.1% depending on the year) have no valid BMI measurement because height or weight measures were not attempted, or were attempted but not obtained or usable, because the respondent was pregnant, or because the respondent was too sick or unsteady.
-Refusal: the most common reason for no BMI measurement is an outright refusal (including those refusing out of anxiety, though this tends to be a minor reason). Refusal rates are 8.3% in 2014.
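Item missingness here is simply the share of eligible respondents (in years when the question was asked) with no valid answer. Below is a minimal sketch of that calculation and of flagging measures above the 1% threshold mentioned above. The measure names and data are hypothetical, and missing values are assumed to be stored as None.

```python
from typing import Dict, List, Optional, Sequence


def item_missingness(values_by_measure: Dict[str, Sequence[Optional[float]]]) -> Dict[str, float]:
    """Percentage of missing (None) values for each measure.

    Pass only respondents from years in which each question was asked,
    so that 'not asked' is not counted as missing.
    """
    return {
        name: 100 * sum(v is None for v in values) / len(values)
        for name, values in values_by_measure.items()
        if len(values) > 0
    }


def measures_above(pct_missing: Dict[str, float], threshold: float = 1.0) -> List[str]:
    """Names of measures whose item missingness exceeds `threshold` percent."""
    return sorted(name for name, pct in pct_missing.items() if pct > threshold)


# Hypothetical example: two measures, one with 2 missing values out of 100.
example = {
    "bmi": [25.0] * 98 + [None, None],
    "longstanding_illness": [1.0] * 100,
}
pct = item_missingness(example)
print(pct, measures_above(pct))
```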

Self-completion measures
For those who completed the self-completion booklet, the level of item missingness is shown in the table below. Item missingness is relatively low compared to missingness from not completing the self-completion survey (51.5% of respondents in 2014).

Nurse visit measures
For those who took part in the nurse visit, the level of item missingness is shown in the table below. This shows that far more people have missing observations for measured high blood pressure than for their waist-hip ratio. This is despite the fact that we explicitly INCLUDE those who are on blood pressure-lowering drugs (about 5% of the sample at the start of the period and 10% at the end), on the grounds that their lowered blood pressure still conveys useful information about their health state. The main reason for the remaining high level of missingness is that people had recently exercised, smoked, drunk or eaten (12.2%).

Blood sample measures
For those from whom a blood sample was taken, the level of item missingness is shown in the table below. All of these measures are affected by problems in transferring and storing the blood sample and in the measurement process, which affect 3-10% of blood samples depending on the measure and year. As for blood pressure, we explicitly INCLUDE those who are on lipid-lowering drugs (from 0.4% in 1994 to 7.9% in 2014), on the grounds that their changed cholesterol level still conveys useful information about their health state. Item missingness is highest for fibrinogen, which not only has high rates of such failures (7.0-9.5%), but also excludes those likely to have an infection (indicated by raised CRP; 3.6-5.6% of those with blood samples) and those taking drugs that affect the reading (3.7% to 7.7% depending on the year). Item missingness is also high for C-reactive protein (CRP), which also excludes those with likely infections.

Dealing with item-level missingness
Because of the high level of item non-response for certain measures (BMI, high blood pressure, fibrinogen and CRP) and the moderate level for others (other blood sample biomarkers and waist-hip ratio), and because of evidence of changing non-response at various stages of the survey process, non-response weights were created to try to correct for any biases that these introduce. This is described in further detail in Appendix 3.

Appendix 3: Changing non-response & weights
This appendix focuses on changes in unit-level non-response at different stages of HSE.

Sample frame coverage
As noted in the main paper, HSE is a household sample that excludes those in communal establishments. If we combine data from the 1991, 2001 and 2011 Censuses, 1 the communal population is as follows:
This shows two things. Firstly, there was a sharp rise in the working-age population in communal establishments 1991-2001 (from 230k to 560k), which was concentrated (>90% of the rise) among education-related communal establishments, although this is perhaps a slight overestimate given a definition change in the Census data. 2 Secondly, looking at education-related communal establishments in 2011, these are overwhelmingly (>90%) among 16-24 year olds. It therefore seems likely that the exclusion of communal establishments in HSE will lead to biases in young adults, and we therefore exclude 16-24 year olds from the trend analyses.

Changing unit non-response within the sample frame
As noted in the main paper, HSE supplies non-response weights from 2003, including adjustments for non-response to the nurse visit and blood sample using health and socioeconomic status from the initial interview. However, there had been a substantial decline in response rates prior to 2003, as shown in the table below:
In general these trends are due to increases in refusal rates. However, the blood sample response rate is affected by two noticeable changes in eligibility over this period (people who were pregnant or who had blood/clotting disorders were ineligible throughout):
1. In 1998, people who had ever had an epileptic fit were excluded from the blood sample. This raised the ineligibility rate to 3.5% of the sample in 1998, from 0.6% in 1994.
2. In 2010, this was then relaxed so that those who had had an epileptic fit more than 5 years ago were again included in the blood sample. This lowered the ineligibility rate from 3.1% in 2009 to 2.4% in 2010.

Changing item non-response within responding people
There are also changes over time in item non-response (further detail on overall item non-response is given in Appendix 2). These include:
-BMI: there has been little systematic trend in one reason for the absence of a BMI measure (difficulty in taking BMI measurements). However, there are trends in other reasons:
  o Refusal: in line with the general participation rates at each stage of the interview above, BMI refusal rates rose sharply from 1.9% in 1994 to a peak of 11.5% in 2011, and remain at 8.3% in the 2014 data.
-Psychological distress: similarly to wider participation rates at each stage of the survey, item missingness within the self-completion survey does increase over time (e.g. for psychological distress symptoms, from 1.8% 1994 to 5.9% 2014).
-Measured high blood pressure: there was a noticeable rise over time in exclusion of high blood pressure measures on the grounds that people recently exercised, smoked, drank or ate (from 6.1% to 13.6%).

Creating non-response weights
To increase comparability over time, we create new weights 1994-2014 in several phases.

First-stage non-response weights
Firstly, we created a selection weight because some households were slightly more likely to be interviewed than others. (Until 2009, only three households at each address were interviewed, so those living at addresses with many households were less likely to be interviewed.) To enable this, NatCen supplied selection weights for 2004-2013 (funded by this project); these are not available on the public HSE datasets.
Secondly, after adjusting for the selection weight, we created new individual-level (inverse probability) weights to match population age-sex-region totals in each year. Population data are annual mid-year population estimates from nomis. NatCen added the region variable for the 1994-1997 datasets to the public HSE datasets to enable this.
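The two first-stage steps can be sketched as follows. This assumes (1) that households at an address are selected with equal probability, so a household's selection weight is the number of households divided by three when there are more than three, and (2) that matching age-sex-region totals is done by simple post-stratification. Both are assumptions for illustration; the paper's exact procedure may differ (for example, it could use raking), and all cells and totals below are invented.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def selection_weight(households_at_address: int, max_interviewed: int = 3) -> float:
    """Inverse within-address selection probability if at most `max_interviewed` households are taken."""
    return max(1.0, households_at_address / max_interviewed)


def poststratify(cells: Sequence[Hashable], base_weights: Sequence[float],
                 population_totals: Dict[Hashable, float]) -> List[float]:
    """Scale base weights so that weighted counts match population totals in every cell."""
    weighted_n: Dict[Hashable, float] = defaultdict(float)
    for cell, w in zip(cells, base_weights):
        weighted_n[cell] += w
    return [w * population_totals[cell] / weighted_n[cell]
            for cell, w in zip(cells, base_weights)]


# Hypothetical respondents: (age group, sex, region) cells and households at their address.
cells = [("25-34", "F", "London"), ("25-34", "F", "London"), ("55-64", "M", "North West")]
households = [1, 6, 2]
base = [selection_weight(h) for h in households]
totals = {("25-34", "F", "London"): 300.0, ("55-64", "M", "North West"): 50.0}
weights = poststratify(cells, base, totals)
print(weights)  # [100.0, 200.0, 50.0]
```

Within each cell, the selection weights keep their relative sizes, so the respondent from the six-household address still counts twice as much after calibration.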

Second-stage non-response weights
After the first-stage adjustment for individual non-response, for the later stages of the interview (self-completion, BMI measurement, nurse visit, blood sample), we created a further weight that adjusts for non-response among those responding to the individual interview. This is based on a logit regression model to predict that stage of response based on: • Age and gender (4 age group categories interacted with gender); • Qualifications (degree or FT student / A-level or above / other qualifications / no qualifications); • Household type (presence of other adults in the household); • Employment status (yes/no); • Smoking (never regular smoker / ex-regular smoker / current regular smoker); and • Self-reported general health (bad or very bad health vs. other categories).
On the basis of these criteria, we create inverse probability weights -that is, we create a predicted probability of response for each respondent based on the logit regression model, and then create a weight that is the inverse of this predicted probability. The revised weights are included in the Stata code to enable replication of the full paper.
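The inverse probability step can be sketched as dividing each later-stage respondent's first-stage weight by their predicted probability of responding, with non-respondents receiving zero weight. For simplicity, this sketch predicts response with weighted response rates in fully crossed covariate cells (these are the fitted probabilities of a saturated logit). The paper's main-effects logit over age/gender, qualifications, household type, employment, smoking and general health will generally give somewhat different probabilities. The cell labels below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def cell_response_probability(cells: Sequence[Hashable], responded: Sequence[bool],
                              weights: Sequence[float]) -> Dict[Hashable, float]:
    """Weighted response rate in each covariate cell (fitted values of a saturated logit)."""
    num: Dict[Hashable, float] = defaultdict(float)
    den: Dict[Hashable, float] = defaultdict(float)
    for cell, r, w in zip(cells, responded, weights):
        den[cell] += w
        if r:
            num[cell] += w
    return {cell: num[cell] / den[cell] for cell in den}


def second_stage_weights(cells: Sequence[Hashable], responded: Sequence[bool],
                         first_stage_weights: Sequence[float]) -> List[float]:
    """First-stage weight divided by predicted response probability; 0 for non-respondents."""
    p = cell_response_probability(cells, responded, first_stage_weights)
    return [w / p[c] if r else 0.0
            for c, r, w in zip(cells, responded, first_stage_weights)]


# Hypothetical example: half of those in bad health skip the nurse visit.
cells = ["bad_health", "bad_health", "good_health", "good_health"]
responded = [True, False, True, True]
w2 = second_stage_weights(cells, responded, [1.0, 1.0, 1.0, 1.0])
print(w2)  # [2.0, 0.0, 1.0, 1.0]
```

The responding person in bad health now also stands in for the non-respondent, so the weighted total of later-stage respondents matches the first-stage total.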

Final sample size
The final sample size is as follows:

Trends for these measures are shown in Table 9 below. Looking first at good general health, the table shows the trend from 1994-6, when 80.9% reported good general health. By 2011-14, there had been a decline of 0.8 percentage points. When we adjust for the changing age and sex distribution of the working-age population (labelled 'Adj.' in Table 1), the decline is only 0.1 percentage points, with a wide confidence interval (-0.9 to +0.7 percentage points), and there is therefore little evidence for any systematic trend. For several of the general health measures, there is evidence of change over this period, but interpreting this is difficult because the trends are in opposite directions. There is strong evidence for a rise in bad general health (a rise of 0.6-1.5% from a base of 4.4%), yet equally strong evidence for a decline in having problems with everyday activities (at both levels of severity) and in being limited in activities by a longstanding illness. This shows the challenges in tracking population morbidity change through general, non-specific measures, which are likely to be influenced as much by changes in reporting styles as by changes in morbidity per se.
As an aside, UK Government publications have made claims based on healthy/disability-free life expectancy, sometimes using these to argue that morbidity has been improving 3, but more recently to argue that morbidity has been deteriorating. [4][5][6] However, these trends are potentially misleading: they include older people as well as the working-age population; they confuse a

Short summaries of the resulting 39 measures are given in this paper, and full details are given in the table below. Measures are taken from the initial face-to-face survey unless otherwise specified. The Stata code to create these variables in consistent form from the publicly available HSE files is available from OSF 7 and www.benbgeiger.co.uk.
-"I am confined to bed" [This is part of the widely-used EQ-5D health status indicator 8. However, for the purposes of this paper we have separated the individual measures that make up the EQ-5D in order to compare these to similar indicators of morbidity within each domain]. People are classified as having a problem with walking about today if they had some problems walking about or were confined to bed.

Locomotor limitation
This is based on the personal care disability scale used in the 2001 HSE report 9. Respondents in 1995, 2000 and 2001 were asked if any of the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year):

Activity limitations and MSDs
-"Cannot walk 200 yards or more on own without stopping or discomfort". People who reported a limitation were asked if they used a walking aid, and if they did, were then asked if they could walk 200 yards without the walking aid.
-"I am unable to wash or dress myself" [This is part of the widely-used EQ-5D health status indicator 8. However, for the purposes of this paper we have separated the individual measures that make up the EQ-5D in order to compare these to similar indicators of morbidity within each domain]. People are classified as having a problem with self-care today if they had some problems washing/dressing or were unable to wash/dress themselves.

Self-care limitation
This is based on the personal care disability scale used in the 2001 HSE report 9 . Respondents in 1995, 2000 and 2001 were asked if any of the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year):
-"Cannot get in and out of bed on own without difficulty"
-"Cannot get in and out of a chair without difficulty"
-"Cannot dress and undress without difficulty"
-"Cannot wash hands and face without difficulty"
-"Cannot feed, including cutting up food without difficulty"
-"Cannot get to and use toilet on own without difficulty"
People are classified as having a self-care limitation if they reported ANY of these limitations.

Pain
-"I have no pain or discomfort"
-"I have moderate pain or discomfort"
-"I have extreme pain or discomfort"
[This is part of the widely-used EQ-5D health status indicator 8 . However, for the purposes of this paper we have separated the individual measures that make up the EQ-5D in order to compare these to similar indicators of morbidity within each domain]. Two outcome measures are based on this: whether people have any pain (the 2nd and 3rd categories combined), and whether they have extreme pain (3rd category only).

[…] disease; Sever's disease; Spondylitis, spondylosis; Stiff joints, joint pains, contraction of sinews, muscle wastage; Strained leg muscles, pain in thigh muscles; Systemic sclerosis, myotonia (nes); Tenosynovitis; Torn muscle in leg, torn ligaments, tendonitis; Walk with limp as a result of polio, polio (nes), after-effects of polio (nes); Weak legs, leg trouble, pain in legs; and Worn discs in spine - affects legs. The code explicitly excludes: Damage/injury to spine results in paralysis; Sciatica or trapped nerve in spine; and Muscular dystrophy.
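The two binary pain outcomes (and the parallel 'any' and 'extreme' outcomes derived from other EQ-5D items in this appendix) amount to a simple recoding of a three-level item. A minimal Python sketch, assuming responses are stored as 1 (no problem), 2 (some/moderate problem) and 3 (extreme problem/unable); the function name is hypothetical and this is not the authors' Stata code:

```python
from typing import Optional, Tuple


def eq5d_binaries(level: Optional[int]) -> Tuple[Optional[bool], Optional[bool]]:
    """Collapse a three-level EQ-5D item into (any_problem, extreme_problem).

    Assumed (hypothetical) storage format: 1 = no problem,
    2 = some/moderate problem, 3 = extreme problem or unable.
    """
    if level not in (1, 2, 3):
        # Missing or out-of-range responses stay missing rather than becoming False.
        return None, None
    any_problem = level >= 2      # 2nd and 3rd categories combined
    extreme_problem = level == 3  # 3rd category only
    return any_problem, extreme_problem


# Example: "I have moderate pain or discomfort" (level 2)
print(eq5d_binaries(2))  # (True, False)
```

Keeping missing answers as `None` means respondents who skipped the item drop out of the denominator instead of being counted as problem-free.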

Circulatory
High blood pressure LSI
Every year 1994-2011, people who report a longstanding illness (LSI) are then asked, 'what is the matter with you?'; up to 6 responses are then coded by the interviewer into a consistent coding frame based on the International Classification of Diseases. The high blood pressure LSI measure is based on the group labelled 'Hypertension/high blood pressure/blood pressure (nes)', which as of 2011 includes only the conditions listed in the group label.

Recent high blood pressure
Respondents in 1994, 1998, 2003, 2006 and 2009-2014 were asked a series of questions on whether they have high blood pressure:
[…]
-Finally, those with doctor-diagnosed high blood pressure (excluding those diagnosed only when pregnant) were asked: "Are you currently taking any medicines, tablets or pills for high blood pressure?", and those saying 'no' (or not giving an answer) were then asked, "Do you still have high blood pressure?"
People were considered to have recent high blood pressure if they said they had ever been diagnosed as having high blood pressure by a doctor (excluding when pregnant), and that they still have high blood pressure or are currently taking medicines for it. While the question wording has stayed consistent, a discontinuity seems to be introduced by a change in question context. In some years (1994, 1998, 2003, 2006 and 2011), this question was preceded by a question that asked, "May I just check, have you ever had your blood pressure measured by a doctor or nurse?" (those saying yes were then asked how recently this was, and whether they were told that it was 'normal (alright/fine), higher than normal, lower than normal, or were you not told anything?'). In the other years, this preceding question was not asked. Given the way in which context can affect question interpretation, we treat these as two separate measures of recent high blood pressure.
Biomarker high blood pressure
During the nurse visit (which took place for all consenting respondents in all years except 1999, 2002 and 2004, when the nurse visit focussed on particular subsamples), respondents' blood pressure was measured. High blood pressure is defined as a systolic blood pressure >= 140mmHg and diastolic blood pressure >= 90mmHg following HSE established practice, in turn following 10 . The measurement of blood pressure changed in 2003, from a Dinamap monitor to an Omron monitor. A conversion between the two monitors is available from a calibration study, and this has been regularly used by the HSE team to produce continuous trends in blood pressure (see www.hscic.gov.uk/catalogue/PUB00480). For adults, the conversion is as follows: […] blood pressure measurement; these are discussed in Web Appendices 2 and 3.

High cholesterol
In the years 1994, 1998, 2006, and 2008-14, blood samples were obtained during the nurse visit, which were then analysed for total cholesterol. A high level of total cholesterol ('hypercholesterolaemia') is an established risk factor for CVD, and high cholesterol is defined following conventional practice at the NICE guidance 'audit level' of 5mmol/L or above 11 12 .
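The total cholesterol threshold above, together with the 2010 laboratory adjustments for total and HDL cholesterol set out in the paragraphs that follow, can be sketched in Python as below. This is a minimal illustration, not the authors' Stata code: the function names are hypothetical, the adjustment is applied to every value from 2010 onwards (our reading of 'later values'), and values are rounded to two decimal places only so that floating-point error cannot nudge a value across a cut-off.

```python
from typing import Tuple

LAB_CHANGE_YEAR = 2010  # new laboratory used from this year
TOTAL_SHIFT = 0.1       # new lab read total cholesterol ~0.1 mmol/L higher
HDL_SHIFT = -0.1        # new lab read HDL cholesterol ~0.1 mmol/L lower
HIGH_TOTAL = 5.0        # high total cholesterol: 5 mmol/L or above
LOW_HDL = 1.0           # low HDL cholesterol: 1 mmol/L or less


def harmonise(year: int, total: float, hdl: float) -> Tuple[float, float]:
    """Remove the average laboratory shift from post-change values (mmol/L)."""
    if year >= LAB_CHANGE_YEAR:
        total -= TOTAL_SHIFT  # undo the upward shift
        hdl -= HDL_SHIFT      # undo the downward shift (adds 0.1)
    # Assumed rounding step, guarding against e.g. 5.1 - 0.1 == 4.999999999999999
    return round(total, 2), round(hdl, 2)


def cholesterol_flags(year: int, total: float, hdl: float) -> Tuple[bool, bool]:
    """Return (high_total_cholesterol, low_hdl_cholesterol) after harmonisation."""
    total_adj, hdl_adj = harmonise(year, total, hdl)
    return total_adj >= HIGH_TOTAL, hdl_adj <= LOW_HDL


print(cholesterol_flags(2012, 5.1, 0.9))  # (True, True)
```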
The measurement of cholesterol changed slightly in 2010 when a new laboratory was used. This resulted in values that are an average of 0.1mmol/L higher, and later values are therefore adjusted by this amount to maintain comparability over time as in 11 .

Low HDL cholesterol
In the years 1994, 1998, 2006, and 2008-14, blood samples were obtained during the nurse visit, which were then analysed for high density lipoprotein (HDL) cholesterol. HDL cholesterol reduces the risk of CVD (it carries cholesterol away from the arteries towards the liver), and it is therefore low HDL cholesterol that indicates poorer health; low HDL cholesterol is here defined as 1 mmol/L or less 11 12 . The measurement of HDL cholesterol changed slightly in 2010 when a new laboratory was used. This resulted in values that are an average of 0.1mmol/L lower, and later values are therefore adjusted by this amount to maintain comparability over time as in 11 .

Recent heart attack/stroke
Respondents in 1994, 1998, 2003, 2006 and 2011 were asked a series of questions on whether they have had a heart attack (within a battery of questions about different types of heart disease):
-"Have you ever had a heart attack (including myocardial infarction or coronary thrombosis)?"
-Those responding 'yes' were then asked "Were you told by a doctor that you had a Heart Attack (including myocardial infarction or coronary thrombosis)?"
-Those with a doctor-diagnosed heart attack were asked, "Have you had a heart attack (including myocardial infarction and coronary thrombosis) during the past 12 months?"
Respondents in these years were similarly asked about stroke:
-"Have you ever had a stroke?"
-Those responding 'yes' were then asked, "Were you told by a doctor that you had a stroke?"
-Those with doctor-diagnosed stroke were asked, "Have you had a stroke during the past 12 months?"
People were considered to have recent IHD or stroke if they said they had ever been diagnosed by a doctor as having had a stroke or a heart attack, and that they had had a heart attack or stroke during the past 12 months.

[…] different types of heart disease, including angina; heart attack (including myocardial infarction or coronary thrombosis); a heart murmur; abnormal heart rhythm; or other heart trouble. For EACH of these, they were asked:
-"Have you ever had <type of heart disease>?"
-Those responding 'yes' were then asked "You said that you had <type of heart disease>. Were you told by a doctor that you had <type of heart disease>?"
-For heart murmurs only, women saying they had doctor-diagnosed heart murmurs were asked if they were pregnant when told this, and if so, whether they were ever told they had a heart murmur when they were not pregnant.
-Those with doctor-diagnosed heart disease (excluding heart murmurs when pregnant) were asked, "Have you had <type of heart disease> during the past 12 months?"
People were considered to have recent CVD if they said they had a doctor-diagnosed heart condition and that they had had this during the past 12 months.

Cardiovascular (CVD) LSI
Every year 1994-2011, people who report a longstanding illness (LSI) are then asked, 'what is the matter with you?'; up to 6 responses are then coded by the interviewer into a consistent coding frame based on the International Classification of Diseases. The CVD LSI measure is based on the groups labelled 'Stroke/cerebral haemorrhage/cerebral thrombosis', 'Heart attack/angina', 'Hypertension/high blood pressure/blood pressure (nes)', 'Other heart problems', 'Piles/haemorrhoids incl. Varicose Veins in anus', 'Varicose veins/phlebitis in lower extremities', and 'Other blood vessels/embolic'. As of 2011 this includes: Aorta replacement; Aortic valve stenosis; Aortic/mitral valve regurgitation; Arterial thrombosis; Arteriosclerosis, hardening of arteries (nes); Artificial arteries (nes); Atrial Septal Defect (ASD); Blocked arteries in leg; Blood clots (nes); Cardiac asthma; Cardiac diffusion; Cardiac problems, heart trouble (nes); Cerebrovascular accident; Coronary thrombosis, myocardial infarction; Dizziness, giddiness, balance problems (nes); Hand Arm Vibration Syndrome (White Finger); Hardening of arteries in heart; Heart attack/angina; Heart disease, heart complaint; Heart failure; Heart murmur, palpitations; Hemiplegia, apoplexy, cerebral embolism; Hole in the heart; Hypersensitive to the cold; Hypertension/high blood pressure/blood pressure (nes); Intermittent claudication; Ischaemic heart disease; Low blood pressure/hypotension; Mitral valve stenosis; Pacemaker; Pains in chest (nes); Pericarditis; Piles/haemorrhoids incl. Varicose Veins in anus; Poor circulation; Pulmonary embolism; Raynaud's disease; St Vitus dance; Stroke victim - partially paralysed and speech difficulty; Stroke/cerebral haemorrhage/cerebral thrombosis; Swollen legs and feet; Tachycardia, sick sinus syndrome; Telangiectasia (nes); Thrombosis (nes); Tired heart; Valvular heart disease; Varicose veins in Oesophagus; Varicose veins/phlebitis in lower extremities; Various ulcers, varicose eczema; Weak heart because of rheumatic fever; Wolff-Parkinson-White syndrome; and Wright's syndrome. It explicitly excludes balance problems due to ear complaint and haemorrhage behind eye. While the LSI coding frame generally stays consistent over this period, interpretation of 'IHD LSI' is complicated by two changes: 'Too much cholesterol in blood' is included in this category in 1994 only, and Polyarteritis Nodosa is later moved into this code (the documentation is not clear on whether this occurred in 2000 or 2001).

Angina symptoms
This is taken from the Rose Angina questionnaire 13 14 . Respondents in 1994, 1998, 2003, 2006 and 2011 were asked a series of questions about symptoms of heart trouble (rather than whether they had been diagnosed):
[…]
o Those who said they stop or slow down were asked, "If you stand still does the pain go away or not?" (If respondents were unsure, they were asked, "What happens to the pain on most occasions?"). If the pain goes away, they were asked, "How soon does the pain go away? Does it go in 10 minutes or less, or more than 10 minutes?"
o Those who said the pain goes away in 10 minutes or less were asked, "Will you show me where you get this pain or discomfort? Where else?" The interviewer then coded the site as Sternum (upper or middle) | Sternum lower | Left anterior chest | Left arm | Right anterior chest | Right arm | (Somewhere else).
Following the HSE reports, possible angina is defined as chest pain or discomfort that (i) includes either the sternum or the left arm and left anterior chest; (ii) is prompted by hurrying or walking uphill (or by walking on the level, for those who never attempt more); (iii) makes the respondent either stop or slacken pace; and (iv) usually disappears in 10 minutes or less when they stand still.

Heart attack symptoms
This is taken from the Rose Angina questionnaire. Respondents in 1994, 1998, 2003, 2006 and 2011 were asked, "Have you ever had a severe pain across the front of your chest lasting for half an hour or more?" As in the 2006 HSE report, those responding 'yes' are treated as having a possible heart attack (myocardial infarction).
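The four-part definition of possible angina above can be written as a single boolean rule. The Python sketch below is illustrative only: the site labels and field names are hypothetical stand-ins for the HSE variables, and criterion (i) is read as 'the sternum, or both the left arm and the left anterior chest', which is our interpretation of the wording.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical labels for the interviewer's site codes listed in the text.
STERNUM = {"sternum_upper_middle", "sternum_lower"}
LEFT_ARM_AND_CHEST = {"left_arm", "left_anterior_chest"}


@dataclass
class RoseAnswers:
    chest_pain_on_exertion: bool  # (ii) hurrying/uphill, or level walking for those who never attempt more
    stops_or_slows: bool          # (iii) respondent stops or slackens pace
    goes_within_10_min: bool      # (iv) pain usually disappears in 10 minutes or less on standing still
    sites: Set[str] = field(default_factory=set)  # (i) sites shown to the interviewer


def possible_angina(a: RoseAnswers) -> bool:
    """True only if all four criteria (i)-(iv) hold."""
    site_ok = bool(a.sites & STERNUM) or LEFT_ARM_AND_CHEST <= a.sites
    return site_ok and a.chest_pain_on_exertion and a.stops_or_slows and a.goes_within_10_min


print(possible_angina(RoseAnswers(True, True, True, {"left_arm", "left_anterior_chest"})))  # True
```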

COPD symptoms
Respondents in 1995, 1996 and 2010 were asked:
o "Do you usually cough first thing in the morning in the winter?" (In 2010 only, respondents had previously been asked "Do you usually cough first thing in the morning?", but this is not used to filter people into the questions on coughing in winter).
o "Do you usually bring up any phlegm from your chest, first thing in the morning in the winter?" (Again, this was asked to everyone in all years, but was preceded by an additional, non-winter-specific question in 2010).
[…]
-Grade 2 dyspnoea: people who report shortness of breath when hurrying on level ground or walking up a slight hill (or who report shortness of breath when walking on level ground, but who say they never walk up hill or hurry).
-Grade 3 dyspnoea: people who report shortness of breath when walking with people of own age on level ground, or who have to stop for breath when walking at own pace on level ground.
[…]
-Those that said yes were then asked, "Have you had wheezing or whistling in the chest in the last 12 months?"
-(For those who said they had ever been told by a doctor they had asthma; see above), "When was your most recent attack of asthma? PROMPT IF NECESSARY: Less than 4 weeks ago | More than 4 weeks but within the last 12 months | One to five years ago | More than 5 years ago"
People who said they had EITHER wheezing/whistling in the past 12 months or an asthma attack in the past 12 months were counted as having recent wheezing/asthma symptoms.
[It should be noted that the filtering to the second question is very slightly different in 2010 compared to previous years (it was only asked to people who said they had not had wheezing/whistling in the chest in the past 12 months). However, given the way that the derived variable is calculated here, the change in filtering does not introduce any
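The rule for recent wheezing/asthma symptoms (EITHER wheezing/whistling in the past 12 months OR an asthma attack within the past 12 months) can be sketched as below. Argument names are hypothetical, the attack-timing categories are those quoted in the question, and treating an unasked or unanswered question as 'no' is an assumption rather than something the text specifies.

```python
from typing import Optional

# Answer categories from the asthma-attack question that fall within the past 12 months.
RECENT_ATTACK_CATEGORIES = {
    "Less than 4 weeks ago",
    "More than 4 weeks but within the last 12 months",
}


def recent_wheeze_or_asthma(wheeze_past_12m: Optional[bool],
                            last_asthma_attack: Optional[str]) -> bool:
    """True if EITHER wheezing/whistling or an asthma attack occurred in the past 12 months.

    None (question not asked, or no answer) is treated as 'no' (an assumption).
    """
    return wheeze_past_12m is True or last_asthma_attack in RECENT_ATTACK_CATEGORIES


print(recent_wheeze_or_asthma(False, "Less than 4 weeks ago"))  # True
```

Because the derived variable only needs one of the two conditions, a respondent filtered past one question can still qualify through the other.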

Anthropometric & diabetes
During the initial face-to-face interview in all years (except 2013), respondents were asked if they would consent to having their height and weight measured by the interviewer. The reasons for missingness (and their trends over time) are given in Web Appendices 2 and 3; note that three changes give rise to small discontinuities in 2009 and 2011. Obesity is a risk factor for diabetes (hence its inclusion in this section) but also for heart disease and some cancers. Obesity is defined as a Body Mass Index (BMI) of >= 30kg/m² as per the World Health Organization's BMI classification 16 […]

In the years 2003, 2006, and 2008-14, blood samples were obtained during the nurse visit, which were then analysed for glycated haemoglobin (HbA1c). HbA1c is a measure of the share of haemoglobin (within red blood cells) that glucose is attached to, with higher levels indicating less well-controlled diabetes over the previous three months 20 . Following the recommendations of a 2009 expert committee, we mirror recent HSE reports in using a threshold of 48mmol/mol (i.e. 48 millimoles of glycated haemoglobin per mole of haemoglobin) as the threshold for raised HbA1c, a different threshold from that used in earlier HSE reports. While the measurement of HbA1c has been consistent across the HSE years used here, the units reported have changed from the % of haemoglobin that is glycated to mmol/mol. Earlier measures have been transformed into mmol/mol through the formula mmol/mol = (% - 2.15) x 10.929. HbA1c was also measured in 1994 but using a different technique, which cannot be made comparable 21:67 .

In the years 1998, 2003, 2006, and 2009, blood samples were obtained during the nurse visit, which were then analysed for C-reactive protein (CRP). CRP is an inflammatory marker, which can indicate heart-related inflammation (it is used to test for heart failure) but can also indicate other sorts of health damage including diabetes.
However, there are still debates about exactly what CRP shows, both in terms of its causal role in heart disease, and whether it also indicates depression. 22 Raised CRP is defined as >3mg/L, the standard cut-off for a clinically significant rise in CVD risk 23 24 . Participants with CRP >10mg/L are excluded, as this is taken to be evidence of current infection rather than inflammation from chronic disease.
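The obesity, HbA1c and CRP definitions in this section reduce to a few thresholds and one unit conversion, sketched below in Python. This is illustrative only: the function names are hypothetical, height is assumed to be supplied in metres, and 'raised' HbA1c is read as 48 mmol/mol or above. Returning `None` for CRP keeps excluded respondents (missing, or above 10 mg/L) out of the denominator rather than counting them as not raised.

```python
from typing import Optional

OBESE_BMI = 30.0      # kg/m^2, WHO classification
RAISED_HBA1C = 48.0   # mmol/mol; ">= 48" is our reading of "a threshold of 48"
CRP_RAISED = 3.0      # mg/L; raised CRP is strictly above 3
CRP_INFECTION = 10.0  # mg/L; above this, taken as current infection and excluded


def bmi(weight_kg: float, height_m: float) -> float:
    return weight_kg / height_m ** 2


def is_obese(weight_kg: float, height_m: float) -> bool:
    return bmi(weight_kg, height_m) >= OBESE_BMI


def hba1c_percent_to_mmol_mol(percent: float) -> float:
    """Convert older HbA1c readings (% glycated) to mmol/mol: (% - 2.15) x 10.929."""
    return (percent - 2.15) * 10.929


def raised_hba1c(mmol_mol: float) -> bool:
    return mmol_mol >= RAISED_HBA1C


def raised_crp(crp_mg_l: Optional[float]) -> Optional[bool]:
    """None = excluded (missing, or > 10 mg/L); otherwise True if CRP > 3 mg/L."""
    if crp_mg_l is None or crp_mg_l > CRP_INFECTION:
        return None
    return crp_mg_l > CRP_RAISED


print(round(hba1c_percent_to_mmol_mol(6.5), 1))  # 47.5
```

The example shows that an older reading of 6.5% converts to just under the 48 mmol/mol threshold.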

Raised Fibrinogen
In the years 1998, 2003, 2006, and 2009, blood samples were obtained during the nurse visit, which were then analysed for fibrinogen. Like CRP, fibrinogen is an inflammatory marker, which is both commonly thought to be a causal risk factor for CVD (it is a component of coagulation), and which seems to be a risk factor for other diseases (including cancer and diabetes) 25 .
While fibrinogen is often analysed as a continuous variable with no cutpoints 24 , we here define raised fibrinogen as >4g/L as in 12 . As for CRP, participants with CRP >10mg/L are excluded, as this is taken to be evidence of current infection rather than inflammation from chronic disease. A change of analysis method and laboratory between 1994 and 1998 means that the 1994 results are not comparable to the later results 26 .

In the self-completion survey in most years (except 1996, 2007, 2011 and 2013), respondents were asked the following series of questions:
-"Please read this carefully: We should like to know how your health has been in general over the past few weeks. Please answer ALL the questions by ticking the box below the answer which you think most applies to you. Have you recently..."
-"…been able to concentrate on whatever you're doing?" RESPONSES: "Better than usual" | "Same as usual" | "Less than usual" | "Much less than usual"
-"…lost much sleep over worry?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…felt you were playing a useful part in things?" RESPONSES: "More so than usual" | "Same as usual" | "Less useful than usual" | "Much less useful"
-"…felt capable of making decisions about things?" RESPONSES: "More so than usual" | "Same as usual" | "Less so than usual" | "Much less capable"
-"…felt constantly under strain?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…felt you couldn't overcome your difficulties?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…been able to enjoy your normal day-to-day activities?" RESPONSES: "More so than usual" | "Same as usual" | "Less so than usual" | "Much less than usual"
-"…been able to face up to your problems?" RESPONSES: "More so than usual" | "Same as usual" | "Less able than usual" | "Much less able"
-"…been feeling unhappy and depressed?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…been losing confidence in yourself?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…been thinking of yourself as a worthless person?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…been feeling reasonably happy, all things considered?" RESPONSES: "More so than usual" | "Same as usual" | "Less so than usual" | "Much less happy"
These make up the 12-item General Health Questionnaire (GHQ-12) 28 , a well-validated, widely-used measure of probable mental ill-health.
This is often termed general nonpsychotic psychiatric morbidity, but we here use the more easily understood term 'psychological distress', following Stochl et al 2016. 29 A total score has been created by first ensuring that all questions were coded from 1 (positive symptom) to 4 (negative symptom), and then counting the number of questions to which people answered with categories 3 or 4 (indicating a negative symptom). A binary measure (often called GHQ caseness) was created for people who had negative symptoms for 4 or more of the 12 questions.

Anxiety/depression
[…]
-"I am extremely anxious or depressed" [This is part of the widely-used EQ-5D health status indicator 8 . However, for the purposes of this paper we have separated the individual measures that make up the EQ-5D in order to compare these to similar indicators of morbidity within each domain]. Two outcome measures are based on this: whether people have any anxiety/depression (the 2nd and 3rd categories combined), and whether they have extreme anxiety/depression (3rd category only).
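The GHQ-12 scoring described above (counting answers in the third or fourth category, the so-called 0-0-1-1 'GHQ scoring', and treating a count of 4 or more as caseness) can be sketched as follows. The sketch assumes responses have already been coded 1 (most positive) to 4 (most negative); for the items as listed above this is simply the position of the ticked box, since each item's options run from least to most distressed. Function names are hypothetical.

```python
from typing import Sequence


def ghq12_score(responses: Sequence[int]) -> int:
    """Count the items answered in category 3 or 4 (0-0-1-1 scoring), giving 0-12."""
    if len(responses) != 12:
        raise ValueError("GHQ-12 needs exactly 12 responses")
    if any(r not in (1, 2, 3, 4) for r in responses):
        raise ValueError("responses must be coded 1 (positive) to 4 (negative)")
    return sum(1 for r in responses if r >= 3)


def ghq_case(responses: Sequence[int], threshold: int = 4) -> bool:
    """GHQ 'caseness': negative symptoms on `threshold` (default 4) or more items."""
    return ghq12_score(responses) >= threshold


print(ghq12_score([3, 4, 3, 4] + [1] * 8), ghq_case([3, 4, 3, 4] + [1] * 8))  # 4 True
```

Note that categories 1 and 2 both score 0, so someone answering "Same as usual" throughout is not counted as having any negative symptoms.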

Hearing, seeing & communication limitations
These measures were not included in the main paper due to the short time frame over which we can examine trends, but are included in the Web Appendix as they relate to important domains of morbidity. They were included in the disability scale used in the 2001 HSE report 9 . Respondents in 1995, 2000 and 2001 were asked if any of the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year):
• "Cannot follow a TV programme at a volume others find acceptable (with hearing aid if normally worn)" ('hearing limitation')
• "Cannot see well enough to recognise a friend across a road (four yards away) (with glasses or contact lenses if normally worn)" ('seeing limitation')
• "Have problem communicating with other people, that is, have problem […]

-"I am unable to perform my usual activities" [This is part of the widely-used EQ-5D health status indicator 8 . However, for the purposes of this paper we have separated the individual measures that make up the EQ-5D in order to compare these to similar indicators of morbidity within each domain]. Two outcome measures are based on this: whether people have any problems (the 2nd and 3rd categories combined), and whether they are unable to perform their usual activities (3rd category only).

Limitations in past 2wks
Every year, respondents were asked, "Now I'd like you to think about the two weeks ending yesterday. During those 2 weeks did you have to cut down on any of the things you usually do (about the house or at work or in your free time) because of your answer at <the LSI question> or some other illness or injury?" There were two small changes to this question's wording in 1996. First, 'work' was changed to 'work/school'. Second, 'your answer at <the LSI question>' was changed to 'a condition you have just told me about'. While it is impossible to be sure of the exact effect of these changes, neither seems likely to influence the results (at least for the 25+ age group, where fewer individuals are in full-time education).

Measures not included in the main paper
Trends in several measures are not included in the main paper, either […] The details of these measures are as follows:

Circulatory
Beyond 'recent': 'Ever had' and 'DD' CVD
In the main paper, we look at whether people report recent doctor-diagnosed CVD (looking separately at heart attack/stroke, angina, and any recent CVD). As shown above, this comes from three questions: whether people report ever having this condition; whether a doctor diagnosed this; and whether they have had an attack in the past 12 months / consider themselves to still have the condition. Web Appendix 6 shows trends in the other versions of these measures, i.e. having ever had this type of CVD, and having ever had doctor-diagnosed ('DD') CVD of this type.
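Because each question in this sequence is only asked of those who said 'yes' to the one before, the three versions are nested: recent doctor-diagnosed implies doctor-diagnosed, which implies ever had. A minimal sketch of that derivation, with hypothetical argument names and with missing answers treated as 'no' (an assumption):

```python
from typing import Optional, Tuple


def condition_levels(ever: Optional[bool],
                     doctor_diagnosed: Optional[bool],
                     recent: Optional[bool]) -> Tuple[bool, bool, bool]:
    """Return (ever_had, doctor_diagnosed, recent_doctor_diagnosed).

    `recent` stands for the final filtered question: an attack in the past
    12 months, still having the condition, or currently taking medication.
    Missing answers (None) are treated as 'no' (an assumption).
    """
    ever_had = ever is True
    dd = ever_had and doctor_diagnosed is True
    recent_dd = dd and recent is True
    return ever_had, dd, recent_dd


print(condition_levels(True, False, None))  # (True, False, False)
```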

Component measure: Heart murmur, Irregular heart rhythm, Other heart disease
In the main paper, we look at recent reports of doctor-diagnosed angina; heart attack (including myocardial infarction or coronary thrombosis); a heart murmur; abnormal heart rhythm; or other heart trouble (see above). Angina and heart attack are also analysed in the main paper in their own right; in Web Appendix 6, we further show trends separately in heart murmur, abnormal heart rhythm and other heart trouble.

Respiratory
Component measure: 'phlegm'
In the main paper, we look at whether people report recent COPD symptoms (see above). This combines two measures: regular cough and phlegm. Web Appendix 6 shows the trend in the phlegm measure on its own, without being combined with a regular cough.

Alternative version: 'LSI respiratory'
In the main paper, we look at whether people report an asthma LSI (to examine alongside a direct question on diagnosed asthma; see above). Web Appendix 6 also shows people reporting a longstanding illness ('LSI') that falls within the broader category of respiratory conditions. […]
• If complaint is breathlessness with the cause also stated, this is coded with the cause; hence it also excludes breathlessness as a result of anaemia, breathlessness due to hole in heart, and breathlessness due to angina.

Component measure: Wheezing
In the main paper, we look at whether people report recent wheezing/asthma. As shown above, this comes from three questions: whether people report ever having had wheezing or whistling in the chest; whether they have had this in the past 12 months; and whether they have had an asthma attack in the past 12 months. Web Appendix 6 shows trends in the other versions of these measures, i.e. having ever had wheezing/whistling in the chest, and whether they have had this in the past 12 months.
Beyond 'recent': 'Ever had' and 'DD' diabetes
In the main paper, we look at whether people report recent doctor-diagnosed diabetes. As shown above, this comes from three questions: whether people report ever having this condition; whether a doctor diagnosed this; and whether they currently inject insulin / take other medication for diabetes. Web Appendix 6 shows trends in the other versions of these measures, i.e. having ever had diabetes, and having ever had doctor-diagnosed ('DD') diabetes.

Activity limitations
For comparison: Walking limitation
This is based on the personal care disability scale used in the 2001 HSE report 9 . Respondents in 1995, 2000 and 2001 were asked whether the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year): "Cannot walk 200 yards or more on own without stopping or discomfort". People who reported a limitation were asked if they used a walking aid, and if they did, were then asked if they could walk 200 yards without the walking aid.

For comparison: Washing & dressing limitation
This is based on the personal care disability scale used in the 2001 HSE report 9 . Respondents in 1995, 2000 and 2001 were asked if any of the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year):
-"Cannot dress and undress without difficulty"
-"Cannot wash hands and face without difficulty"
For comparison with the 'problems with washing/dressing today' measure in the main paper (which covers a longer period and is based on a different question; see above), a binary measure is derived for respondents who report either of these problems.

Other LSIs
Every year 1994-2011, people who report a longstanding illness (LSI) are then asked, 'what is the matter with you?'; up to 6 responses are then coded by the interviewer into a consistent coding frame based on the International Classification of Diseases. The various other LSIs are as follows:
• The Blood Disorders LSI measure is based on the group 'Disorders of blood and blood forming organs and immunity disorders', which as of 2011 includes: Anaemia, pernicious anaemia; Blood condition (nes), blood deficiency; Haemophilia; Idiopathic Thrombocytopenic Purpura (ITP); Immunodeficiencies; Polycythaemia (blood thickening), blood too thick; Purpura (nes); Removal of spleen; Sarcoidosis (previously code 37); Sickle cell anaemia/disease; Thalassaemia; Thrombocytopenia. It explicitly excludes Leukaemia (code 01).
• The Cancer LSI measure is based on the group 'Cancer (neoplasm) including lumps, masses, tumours and growths and benign (non-malignant) lumps and cysts', which as of […]

Appendix 7: Year-by-year trends
This appendix presents the year-by-year trends for all of the variables included in the main paper. The table row labelled 'start v end sig' presents the p-value for testing the null hypothesis that there is no difference between the first and last years in the series (whichever these years are). Note that this will differ from the confidence intervals presented in the main paper as these are grouped into multi-year periods with larger sample sizes and therefore greater precision.
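The appendix does not say which test produces the 'start v end sig' p-value. As an illustration only, the sketch below uses a standard pooled two-proportion z-test on raw counts; unlike the analyses in the main paper it ignores the non-response weights, the complex survey design and the age/sex adjustment, so its p-values would not reproduce those in the tables. The function name is hypothetical.

```python
import math


def start_vs_end_p_value(cases_start: int, n_start: int, cases_end: int, n_end: int) -> float:
    """Two-sided p-value for H0: same prevalence in the first and last years (unweighted)."""
    p_start = cases_start / n_start
    p_end = cases_end / n_end
    pooled = (cases_start + cases_end) / (n_start + n_end)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_start + 1 / n_end))
    if se == 0:
        return 1.0  # 0% (or 100%) prevalence in both years: no evidence of a difference
    z = (p_end - p_start) / se
    # Two-sided p = 2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


print(start_vs_end_p_value(100, 1000, 100, 1000))  # 1.0
```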

Appendix 8: Others' analyses of change over time using HSE data
Changes over time in some of these indicators have not previously been analysed (e.g. waist-hip ratio, fibrinogen). However, others have been studied but never integrated into a single picture of changing morbidity; we review these in this section. (For reasons of space these are included here rather than in the main text).

Cardiovascular morbidity
1998-2011 trends in the two biomarkers for total and HDL cholesterol using HSE data are shown in Oyebode, 11 who find similar results.

Respiratory morbidity
A subset of the HSE respiratory indicators (ever/past-year wheezing, doctor-diagnosed asthma) was analysed by Hall and Mindell 31 for 2001-2010, who found similar changes over time to our analysis. They found stability in some measures (ever wheezing) but improvements in others (past-year wheezing), at the same time as the reported prevalence of doctor-diagnosed asthma increased.

Obesity & diabetes
While the English trends in waist-hip ratio have not previously been analysed, earlier Scottish trends are given in Hotchkiss et al 2012. 19 Trends in diabetes have been covered in several HSE reports, e.g. Moody 2012, 20 as has BMI (see particularly the paper by Sperrin et al 2014, 32 who also created a publicly-available time-series HSE dataset for this purpose).

Activity limitations, pain & musculoskeletal morbidity
While musculoskeletal LSIs have not previously been analysed in HSE, a decline can also be seen in the General Household Survey. 33

Mental health
In the UK and most other high-income countries, benefit claims due to mental ill-health have been rising, 34 which has come alongside considerable increases in mental health diagnosis and treatment. 35 The extent to which this reflects rises in mental ill-health and genuinely declining work capacity, however, has long been the subject of debate. 36 37 Perhaps the most robust long-term general population data series in the UK is the Adult Psychiatric Morbidity Survey. 35 38 While some studies have used HSE to show rises in mental ill-health, others have used the same data to come to the opposite conclusion. 39 40 These contrasting conclusions are explained by the tables in Web Appendix 7, which show year-by-year changes: moderate mental ill-health fell between the mid-1990s and the mid-2000s, before rising in 2009, with a particularly high prevalence in 2011. The conclusions of studies will therefore depend on the years they use as the start and end periods for their trend analysis. 3 It is also worth noting that our finding of considerable increases in mental health LSIs is mirrored by a similar measure in the Labour Force Survey. 41

It has been suggested that multimorbidity has risen among older people in England 44 and for all age groups in Ontario, 45 although others have cautioned against using simple disease counts, 46 and the evidence cited in the introduction of the main paper suggests that rising chronic disease reporting may partly be a result of increasing awareness (rather than underlying prevalence) of disease.

Appendix 9: Summarising multiple measures
Having reviewed trends in 39 morbidity measures, we have seen that morbidity in the English working-age population has improved in some respects and deteriorated in others. For those who view work-related morbidity as intrinsically multidimensional, 47 this is the endpoint of our analysis. However, for those who conceive of morbidity as unidimensional (or those who are interested in morbidity as it relates to a unidimensional work capacity), this raises the question of how we weight different dimensions of morbidity to decide if the overall change in morbidity has been positive or negative.

Methods for creating unidimensional morbidity scales
Several methods have been proposed for creating unidimensional morbidity scales, but most of these are unavailable using the HSE data: • Weights can be based on empirically-derived preferences for different health states, of which the most famous example is the WHO Global Burden of Disease (GBD) study 48 . Some GBD estimates for trends in disability in the UK do exist, and suggest that the prevalence of disability in the working-age population is unchanged 1990-2010, though these results are only presented in passing. 4 For our analyses, however, we have no preference-based weights for most of the HSE measures (excluding the subset of measures that make up the EQ-5D scale).
• Those reporting limitations beyond a certain severity in any domain can be categorised as 'disabled', as recommended by the Washington Group on Disability Statistics (see above). However, as previously discussed, we have few functional limitations measures available in HSE.
• Latent morbidity scales can be created based on the inter-correlations between different measures (using item response theory), as used in the World Disability Report 51 and by researchers associated with the US National Bureau of Economic Research e.g. 52 . However, it is unclear why we would wish to weight items in this way: a given morbidity indicator may be severe, yet if it is unrelated to other morbidity measures it will be given a low weight.
• Latent morbidity scales can also be created based on the independent correlation between each indicator and a general measure of morbidity, such as general self-reported health 53 (as in 54 ). This maintains some of the advantages of single-item measures (in providing a basis for making morbidity unidimensional), while avoiding the potential threats to validity discussed above. However, the inconsistent inclusion of measures in each HSE wave prevents a unidimensional morbidity scale being constructed here.

Footnote 4: Trends in the UK GBD results are reported in Murray et al. 49 However, Murray et al do not focus on trends in years lived with disability (YLD), other than to note that "YLDs per person by age and sex have not changed substantially in the UK, but age-specific mortality has been improving" (p1005). The figure in the supplementary appendix shows that YLDs have barely changed for either men or women at any age, although the confidence intervals for YLDs as a whole in the main paper (Table 1) suggest that these trend estimates are highly uncertain. The public GBD data 50 do provide cause-disaggregated YLDs for the UK (and all other countries) for a slightly different period (2000-2015), but these are not age-standardised, are available only within broad age groups (e.g. 15-29), and again lack estimates of uncertainty.

To see how important each measure is for general health, we regress 'bad' general health (see Appendix 5 for detail on the underlying question) on age, sex (and their interaction), educational level and each individual morbidity measure in turn, using all years for which that morbidity measure is available. That is, for each morbidity indicator we fit the following model:

$$\operatorname{logit}\Pr(\mathit{badhealth}_i = 1) = \beta_0 + \beta\,\mathit{morbidity}_i + \boldsymbol{\beta}_a^{\top}\mathbf{age}_i + \beta_m\,\mathit{male}_i + \boldsymbol{\beta}_{am}^{\top}(\mathbf{age}_i \times \mathit{male}_i) + \boldsymbol{\beta}_e^{\top}\mathbf{educ}_i$$

where $\beta$ is our primary outcome coefficient showing the importance of that morbidity indicator for bad health; $\mathbf{age}_i$ is a vector of age dummy variables; $\mathit{male}_i$ is a binary gender dummy variable; $\mathbf{educ}_i$ is a vector of education dummy variables (with four levels: degree/full-time student, A-levels/NVQ3/higher education below degree, other qualifications, or no qualifications); $\beta_0$ is the constant; and $\boldsymbol{\beta}_a$, $\beta_m$, $\boldsymbol{\beta}_{am}$ and $\boldsymbol{\beta}_e$ are the coefficients on age, gender, their interaction and education respectively.
We adjust for education as well as age and sex so that we can examine the importance of each measure for bad health after taking account of whether general health and the measure are both strongly related to social status. Note, however, that it is not possible to control for all morbidity measures simultaneously (as discussed above), so this is a rough indicator of the importance of each morbidity measure for general health, rather than a reliable indicator of its causal impact net of comorbidities.
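The per-measure bad-health model described above can be sketched in Python as follows. This is a minimal illustration, not the Stata code used for the paper: the five age bands are hypothetical (the text does not list them), the education labels paraphrase the four levels described above, and the model is fitted by plain Newton-Raphson with an optional weight rather than with Stata's survey commands, so no design-based standard errors are produced.

```python
import math

# Hypothetical codings; the first level of each factor is the reference category.
AGE_GROUPS = ["16-24", "25-34", "35-44", "45-54", "55-64"]  # assumed bands
EDUCATION = ["degree_or_ft_student", "a_level_nvq3", "other", "none"]


def design_row(morbidity, age_group, male, educ):
    """Constant, morbidity flag, age dummies, male, age x male, education dummies."""
    age = [1.0 if age_group == g else 0.0 for g in AGE_GROUPS[1:]]
    edu = [1.0 if educ == e else 0.0 for e in EDUCATION[1:]]
    m = 1.0 if male else 0.0
    return [1.0, float(morbidity)] + age + [m] + [a * m for a in age] + edu


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logit(X, y, w=None, iters=25):
    """Weighted logistic regression by Newton-Raphson; returns the coefficients."""
    n, k = len(X), len(X[0])
    w = w or [1.0] * n
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi, wi in zip(X, y, w):
            p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
            for j in range(k):
                grad[j] += wi * (yi - p) * xi[j]
                for l in range(k):
                    hess[j][l] += wi * p * (1 - p) * xi[j] * xi[l]
        step = solve(hess, grad)  # Newton step: H^{-1} * gradient
        beta = [b + s for b, s in zip(beta, step)]
    return beta
```

A quick check: with only a constant and the morbidity flag in the model, the fitted coefficient on morbidity equals the log odds ratio of bad health from the corresponding 2×2 table.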
The results of this analysis are shown overleaf, ordered by the effect on bad health. (For convenience we also repeat the trend in each measure; this is discussed following the table.) Having estimated this, we can see whether the areas in which morbidity has been improving or deteriorating are those that are particularly important for general health. This is shown visually in Figure 1 below (the measures are not labelled so that the overall pattern can be seen, but the top-to-bottom order of measures in the figure is the same as in the preceding table, i.e. the measure at the top of the figure is 'Pain-extreme'). The figure is easiest to interpret by focussing on each group of measures in turn. Firstly, the biomarkers tend to have the weakest relationship with general health. Those with high levels of the diabetes biomarker (glycated haemoglobin) are 9.7% more likely to say they have bad health, and those who are underweight or have a high waist-hip ratio, raised fibrinogen or low HDL cholesterol are 4-6% more likely to report bad health, while the other biomarkers have weaker relationships. Indeed, there was effectively no relationship between bad reported health and measured high blood pressure, high total cholesterol or iron deficiency.
Secondly, most of the measures based on medical labels have a moderately strong relationship with bad health (the weakest being lifetime asthma and recent high blood pressure, both of which can be asymptomatic), and these measures have mostly risen over time. There are, however, notable exceptions, including IHD/stroke LSI, recent angina and recent heart attack/stroke (the label-based measures with some of the strongest relationships with bad reported health), as well as arthritis and other musculoskeletal LSIs.


Competing interests
The author has worked on secondment at the UK Department for Work and Pensions (DWP) in 2015-16.

Data sharing

Strengths and limitations of this study
• We provide a robust analysis of changes over time in morbidity in England for 39 measures across two decades using the Health Survey for England ('HSE').
• We include every morbidity measure for which consistent comparisons over time can be constructed in the HSE.
• We take care to maximise comparability over time, including constructing new non-response weights.
• However, response rates for each stage of the HSE have declined over time, and it is impossible to rule out changing non-response biases.
• There are also several dimensions of morbidity for which there is little trend data in HSE.

9 10 The lack of evidence is even more problematic within social security, where many policymakers have assumed that working-age morbidity must have improved in recent decades given improvements in mortality (despite the potential for declining mortality to coexist with rising morbidity), 6 and that high or rising levels of claims are therefore not 'genuine'. 11 12 Almost the only direct evidence on changes over time in working-age morbidity in high-income countries comes from the US. Contrary to policymaker expectations, these studies have generally found deteriorating morbidity since the mid-1990s, particularly in activities of daily living (ADLs) and physical functioning. [13][14][15][16] Other studies have focused on the older working-age population, with similar results. 2

This study therefore asks: is there empirical support for the hypothesis that working-age morbidity in England has declined (H1)? Or does the evidence support the alternative hypotheses of stable (H2) or even rising (H3) morbidity? We answer this using the Health Survey for England (HSE), a high-quality government survey with a combined sample of over 140,000 individuals. We examine 39 specific aspects of morbidity rather than reducing morbidity to a single measure, partly because these produce more reliable trends, and partly to capture the multidimensional nature of morbidity. 23

DATA AND METHODS
This section follows the STROBE cross-sectional reporting guidelines. 25

Data source
Robust evidence of change over time requires consistently-collected, high-quality data. We use the HSE, an annual government-sponsored cross-sectional survey of 3,000-11,000 adults with no proxy responses.  A particular advantage is that the

Patient involvement
As this is a health monitoring study using secondary data, patients were not directly involved. However, from previous discussions we are aware that the study will be of interest to patient/disability advocacy groups, who will receive jargon-free summaries of the research.

Measures
We cannot interpret changes over time correctly without understanding the different ways of operationalising 'morbidity'. 1 General health/disability measures (e.g. "How is your health in general?") are a simple way of measuring morbidity unidimensionally, and clearly do capture something meaningful. 50

These inconsistencies mean that general health/disability measures are inadequate for answering our question: trends in such measures can differ wildly between different surveys covering nominally the same concept and population, e.g. for disability in England 9 or self-rated health in the US. 10 Indeed, the HSE itself shows that England has experienced deteriorating 'bad general health' at the same time as

To answer our research question robustly, we must instead focus on more specific morbidity measures that capture multiple aspects of morbidity. Our systematic search found 39 such measures that are comparable over time: these are summarised in Table 1

61 62 We should nevertheless note that there is no guarantee that a given symptom/impairment-based question will be interpreted identically over time. 63 64
3. Biomarkers (that is, objective measures of biological or physiological states) have considerable strengths in analysing change, as they largely avoid reporting biases that are likely to vary between socioeconomic groups and over time. 65 They do this at the price of an indirect and sometimes still-debated relationship to morbidity (see Web Appendix 5), and they do not cover several important morbidity domains (e.g. we lack good biomarkers for mental distress, pain and fatigue).

ANALYSIS
In the first instance we look at unadjusted changes over time in each morbidity indicator, showing the actual levels of morbidity found in the population. However, we primarily focus on changes after adjustment for sex and age (following others 66 67 ), akin to standardising for the age-sex composition of the population. Given that our aim is to describe changes rather than to explain them, we do not further adjust for potential causal influences on morbidity that are likely to vary over the period, such as employment over economic cycles. This is a task for future research, but we should note that such analysis is possible using our publicly-available time-series dataset that includes inter alia employment status, education and region.
We chose to examine discrete changes from the start to the end of the available data for each measure, rather than using linear or non-linear trend terms. Given our aim of informing policy debates, this has three advantages: a discrete change is simple to interpret; it is compatible with the different start/end years available for different measures; and it does not require any assumptions about the functional form of trends (linear trends are particularly unlikely given the role of non-linear economic cycles). Individual survey years are grouped into 3-4 year periods to increase sample size and precision, but single-year prevalence is given in Web Appendix 7. Given our binary outcome measures, we use logistic regression models with the following form:

To avoid relying on a binary cut-off of statistical significance, 69 we convey precision using 95% confidence intervals. All analyses use weights, exclude boost samples that use different sampling methods, and adjust for the multistage clustered sample design and the stratification of the sample across survey years using the SVYSET command in Stata (although standard errors will be slightly underestimated, as it is not possible to adjust consistently for sample stratification within years). For reasons of space, we are unable to discuss previous HSE studies of aspects of morbidity in the main text; these are instead described in Web Appendix 8.
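The discrete start-to-end comparison can be illustrated with a small Python sketch that pools survey years into periods and compares weighted prevalence in the first and last periods with data. It shows only the unadjusted version: the age-sex adjustment and the survey-design standard errors described above are omitted. The period boundaries are partly assumptions: 1994-96, 1997-2000, 2008-10 and 2011-14 appear in these appendices, while the two middle periods are hypothetical.

```python
from collections import defaultdict

# Illustrative grouping of survey years into 3-4 year periods (middle two assumed).
PERIODS = [(1994, 1996), (1997, 2000), (2001, 2003), (2004, 2007), (2008, 2010), (2011, 2014)]


def period_of(year, periods=PERIODS):
    """Return the (start, end) period containing a survey year."""
    for start, end in periods:
        if start <= year <= end:
            return (start, end)
    raise ValueError(f"year {year} not covered by any period")


def weighted_prevalence(records):
    """records: iterable of (year, weight, outcome 0/1). Returns {period: prevalence}."""
    num, den = defaultdict(float), defaultdict(float)
    for year, weight, outcome in records:
        p = period_of(year)
        num[p] += weight * outcome
        den[p] += weight
    return {p: num[p] / den[p] for p in den}


def discrete_change(records):
    """Unadjusted change in prevalence from the first to the last period with data."""
    prev = weighted_prevalence(records)
    first, last = min(prev), max(prev)  # periods sort chronologically as tuples
    return prev[last] - prev[first]
```

Because each measure's first and last periods are taken from its own data, the same function handles measures that start in, say, 1997-2000 rather than 1994-96.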

Conditions with sharply declining mortality
We start by focussing on cardiovascular disease (CVD) and respiratory illness, which have both seen large falls in mortality (by >50% and >25% respectively among 0-64 year-olds, 1994-2013; Web Appendix 1). Changes over time in morbidity, however, are shown in Table 2. Looking first at high blood pressure, biomarker-measured high blood pressure has halved over two decades (similar improvements are found for the biomarkers for total and HDL cholesterol). Yet when we look at self-reports (either people reporting this as an LSI, or in response to a direct question about having recent diagnosed high blood pressure), we see large rises over time. There has been increasing diagnosis of high blood pressure and increasing prescription of blood pressure-lowering drugs; these may have helped reduce the underlying incidence of high blood pressure while simultaneously raising people's awareness of their morbidity. Table 2 further shows declines in several key types of CVD (heart attack, ministroke, angina), whether measured through people's reports of the disease itself or their reports of its symptoms. Nevertheless, these morbidity declines (8-50%) are often not on the scale of the declines in mortality (>50%); this is likely to be because mortality declines are partly driven by improved treatment, 70 which means each incident CVD case is likely to last longer. 71 72 More surprisingly, the measures of 'any reported CVD' show no improvement (with some uncertain signs of a rise). Looking at its sub-components (Web Appendix 6), this seems to be due to possible increases in diagnosed irregular heart rhythm and other heart trouble.
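The argument that better treatment lengthens each case can be made concrete with the standard steady-state approximation for a fairly uncommon condition, prevalence ≈ incidence × mean duration. The Python sketch below uses purely illustrative numbers, not HSE estimates: if incidence halves while improved survival makes the average case last 50% longer, prevalence falls by only a quarter.

```python
def relative_prevalence_change(incidence_ratio: float, duration_ratio: float) -> float:
    """Relative change in prevalence under the steady-state approximation P ~ I x D.

    incidence_ratio: new incidence / old incidence
    duration_ratio: new mean case duration / old mean case duration
    """
    return incidence_ratio * duration_ratio - 1.0


# Illustrative only: incidence halves, cases last 50% longer.
change = relative_prevalence_change(0.5, 1.5)
print(f"{change:+.0%}")  # prints -25%
```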
Finally, Table 2 shows that symptom-based measures of respiratory morbidity have improved, particularly COPD symptoms (regular cough and phlegm) and breathlessness (at both levels), and more uncertainly recent wheezing/asthma and wheezing stopping sleep. Again, though, diagnosis-related measures of asthma (reported diagnoses, or self-reports of having asthma as a longstanding illness) have risen, even while underlying symptomatology is improving.
Overall, Table 2  prevalence of self-reported CVD conditions such as heart attacks have only declined by a smaller amount, and recent doctor-diagnosed hypertension, any CVD, and asthma diagnoses have either stayed stable or risen.

Conditions with claims of increasing prevalence
The previous section focussed on conditions where there may be an a priori expectation that morbidity has improved (given declining mortality); in this section, we focus on three areas where there have been widespread claims of increasing prevalence: obesity, diabetes, and mental health.
Looking at Table 3, we do indeed confirm a large rise in obesity in HSE (an 8.0-9.7% rise from an obesity prevalence of 16.9% in 1994-96). The rise in high waist-hip ratios -sometimes suggested to be a better measure of potential morbidity 73 -is even larger. This has come alongside little change in the prevalence of being underweight over this period.   77 ), whereas more extreme mental ill-health has more consistently risen.

Activity limitations, musculoskeletal and pain
Pain/musculoskeletal conditions are a major component of working-age morbidity, yet very few previous studies show changes over time in symptomatology, and even those that exist 78 sometimes have debatable comparability. 79 Table 4 shows a fall in some, but not all, HSE measures focussed on pain and musculoskeletal morbidity.
Arthritis as a longstanding illness (LSI) has declined (the precision of the estimates is greater when looking at 2008-10 rather than 2011-14, and shows a decline of 0.3-1.2%). There are some (similarly uncertain) signs that other musculoskeletal LSIs

Other measures
Changes over time in other measures are shown in Table 5 below. These include four biomarkers that are more difficult to compare directly with self-reports:
-Changes over time are available for two biomarkers of inflammation (C-reactive protein ('CRP') and fibrinogen). These are associated with a number of conditions including heart disease, diabetes, cancer 81 and, in the case of CRP, even depression. 82 Table 5 shows that both biomarkers indicate rising morbidity from 1997-2000 to 2008-10 (although for CRP the confidence interval is wide and there is a non-negligible possibility that the change is negative).
-The two other biomarkers available in HSE are clearly focussed on anaemia and iron deficiency. Table 5 shows that both of these have declined, with particularly clear evidence for a decline in iron deficiency.

Mortality in general
Given debates about whether historic improvements in life expectancy are being sustained, particularly in the US and UK, 1 2 it is important to note that working-age life expectancy was increasing during the period studied in this paper. This can be seen in data from the Human Mortality Database (May 2016 update) for 1993-2013, using single-year age groups and single calendar years. These data show that increases in mortality are not found for working-age people as a whole in any major country; for example, standardised working-age death rates declined by 23% in the US and 35% in the UK over 1993-2013.

Cause-specific mortality for the 0-64 population
The main text refers to cause-specific mortality in several places, meaning the death rate among 0-64 year olds from cardiovascular disease (CVD), respiratory conditions, diabetes, and liver cirrhosis. These death rates refer to UK deaths within the relevant ICD-10 codes (I00-I99 for CVD, J00-J99 for respiratory conditions, E10-E14 for diabetes), standardised to the European standard population, and are taken from the World Health Organization Regional Office for Europe's Health for All Database (May 2016 version), http://www.euro.who.int/en/data-and-evidence/databases/european-health-for-all-database-hfa-db.
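Direct standardisation, as used for these death rates, weights each age group's death rate by that age group's share of a fixed standard population, so that changes in the population's age structure do not drive the trend. The Python sketch below uses three hypothetical age bands with made-up standard-population counts and death rates; the real European standard population uses many more age bands and different counts.

```python
def directly_standardised_rate(rates, standard_pop):
    """Weighted average of age-specific rates, weighted by standard population shares.

    rates: {age_band: deaths per 100,000}
    standard_pop: {age_band: standard population count}
    """
    total = sum(standard_pop.values())
    return sum(rates[band] * standard_pop[band] / total for band in standard_pop)


# Hypothetical bands, counts and rates (NOT the European standard population).
STANDARD = {"0-24": 30000, "25-44": 28000, "45-64": 25000}
rates_1993 = {"0-24": 40.0, "25-44": 90.0, "45-64": 600.0}
rates_2013 = {"0-24": 25.0, "25-44": 70.0, "45-64": 380.0}

r0 = directly_standardised_rate(rates_1993, STANDARD)
r1 = directly_standardised_rate(rates_2013, STANDARD)
print(f"relative change in standardised rate: {r1 / r0 - 1:.1%}")
```

Because both years are weighted by the same standard population, the comparison reflects changes in age-specific death rates only.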

Appendix 2: Overall missingness in health measures
This appendix refers to overall item-level missingness; changing item- and unit-level missingness is covered in Appendix 3.

Interview measures
For those who took part in the initial face-to-face interview, the level of item missingness is shown below (including only those years in which each question was asked). This shows that item missingness is generally very low: only 1 of the 30 measures has item missingness greater than 1%.
-Difficult to take measurement: other respondents (between 3.8% and 6.1% depending on the year) have no valid BMI measurement because height or weight measures were not attempted, or were attempted but not obtained or usable, because the respondent was pregnant, or because the respondent was too sick or unsteady.
-Refusal: the most common reason for no BMI measurement is an outright refusal (including those refusing out of anxiety, though this tends to be a minor reason). The refusal rate was 8.3% in 2014.

Self-completion measures
For those who completed the self-completion booklet, the level of item missingness is shown in the table below. Item missingness is relatively low compared to missingness from not completing the self-completion survey (51.5% of respondents in 2014).

Nurse visit measures
For those who took part in the nurse visit, the level of item missingness is shown in the table below. This shows that far more people have missing observations for measured high blood pressure than for their waist-hip ratio. This is despite the fact that we explicitly INCLUDE those who are on blood pressure-lowering drugs (about 5% of the sample at the start of the period and 10% at the end), on the grounds that their lowered blood pressure still conveys useful information about their health state. The main reason for the remaining high level of missingness is that people had recently exercised, smoked, drunk or eaten (12.2%).

Blood sample measures
For those from whom a blood sample was taken, the level of item missingness is shown in the table below. All of these measures are affected by problems in transferring and storing the blood sample and with the measurement process, which affect 3-10% of blood samples depending on the measure and year. As for blood pressure, we explicitly INCLUDE those who are on lipid-lowering drugs (from 0.4% in 1994 to 7.9% in 2014), on the grounds that their changed cholesterol level still conveys useful information about their health state. Item missingness is highest for fibrinogen, which not only has high rates of such failures (7.0-9.5%), but also excludes those ineligible because of likely infection (indicated by raised CRP, 3.6-5.6% of those with blood samples) and those taking drugs that affect the reading (3.7% to 7.7% depending on the year). Item missingness is also high for C-reactive protein (CRP), which also excludes those with likely infections.

Dealing with item-level missingness
Because of the high level of item non-response for certain measures (BMI, high blood pressure, fibrinogen and CRP) and the moderate level for others (other blood sample biomarkers and waist-hip ratio), and because of evidence of changing non-response at various stages of the survey process, non-response weights were created to try to correct for any biases that these introduce. This is described in further detail in Appendix 3.

Appendix 3: Changing non-response & weights
This appendix focuses on changes in unit-level non-response at different stages of HSE.

Sample frame coverage
As noted in the main paper, HSE is a household sample that excludes those in communal establishments. If we combine data from the 1991, 2001 and 2011 Censuses, 1 the communal population is as follows:

Changing unit non-response within the sample frame
As noted in the main paper, HSE supplies non-response weights from 2003, including adjustments for non-response to the nurse visit and blood sample using health and socioeconomic status from the initial interview. However, there had been a substantial decline in response rates prior to 2003, as shown in the table below:

In general these trends are due to increases in refusal rates. However, the blood sample response rate is affected by two noticeable changes in eligibility over this period (people who were pregnant or who had blood/clotting disorders were ineligible throughout):
1. In 1998, people who had ever had an epileptic fit were excluded from the blood sample. This raised the ineligibility rate to 3.5% of the sample in 1998, from 0.6% in 1994.
2. In 2010, this was then relaxed so that those who had had an epileptic fit more than 5 years ago were again included in the blood sample. This lowered the ineligibility rate from 3.1% in 2009 to 2.4% in 2010.
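These eligibility changes can be written as a small rule, which is useful when checking whether a change in blood-sample response is an artefact of eligibility. This Python sketch encodes only the exclusions described here; it assumes that the 2010 rule stays in force through 2014 and treats an unknown time since the last fit as ineligible, both of which are assumptions rather than documented HSE rules.

```python
from typing import Optional


def blood_sample_eligible(year: int, pregnant: bool, clotting_disorder: bool,
                          ever_had_fit: bool,
                          years_since_last_fit: Optional[float] = None) -> bool:
    """Blood-sample eligibility under the rules described in this appendix (sketch)."""
    # Pregnancy and blood/clotting disorders exclude throughout the period.
    if pregnant or clotting_disorder:
        return False
    # No epilepsy exclusion applies to people without fits, or before 1998.
    if not ever_had_fit or year < 1998:
        return True
    # 1998-2009: anyone who had ever had an epileptic fit was excluded.
    if year <= 2009:
        return False
    # From 2010 (assumed to persist): a fit more than 5 years ago no longer excludes.
    return years_since_last_fit is not None and years_since_last_fit > 5
```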

Changing item non-response within responding people
There are also changes over time in item non-response (further detail on overall item non-response is given in Appendix 2). These include:
-BMI: there has been little systematic trend in one reason for the absence of a BMI measure (difficulty in taking BMI measurements). However, there are trends in other reasons:
o Refusal: in line with the general participation rates at each stage of the interview above, BMI refusal rates rose sharply from 1.9% in 1994 to a peak of 11.5% in 2011, and remained at 8.3% in the 2014 data.
-Psychological distress: similarly to wider participation rates at each stage of the survey, item missingness within the self-completion survey does increase over time (e.g. for psychological distress symptoms, from 1.8% 1994 to 5.9% 2014).
-Measured high blood pressure: there was a noticeable rise over time in exclusion of high blood pressure measures on the grounds that people recently exercised, smoked, drank or ate (from 6.1% to 13.6%).

Creating non-response weights
To increase comparability over time, we create new weights 1994-2014 in several phases.

First-stage non-response weights
Firstly, we created a selection weight because some households were slightly more likely to be interviewed than others. (Until 2009, only three households at each address were interviewed. Those living at addresses with many households are therefore less likely to be interviewed). NatCen supplied selection weights for 2004-2013 to enable this (funded by this project), which are not available on the public HSE datasets.
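A household selection weight of this kind is the inverse of each household's chance of being selected at its address. The Python sketch below assumes a simple rule in which at most three households per address are interviewed, chosen at random, so that the chance of selection is min(1, 3/n) for an address with n households. The function name is hypothetical, and the actual NatCen weights may include refinements not described here.

```python
def household_selection_weight(n_households: int, max_interviewed: int = 3) -> float:
    """Inverse of the probability that a household at this address is selected (sketch)."""
    if n_households < 1:
        raise ValueError("an address must contain at least one household")
    # Assumed: households beyond the cap are sampled at random within the address.
    p_selected = min(1.0, max_interviewed / n_households)
    return 1.0 / p_selected
```

Addresses with three or fewer households get a weight of 1; an address with six households gets a weight of 2, because each household there had only half the chance of being interviewed.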
Secondly, after adjusting for the selection weight, we created new individual-level (inverse probability) weights to match population age-sex-region totals in each year. Population data are annual mid-year population estimates from nomis. NatCen added the region variable for the 1994-1997 datasets to the public HSE datasets to enable this.
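Matching weighted totals to population age-sex-region totals is a form of post-stratification. The Python sketch below shows the simplest cell-by-cell version, assuming every age-sex-region cell contains at least one respondent. It illustrates the idea only; whether the paper collapsed sparse cells or used raking is not described here, and the cell labels in the usage are hypothetical.

```python
from collections import defaultdict


def poststratify(respondents, population_totals):
    """Scale base weights so weighted counts match population totals in each cell.

    respondents: list of (cell, base_weight); cell is e.g. (age_band, sex, region)
    population_totals: {cell: population count}
    Returns adjusted weights in the same order as respondents.
    """
    weighted = defaultdict(float)
    for cell, w in respondents:
        weighted[cell] += w
    missing = set(population_totals) - set(weighted)
    if missing:
        raise ValueError(f"cells with no respondents: {sorted(missing)}")
    return [w * population_totals[cell] / weighted[cell] for cell, w in respondents]
```

After adjustment, the weights within each cell sum to that cell's population total, while the relative weights within a cell (here, the selection weights) are preserved.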

Second-stage non-response weights
After the first-stage adjustment for individual non-response, for the later stages of the interview (self-completion, BMI measurement, nurse visit, blood sample), we created a further weight that adjusts for non-response among those responding to the individual interview. This is based on a logit regression model to predict that stage of response based on:
• Age and gender (4 age group categories interacted with gender);
• Qualifications (degree or FT student / A-level or above / other qualifications / no qualifications);
• Household type (presence of other adults in the household);
• Employment status (yes/no);
• Smoking (never regular smoker / ex-regular smoker / current regular smoker); and
• Self-reported general health (bad or very bad health vs. other categories).
On the basis of these criteria, we create inverse probability weights -that is, we create a predicted probability of response for each respondent based on the logit regression model, and then create a weight that is the inverse of this predicted probability. The revised weights are included in the Stata code to enable replication of the full paper.
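The second-stage weight can be sketched in Python as below. It builds a predictor row from the covariates listed above and converts a logit linear predictor into a response probability and an inverse-probability weight. Assumptions, all labelled in the code: the four age bands are hypothetical, the coefficients are taken as already estimated (the fitting step is not shown), and the final weight is assumed to be the first-stage weight multiplied by the inverse response probability.

```python
import math

AGE_BANDS = ["16-29", "30-44", "45-54", "55-64"]  # hypothetical four bands
QUALS = ["degree_or_ft_student", "a_level_or_above", "other", "none"]
SMOKING = ["never", "ex_regular", "current_regular"]


def response_predictors(age_band, male, qual, other_adults, employed, smoking, bad_health):
    """Constant plus dummies; the first level of each factor is the reference."""
    cells = [(a, m) for a in AGE_BANDS for m in (False, True)]  # age x gender cells
    row = [1.0]
    row += [1.0 if (age_band, male) == c else 0.0 for c in cells[1:]]
    row += [1.0 if qual == q else 0.0 for q in QUALS[1:]]
    row += [float(other_adults), float(employed)]
    row += [1.0 if smoking == s else 0.0 for s in SMOKING[1:]]
    row += [float(bad_health)]
    return row


def response_probability(coefs, row):
    """Logistic transform of the linear predictor from an already-fitted logit model."""
    xb = sum(b * x for b, x in zip(coefs, row))
    return 1.0 / (1.0 + math.exp(-xb))


def second_stage_weight(first_stage_weight, p_respond):
    """Assumed combination: first-stage weight times the inverse response probability."""
    return first_stage_weight / p_respond
```

Respondents who resemble people unlikely to respond at that stage receive a low predicted probability and hence a larger weight, so they stand in for similar non-respondents.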

Final sample size
The final sample size is as follows:

Appendix 4: General self-reported health/disability
Trends in seven general health/disability measures are available in HSE:

Trends for these measures are shown in Table 9 below. Looking first at good general health, the table shows the trend from 1994-6, when 80.9% reported good general health. By 2011-14, there had been a decline of 0.8 percentage points. When we adjust for the changing age and sex distribution of the working-age population (labelled 'Adj.' in Table 1), the decline is only 0.1%, with a wide confidence interval (-0.9 to +0.7%), so there is little evidence for any systematic trend. For several of the general health measures, there is evidence of change over this period, but interpreting this is difficult because the trends are in opposite directions. There is strong evidence for a rise in bad general health (a rise of 0.6-1.5% from a base of 4.4%), yet equally strong evidence for a decline in having problems with everyday activities (at both levels of severity) and in being limited in activities by a longstanding illness. This shows the challenges in tracking population morbidity change through general, non-specific measures, which are likely to be influenced as much by changes in reporting styles as by changes in morbidity per se.
As an aside, UK Government publications have made claims based on healthy/disability-free life expectancy, sometimes using these to argue that morbidity has been improving 3 but more recently to argue that morbidity has been deteriorating. [4][5][6] However, these trends are potentially misleading: they include older people as well as the working-age population; they confuse a

-"I am confined to bed" [This is part of the widely-used EQ-5D health status indicator 8 . However, for the purposes of this paper we have separated the individual measures that make up the EQ-5D in order to compare these to similar indicators of morbidity within each domain]. People are classified as having a problem with mobility today if they had some problems walking about or were confined to bed.

Locomotor limitation
This is based on the locomotor disability scale used in the 2001 HSE report 9 . Respondents in 1995, 2000 and 2001 were asked if any of the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year):

Activity limitations and MSDs
-"Cannot walk 200 yards or more on own without stopping or discomfort". People who reported a limitation were asked if they used a walking aid, and if they did, were then asked if they could walk 200 yards without the walking aid.  5D in order to compare these to similar indicators of morbidity within each domain]. People are classified as having a problem with self-care today if they had some problems washing/dressing or were unable to wash/dress themselves.

Self-care limitation
This is based on the personal care disability scale used in the 2001 HSE report 9 . Respondents in 1995, 2000 and 2001 were asked if any of the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year):
-"Cannot get in and out of bed on own without difficulty"
-"Cannot get in and out of a chair without difficulty"
-"Cannot dress and undress without difficulty"
-"Cannot wash hands and face without difficulty"
-"Cannot feed, including cutting up food without difficulty"
-"Cannot get to and use toilet on own without difficulty"
People are classified as having a self-care limitation if they reported ANY of these limitations.

Pain
-"I have extreme pain or discomfort" [This is part of the widely-used EQ-5D health status indicator 8 . However, for the purposes of this paper we have separated the individual measures that make up the EQ-5D in order to compare these to similar indicators of morbidity within each domain]. Two outcome measures are based on this: whether people have any pain (the 2nd and 3rd categories combined), and whether they have extreme pain (

disease; Sever's disease; Spondylitis, spondylosis; Stiff joints, joint pains, contraction of sinews, muscle wastage; Strained leg muscles, pain in thigh muscles; Systemic sclerosis, myotonia (nes); Tenosynovitis; Torn muscle in leg, torn ligaments, tendonitis; Walk with limp as a result of polio, polio (nes), after affects of polio (nes); Weak legs, leg trouble, pain in legs; and Worn discs in spine -affects legs. The code explicitly excludes: Damage/injury to spine results in paralysis; Sciatica or trapped nerve in spine; and Muscular dystrophy.
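The classification rules for self-care limitation and pain can be expressed as small functions. In this Python sketch the six item keys are hypothetical short names for the personal-care questions listed above, and EQ-5D pain is assumed to be coded 1 (no pain), 2 (moderate) and 3 (extreme), the usual three-level coding. Treating a partly answered self-care block as "no limitation" unless some answered item is a difficulty is also an assumption; missing or out-of-range answers otherwise return None.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical short names for the six personal-care items.
SELF_CARE_ITEMS = ("bed", "chair", "dress", "wash", "feed", "toilet")


def self_care_limitation(answers: Mapping[str, bool]) -> Optional[bool]:
    """True if ANY answered item is reported as a difficulty; None if none answered."""
    given = [answers[item] for item in SELF_CARE_ITEMS if item in answers]
    if not given:
        return None
    return any(given)


def pain_indicators(eq5d_pain: Optional[int]) -> Tuple[Optional[bool], Optional[bool]]:
    """Return (any_pain, extreme_pain) from the three-level EQ-5D pain item."""
    if eq5d_pain not in (1, 2, 3):
        return None, None
    # Any pain = 2nd and 3rd categories combined; extreme pain = 3rd category only.
    return eq5d_pain >= 2, eq5d_pain == 3
```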

Circulatory
High blood pressure LSI
Every year 1994-2011, people who report a longstanding illness (LSI) are then asked, 'what is the matter with you?'; up to 6 responses are then coded by the interviewer into a consistent coding frame based on the International Classification of Diseases. The high blood pressure LSI measure is based on the group labelled 'Hypertension/high blood pressure/blood pressure (nes)', which as of 2011 includes only the conditions listed in the group label.

Recent high blood pressure
Respondents in 1994, 1998, 2003, 2006 and 2009-2014 were asked a series of questions on whether they have high blood pressure:
…
-Finally, those with doctor-diagnosed high blood pressure (excluding only when pregnant) were asked: "Are you currently taking any medicines, tablets or pills for high blood pressure?", and those saying 'no' (or not giving an answer) were then asked, "Do you still have high blood pressure?"
People were considered to have recent high blood pressure if they said they had ever been diagnosed as having high blood pressure by a doctor (excluding when pregnant), and that they still have high blood pressure or are currently taking medicines for it. While the question wording has stayed consistent, a discontinuity seems to be introduced by a change in question context. In some years (1994, 1998, 2003, 2006 and 2011), this question was preceded by a question that asked, "May I just check, have you ever had your blood pressure measured by a doctor or nurse?" (those saying yes were then asked how recently this was, and whether they were told that it was 'normal (alright/fine), higher than normal, lower than normal, or were you not told anything?'). In the remaining years (2009, 2010 and 2012-2014), this preceding question was not asked. Given the way in which context can affect question interpretation, we treat these as two separate measures of recent high blood pressure.
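The routing above can be written as a short decision rule. This is a minimal sketch, not the authors' code: the argument names are hypothetical stand-ins for the HSE questionnaire items, and treating a missing answer to "Do you still have high blood pressure?" as 'no' is our assumption. It also tags each survey year with the measure version it belongs to, since the two question contexts are analysed separately.

```python
from typing import Optional

# Years in which the diagnosis item was preceded by the
# "have you ever had your blood pressure measured...?" question
YEARS_WITH_BP_CHECK_CONTEXT = {1994, 1998, 2003, 2006, 2011}
# Years in which the recent high blood pressure series was asked at all
YEARS_ASKED = {1994, 1998, 2003, 2006, 2009, 2010, 2011, 2012, 2013, 2014}


def recent_high_bp(ever_doctor_diagnosed: bool,
                   only_when_pregnant: bool,
                   taking_bp_medicine: Optional[bool],
                   still_has_high_bp: Optional[bool]) -> bool:
    """Doctor-diagnosed outside pregnancy AND (on medication OR still has it).

    None means 'no answer'. A missing answer to the 'still have' item is
    treated as 'no' (an assumption, not stated in the source).
    """
    if not ever_doctor_diagnosed or only_when_pregnant:
        return False
    if taking_bp_medicine:
        return True
    return bool(still_has_high_bp)


def measure_version(year: int) -> str:
    """Assign a survey year to one of the two separately analysed measures."""
    if year not in YEARS_ASKED:
        raise ValueError(f"recent high blood pressure not asked in {year}")
    return "with_context" if year in YEARS_WITH_BP_CHECK_CONTEXT else "without_context"


if __name__ == "__main__":
    print(recent_high_bp(True, False, None, True))  # True
    print(measure_version(2010))                    # without_context
```

Keeping the year-to-version mapping explicit makes it hard to pool the two question contexts by accident.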
Biomarker high blood pressure
During the nurse visit (which took place for all consenting respondents in all years except 1999, 2002 and 2004, when the nurse visit focussed on particular subsamples), respondents' blood pressure was measured. High blood pressure is defined as a systolic blood pressure >= 140mmHg or a diastolic blood pressure >= 90mmHg, following established HSE practice, which in turn follows 10 . The measurement of blood pressure changed in 2003, from a Dinamap monitor to an Omron monitor. A conversion between the two monitors is available based on a calibration study, and this has been regularly used by the HSE team to produce continuous trends in blood pressure (see www.hscic.gov.uk/catalogue/PUB00480). For adults, the conversion is as follows: … blood pressure measurement; these are discussed in Web Appendices 2 and 3.
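A minimal sketch of the classification and a weighted prevalence, assuming readings have already been put on a common scale (the Dinamap-to-Omron conversion coefficients are not reproduced here), and applying the usual HSE convention in which either a high systolic or a high diastolic reading is sufficient. The `weight` field is a hypothetical stand-in for the survey's non-response weights.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BPReading:
    systolic: float      # mmHg, assumed already on a common (Omron) scale
    diastolic: float     # mmHg
    weight: float = 1.0  # survey / non-response weight (hypothetical field)


def is_high_bp(r: BPReading) -> bool:
    """High if systolic >= 140 mmHg or diastolic >= 90 mmHg."""
    return r.systolic >= 140 or r.diastolic >= 90


def weighted_prevalence(readings: Iterable[BPReading]) -> float:
    """Weighted share of respondents classified as having high blood pressure."""
    readings = list(readings)
    total = sum(r.weight for r in readings)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(r.weight for r in readings if is_high_bp(r)) / total


if __name__ == "__main__":
    sample = [BPReading(150, 80), BPReading(120, 95), BPReading(120, 80, weight=2.0)]
    print(weighted_prevalence(sample))  # 0.5
```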

High cholesterol
In the years 1994, 1998, 2006, and 2008-14, blood samples were obtained during the nurse visit, which were then analysed for total cholesterol. A high level of total cholesterol ('hypercholesterolaemia') is an established risk factor for CVD, and high cholesterol is defined following conventional practice at the NICE guidance 'audit level' of 5mmol/L or above 11 12 .
The measurement of cholesterol changed slightly in 2010 when a new laboratory was used. This resulted in values that are an average of 0.1mmol/L higher, and later values are therefore adjusted by this amount to maintain comparability over time as in 11 .
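The laboratory adjustment and threshold for total cholesterol can be sketched as follows. This assumes the 0.1mmol/L shift applies to all values from 2010 onwards; the rounding step is our own guard against floating-point noise at the cut-off, not something stated in the source.

```python
LAB_CHANGE_YEAR = 2010
TOTAL_CHOL_SHIFT = 0.1   # mmol/L; the new laboratory read ~0.1 higher on average
HIGH_TOTAL_CHOL = 5.0    # mmol/L, NICE 'audit level'


def harmonised_total_chol(value_mmol_l: float, year: int) -> float:
    """Put post-2010 values back on the pre-2010 laboratory scale."""
    if year >= LAB_CHANGE_YEAR:
        # round() avoids spurious results such as 5.1 - 0.1 landing just below 5.0
        return round(value_mmol_l - TOTAL_CHOL_SHIFT, 2)
    return value_mmol_l


def high_total_chol(value_mmol_l: float, year: int) -> bool:
    """High cholesterol: harmonised total cholesterol of 5 mmol/L or above."""
    return harmonised_total_chol(value_mmol_l, year) >= HIGH_TOTAL_CHOL


if __name__ == "__main__":
    print(high_total_chol(5.05, 2008))  # True
    print(high_total_chol(5.05, 2012))  # False
```

The HDL adjustment described in the next subsection follows the same pattern with the sign reversed (adding 0.1mmol/L) and a "1 mmol/L or less" cut-off.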

Low HDL cholesterol
In the years 1994, 1998, 2006, and 2008-14, blood samples were obtained during the nurse visit, which were then analysed for high density lipoprotein (HDL) cholesterol. HDL cholesterol reduces the risk of CVD (it carries cholesterol away from the arteries towards the liver), so it is low HDL cholesterol that indicates poorer health; low HDL cholesterol is here defined as 1 mmol/L or less 11 12 . The measurement of HDL cholesterol changed slightly in 2010 when a new laboratory was used. This resulted in values that are an average of 0.1mmol/L lower, and later values are therefore adjusted by this amount to maintain comparability over time as in 11 .

Recent heart attack/stroke
Respondents in 1994, 1998, 2003, 2006 and 2011 were asked a series of questions on whether they have had a heart attack (within a battery of questions about different types of heart disease):
-"Have you ever had a heart attack (including myocardial infarction or coronary thrombosis)?"
-Those responding 'yes' were then asked "Were you told by a doctor that you had a Heart Attack (including myocardial infarction or coronary thrombosis)?"
-Those with a doctor-diagnosed heart attack were asked, "Have you had a heart attack (including myocardial infarction and coronary thrombosis) during the past 12 months?"
Respondents in these years were similarly asked about stroke:
-"Have you ever had a stroke?"
-Those responding 'yes' were then asked, "Were you told by a doctor that you had a stroke?"
-Those with doctor-diagnosed stroke were asked, "Have you had a stroke during the past 12 months?"
People were considered to have a recent heart attack or stroke if they said they had ever been diagnosed with a heart attack or a stroke by a doctor, and that they had had that heart attack or stroke during the past 12 months.
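A minimal sketch of the "recent" rule, under our reading of the text: the three routed questions are combined with "and" within each condition, and the two conditions are then combined with "or". The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConditionAnswers:
    ever_had: bool        # "Have you ever had a ...?"
    doctor_told: bool     # "Were you told by a doctor that you had a ...?"
    past_12_months: bool  # "Have you had a ... during the past 12 months?"


def recent_doctor_diagnosed(c: ConditionAnswers) -> bool:
    # Items are routed, so each step only counts if the previous one was 'yes'.
    return c.ever_had and c.doctor_told and c.past_12_months


def recent_heart_attack_or_stroke(heart_attack: ConditionAnswers,
                                  stroke: ConditionAnswers) -> bool:
    return recent_doctor_diagnosed(heart_attack) or recent_doctor_diagnosed(stroke)


if __name__ == "__main__":
    ha = ConditionAnswers(True, True, False)  # diagnosed, but not in past year
    st = ConditionAnswers(True, True, True)   # diagnosed stroke in past year
    print(recent_heart_attack_or_stroke(ha, st))  # True
```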
Recent angina
Respondents in 1994, 1998, 2003, 2006 … different types of heart disease, including angina; heart attack (including myocardial infarction or coronary thrombosis); a heart murmur; abnormal heart rhythm; or other heart trouble. For EACH of these, they were asked:
-"Have you ever had <type of heart disease>?"
-Those responding 'yes' were then asked "You said that you had <type of heart disease>. Were you told by a doctor that you had <type of heart disease>?"
-For heart murmurs only, women saying they had doctor-diagnosed heart murmurs were asked if they were pregnant when told this, and if so, whether they were ever told they had a heart murmur when they were not pregnant.
-Those with doctor-diagnosed heart disease (excluding heart murmurs when pregnant) were asked, "Have you had <type of heart disease> during the past 12 months?"
People were considered to have recent CVD if they said they had a doctor-diagnosed heart condition and that they had had this during the past 12 months.

Cardiovascular (CVD) LSI
Every year 1994-2011, people who report a longstanding illness (LSI) are then asked, 'what is the matter with you?'; up to 6 responses are then coded by the interviewer into a consistent coding frame based on the International Classification of Diseases. The CVD LSI measure is based on the groups labelled 'Stroke/cerebral haemorrhage/cerebral thrombosis', 'Heart attack/angina', 'Hypertension/high blood pressure/blood pressure (nes)', 'Other heart problems', 'Piles/haemorrhoids incl. Varicose Veins in anus', 'Varicose veins/phlebitis in lower extremities', and 'Other blood vessels/embolic'. As of 2011 this includes: Aorta replacement; Aortic valve stenosis; Aortic/mitral valve regurgitation; Arterial thrombosis; Arteriosclerosis, hardening of arteries (nes); Artificial arteries (nes); Atrial Septal Defect (ASD); Blocked arteries in leg; Blood clots (nes); Cardiac asthma; Cardiac diffusion; Cardiac problems, heart trouble (nes); Cerebrovascular accident; Coronary thrombosis, myocardial infarction; Dizziness, giddiness, balance problems (nes); Hand Arm Vibration Syndrome (White Finger); Hardening of arteries in heart; Heart attack/angina; Heart disease, heart complaint; Heart failure; Heart murmur, palpitations; Hemiplegia, apoplexy, cerebral embolism; Hole in the heart; Hypersensitive to the cold; Hypertension/high blood pressure/blood pressure (nes); Intermittent claudication; Ischaemic heart disease; Low blood pressure/hypotension; Mitral valve stenosis; Pacemaker; Pains in chest (nes); Pericarditis; Piles/haemorrhoids incl. Varicose Veins in anus; Poor circulation; Pulmonary embolism; Raynaud's disease; St Vitus dance; Stroke victim -partially paralysed and speech difficulty; Stroke/cerebral haemorrhage/cerebral thrombosis; Swollen legs and feet; Tachycardia, sick sinus syndrome; Telangiectasia (nes); Thrombosis (nes); Tired heart; Valvular heart disease; Varicose veins in Oesophagus; Varicose veins/phlebitis in lower extremities; Various ulcers, varicose eczema; Weak heart because of rheumatic fever; Wolff-Parkinson-White syndrome; and Wright's syndrome. It explicitly excludes balance problems due to ear complaint and haemorrhage behind eye. While the LSI coding frame generally stays consistent over this period, interpretation of 'IHD LSI' is complicated by two changes: 'Too much cholesterol in blood' is included in this category in 1994 only, and Polyarteritis Nodosa is later moved into this code (the documentation is not clear on whether this occurred in 2000 or 2001).

Angina symptoms
This is taken from the Rose Angina questionnaire 13 14 . Respondents in 1994, 1998, 2003, 2006 and 2011 were asked a series of questions about symptoms of heart trouble (rather than whether they had been diagnosed):
-"I am now going to ask you some questions mainly about symptoms of the chest. Have you ever had any pain or discomfort in your chest?"
-Those that said 'yes' were asked:
…
o Those who said they stop or slow down were asked, "If you stand still does the pain go away or not?" (If respondents were unsure, they were asked, "What happens to the pain on most occasions?"). If the pain goes away, they were asked, "How soon does the pain go away?
Does it go in 10 minutes or less, or more than 10 minutes?"
o Those who said the pain goes away in 10 minutes or less were asked, "Will you show me where you get this pain or discomfort? Where else?" The interviewer then coded the site as Sternum (upper or middle) | Sternum lower | Left anterior chest | Left arm | Right anterior chest | Right arm | (Somewhere else).
Following the HSE reports, possible angina is defined as chest pain or discomfort that (i) includes either the sternum, or the left arm and left anterior chest; (ii) is prompted by hurrying or walking uphill (or by walking on the level, for those who never attempt more); (iii) makes the respondent either stop or slacken pace; and (iv) usually disappears in 10 minutes or less when they stand still.

Heart attack symptoms
This is taken from the Rose Angina questionnaire. Respondents in 1994, 1998, 2003, 2006 and 2011 were asked, "Have you ever had a severe pain across the front of your chest lasting for half an hour or more?" As in the 2006 HSE report, those responding 'yes' are treated as having a possible heart attack (myocardial infarction).
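The four criteria for possible angina can be checked mechanically. This is a minimal sketch that assumes the interview answers have already been reduced to booleans and a set of coded pain sites; all field and site names are hypothetical, and the site rule follows the wording above (sternum, or both left arm and left anterior chest).

```python
from dataclasses import dataclass, field
from typing import Set

STERNUM = {"sternum_upper_middle", "sternum_lower"}
LEFT_ARM_AND_CHEST = {"left_arm", "left_anterior_chest"}


@dataclass
class ChestPain:
    any_pain: bool
    on_hurrying_or_uphill: bool
    on_level_ground: bool
    never_hurries_or_walks_uphill: bool
    stops_or_slows: bool
    goes_within_10_min: bool
    sites: Set[str] = field(default_factory=set)


def possible_angina(p: ChestPain) -> bool:
    if not p.any_pain:
        return False
    # (i) site: any part of the sternum, or both left arm and left anterior chest
    site_ok = bool(p.sites & STERNUM) or LEFT_ARM_AND_CHEST <= p.sites
    # (ii) exertional: hurrying/uphill, or level walking for those who never attempt more
    exertion_ok = p.on_hurrying_or_uphill or (
        p.never_hurries_or_walks_uphill and p.on_level_ground)
    # (iii) stops or slackens pace; (iv) goes within 10 minutes of standing still
    return site_ok and exertion_ok and p.stops_or_slows and p.goes_within_10_min


if __name__ == "__main__":
    p = ChestPain(True, True, False, False, True, True, {"sternum_lower"})
    print(possible_angina(p))  # True
```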

COPD symptoms
Respondents in 1995, 1996 and 2010 were asked: o "Do you usually cough first thing in the morning in the winter?" (In 2010 only, respondents had previously been asked "Do you usually cough first thing in the morning?" -but this is not used to filter people into the questions on coughing in winter).
o "Do you usually bring up any phlegm from your chest, first thing in the morning in the winter?" (Again, this was asked to everyone in all years, but was preceded by an additional, non-winter-specific question in 2010).

… (heart problems/diabetes) we focus on those reporting problems in the past 12 months, it is not possible to construct a consistent measure of recent asthma, hence this variable refers to lifetime doctor-diagnosed asthma.

Asthma LSI
Every year 1994-2011, people who report a longstanding illness (LSI) are then asked, 'what is the matter with you?'; up to 6 responses are then coded by the interviewer into a consistent coding frame based on the International Classification of Diseases. The asthma LSI measure is based on the group labelled 'Asthma', which as of 2011 includes: Asthma; Bronchial asthma, allergic asthma; and Asthma -allergy to house dust/grass/cat fur. It explicitly excludes cardiac asthma.

Shortness of breath
…
o Those responding 'yes' or 'never walks up hill or hurries' are then asked, "Do you get short of breath walking with other people of (your/his/her) own age on level ground? Yes | No | Never walks with people of own age on level ground".
o Those responding 'yes' or 'never walks with people of own age' are then asked, "Do you have to stop for breath after walking at (your/his/her) own pace on level ground?"
This has been combined into the longstanding MRC dyspnoea scale 15 as follows:
-Grade 2 dyspnoea: people who report shortness of breath when hurrying on level ground or walking up a slight hill (or who report shortness of breath when walking on level ground, but who say they never walk up hill or hurry).
-Grade 3 dyspnoea: people who report shortness of breath when walking with people of own age on level ground, or who have to stop for breath when walking at own pace on level ground.

…
-Those that said yes were then asked, "Have you had wheezing or whistling in the chest in the last 12 months?"
-(For those who said they had ever been told by a doctor they had asthma; see above), "When was your most recent attack of asthma? PROMPT IF NECESSARY: Less than 4 weeks ago | More than 4 weeks but within the last 12 months | One to five years ago | More than 5 years ago"
People who said they had EITHER wheezing/whistling in the past 12 months or an asthma attack in the past 12 months were counted as having recent wheezing/asthma symptoms.
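The "EITHER ... or ..." rule for recent wheezing/asthma can be sketched as below. The response strings are copied from the asthma-attack question above; the argument names are hypothetical, and representing unasked (routed-out) questions as `None` is our assumption.

```python
from typing import Optional

RECENT_ATTACK_CATEGORIES = {
    "Less than 4 weeks ago",
    "More than 4 weeks but within the last 12 months",
}


def recent_wheeze_or_asthma(ever_wheezed: bool,
                            wheezed_past_12_months: Optional[bool],
                            last_asthma_attack: Optional[str]) -> bool:
    """Wheezing/whistling in the past 12 months OR an asthma attack in that period.

    wheezed_past_12_months is only asked of ever-wheezers; last_asthma_attack
    is None for people never told by a doctor that they had asthma.
    """
    wheeze = ever_wheezed and bool(wheezed_past_12_months)
    attack = last_asthma_attack in RECENT_ATTACK_CATEGORIES
    return wheeze or attack


if __name__ == "__main__":
    print(recent_wheeze_or_asthma(False, None, "Less than 4 weeks ago"))  # True
```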

Anthropometric & diabetes
During the initial face-to-face interview in all years (except 2013), respondents were asked if they would consent to having their height and weight measured by the interviewer. The reasons for missingness (and their trends over time) are given in Web Appendices 2 & 3; note that there are three changes that give rise to small discontinuities in 2009 and 2011. Obesity is a risk factor for diabetes (hence its inclusion in this section), but also for heart disease and some cancers. Obesity is defined as a Body Mass Index (BMI) of >= 30kg/m² as per the World Health Organization's BMI classification 16 .

…
-Women responding 'yes' were then asked, "Can I just check, were you pregnant when you were told that you had diabetes?", and those responding 'yes' were then asked "Have you ever had diabetes apart from when you were pregnant?"
-Finally, those with doctor-diagnosed diabetes (excluding only when pregnant) were asked: "Do you currently inject insulin for diabetes?" and "Are you currently taking any medicines, tablets or pills (other than insulin injections) for diabetes?"
People were considered to have recent diabetes if they said they had ever been diagnosed as having diabetes by a doctor (excluding when pregnant), and that they are injecting insulin or taking any other medicines for diabetes.
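A minimal sketch of the two derived measures in this subsection: obesity from measured height and weight, and recent (treated) diabetes. Height is taken in metres here (converting from centimetres is left to the caller), and the argument names are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index in kg/m^2."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def is_obese(weight_kg: float, height_m: float) -> bool:
    """WHO classification: BMI of 30 kg/m^2 or above."""
    return bmi(weight_kg, height_m) >= 30.0


def recent_diabetes(ever_doctor_diagnosed: bool, only_when_pregnant: bool,
                    injects_insulin: bool, other_diabetes_medicine: bool) -> bool:
    """Doctor-diagnosed outside pregnancy AND currently treated."""
    if not ever_doctor_diagnosed or only_when_pregnant:
        return False
    return injects_insulin or other_diabetes_medicine


if __name__ == "__main__":
    print(round(bmi(90, 1.70), 1))  # 31.1
    print(is_obese(90, 1.70))       # True
```

Note that, as defined, diagnosed but untreated (e.g. diet-controlled) diabetes does not count as "recent diabetes".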

Diabetes LSI
Every year 1994-2011, people who report a longstanding illness (LSI) are then asked, 'what is the matter with you?'; up to 6 responses are then coded by the interviewer into a consistent coding frame based on the International Classification of Diseases. The diabetes LSI measure is based on the group labelled 'Diabetes', which as of 2011 includes Diabetes and Hyperglycaemia.
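Each "LSI" measure in this appendix follows the same mechanism: a respondent has up to six coded mentions, and the measure is positive if any mention falls in the relevant group(s). A minimal sketch, assuming each mention has already been mapped to its group label (the group labels below are quoted from this appendix; the function names are hypothetical):

```python
from typing import Iterable, Sequence

MAX_LSI_MENTIONS = 6

CVD_GROUPS = (
    "Stroke/cerebral haemorrhage/cerebral thrombosis",
    "Heart attack/angina",
    "Hypertension/high blood pressure/blood pressure (nes)",
    "Other heart problems",
    "Piles/haemorrhoids incl. Varicose Veins in anus",
    "Varicose veins/phlebitis in lower extremities",
    "Other blood vessels/embolic",
)


def has_lsi_group(coded_groups: Iterable[str], group_label: str) -> bool:
    """True if any of a respondent's coded LSI mentions falls in group_label."""
    groups = list(coded_groups)
    if len(groups) > MAX_LSI_MENTIONS:
        raise ValueError("at most 6 LSI mentions are coded per respondent")
    return group_label in groups


def has_any_lsi_group(coded_groups: Iterable[str], group_labels: Sequence[str]) -> bool:
    """True if any mention falls in any of several groups (e.g. CVD LSI)."""
    groups = list(coded_groups)
    return any(has_lsi_group(groups, label) for label in group_labels)


if __name__ == "__main__":
    print(has_lsi_group(["Asthma", "Diabetes"], "Diabetes"))       # True
    print(has_any_lsi_group(["Heart attack/angina"], CVD_GROUPS))  # True
```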

High glycated haemoglobin
In the years 2003, 2006, and 2008-14, blood samples were obtained during the nurse visit, which were then analysed for glycated haemoglobin (HbA1c). HbA1c is a measure of the share of haemoglobin (within red blood cells) to which glucose is attached, with higher levels indicating less well-controlled diabetes in the previous three months 20 . Following the recommendations of a 2009 expert committee, we mirror recent HSE reports in using 48mmol/mol (i.e. 48 millimoles of glycated haemoglobin per mole of haemoglobin) as the threshold for raised HbA1c, a different threshold to that used in earlier HSE reports. While the measurement of HbA1c has been consistent across these years, the units reported have changed from the % of haemoglobin that is glycated to mmol/mol. Earlier measures have been transformed into mmol/mol through the formula mmol/mol = (% - 2.15) × 10.929. HbA1c was also measured in 1994 but using a different technique, which cannot be made comparable 21:67 .

In the years 1998, 2003, 2006, and 2009, blood samples were obtained during the nurse visit, which were then analysed for C-reactive protein (CRP). CRP is an inflammatory marker, which can indicate heart-related inflammation (it is used to test for heart failure) but can also indicate other sorts of health damage including diabetes. However, there are still debates about exactly what CRP shows, both in terms of its causal role in heart disease, and whether it also indicates depression. 22 Raised CRP is defined as >3mg/L, the standard cut-off for a clinically significant rise in CVD risk 23 24 . Participants with CRP >10mg/L are excluded, as this is taken to be evidence of current infection rather than inflammation from chronic disease.
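The HbA1c unit conversion and the CRP rule can be sketched as follows; returning `None` for excluded participants is our own convention. The worked example shows a boundary subtlety: 6.5% converts to about 47.5mmol/mol, so whether converted values are rounded before applying the 48mmol/mol threshold changes the classification at the cut-off. The source does not say which is done, so this is flagged as an open assumption rather than the authors' procedure.

```python
from typing import Optional

HBA1C_RAISED_MMOL_MOL = 48.0
CRP_RAISED_MG_L = 3.0
CRP_INFECTION_MG_L = 10.0


def hba1c_percent_to_mmol_mol(percent: float) -> float:
    """Convert HbA1c from % glycated haemoglobin: mmol/mol = (% - 2.15) x 10.929."""
    return (percent - 2.15) * 10.929


def raised_hba1c(mmol_mol: float) -> bool:
    return mmol_mol >= HBA1C_RAISED_MMOL_MOL


def raised_crp(crp_mg_l: float) -> Optional[bool]:
    """None = excluded (>10 mg/L, taken as current infection); else > 3 mg/L."""
    if crp_mg_l > CRP_INFECTION_MG_L:
        return None
    return crp_mg_l > CRP_RAISED_MG_L


if __name__ == "__main__":
    converted = hba1c_percent_to_mmol_mol(6.5)
    print(round(converted, 2))             # 47.54
    print(raised_hba1c(converted))         # False (unrounded)
    print(raised_hba1c(round(converted)))  # True (rounded to 48 first)
```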

Raised Fibrinogen
In the years 1998, 2003, 2006, and 2009, blood samples were obtained during the nurse visit, which were then analysed for fibrinogen. Like CRP, fibrinogen is an inflammatory marker, which is both commonly thought to be a causal risk factor for CVD (it is a component of coagulation), and which seems to be a risk factor for other diseases (including cancer and diabetes) 25 .
While fibrinogen is often analysed as a continuous variable with no cutpoints 24 , we here define raised fibrinogen as >4g/L as in 12 . As for CRP, participants with CRP >10mg/L are excluded, as this is taken to be evidence of current infection rather than inflammation from chronic disease. A change of analysis method and laboratory between 1994 and 1998 means that the 1994 results are not comparable to the later results 26:8.10.4 .

Anaemia
In the years 1994, 1998, 2006, and 2009, blood samples were obtained during the nurse visit, which were then analysed for haemoglobin. Haemoglobin distributes oxygen around the body, and low haemoglobin levels usually indicate anaemia. Various different thresholds for low haemoglobin have been used in the literature, particularly for older populations 27 , but we here use the longstanding WHO definition of <13g/dL for men and <12g/dL for women 24 .
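A minimal sketch of the raised-fibrinogen and anaemia rules. Fibrinogen is taken in g/L, the unit in which plasma fibrinogen is conventionally reported; the CRP-based exclusion mirrors the CRP rule, with `None` meaning "excluded". The sex codes are hypothetical.

```python
from typing import Optional

FIBRINOGEN_RAISED_G_L = 4.0
CRP_INFECTION_MG_L = 10.0
ANAEMIA_THRESHOLD_G_DL = {"male": 13.0, "female": 12.0}  # WHO cut-offs


def raised_fibrinogen(fibrinogen_g_l: float, crp_mg_l: float) -> Optional[bool]:
    """None if excluded because CRP > 10 mg/L; otherwise fibrinogen > 4 g/L."""
    if crp_mg_l > CRP_INFECTION_MG_L:
        return None
    return fibrinogen_g_l > FIBRINOGEN_RAISED_G_L


def anaemic(haemoglobin_g_dl: float, sex: str) -> bool:
    """Haemoglobin below the sex-specific WHO threshold."""
    try:
        threshold = ANAEMIA_THRESHOLD_G_DL[sex]
    except KeyError:
        raise ValueError(f"unknown sex code: {sex!r}") from None
    return haemoglobin_g_dl < threshold


if __name__ == "__main__":
    print(anaemic(12.5, "male"), anaemic(12.5, "female"))  # True False
```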

Iron deficiency
In the years 1994, 1998, 2006, and 2009, blood samples were obtained during the nurse visit, which were then analysed for serum ferritin (which correlates directly with the amount of iron stored in the body). Iron deficiency is one of several possible causes of anaemia (alongside other nutritional deficiencies, genetic conditions such as sickle cell anaemia, infections, and blood loss). Iron deficiency is defined as a serum ferritin less than 45ng/ml 27 .

Mental health
Mental health LSI
Every year 1994-2011, people who report a longstanding illness (LSI) are then asked, 'what is the matter with you?'; up to 6 responses are then coded by the interviewer into a consistent coding frame based on the International Classification of Diseases. The mental health LSI measure is based on the group labelled 'Mental illness/anxiety/depression/nerves (nes)', which as of 2011 includes: Alcoholism, recovered not cured alcoholic; Angelman Syndrome; Anorexia nervosa; Anxiety, panic attacks; Asperger Syndrome; Autism/Autistic (BBG: changed from 'autistic child'); Bipolar Affective Disorder; Catalepsy; Concussion syndrome; Depression; Drug addict; Dyslexia; Hyperactive child; Nerves (nes); Nervous breakdown, neurasthenia, nervous trouble; Phobias; Schizophrenia, manic depressive; Senile dementia, forgetfulness, gets confused; Speech impediment, stammer; and Stress. It explicitly excludes Alzheimer's disease, degenerative brain disease. While the LSI coding frame generally stays consistent over this period, it is worth being aware of a minor wording change within 'mental health LSI': the condition labelled 'Autistic child' 1994-1997 was relabelled 'Autism/Autistic' in 1998.

Psychological distress (GHQ)
In the self-completion survey in most years (except 1996, 2007, 2011 and 2013), respondents were asked the following series of questions: -"Please read this carefully: We should like to know how your health has been in general over the past few weeks. Please answer ALL the questions by ticking the box below the answer which you think most applies to you. Have you recently...
-"…been able to concentrate on whatever you're doing?" RESPONSES: "Better than usual" | "Same as usual" | "Less than usual" | "Much less than usual"
-"…lost much sleep over worry?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…felt you were playing a useful part in things?" RESPONSES: "More so than usual" | "Same as usual" | "Less useful than usual" | "Much less useful"
-"…felt capable of making decisions about things?" RESPONSES: "More so than usual" | "Same as usual" | "Less so than usual" | "Much less capable"
-"…felt constantly under strain?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…felt you couldn't overcome your difficulties?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…been able to enjoy your normal day-to-day activities?" RESPONSES: "More so than usual" | "Same as usual" | "Less so than usual" | "Much less than usual"
-"…been able to face up to your problems?" RESPONSES: "More so than usual" | "Same as usual" | "Less able than usual" | "Much less able"
-"…been feeling unhappy and depressed?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…been losing confidence in yourself?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…been thinking of yourself as a worthless person?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…been feeling reasonably happy, all things considered?" RESPONSES: "More so than usual" | "Same as usual" | "Less so than usual" | "Much less happy"
These make up the 12-item General Health Questionnaire (GHQ-12 28 ), a well-validated, widely-used measure of probable mental ill-health. This is often termed general nonpsychotic psychiatric morbidity, but we here use the more easily understood term 'psychological distress' following Stochl et al 2016. 29 A total score has been created by first ensuring that all questions were coded from 1 (positive symptom) to 4 (negative symptom), and then counting the number of questions to which people answered with categories 3 or 4 (indicating a negative symptom). A binary measure (often called GHQ caseness) was created for people who had negative symptoms for 4 or more of the 12 questions.

Anxiety/depression
-"I am extremely anxious or depressed" [This is part of the widely-used EQ-5D health status indicator 8 . However, for the purposes of this paper we have separated the individual measures that make up the EQ-5D in order to compare these to similar indicators of morbidity within each domain]. Two outcome measures are based on this: whether people have any anxiety/depression (the 2nd and 3rd categories combined), and whether they have extreme anxiety/depression (3rd category only).
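The GHQ-12 scoring described above (count of items answered in category 3 or 4, caseness at 4 or more) can be sketched as follows. This assumes the items are already coded 1 (most positive) to 4 (most negative) in the order listed; rejecting incomplete questionnaires rather than imputing them is our own choice for illustration.

```python
from typing import Sequence

CASENESS_THRESHOLD = 4


def ghq12_score(items: Sequence[int]) -> int:
    """Count items answered in category 3 or 4 (items coded 1=positive ... 4=negative)."""
    if len(items) != 12:
        raise ValueError("GHQ-12 needs exactly 12 item responses")
    if any(i not in (1, 2, 3, 4) for i in items):
        raise ValueError("items must be coded 1-4")
    return sum(1 for i in items if i >= 3)


def ghq_case(items: Sequence[int]) -> bool:
    """GHQ caseness: negative symptoms on 4 or more of the 12 items."""
    return ghq12_score(items) >= CASENESS_THRESHOLD


if __name__ == "__main__":
    answers = [1, 2, 3, 4, 1, 2, 1, 2, 3, 3, 1, 1]
    print(ghq12_score(answers), ghq_case(answers))  # 4 True
```

Collapsing each item to 0/1 before summing means that "rather more than usual" and "much more than usual" count equally towards caseness.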

Hearing, seeing & communication limitations
These measures were not included in the main paper due to the short time frame over which we can examine trends, but are included in the Web Appendix as they relate to important domains of morbidity. They were included in the disability scale used in the 2001 HSE report 9 . Respondents in 1995, 2000 and 2001 were asked if any of the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year):
• "Cannot follow a TV programme at a volume others find acceptable (with hearing aid if normally worn)" ('hearing limitation')
• "Cannot see well enough to recognise a friend across a road (four yards away) (with glasses or contact lenses if normally worn)" ('seeing limitation')
• "Have problem communicating with other people -that is, have problem …

Changes over time in several other measures are only presented in Web Appendices 4 & 6, rather than the main paper. Details of these variables are included below:

General health
General health (bad / good)
Every year, respondents were asked, "How is your health in general? Would you say it was ... very good, good, fair, bad, or very bad?" Two outcome measures are based on this, following standard practice in the HSE reports: bad general health (which includes 'bad' or 'very bad' health) and good general health (which includes 'good' or 'very good' health).
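The two self-rated health outcomes are a simple collapse of the five-point item, with 'fair' counted in neither. A minimal sketch; the dictionary keys are hypothetical names.

```python
from typing import Dict

RESPONSES = ("very good", "good", "fair", "bad", "very bad")


def general_health_outcomes(response: str) -> Dict[str, bool]:
    """Collapse the five-point self-rated health item into the two binary outcomes."""
    r = response.strip().lower()
    if r not in RESPONSES:
        raise ValueError(f"unexpected response: {response!r}")
    return {
        "bad_general_health": r in ("bad", "very bad"),
        "good_general_health": r in ("good", "very good"),
    }


if __name__ == "__main__":
    print(general_health_outcomes("Fair"))
```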

Longstanding illness (LSI)
Every year 1994-2011, respondents were asked "Do you have any long-standing illness, disability or infirmity? By long-standing I mean anything that has troubled you over a period of time, or that is likely to affect you over a period of time?" (The response options were 'Yes' and 'No'). In 2012 the question was changed to be consistent with the Government's new harmonised disability questions for use in social surveys 30 , and is not comparable to the previous version.

Limiting LSI
Every year 1996-2011, respondents who said they had an LSI were then asked, "Does this illness or disability (do any of these illnesses or disabilities) limit your activities in any way?" (again allowing only Yes/No answers) …

-"I am unable to perform my usual activities" [This is part of the widely-used EQ-5D health status indicator 8 . However, for the purposes of this paper we have separated the individual measures that make up the EQ-5D in order to compare these to similar indicators of morbidity within each domain]. Two outcome measures are based on this: whether people have any problems (the 2nd and 3rd categories combined), and whether they are unable to perform their usual activities (3rd category only).
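Every EQ-5D item used in this appendix (self-care, pain, anxiety/depression, usual activities) yields the same pair of outcomes from a three-level response. A minimal sketch, assuming levels are coded 1 (no problems), 2 (some problems) and 3 (extreme problems/unable):

```python
from typing import Tuple


def eq5d_outcomes(level: int) -> Tuple[bool, bool]:
    """Return (any_problem, extreme_problem) for one three-level EQ-5D item.

    any_problem     = 2nd and 3rd categories combined
    extreme_problem = 3rd category only
    """
    if level not in (1, 2, 3):
        raise ValueError("EQ-5D three-level items are coded 1, 2 or 3")
    return level >= 2, level == 3


if __name__ == "__main__":
    for lvl in (1, 2, 3):
        print(lvl, eq5d_outcomes(lvl))
```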

Limitations in past 2wks
Every year, respondents were asked, "Now I'd like you to think about the two weeks ending yesterday. During those 2 weeks did you have to cut down on any of the things you usually do (about the house or at work or in your free time) because of your answer at <the LSI question> or some other illness or injury?" There were two small changes to this question's wording in 1996. First, 'work' was changed to 'work/school'. Second, 'your answer at <the LSI question>' was changed to 'a condition you have just told me about'. While it is impossible to be sure of the exact effect of these changes, neither seems likely to influence the results (at least for the 25+ age group, where fewer individuals are in full-time education).

Circulatory
Beyond 'recent': 'Ever had' and 'DD' CVD In the main paper, we look at whether people report recent doctor-diagnosed CVD (looking separately at heart attack/stroke, angina, and any recent CVD). As shown above, this comes from three questions: whether people report ever having this condition; whether a doctor diagnosed this; and whether they have had an attack in the past 12 months / consider themselves to still have the condition. Web Appendix 6 shows trends in the other versions of these measures, i.e. having ever had this type of CVD, and having ever doctor-diagnosed ('DD') CVD of this type.

Component measures: Heart murmur; Irregular heart rhythm; Other heart disease
In the main paper, we look at recent reports of doctor-diagnosed angina; heart attack (including myocardial infarction or coronary thrombosis); a heart murmur; abnormal heart rhythm; or other heart trouble (see above). Angina and heart attack are also analysed in the main paper in their own right; in Web Appendix 6, we further show trends separately in heart murmur, abnormal heart rhythm and other heart trouble.

Respiratory
Component measure: 'phlegm'
In the main paper, we look at whether people report recent COPD symptoms (see above). This combines two measures: regular cough and phlegm. Web Appendix 6 shows the trend in the phlegm measure on its own, without being combined with a regular cough.

Alternative version: 'LSI respiratory'
In the main paper, we look at whether people report an asthma LSI (to examine alongside a direct question on diagnosed asthma); see above. Web Appendix 6 also shows people reporting a longstanding illness ('LSI') which is included within the broader category of respiratory conditions. The respiratory LSI measure is based on the groups labelled 'Asthma', 'Bronchitis', 'Hayfever', or 'Respiratory other', which as of 2011 include:
• Asthma: Asthma; Bronchial asthma, allergic asthma; and Asthma -allergy to house dust/grass/cat fur. It explicitly excludes cardiac asthma.
• …
• It explicitly excludes TB (pulmonary tuberculosis), Cystic fibrosis, Skin allergy, Food allergy, Allergy (nes), Pilonidal sinus, Sick sinus syndrome, Whooping cough.
• If the complaint is breathlessness with the cause also stated, this is coded with the cause; hence it also excludes breathlessness as a result of anaemia, breathlessness due to hole in heart, and breathlessness due to angina.

Component measure: Wheezing
In the main paper, we look at whether people report recent wheezing/asthma. As shown above, this comes from three questions: whether people report ever having had wheezing or whistling in the chest; whether they have had this in the past 12 months; and whether they have had an asthma attack in the past 12 months. Web Appendix 6 shows trends in the other versions of these measures, i.e. having ever had wheezing/whistling in the chest, and whether they have had this in the past 12 months.

Beyond 'recent': 'Ever had' and 'DD' diabetes
In the main paper, we look at whether people report recent doctor-diagnosed diabetes. As shown above, this comes from three questions: whether people report ever having this condition; whether a doctor diagnosed this; and whether they currently inject insulin / take other medication for diabetes. Web Appendix 6 shows trends in the other versions of these measures, i.e. having ever had diabetes, and having ever doctor-diagnosed ('DD') diabetes.

For comparison: Walking limitation
This is based on the disability scale used in the 2001 HSE report 9 . Respondents in 1995, 2000 and 2001 were asked if any of the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year): "Cannot walk 200 yards or more on own without stopping or discomfort". People who reported a limitation were asked if they used a walking aid, and if they did, were then asked if they could walk 200 yards without the walking aid.

For comparison: Washing & dressing limitation
This is based on the personal care disability scale used in the 2001 HSE report 9 . Respondents in 1995, 2000 and 2001 were asked if any of the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year):
-"Cannot dress and undress without difficulty"
-"Cannot wash hands and face without difficulty"
For comparison with the 'problems with washing/dressing today' measure in the main paper (which covers a longer period and is based on a different question; see above), respondents are classified as having a washing & dressing limitation if they report either of these problems.

Other LSIs
Every year 1994-2011, people who report a longstanding illness (LSI) are then asked, 'what is the matter with you?'; up to 6 responses are then coded by the interviewer into a consistent coding frame based on the International Classification of Diseases. The various other LSIs are as follows:
• The Blood Disorders LSI measure is based on the group 'Disorders of blood and blood forming organs and immunity disorders', which as of 2011 includes: Anaemia, pernicious anaemia, Blood condition (nes), blood deficiency, Haemophilia, Idiopathic Thrombocytopenic Purpura (ITP), Immunodeficiencies, Polycythaemia (blood thickening), blood too thick, Purpura (nes), Removal of spleen, Sarcoidosis (previously code 37), Sickle cell anaemia/disease, Thalassaemia, Thrombocythenia. It explicitly excludes Leukaemia (code 01).

Appendix 8: Others' analyses of change over time using HSE data
Changes over time in some of these indicators have not previously been analysed (e.g. waist-hip ratio, fibrinogen). However, others have been studied but never integrated into a single picture of changing morbidity; we review these in this section. (For reasons of space these are included here rather than in the main text).

Cardiovascular morbidity
Trends over 1998-2011 in the two biomarkers for total and HDL cholesterol using HSE data are shown by Oyebode, 11 who finds results similar to ours.

Respiratory morbidity
A subset of the HSE respiratory indicators (ever/past-year wheezing, doctor-diagnosed asthma) was analysed by Hall and Mindell 31 for 2001-2010, who found changes over time similar to those in our analysis. They found stability in some measures (ever wheezing) but improvements in others (past-year wheezing), at the same time as the reported prevalence of doctor-diagnosed asthma increased.

Obesity & diabetes
While the English trends in waist-hip ratio have not previously been analysed, earlier Scottish trends are given in Hotchkiss et al 2012. 19 Trends in diabetes have been covered in several HSE reports, e.g. Moody 2012, 20 as has BMI (see particularly the paper by Sperrin et al 2014, 32 who also created a publicly-available time-series HSE dataset for this purpose).

Activity limitations, pain & musculoskeletal morbidity
While musculoskeletal LSIs have not previously been analysed in HSE, a decline can also be seen in the General Household Survey. 33

Mental health
In the UK and most other high-income countries, benefit claims due to mental ill-health have been rising, 34 alongside considerable increases in mental health diagnosis and treatment. 35 The extent to which this reflects rising mental ill-health and genuinely declining work capacity, however, has long been debated. 36 37 Perhaps the most robust long-term general population data series in the UK is the Adult Psychiatric Morbidity Survey. 35 38 While some studies have used HSE to show rises in mental ill-health, others have used the same data to reach the opposite conclusion. 39 40 These contrasting conclusions are explained by the tables in Web Appendix 7, which show year-by-year changes: moderate mental ill-health fell between the mid-1990s and the mid-2000s, before rising in 2009, with a particularly high prevalence in 2011. The conclusions of studies will therefore depend on the years they use as the start and end points of their trend analysis. 3 It is also worth noting that our finding of considerable increases in mental health LSIs is mirrored by a similar measure in the Labour Force Survey. 41

It has been suggested that multimorbidity has risen among older people in England 44 and for all age groups in Ontario, 45 although others have cautioned against using simple disease counts, 46 and the evidence cited in the introduction of the main paper suggests that rising chronic disease reporting may partly reflect increasing awareness of disease (rather than increasing underlying prevalence).

Appendix 9: Summarising multiple measures
Having reviewed trends in 39 morbidity measures, we have seen that morbidity in the English working-age population has improved in some respects and deteriorated in others. For those who view work-related morbidity as intrinsically multidimensional, 47 this is the endpoint of our analysis. However, for those who conceive of morbidity as unidimensional (or who are interested in morbidity as it relates to a unidimensional work capacity), this raises the question of how to weight different dimensions of morbidity to decide whether overall morbidity has improved or deteriorated.

Methods for creating unidimensional morbidity scales
Several methods have been proposed for creating unidimensional morbidity scales, but most of these cannot be applied to the HSE data: • Weights can be based on empirically-derived preferences for different health states, of which the most famous example is the WHO Global Burden of Disease (GBD) study 48 . Some GBD estimates for trends in disability in the UK do exist, and suggest that the prevalence of disability in the working-age population was unchanged over 1990-2010, though these results are only presented in passing. 4 For our analyses, however, we have no preference-based weights for most of the HSE measures (except for the subset of measures that make up the EQ-5D scale).
• Those reporting limitations beyond a certain severity in any domain can be categorised as 'disabled', as recommended by the Washington Group on Disability Statistics (see above). However, as previously discussed, we have few functional limitations measures available in HSE.
• Latent morbidity scales can be created based on the inter-correlations between different measures (using item response theory), as used in the World Report on Disability 51 and by researchers associated with the US National Bureau of Economic Research (e.g. 52 ). However, it is unclear why we would wish to weight items in this way: a given morbidity indicator may be severe, yet if it is unrelated to other morbidity measures it will be given a low weight.
• Latent morbidity scales can also be created based on the independent correlation between each indicator and a general measure of morbidity, such as general self-reported health or 53 as in 54 . This maintains some of the advantages of single-item measures (in providing a basis for making morbidity unidimensional), while avoiding the potential threats to validity discussed above. However, the inconsistent inclusion of measures in each HSE wave prevents a unidimensional morbidity scale being constructed here.

An alternative way of summarising heterogeneous trends
Nevertheless, we can examine whether the areas in which morbidity has been improving or deteriorating are those that are particularly important for general health. 53 (This uses the same intuition as the scales in Diederichs et al 2012. 54 ) To see how important measures are for general health, we regress 'bad' general health (see Appendix 5 for detail on the underlying question) on age, sex (and their interaction), educational level and each individual morbidity measure in turn, using all years for which that morbidity measure is available. That is, for each morbidity indicator we fit the following logit model for respondent $i$:

$$\operatorname{logit}\Pr(\text{badhealth}_i = 1) = \alpha + \beta\,\text{morbidity}_i + \boldsymbol{\gamma}'\mathbf{age}_i + \delta\,\text{male}_i + \boldsymbol{\theta}'(\mathbf{age}_i \times \text{male}_i) + \boldsymbol{\lambda}'\mathbf{educ}_i$$

where $\beta$ is our primary outcome coefficient showing the importance of that morbidity indicator for bad health, $\mathbf{age}_i$ refers to a vector of age dummy variables, $\text{male}_i$ refers to a binary gender dummy variable, $\mathbf{educ}_i$ refers to a vector of education dummy variables (with four levels: degree/full-time student, A-levels/NVQ3/higher education below degree, other qualifications, or no qualifications), $\alpha$ is a constant, and $\boldsymbol{\gamma}$, $\delta$, $\boldsymbol{\theta}$ and $\boldsymbol{\lambda}$ refer to the coefficients on age, gender, their interaction and education respectively.
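To make the model concrete, the sketch below fits this kind of logit on simulated data (not HSE) and converts the morbidity coefficient into an average marginal effect: the sample mean of the difference in predicted probability of bad health with the indicator set to 1 versus 0. This is the type of quantity behind statements such as 'X% more likely to report bad health'. It is a minimal illustration, not the authors' Stata code: it is unweighted and ignores the survey design, and the four age bands, the coefficient values and the helper names `fit_logit` and `ame_binary` are all hypothetical.

```python
import numpy as np


def fit_logit(X, y, n_iter=100, tol=1e-10):
    """Unweighted logistic regression fitted by Newton-Raphson.

    X must already contain a constant column; returns the coefficient vector.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # fitted probabilities
        grad = X.T @ (y - p)                          # score vector
        hess = (X * (p * (1.0 - p))[:, None]).T @ X   # information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def ame_binary(X, beta, col):
    """Average marginal effect of the 0/1 regressor in column `col`:
    the sample mean of P(bad | col=1) - P(bad | col=0)."""
    X1, X0 = X.copy(), X.copy()
    X1[:, col], X0[:, col] = 1.0, 0.0
    p1 = 1.0 / (1.0 + np.exp(-X1 @ beta))
    p0 = 1.0 / (1.0 + np.exp(-X0 @ beta))
    return float(np.mean(p1 - p0))


# Simulated (not HSE) data: 4 hypothetical age bands, sex, 4 education levels.
rng = np.random.default_rng(42)
n = 20_000
morbidity = rng.binomial(1, 0.2, n)
male = rng.binomial(1, 0.5, n)
age_d = np.eye(4)[rng.integers(0, 4, n)][:, 1:]   # drop the reference age band
educ_d = np.eye(4)[rng.integers(0, 4, n)][:, 1:]  # drop the reference education level
# Columns: constant, morbidity, age (3), male, age x male (3), education (3).
X = np.column_stack([np.ones(n), morbidity, age_d, male,
                     age_d * male[:, None], educ_d])
true_beta = np.array([-3.0, 1.5, 0.2, 0.4, 0.6, -0.1,
                      0.05, 0.1, 0.1, 0.3, 0.5, 0.8])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta_hat = fit_logit(X, y)
ame = ame_binary(X, beta_hat, col=1)
print(f"beta (morbidity) = {beta_hat[1]:.2f}; AME = {100 * ame:.1f} percentage points")
```

Repeating the same two steps for each morbidity indicator in turn, and then sorting the indicators by their AME, gives the kind of ranking used to compare importance for general health with the direction of each trend.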
We adjust for education as well as age and sex so that we can examine the importance of the measure for bad health after taking account of whether general health and the measure are both strongly related to social status. Note, however, that it is not possible to control for all morbidity measures simultaneously (as discussed just above), so this is a rough indicator of the importance of that morbidity measure for general health, rather than a reliable indicator of its causal impact net of comorbidities.
The results of this analysis are shown overleaf, ordered by the effect on bad health. (We also repeat the trend in each measure for convenience; this is discussed following the table.) Having estimated this, we can see whether the areas in which morbidity has been improving or deteriorating are those that are particularly important for general health. This is shown visually in Figure 1 below (the measures are not labelled, so that the overall pattern can be seen, but the top-to-bottom order of measures is the same in the figure as in the preceding table; i.e. the measure at the top of the figure is 'Pain-extreme'). The figure is easiest to interpret by focussing on each group of measures in turn. Firstly, the biomarkers tend to have the weakest relationship with general health. Those with high levels of the diabetes biomarker (glycated haemoglobin) are 9.7% more likely to say they have bad health, and those who are underweight, or who have a high waist-hip ratio, raised fibrinogen or low HDL cholesterol, are 4-6% more likely to report bad health, but the other biomarkers had weaker relationships. Indeed, there was effectively no relationship between bad reported health and any of measured high blood pressure, high total cholesterol or iron deficiency.
Secondly, most of the measures based on medical labels have a moderately strong relationship with bad health (the weakest being lifetime asthma and recent high blood pressure, both of which can be asymptomatic), and these measures have mostly risen over time. There are, however, notable exceptions to this, including IHD/stroke LSI, recent angina and recent heart attack/stroke (the label-based measures with some of the strongest relationships with bad reported health), as well as arthritis and other musculoskeletal LSIs.
Finally, symptom-based measures unsurprisingly tend to have stronger relationships with bad reported health, although this ranges from the moderate (those reporting 'recent wheezing/asthma attack' were 8.5% more likely to report bad health) to the very strong (those reporting 'extreme pain today' were 46.4% more likely to report bad health). In general, the symptom-based measures with the strongest relationship with bad reported health were more likely to have increased over time ('extreme anxiety/depression today', 'locomotor limitations' and 'self-care limitations'). However, the aforementioned declines in symptom-based measures of respiratory and cardiovascular morbidity were often larger.


Reporting Item
Page Number

Title and abstract
Title #1a Indicate the study's design with a commonly used term in the title or the abstract 1, 3

Competing interests
The author has worked on secondment at the UK Department for Work and Pensions (DWP) in 2015-16.

Data sharing
The Health Survey for England data for 1994-2014 are available free of charge to registered users at the UK Data Service; see https://beta.ukdataservice.ac.uk/datacatalogue/series/series?id=2000021#!/abstract.
There are no conditions for re-use for non-commercial applications of the data.

Results: We find a mixed picture: we see improving cardiovascular and respiratory health, but deteriorations in obesity, diabetes, some biomarkers, and feelings of extreme anxiety/depression, alongside stability in moderate mental ill-health and musculoskeletal-related health. In several domains we also see stable or rising chronic disease diagnoses even where symptomatology has declined. While data limitations make it challenging to combine these measures into a single morbidity index, there is little systematic trend for declining morbidity to be seen in the measures that predict self-reported health most strongly.
Conclusions: Despite considerable falls in working-age mortality (and the assumptions of many policymakers that morbidity will follow mortality), there is no systematic improvement in overall working-age morbidity in England from 1994 to 2014.

Strengths and limitations of this study
• We provide a robust analysis of changes over time in morbidity in England for 39 measures across two decades using the Health Survey for England ('HSE').
• We include every morbidity measure for which consistent comparisons over time can be constructed in the HSE.
• We take care to maximise comparability over time, including constructing new non-response weights.

9 10 The lack of evidence is even more problematic within social security, where many policymakers have assumed that working-age morbidity must have improved in recent decades given improvements in mortality (despite the potential for declining mortality to coexist with rising morbidity), 6 and that high or rising levels of claims are therefore not 'genuine'. 11 12 Almost the only direct evidence on changes over time in working-age morbidity in high-income countries comes from the US. Contrary to policymaker expectations, these studies have generally found deteriorating morbidity since the mid-1990s, particularly in activities of daily living (ADLs) and physical functioning. [13][14][15][16] Other studies have focused on the older working-age population, with similar results. 2 17 Again, not all measures show deteriorations, and not all studies come to identical conclusions, 18 but there is little sign of any improvement in morbidity among working-age Americans, despite a 23% fall in working-age mortality 1993-2013 (Web Appendix 1). Outside the US there is a paucity of evidence, but the limited evidence that exists again shows little sign of improving morbidity. [19][20][21][22] This study therefore asks: is there empirical support for the hypothesis that working-age morbidity in England has declined (H1)? Or does the evidence support the alternative hypotheses of stable (H2) or even rising (H3) morbidity? We answer this using the Health Survey for England (HSE), a high-quality Government survey with a combined sample of 140,000 individuals. We examine 39 specific aspects of morbidity rather than reducing morbidity to a single measure, partly because these produce more reliable trends, and partly to capture the multidimensional nature of morbidity. 23 However, we conclude by examining the broad picture of morbidity change, and how far this supports the competing hypotheses.
This analysis makes two contributions. Firstly, we provide one of the few systematic analyses of changes over time in working-age morbidity in any high-income country outside the US. Secondly, we supplement self-report measures with 10 'biomarkers', which are particularly valuable for showing genuine changes over time (rather than merely changes in how people describe their health), but which have rarely been examined alongside self-reported working-age morbidity trends (Martin et al. 2010 24 being an exception).

DATA AND METHODS
This section follows the STROBE cross sectional reporting guidelines. 25

Data source
Robust evidence of change over time requires consistently-collected, high-quality data. We use the HSE, an annual government-sponsored cross-sectional survey of 3,000-11,000 adults with no proxy responses. A particular advantage is that the interview is followed by a nurse visit, which in selected years also includes a blood sample. Nevertheless, there are challenges in analysing change in HSE:
• Third, HSE excludes those in communal establishments. While this is a smaller problem for the working-age population than for older ages, 2 we minimise the impact of rising university attendance by focussing on those aged 25+ (Web Appendix 3). The upper limit of the working-age population is set to 59 (women) and 64 (men) to match state pension ages at the start of the period.

Patient involvement
As this is a health monitoring study using secondary data, patients were not directly involved. However, from previous discussions we are aware that the study will be of interest to patient/disability advocacy groups, who will receive jargon-free summaries of the research.

Measures
We cannot interpret changes over time correctly without understanding the different ways of operationalising 'morbidity'. 1 General health/disability measures (e.g. "How is your health in general?") are a simple way of measuring morbidity with a single indicator, and clearly do capture something meaningful. 50

Others have reached a similar conclusion for comparisons across place, 55 particularly for disability measurement, 59 60 where the Washington Group on Disability Statistics (a city group set up under the UN Statistical Commission in 2001) has brokered a consensus that cross-country disability comparisons should be based on multiple measures of specific activity limitations. 61 62 We should nevertheless note that there is no guarantee that a given symptom/impairment-based question will be interpreted identically over time. 63 64

3. Biomarkers, that is, objective measurements of biological or physiological characteristics, have considerable strengths in analysing change, as they largely avoid reporting biases that are likely to vary between socioeconomic groups and over time. 65

ANALYSIS
In the first instance we look at unadjusted changes over time in each morbidity indicator, showing the actual levels of morbidity found in the population. However, we primarily focus on changes after adjustment for sex and age (following others 66 67 ), akin to standardising for the age-sex composition of the population. Given that our aim is to describe changes rather than to explain them, we do not further adjust for potential causal influences on morbidity that are likely to vary over the period, such as employment over economic cycles. This is a task for future research, but we should note that such analysis is possible using our publicly-available time-series dataset that includes inter alia employment status, education and region.
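The age-sex adjustment is described above as akin to standardising for the population's age-sex composition. The sketch below shows classical direct standardisation with hypothetical strata and prevalences (the paper itself uses regression adjustment with average marginal effects, not this calculation). It illustrates how crude and adjusted changes can point in opposite directions: every stratum's prevalence falls by one point, yet because the population ages the crude prevalence rises.

```python
def weighted_prevalence(rates, population):
    """Prevalence implied by stratum-specific rates and a population's stratum sizes."""
    total = sum(population.values())
    return sum(rates[s] * population[s] for s in population) / total


# Hypothetical age-sex strata; each stratum's prevalence falls by 0.01.
rates_start = {"F25-44": 0.05, "F45-59": 0.12, "M25-44": 0.04, "M45-64": 0.15}
rates_end = {"F25-44": 0.04, "F45-59": 0.11, "M25-44": 0.03, "M45-64": 0.14}
pop_start = {"F25-44": 40, "F45-59": 20, "M25-44": 40, "M45-64": 20}
pop_end = {"F25-44": 30, "F45-59": 30, "M25-44": 30, "M45-64": 30}  # older population

# Crude change: each year weighted by its own age-sex structure.
crude_change = (weighted_prevalence(rates_end, pop_end)
                - weighted_prevalence(rates_start, pop_start))
# Directly standardised change: both years weighted by the start-year structure.
std_change = (weighted_prevalence(rates_end, pop_start)
              - weighted_prevalence(rates_start, pop_start))

print(f"crude: {100 * crude_change:+.1f} points; standardised: {100 * std_change:+.1f} points")
```

The choice of standard population is arbitrary; using the end-year structure instead would give the same one-point fall here, because the stratum-specific change is identical in every stratum.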
We chose to examine discrete changes from the start to the end of available data for each measure, rather than using linear or non-linear trend terms. refer to the coefficients on age, gender and their interaction respectively. We present average marginal effects rather than odds ratios, partly because these are simple to understand (odds ratios have no easy real-world interpretation for policymakers), but primarily because odds ratios are not fully comparable across different models, and therefore cannot underpin our comparison of changes over time between indicators. 68 To avoid a binary cut-off of statistical significance, 69 we use 95% confidence intervals to convey precision. All analyses use weights, exclude boost samples that use different sampling methods, and adjust for the multistage clustered sample design and the stratification of the sample across survey years using the svyset command in Stata (although standard errors will be slightly underestimated, as it is not possible to consistently adjust for sample stratification within years). For reasons of space, we do not discuss previous HSE studies of aspects of morbidity in the main text; these are instead described in Web Appendix 8.
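One way to see why odds ratios are hard to interpret (and to compare across indicators) is that the same odds ratio implies very different absolute changes depending on baseline prevalence. The short sketch below, with a hypothetical odds ratio of 2, illustrates this; it does not reproduce the separate, more technical reason given above (that odds ratios are not fully comparable across different models).

```python
def risk_with_odds_ratio(p0, odds_ratio):
    """Risk implied by a baseline risk p0 multiplied on the odds scale by odds_ratio."""
    odds1 = odds_ratio * p0 / (1.0 - p0)
    return odds1 / (1.0 + odds1)


# The same (hypothetical) odds ratio of 2 at two different baseline prevalences.
for p0 in (0.05, 0.40):
    p1 = risk_with_odds_ratio(p0, 2.0)
    print(f"baseline {p0:.0%}: risk {p1:.1%}, a change of {100 * (p1 - p0):.1f} percentage points")
```

This prints changes of 4.5 and 17.1 percentage points respectively, which is why percentage-point marginal effects are easier to compare between indicators with different prevalences.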

Conditions with sharply declining mortality
We start by focussing on cardiovascular disease (CVD) and respiratory illness, which have both seen large falls in mortality (by >50% and >25% respectively among 0-64 year-olds 1994-2013; Web Appendix 1). Changes over time in morbidity, however, are shown in Table 2. Looking first at high blood pressure, biomarker-measured high blood pressure has halved over two decades (similar improvements are found for the biomarkers for total and HDL cholesterol). Yet when we look at self-reports (either people reporting this as an LSI, or in response to a direct question about having recent diagnosed high blood pressure), we see large rises over time. There has been an increasing diagnosis of high blood pressure and increasing prescriptions of blood pressure-lowering drugs; these may have helped reduce the underlying incidence of high blood pressure while simultaneously raising people's awareness of morbidity. Table 2 further shows declines in several key types of CVD (heart attack, mini-stroke, angina), whether measured through people's reports of the disease itself or their reports of its symptoms. Nevertheless, the morbidity declines (8-50%) are often not on the scale of the declines in mortality (>50%); this is likely to be because mortality declines are partly driven by improved treatment, 70 which means each incident CVD case is likely to last longer. 71 72 More surprisingly, the measures of 'any reported CVD' show no improvement (with some, uncertain signs of rises). Looking at its subcomponents (Web Appendix 6), this seems to be due to possible increases in diagnosed irregular heart rhythm and other heart trouble.
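The argument that improved treatment can make morbidity fall less than mortality rests on the standard steady-state approximation that prevalence is roughly incidence multiplied by mean duration (reasonable for a stable population and a relatively uncommon condition). The numbers below are hypothetical, not HSE estimates: if incidence halves but better survival lengthens average duration by 40%, prevalence falls by only 30%.

```python
def relative_prevalence_change(incidence_change, duration_change):
    """Relative change in prevalence under the steady-state approximation
    prevalence ~ incidence x mean duration (changes given as proportions)."""
    return (1.0 + incidence_change) * (1.0 + duration_change) - 1.0


# Hypothetical: incidence halves, but average duration rises by 40%.
print(f"{relative_prevalence_change(-0.50, 0.40):+.0%}")  # prints -30%
```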

Conditions with claims of increasing prevalence
The previous section focussed on conditions where there may be an a priori expectation that morbidity has improved (given declining mortality); in this section, we focus on three areas where there have been widespread claims of increasing prevalence -obesity, diabetes, and mental health.

Activity limitations, musculoskeletal and pain
Pain/musculoskeletal conditions are a major component of working-age morbidity, yet very few previous studies show changes over time in symptomatology, and even those that exist 78 sometimes have debatable comparability. 79 Table 4 shows a fall in some -but not all -HSE measures focussed on pain and musculoskeletal morbidity.
Arthritis as a longstanding illness (LSI) has declined (the estimates are more precise when looking at 2008-10 rather than 2011-14, and show a decline of 0.3-1.2%). There are some (similarly uncertain) signs that other musculoskeletal LSIs have also fallen, and noticeably fewer people say that they have any pain/discomfort today, although there has been no change in people saying they have extreme pain/discomfort. This echoes a previous study that found different trends in low back pain of different levels of severity. 80 Similarly, there has been a rise in all four activity limitation measures in HSE, although the increases are sometimes uncertain, and are smaller after adjusting for changes in age/sex structure. Moreover, the timing of the rises differs between the

Appendix 2: Overall missingness in health measures
This appendix refers to overall item-level missingness; changing item- and unit-level missingness is covered in Appendix 3.
- Refusal: the most common reason for no BMI measurement is outright refusal (including those refusing out of anxiety, though this tends to be a minor reason). The refusal rate was 8.3% in 2014.

Self-completion measures
For those who completed the self-completion booklet, the level of item missingness is shown in the table below. Item missingness is relatively low compared with missingness from not completing the self-completion survey (51.5% of respondents in 2014).

Nurse visit measures
For those who took part in the nurse visit, the level of item missingness is shown in the table below. This shows that far more people have missing observations for measured high blood pressure than for their waist-hip ratio. This is despite the fact that we explicitly INCLUDE those who are on blood pressure-lowering drugs (about 5% of the sample at the start of the period and 10% at the end), on the grounds that their lowered blood pressure still conveys useful information about their health state. The main reason for the remaining high level of missingness is that people had recently exercised, smoked, drunk or eaten (12.2%).

Blood sample measures
For those from whom a blood sample was taken, the level of item missingness is shown in the table below. All of these measures are affected by problems in transferring and storing the blood sample and with the measurement process, which result in problems with 3-10% of the blood samples depending on the measure and year. As for blood pressure, we explicitly INCLUDE those who are on lipid-lowering drugs (0.4% in 1994 to 7.9% in 2014), on the grounds that their changed cholesterol level still conveys useful information about their health state. Item missingness is highest for fibrinogen, which not only has high rates of such failures (7.0-9.5%) but also has ineligibility due to likely infection (raised CRP, 3.6-5.6% of those with blood samples) and due to taking drugs that affect the reading (3.7% to 7.7% depending on the year). Item missingness is also high for C-reactive protein (CRP), which also excludes those with likely infections.

Dealing with item-level missingness
Because of the high level of item non-response for certain measures (BMI, high blood pressure, fibrinogen and CRP) and the moderate level for others (other blood sample biomarkers and waist-hip ratio), and because of evidence of changing non-response at various stages of the survey process, non-response weights were created to try to correct for any biases that these introduce. This is described in further detail in Appendix 3.

Appendix 3: Changing non-response & weights
This appendix focuses on changes in unit-level non-response at different stages of HSE.
2. In 2010, this was then relaxed so that those who had had an epileptic fit more than 5 years ago were again included in the blood sample. This lowered the ineligibility rate from 3.1% in 2009 to 2.4% in 2010.

Creating non-response weights
To increase comparability over time, we create new weights 1994-2014 in several phases.

First-stage non-response weights
Firstly, we created a selection weight because some households were slightly more likely to be interviewed than others. (Until 2009, only three households at each address were interviewed. Those living at addresses with many households are therefore less likely to be interviewed). NatCen supplied selection weights for 2004-2013 to enable this (funded by this project), which are not available on the public HSE datasets.
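Under the rule described above, a household at an address with more than three households has a reduced chance of being interviewed, so it receives a larger selection weight. The sketch below assumes (for illustration only) that up to three households are chosen at random per address and that all households are interviewed where there are three or fewer; the NatCen selection weights may be computed or trimmed differently, and `selection_weight` is a hypothetical helper.

```python
def selection_weight(n_households, max_interviewed=3):
    """Selection weight for one household at an address with n_households households,
    assuming up to max_interviewed households are chosen at random."""
    if n_households < 1:
        raise ValueError("an address must contain at least one household")
    # Selection probability is min(1, max_interviewed / n_households); the weight is its inverse.
    return max(1.0, n_households / max_interviewed)


for h in (1, 3, 6):
    print(h, selection_weight(h))
```

A household at an address containing six households therefore counts twice as much as one at a single-household address.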
Secondly, after adjusting for the selection weight, we created new individual-level (inverse probability) weights to match population age-sex-region totals in each year. Population data are annual mid-year population estimates from Nomis. NatCen added the region variable for the 1994-1997 datasets to the public HSE datasets to enable this.
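One common way to make weighted sample counts match population age-sex-region totals is post-stratification (cell weighting): within each cell, every respondent's weight is scaled by the ratio of the population total to the weighted sample total. The sketch below assumes this simple cell-weighting approach for illustration; the exact procedure used for the HSE weights may differ (e.g. raking to margins), and the cell labels and totals are hypothetical.

```python
from collections import defaultdict


def poststratify(cells, weights, pop_totals):
    """Scale weights so that weighted counts match pop_totals within each cell.

    Every cell in `cells` must appear in pop_totals and contain at least one respondent.
    """
    weighted = defaultdict(float)
    for cell, w in zip(cells, weights):
        weighted[cell] += w
    return [w * pop_totals[cell] / weighted[cell] for cell, w in zip(cells, weights)]


# Hypothetical cells (sex, age band, region) and selection-weighted respondents.
A, B = ("F", "25-44", "North"), ("M", "45-64", "London")
cells = [A, A, B, B, B]
selection_weights = [1.0, 2.0, 1.0, 1.0, 2.0]
new_weights = poststratify(cells, selection_weights, {A: 300, B: 800})
print(new_weights)
```

Relative weights within a cell (here from the selection weight) are preserved; only the cell totals are rescaled.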

Second-stage non-response weights
After the first-stage adjustment for individual non-response, for the later stages of the interview (self-completion, BMI measurement, nurse visit, blood sample), we created a further weight that adjusts for non-response among those responding to the individual interview. This is based on a logit regression model to predict that stage of response based on: • Age and gender (4 age group categories interacted with gender); • Qualifications (degree or FT student / A-level or above / other qualifications / no qualifications); • Household type (presence of other adults in the household); • Employment status (yes/no); • Smoking (never regular smoker / ex-regular smoker / current regular smoker); and • Self-reported general health (bad or very bad health vs. other categories).
On the basis of these criteria, we create inverse probability weights: that is, we create a predicted probability of response for each respondent based on the logit regression model, and then create a weight that is the inverse of this predicted probability. The revised weights are included in the Stata code to enable replication of the full paper.
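The second-stage procedure can be sketched as follows on simulated data (not HSE): fit a logit for responding to a later stage among those interviewed, then give each responder a weight equal to their existing weight divided by the predicted response probability. Two covariates stand in for the full list above, the first-stage weights are set to 1 as a placeholder, and combining the two stages multiplicatively is an assumption of this sketch rather than a documented detail of the HSE weights.

```python
import numpy as np


def fit_logit(X, y, n_iter=100, tol=1e-10):
    """Logistic regression by Newton-Raphson; X must include a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        step = np.linalg.solve((X * (p * (1 - p))[:, None]).T @ X, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


rng = np.random.default_rng(1)
n = 10_000                                    # people completing the individual interview
bad_health = rng.binomial(1, 0.1, n)
employed = rng.binomial(1, 0.7, n)
X = np.column_stack([np.ones(n), bad_health, employed])
# Simulated: people in bad health are less likely to take part in the later stage.
p_true = 1 / (1 + np.exp(-(0.5 - 0.8 * bad_health + 0.4 * employed)))
responded = rng.binomial(1, p_true)

w_stage1 = np.ones(n)                         # placeholder for first-stage weights
p_hat = 1 / (1 + np.exp(-X @ fit_logit(X, responded)))
w_stage2 = np.where(responded == 1, w_stage1 / p_hat, 0.0)

full_share = bad_health.mean()
naive_share = bad_health[responded == 1].mean()
weighted_share = np.sum(w_stage2 * bad_health) / np.sum(w_stage2)
print(f"all interviewed {full_share:.3f}; responders unweighted {naive_share:.3f}; "
      f"responders weighted {weighted_share:.3f}")
```

In this simulation, unweighted responders under-represent people in bad health, and the inverse probability weights largely restore the share seen among everyone interviewed.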

Final sample size
The final sample size is as follows:

Trends for these measures are shown in Table 9 below. Looking first at good general health, the table shows the trend from 1994-6, when 80.9% reported good general health. By 2011-14, there had been a decline of 0.8 percentage points. When we adjust for the changing age and sex distribution of the working-age population (labelled 'Adj.' in Table 1), the decline is only 0.1 percentage points, with a wide confidence interval (-0.9 to +0.7 points), and there is therefore little evidence for any systematic trend. For several of the general health measures there is evidence of change over this period, but interpreting this is difficult, because the trends are in opposite directions. There is strong evidence for a rise in bad general health (a rise of 0.6-1.5 percentage points from a base of 4.4%), yet equally strong evidence for a decline in having problems with everyday activities (at both levels of severity) and in being limited in activities by a longstanding illness. This shows the challenges in tracking population morbidity change through general, non-specific measures, which are likely to be influenced as much by changes in reporting styles as by changes in morbidity per se.
As an aside, UK Government publications have made claims based on healthy/disability-free life expectancy, sometimes using these to argue that morbidity has been improving 3 but more recently to argue that morbidity has been deteriorating. 4-6 However, these trends are potentially misleading: they include older people as well as the working-age population; they confuse a combined mortality-morbidity measure with morbidity; and they are based on self-reports of global health that are unreliable, as we show here and discuss in the main text.

-"I have some problems in walking about"
-"I am confined to bed"

Activity limitations and MSDs
[This is part of the widely-used EQ-5D health status indicator 8 . However, for the purposes of this paper we have separated the individual measures that make up the EQ5D in order to compare these to similar indicators of morbidity within each domain]. People are classified as having a problem with self-care today if they had some problems walking about or were confined to bed. Locomotor limitation This is based on the personal care disability scale used in the 2001 HSE report 9 . Respondents in 1995, 2000 and 2001 were asked if any of the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year): -"Cannot walk 200 yards or more on own without stopping or discomfort". People who reported a limitation were asked if they used a walking aid, and if they did, were then asked if they could walk 200 yards without the walking aid. -"Cannot walk up and down a flight of 12 stairs without resting" -"Cannot bend down and pick up a shoe from the floor when standing" People are classified as having a locomotor limitation if they reported ANY of these limitations.

Problems with washing/dressing today
In the self-completion survey in 1996, 2003-6, 2008, 2010-12 and 2014, respondents were asked 'Now we would like to know how your health is today. Please answer ALL the questions. By ticking one box for each question below, please indicate which statements best describe your own health state today':
-"I have no problems with self-care"
-"I have some problems washing or dressing myself"
-"I am unable to wash or dress myself"
[This is part of the widely-used EQ-5D health status indicator 8. However, for the purposes of this paper we have separated the individual measures that make up the EQ-5D in order to compare these to similar indicators of morbidity within each domain]. People are classified as having a problem with self-care today if they had some problems washing/dressing or were unable to wash/dress themselves.

Self-care limitation
This is based on the personal care disability scale used in the 2001 HSE report 9. Respondents in 1995, 2000 and 2001 were asked if any of the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year):
-"Cannot get in and out of bed on own without difficulty"
-"Cannot get in and out of a chair without difficulty"
-"Cannot dress and undress without difficulty"
-"Cannot wash hands and face without difficulty"
-"Cannot feed, including cutting up food without difficulty"
-"Cannot get to and use toilet on own without difficulty"
People are classified as having a self-care limitation if they reported ANY of these limitations.
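Both disability-scale measures (locomotor and self-care limitation) are 'any of' rules over a list of items. A minimal sketch, assuming each respondent's reported limitations are held as a set of short item codes; the codes below are hypothetical, not HSE variable names. It ignores the walking-aid follow-up, since the text does not say how that answer enters the classification.

```python
from typing import AbstractSet, Tuple

# Hypothetical short codes for the 2001 HSE disability-scale items quoted above.
LOCOMOTOR_ITEMS: Tuple[str, ...] = (
    "cannot_walk_200_yards",
    "cannot_climb_12_stairs",
    "cannot_bend_pick_up_shoe",
)
SELF_CARE_ITEMS: Tuple[str, ...] = (
    "cannot_get_in_out_of_bed",
    "cannot_get_in_out_of_chair",
    "cannot_dress_undress",
    "cannot_wash_hands_face",
    "cannot_feed",
    "cannot_use_toilet",
)


def has_any_limitation(reported: AbstractSet[str], items: Tuple[str, ...]) -> bool:
    """True if the respondent reported ANY of the listed limitations."""
    return any(item in reported for item in items)
```

For example, a respondent who only reports `"cannot_climb_12_stairs"` has a locomotor limitation but no self-care limitation.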

High cholesterol
In the years 1994, 1998, 2006, and 2008-14, blood samples were obtained during the nurse visit, which were then analysed for total cholesterol. A high level of total cholesterol ('hypercholesterolaemia') is an established risk factor for CVD, and high cholesterol is defined following conventional practice at the NICE guidance 'audit level' of 5mmol/L or above 11 12 .
The measurement of cholesterol changed slightly in 2010 when a new laboratory was used. This resulted in values that are on average 0.1 mmol/L higher, and later values are therefore adjusted downwards by this amount to maintain comparability over time, as in 11.

Low HDL cholesterol
In the years 1994, 1998, 2006, and 2008-14, blood samples were obtained during the nurse visit, which were then analysed for high density lipoprotein (HDL) cholesterol. HDL cholesterol reduces the risk of CVD (it carries cholesterol away from the arteries towards the liver), so it is low HDL cholesterol that indicates poorer health; low HDL cholesterol is here defined as 1 mmol/L or less 11 12.
The measurement of HDL cholesterol changed slightly in 2010 when a new laboratory was used. This resulted in values that are on average 0.1 mmol/L lower, and later values are therefore adjusted upwards by this amount to maintain comparability over time, as in 11.

Recent heart attack/stroke
Respondents in 1994, 1998, 2003, 2006 and 2011 were asked a series of questions on whether they have had a heart attack (within a battery of questions about different types of heart disease):
-"Have you ever had a heart attack (including myocardial infarction or coronary thrombosis)?"
-Those responding 'yes' were then asked "Were you told by a doctor that you had a Heart Attack (including myocardial infarction or coronary thrombosis)?"
-Those with a doctor-diagnosed heart attack were asked, "Have you had a heart attack (including myocardial infarction and coronary thrombosis) during the past 12 months?"
Respondents in these years were similarly asked about stroke:
-"Have you ever had a stroke?"
-Those responding 'yes' were then asked, "Were you told by a doctor that you had a stroke?"
-Those with doctor-diagnosed stroke were asked, "Have you had a stroke during the past 12 months?"
People were considered to have had a recent heart attack or stroke if they said they had ever been diagnosed by a doctor as having had a stroke or a heart attack, and that they had had a heart attack or stroke during the past 12 months.

Recent angina
Respondents in 1994, 1998, 2003, 2006 and 2011 were asked a series of questions on whether they have angina (within a battery of questions about different types of heart disease):
-"Have you ever had angina?"
-Those responding 'yes' were then asked "You said that you had Angina. Were you told by a doctor that you had Angina?"
-Those with doctor-diagnosed angina were asked, "Have you had angina during the past 12 months?"
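The cholesterol harmonisation described above (subtracting 0.1 mmol/L from later total cholesterol, adding 0.1 mmol/L to later HDL, then applying the 5 mmol/L and 1 mmol/L cut-offs) can be sketched as follows. This assumes "later values" means samples from 2010 onwards; the function and constant names are hypothetical.

```python
# Assumption: "later values" means samples analysed in 2010 or later.
LAB_CHANGE_YEAR = 2010
LAB_OFFSET_MMOL_L = 0.1  # average between-laboratory difference reported above


def harmonised_total_cholesterol(value_mmol_l: float, year: int) -> float:
    """The new laboratory reads ~0.1 mmol/L higher, so subtract it from later values."""
    if year >= LAB_CHANGE_YEAR:
        value_mmol_l -= LAB_OFFSET_MMOL_L
    return round(value_mmol_l, 2)  # rounding avoids float noise at the cut-offs


def harmonised_hdl(value_mmol_l: float, year: int) -> float:
    """The new laboratory reads HDL ~0.1 mmol/L lower, so add it to later values."""
    if year >= LAB_CHANGE_YEAR:
        value_mmol_l += LAB_OFFSET_MMOL_L
    return round(value_mmol_l, 2)


def high_cholesterol(total_mmol_l: float, year: int) -> bool:
    """High total cholesterol: 5 mmol/L or above after harmonisation."""
    return harmonised_total_cholesterol(total_mmol_l, year) >= 5.0


def low_hdl(hdl_mmol_l: float, year: int) -> bool:
    """Low HDL cholesterol: 1 mmol/L or less after harmonisation."""
    return harmonised_hdl(hdl_mmol_l, year) <= 1.0
```

The same raw reading can therefore fall on different sides of a cut-off depending on the survey year: 5.05 mmol/L counts as high in 2006 but not in 2012.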
People were considered to have recent angina if they said they had ever been diagnosed as having angina by a doctor, and that they had had it during the past 12 months.

Recent CVD
Respondents were asked about different types of heart disease, including angina; heart attack (including myocardial infarction or coronary thrombosis); a heart murmur; abnormal heart rhythm; or other heart trouble. For EACH of these, they were asked:
-"Have you ever had <type of heart disease>?"
-Those responding 'yes' were then asked "You said that you had <type of heart disease>. Were you told by a doctor that you had <type of heart disease>?"
-For heart murmurs only, women saying they had doctor-diagnosed heart murmurs were asked if they were pregnant when told this, and if so, whether they were ever told they had a heart murmur when they were not pregnant.
-Those with doctor-diagnosed heart disease (excluding heart murmurs when pregnant) were asked, "Have you had <type of heart disease> during the past 12 months?"
People were considered to have recent CVD if they said they had a doctor-diagnosed heart condition and that they had had this during the past 12 months.

Possible angina
o Those who said they stop or slow down were asked, "If you stand still does the pain go away or not?" (If respondents were unsure, they were asked, "What happens to the pain on most occasions?"). If the pain goes away, they were asked, "How soon does the pain go away? Does it go in 10 minutes or less, or more than 10 minutes?"
o Those who said the pain goes away in 10 minutes or less were asked, "Will you show me where you get this pain or discomfort? Where else?" The interviewer then coded the site as Sternum (upper or middle) | Sternum lower | Left anterior chest | Left arm | Right anterior chest | Right arm | (Somewhere else).
Following the HSE reports, possible angina is defined as chest pain or discomfort that (i) includes either the sternum, or both the left arm and the left anterior chest; (ii) is prompted by hurrying or walking uphill (or by walking on the level, for those who never attempt more); (iii) makes the respondent either stop or slacken pace; and (iv) usually disappears in 10 minutes or less when they stand still.

Heart attack symptoms
This is taken from the Rose Angina questionnaire.
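The four Rose criteria for possible angina combine with AND, while the site criterion (i) is itself an OR. A minimal sketch, assuming the questionnaire routing has already been reduced to a set of site labels and three booleans; the labels and parameter names are hypothetical. The site rule is read as "sternum at any level, or left arm together with left anterior chest", as in the standard Rose questionnaire.

```python
from typing import AbstractSet

# Hypothetical labels for the interviewer-coded pain sites listed above.
STERNUM_SITES = frozenset({"sternum_upper_or_middle", "sternum_lower"})
LEFT_SIDE_PAIR = frozenset({"left_arm", "left_anterior_chest"})


def possible_angina(
    sites: AbstractSet[str],
    prompted_by_exertion: bool,
    stops_or_slows_down: bool,
    goes_within_10_minutes: bool,
) -> bool:
    """Rose 'possible angina': all four criteria (i)-(iv) must hold."""
    # (i) site includes the sternum, or both the left arm and left anterior chest
    site_ok = bool(STERNUM_SITES & sites) or LEFT_SIDE_PAIR <= sites
    # (ii) exertional, (iii) makes the respondent stop/slow, (iv) goes within 10 min
    return (site_ok and prompted_by_exertion
            and stops_or_slows_down and goes_within_10_minutes)
```

Pain in the left arm alone therefore does not qualify, whereas pain in the left arm and left anterior chest does.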
Respondents in 1994, 1998, 2003, 2006 and 2011 were asked, "Have you ever had a severe pain across the front of your chest lasting for half an hour or more?" As in the 2006 HSE report, those responding 'yes' are treated as having had a possible heart attack (myocardial infarction).

COPD symptoms
Respondents in 1995, 1996 and 2010 were asked: o "Do you usually cough first thing in the morning in the winter?" (In 2010 only, respondents had previously been asked "Do you usually cough first thing in the morning?" -but this is not used to filter people into the questions on coughing in winter).
o "Do you usually bring up any phlegm from your chest, first thing in the morning in the winter?" (Again, this was asked to everyone in all years, but was preceded by an additional, non-winter-specific question in 2010).

-Grade 2 dyspnoea: people who report shortness of breath when hurrying on level ground or walking up a slight hill (or who report shortness of breath when walking on level ground, but who say they never walk up hill or hurry).
-Grade 3 dyspnoea: people who report shortness of breath when walking with people of own age on level ground, or who have to stop for breath when walking at own pace on level ground.
(The same questions also exist in 1994 and 1998, but (i) the wider bank of questions differs substantially in the two versions and question context effects are likely; and (ii) the filtering into the final question differs between versions. However, the 1991-98 trends are included below).

Recent wheezing/asthma symptoms
Respondents in 1995-97, 2001 and 2010 were asked the following two questions as part of the battery of questions on breathing problems:
-"I am now going to ask you some questions about your breathing... Have you ever had wheezing or whistling in the chest at any time, either now, or in the past?"
-Those that said yes were then asked, "Have you had wheezing or whistling in the chest in the last 12 months?"
-(For those who said they had ever been told by a doctor they had asthma; see above), "When was your most recent attack of asthma? PROMPT IF NECESSARY: Less than 4 weeks ago | More than 4 weeks but within the last 12 months | One to five years ago | More than 5 years ago"
People who said they had EITHER wheezing/whistling in the past 12 months or an asthma attack in the past 12 months were counted as having recent wheezing/asthma symptoms.
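The recent wheezing/asthma measure is an EITHER/OR rule over two answers. A minimal sketch, assuming answers are stored as a boolean and the quoted answer category (or `None` when the asthma question was not asked); the names are hypothetical.

```python
from typing import Optional

# Answer categories for "When was your most recent attack of asthma?" that fall
# within the past 12 months.
RECENT_ATTACK_CATEGORIES = frozenset({
    "Less than 4 weeks ago",
    "More than 4 weeks but within the last 12 months",
})


def recent_wheeze_or_asthma(wheeze_past_12_months: bool,
                            last_asthma_attack: Optional[str]) -> bool:
    """EITHER past-year wheezing/whistling OR a past-year asthma attack."""
    attack_recent = last_asthma_attack in RECENT_ATTACK_CATEGORIES
    return wheeze_past_12_months or attack_recent
```

Because of the OR, anyone with past-year wheezing is already counted regardless of the asthma answer, so not asking the asthma question of these respondents cannot change their classification.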
[It should be noted that the filtering to the second question is very slightly different in 2010 compared to previous years (it was only asked to people who said they had not had wheezing/whistling in the chest in the past 12 months). However, given the way that the derived variable is calculated here, the change in filtering does not introduce any discontinuities over time].

Recent doctor-diagnosed diabetes
-Those responding 'yes' were then asked "Were you told by a doctor that you had diabetes?"
-Women responding 'yes' were then asked, "Can I just check, were you pregnant when you were told that you had diabetes?", and those responding 'yes' were then asked "Have you ever had diabetes apart from when you were pregnant?"
-Finally, those with doctor-diagnosed diabetes (excluding those diagnosed only when pregnant) were asked: "Do you currently inject insulin for diabetes?" and "Are you currently taking any medicines, tablets or pills (other than insulin injections) for diabetes?"
People were considered to have recent diabetes if they said they had ever been diagnosed as having diabetes by a doctor (excluding when pregnant), and that they are injecting insulin or taking any other medicines for diabetes.
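The recent diabetes measure combines a diagnosis condition (excluding diabetes diagnosed only during pregnancy) with a current-treatment condition. A minimal sketch, assuming the question routing has already been reduced to four booleans with hypothetical names.

```python
def recent_diabetes(
    doctor_diagnosed: bool,
    only_during_pregnancy: bool,
    injects_insulin: bool,
    takes_other_diabetes_medication: bool,
) -> bool:
    """Doctor-diagnosed diabetes outside pregnancy AND current treatment."""
    diagnosed_outside_pregnancy = doctor_diagnosed and not only_during_pregnancy
    currently_treated = injects_insulin or takes_other_diabetes_medication
    return diagnosed_outside_pregnancy and currently_treated
```

A doctor diagnosis on its own is not enough: someone diagnosed but not currently treated is not counted under this definition.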

Diabetes LSI
Every year 1994-2011, people who report a longstanding illness (LSI) are then asked, 'what is the matter with you?'; up to 6 responses are then coded by the interviewer into a consistent coding frame based on the International Classification of Diseases. The diabetes LSI measure is based on the group labelled 'Diabetes', which as of 2011 includes Diabetes and Hyperglycaemia.

Raised HbA1C
Blood samples obtained during the nurse visit were analysed for glycated haemoglobin (HbA1C). HbA1C is a measure of the share of haemoglobin (within red blood cells) to which glucose is attached, with higher levels indicating less well-controlled diabetes over the previous three months 20. Following the recommendations of a 2009 expert committee, we mirror recent HSE reports in using a threshold of 48 mmol/mol (i.e. 48 millimoles of glycated haemoglobin per mole of haemoglobin) for raised HbA1C, a different threshold to that used in earlier HSE reports.
Apart from 1994 (see below), the measurement of HbA1C has been consistent over time in HSE, but the units reported have changed from the % of haemoglobin that is glycated to mmol/mol. Earlier measures have been transformed into mmol/mol using the formula mmol/mol = (% - 2.15) × 10.929. HbA1C was also measured in 1994, but using a different technique, which cannot be made comparable 21:67.
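The unit conversion and threshold above can be sketched as follows. The conversion formula is the one quoted in the text; treating 48 mmol/mol itself as raised (i.e. using >=) is an assumption, and the function names are hypothetical.

```python
RAISED_HBA1C_MMOL_MOL = 48.0  # assumption: values of exactly 48 count as raised


def hba1c_percent_to_mmol_mol(percent: float) -> float:
    """Convert HbA1C from % glycated haemoglobin to mmol/mol."""
    return (percent - 2.15) * 10.929


def raised_hba1c(value: float, unit: str) -> bool:
    """Apply the 48 mmol/mol threshold to a value reported in % or mmol/mol."""
    if unit == "%":
        value = hba1c_percent_to_mmol_mol(value)
    elif unit != "mmol/mol":
        raise ValueError(f"unknown HbA1C unit: {unit}")
    return value >= RAISED_HBA1C_MMOL_MOL
```

With this formula, 6.5% converts to about 47.5 mmol/mol, just under 48, so whether converted values are rounded to whole numbers before applying the threshold matters for observations near the cut-off.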

Other biomarkers
Raised C-reactive protein
In the years 1998, 2003, 2006, and 2009, blood samples were obtained during the nurse visit, which were then analysed for C-reactive protein (CRP). CRP is an inflammatory marker that can indicate heart-related inflammation (it is used to test for heart failure) but can also indicate other sorts of health damage, including diabetes. However, there are still debates about exactly what CRP shows, both in terms of its causal role in heart disease and whether it also indicates depression. 22 Raised CRP is defined as >3 mg/L, the standard cut-off for a clinically significant rise in CVD risk 23 24. Participants with CRP >10 mg/L are excluded, as this is taken to be evidence of current infection rather than inflammation from chronic disease.

Raised Fibrinogen
In the years 1998, 2003, 2006, and 2009, blood samples were obtained during the nurse visit, which were then analysed for fibrinogen. Like CRP, fibrinogen is an inflammatory marker, which is both commonly thought to be a causal risk factor for CVD (it is a component of coagulation), and which seems to be a risk factor for other diseases (including cancer and diabetes) 25 .
While fibrinogen is often analysed as a continuous variable with no cutpoints 24, we here define raised fibrinogen as >4 g/L, as in 12. As for CRP, participants with CRP >10 mg/L are excluded, as this is taken to be evidence of current infection rather than inflammation from chronic disease. A change of analysis method and laboratory between 1994 and 1998 means that the 1994 results are not comparable to the later results 26:8.10.4.

Anaemia
In the years 1994, 1998, 2006, and 2009, blood samples were obtained during the nurse visit, which were then analysed for haemoglobin. Haemoglobin distributes oxygen around the body, and low haemoglobin levels usually indicate anaemia. Various different thresholds for low haemoglobin have been used in the literature, particularly for older populations 27, but we here use the longstanding WHO definition of <13 g/dL for men and <12 g/dL for women 24.
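The CRP, fibrinogen and anaemia definitions above reduce to simple cut-offs plus one shared exclusion rule. A minimal sketch, in which `None` marks a participant excluded because CRP >10 mg/L suggests current infection. It assumes fibrinogen is recorded in g/L, the usual unit for plasma fibrinogen; the function names and sex codes are hypothetical.

```python
from typing import Optional

CRP_INFECTION_CUTOFF_MG_L = 10.0


def raised_crp(crp_mg_l: float) -> Optional[bool]:
    """Raised CRP (>3 mg/L); None means excluded as likely current infection."""
    if crp_mg_l > CRP_INFECTION_CUTOFF_MG_L:
        return None
    return crp_mg_l > 3.0


def raised_fibrinogen(fibrinogen_g_l: float, crp_mg_l: float) -> Optional[bool]:
    """Raised fibrinogen (>4, assumed g/L), with the same CRP-based exclusion."""
    if crp_mg_l > CRP_INFECTION_CUTOFF_MG_L:
        return None
    return fibrinogen_g_l > 4.0


def anaemic(haemoglobin_g_dl: float, sex: str) -> bool:
    """WHO thresholds: <13 g/dL for men, <12 g/dL for women."""
    if sex not in ("male", "female"):
        raise ValueError(f"unexpected sex code: {sex}")
    threshold = 13.0 if sex == "male" else 12.0
    return haemoglobin_g_dl < threshold
```

Returning `None` rather than `False` keeps excluded participants out of both the numerator and the denominator when prevalences are computed.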

Iron deficiency
In the years 1994, 1998, 2006, and 2009, blood samples were obtained during the nurse visit, which were then analysed for serum ferritin (which correlates directly with the amount of iron stored in the body). Iron deficiency is one of several possible causes of anaemia (alongside other nutritional deficiencies, genetic conditions such as sickle cell anaemia, infections, and blood loss). Iron deficiency is defined as a serum ferritin of less than 45 ng/ml 27.

Mental health
In the self-completion survey in most years (except 1996, 2007, 2011 and 2013), respondents were asked the following series of questions:
-"Please read this carefully: We should like to know how your health has been in general over the past few weeks. Please answer ALL the questions by ticking the box below the answer which you think most applies to you. Have you recently...
-"…been able to concentrate on whatever you're doing?" RESPONSES: "Better than usual" | "Same as usual" | "Less than usual" | "Much less than usual"
-"…lost much sleep over worry?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…felt you were playing a useful part in things?" RESPONSES: "More so than usual" | "Same as usual" | "Less useful than usual" | "Much less useful"
-"…felt capable of making decisions about things?" RESPONSES: "More so than usual" | "Same as usual" | "Less so than usual" | "Much less capable"
-"…felt constantly under strain?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…felt you couldn't overcome your difficulties?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…been able to enjoy your normal day-to-day activities?" RESPONSES: "More so than usual" | "Same as usual" | "Less so than usual" | "Much less than usual"
-"…been able to face up to your problems?" RESPONSES: "More so than usual" | "Same as usual" | "Less able than usual" | "Much less able"
-"…been feeling unhappy and depressed?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…been losing confidence in yourself?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…been thinking of yourself as a worthless person?" RESPONSES: "Not at all" | "No more than usual" | "Rather more than usual" | "Much more than usual"
-"…been feeling reasonably happy, all things considered?" RESPONSES: "More so than usual" | "Same as usual" | "Less so than usual" | "Much less happy"
These make up the 12-item General Health Questionnaire (GHQ-12) 28, a well-validated, widely-used measure of probable mental ill-health.
This is often termed general non-psychotic psychiatric morbidity, but we here use the more easily understood term 'psychological distress', following Stochl et al 2016. 29 A total score has been created by first ensuring that all questions were coded from 1 (positive symptom) to 4 (negative symptom), and then counting the number of questions to which people answered with category 3 or 4 (indicating a negative symptom). A binary measure (often called GHQ caseness) was created for people who had negative symptoms on 4 or more of the 12 questions.
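The GHQ-12 scoring just described (count the items answered in category 3 or 4, then flag 4 or more as a case) can be sketched as below. It assumes the positively worded items have already been recoded so that every item runs from 1 (positive) to 4 (negative), as the text requires; the function names are hypothetical.

```python
from typing import Sequence


def ghq12_score(responses: Sequence[int]) -> int:
    """Count items answered in category 3 or 4 (a negative symptom).

    Assumes every item has already been recoded from 1 (positive) to 4 (negative).
    """
    if len(responses) != 12:
        raise ValueError("GHQ-12 needs exactly 12 item responses")
    if any(r not in (1, 2, 3, 4) for r in responses):
        raise ValueError("responses must be coded 1-4")
    return sum(1 for r in responses if r >= 3)


def ghq12_case(responses: Sequence[int]) -> bool:
    """'GHQ caseness': negative symptoms on 4 or more of the 12 items."""
    return ghq12_score(responses) >= 4
```

Under this scoring, categories 3 and 4 both count as 1 and categories 1 and 2 both count as 0, so the score ranges from 0 to 12.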

Anxiety/depression (moderately / extremely)
In the self-completion survey in 1996, 2003-6, 2008, 2010-12 and 2014, respondents were asked 'Now we would like to know how your health is today. Please answer ALL the questions. By ticking one box for each question below, please indicate which statements best describe your own health state today':
-"I am not anxious or depressed"
-"I am moderately anxious or depressed"
-"I am extremely anxious or depressed"
[This is part of the widely-used EQ-5D health status indicator 8. However, for the purposes of this paper we have separated the individual measures that make up the EQ-5D in order to compare these to similar indicators of morbidity within each domain]. Two outcome measures are based on this: whether people have any anxiety/depression (the 2nd and 3rd categories combined), and whether they have extreme anxiety/depression (3rd category only).
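Each EQ-5D item used in this appendix (anxiety/depression here, and similarly washing/dressing or usual activities) yields two outcomes from one three-category answer. A minimal sketch, assuming answers are stored as 1 (no problems), 2 (some/moderate) and 3 (extreme/unable); this coding and the function name are assumptions.

```python
from typing import Tuple


def eq5d_outcomes(level: int) -> Tuple[bool, bool]:
    """Map a 3-level EQ-5D answer to (any problem, extreme problem).

    level 1 = no problems, 2 = some/moderate problems, 3 = extreme problems/unable.
    """
    if level not in (1, 2, 3):
        raise ValueError("EQ-5D (3-level) answers must be 1, 2 or 3")
    return level >= 2, level == 3
```

The first flag corresponds to "the 2nd and 3rd categories combined" and the second to "3rd category only".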

Communication
Hearing, seeing & communication limitations
These measures were not included in the main paper due to the short time frame over which we can examine trends, but are included in the Web Appendix as they relate to important domains of morbidity. They were included in the disability scale used in the 2001 HSE report 9. Respondents in 1995, 2000 and 2001 were asked if any of the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year):
• "Cannot follow a TV programme at a volume others find acceptable (with hearing aid if normally worn)" ('hearing limitation')

Changes over time in several other measures are only presented in Web Appendices 4 & 6, rather than the main paper. Details of these variables are included below:

General health
Every year, respondents were asked, "How is your health in general? Would you say it was ..." (very good | good | fair | bad | very bad). Two outcome measures are based on this, following standard practice in the HSE reports: bad general health (which includes 'bad' or 'very bad' health) and good general health (which includes 'good' or 'very good' health).

Longstanding illness (LSI)
Every year 1994-2011, respondents were asked "Do you have any long-standing illness, disability or infirmity? By long-standing I mean anything that has troubled you over a period of time, or that is likely to affect you over a period of time?" (The response options were 'Yes' and 'No'). In 2012 the question was changed to be consistent with the Government's new harmonised disability questions for use in social surveys 30, and is not comparable to the previous version.

Limiting LSI
Every year 1996-2011, respondents who said they had an LSI were then asked, "Does this illness or disability (do any of these illnesses or disabilities) limit your activities in any way?" (again allowing only Yes/No answers).
In 2012 the question was changed to be consistent with the Government's new harmonised disability questions for use in social surveys (see HSE 2012 report), and is not comparable to the previous version.

Problems with usual activities (some problems / unable)
In the self-completion survey in 1996, 2003-6, 2008, 2010-12 and 2014, respondents were asked 'Now we would like to know how your health is today. Please answer ALL the questions. By ticking one box for each question below, please indicate which statements best describe your own health state today':
-"I have no problems with performing my usual activities (e.g. work, study, housework, family or leisure activities)"
-"I have some problems with performing my usual activities"
-"I am unable to perform my usual activities"
[This is part of the widely-used EQ-5D health status indicator 8. However, for the purposes of this paper we have separated the individual measures that make up the EQ-5D in order to compare these to similar indicators of morbidity within each domain]. Two outcome measures are based on this: whether people have any problems (the 2nd and 3rd categories combined), and whether they are unable to perform their usual activities (3rd category only).

Limitations in past 2wks
Every year, respondents were asked, "Now I'd like you to think about the two weeks ending yesterday. During those 2 weeks did you have to cut down on any of the things you usually do (about the house or at work or in your free time) because of your answer at <the LSI question> or some other illness or injury?" There were two small changes to this question's wording in 1996. First, 'work' was changed to 'work/school'. Second, 'your answer at <the LSI question>' was changed to 'a condition you have just told me about'. While it is impossible to be sure of the exact effect of these changes, neither seems likely to influence the results (at least for the 25+ age group, in which fewer individuals are in full-time education).

Appendix 6: Measures not included in the main paper
Trends in several measures are not included in the main paper, either

This is based on the personal care disability scale used in the 2001 HSE report 9. Respondents in 1995, 2000 and 2001 were asked if any of the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year):
-"Cannot dress and undress without difficulty"
-"Cannot wash hands and face without difficulty"
For comparison with the 'problems with washing/dressing today' measure in the main paper (which covers a more extended period and is based on a different question; see above), a measure is derived for respondents who report either of these problems.

Other LSIs
• If the complaint is breathlessness with the cause also stated, this is coded with the cause; hence it also excludes breathlessness as a result of anaemia, breathlessness due to a hole in the heart, and breathlessness due to angina.

Component measure: Wheezing
In the main paper, we look at whether people report recent wheezing/asthma. As shown above, this comes from three questions: whether people report ever having had wheezing or whistling in the chest; whether they have had this in the past 12 months; and whether they have had an asthma attack in the past 12 months. Web Appendix 6 shows trends in the other versions of these measures, i.e. having ever had wheezing/whistling in the chest, and whether they have had this in the past 12 months.

Beyond 'recent':
'Ever had' and 'DD' diabetes
In the main paper, we look at whether people report recent doctor-diagnosed diabetes. As shown above, this comes from three questions: whether people report ever having this condition; whether a doctor diagnosed this; and whether they currently inject insulin / take other medication for diabetes.
Web Appendix 6 shows trends in the other versions of these measures, i.e. having ever had diabetes, and having ever doctor-diagnosed ('DD') diabetes.

Activity limitations
Walking limitation
This measure is included for comparison. It is based on the personal care disability scale used in the 2001 HSE report 9. Respondents in 1995, 2000 and 2001 were asked whether the following applied to them (interviewers were instructed to ignore temporary disabilities that are expected to last less than one year): "Cannot walk 200 yards or more on own without stopping or discomfort". People who reported a limitation were asked if they used a walking aid, and if they did, were then asked if they could walk 200 yards without the walking aid.
2011 includes: Acoustic neuroma, After effect of cancer (nes), All tumours, growths, masses, lumps and cysts, whether malignant or benign eg. tumour on brain, growth in bowel, growth on spinal cord, lump in breast, Cancers sited in any part of the body or system eg.
• The Blood Disorders LSI measure is based on the group 'Disorders of blood and blood forming organs and immunity disorders', which as of 2011 includes: Anaemia, pernicious anaemia, Blood condition (nes), blood deficiency, Haemophilia, Idiopathic Thrombocytopenic Purpura (ITP), Immunodeficiencies, Polycythaemia (blood thickening), blood too thick, Purpura (nes), Removal of spleen, Sarcoidosis (previously code 37), Sickle cell anaemia/disease, Thalassaemia, Thrombocythenia. It explicitly excludes Leukaemia (code 01).

Appendix 7: Year-by-year trends
This appendix presents the year-by-year trends for all of the variables included in the main paper. The table row labelled 'start v end sig' presents the p-value for testing the null hypothesis that there is no difference between the first and last years in the series (whichever these years are).
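For readers who want to see what a 'start v end' test does, the sketch below computes a two-sided p-value for equal prevalence in the first and last years using a standard pooled two-proportion z-test. This is an illustration only: it is unweighted and ignores the survey design, whereas the published p-values presumably use the harmonised non-response weights, so results will differ. The function name and inputs are hypothetical.

```python
import math


def two_proportion_p_value(cases_1: int, n_1: int, cases_2: int, n_2: int) -> float:
    """Two-sided p-value for H0: equal prevalence in the first and last years.

    Unweighted pooled z-test; ignores survey weights and clustering.
    """
    p_1, p_2 = cases_1 / n_1, cases_2 / n_2
    pooled = (cases_1 + cases_2) / (n_1 + n_2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    if se == 0:
        # Both years have prevalence 0 (or both 1): no evidence of a difference.
        return 1.0
    z = (p_1 - p_2) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))
```

For example, 60/100 cases in the first year against 40/100 in the last gives z of about 2.83 and a p-value of about 0.005.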
Note that this will differ from the confidence intervals presented in the main paper, as these are grouped into multi-year periods with larger sample sizes and therefore greater precision.

Changes over time in some of these indicators have not previously been analysed (e.g. waist-hip ratio, fibrinogen). However, others have been studied but never integrated into a single picture of changing morbidity; we review these in this section. (For reasons of space these are included here rather than in the main text).

Cardiovascular morbidity
1998-2011 trends in the two biomarkers for total and HDL cholesterol using HSE data are shown in Oyebode,11 who find similar results.

(past-year wheezing), at the same time as the reported prevalence of doctor-diagnosed asthma increased.

Obesity & diabetes
While the English trends in waist-hip ratio have not previously been analysed, earlier Scottish trends are given in Hotchkiss et al 2012. 19 Trends in diabetes have been covered in several HSE reports, e.g. Moody 2012,20 as has BMI (see particularly the paper by Sperrin et al 2014,32 who also created a publicly-available time-series HSE dataset for this purpose).

Activity limitations, pain & musculoskeletal morbidity
While musculoskeletal LSIs have not previously been analysed in HSE, a decline can also be seen in the General Household Survey.33

Mental health
In the UK and most other high-income countries, benefit claims due to mental ill-health have been rising,34 alongside considerable increases in mental health diagnosis and treatment. 35 The extent to which this reflects rises in mental ill-health and genuinely declining work capacity, however, has long been the subject of debate.36 37 Perhaps the most robust long-term general population data series in the UK is the Adult Psychiatric Morbidity Survey. 35 38 While some studies have used HSE to show rises in mental ill-health, others have used the same data to come to the opposite conclusion.39 40 These contrasting conclusions are explained by the tables in Web Appendix 7, which show year-by-year changes: moderate mental ill-health fell between the mid-1990s and the mid-2000s, before rising in 2009, with a particularly high prevalence in 2011. The conclusions of studies will therefore depend on the years they use as the start and end periods for their trend analysis. 3 It is also worth noting that the considerable increases in mental health LSIs in our results can also be seen in a similar measure in the Labour Force Survey. 41 42

Other morbidity measures
While CRP and fibrinogen are collected in HSE at considerable effort, their trends have rarely been studied (e.g. they appear only in supplementary descriptive tables in Hughes et al 23). A decline in anaemia using HSE data 1998-2005 has been observed by Tull et al 2009,43 but this has not hitherto been updated to the 2008-10 period.
It has been suggested that multimorbidity has risen among older people in England 44 and for all age groups in Ontario,45 although others have cautioned against using simple disease counts,46 and the evidence cited in the introduction of the main paper suggests that rising chronic disease reporting may partly be a result of increasing awareness of disease (rather than rising underlying prevalence). 3 The main reason that 'moderate anxiety/depression today' does not show a decline in 2011-14 compared to 1994-6 is a single very high reported prevalence in 2011, which had fallen again by 2012 and 2014.
The alternate measure ('psychological distress symptoms') was not asked in 2011.

Appendix 9: Summarising multiple measures
Having reviewed trends in 39 morbidity measures, we have seen that morbidity in the English working-age population has improved in some respects and deteriorated in others. For those who view work-related morbidity as intrinsically multidimensional,47 this is the endpoint of our analysis. However, for those who conceive of morbidity as unidimensional (or who are interested in morbidity as it relates to a unidimensional work capacity), this raises the question of how to weight different dimensions of morbidity in order to decide whether the overall change in morbidity has been positive or negative.
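The weighting question can be made concrete with a small sketch: if each measure has a change in prevalence and a weight, the overall change is the weighted sum of the changes, and its sign can flip depending on which weights are chosen. The measure names, changes and weights below are invented for illustration; they are not HSE estimates, GBD disability weights or any weighting scheme used in this paper.

```python
from typing import Mapping


def weighted_morbidity_change(
    changes: Mapping[str, float], weights: Mapping[str, float]
) -> float:
    """Combine per-measure prevalence changes into one weighted total.

    A positive total means morbidity has worsened on balance under these
    weights; a negative total means it has improved.
    """
    missing = set(changes) - set(weights)
    if missing:
        raise ValueError(f"no weight supplied for: {sorted(missing)}")
    return sum(weights[name] * delta for name, delta in changes.items())


# Hypothetical prevalence changes (percentage points, later minus earlier):
# one measure improves, the other deteriorates.
changes = {"measure_A": -3.0, "measure_B": 2.0}

# Two hypothetical weighting schemes applied to the same changes.
weights_favouring_A = {"measure_A": 0.6, "measure_B": 0.2}
weights_favouring_B = {"measure_A": 0.2, "measure_B": 0.6}

print(weighted_morbidity_change(changes, weights_favouring_A))  # negative: improvement
print(weighted_morbidity_change(changes, weights_favouring_B))  # positive: deterioration
```

The same two underlying trends yield opposite overall verdicts, which is why the choice of weights, rather than the trends themselves, can drive a unidimensional conclusion.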

Methods for creating unidimensional morbidity scales
Several methods have been proposed for creating unidimensional morbidity scales, but most of these cannot be applied to the HSE data:
• Weights can be based on empirically derived preferences for different health states, the most famous example being the WHO Global Burden of Disease (GBD) study 48. Some GBD estimates of trends in disability in the UK do exist, and suggest that the prevalence of disability in the working-age population was unchanged over 1990-2010, though these results are presented only in passing. 4 For our analyses, however, we have no preference-based weights for most of the HSE measures (excluding the subset of measures that make up the EQ-5D scale).
• Those reporting limitations beyond a certain severity in any domain can be categorised as 'disabled', as recommended by the Washington Group on Disability Statistics (see above). However, as previously discussed, we have few functional limitations measures available in HSE.
• Latent morbidity scales can be created based on the inter-correlations between different measures (using item response theory), as used in the World Report on Disability 51 and by researchers associated with the US National Bureau of Economic Research (e.g. 52). However, it is unclear why we would wish to weight items in this way: a given morbidity indicator may be severe, yet if it is unrelated to other morbidity measures it will be given a low weight.
• Latent morbidity scales can also be created based on the independent correlation between each indicator and a general measure of morbidity, such as general self-reported health or 53 as in 54. This retains some of the advantages of single-item measures (in providing a basis for making morbidity unidimensional), while avoiding the potential threats to validity discussed above. However, the inconsistent inclusion of measures across HSE waves prevents a unidimensional morbidity scale from being constructed here.

4 Trends in the UK GBD results are reported in Murray et al. 49 However, Murray et al do not focus on trends in years lived with disability (YLD), other than to note that "YLDs per person by age and sex have not changed substantially in the UK, but age-specific mortality has been improving" (p1005). The figure in the supplementary appendix shows that YLDs have barely changed for either men or women at any age. However, the wide confidence intervals for YLDs as a whole in the main paper (Table 1) suggest that the uncertainty around these trends is very large. The public GBD data 50 do provide cause-disaggregated YLDs for the UK (and all other countries) for a slightly different period (2000-2015), but these are not age-standardised, are available only within broad age groups (e.g. 15-29), and again lack estimates of uncertainty.

… educational level and each individual morbidity measure in turn, using all years for which that morbidity measure is available. That is, for each morbidity indicator we use the following logistic regression model:

$$\operatorname{logit}\Pr(\mathit{badhealth}_i = 1) = \beta\,\mathit{morbidity}_i + \boldsymbol{\gamma}_a'\,\mathbf{age}_i + \gamma_m\,\mathit{male}_i + \boldsymbol{\gamma}_{am}'\,(\mathbf{age}_i \times \mathit{male}_i) + \boldsymbol{\gamma}_e'\,\mathbf{educ}_i$$

where $\beta$ is our primary outcome coefficient showing the importance of that morbidity indicator for bad health, $\mathbf{age}_i$ refers to a vector of age dummy variables, $\mathit{male}_i$ refers to a binary gender dummy variable, $\mathbf{educ}_i$ refers to a vector of education dummy variables (with four levels: degree/full-time student, A-levels/NVQ3/higher education below degree, other qualifications, or no qualifications), and $\boldsymbol{\gamma}_a$, $\gamma_m$, $\boldsymbol{\gamma}_{am}$ and $\boldsymbol{\gamma}_e$ refer to the coefficients on age, gender, their interaction and education respectively.
We adjust for education as well as age and sex so that we can examine the importance of the measure for bad health after taking account of the possibility that general health and the measure are both strongly related to social status. Note, however, that it is not possible to control for all morbidity measures simultaneously (as discussed above), so this is a rough indicator of the importance of that morbidity measure for general health, rather than a reliable indicator of its causal impact net of comorbidities.
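The model above is an ordinary logistic regression, estimated separately for each morbidity indicator. To show what that estimation involves, here is a minimal pure-Python Newton-Raphson fit; it is a sketch under stated assumptions, not the code used for this paper, and in practice a statistics package would be used. The toy design (an intercept column, a 0/1 morbidity indicator and a 0/1 male dummy, standing in for the full set of age, sex, interaction and education dummies) and the outcome values are invented.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_logit(X: Sequence[Sequence[float]], y: Sequence[int],
              max_iter: int = 50, tol: float = 1e-10) -> List[float]:
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Score vector X'(y - p).
        score = [sum(row[j] * (yi - pi) for row, yi, pi in zip(X, y, p))
                 for j in range(k)]
        # Fisher information X'WX, with W = diag(p * (1 - p)).
        info = [[sum(row[i] * row[j] * pi * (1.0 - pi) for row, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        step = _solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Invented toy data: (morbidity, male) cells and their 0/1 bad-health outcomes.
cells = {
    (0, 0): [0, 0, 0, 1],
    (0, 1): [0, 0, 1, 0, 0],
    (1, 0): [1, 1, 0, 1],
    (1, 1): [1, 0, 1, 0, 1],
}
X: List[List[float]] = []
y: List[int] = []
for (morbidity, male), outcomes in cells.items():
    for outcome in outcomes:
        X.append([1.0, float(morbidity), float(male)])  # intercept, morbidity, male
        y.append(outcome)

beta = fit_logit(X, y)
print(f"coefficient on morbidity (log-odds scale): {beta[1]:.3f}")
```

At the maximum-likelihood solution the score equations hold, so with an intercept the fitted probabilities sum to the observed number of bad-health reports; checking that property is a quick sanity test for any hand-rolled fit.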
The results of this analysis are shown overleaf, ordered by the effect on bad health. (We also repeat the trend in each measure for convenience; this is discussed following the table.) Having estimated this, we can see whether the areas in which morbidity has been improving or deteriorating are those that are particularly important for general health. This is shown visually in Figure 1 below (the measures are not labelled, so that the overall pattern can be seen, but the top-to-bottom order of measures is the same in the figure as in the preceding table; i.e. the measure at the top of the figure is 'Pain-extreme').

a 'Trend' is as reported above in the main paper. 'Effect on bad health' shows the effect of the morbidity measure on (very) bad health after controlling for age, sex (and their interaction) and educational level, using all years for which the individual morbidity measure is available. (This shows average marginal effects following a logistic regression; see text above.)
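The table note reports average marginal effects (AMEs) after a logistic regression. For a 0/1 morbidity indicator, one common definition is the average discrete change: for every respondent, predict the probability of bad health with the indicator set to 1 and to 0 (other covariates at their observed values), then average the differences. The text does not say exactly which definition was used, so this is an assumption, and the coefficients below are hypothetical. Because the result is a difference in probabilities, multiplying by 100 gives percentage points.

```python
import math
from typing import Sequence


def average_marginal_effect(beta: Sequence[float],
                            X: Sequence[Sequence[float]], col: int) -> float:
    """Average discrete change in Pr(bad health) from switching the 0/1
    indicator in column `col` from 0 to 1 for every respondent, keeping each
    respondent's other covariates at their observed values."""
    def prob(row: Sequence[float]) -> float:
        z = sum(b * x for b, x in zip(beta, row))
        return 1.0 / (1.0 + math.exp(-z))

    total = 0.0
    for row in X:
        with_m = list(row)
        with_m[col] = 1.0
        without_m = list(row)
        without_m[col] = 0.0
        total += prob(with_m) - prob(without_m)
    return total / len(X)


# Hypothetical coefficients: intercept, morbidity indicator, one other covariate.
ame = average_marginal_effect([-2.0, 1.0, 0.0],
                              [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]], col=1)
print(f"AME = {100 * ame:.1f} percentage points")
```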

It is easiest to interpret the figure by focussing on each group of measures in turn. Firstly, the biomarkers tend to have the weakest relationship with general health. Those with high levels of the diabetes biomarker (glycated haemoglobin) are 9.7 percentage points more likely to report bad health, and those who are underweight, or who have a high waist-hip ratio, raised fibrinogen or low HDL cholesterol, are 4-6 percentage points more likely to report bad health; the other biomarkers have weaker relationships still. Indeed, there is effectively no relationship between reported bad health and measured high blood pressure, high total cholesterol or iron deficiency.
Secondly, most of the measures based on medical labels have a moderately strong relationship with bad health (the weakest being lifetime asthma and recent high blood pressure, both of which can be asymptomatic), and these measures have mostly risen over time. There are, however, notable exceptions, including the IHD/stroke LSI, recent angina and recent heart attack/stroke (the label-based measures with some of the strongest relationships with bad reported health), as well as arthritis and other musculoskeletal LSIs.

Instructions to authors
Complete this checklist by entering the page numbers from your manuscript where readers will find each of the items listed below.
Your article may not currently address all the items on the checklist. Please modify your text to include the missing information. If you are certain that an item does not apply, please write "n/a" and provide a short explanation.
Upload your completed checklist as an extra file when you submit to a journal.
In your methods section, say that you used the STROBE cross-sectional reporting guidelines, and cite them as: von

Title and abstract
Title #1a Indicate the study's design with a commonly used term in the title or the abstract
1, 3
Statistical methods #12a Describe all statistical methods, including those used to control for confounding
13-14
Statistical methods #12b Describe any methods used to examine subgroups and interactions
n/a
Statistical methods #12c Explain how missing data were addressed
A3-9
Statistical methods #12d If applicable, describe analytical methods taking account of sampling strategy
13-14
Statistical methods #12e Describe any sensitivity analyses
A10-11,

Results
Participants #13a Report numbers of individuals at each stage of study, e.g. numbers potentially eligible, examined for eligibility, confirmed eligible, included in the study, completing follow-up, and analysed. Give information separately for exposed and unexposed groups if applicable.

A3-A9
Participants #13b Give reasons for non-participation at each stage
A3-A9
Participants #13c Consider use of a flow diagram
n/a
Descriptive data #14a Give characteristics of study participants (eg demographic, clinical, social) and information on exposures and potential confounders. Give information separately for exposed and unexposed groups if applicable.
n/a [this is a descriptive study]
Other analyses #17 Report other analyses done, e.g. analyses of subgroups and interactions, and sensitivity analyses
A10-11,

Discussion
Key results #18 Summarise key results with reference to study objectives
21-23
Limitations #19 Discuss limitations of the study, taking into account sources of potential bias or imprecision. Discuss both direction and magnitude of any potential bias.

21-22
Interpretation #20 Give a cautious overall interpretation considering objectives, limitations, multiplicity of analyses, results from similar studies, and other relevant evidence.