Smoker, ex-smoker or non-smoker? The validity of routinely recorded smoking status in UK primary care: a cross-sectional study

Objective To investigate how smoking status is recorded in UK primary care; to evaluate whether appropriate multiple imputation (MI) of smoking status yields results consistent with health surveys. Setting UK primary care and a population survey conducted in the community. Participants We identified 354 204 patients aged 16 or over in The Health Improvement Network (THIN) primary care database registered with their general practice 2008–2009 and 15 102 individuals aged 16 or over in the Health Survey for England (HSE). Outcome measures Age-standardised and age-specific proportions of smokers, ex-smokers and non-smokers in THIN and the HSE before and after MI. Using information on time since quitting in the HSE, we estimated when ex-smokers are typically recorded as non-smokers in primary care records. Results In THIN, smoking status was recorded for 84% of patients within 1 year of registration. Of these, 28% were smokers (21% in the HSE). After MI of missing smoking data, the proportion of smokers was 25% (missing at random) and 20% (missing not at random). With increasing age, more were identified as ex-smokers in the HSE than THIN. It appears that those who quit before age 30 were less likely to be recorded as an ex-smoker in primary care than people who quit later. Conclusions Smoking status was relatively well recorded in primary care. Misclassification of ex-smokers as non-smokers is likely to occur in those quitting smoking at an early age and/or a long time ago. Those with no smoking status information are more likely to be ex-smokers or non-smokers than smokers.



STRENGTHS AND LIMITATIONS OF THIS STUDY
[...] England (HSE). Although the proportion of smokers was similar between THIN and the HSE before multiple imputation of data in THIN, the proportion of smokers was substantially higher in THIN after multiple imputation. On the other hand, the proportion of ex-smokers was substantially lower in THIN than in the HSE, both before and after imputation. This suggests that current smokers may be adequately identified using primary care data and that most people with missing data on smoking status are likely to be either ex-smokers or non-smokers. This has clinical importance, as smoking status (including ex-smoking) may be used to [...]

In this study we further investigate the recording of smoking status in primary care and explore potential reasons for the discrepancy in the proportion of ex-smokers between primary care records and the HSE. Specifically, we seek to deduce when ex-smokers may not be recorded as such in primary care records, based on information about time since quitting in the HSE. Finally, we aim to provide a practical solution for imputation of missing smoking status records in routinely collected clinical data.

Study populations
We used data from the THIN primary care database.

Definition of smoking status
In THIN, smoking status was recorded by self-report. In many general practices this was based on a questionnaire completed at the time of registration, whereas in others it was recorded during a clinical consultation with the general practitioner or practice nurse. Patients were classed as current non-smokers or current smokers. In some instances non-smokers were further classified as ex-smokers, but this was defined variably from one practice to another. In the HSE, smoking status was defined on the basis of a series of questions (see Appendix 1), and individuals who had ever smoked but did not smoke at the time of the interview were defined as ex-smokers, regardless of their age at quitting and the length of time since they quit.
The HSE holds information on when ex-smokers quit so that age at the time they quit can be deduced, whereas this information was not consistently available in THIN.

Statistical analyses
Initially, we examined smoking status (smoker, ex-smoker, non-smoker or missing) in THIN and the HSE, overall and by age group, gender and Index of Multiple Deprivation (IMD) quintile. [...]
After preliminary analysis,[26] we included the following variables in the multiple imputation models: age in years, gender and IMD quintile;[25] health indicators, namely smoking status (three categories: non-smoker, ex-smoker and current smoker), height, weight, and systolic and diastolic blood pressure; and disease indicators, namely type II diabetes, coronary heart disease (CHD) and cerebrovascular accident (CVA). Multiple imputation by chained equations was performed with the ice command in Stata 11.[29, 30] Continuous variables were imputed using multiple linear regression, smoking status using multinomial regression and IMD quintile using ordered logistic regression. Percentages in each smoking category were combined across imputed datasets using Rubin's rules.[31] In the first multiple imputation we assumed that smoking data were missing at random (MAR) and hence allowed missing smoking status to be imputed as smoker, ex-smoker or non-smoker (hereafter referred to as MAR MI). In the second multiple imputation we assumed that all smokers had been recorded, so that smoking data were missing not at random (MNAR), and we imputed missing smoking status as either ex-smoker or non-smoker (hereafter referred to as MNAR MI).
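Rubin's rules combine the estimate from each imputed dataset into one point estimate and one variance. The following is a minimal Python sketch of those rules for a single percentage, assuming each imputed dataset yields a proportion and its squared standard error. The function names and the example numbers are hypothetical, and this is not the Stata ice/mi code used in the analysis.

```python
from statistics import mean, variance


def rubins_rules(estimates, within_variances):
    """Pool one estimate (e.g. a proportion of smokers) across m imputed datasets.

    estimates:        point estimate from each completed dataset
    within_variances: squared standard error from each completed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need >= 2 imputations and one variance per estimate")
    q_bar = mean(estimates)          # pooled point estimate
    w_bar = mean(within_variances)   # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (denominator m - 1)
    t = w_bar + (1 + 1 / m) * b      # Rubin's total variance
    return q_bar, t


def proportion_se_squared(p, n):
    """Binomial sampling variance of a proportion p estimated from n people."""
    return p * (1 - p) / n


# Hypothetical example: proportion of smokers in three imputed copies of 1,000 patients.
props = [0.24, 0.26, 0.25]
pooled, total_var = rubins_rules(props, [proportion_se_squared(p, 1000) for p in props])
```

The between-imputation term is what distinguishes this from simply averaging: the more the imputed datasets disagree, the wider the pooled interval.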
Following multiple imputation, we carried out direct age standardisation using the HSE as the standard population and the age-specific proportions in each smoking category from THIN. This accounted for the fact that the mean age in the HSE was 49 years, whereas the mean age in THIN was 38 years in the year after registration. We deduced the average time after which an ex-smoker is no longer classified as an ex-smoker in primary care records by combining information from the HSE on when ex-smokers quit with the age-specific distribution of ex-smokers in THIN after imputation of non-smokers and ex-smokers. To do this, we ranked individuals in the HSE by length of time since they quit within 10-year age groups, and then 'reclassified' those who had quit the longest time ago within each age group from ex-smoker to non-smoker until the proportion of ex-smokers in the HSE matched that in THIN.
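The reclassification step can be sketched as follows. This is a hedged Python illustration under stated assumptions: each HSE respondent has a 10-year age group, a status and, for ex-smokers, years since quitting; the target proportions come from THIN after MNAR MI; the class and function names are hypothetical; and ties and survey weights are ignored, which the original analysis may have handled differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Respondent:
    age_group: str                            # 10-year band, e.g. "40-49"
    status: str                               # "smoker", "ex-smoker" or "non-smoker"
    years_since_quit: Optional[float] = None  # required for ex-smokers


def reclassify_to_target(respondents: List[Respondent],
                         target_ex_prop: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Within each age group, relabel the ex-smokers who quit longest ago as
    non-smokers until the group's ex-smoker proportion equals the target.
    Mutates `respondents`.

    Returns age group -> shortest time since quitting among those relabelled,
    i.e. the approximate cut-off beyond which ex-smokers are recorded as
    non-smokers (None if no relabelling was needed).
    """
    groups: Dict[str, List[Respondent]] = {}
    for r in respondents:
        groups.setdefault(r.age_group, []).append(r)

    cutoffs: Dict[str, Optional[float]] = {}
    for group, members in groups.items():
        if group not in target_ex_prop:
            cutoffs[group] = None  # no target for this group: leave it unchanged
            continue
        ex = [r for r in members if r.status == "ex-smoker"]
        keep = round(target_ex_prop[group] * len(members))  # ex-smokers allowed to remain
        excess = max(len(ex) - keep, 0)
        ex.sort(key=lambda r: r.years_since_quit, reverse=True)  # longest ago first
        for r in ex[:excess]:
            r.status = "non-smoker"
        cutoffs[group] = ex[excess - 1].years_since_quit if excess else None
    return cutoffs


# Hypothetical age group: 4 ex-smokers and 6 non-smokers, target of 20% ex-smokers.
people = [Respondent("40-49", "ex-smoker", y) for y in (30, 25, 10, 5)]
people += [Respondent("40-49", "non-smoker") for _ in range(6)]
cutoffs = reclassify_to_target(people, {"40-49": 0.2})
```

Here the two longest-ago quitters are relabelled, so the cut-off for this group is 25 years since quitting.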

RESULTS
In total, 354,204 individuals were included from THIN and 15,102 from the HSE.
Individuals in THIN were, on average, 11 years younger than those in the HSE (38 versus 49 years) (Table 1). Smoking status was recorded for 84% of patients in THIN within one year of initial registration. Before multiple imputation of missing data, a greater proportion of people were recorded as smokers in THIN than in the HSE (24% versus 21%), and the proportions of ex-smokers and non-smokers differed substantially between THIN and the HSE (Table 1). Our first analyses treated missing as a separate category of smoking status, so we refer to those with a recorded smoking status as "known smokers", "known ex-smokers" and "known non-smokers". The proportion of known smokers by age group was similar in THIN and the HSE between 30 and 79 years, but this was not the case for the proportions of known ex-smokers and non-smokers (Figure 1). In the HSE, the proportion of ex-smokers increased from 12% in the 20-29 age group to 46% in the 80-89 age group. In THIN, the proportion of known ex-smokers also increased with age, although it was smaller than in the HSE for all age groups above 20-29 years. Conversely, in the HSE, the proportion of non-smokers decreased slightly from 56% in the 20-29 age group to 48% in the 80-89 age group, whereas in THIN the proportion of known non-smokers remained roughly constant with increasing age, at around 43%. The proportion of missing smoking data in THIN was relatively constant at less than [...]
In THIN, the percentage of non-smokers was greater for women (52%) than men (40%), while the percentage of smokers was smaller for women (21%) than men (27%). There were similar trends in the HSE, although the differences between the sexes were smaller (smokers: 22% of men versus 20% of women).
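The abstract reports smokers as 28% of patients with a recorded status, whereas Table 1 compares 24% with the HSE. Assuming (our reading, not stated explicitly in the text) that the Table 1 figure keeps missing as a separate category, so that its denominator is all patients, the two figures are consistent, as this small Python check suggests. The function name is hypothetical.

```python
def overall_share(recorded_fraction, share_among_recorded):
    """Share of all patients in a category when patients with missing status
    stay in the denominator (missing kept as a separate category)."""
    return recorded_fraction * share_among_recorded


smokers_all = overall_share(0.84, 0.28)  # 84% recorded, 28% of those are smokers
print(f"{smokers_all:.1%}")  # 23.5%
```

About 23.5% of all patients, which rounds to the 24% reported.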
The proportions in each smoking status category varied substantially by social deprivation in both THIN and the HSE (Figure 2). In THIN, the percentage of non-smokers decreased from 52% in the least deprived quintile to 40% in the most deprived quintile. The percentage of ex-smokers decreased slightly with increasing deprivation. In contrast, the percentage of smokers increased with increasing deprivation from 16% in the least deprived quintile to 34% in the most deprived quintile (Figure 2). The patterns were similar in the HSE although the proportion of ex-smokers was substantially larger across all levels of deprivation in the HSE compared to THIN.

Analyses imputing missing smoking status
After MAR MI of THIN, age-standardised smoking prevalences still differed somewhat between THIN and the HSE. For example, 22% were ex-smokers in THIN compared with 26% in the HSE; 25% were smokers in THIN, compared with 21% in the HSE (Table 2).
After MNAR MI of THIN (that is, regarding missing values as either ex-smokers or non-smokers), the age-standardised prevalence of smoking in THIN was similar to that in the HSE (Table 2). However, the age-specific prevalence of ex-smokers was still greater in the HSE than in THIN. Age-specific analysis showed that this difference was greatest at older ages, and indeed reversed at younger ages (Figure 3). This suggested that individuals who had quit in the more distant past might be classified as non-smokers in THIN but as ex-smokers in the HSE. The median time since ex-smokers quit in the HSE varied greatly by age group, from two years (interquartile range (IQR): 0, 3) in the under 20s to 40 years (IQR: 25, 51) in those aged 90 or over (Table 3). Equating the proportions of ex-smokers in THIN and the HSE suggested that the typical time window after which patients are no longer regarded as ex-smokers in primary care, but instead as non-smokers, varied with age. Thus, individuals who registered with a general practice in their forties would typically no longer be recorded as ex-smokers if they quit more than 22 years earlier (when they were between 18 and 27 years of age) (Table 3). Individuals registering in their seventies would typically no longer be recorded as ex-smokers if they quit 42 years earlier (when they were between 28 and 37 years of age) (Table 3). Yet most individuals who quit after the age of 30 would still be captured as ex-smokers when they later registered with a new general practice. When these age-specific extrapolations are used to reclassify ex-smokers in the HSE as non-smokers according to when they quit, the age-specific distributions of ex-smokers in THIN and the reclassified HSE are similar (Figure 3).
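The ages at quitting quoted in parentheses follow from subtracting the time-since-quitting cut-off from each end of the 10-year registration age band. A minimal Python sketch of that arithmetic, with a hypothetical function name:

```python
def quit_age_range(registration_band, years_since_quit):
    """Ages at which people registering within `registration_band` (lo, hi)
    would have quit, if they quit `years_since_quit` years before registering."""
    lo, hi = registration_band
    return lo - years_since_quit, hi - years_since_quit


print(quit_age_range((40, 49), 22))  # (18, 27): registering in their forties
print(quit_age_range((70, 79), 42))  # (28, 37): registering in their seventies
```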

DISCUSSION
The proportion of newly registered patients in THIN between 2008 and 2009 with a record of being a smoker was slightly higher than in the HSE in 2008. However, the proportions of individuals recorded as ex-smokers and non-smokers differed substantially between THIN and the HSE. Overall, a larger proportion of individuals were recorded as ex-smokers in the HSE than in THIN, and this difference increased with age. Likewise, the proportion of ex-smokers was substantially larger across all levels of deprivation in the HSE compared with THIN.
Under MAR MI, there was a greater percentage of smokers (25%) and a smaller percentage of ex-smokers (22%) in THIN than in the HSE (smokers 21%, ex-smokers 26%). However, MNAR MI (assuming all missing data were either ex-smokers or non-smokers) gave a slightly greater proportion of non-smokers in THIN (57%) than in the HSE (53%), whereas the proportion of ex-smokers (23%) was slightly lower in THIN. Moreover, the latter imputation resulted in a relatively larger percentage of ex-smokers in THIN than in the HSE among those aged under 30 years. This may be because the imputation model was unable to distinguish between ex-smokers and non-smokers in these age groups, as neither is likely to have developed the later-onset diseases that are key predictors of smoking status in the imputation model.
There may be several reasons for the discrepancy in the distribution of smoking categories between THIN and the HSE. In the HSE, the definition of an ex-smoker was highly sensitive and clearly defined.[24] Thus respondents were categorised as ex-smokers even if they had smoked only trivially, had smoked for a short period of time and/or had quit many decades ago. Also, the HSE used computer-assisted personal interviewing, in which questions were read to the respondent from the screen in a standardised way and a detailed sequence of questions was asked to ascertain current smoking status. In primary care, although smoking status is systematically recorded in medical records, there is no detailed protocol for recording it, and ascertainment is thus likely to vary with how the information was obtained. Many practices use self-report questionnaires at registration that include smoking status. Smoking status is then updated by health professionals (general practitioners and/or practice nurses) during consultations, where it is often recorded as part of an assessment of current or future disease risk.
Our examination of the age-standardised data suggests that an ex-smoker is typically recorded as a non-smoker in primary care when they quit at a young age or had not smoked for a substantial time. This could be because patients may not volunteer previous smoking, either in the initial self-report questionnaire or on questioning by clinicians, when it was minor, long ago or considered irrelevant to their current or future health. It is also possible that patients are more reluctant to volunteer past smoking when the data are held in their medical record and are not anonymous.
However, when we compared the proportion of individuals recorded as smokers in THIN with that in the HSE, we found a similar distribution, suggesting that most smokers were identified in the [...] The method of age standardisation, followed by deducing the average time since quitting and reclassifying ex-smokers in the HSE as non-smokers, is relatively crude and assumes that everyone who becomes an ex-smoker does so at the same point in their life as others in their age group. However, it may be indicative of how smoking status is reported at the GP practice, given the results shown in this study.
An alternative way of dealing with unobserved smoking data is to dichotomise smoking status into current smokers and non-current smokers, with missing data assumed to be non-current smokers. However, this solution may be to the detriment of some epidemiological studies, because ex-smokers who quit recently are at greater risk of disease than non-smokers. For example, the 50-year follow-up of male British doctors showed that ex-smokers had elevated age-standardised mortality rates for many diseases.[37,38]
Our findings suggest that, in contrast to health surveys, patients who quit smoking at a young age (before about 25-30 years) are likely to be recorded by their general practice as non-smokers rather than ex-smokers. This has implications for researchers using these data sources. To our knowledge, this is the first study to deduce and quantify the typical time between when a smoker quits and when they are no longer perceived as an ex-smoker in primary care. Clinicians, policy-makers and researchers who wish to use smoking status in primary care records to identify populations at risk of smoking-related diseases can be reassured by our findings that, using data from new registrations, most current smokers will be identified, and that misclassification of ex-smokers is more likely to have occurred in those who quit smoking at an early age and/or a long time ago.
Figure 3: Age-group-specific percentages of ex-smokers in THIN (after MNAR imputation) and the HSE 2008 (before and after reclassifying ex-smokers in the HSE who quit before the age specified in Table 3 column 3 as non-smokers)

Author contributions
LM extracted and analysed the data and wrote the first draft of the paper with help from IP and JRC. KRW and IN provided clinical input and IRW and RWM provided additional statistical input. All authors commented on the paper and helped write subsequent drafts.

Data sharing statement
No data are available.
I, Dr Louise Marston, the Corresponding Author of this article contained within the original manuscript, which includes any diagrams & photographs, other illustrative material, video, film or any other material howsoever submitted by the Contributor(s) at any time and related to the Contribution ("the Contribution"), have the right to grant on behalf of all authors and do grant on behalf of all authors, a licence to the BMJ Publishing Group Ltd and its licensees, to permit this Contribution (if accepted) to be published in BMJ Open and any other BMJ Group products and to exploit all subsidiary rights, as set out in the licence at: (http://group.bmj.com/products/journals/instructions-forauthors/BMJOpen_licence.pdf)



Study populations
We used data from the THIN primary care database, from practices in England that had passed data quality checks to ensure they were using their computer [...] smoking (99.3%), and we therefore used the data from patients with complete smoking information.

Definition of smoking status
In THIN, smoking status was recorded by self-report. In many general practices this would be on the basis of a questionnaire submitted at the time of registration, whereas in other general practices this would be recorded in conjunction with a clinical consultation with the general practitioner or practice nurse. GPs and nurses may be more interested in the separation between current non-smokers and smokers, thus the non-smoking categories may include some people who are never smokers as well as some who are ex-smokers in primary care records.
In THIN, we extracted smoking status data either using Read codes,[30] which were classified into non-smoker, ex-smoker and smoker with clinical input, or using the categorisation (non-smoker, ex-smoker or current smoker) provided in the Additional Health Data. In the HSE, smoking status was defined on the basis of a series of questions (see Appendix 1), and individuals who had ever smoked but did not smoke at the time of the interview were defined as ex-smokers, regardless of their age at quitting and the length of time since they quit. The HSE holds information on when ex-smokers quit, so that age at quitting can be deduced, whereas this information was not consistently available in THIN. In the first multiple imputation we assumed that smoking data were MAR and hence allowed missing smoking status to be imputed as smoker, non-smoker or ex-smoker (hereafter referred to as MAR MI). In the second multiple imputation we assumed that all smokers had been recorded, so that smoking data were MNAR, and we imputed missing smoking status as either ex-smoker or non-smoker (hereafter referred to as MNAR MI).
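The difference between the two imputation assumptions can be illustrated at the level of a single patient's draw. The Python sketch below is a simplification under stated assumptions: it takes predicted category probabilities as given (for example from a multinomial model), expresses the MNAR restriction by dropping the smoker category and renormalising the rest, and omits the drawing of model parameters that proper multiple imputation also performs. The names `draw_status` and `CATEGORIES` are hypothetical, and the way the restriction was actually implemented in the Stata analysis is not described here.

```python
import random

CATEGORIES = ("non-smoker", "ex-smoker", "smoker")


def draw_status(probs, rng, allow_smoker=True):
    """Draw one imputed smoking status from predicted category probabilities.

    allow_smoker=True  -> MAR-style draw over all three categories.
    allow_smoker=False -> MNAR-style draw: all smokers are assumed recorded,
                          so 'smoker' is excluded and the rest renormalised.
    """
    cats = [c for c in CATEGORIES if allow_smoker or c != "smoker"]
    weights = [probs[c] for c in cats]
    total = sum(weights)
    if total <= 0:
        raise ValueError("no probability mass on the allowed categories")
    return rng.choices(cats, weights=[w / total for w in weights], k=1)[0]


rng = random.Random(2008)  # fixed seed so the draws are reproducible
p = {"non-smoker": 0.5, "ex-smoker": 0.2, "smoker": 0.3}  # hypothetical predictions
mar_draws = [draw_status(p, rng) for _ in range(5)]
mnar_draws = [draw_status(p, rng, allow_smoker=False) for _ in range(5)]
```

Under the MNAR-style draw, a patient with missing data can never be imputed as a smoker, which is why this assumption lowers the imputed smoking prevalence relative to MAR MI.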

Statistical analyses
Following multiple imputation we carried out age-specific direct standardisation using the HSE as the standard population and the age-specific proportion in each smoking category from THIN. This was done to account for the fact that the mean age in the HSE was 49 years while the mean age in THIN was 38 years in the year after registration.
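Direct standardisation weights each THIN age-specific proportion by the share of the HSE population in that age group, so that the younger age profile of THIN no longer drives the overall figure. A minimal Python sketch, assuming age-specific proportions and standard-population counts are already available; the function name, age bands and numbers are hypothetical and for illustration only.

```python
def direct_standardise(study_props, standard_counts):
    """Directly age-standardised proportion.

    study_props:     age group -> proportion in one smoking category in the
                     study population (here THIN)
    standard_counts: age group -> number of people in the standard
                     population (here the HSE)
    """
    missing = set(standard_counts) - set(study_props)
    if missing:
        raise KeyError(f"no study proportion for age groups: {sorted(missing)}")
    total = sum(standard_counts.values())
    # Weighted average of age-specific proportions, weights = standard age distribution.
    return sum(study_props[g] * n / total for g, n in standard_counts.items())


# Hypothetical numbers, for illustration only.
thin_smokers = {"16-39": 0.30, "40-64": 0.20, "65+": 0.10}
hse_counts = {"16-39": 300, "40-64": 500, "65+": 200}
std = direct_standardise(thin_smokers, hse_counts)
```

With these made-up inputs the standardised proportion is 0.3 × 0.3 + 0.2 × 0.5 + 0.1 × 0.2 = 0.21.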
We deduced the average time after which an ex-smoker is no longer classified as an ex-smoker in primary care records by combining information from the HSE on when ex-smokers quit with the age-specific distribution of ex-smokers in THIN after imputation of non-smokers and ex-smokers. To do this, we ranked individuals in the HSE by length of time since they quit within 10-year age groups, and then 'reclassified' those who had quit the longest time ago within each age group from ex-smoker to non-smoker until the proportion of ex-smokers in the HSE matched that in THIN. By doing this, we were able to estimate the average time that elapses from quitting smoking after which true ex-smokers are recorded as non-smokers in primary care records.
Our first analyses treated missing as a separate category of smoking status, so we refer to those with a recorded smoking status as "known smokers", "known ex-smokers" and "known non-smokers". The proportion of known smokers by age group was similar in THIN and the HSE between 30 and 79 years, but this was not the case for the proportions of known ex-smokers and non-smokers (Figure 1). In the HSE, the proportion of ex-smokers increased from 12% in the 20-29 age group to 46% in the 80-89 age group. In THIN, the proportion of known ex-smokers also increased with age, although it was smaller than in the HSE for all age groups above 20-29 years. Conversely, in the HSE, the proportion of non-smokers decreased slightly from 56% in the 20-29 age group to 48% in the 80-89 age group.

Analyses imputing missing smoking status
After MAR MI of THIN, age-standardised smoking prevalences still differed somewhat between THIN and the HSE. For example, 22% were ex-smokers in THIN compared with 26% in the HSE; 25% were smokers in THIN, compared with 21% in the HSE (Table 2).
After MNAR MI of THIN (that is, specifying that missing values are either ex-smokers or non-smokers), the age-standardised prevalence of smoking in THIN was similar to that in the HSE (Table 2). However, the age-specific prevalence of ex-smokers was still greater in the HSE than in THIN. Age-specific analysis showed that this difference was greatest at older ages, and indeed reversed at younger ages (Figure 3). This suggested that individuals who had quit in the more distant past might be classified as non-smokers in THIN but as ex-smokers in the HSE. The median time since ex-smokers quit in the HSE varied greatly by age group (Table 3). Equating the proportions of ex-smokers in THIN and the HSE suggested that the typical time window after which patients are no longer regarded as ex-smokers in primary care, but instead as non-smokers, varied with age. Thus, individuals who registered with a general practice in their forties would typically no longer be recorded as ex-smokers if they quit more than 22 years earlier (when they were between 18 and 27 years of age) (Table 3). Individuals registering in their seventies would typically no longer be recorded as ex-smokers if they quit 42 years earlier (when they were between 28 and 37 years of age) (Table 3). Yet most individuals who quit after the age of 30 would still be captured as ex-smokers when they later registered with a new general practice. When these age-specific extrapolations are used to reclassify ex-smokers in the HSE as non-smokers according to when they quit, the age-specific distributions of ex-smokers in THIN and the reclassified HSE are similar (Figure 3).
There may be several reasons for the discrepancy in the distribution of smoking categories between THIN and the HSE. In the HSE, the definition of an ex-smoker was highly sensitive and clearly defined.[29] Thus respondents were categorised as ex-smokers even if they had smoked only trivially, had smoked for a short period of time and/or had quit many decades ago. Also, the HSE used computer-assisted personal interviewing, in which questions were read to the respondent from the screen in a standardised way and a detailed sequence of questions was asked to ascertain current smoking status. In primary care, although smoking status is systematically recorded in medical records, there is no detailed protocol for recording it, and ascertainment is thus likely to vary with how the information was obtained. Many practices use self-report questionnaires at registration that include smoking status. Smoking status is then updated by health professionals (general practitioners and/or practice nurses) during consultations, where it is often recorded as part of an assessment of current or future disease risk.
Our examination of the age-standardised data suggests that an ex-smoker is typically recorded as a non-smoker in primary care when they quit at a young age or had not smoked for a substantial time. This could be because patients may not volunteer previous smoking, either in the initial self-report questionnaire or on questioning by clinicians, when it was minor, long ago or considered irrelevant to their current or future health. It is also possible that patients are more reluctant to volunteer past smoking when the data are held in their medical record and are not anonymous.
The method of age standardisation, followed by deducing the average time since quitting and reclassifying ex-smokers in the HSE as non-smokers, is relatively crude and assumes that everyone who becomes an ex-smoker does so at the same point in their life as others in their age group. However, it is likely to be indicative of how smoking status is reported at the GP practice, given the results shown in this study.
An alternative way of dealing with unobserved smoking data is to dichotomise smoking status into current smokers and non-current smokers, with missing data assumed to be non-current smokers. However, this solution may be to the detriment of some epidemiological studies, because ex-smokers who quit recently are at greater risk of disease than non-smokers. For example, the 50-year follow-up of male British doctors showed that ex-smokers had elevated age-standardised mortality rates for many diseases.[41,42]
Our findings suggest that, in contrast to health surveys, patients who quit smoking at a young age (before about 25-30 years) are likely to be recorded by their general practice as non-smokers rather than ex-smokers. This has implications for researchers using these data sources. To our knowledge, this is the first study to deduce and quantify the typical time between when a smoker quits and when they are no longer perceived as an ex-smoker in primary care. Clinicians, policy-makers and researchers who wish to use smoking status in primary care records to identify populations at risk of smoking-related diseases can be reassured by our findings that, using data from new registrations, most current smokers will be identified, and that misclassification of ex-smokers is more likely to have occurred in those who quit smoking at an early age and/or a long time ago.
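The dichotomisation alternative described above can be written as a simple mapping. This Python sketch assumes statuses are coded as the strings "smoker", "ex-smoker" and "non-smoker", with `None` for missing; the function name is hypothetical. It makes the cost explicit: ex-smokers and missing records are merged into the non-current group.

```python
from typing import Optional


def current_smoker_flag(status: Optional[str]) -> bool:
    """Dichotomise smoking status into current vs non-current smoker.

    Missing (None) is treated as non-current. Ex-smokers and non-smokers are
    merged, so any excess risk carried by recent quitters is lost.
    """
    if status is None:
        return False
    if status not in {"smoker", "ex-smoker", "non-smoker"}:
        raise ValueError(f"unexpected status: {status!r}")
    return status == "smoker"


records = ["smoker", None, "ex-smoker", "non-smoker", "smoker"]
flags = [current_smoker_flag(s) for s in records]
print(flags)  # [True, False, False, False, True]
```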


Data sharing statement
No additional data available.

Definition of smoking status
In THIN, smoking status was recorded by self-report. In many general practices this would be on the basis of a questionnaire submitted at the time of registration, whereas in other general practices this would be recorded in conjunction with a clinical consultation with the general practitioner or practice nurse. GPs and nurses may be more interested in the separation between current non-smokers and smokers, thus the non-smoking categories may include some people who are never smokers as well as some who are ex-smokers in primary care records.
Patients were classed as current non-smokers or current smokers. In some instances non-smokers were further classified as ex-smokers, but this was defined variably from one practice to another. In THIN, we extracted smoking status data either using Read codes,[30] which were classified into non-smoker, ex-smoker and smoker with clinical input, or using the categorisation (non-smoker, ex-smoker or current smoker) provided in the Additional Health Data. In the HSE, smoking status was defined on the basis of a series of questions (see Appendix 1), and individuals who had ever smoked but did not smoke at the time of the interview were defined as ex-smokers, regardless of their age at quitting and the length of time since they quit. The HSE holds information on when ex-smokers quit, so that age at quitting can be deduced, whereas this information was not consistently available in THIN.
[3137] In the first multiple imputation we assumed that smoking data were MAR and hence allowed missing smoking status to be imputed as smoker, non-smoker or ex-smoker (hereafter referred to as MAR MI). In the second multiple imputation we assumed that all smokers had been recorded, so that smoking data were MNAR, and we imputed missing smoking status as either ex-smoker or non-smoker (hereafter referred to as MNAR MI).
Following multiple imputation we carried out age-specific direct standardisation using the HSE as the standard population and the age-specific proportion in each smoking category from THIN. This was done to account for the fact that the mean age in the HSE was 49 years while the mean age in THIN was 38 years in the year after registration.

BMJ Open
year age groups and then 'reclassifying' individuals who had quit the longest time ago within each age group from ex-smoker to non-smoker. We continued until the proportion of ex-smokers in the HSE matched that in THIN. In this way we could estimate the average time after quitting beyond which true ex-smokers are recorded as non-smokers in primary care records.

Our first analyses treated missing smoking status as a separate category, so we refer to those with recorded smoking status as "known smokers" and "known ex-smokers". The proportion of known smokers by age group was similar in THIN and the HSE between 30 and 79 years. This was not the case for the proportions of known ex-smokers and non-smokers (Figure 1). In the HSE, the proportion of ex-smokers increased from 12% in the 20-29 age group to 46% in the 80-89 age group. In THIN, the proportion of known ex-smokers also increased with age, although it was smaller than in the HSE for all age groups above 20-29 years. Conversely, in the HSE, the proportion of non-smokers decreased slightly from 56% in the 20-29

Analyses imputing missing smoking status
After MAR MI of THIN, age-standardised smoking prevalences still differed somewhat between THIN and the HSE. For example, 22% were ex-smokers in THIN compared with 26% in the HSE; 25% were smokers in THIN, compared with 21% in the HSE (Table 2).
After MNAR MI of THIN (that is, specifying that missing values are either ex-smokers or non-smokers), the age-standardised prevalence of smoking in THIN was similar to that in the HSE (Table 2). However, the age-specific prevalence of ex-smokers was still greater in the HSE than in THIN. This difference was greatest at older ages and was reversed at younger ages. This suggests that individuals who had quit in the more distant past might be classified as non-smokers in THIN but as ex-smokers in the HSE (Figure 3). The median time since ex-smokers quit in the HSE varied greatly by age group (Table 3). We equated the proportion of ex-smokers in the HSE with that in THIN. This suggested that the typical time window varied with age, after which patients are no longer regarded as ex-smokers in primary care but as non-smokers instead. Thus, individuals who registered with a general practice in their forties would typically no longer be recorded as ex-smokers if they had quit more than 22 years earlier (when they were between 18 and 27 years of age) (Table 3). Individuals registering in their seventies would typically no longer be recorded as ex-smokers if they had quit 42 years earlier (when they were between 28 and 37 years of age) (Table 3). Yet most individuals who quit after the age of 30 would still be captured as ex-smokers when they later registered with a new general practice. We used these age-specific extrapolations to reclassify ex-smokers in the HSE as non-smokers according to when they quit. After this reclassification, the age-specific distributions of ex-smokers in THIN and in the reclassified HSE were similar (Figure 3).
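The reclassification step can be expressed as a short procedure for a single age group. First, sort the HSE ex-smokers by time since quitting, longest first. Next, reclassify them as non-smokers until the ex-smoker proportion falls to the THIN proportion. Finally, read off the shortest time since quitting among those reclassified as an estimate of the recording time window. The sketch below is a minimal version under stated assumptions. It uses unweighted counts, ignoring HSE survey weights. It uses Python's `round` to convert the target proportion to a count and breaks ties arbitrarily. The function name `reclassify_ex_smokers` and the example data are hypothetical.

```python
def reclassify_ex_smokers(years_since_quit, n_respondents, target_ex_prop):
    """Reclassify the longest-ago quitters in one age group as non-smokers.

    years_since_quit: years since quitting for each ex-smoker in the age group (HSE)
    n_respondents: all respondents in the age group, whatever their smoking status
    target_ex_prop: proportion of ex-smokers to match (e.g. THIN, same age group)

    Returns (number reclassified, shortest time since quitting among those
    reclassified), or (0, None) if the proportion is already at or below target.
    """
    longest_ago_first = sorted(years_since_quit, reverse=True)
    # Assumption: the target proportion is converted to a whole number of people by rounding
    target_count = round(target_ex_prop * n_respondents)
    n_reclassified = max(0, len(longest_ago_first) - target_count)
    if n_reclassified == 0:
        return 0, None
    # The last person reclassified has the shortest time since quitting among them
    return n_reclassified, longest_ago_first[n_reclassified - 1]


# Invented example: 10 ex-smokers among 40 respondents, THIN-like target of 10%
years = [5, 12, 20, 30, 38, 42, 45, 50, 55, 60]
print(reclassify_ex_smokers(years, 40, 0.10))  # prints (6, 38)
```

Here 4 of the 40 respondents may remain ex-smokers, so the 6 longest-ago quitters are reclassified. The resulting threshold of 38 years plays the role of the age-specific time windows reported in Table 3. Subtracting such a window from age at registration gives the corresponding age at quitting.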
There may be several reasons for the discrepancy in the distribution of smoking categories between THIN and the HSE. In the HSE, the definition of an ex-smoker was clearly specified and highly sensitive.
Thus respondents were categorised as ex-smokers even if they had smoked only trivially, had smoked for a short period and/or had quit many decades ago. In addition, the HSE used computer-aided personal interviewing. Questions were read to the respondent from a screen in a standardised way, and a detailed sequence of questions was asked to ascertain current smoking status. In primary care, smoking status is systematically recorded in medical records, but there is no detailed protocol for recording it. Ascertainment is therefore likely to vary with how the information was obtained.