Assessing and predicting adolescent and early adulthood common mental disorders using electronic primary care data: analysis of a prospective cohort study (ALSPAC) in Southwest England

Objectives We aimed to examine agreement between common mental disorders (CMDs) from primary care records and repeated CMD questionnaire data from ALSPAC (the Avon Longitudinal Study of Parents and Children) over adolescence and young adulthood, explore factors affecting CMD identification in primary care records, and construct models predicting ALSPAC-derived CMDs using only primary care data.
Design and setting Prospective cohort study (ALSPAC) in Southwest England with linkage to electronic primary care records.
Participants Primary care records were extracted for 11 807 participants (80% of 14 731 eligible). Between 31% (3633; age 15/16) and 11% (1298; age 21/22) of participants had both primary care and ALSPAC CMD data.
Outcome measures ALSPAC outcome measures were diagnoses of suspected depression and/or CMDs. Primary care outcome measures were Read codes for diagnosis, symptoms and treatment of depression/CMDs. For each time point, sensitivities and specificities for primary care CMD diagnoses were calculated for predicting ALSPAC-derived measures of CMDs, and the factors associated with identification of primary care-based CMDs in those with suspected ALSPAC-derived CMDs were explored. Lasso (least absolute shrinkage and selection operator) models were used at each time point to predict ALSPAC-derived CMDs using only primary care data, with internal validation by randomly splitting data into 60% training and 40% validation samples.
Results Sensitivities for primary care diagnoses were low for CMDs (range: 3.5%–19.1%) and depression (range: 1.6%–34.0%), while specificities were high (nearly all >95%). The strongest predictors of identification in the primary care data for those with ALSPAC-derived CMDs were symptom severity indices. The lasso models had relatively low predictive performance, especially in the validation sample (deviance ratio range: −1.3% to 12.6%), but performance improved with age.
Conclusions Primary care data underestimate CMDs compared to population-based studies. Improving general practitioner identification, and using free-text or secondary care data, are needed to improve the accuracy of models using clinical data.

: Reasons for having ALSPAC clinic/questionnaire common mental disorder (CMD) data, but not primary care linkage data. Note also that the numbers excluded do not always correspond to the numbers with ALSPAC, but not primary care, data; this is because some individuals may have been counted both in the 'lost from records' and 'first entered records' columns (i.e., they first have primary care data within 18 months of the time-point, then were lost from primary care records again within 6 months after the time-point), or have missing data. Numbers in brackets denote the percentage of individuals in each exclusion category of the total participants without primary care data but with ALSPAC data. Most individuals are likely to be lost from primary care records because they moved out of the Bristol area, but it is also possible that they stayed within the Bristol area but moved to a GP practice using a different software system from which it was not possible to extract primary care data. Similarly, individuals may first appear in the primary care data after the time-point because they either moved into the Bristol area at this time, or stayed in the Bristol area but moved to a GP practice from which it was possible to extract primary care data.
Table S6: Comparing ALSPAC participants who possess primary care data at each time-point, split by whether they attended/completed each specific data collection event. The numbers with and without ALSPAC data are provided, as are differences in rates of current depression or common mental disorder (CMD) diagnoses from the primary care records. For participants who did not attend the clinic/complete the questionnaire, the age to define a 'current' diagnosis was based on +/-6 months from the average age at which each clinic/questionnaire was completed. Individuals who have GP data and completed the clinic/questionnaire, but do not have ALSPAC-derived depression/CMD data (as this session was not completed for whatever reason), are not included in the table below. The numbers of these individuals at each time point are: 104 at the age 15/16 TF3 clinic (2.8% of those with both ALSPAC and primary care data); 82 at the age 16/17 CCS questionnaire (2.5% of those with both ALSPAC and primary care data); 437 at the age 17/18 TF4 clinic (12.4% of those with both ALSPAC and primary care data); 20 at the age 18/19 CCT questionnaire (1% of those with both ALSPAC and primary care data); 63 at the age 21/22 YPA questionnaire (4.6% of those with both ALSPAC and primary care data); 43 at the age 22/23 YPB questionnaire (3.1% of those with both ALSPAC and primary care data). For a graphical summary of these results, see figure 1.
Table S7: Raw data comparing depression and common mental disorder (CMD) diagnoses based on the Development and Well-Being Assessment (DAWBA) data from the age 15/16 TF3 clinic against various definitions derived from the primary care data at this age (n=3,663). Note that values with an asterisk have been suppressed for disclosure control purposes as at least one cell has a value < 5. This table also includes sensitivities, specificities, positive predictive values (PPV) and negative predictive values (NPV) for the depression and CMD diagnoses based on the DAWBA data from this clinic. In these analyses we are treating the ALSPAC data as the reference standard.
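As an illustration of how the accuracy statistics in table S7 can be derived from a 2x2 cross-tabulation, the following is a minimal Python sketch. It treats the ALSPAC-derived diagnosis as the reference standard and returns no statistics when any cell is below 5, mirroring the disclosure control rule in the caption. The function name `diagnostic_accuracy`, the `min_cell` parameter and the counts are assumptions made for illustration and are not taken from the study.

```python
from typing import Dict, Optional


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int,
                        min_cell: int = 5) -> Optional[Dict[str, float]]:
    """Accuracy of a primary care definition against the ALSPAC reference.

    tp: ALSPAC case, flagged in primary care
    fn: ALSPAC case, not flagged in primary care
    fp: ALSPAC non-case, flagged in primary care
    tn: ALSPAC non-case, not flagged in primary care

    Returns None when any cell is below `min_cell` (disclosure control).
    """
    if min(tp, fp, fn, tn) < min_cell:
        return None
    return {
        "sensitivity": tp / (tp + fn),  # share of ALSPAC cases identified
        "specificity": tn / (tn + fp),  # share of ALSPAC non-cases not flagged
        "ppv": tp / (tp + fp),          # share of flagged who are ALSPAC cases
        "npv": tn / (tn + fn),          # share of unflagged who are non-cases
    }


# Hypothetical counts for illustration only (not values from table S7).
result = diagnostic_accuracy(tp=10, fp=20, fn=90, tn=880)
suppressed = diagnostic_accuracy(tp=3, fp=20, fn=90, tn=880)
```

With these hypothetical counts the sensitivity is low while the specificity is high, which is the general pattern described in the Results, and the second call is suppressed because one cell is below 5.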
Table S12: Results of the identification in primary care records analysis, based on whether individuals who were diagnosed as having depression or common mental disorders (CMDs) in ALSPAC were also diagnosed based on primary care record data (with primary care diagnosis defined as 'current diagnosis or treatment or symptoms'). Odds ratios are displayed (with 95% confidence intervals). Due to the small sample sizes in some analyses these estimates are rather imprecise, especially regarding the age 15/16 TF3 clinic as very few individuals were classified correctly/diagnosed as depressed/CMD in primary care records. Coefficients are odds ratios derived from univariable logistic regressions and denote the odds of identification relative to the baseline (e.g., for age 17/18 TF4 clinic depression, females have three times greater odds of being identified than males). Note also that when comparing against ALSPAC data (e.g., mother's marital status, parental education, etc.) the sample size for each analysis will vary as the variables come from different data sources, with different levels of completeness. For a graphical illustration of key results, see figure S1.
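For a single binary predictor, the odds ratio from a univariable logistic regression equals the cross-product ratio of the 2x2 table, and the Wald 95% confidence interval on the log-odds scale matches Woolf's formula. The sketch below uses that equivalence to show how an odds ratio of identification and its interval can be computed. The function name `odds_ratio_ci` and the counts are hypothetical, chosen only to give an odds ratio of three, and the function assumes all four cells are non-zero.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: comparison group, identified in primary care
    b: comparison group, not identified
    c: baseline group, identified in primary care
    d: baseline group, not identified

    Assumes every cell is greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Woolf's method).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (not values from table S12).
or_est, ci_low, ci_high = odds_ratio_ci(a=30, b=70, c=10, d=70)
```

Small cell counts inflate the standard error, which is why the intervals for the earliest time points are described as imprecise.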

Figure S1 (following page): Graphical summary of key results of the identification in primary care records analysis, based on whether individuals who were diagnosed as having depression or common mental disorders (CMDs) in ALSPAC were also diagnosed based on primary care record data (with primary care diagnosis defined as 'current diagnosis or treatment or symptoms'). Values for depression are displayed in black, and common mental disorders are in red. Odds ratios and 95% confidence intervals are displayed on the y-axis (on the log scale), with time point along the x-axis (going forwards in time, from the age 15/16 TF3 clinic to the age 22/23 YPB questionnaire). Due to the small sample sizes in some analyses these estimates are rather imprecise, especially regarding the age 15/16 TF3 clinic as very few individuals were classified correctly/diagnosed as depressed/CMD in primary care records. Coefficients are odds ratios derived from univariable logistic regressions and denote the odds of identification relative to the baseline (e.g., for age 17/18 TF4 clinic depression, females have three times greater odds of being identified than males). Note also that when comparing against ALSPAC data (e.g., mother's marital status, parental education, etc.) the sample size for each analysis will vary as the variables come from different data sources, with different levels of completeness. For a full list of results, see table S11.
Table S13: Penalised coefficients from the optimal lasso prediction models, predicting ALSPAC questionnaire-derived depression or common mental disorder (CMD) cases using only linked primary care record data. Coefficients are log-odds estimates. Note that as the estimates are derived from lasso models used for prediction, standard errors are not calculated.
Table S14: Full models to estimate the predicted probability of ALSPAC questionnaire-derived depression or common mental disorders (CMDs) at each timepoint, based on the best-fitting lasso model using only linked primary care record data as predictors. Code is provided in Stata syntax.
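Because the lasso coefficients are log-odds estimates, a predicted probability is obtained by summing the intercept and the coefficient-weighted predictors and applying the inverse logit. The following Python sketch illustrates that calculation only; it is not the Stata code referred to in table S14. The function name, variable names, intercept and coefficient values are all hypothetical and are not taken from the fitted models.

```python
import math
from typing import Dict


def predicted_probability(intercept: float,
                          coefs: Dict[str, float],
                          record: Dict[str, float]) -> float:
    """Inverse-logit of the linear predictor built from log-odds coefficients.

    Predictors missing from `record` are treated as 0 (absent).
    """
    linear_predictor = intercept + sum(
        beta * record.get(name, 0.0) for name, beta in coefs.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical intercept and coefficients (log-odds), for illustration only.
example_intercept = -2.0
example_coefs = {
    "current_depression_diagnosis": 1.2,
    "current_antidepressant_use": 0.8,
}

# A participant with no relevant primary care codes.
p_none = predicted_probability(example_intercept, example_coefs, {})

# A participant with a current diagnosis and current antidepressant use.
p_both = predicted_probability(
    example_intercept,
    example_coefs,
    {"current_depression_diagnosis": 1, "current_antidepressant_use": 1},
)
```

Predictors that the lasso shrinks to exactly zero drop out of the sum, so only the retained variables contribute to the prediction.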
Table S15: In-sample and out-of-sample deviance ratios for common mental disorders (CMDs) or depression at each time-point. Deviance ratios are taken from logistic cross-validation lasso models. The full models are based on the set of all the primary care variables described in table S3. The 'diagnosis only' models contain just the relevant 'current' diagnosis variables (for depression models, this is just current depression diagnosis; for CMD models, this is current depression, anxiety and phobia diagnosis). The 'diagnosis, symptoms and treatment' models contain the relevant 'current' diagnosis, symptoms or treatment variables (for depression models, this is current depression diagnosis, current depressive symptoms and current antidepressant use; for CMD models, this is current depression, anxiety and phobia diagnosis, current depressive and anxiety symptoms, and current antidepressant and anti-anxiety medications).
Table S16: Sensitivities and specificities from the 40% validation sample, assessing how well the lasso model predicts common mental disorders (CMDs) or depression in ALSPAC across each of the timepoints. Sensitivities and specificities are also given for the three 'CMD/depression' definitions using only primary care diagnosis, symptoms and/or treatment data (all using the same 40% validation sample). In these analyses we are treating the ALSPAC data as the reference standard. Note that for disclosure control purposes, statistics calculated where at least one cell in the cross-tabulation has a value <5 have been suppressed and replaced with a '<' or '>' summary statistic.
Table S17: Examining the association between primary care depression and common mental disorder (CMD) diagnoses and having ALSPAC data at each time point. For each time point, we present three sets of analyses: i) unadjusted models exploring the univariable associations between having ALSPAC data and both depression and CMDs; ii) models adjusting for sex; and iii) models adjusting for both sex and maternal education (a proxy for SEP; operationalised as a binary variable with the categories 'CSE/Vocational/O level' vs 'A level/Degree'). All results are odds ratios with 95% confidence intervals in brackets.
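The deviance ratios reported in table S15 compare the deviance of a fitted model with that of an intercept-only (null) model, so a value of 0 means no improvement over the null and a negative value means the model predicts worse than the null in that sample. The Python sketch below illustrates the calculation for a binary outcome. It assumes the null model predicts the outcome prevalence of the sample being evaluated; the exact convention used by the authors' software is not stated in this section, and the function names and data are hypothetical.

```python
import math
from typing import Sequence


def binomial_deviance(y: Sequence[int], p: Sequence[float],
                      eps: float = 1e-12) -> float:
    """Binomial deviance: -2 times the Bernoulli log-likelihood."""
    log_lik = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)  # guard against log(0)
        log_lik += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return -2.0 * log_lik


def deviance_ratio(y: Sequence[int], p: Sequence[float]) -> float:
    """1 - deviance(model) / deviance(null).

    Assumption: the null model predicts the prevalence of `y` in the
    sample being evaluated, and `y` contains both cases and non-cases.
    """
    prevalence = sum(y) / len(y)
    null_deviance = binomial_deviance(y, [prevalence] * len(y))
    return 1.0 - binomial_deviance(y, p) / null_deviance


# Hypothetical outcomes and predictions, for illustration only.
outcomes = [1, 0, 0, 0]
ratio_null = deviance_ratio(outcomes, [0.25, 0.25, 0.25, 0.25])
ratio_good = deviance_ratio(outcomes, [0.9, 0.1, 0.1, 0.1])
ratio_poor = deviance_ratio(outcomes, [0.1, 0.9, 0.9, 0.9])
```

Applying the same function to predictions for the 60% training sample and the 40% validation sample would give in-sample and out-of-sample ratios respectively, under the stated assumption about the null model.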