Objectives External validity, or generalisability, is the measure of how well results from a study pertain to individuals in the target population. We assessed generalisability, with respect to socioeconomic status, of estimates from a matched case–control study of 13-valent pneumococcal conjugate vaccine effectiveness for the prevention of invasive pneumococcal disease in children in the USA.
Design Matched case–control study.
Setting Thirteen active surveillance sites for invasive pneumococcal disease in the USA.
Participants Cases were identified from active surveillance and controls were age and zip code matched.
Outcome measures Socioeconomic status was assessed at the individual level via parent interview (for enrolled individuals only) and birth certificate data (for both enrolled and unenrolled individuals) and at the neighbourhood level by geocoding to the census tract (for both enrolled and unenrolled individuals). Prediction models were used to determine if socioeconomic status was associated with enrolment.
Results We enrolled 54.6% of 1211 eligible cases and found a trend toward enrolled cases being more affluent than unenrolled cases. Enrolled cases were slightly more likely to have private insurance at birth (p=0.08) and have mothers with at least some college education (p<0.01). Enrolled cases also tended to come from more affluent census tracts. Despite these differences, our best predictive model for enrolment yielded a concordance statistic of only 0.703, indicating mediocre predictive value. Variables retained in the final model were assessed for effect measure modification, and none were found to be significant modifiers of vaccine effectiveness.
Conclusions We conclude that although enrolled cases are somewhat more affluent than unenrolled cases, our estimates are externally valid with respect to socioeconomic status. Our analysis provides evidence that this study design can yield valid estimates and that assessing the generalisability of observational data is feasible, even when unenrolled individuals cannot be contacted.
- socioeconomic status
- vaccine effectiveness
- external validity
- matched case control
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
Strengths included population-based surveillance data and multiple data sources for estimating socioeconomic status (ie, case report form, parent interview and geocoding) at multiple levels (individual and neighbourhood).
This was the first study to look at generalisability with respect to socioeconomic status in a vaccine effectiveness study.
Weaknesses included lack of access to parent interview variables for unenrolled cases, potentially insufficient power to identify effect measure modification and the absence of nationwide surveillance against which to compare results.
External validity, or generalisability, is the measure of how well results from a study pertain to individuals in the target population.1–3 Studies may fail to be generalisable to a target population when both of the following criteria are met: (1) selection/enrolment into the study is differential with respect to variable X (ie, the study population is not representative of the source population with respect to variable X, which is usually a potential confounder) and (2) variable X modifies the exposure–outcome relationship under study.1 4
Increased external validity can be a major benefit of observational studies versus randomised controlled trials (RCTs).2–9 Unlike in RCTs, where strict exclusion criteria are often used, observational studies usually enrol a broader subset of the population.3 This may be especially true in case–control studies, in which researchers commonly attempt to enrol every case of disease in a given surveillance population (eg, geographic area or hospital system).10 With the assumption that the study source population (the surveillance area) does not vary in a meaningful way from the target population (eg, an entire country), then results from observational studies may be more generalisable than those from RCTs.10
Unfortunately, due to issues related to tracking, contacting and enrolling participants, generalisability in observational studies may not be achieved, even when a priori exclusion criteria are minimal. In particular, the study population may differ from the source population by socioeconomic status (SES), which is frequently associated with characteristics that may affect enrolment probability.11 For example, less-affluent individuals may lack a landline or frequently switch cell phone numbers, making them harder to reach; they may hold multiple jobs or have long commutes, making them less likely to answer their phone; they may have diminished use of or trust in the medical system, making them less likely to agree to enrolment; or they may seek care at underfunded (eg, public) hospitals, which lack the resources to enrol participants.12–17 Alternatively, lower-SES individuals may be more likely than higher-SES individuals to join a study that provides a monetary incentive for participating, because the same incentive is worth relatively less to those of higher SES.
While not always a problem, lack of representativeness by SES may reduce external validity if SES modifies the effect of the exposure–outcome relationship under study.18–20 One way to explore generalisability as it relates to SES is to evaluate differences in SES indicators between those selected into a study and those who were eligible, but not selected. Although some individual SES constructs, such as use of or trust in the medical system, may not be measured accurately, others, such as educational attainment, are more readily available. Additionally, other characteristics, such as recent immigration, insurance status and utilisation of prenatal care, can serve as proxies for harder-to-measure SES constructs, providing some context for the demographic characteristics of the underlying source and participant populations.
Extensive SES information is usually not available for individuals who are not enrolled in a study. We present an analysis of the external validity of estimates from a study with access to relatively detailed information on unenrolled cases, providing a rare opportunity to assess differences in key SES variables between enrolled and unenrolled cases. For variables with substantial differences, we assessed effect measure modification (EMM), using data from a large matched case–control study of 13-valent pneumococcal conjugate vaccine (PCV13).21
Parent study and study population
PCV13 was introduced in the USA in 2010 as part of the routine childhood immunisation schedule.22
After licensure, the Centers for Disease Control and Prevention (CDC) conducted a vaccine effectiveness (VE) evaluation in children aged 2–59 months with culture dates between 1 May 2010 and 31 May 2014 who were identified through the Active Bacterial Core surveillance (ABCs) system or one of three additional sites with similar surveillance procedures. This parent study has been previously published and is summarised in the online supplementary material—Methods for the current study.21 In addition to the exclusion criteria for the parent study, we excluded cases from two ABCs sites, Colorado and Maryland, because the necessary SES data were not available from those sites. The total annual surveillance population under 5 years of age was 4 249 724.
In the current study, about 45% of cases were not enrolled, leading to concerns about representativeness with respect to SES (table 2). Further, past studies of invasive pneumococcal disease (IPD) and pneumococcal vaccines have provided some evidence for differential risk of IPD at different levels of individual SES, potentially due to the association of lower SES with IPD risk factors (eg, existing chronic or immunocompromising conditions, smoking exposure, household crowding).
Enrolment procedures have been previously published.21 Briefly, cases were identified via routine surveillance and enrolled over the phone. Lists of eligible controls were obtained from local birth registries for children born in the same zip code in which the case resided when diagnosed with IPD and born within 14 days of the case. Detailed information on eligible and enrolled controls has been previously published.23 Vaccine providers for cases and controls were contacted and state or city immunisation registries were reviewed to obtain vaccine and medical histories. Additional details on enrolment procedures have been previously published and are summarised in the online supplementary material for the present study.21 Both the parent study and the current analysis were approved by institutional review boards (IRBs) at CDC and surveillance sites. The current analysis was also approved by the IRB of the University of North Carolina-Chapel Hill.
Surveillance staff completed a standard case report form (CRF) for every IPD case identified through ABCs.24 We included the following variables from the CRF (for all cases, whether or not enrolled in the parent study, table 1): hospitalisation status, length of hospitalisation and intensive care unit admission, outcome (survived/died), presence of an underlying condition that is a known risk factor for IPD and insurance status at IPD culture.22 24 Hospitalisation and ICU admission were combined into a three-level severity index: not hospitalised, hospitalised but not admitted to ICU and admitted to ICU.
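The three-level severity index described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, argument names and level labels are hypothetical, and it assumes that any ICU admission implies hospitalisation.

```python
def severity_level(hospitalised: bool, icu_admitted: bool) -> str:
    """Collapse two case report form fields into a three-level severity index.

    Assumption (not stated in the source): an ICU admission always counts as
    the most severe level, even if the hospitalisation flag is missing or False.
    """
    if icu_admitted:
        return "admitted to ICU"
    if hospitalised:
        return "hospitalised, not admitted to ICU"
    return "not hospitalised"


# Hypothetical records: (hospitalised, icu_admitted)
example_levels = [severity_level(h, i) for h, i in [(False, False), (True, False), (True, True)]]
```

Coding the index as a single ordered variable lets one model term carry information that would otherwise be split across two overlapping indicators.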
Individual-level SES indicators
For cases and controls (regardless of enrolment status, table 1), study staff obtained the following variables from the birth certificate25: mother’s race, ethnicity and education, source of payment for the birth, and timing/initiation of prenatal care and gestational age, the last two of which were used to calculate the Adequacy of Prenatal Care Utilization (APNCU) Index.26 27 APNCU is a four-category variable that summarises how early in the pregnancy a woman initiated prenatal care and how many total prenatal care visits the woman received compared with how many are recommended by the American College of Obstetricians and Gynecologists for a pregnancy of a given length.26 27 Insurance status and APNCU Index were analysed as measures of the individual’s access to and use of the medical system.11
Neighbourhood-level SES indicators
All enrolled and unenrolled children were geocoded, using a standard geocoding protocol (available from the authors on request). For children whose locations could not be confirmed, the most recent known address (from the birth certificate, state or city immunisation registry or medical chart) was used. Census tract data were merged with the 2013 5-year estimates from the American Community Survey (ACS), an annual survey conducted by the US Census Bureau, which includes demographic and SES indicators. These census-tract-level estimates can be a useful tool for understanding the context in which a child lives by complementing individual-level SES indicators.28 29
Importantly, in matched case–control studies, controls need not be representative of the source population with respect to the matching factor(s). In fact, if matching is done correctly, controls should resemble enrolled cases on the matching factor. This analysis, therefore, focuses on representativeness of enrolled cases with respect to eligible cases and leaves questions concerning controls for other analyses.23
Exploratory analyses compared differences in individual-level characteristics and ACS variables between enrolled and unenrolled cases. We also used the ACS variables to calculate a composite index of SES, based on the Socioeconomic Position Index (SEP Index) created by Krieger et al for the Public Health Disparities Geocoding Project.30 The SEP Index measures the major SES constructs of wealth, education and occupation at the neighbourhood level. Specifically, it includes: working class, unemployment, poverty, education, home prices and median family income (with home prices and median income reversed, so a higher SEP score indicates lower SES).31 p Values were calculated for continuous and categorical predictors via two-sided Wilcoxon rank-sum test and Fisher’s exact test, respectively.
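For readers unfamiliar with the categorical comparison, a two-sided Fisher's exact test on a 2×2 table can be sketched with the standard library alone. This is an illustrative implementation under stated assumptions, not the software used in the study: the function name is hypothetical, the table is restricted to 2×2 (the study's predictors may have had more levels), and the counts in the example are invented for demonstration.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so equally likely tables are counted despite rounding.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Invented counts for illustration only (not study data).
p_example = fisher_exact_two_sided(3, 1, 1, 3)  # equals 34/70, about 0.486
```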
Because we had access to many SES-related variables through the CRF, birth certificates and ACS, it was not practical to assess every variable for EMM. Instead, we chose variables to assess for EMM in two ways. First, we decided a priori to assess as modifiers the SEP Index and the following individual-level variables that had p values <0.2 in the exploratory analyses for differences between enrolled and unenrolled: maternal education, insurance status at birth and APNCU. Second, to ensure we did not miss any variables for the EMM assessment that may be strong modifiers and related to representativeness, we used predictive modelling to select additional ACS variables to assess for EMM. Many of the available ACS variables measured similar metrics (both to other ACS variables and to the individual variables); therefore, we fit a series of models to narrow the selection. All models fit were logistic regression, using backward selection and retaining variables with a p value of <0.2.
We first fit a single predictive model including all individual-level variables (ie, CRF and birth certificate variables), requiring the model to retain the following CRF variables: severity, outcome, underlying condition status, child’s race/ethnicity and insurance status at IPD culture. This model also included interaction terms between child’s race/ethnicity and maternal education and between child’s race/ethnicity and APNCU. We divided the ACS variables into four categories—demographics, wealth, work and household characteristics—for inclusion in separate models to reduce the number of ACS variables used. Four models were run, including all individual-level variables retained from the first model and each category of the ACS variables. Lastly, we fit the final predictive model, which included all individual-level variables and interaction terms (regardless of whether they were retained in the first model) and any variables retained in the ACS models. As with the first model, we required the model to retain the CRF variables.
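The selection rule described above (backward elimination, a retention threshold of p<0.2 and a set of variables that must stay in the model) can be sketched as a generic loop. This is a simplified illustration under stated assumptions: the model fitting is abstracted behind a caller-supplied function, `fit_pvalues`, which is hypothetical and would wrap a logistic regression in practice; one variable is dropped per refit; and the p values in the example are invented.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: takes the variables currently in the model and
# returns a p value for each of them after refitting.
PValueFitter = Callable[[List[str]], Dict[str, float]]


def backward_select(
    candidates: Sequence[str],
    forced: Sequence[str],
    fit_pvalues: PValueFitter,
    threshold: float = 0.2,
) -> List[str]:
    """Drop the least significant non-forced variable until all have p < threshold."""
    forced_set = set(forced)
    current = list(dict.fromkeys(list(forced) + list(candidates)))  # de-duplicate, keep order
    while True:
        pvalues = fit_pvalues(current)
        removable = [v for v in current if v not in forced_set]
        if not removable:
            return current
        worst = max(removable, key=lambda v: pvalues[v])
        if pvalues[worst] < threshold:
            return current
        current.remove(worst)


# Invented, fixed p values standing in for a real model refit.
_fake_p = {"severity": 0.90, "outcome": 0.45, "maternal_education": 0.01,
           "apncu": 0.15, "tract_median_age": 0.60}


def fake_fit(variables: List[str]) -> Dict[str, float]:
    return {v: _fake_p[v] for v in variables}


kept = backward_select(
    candidates=["maternal_education", "apncu", "tract_median_age"],
    forced=["severity", "outcome"],
    fit_pvalues=fake_fit,
)
```

In this toy run the forced variables survive despite large p values, which mirrors the requirement that the CRF variables be retained regardless of significance.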
Any variables included in the final model predicting enrolment were assessed as potential modifiers, in addition to the SEP Index and individual-level variables chosen a priori. All the individual-level variables were already categorical. Variables not applicable to and/or not available for controls (ie, all CRF variables) were excluded. Child’s race/ethnicity, underlying condition status and insurance status at IPD were collected as part of the parent interview/provider follow-up for enrolled children and could therefore be assessed for EMM. The ACS variables were continuous, so we dichotomised them at the median for the EMM analysis (online supplementary table). To assess variables as modifiers, we fit conditional logistic regression models for each variable with IPD caused by 1 of the 13 serotypes included in the vaccine as the outcome, receipt of one or more doses of PCV13 as the exposure and an interaction term between each variable of interest and PCV13 receipt.21 Because power to assess interaction terms is reduced, we used a p value <0.2 as our cut-off for the likelihood ratio χ2 value for the interaction term and then applied a Bonferroni adjustment by the total number of variables assessed for EMM to account for multiple comparisons. Therefore, α<0.015 for the likelihood ratio test was considered an indication of modification and cause for concern about generalisability.
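The two data-handling steps in this paragraph, dichotomising a continuous tract-level variable at its median and dividing the 0.2 cut-off by the number of variables tested, can be sketched as follows. This is a minimal illustration with assumptions flagged: the text does not state how many variables were tested, so the count of 13 used here is only a value that reproduces the reported 0.015; the text also does not say which side of the split receives values equal to the median, so "strictly above" is an arbitrary choice; the function names and example values are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def bonferroni_alpha(base_alpha: float, n_tests: int) -> float:
    """Per-test threshold after a Bonferroni adjustment."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return base_alpha / n_tests


def dichotomise_at_median(values: Sequence[float]) -> List[bool]:
    """True where a value lies strictly above the sample median (assumed convention)."""
    cut = median(values)
    return [v > cut for v in values]


# Assumption: 13 tested variables; 0.2 / 13 is roughly 0.0154, consistent with the reported 0.015.
alpha = bonferroni_alpha(0.2, 13)

# Invented tract-level percentages for illustration only.
above_median = dichotomise_at_median([10.0, 20.0, 30.0, 40.0])
```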
The annual population of children under age 5 in the catchment area for the parent study was approximately 4.7 million, which was reduced to 4.2 million with the exclusion of Colorado and Maryland. We identified 1214 cases of IPD in children in the catchment area for this analysis. Three children were initially miscategorised as ineligible by surveillance personnel and enrolment was not attempted, leaving 1211 cases in our analysis, 661 (54.6%) of whom were enrolled (table 2). Of the 550 cases not enrolled, 194 (35.3%) could not be located/contacted, 177 (32.2%) refused and 158 (28.7%) did not have a pneumococcal isolate available for serotyping. The remaining 21 children were not enrolled for other reasons.
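The enrolment figures above can be checked with simple arithmetic. The short sketch below uses only the counts reported in the text; the variable names are illustrative.

```python
identified = 1214       # IPD cases identified in the catchment area
miscategorised = 3      # wrongly classed as ineligible; enrolment not attempted
eligible = identified - miscategorised
enrolled = 661
not_enrolled = eligible - enrolled

# Reasons for non-enrolment, as reported in the text.
reasons = {
    "could not be located/contacted": 194,
    "refused": 177,
    "no isolate available for serotyping": 158,
    "other": 21,
}

enrolled_pct = round(100 * enrolled / eligible, 1)
reason_pct = {name: round(100 * n / not_enrolled, 1) for name, n in reasons.items()}

print(f"Eligible cases: {eligible}; enrolled: {enrolled} ({enrolled_pct}%)")
for name, pct in reason_pct.items():
    print(f"  {name}: {reasons[name]} ({pct}% of {not_enrolled} unenrolled)")
```

The four reasons sum to the 550 unenrolled cases, so the reported breakdown is internally consistent.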
Differences between enrolled and unenrolled cases
Enrolled and unenrolled cases were of similar age (median age: 21 and 22 months, respectively) and race/ethnicity (42% and 43% white, non-Hispanic, respectively, p=0.41; table 3a). Enrolled children were slightly more likely to survive their illness (98.8% vs 97.0%, p=0.04).
Mothers of enrolled and unenrolled children had a similar racial/ethnic distribution (table 3b). Maternal education was somewhat different between the two groups, with 44.1% of mothers of enrolled children having no college education compared with 53.9% of mothers of unenrolled children (p<0.01).
The SEP Index was similar in enrolled and unenrolled cases (p=0.07; table 3c). A number of characteristics differed between the two groups, including proportion with an income above US$60 000, proportion disabled and proportion of households occupied (p<0.01 for all).
Enrolment prediction model
In the individual-level model, three variables (maternal education, APNCU and the interaction between race/ethnicity and maternal education) were retained in addition to the five variables that we required (table 4). The model yielded a concordance statistic (c-statistic) of 0.639, indicating only marginal predictive ability (0.5 is equal to a coin flip).32 Retaining these eight variables and adding demographic, wealth, work and household variables from the ACS yielded similarly low c-statistics of 0.632, 0.652, 0.643 and 0.655, respectively. The final predictive model included IPD severity, outcome, underlying condition status, child’s race/ethnicity, insurance status at IPD culture, maternal education, APNCU, race/ethnicity and maternal education interaction, as well as the following census tract variables: per cent of tract of white race, earning <US$30 000, earning ≥US$60 000, disabled, working class, occupied households and mean hours worked. This model had a c-statistic of 0.703, indicating only slightly better predictive ability than the single ACS category models.
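To make the c-statistic concrete: for a binary outcome it is the proportion of (enrolled, unenrolled) pairs in which the enrolled case received the higher predicted probability, with ties counted as one half. The sketch below is a direct pairwise implementation for illustration, not the procedure used in the study; the function name and the example values are hypothetical.

```python
from typing import Sequence


def concordance_statistic(outcomes: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise c-statistic for a binary outcome (1 = enrolled, 0 = not enrolled)."""
    positives = [s for y, s in zip(outcomes, scores) if y == 1]
    negatives = [s for y, s in zip(outcomes, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case in each outcome group")
    concordant = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                concordant += 1.0   # model ranks the enrolled case higher
            elif pos == neg:
                concordant += 0.5   # tie counts as half
    return concordant / (len(positives) * len(negatives))


# Invented predictions for illustration only.
c_example = concordance_statistic([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])   # 3 of 4 pairs concordant
c_uninformative = concordance_statistic([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
```

A model that assigns every case the same probability scores 0.5, which is the coin-flip benchmark mentioned above.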
Effect measure modification
None of the individual-level variables retained in the prediction model were found to be significant modifiers of the effect of PCV13 receipt on IPD caused by 1 of the 13 serotypes in the vaccine (figure 1A). Likewise, none of the ACS variables (including SEP Index) assessed as modifiers had a p value <0.015 for the likelihood ratio test (figure 1B).
We assessed the external validity of VE estimates from a postlicensure evaluation of PCV13 in children in the USA. Despite a small trend toward enrolled cases being more affluent than unenrolled cases, the differences were minimal and most did not meet a priori definitions of significance. Additionally, we did not find EMM of VE by any SES variables and therefore conclude that lack of generalisability to the broader source population is of minimal concern in the current study. Our results provide some evidence that study designs based on population-based surveillance systems can be generalisable, even with relatively low case enrolment (54.6% in the current study)—in particular, when researchers conduct extensive investigations to obtain contact information and provide incentives to encourage enrolment.
Generalisability can be a major benefit of observational vaccine studies over RCTs8 33; however, it is rare for reports on either study type to provide extensive information on how their study population differs from their source or target population.1 34 We used a novel approach to assessing generalisability with respect to SES, exploring a mix of variables chosen a priori and through predictive modelling. We had access to more detailed information on unenrolled cases than is typical; however, the data we assessed were collected without patient contact and therefore could be gathered more routinely in observational studies. Our analysis shows that assessing and reporting on generalisability may be feasible even when it is not possible to interview unenrolled cases.
Postlicensure VE studies are common tools to assess the ‘real-world’ effectiveness of vaccines, and as with similar studies, cases were identified from a disease surveillance system.35–38 Generalisability of results with respect to SES from vaccine studies is an important component of assessing the overall quality of a study, as well as the utility of results beyond the study population. Thus, although we found little cause for concern in the current analysis, this conclusion may not be applicable for vaccines for diseases, such as rotavirus, which are spread differently and therefore may have different risk factors and potential modifiers. Likewise, if case identification or enrolment methods are different, the enrolled cases may not be representative of the source population. Since SES has been identified as a significant risk factor for many vaccine-preventable diseases, other postlicensure vaccine studies may find more substantial cause for concern. However, careful selection of case and control populations, regardless of the disease under study, can yield results, such as these, which are generalisable to the underlying source population.
Our study had some limitations. We did not have access to variables collected during the parent interview (eg, household income) for unenrolled cases and so could not explore differences between enrolled and unenrolled cases for these variables. We may have also lacked sufficient power to identify EMM in our study. For example, the confidence limits for VE among those without insurance were extremely wide. Likewise, a VE study of the 7-valent pneumococcal conjugate vaccine (a more limited valency, but otherwise identical, vaccine) in a similar population to the current analysis found clear evidence of EMM by underlying condition status.37 We did not identify EMM by underlying condition status but only had two matched sets with underlying conditions and discordant vaccination status (required to contribute to a conditional analysis) in our study. Additionally, we were not able to compare cases eligible for our study (ie, the source population) with cases throughout the USA (ie, the target population) because active, population-based surveillance for IPD does not exist nationwide.
We identified minimal external validity concerns in a PCV13 VE study in children in the USA. Nevertheless, future VE studies should take care to enrol as broad a subset of cases as possible, especially focusing on children from lower-SES areas. Additionally, every effort should be made to gather at least basic SES information so that study populations can be compared with source populations and estimates can be understood in context.
We acknowledge the following individuals for their contributions to the establishment and maintenance of the ABCs system and expanded surveillance areas. California Emerging Infections Program: Susan Brooks, Hallie Randel. Colorado Emerging Infections Program: Benjamin White, Deborah Aragon, Jennifer Sadlowski. Connecticut Emerging Infections Program: Matt Cartter, Carmen Marquez, Michelle Wilson. Georgia Emerging Infections Program: Sasha Harb, Nicole Romero, Stephanie Thomas, Amy Tunali, Wendy Baughman. Maryland Emerging Infections Program: Joanne Benton, Terresa Carter, Rosemary Hollick, Kim Holmes, Andrea Riner, Kathleen Shutt, Catherine Williams. Minnesota Emerging Infections Program: David Boxrud, Larry Carroll, Kathy Como-Sabetti, Richard Danila, Ginny Dobbins, Liz Horn, Catherine Lexau, Kerry MacInnes, Megan Sukalski, Billie Juni. New Mexico Emerging Infections Program: Kathy Angeles, Lisa Butler, Sarah Khanlian, Robert Mansmann, Megin Nichols, Sarah Shrum. New York Emerging Infections Program: Suzanne McGuire, Sal Currenti, Eva Pradhan, Jessica Nadeau, Rachel Wester, Kathryn Woodworth. Oregon Emerging Infections Program: Mark Schmidt, Jamie Thompson, Tasha Poissant, Keenan Williamson. Tennessee Emerging Infections Program: Brenda Barnes, Karen Leib, Katie Dyer, Lura McKnight, Tiffanie Markus. Los Angeles Epidemiology and Laboratory Capacity Site: Christine Benjamin, Ramon Guevara, Nicole Green, Anali Gutierrez, David Jensen, Annelise Lupica, Christine Wigen. New York City Department of Health and Mental Hygiene: Sarah Borderud, Katherine Lawrence, Orin Forde, Andrea Farnham, Ifeoma Ezeoke. Utah Epidemiology and Laboratory Capacity Site: Jonathan Anderson, Susan Mottice, Kristina Russell and Amanda Whipple. CDC: Tamara Pilishvili, Ryan Gierke, Karrie-Ann Toews, Emily Weston, Londell McGlone, Gayle Langley, Bernard Beall, Delois Jackson, Joy Rivers, Logan Sherwood, Hollis Walker.
Contributors RL-G and MRM conceived the original idea and ran the parent study. DW, AEA, NS, DJW and CGW provided comments and guidance to refine the study objectives and methods. JBR, TM, LM, JE, KS, CH, AR, MB, SP, MMF, LHH, SZ, AT and WS served as the primary investigators and/or surveillance officers for the study sites, providing input on the original study objectives, analysis and manuscript. LM led the laboratory analysis. All authors reviewed the manuscript and provided input.
Funding This work was supported by the CDC.
Disclaimer The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the CDC.
Competing interests CH reports grants from the CDC during the conduct of the study. WS reports personal fees from Merck, Pfizer, the Cleveland Clinic and Novavax outside the submitted work.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Unpublished data are available to outside researchers after submission of a research proposal and approval by site principal investigators.