Evidence for sample selection effect and Hawthorne effect in behavioural HIV prevention trial among young women in a rural South African community

Objectives We examined the potential influence of both sample selection effects and Hawthorne effects in the behavioural HIV Prevention Trial Network 068 study, designed to examine whether cash transfers conditional on school attendance reduce HIV acquisition in young South African women. We explored whether school enrolment among study participants differed from the underlying population, and whether differences existed at baseline (sample selection effect) or arose during study participation (Hawthorne effect).
Methods We constructed a cohort of 3889 young women aged 11–20 years using data from the Agincourt Health and socio-Demographic Surveillance System. We compared school enrolment in 2011 (trial start) and 2015 (trial end) between those who did (n=1720) and did not (n=2169) enrol in the trial. To isolate the Hawthorne effect, we restricted the cohort to those enrolled in school in 2011.
Results In 2011, trial participants were already more likely to be enrolled in school (99%) compared with non-participants (93%). However, this association was attenuated with covariate adjustment (adjusted risk difference (aRD) (95% CI): 2.9 (−0.7 to 6.5)). Restricting to those enrolled in school in 2011, trial participants were also more likely to be enrolled in school in 2015 (aRD (95% CI): 4.9 (1.5 to 8.3)). The strength of associations increased with age.
Conclusions Trial participants across both study arms were more likely to be enrolled in school than non-participants. Our findings suggest that both sample selection and Hawthorne effects may have diminished the differences in school enrolment between study arms, a plausible explanation for the null trial findings. The Hawthorne-specific findings generate hypotheses for how to structure school retention interventions to prevent HIV.

Introduction
The evidence base for public health interventions largely comes from rigorous epidemiological studies. 1 2 However, results from epidemiological studies may not be generalisable (or 'externally valid') when study participant characteristics differ from those in the target population, even with randomisation of exposure (referred to here as 'sample selection effect'). 3 Further threats to validity can occur if study participation itself induces behaviour change (Hawthorne effect, research participation effect or trial effect, referred to here, collectively, as 'Hawthorne effect'). 4 5 Analysing study data to examine how results may have differed in the target population to which we would like to make inference is critical to making valid conclusions and policy recommendations.
Although epidemiological training and research have long included at least cursory examinations of external validity, 6 with more recent methodological advancements around transportability of effect estimates from study populations to target populations, 7-9 empirical evaluation of Hawthorne effects is rare. The limited evidence for Hawthorne effects comes largely from clinical randomised controlled trials, most often assessed in cancer and nutrition studies, 10 11 with supporting evidence from HIV treatment research. 12 Trials designed to affect behaviour change may be particularly susceptible to Hawthorne effects as the behaviours in question may be influenced by trial participation. 13 14 This is particularly true in HIV prevention research, where Hawthorne effects could pose validity threats if unexpectedly low HIV incidence occurs due to trial-induced risk behaviour changes. 15 To our knowledge, no prior HIV prevention trial has empirically examined whether Hawthorne effects influenced study results.

Strengths and limitations of this study
► To our knowledge, this study is the first to empirically examine whether Hawthorne effects may have influenced study results in an HIV prevention trial.
► We analysed longitudinal data on a key study outcome (school enrolment) for the underlying population from which study participants were drawn. Complete data are not typically available for source populations in research studies.
► Our Hawthorne-specific findings suggest that aspects of the HIV Prevention Trial Network (HPTN) 068 protocol could potentially be adapted for school retention interventions to prevent HIV.
► It is important to note that data on HIV incidence, the primary endpoint of HPTN 068, were not available for the underlying target population.
► The differences we attribute to the Hawthorne effect were estimated in an observational data set with adjustment for key sociodemographic characteristics. The potential for uncontrolled confounding requires that our results be interpreted cautiously.

In this study, we examine the potential influence of both sample selection effects and Hawthorne effects in the behavioural HIV Prevention Trial Network (HPTN) 068 study, 16 17 designed to examine whether cash transfers conditional on school attendance reduce HIV acquisition in young South African women. Contrary to the study hypothesis, no difference in HIV acquisition was observed between study arms, with high levels of school enrolment and low HIV incidence in both arms. These findings were surprising given the high background rates of school dropout in the study area 18-20 and the large body of evidence showing the positive impact of cash transfers on schooling outcomes, 21 and limited the ability to explore schooling as a mechanism to reduce HIV risk.
Here, we contextualise HPTN 068 findings, using data on school enrolment in the underlying target population routinely collected by the Agincourt Health and socio-Demographic Surveillance System (HDSS) in which HPTN 068 was nested. We examine whether school enrolment trajectories of trial participants differed from non-participants, and whether differences could be attributed to existing differences in school enrolment at baseline (sample selection effect) or differences that arose during study participation (Hawthorne effect).

Methods
Study setting and population
The Agincourt HDSS is located in the rural Bushbuckridge municipality of the Mpumalanga province, South Africa, 22 and has routinely collected annual vital event data on all people living in the study area since 1992.
Other sociodemographic data are collected at regular but less frequent intervals. For example, educational attainment is queried every 3 years, employment data are collected every 4 years, and a household asset index is measured every other year. Community, household and individual consents have been obtained for all Agincourt HDSS census research since its inception, with informed verbal consent obtained at each census round. The Agincourt HDSS currently surveys the full cohort of over 115 000 people living across 31 villages, in an area of economic disadvantage with historically low access to public services. However, government schools in the study site are free and often provide feeding programmes. HIV contributes a large burden to the community, with 19% HIV prevalence overall in those aged 15 years and older. 23
HPTN 068 was a phase III individually randomised trial designed to examine whether cash transfers conditional on school attendance influence the risk of HIV acquisition in young women. 16 17 Young women and their caregivers were randomly assigned to receive a monthly cash transfer conditional on ≥80% school attendance or no cash transfer. The size of the monthly cash transfer was 300 rands (R; about US$30 in 2012), and was divided into R200 provided to the caregiver and R100 provided to the young woman. Key selection criteria for participation in the study were current enrolment in grades 8-11; age 13-20 years; not married or pregnant at baseline; and having a caregiver with the documents necessary to open a bank account. Age-eligible young women were identified from Agincourt HDSS records to be contacted for further eligibility screening (n=10 134). 16 Between March 2011 and December 2012, a total of 2533 young women were enrolled. Participants were seen annually for a maximum of 3 years from enrolment or until high school graduation. Thus, participants who enrolled in the trial in 2011 in grade 11 could exit the study as early as 2012 after graduating high school. Participants who enrolled in the trial in 2012 in grade 8 or 9 could exit the study as late as March 2015 (figure 1).

Cohort construction
We constructed our analytical cohort to identify all young women living in the study area at the time of trial start (2011) regardless of trial participation status. Further restrictions were applied to build a cohort of young women on comparable age/grade trajectories and to match key HPTN 068 selection criteria. First, we restricted the cohort to include young women between the ages of 13 and 20 years in 2011 or 2012. Based on education data collected by the Agincourt HDSS in 2009 (education data were collected in 2009 but not again until 2012), we also restricted the cohort to those who were enrolled in grades that projected to grades 8-11 in 2011 or 2012, assuming a one-grade increase each year.
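The restriction rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names and record layout are hypothetical. The only rules taken from the text are the one-grade-per-year projection from 2009, the grade 8-11 window and the 13-20 age range in 2011 or 2012. Age is approximated from birth year, and the age and grade conditions are checked separately, both of which are simplifying assumptions.

```python
# Illustrative sketch; names and record layout are hypothetical.
ELIGIBLE_GRADES = range(8, 12)   # grades 8-11
ELIGIBLE_AGES = range(13, 21)    # ages 13-20
TRIAL_YEARS = (2011, 2012)       # trial enrolment years


def projected_grade(grade_2009: int, year: int) -> int:
    """Project a 2009 grade forward, assuming one grade gained per year."""
    return grade_2009 + (year - 2009)


def in_cohort(birth_year: int, grade_2009: int) -> bool:
    """True if age and projected grade each fall in range in 2011 or 2012."""
    # Assumption: age is approximated as calendar year minus birth year.
    age_ok = any((year - birth_year) in ELIGIBLE_AGES for year in TRIAL_YEARS)
    grade_ok = any(
        projected_grade(grade_2009, year) in ELIGIBLE_GRADES
        for year in TRIAL_YEARS
    )
    return age_ok and grade_ok
```

For example, under these assumptions a young woman born in 1996 who was in grade 7 in 2009 projects to grade 9 in 2011 and would be retained, whereas one in grade 10 in 2009 projects past grade 11 by 2011 and would be excluded.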

Key measures
Our primary exposure of interest was HPTN 068 trial participation (both trial arms combined). We analysed both arms together because trial results indicated essentially no differences in school attendance and enrolment data between the arms. School attendance was high (≥80%) for 95% of the intervention arm and 96% of the control arm participants. Permanent school dropout occurred at a rate of 3 per 100 person years in both arms. 17
Our primary outcome of interest was school enrolment, which we calculated at 2011 and 2015 based on Agincourt HDSS education data collected in 2009, 2012 and 2015. We used the 2011 school enrolment outcome to assess whether enrolment patterns were already different at the beginning of the trial, indicating a potential sample selection effect. We used the 2015 school enrolment outcome to assess whether enrolment patterns were different at the end of the trial, when both sample selection and Hawthorne effects could be present. We considered young women as enrolled in school if they indicated current school enrolment or if they reported a grade 12 attainment, the final year of secondary schooling.
We also explored the potential for confounding and effect measure modification by key covariates. We examined age on 1 January 2011, categorised as ages of compulsory school enrolment (ages 11-15 years), older than age for compulsory school enrolment but correct age for grade (ages 16-17 years), and older than age for compulsory school enrolment while also older than expected for grade (ages 18-20 years). We also examined indicators of household socioeconomic status (SES), measured with a composite index of household assets; household size; gender of household head; secondary school educational attainment of the household head; country of origin (South African or Mozambican descent); and pre-2011 childbearing.
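The enrolment definition and the three age categories described above could be coded as follows. This is a minimal sketch with hypothetical function and argument names. It assumes that grade attainment is available as an integer and that "a grade 12 attainment" means an attained grade of 12 or higher.

```python
# Illustrative sketch; function and argument names are hypothetical.

def enrolled(currently_enrolled: bool, highest_grade_attained: int) -> bool:
    """Outcome: currently in school, or has attained grade 12."""
    return currently_enrolled or highest_grade_attained >= 12


def age_category(age_on_1_jan_2011: int) -> str:
    """Map age on 1 January 2011 to the three analysis categories."""
    if 11 <= age_on_1_jan_2011 <= 15:
        return "11-15"   # ages of compulsory school enrolment
    if 16 <= age_on_1_jan_2011 <= 17:
        return "16-17"   # past compulsory age, correct age for grade
    if 18 <= age_on_1_jan_2011 <= 20:
        return "18-20"   # past compulsory age, older than expected for grade
    raise ValueError("age outside the 11-20 range of the cohort")
```

Treating grade 12 completers as enrolled means that graduation counts as school retention and is not misclassified as dropout.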

Analysis
We used binomial regression models with an identity link to estimate the association between trial participation and school enrolment in 2011 and 2015. The 2011 enrolment outcome was used to isolate the potential for a sample selection effect (ie, Were trial participants more likely to be in school than non-participants at the beginning of the trial?). We used the 2015 enrolment outcome in a restricted cohort of young women who were enrolled in school in 2011 to isolate the potential for a Hawthorne effect at the end of the trial (ie, Were trial participants more likely to remain in school after 4 years than non-participants, conditional on being in school at the beginning of the trial?).
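To illustrate what such a model estimates: with a single binary exposure and no covariates, the identity-link binomial model reduces to a difference in proportions, and its Wald interval has a closed form. The sketch below computes that unadjusted risk difference in plain Python. The function name and the example counts are hypothetical and are not data from this study. The adjusted estimates reported here would require fitting the full regression model in a statistical package.

```python
import math


def risk_difference(exposed_events, exposed_total,
                    unexposed_events, unexposed_total, z=1.96):
    """Unadjusted risk difference with a Wald confidence interval.

    Returns (rd, lower, upper) on the proportion scale; multiply by 100
    to express the result in percentage points.
    """
    p1 = exposed_events / exposed_total        # risk among the exposed
    p0 = unexposed_events / unexposed_total    # risk among the unexposed
    rd = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / exposed_total
                   + p0 * (1 - p0) / unexposed_total)
    return rd, rd - z * se, rd + z * se


# Hypothetical counts for illustration only (not study data):
# 90 of 100 participants and 80 of 100 non-participants enrolled in school.
rd, lower, upper = risk_difference(90, 100, 80, 100)
print(f"RD = {100 * rd:.1f} (95% CI {100 * lower:.1f} to {100 * upper:.1f})")
```

The identity link is what places the coefficient on the risk difference scale; a logit link would instead give an odds ratio.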
We conducted unadjusted analyses and analyses adjusted for age, SES, gender and education of household head, household size, country of origin, and pre-2011 childbearing. School enrolment decisions were likely highly influenced by age both because our cohort straddled the age limit for compulsory schooling in South Africa and because school dropout generally increases with age. Thus, we conducted age-stratified analyses to see whether the associations between trial participation and school enrolment differed by age category.
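Age stratification amounts to estimating the association separately within each age category. The sketch below computes crude (unadjusted) stratum-specific risk differences from individual records. The record layout and function name are hypothetical, and unlike the analyses described above it makes no adjustment for the other covariates.

```python
from collections import defaultdict


def stratified_risk_differences(records):
    """Crude risk difference (participants minus non-participants) per stratum.

    records: iterable of (stratum, participated, enrolled) tuples, where
    participated and enrolled are booleans.
    """
    # For each stratum, keep [events, total] for each exposure group.
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for stratum, participated, enrolled in records:
        cell = counts[stratum][bool(participated)]
        cell[0] += int(enrolled)
        cell[1] += 1

    result = {}
    for stratum, cells in counts.items():
        (e1, n1), (e0, n0) = cells[True], cells[False]
        if n1 == 0 or n0 == 0:
            continue  # risk difference is undefined if one group is empty
        result[stratum] = e1 / n1 - e0 / n0
    return result
```

Comparing the stratum-specific estimates side by side is a simple way to see whether the association differs by age, which is the question the age-stratified analyses address.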
Although trial results indicated that school attendance and enrolment outcomes were not significantly different between the arms of the trial, the intervention was designed to incentivise school attendance. 17 For this reason, we conducted a sensitivity analysis restricting the trial participants to those who were randomly assigned to the control group only. We compared results from this restricted population with the results from the analysis of the full trial population.

Results
Overall, 3889 young women from the Agincourt HDSS were included in our cohort (table 1). The median age was 15 years (IQR: 14-16). Young women tended to live in large households (mean size: 8.4), and household heads were often female (42%) and often lacked high school education (86%). The majority of young women were of South African descent (60%) and very few (7%) had begun childbearing prior to 2011. Just under half of the young women (44%) went on to participate in HPTN 068, and they tended to be less likely to be on the youngest (ages 11-12) or oldest (ages 19-20) end of the age spectrum, although the median age of both participants and non-participants was 15. Trial participants were also less likely to have begun childbearing, slightly more likely to live in households headed by high school graduates and slightly more likely to live in households with above-median SES.
At the end of the trial in 2015, the difference in school enrolment between trial participants (81%) and non-participants (69%) grew, with an adjusted overall risk difference of 6.8 (95% CI 3.4 to 10.2). To investigate whether any differences in school enrolment could be attributed to a Hawthorne effect, we restricted the cohort to those enrolled in school in 2011 and examined differences in 2015. Under this restriction, young women enrolled in the trial were still more likely to remain in school in 2015 (82%), compared with those who did not (74%), with an aRD of 4.9 (95% CI 1.5 to 8.3). Again, the association was weakest among those 11-15 years old (aRD (95% CI): 1.8 (−1.3 to 5.0)) and strongest among those 18-20 years old (aRD (95% CI): 22.8 (7.3 to 38.2)).
Results were largely unchanged when we restricted the trial participant population to those assigned to the control group only (table 3). Although CIs widened due to reduced sample size with some newly spanning the null, the magnitudes of the risk difference point estimates were largely unchanged from those in the primary analysis.
Discussion
HPTN 068 found that cash transfers conditional on school enrolment did not influence HIV acquisition among young women in a rural South African setting. Due to unexpectedly high levels of school enrolment in both arms, the ability to explore schooling as a mechanism through which cash transfers could influence HIV acquisition was limited. Here, we found evidence to suggest that both Hawthorne effects and sample selection effects could threaten the external validity of these findings. Overall, trial participants were more likely to remain in school until graduation than non-participants. Differences in school enrolment status were already apparent at the beginning of the study, suggesting that the trial selection criteria likely pulled in young women with better school enrolment behaviours than those who were not enrolled as trial participants (sample selection effect). Differences in school enrolment grew larger as the trial progressed, and importantly remained strong even after restricting to those enrolled in school in 2011 when the trial started, suggesting the changes in enrolment status occurred during the trial itself (Hawthorne effect). Both sample selection and Hawthorne effects may have diminished the differences in school enrolment between study arms, which is one plausible explanation for the overall null study effect. The HPTN 068 trial was designed to activate the HIV prevention effects of education by incentivising school attendance and retention in the intervention arm. With high levels of school attendance and retention across both arms of the trial, the ability to detect a trial effect was likely weakened.
Our findings that trial participation influenced school enrolment behaviour could plausibly be explained by several characteristics of the HPTN 068 study design and protocol. First, all participants were aware of the objective of the study: to retain young women in school to prevent HIV. This information could result in school enrolment behaviour change to align with perceived expectations of study staff or because young women were motivated to prevent HIV. Second, compared with non-participants, trial participants were exposed to different networks likely to be supportive of school enrolment. Adult fieldworkers showed interest in the schooling of participants with yearly in-person data collection and monthly in-school data collection. Data were collected in 'camps' wherein trial participants were transported to study offices annually for a half-day of surveys and blood tests, and entertaining activities during wait periods (eg, fingernail painting, photograph taking, magazine reading). This protocol could have fostered a cohesive group environment among trial participants resulting in positive peer pressure to maintain school enrolment. There is a growing body of evidence that interventions providing a safe space with adult mentorship and peer support can have positive outcomes for young women in sub-Saharan Africa, 24-26 a pathway that may have been activated with trial participation. Finally, trial participation provided access to certain health and social services that may have otherwise been inaccessible, including annual HIV and herpes simplex virus type 2 (HSV-2) testing and counselling, linkage to care for those who tested positive, and linkage to social work services for young women who reported experiences of sexual abuse. These services may have enabled young women who would have otherwise struggled with serious physical and mental health outcomes to remain in school.
Associations between trial participation and school enrolment were strongest in older age groups. The small differences observed in the youngest age group are understandable as they were under the age limit of compulsory education with requirements to remain in school regardless of trial influence. For the oldest age group, trial selection criteria for lower grade levels meant they were older than expected for their grade, and suggested a history of grade repetition or temporary dropout. That the trial protocol may have contributed to keeping older teens in school is significant as the transition to adulthood carries extremely high HIV risk. 23
[Table 2: The relationship between HPTN 068 trial participation and school enrolment in 2011 and 2015, stratified by age, in the full cohort and the cohort restricted to those enrolled in school in 2011]
This study was fairly unusual in that data were available on key study outcomes for the underlying population from which study participants were drawn. The Agincourt HDSS routinely collects school enrolment data on all residents in the study area, and we were able to leverage those data to assess differences between trial participants and non-participants. The majority of epidemiological studies do not have the benefit of complete background data on the target population, and, as such, sample selection and Hawthorne effects are rarely empirically assessed. 5 10 12 14 However, when Hawthorne effects are assessed, the direction of the relationship between research participation and healthy outcomes tends to be positive, in line with our findings of improved school enrolment outcomes.
It is important to note that data on HIV incidence, the primary endpoint of HPTN 068, were not available for the underlying target population. We speculate that the improved schooling trajectories we observed in trial participants likely resulted in reduced risk of HIV acquisition. 17 Continued schooling is strongly associated with HIV prevention and reduced sexual risk outcomes in young women in sub-Saharan Africa, 18 27 28 and HIV incidence among trial participants (1.8%) was lower than expected (3%). However, we cannot say with certainty that the association between trial participation and school enrolment extended to HIV protection.
The potential for uncontrolled confounding requires that our results be interpreted cautiously. The differences we attribute to the Hawthorne effect were estimated in an observational data set. Initial screens for trial eligibility were performed based on age data maintained by the Agincourt HDSS, and 82% of the eligible young women approached went on to enrol in the study, a fairly high response rate. 17 Still, it is plausible that those who refused participation were different from those who consented in ways that were also related to future school enrolment trajectories. Although we controlled for key sociodemographic characteristics that we theorised could be related to both trial participation and school enrolment, the possibility for bias from unmeasured confounding remains.
We offer three key conclusions from this study. First, epidemiologists should give greater weight at the planning, analysis and dissemination stages to identifying how sample selection and Hawthorne effects can be minimised, assessed and discussed. Prioritising research with well-defined target populations in areas with ongoing background data collection (eg, HDSS centres) would improve researchers' abilities to empirically assess the external validity of their findings. Second, the sample selection effect we observed highlights how school-based samples can differ in important ways from non-school-based samples in terms of underlying risk. Interventions focused on school-going adolescents may not reach those most in need of prevention, an anticipated issue that was ultimately difficult to avoid given the HPTN 068 design. Third, the Hawthorne-specific findings suggest that aspects of the HPTN 068 protocol could potentially be adapted for school retention interventions to prevent HIV. If the relationship we observed is causal, the trial protocol increased school enrolment at a magnitude similar to targeted cash transfer interventions and other fairly resource-intensive school retention interventions in sub-Saharan Africa, 21 29-31 despite the actual contact with the young women being limited to annual visits. Future work should examine key elements of the study protocol (adult mentorship, peer support, school attendance monitoring, messaging around the link between school and HIV, routine HIV/sexually transmitted infection testing and linkage to care) to better understand their relationship with school retention and HIV acquisition.