Background Longitudinal research is subject to participant attrition. Systemic differences between retained participants and those lost to attrition potentially bias prevalence of outcomes, as well as exposure-outcome associations. This study examines the impact of attrition on the prevalence of child injury outcomes and the association between sociodemographic factors and child injury.
Methods Participants were recruited as part of the Environments for Healthy Living (EFHL) birth cohort study. Baseline data were drawn from maternal surveys. Child injury outcome data were extracted from hospital records, 2006–2013. Participant attrition status was assessed up to 2014. Rates of injury-related episodes of care were calculated, taking into account exposure time, and Poisson regression was performed to estimate exposure-outcome associations.
Results Of the 2222 participating families, 799 families (36.0%) had complete follow-up data. Those with incomplete data included 137 (6.2%) who withdrew, 308 (13.8%) who were lost to follow-up and 978 families (44.0%) who were partial/non-responders. Families of lower socioeconomic status were less likely to have complete follow-up data (p<0.05). Systematic differences in attrition did not result in differential child injury outcomes or significant differences between the attrition and non-attrition groups in risk factor effect estimates. Participants who withdrew were the only group to demonstrate differences in child injury outcomes.
Conclusion This research suggests that even with considerable attrition, if the proportion of participants who withdraw is minimal, overall attrition is unlikely to affect the population prevalence estimate of child injury or measures of association between sociodemographic factors and child injury.
- child injury
- longitudinal research
- data linkage
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
Uses a mixed active-passive cohort design from a birth cohort study that employed both data linkage and direct participant follow-up to ascertain comprehensive exposure and outcome data.
Ability to examine impact of participant attrition on outcome estimates using complete data.
Relatively small sample size. Adjustment removed the statistical significance of the main finding even though the point estimate changed little, which could represent a type 2 error.
Longitudinal birth cohort studies are invaluable for exploring the impact of early environments (physical, social and economic) on a child’s ongoing health and development.1 One of the greatest threats to the validity of this research design is bias due to participant attrition.2 Attrition can occur as a result of a participant actively withdrawing from the study, relocating and becoming lost to follow-up or not responding to follow-up waves.
Research examining attrition in longitudinal studies has identified systemic differences between responders and non-responders across a range of individual, family, social, demographic and health characteristics, with disadvantaged populations generally having higher rates of attrition.3–8 Systemic characteristic differences have also been found between participant groups according to the reason for attrition.6 9 10 In line with epidemiological principles, it is anticipated that these differences will result in a bias of outcome prevalence and exposure-outcome associations. However, while evidence of a link between attrition and prevalence bias has been reported in the literature, there is less evidence of attrition leading to bias in exposure-outcome associations, and the bias effect can vary depending on the nature of the outcomes under investigation.3–5 7 11
Within the context of longitudinal birth studies, the impact that attrition bias has on child injury outcomes is an important area for further research. To date, the majority of attrition-injury studies have focused on participants recruited after injury has already occurred, or older populations.12–22 In these studies, attrition has been linked to socioeconomic and participant characteristics, causes of injury, injury severity and treatment plans. Extending this research to include longitudinal research on outcomes for young children in a broader population will greatly enhance our understanding of this relationship.
Using data linkage of routinely collected administrative health records with participant survey data provides researchers new opportunities to assess the impact of attrition bias on participant outcomes in prospective longitudinal studies.23 Data linkage can provide comprehensive access to outcomes data for the total cohort group, allowing comparisons of outcomes to be made between participants who are active and those who have been lost from ‘active’ (survey) follow-up.
In this manuscript, we report the use of a mixed direct participant contact (including surveys) and linked data design in a longitudinal birth cohort study to examine the impact of participant attrition bias on child injury outcome estimates. In particular, the research will 1) identify whether there are systematic differences in attrition and 2) examine the impact of attrition on both the prevalence of child injury outcomes and the association between sociodemographic variables and child injury.
This cohort study combined survey and linked administrative health data.
Environments for Healthy Living (EFHL): Griffith Birth Cohort Study
The EFHL study is a longitudinal birth cohort which investigates how social, environmental, neighbourhood, family, maternal and individual factors impact a child’s health and development.24 Pregnant women in their third trimester were recruited between 2006 and 2011 at one of three public maternity hospitals in South East Queensland and Northern New South Wales. Participants completed a questionnaire at baseline and then follow-up surveys were sent 1, 3 and 5 years after the birth of the child. In between these scheduled contacts, regular newsletters were sent and additional substudy project contacts occurred. Returned mail or contact difficulties triggered the use of alternative contact mechanisms supplied at enrolment and updated at follow-ups, including a relative’s or friend’s contact details, email and Facebook.24 25 The EFHL sample was largely characteristic of births in the region; however, consistent with the public hospital setting from which participants were recruited, there was a higher than national average representation of families with lower incomes, younger maternal age, more overseas born parents and higher proportions of maternal smoking in pregnancy.25 The study methodology and baseline characteristics of the sample have been comprehensively described in other papers.24 25 EFHL has been included on the Australian and New Zealand Clinical Trials Registry (ACTRN12610000931077).
Participants were families enrolled in the EFHL project at one of the two Queensland based public hospitals in the region (Logan or Gold Coast Hospital). This allowed direct comparison to Queensland Health child hospital records. Participants were selected from those recruited in the years 2006–2010 only, as these cohorts had completed more than one follow-up survey. Cases of maternal and child death were excluded from the analysis as these participants had died (which is a known outcome) and were not lost to follow-up.
Ethics approval was provided by the participating hospitals and by the Human Research Ethics Committees of Griffith University and the University of Queensland.
Written, informed maternal consent for data used in this analysis was obtained for (1) direct participant contact including the completion of baseline and follow-up surveys; (2) release of hospital perinatal records and (3) for data to be linked to the child’s administrative health records including hospital admission and emergency department (ED) presentations.
Data sources and collection
Baseline social, maternal and household variables
Baseline information was collected from hospital perinatal records and from a self-administered survey completed by women during pregnancy. This baseline survey covered a range of maternal, household and community factors. The child characteristics included in this analysis were gender and age, via length of follow-up. Maternal characteristics included the level of education completed, maternal age at baseline and whether the mother smoked during her pregnancy. Household factors included marital status, whether other children resided in the home, housing status (owned, rented, government/boarding house) and household income level. Household income was divided into quintiles.25
Cohort attrition status
Information on participant attrition status was collected from the EFHL participant tracking database. Records were kept on both the participant’s overall recruitment status, as well as on their participation in follow-up surveys up to May 2014.
Participants were categorised into two groups: (1) those with complete follow-up or (2) those with incomplete follow-up. Participants with complete follow-up included those that had returned all follow-up surveys relevant to their age group and who had not withdrawn during the lifetime of the project.
Participants with incomplete follow-up were categorised into one of three groups. (1) The ‘withdrawn’ group included any participant who had actively asked to be withdrawn from direct participant contact and ongoing survey completion. Consent was retained for the use of existing data and linked administrative data. (2) ‘Lost to follow-up’ included those participants who did not respond to follow-up surveys and who could no longer be contacted through their home address, phone, email or emergency contact addresses. (3) ‘Partial or non-responders’ included people who were locatable, had not withdrawn, but had not completed one or more follow-up surveys beyond the initial baseline survey.
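The categorisation rules above can be expressed as a small decision function. The following is a minimal sketch only: the field names (withdrew, contactable, surveys_due, surveys_returned) are hypothetical, as the structure of the EFHL tracking database is not described here, and the order in which the rules are checked is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TrackingRecord:
    # Hypothetical fields; not the actual EFHL tracking database schema.
    withdrew: bool          # asked to be withdrawn from direct contact
    contactable: bool       # still reachable by any recorded contact route
    surveys_due: int        # follow-up surveys relevant to the child's age
    surveys_returned: int   # follow-up surveys actually returned


def attrition_status(rec: TrackingRecord) -> str:
    """Assign one of the four follow-up categories described in the text."""
    if rec.withdrew:
        return "withdrawn"
    if rec.surveys_returned >= rec.surveys_due:
        return "complete"
    if not rec.contactable:
        return "lost to follow-up"
    # Locatable, not withdrawn, but at least one follow-up survey missing.
    return "partial/non-responder"
```

Checking withdrawal first reflects the definition of complete follow-up, which requires that the participant had not withdrawn during the lifetime of the project.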
Administrative health data
The child’s health records were extracted between 2006 and 2013 from the Queensland Emergency Department Information System (EDIS) and from the Queensland Hospital Admitted Patients Data Collection (QHAPDC). The matching procedure was undertaken by the Queensland Department of Health (the custodians of the data) using linkage software, based on deterministic and probabilistic methods, to link demographic, child and maternal information to hospital records. Deterministic linkage involves the linking of data sets through comparing fields such as name, year of birth and street name with the requirement that the records agree on all characters. Probabilistic linkage involves the use of statistical models and algorithms to estimate the probability of data from different data sets having commonality (eg, the same person/event). Clerical reviews of the data were undertaken to manually inspect uncertain matches in probabilistic linkage.26
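The distinction between the two linkage approaches can be illustrated with a short sketch. This is not the linkage software used by the Queensland Department of Health, whose algorithm is not described here. The probabilistic score below uses agreement weights of the kind commonly used in record linkage, and the field list and the m/u probabilities are invented for illustration.

```python
import math

# Fields compared between a cohort record and a hospital record (illustrative).
FIELDS = ("name", "birth_year", "street")

# Hypothetical probabilities per field:
#   m = P(fields agree | records belong to the same person)
#   u = P(fields agree | records belong to different people)
M_U = {"name": (0.95, 0.01), "birth_year": (0.98, 0.05), "street": (0.90, 0.02)}


def deterministic_match(a: dict, b: dict) -> bool:
    """Deterministic linkage: every compared field must agree exactly."""
    return all(a[f] == b[f] for f in FIELDS)


def match_weight(a: dict, b: dict) -> float:
    """Probabilistic linkage: sum of log-likelihood-ratio weights per field."""
    total = 0.0
    for f in FIELDS:
        m, u = M_U[f]
        if a[f] == b[f]:
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total
```

Under this kind of scoring, pairs with a weight between a lower and an upper threshold would be the uncertain matches referred for clerical review.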
Injury related to a hospital or ED episode of care was the outcome of interest. Multiple admissions, nested admissions and any corresponding hospital or ED presentations relating to the one injury event were identified through dates of presentation and transfer codes, combined with diagnostic fields. These records were subsequently collapsed into one episode of care for the injury event.
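One way such collapsing might be implemented is sketched below. The rule used here (merge a record into the previous episode when it is flagged as a transfer or starts within one day of the previous record ending) is an assumption for illustration, not the study's actual rule, and the comparison of diagnostic fields is omitted. The record keys are hypothetical.

```python
from datetime import date, timedelta


def collapse_episodes(records, gap_days=1):
    """Collapse hospital/ED records for the same child into episodes of care.

    records: list of dicts with hypothetical keys
             child_id, start (date), end (date), transfer (bool).
    """
    episodes = []
    for rec in sorted(records, key=lambda r: (r["child_id"], r["start"])):
        last = episodes[-1] if episodes else None
        same_child = last is not None and last["child_id"] == rec["child_id"]
        # Assumed linking rule: a transfer, or a start date that overlaps or
        # closely follows the end of the current episode.
        linked = same_child and (
            rec["transfer"]
            or rec["start"] <= last["end"] + timedelta(days=gap_days)
        )
        if linked:
            last["end"] = max(last["end"], rec["end"])
        else:
            episodes.append(
                {"child_id": rec["child_id"], "start": rec["start"], "end": rec["end"]}
            )
    return episodes
```

With this rule, an ED presentation followed the same day by an admission and a subsequent transfer would count as a single injury episode.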
Injury presentations were classified using International Statistical Classification of Diseases (ICD)-10-AM, Chapter 19 Injury and Poisonings (S00-T98) and Chapter 20 External Causes (U50-Y98), with late effects from injury excluded. EDIS data contain only a single diagnostic field in which to describe the reason for presentation, whereas QHAPDC contains a primary diagnosis field and multiple other diagnostic fields in which external causes and activity codes may be recorded. Researchers checked text descriptions in the EDIS data to identify any injuries that were not captured in the single diagnosis field. Almost all ED records used the one available diagnostic field to classify the nature of the injury (S00-T98). External cause codes (V01-Y98) were used in <5% of ED records. As a result, injury cause codes were only available comprehensively for the inpatient episodes of care. Injury subgroups were defined and matched to the ICD-10-AM chapter subcategories.
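The code ranges named above can be checked programmatically. The sketch below assigns a code to the nature-of-injury range (S00-T98) or the external cause range (V01-Y98) using only its first three characters. The function name is hypothetical, and the exclusion of late effects and the checking of free-text descriptions are not modelled.

```python
def icd_group(code: str) -> str:
    """Classify an ICD-10-AM code by the ranges given in the text."""
    code = code.strip().upper()
    if len(code) < 3 or not code[1:3].isdigit():
        return "other"
    # Compare (letter, two-digit number) pairs, e.g. "S52.5" -> ("S", 52).
    key = (code[0], int(code[1:3]))
    if ("S", 0) <= key <= ("T", 98):
        return "nature of injury"
    if ("V", 1) <= key <= ("Y", 98):
        return "external cause"
    return "other"
```

For example, icd_group("S52.5") falls in the nature-of-injury range and icd_group("W19") in the external cause range, while a respiratory code such as "J45" is classed as "other".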
Calculation of person-years
The ages of the children and length of follow-up for injury-related hospital treatment varied considerably across participants due to the five recruitment waves. As such, individual person-years (PYs) of exposure time were calculated for each child, based on the time between birth and 31 December 2013 in which he or she was residing in the state of Queensland, alive and eligible for healthcare.
Data cleaning and analyses were undertaken using SAS V.9.4 software. The statistical significance of differences between groups was assessed by Pearson’s χ2 test for categorical data. Using the state-wide linked administrative health data, injury-related episodes of care were obtained for all cohort participants. Rates of injury-related episodes of care were calculated, taking into account PYs exposure time, for each factor and by attrition status (rates/10 PYs).
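The rate calculation amounts to dividing the total number of injury-related episodes in a group by the total PYs contributed by that group and multiplying by 10. A minimal sketch, written in Python rather than the SAS used for the analysis, with a hypothetical input layout:

```python
from collections import defaultdict


def rates_per_10py(children):
    """Injury episode rates per 10 person-years, by group.

    children: iterable of (group, injury_episodes, person_years) tuples,
              one per child; the layout is an assumption for illustration.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for group, episodes, py in children:
        totals[group][0] += episodes
        totals[group][1] += py
    # Groups with no exposure time are skipped to avoid division by zero.
    return {g: 10 * e / py for g, (e, py) in totals.items() if py > 0}
```

Summing episodes and PYs before dividing weights each child by exposure time, which differs from averaging per-child rates.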
Poisson regression was performed to estimate crude and adjusted rate ratios (RRs) between multiple exposure factors (child, maternal and household characteristics) and the outcome (count of child injury-related episodes of care). All factors significantly associated with injury in the univariate analyses were included in the model. The log of individual PYs of exposure time was included as an offset. The final model included child gender, maternal age and maternal education. Complete case analysis was employed with two-sided significance set at a level of 5%.
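In general form, a Poisson model with a log PY offset can be written as below. The notation is ours and is not taken from the study: Y_i is the count of injury-related episodes for child i, PY_i is that child's exposure time, and x_{i1} to x_{i3} stand for child gender, maternal age and maternal education. A categorical covariate with more than two levels would need more than one indicator term.

```latex
\log \mathbb{E}[Y_i] = \log(\mathrm{PY}_i) + \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3},
\qquad \mathrm{RR}_k = e^{\beta_k}
```

Because the offset enters with a fixed coefficient of 1, each exponentiated coefficient is interpreted as a ratio of rates per unit of exposure time rather than a ratio of raw counts.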
In total, 2222 families who enrolled in the EFHL study from 2006 to 2010, at the two participating Queensland hospitals, were included in this analysis. The number of children totalled 2245 (including 23 sets of twins). Maternal age ranged from 16 to 48 years with a mean age of 28.9 years (SD±5.98). At the time of enrolment, 14.1% of the households were sole parent families, 28.2% of maternal participants smoked cigarettes, 34.9% had no other children living in the household and 22.7% of the maternal participants had not completed secondary school.
Attrition status and follow-up
Of the 2222 participating families, 799 families (36.0%) had complete follow-up, 137 (6.2%) were withdrawn, 308 (13.8%) were lost to follow-up and 978 families (44.0%) were partial or non-responders (figure 1). There was no significant difference in the proportion of households with complete or incomplete follow-up across the five recruitment waves (p=0.20).
Follow-up consisted of a total of 11 908.3 PYs from birth to 7 years of age, with a mean of 5.3 PYs per child (range 0–7). Automated linkage and manual searching by the state health department data linkage unit found records in QHAPDC for 97.1% of the child participants (n=2245), including their birth record.
Baseline demographic, household and child characteristics by attrition status
Baseline demographic and household characteristics differed significantly between participants who had complete follow-up compared with those families who had incomplete follow-up. Attrition families were more likely to have lower levels of maternal education, lower maternal age, have higher rates of maternal smoking, be single parent households, have more children residing in the home, have lower gross household incomes and live in rental or government/boarding house accommodation (all p<0.0001). However, there were no differences in the proportion of child gender for active families compared with attrition families (table 1).
These baseline characteristic differences were similar for two of the three attrition groups, lost to follow-up and partial/non-responders. However, those families in the withdrawn group were more similar in baseline characteristics to the families in the complete follow-up group, with the exception of maternal age, marital status and gross household income (table 1). Of note, families in the withdrawn group were more likely to have male children than those families who fully participated (p<0.01).
Prevalence rates of childhood injury by attrition status
The total cohort had an overall child injury rate of 2.59/10 PYs, similar to the child injury rate for those families with complete follow-up (2.60/10 PYs) and those families who were lost to follow-up or partial/non-responders (table 2). However, families in the withdrawn group had significantly higher rates of child injury compared with families with complete follow-up, with an unadjusted rate ratio (RR)=1.32 (95% CI 1.03 to 1.68). While there was little reduction in the effect size after adjusting for covariates, the difference in child injury rates for families who withdrew compared with families with complete follow-up was no longer statistically significant, with an adjusted RR=1.24 (95% CI 0.96 to 1.58) (table 2).
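For readers unfamiliar with how an unadjusted rate ratio and its confidence interval arise from counts and exposure time, the following sketch uses the standard large-sample (Wald) interval on the log scale, assuming Poisson-distributed counts. It is an illustration only, not the SAS procedure used in the analysis, and it takes group-level totals as input.

```python
import math


def rate_ratio_ci(events_a, py_a, events_b, py_b, z=1.96):
    """Unadjusted rate ratio (group A vs group B) with a Wald 95% CI.

    Assumes Poisson counts, so SE(log RR) = sqrt(1/events_a + 1/events_b).
    """
    rr = (events_a / py_a) / (events_b / py_b)
    se = math.sqrt(1 / events_a + 1 / events_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper
```

The interval is symmetric around the RR on the log scale, which is why reported bounds such as 1.03 to 1.68 are not equidistant from 1.32 on the original scale.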
Relationship between baseline sociodemographic variables (exposure) and child injury episodes (outcome) by attrition status
For the total cohort, rates of child injury increased as the level of maternal education, maternal age and gross household income decreased (table 3). Rates of child injury were higher in single parent households, families living in government or boarding house accommodation and for families with male children. This distribution was similarly reflected in both the families with complete follow-up and those lost to attrition, with no statistically significant differences in the demographically stratified rates of child injury when comparing overall attrition status (p>0.05) (table 3).
While the direction of the demographically stratified child injury rates was similar for all attrition groups, the magnitude of this difference was greater in the withdrawn group for a number of baseline factors. Child injury rates for families that withdrew were higher than families with complete follow-up for all baseline factors, but most pronounced with respect to child gender, maternal education, maternal smoking, number of children living in the household and home ownership (p<0.05).
This study examined the impact of attrition in child injury research on the prevalence of child injury outcomes and the association between sociodemographic variables and injury (exposure-outcome relationship). There were three key findings. First, participants with incomplete follow-up, as a group, differed from complete responders across most sociodemographic characteristics and there were also differences in these characteristics across attrition types (within the incomplete follow-up group). Second, despite these systematic differences, there were no statistically significant impacts on the overall injury prevalence estimates and risk factor effect estimates, with the exception of the withdrawn group. The withdrawn group differed least, however, from the complete group in sociodemographic characteristics. Third, despite these systemic differences, the direction and nature of the sociodemographic-child injury relationship did not vary across the attrition groups.
The finding that there were systemic differences in the baseline social, maternal and household characteristics of those with complete follow-up versus those lost to attrition is consistent with previous research in this area, with families categorised as having lower socioeconomic status (SES) less likely to remain as active participants in the study.3 5 9 10 Systematic differences in attrition were also evident across attrition groups, as has been demonstrated in previous research.9 10 Interestingly, while families from the ‘lost to follow-up’ and ‘partial/non-responder’ groups were more likely to be of lower socioeconomic status, our withdrawn participants were more similar in their baseline characteristics to participants with complete follow-up. It may be that families of higher sociodemographic status felt more confident to actively withdraw from this research or that there are unmeasured characteristics in these families that lead to higher rates of withdrawal. Further research is warranted given the relatively small sample size of this withdrawn group.
Consistent with previous research,3 4 27 it was anticipated that systemic differences in attrition would result in differences in the prevalence of child injury outcomes across our groups (attrition-child injury), particularly as the relationship between lower sociodemographic factors and child injury has been well documented in the literature.28–31 However, there were no statistically significant impacts of this differential attrition on the overall injury prevalence estimates and risk factor effect estimates. The only apparent suggestion of difference (although with the observed stable effect losing significance with inclusion of multiple variables in the model) was in the withdrawn group, despite this group differing least from the ‘complete follow-up’ group for all measures of SES, compared with the other attrition categories. It should be noted that the withdrawn group was small in absolute terms (ie, 6.2% of the total cohort); had the group been larger, it could have affected the overall prevalence of the outcome. This suggests that reasons for attrition may need to be examined in epidemiological research, as a high proportion of withdrawals may indicate the potential for bias.
Importantly, the systemic differences in attrition also had little impact on the direction of the relationship between sociodemographic factors and child injury, with lower socioeconomic status families having higher rates of injury across all the attrition groups. This finding adds to a growing body of evidence indicating that selective attrition does not necessarily impact the relationship between a range of exposure-outcome measures.3 4 10 With follow-up rates for cohort studies regularly being reported in the 30%–70% range,32 it is promising that these results suggest exposure-outcome findings from studies with large amounts of attrition may not need to be interpreted with the degree of caution currently applied.
A key strength of this study was the mixed active-passive nature of the cohort design. Prior to the first published results from this study, there had been few birth cohort studies that employed data linkage methodology in combination with active participant contact (survey follow-up) to ascertain comprehensive exposure and outcome data. Thus, this is one of the few studies capable of examining with rigour the questions we have addressed in this research. The key limitation is the relatively small sample size. This was particularly evident in relation to our main finding that it was in the group of participants who withdrew from the study that we found a bias. The adjusted model removed significance of the main finding even though there was little change in the point estimate (from 1.32 to 1.24) and could represent a type 2 error. The second limitation relates to the nature of the study sample, the determinant variables measured, the injury outcome focus of the analysis and the follow-up protocols used. While the research results could be expected to apply to similar circumstances, it is unknown how validly the study findings should be applied beyond the study group we examined. It is possible that in studies with a demanding follow-up protocol, participant attrition may be higher in people whose determinant variables are already affecting outcomes, and hence in these studies attrition may be more likely to be associated with a biased effect estimate.
The findings of this research provide support for relaxing one of the most challenging expectations of epidemiological studies, that is, complete participant follow-up. Even with considerable attrition, when proportions of participants who withdraw from the study are low, attrition bias is unlikely to affect either the population prevalence estimate or measures of association with key study variables.
The research reported in this publication is part of the Environments for Healthy Living (EFHL): Griffith Study of Population Health (Australian and New Zealand Clinical Trials Registry: ACTRN12610000931077). The EFHL project was conceived by Professor Rod McClure, Dr Cate Cameron, Professor Judy Searle and Professor Ronan Lyons. We are thankful for the contributions of the Project Manager, Rani Scott, and the current and past Database Managers. We gratefully acknowledge the administrative staff, research staff and the hospital antenatal and birth suite midwives of the participating hospitals for their valuable contributions to the study, in addition to the expert advice provided by Research Investigators throughout the project.
Contributors CMC, JMO, ABS, TMD, NS, RJM were involved in study concept and design. JMO and CMC acquired the data. CMC conducted the data management and analysis. CMC, JMO and RJM were all involved in interpretation of data and wrote the first draft of the manuscript. CMC, JMO, ABS, TMD, NS, RJM contributed to the critical revision of the manuscript and approved the final version.
Competing interests None declared.
Patient consent Obtained from Guardian.
Ethics approval Griffith University, University of Queensland, The Logan Hospital, Gold Coast Hospital.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data available.