Evaluating the impact of the Bolsa Familia conditional cash transfer program on premature cardiovascular and all-cause mortality using the 100 million Brazilian cohort: a natural experiment study protocol

Introduction: Brazil's Bolsa Familia Program (BFP) is the world's largest conditional cash transfer scheme. We shall use a large cohort of applicants for different social programmes to evaluate the effect of BFP receipt on premature all-cause and cardiovascular mortality.

Methods and analysis: We will identify BFP recipients and non-recipients among new applicants from 2004 to 2015 in the 100 Million Brazilian Cohort, a database of 114 million individuals containing sociodemographic and mortality information of applicants to any Brazilian social programme. For individuals applying from 2011, when we have better recorded income data, we shall compare premature (age 30–69) cardiovascular and all-cause mortality among BFP recipients and non-recipients using a regression discontinuity design (RDD) with household monthly per capita income as the forcing variable. Effects will be estimated using survival models accounting for individuals' follow-up. To test the sensitivity of our findings, we will estimate models with different bandwidths, include potential confounders as covariates in the survival models, and restrict our data to locations with the most reliable data. In addition, we will estimate the effect of BFP on the studied outcomes using propensity score risk-set matching, separately for individuals who applied ≤2010 and ≥2011, allowing comparability with RDD. Analyses will be stratified by geographical region, gender, race/ethnicity and socioeconomic position. We will investigate differential impacts of BFP and the presence of effect modification for a combination of characteristics, including gender and race/ethnicity.

Ethics and dissemination: The study was approved by the ethics committees of Oswaldo Cruz Foundation and the University of Glasgow College of Medicine and Veterinary Life Sciences. The deidentified dataset will be provided to researchers, and data analysis will be performed in a safe computational environment without internet access. Study findings will be published in high-quality peer-reviewed research articles. The published results will be disseminated on social media and to policy-makers.

1. In Logic model in the page 6 and corresponding Figure 1, the authors suggested improved health care utilization as one of the mediators for favorable outcome, but presenting only 'controlled blood pressure' in Figure 2 seems to be too specific. There are many other preventive measures such as statins, antiplatelet agents and/or smoking cessation other than that. Consider a more comprehensive description.

2. In the page 5: 'If individuals do no longer meet the inclusion criteria, ie., if they improve their socioeconomic status, do not meet the conditionalities or do not update the registry every two years, the benefit will only continue for two more years.' So, there is significant heterogeneity among those who were dropped from the CCT. Those whose SES improved are likely to have favorable health outcome but the others may have poorer outcome. Please describe the analysis plan to address this issue.

3. In the page 7: 'Mortality Information System' section described possible inaccuracy problem in the outcome (mortality). Do you have any plan to test the validity of the cause of death data? If it is not possible, do you have any previous study result to estimate the validity of the cause of death in death certificates.

4. In the page 8: 'For individuals that apply ≥2011, in which we have better recorded income data'. Does this mean that classification in the <=2010 cohort may be less accurate? Please give some more detail in the rationale for this division.

REVIEWER
Jesús-Adrián Alvarez, Interdisciplinary Centre on Population Dynamics, University of Southern Denmark

REVIEW RETURNED
24-May-2020

GENERAL COMMENTS
The study protocol describes the set up to investigate the impact of the conditional cash transfer program (Bolsa Familia Program, further referred to as BFP) on cardiovascular and all-cause mortality in Brazil. The methods to be used in the study are sound and the framework seems to be adequate. I have some minor comments.

1. In Logic model in the page 6 and corresponding Figure 1, the authors suggested improved health care utilization as one of the mediators for favorable outcome, but presenting only 'controlled blood pressure' in Figure 2 seems to be too specific. There are many other preventive measures such as statins, antiplatelet agents and/or smoking cessation other than that. Consider a more comprehensive description. R: In the logic model (in the figure and in the text) we have focused on known mechanisms through which CCTs and/or BFP may reduce cardiovascular mortality. We have now also updated Figure 1 to include controlled blood pressure as one example of how improved medical treatment can affect cardiovascular mortality. In addition, given that hypertension drugs are freely available through Brazil's Universal Healthcare System (SUS), we also included in our model the following hypothesis: "We hypothesize that inclusion of BFP beneficiary families in the Family Health Program might promote early CVD diagnosis and better care (Rasella et al., 2014), even though Brazil has a Universal Healthcare System (SUS) and access to free hypertension drugs has substantially increased over recent decades (Emmerick et al., 2015)." (Please see section "Methods and analysis/Logic model", page 5).

2.
In the page 5: 'If individuals do no longer meet the inclusion criteria, ie., if they improve their socioeconomic status, do not meet the conditionalities or do not update the registry every two years, the benefit will only continue for two more years.' So, there is significant heterogeneity among those who were dropped from the CCT. Those whose SES improved are likely to have favorable health outcome but the others may have poorer outcome. Please describe the analysis plan to address this issue. R: The vast majority of people who stop receiving the intervention do so because they have improved their socioeconomic status, and not because they did not comply with BFP conditionalities. BFP works with the idea that non-compliance means the family is more vulnerable and, in these cases, the family receives a visit from a social worker (Soares, 2011). Therefore, there is less heterogeneity than might be expected. It should also be noted that improved socioeconomic conditions that lead to changes in health risk behaviour are more likely to influence short-term cardiovascular disease burden than mortality. We have now added the following sentence to the text: "Nevertheless, non-compliant families are thought to be more vulnerable and, in these cases, receive a visit of a social worker that will help families' compliance and their maintenance in the programme" (Please see section "Methods and analysis/Intervention", page 5/6). To address potential biases related to changes in socioeconomic and treatment status over time, we have now included more details on how to explore the impact of these potential changes: "To improve the robustness of our results, we will perform additional analyses: i. restricted to a subgroup of individuals whose treatment has not varied over time (i.e., excluding those who stopped receiving BFP treatment); ii. restricting the follow-up time to shorter periods in which socioeconomic conditions are less likely to have varied over time; iii. exploring the possibility of treatment contamination occurring when untreated individuals start receiving the BFP, […]" (Please see section "Methods and analysis/Data analysis plan/Analysis", page 10).

3.
In the page 7: 'Mortality Information System' section described possible inaccuracy problem in the outcome (mortality). Do you have any plan to test the validity of the cause of death data? If it is not possible, do you have any previous study result to estimate the validity of the cause of death in death certificates. R: We agree with the reviewer that it is necessary to evaluate possible bias due to the ascertainment of death certificates. We have now included, on page 7, references that suggest that mortality records in Brazil have improved over time, despite remaining geographical inequalities: "Despite the significant and continuous improvement of data quality over time, regional disparities remain with the worst quality in the poorest regions and those with worse health care (Lima et al., 2014; Junior et al., 2017)." (Please see section "Methods and analysis/Datasets/Mortality Information System", page 7). In addition, we also plan to stratify the analysis by geographical regions (e.g., microregions or municipalities) with better and worse quality of mortality data. Therefore, we have now rephrased the sentence on page 10 to better describe the intended analysis: "We shall also repeat the analyses in geographical areas with more and less reliable mortality data, e.g. with different proportions of ill-defined causes of death and underreported mortality." (Please see section "Methods and analysis/Data analysis plan/Analysis", page 10).

4. In the page 8: 'For individuals that apply ≥2011, in which we have better recorded income data'. Does this mean that classification in the <=2010 cohort may be less accurate? Please give some more detail in the rationale for this division. R: To be more accurate in this description, we have now included additional information regarding the CadÚnico registry: "The baseline of the 100 Million Brazilian Cohort includes a range of sociodemographic variables collected at individuals' first application in CadÚnico, and includes household income, gender, age, race/ethnicity, geographical region and urban-rural classification, housing characteristics and education (Table 2). As the cohort was built from different versions of CadÚnico (ie., version 6 from 2001-2010 and version 7 from 2011-2015), the baseline contain variables that are common to the two versions and those that are only available in one of them. Also, completeness varies widely between variables (0-10% in the selected variables) (Table 2) and over time." (Please see section "Methods and analysis/Datasets/Sociodemographic variables", page 6-7). For these reasons, we will only use data from 2011 onwards to apply RDD; please see: "For individuals that apply ≥2011, for which income data are higher quality (i.e., preliminary data cleaning showed that >75% of individuals had a monthly per capita income <BRL1/USD0.25 prior to 2011), we can use a regression discontinuity design (RDD)." (Please see section "Methods and analysis/Analysis", page 9).
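As a rough illustration of why reliable income data matter for the RDD, the sketch below compares crude mortality rates just below and just above an income cut-off, repeated over several bandwidths. It is a simplified stand-in and not the protocol's method: the protocol estimates effects with survival models, whereas this uses deaths per person-year. The `Person` class, the function name, the cut-off of 70, the assumption that eligibility means income at or below the cut-off, and the example records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Person:
    income: float          # monthly per capita household income (forcing variable)
    years_followed: float  # person-years of follow-up
    died: bool             # death observed during follow-up


def local_rate_ratio(people, cutoff, bandwidth):
    """Crude mortality rate ratio (below vs. above the cut-off) within +/- bandwidth.

    Returns None when either side has no person-time or the rate above is zero.
    """
    below = [p for p in people if cutoff - bandwidth <= p.income <= cutoff]
    above = [p for p in people if cutoff < p.income <= cutoff + bandwidth]

    def rate(group):
        person_years = sum(p.years_followed for p in group)
        deaths = sum(1 for p in group if p.died)
        return deaths / person_years if person_years > 0 else None

    rate_below, rate_above = rate(below), rate(above)
    if rate_below is None or not rate_above:
        return None
    return rate_below / rate_above


# Made-up records, for illustration only.
example = [
    Person(60, 10, True), Person(65, 10, False),
    Person(75, 10, True), Person(80, 10, True),
    Person(200, 5, True),
]

# Sensitivity of the crude estimate to the bandwidth choice.
for bw in (4, 15, 150):
    print(bw, local_rate_ratio(example, cutoff=70, bandwidth=bw))
```

If most recorded incomes sit at an implausible value far from the cut-off, as described for the pre-2011 data, few observations fall inside any narrow bandwidth and their position relative to the cut-off is unreliable, which is one way to read the decision to restrict the RDD to applications from 2011 onwards.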

The study protocol describes the set up to investigate the impact of the conditional cash transfer program (Bolsa Familia Program, further referred to as BFP) on cardiovascular and all-cause mortality in Brazil.
The methods to be used in the study are sound and the framework seems to be adequate. I have some minor comments.

1.
The authors define premature mortality as deaths among persons 30 to 69. Why would they choose such range? Could you elaborate more? R: For this study, we have used the same definition of premature mortality from non-communicable diseases as in one of the Sustainable Development Goal (SDG) target 3.4 indicators. To clarify this, we have now added the following sentence to the text: "Premature mortality (i.e., death among persons 30 to 69 years of age) is an important indicator included in the Sustainable Development Goals (SDG) 3.4 target for monitoring the implementation of effective public policies for disease prevention and control." (Please see Introduction section, page 4).

2.
Brazil depicts a large component of external mortality driven by homicides (see Alvarez et al, 2020 in Population Studies). This component is mainly observed in males between ages 15-55. It is important that the authors consider the effect of the BFP on the external mortality component, specifically when analysing the third and fifth objectives of their research protocol: "To estimate the causal effect of BFP on all-cause premature mortality." and "To explore how combinations of selected social characteristics influence the causal effects of BFP on the above outcomes, adopting an intersectionality perspective." R: We agree with the reviewer and, considering that previous studies have suggested an indirect effect of the Bolsa Familia program in reducing violence-related deaths (homicides), we have now included the following sensitivity analysis: "To test if the effect of BFP on all-cause mortality is independent of BFP's effect on homicide and other external causes of death (Alvarez et al, 2020; Machado et al 2018), we shall re-estimate the effect excluding external causes of death." (Please see section "Methods and analysis/Data analysis plan/Analysis", page 10). We note that the issue of impacts on external mortality is of considerable policy relevance, but another team has been focusing on this issue using these data and we are therefore not duplicating that ongoing work.

We have now provided some preliminary estimates of missing data in the "datasets" section and have now included in the "Analysis" section the methods chosen to deal with missing data: "Also, completeness varies widely between variables (0-10% in the selected variables) (Table 2) and over time." (Please see section "Methods and analysis/Datasets/Sociodemographic variables", page 7); "To deal with missing data, we will start by exploring the missingness pattern of covariates over time in our study population. Given the size of our sample and the complexity of causal inference methods, we are unable to implement multiple imputation. For the development of the propensity score, we will try to limit inclusion of covariates to those with a relatively low percentage of missing values (e.g., <5%). For variables which have higher levels of missingness but which are strongly informative of intervention receipt, we will include a missing indicator for that variable. In addition, we will perform a sensitivity analysis using only individuals without missing data in the covariates of interest (i.e., complete case analysis)." (Please see section "Methods and analysis/ Datasets/Sociodemographic variables", page 11).
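The missing-data handling quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of `None` for missing values, the placeholder value `0` paired with the missing indicator, and the example variables are all hypothetical. Only the 5% threshold, the missing-indicator idea and the complete case analysis come from the quoted text.

```python
def missing_fraction(rows, name):
    """Share of records in which covariate `name` is missing (None)."""
    return sum(1 for r in rows if r.get(name) is None) / len(rows)


def prepare_covariates(rows, names, max_missing=0.05, informative=()):
    """Select covariates for a propensity score model.

    - Covariates with missingness below `max_missing` are kept unchanged.
    - Covariates with higher missingness are kept only if listed in
      `informative`; they get a 0/1 missing indicator and a placeholder
      value of 0 where missing (the placeholder is an assumption).
    - All other covariates are dropped.
    """
    kept = []
    for name in names:
        if missing_fraction(rows, name) < max_missing:
            kept.append((name, False))
        elif name in informative:
            kept.append((name, True))

    prepared = []
    for r in rows:
        new_row = {}
        for name, needs_indicator in kept:
            value = r.get(name)
            if needs_indicator:
                new_row[name + "_missing"] = int(value is None)
                new_row[name] = 0 if value is None else value
            else:
                new_row[name] = value
        prepared.append(new_row)
    return prepared


def complete_cases(rows, names):
    """Records with no missing value in any covariate in `names`."""
    return [r for r in rows if all(r.get(n) is not None for n in names)]


# Made-up records: "education" is 25% missing, "housing" is 50% missing.
records = [
    {"age": 30 + i,
     "education": None if i < 5 else 2,
     "housing": None if i % 2 else 1}
    for i in range(20)
]
prepared = prepare_covariates(
    records, ["age", "education", "housing"], informative=("education",)
)
```

Here `housing` is dropped because it is highly missing and not flagged as informative, while `education` is retained together with an `education_missing` indicator.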

4.
When describing how to address the changes over time in eligibility criteria for BFP, the authors mention that "To account for changes in the eligibility criteria for BFP over time (Table 1). To account for these changes over time, we will standardize the monthly per capita household income to the 2014 threshold so that we can use a single cut-off value in the analysis for all years." (Please see section "Methods and analysis/ Datasets/Sociodemographic variables", page 7; and Table 1).
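One way to read "standardize the monthly per capita household income to the 2014 threshold" is a proportional rescaling by the ratio of the reference-year threshold to the application-year threshold, sketched below. The text does not state the exact method, so the rescaling rule is an assumption, and the function name and threshold values are hypothetical placeholders rather than the actual BFP thresholds from Table 1.

```python
def standardize_income(income, year, thresholds, reference_year=2014):
    """Rescale income so that the reference-year threshold applies to every year.

    Assumes proportional rescaling: an income exactly at its own year's
    threshold maps exactly onto the reference-year threshold.
    """
    return income * thresholds[reference_year] / thresholds[year]


# Placeholder eligibility thresholds (NOT the real BFP values).
placeholder_thresholds = {2011: 100.0, 2014: 125.0}

# An applicant from 2011 with income 80 is below the 2011 threshold of 100;
# after rescaling they remain below the single 2014 cut-off of 125.
scaled = standardize_income(80.0, 2011, placeholder_thresholds)
```

Under this rule an applicant's position relative to the cut-off is preserved, so eligibility is unchanged while all years share one cut-off value.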

5.
When calculating exposures based on their 100 Million Brazilian cohort, it is very important that the authors take into account the different observational schemes (censoring, truncation, etc.). Such schemes could have a huge impact on their results. R: We agree and have now included additional analyses to test the robustness of our results: "To improve the robustness of our results, we will perform additional analyses: i. restricted to a subgroup of individuals whose treatment has not varied over time (i.e., excluding those who stopped receiving BFP treatment); ii. restricting the follow-up time to shorter periods in which socioeconomic conditions are less likely to have varied over time; iii. exploring the possibility of treatment contamination occurring when untreated individuals start receiving the BFP, and iv. removing families with zero income or restricting the analysis to individuals that are more likely to receive the treatment (e.g., monthly per capita income below a certain threshold)" (Please see section "Methods and analysis/Data analysis plan/Analysis", page 10).

Please provide dates (time schedule) of planned studies. R: We have now provided a time schedule for the development of the study.