REinforcement learning to improve non-adherence for diabetes treatments by Optimising Response and Customising Engagement (REINFORCE): study protocol of a pragmatic randomised trial
  1. Julie C Lauffenburger (1, 2)
  2. Elad Yom-Tov (3)
  3. Punam A Keller (4)
  4. Marie E McDonnell (5)
  5. Lily G Bessette (2)
  6. Constance P Fontanet (1, 2)
  7. Ellen S Sears (1, 2)
  8. Erin Kim (1, 2)
  9. Kaitlin Hanken (1, 2)
  10. J Joseph Buckley (6)
  11. Renee A Barlev (1, 2)
  12. Nancy Haff (1, 2)
  13. Niteesh K Choudhry (1, 2)
  1. Center for Healthcare Delivery Sciences, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  2. Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  3. Microsoft Research, Microsoft, Herzeliya, Israel
  4. Tuck School of Business, Dartmouth College, Hanover, NH, USA
  5. Endocrinology, Diabetes and Hypertension, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  6. Division of Sleep Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
Correspondence to Dr Julie C Lauffenburger; jlauffenburger{at}


Introduction Achieving optimal diabetes control requires several daily self-management behaviours, especially adherence to medication. Evidence supports the use of text messages to support adherence, but there remains much opportunity to improve their effectiveness. One key limitation is that message content has been generic. By contrast, reinforcement learning is a machine learning method that can be used to identify individuals’ patterns of responsiveness by observing their response to cues and then optimising them accordingly. Despite its demonstrated benefits outside of healthcare, its application to tailoring communication for patients has received limited attention. The objective of this trial is to test the impact of a reinforcement learning-based text messaging programme on adherence to medication for patients with type 2 diabetes.

Methods and analysis In the REinforcement learning to Improve Non-adherence For diabetes treatments by Optimising Response and Customising Engagement (REINFORCE) trial, we are randomising 60 patients with suboptimal diabetes control treated with oral diabetes medications to receive a reinforcement learning intervention or control. Subjects in both arms will receive electronic pill bottles to use, and those in the intervention arm will receive up to daily text messages. The messages will be individually adapted using a reinforcement learning prediction algorithm based on daily adherence measurements from the pill bottles. The trial’s primary outcome is average adherence to medication over the 6-month follow-up period. Secondary outcomes include diabetes control, measured by glycated haemoglobin A1c, and self-reported adherence. In sum, the REINFORCE trial will evaluate the effect of personalising the framing of text messages for patients to support medication adherence and provide insight into how this could be adapted at scale to improve other self-management interventions.

Ethics and dissemination This study was approved by the Mass General Brigham Institutional Review Board (IRB) (USA). Findings will be disseminated through peer-reviewed journals, reporting and conferences.

Trial registration number (NCT04473326).

  • diabetes & endocrinology
  • clinical trials
  • public health

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • REinforcement learning to Improve Non-adherence For diabetes treatments by Optimising Response and Customising Engagement is a two-arm pragmatic randomised trial of patients with suboptimally controlled diabetes that is testing a highly scalable strategy to personalise communication using reinforcement learning to improve adherence to medication and diabetes control.

  • The trial is designed to maximise both internal validity and generalisability and uses routinely-collected data to evaluate outcomes.

  • By using a 6-month follow-up to evaluate adherence outcomes, the trial will examine both long-term medication-taking and clinical outcomes (eg, glycaemic control).

  • While electronic pill bottles are highly accurate at measuring actual pill consumption, monitoring could theoretically influence adherence, but these observer effects typically decrease over time and would be similar in the control and intervention arms.

  • Secondary outcomes, including glycated haemoglobin A1c and self-reported adherence, may be susceptible to missing data due to the nature of the pragmatic data collection, but we are using imputation methods to overcome this issue.


Type 2 diabetes affects more than 34 million Americans and annually costs the US healthcare system an estimated US$237 billion.1 Achieving optimal diabetes control requires a variety of daily self-management behaviours, such as physical activity and maintaining a healthy weight.2 3 Among these, adherence to medication is central.4–6 Many strategies to improve adherence have been developed and tested.7–10 A growing body of evidence supports the use of text messages to improve health behaviours by offering reminders, providing education and enhancing motivation.11–16 In the case of diabetes, text messaging has been shown to improve adherence compared with usual care.17

Despite this evidence, there remains much opportunity to optimise the effectiveness of text messaging-based interventions.17 One key limitation of these prior interventions is that the message content has been largely generic. Evidence from behavioural sciences indicates that personalisation of communication is an extremely important principle for changing behaviours.18–20 In contrast to customisation, which involves manually changing content, personalisation tailors content for individuals based on their actual, observed prior behaviours, which can result in greater behaviour change.21 However, doing so at a population scale is difficult, especially in a way that is integrated with patients’ regular care.

One approach to achieving both precision and scalability is reinforcement learning, a machine learning method that can be used to observe individual behaviours in response to cues and implement a personalised strategy to further optimise behaviours in response to these cues.21–25 Through an iterative and systematic trial-and-error feedback loop, reinforcement learning algorithms can identify which outreach, in what order and at what delivery frequency, maximises each patient’s response. In contrast to other approaches, reinforcement learning algorithms learn in real time, do not rely on historical data from other subjects, and tailor interventions for individual patients.26

In healthcare, reinforcement learning has been used to improve targeting of interventions for patients with mild depression,27 titrate antiepilepsy drugs28 and identify the best way for clinicians to manage sepsis.27 29 30 In the case of patient-facing healthcare, a reinforcement learning-based text message intervention improved physical activity and glycaemic control in patients with type 2 diabetes.24 25 Such an approach could also personalise text message outreach about medication adherence and lifestyle but has not been evaluated for this purpose.

Methods and analysis

Overall study design

REinforcement learning to Improve Non-adherence For diabetes treatments by Optimising Response and Customising Engagement (REINFORCE) is a two-arm pragmatic randomised controlled trial designed to evaluate the impact of a text messaging programme tailored using reinforcement learning on medication adherence for patients with type 2 diabetes (figure 1). The trial began enrolment on 4 February 2021.

Figure 1

Overall trial design. HbA1c, haemoglobin A1c.

Study setting and subjects

The study is being conducted at Brigham and Women’s Hospital (BWH), an academic medical centre in Boston, Massachusetts, USA, part of Mass General Brigham. Potentially-eligible subjects are individuals 18–84 years of age diagnosed with type 2 diabetes who are prescribed 1–3 daily oral diabetes medications, with their most recent glycated haemoglobin A1c (HbA1c) level ≥7.5% (ie, above guideline-based treatment targets).2 These criteria are being assessed using the BWH electronic health record (EHR).

Eligible patients must also have a smartphone with a data plan or Wi-Fi at home and the ability to receive text messages regularly (ie, expected gaps in communication ≤3 days in a row), have a basic working knowledge of English, not be actively enrolled in another diabetes trial at Mass General Brigham, either not currently use a pillbox or be willing and able to use electronic pill bottles for their diabetes medications for the duration of the study, and be independently responsible for taking their medications on a daily basis (ie, do not have daily assistance with medication-taking at home).

Patients using insulin are eligible to participate, and participants are not required to have prior evidence of non-adherence, as self-reported adherence is known to be strongly overestimated.31 These criteria were chosen because smartphone connectivity and willingness to use electronic pill bottles are essential for the daily measurement of adherence, and prior literature suggests wide adoption of smartphones, even among patients from socioeconomically disadvantaged backgrounds.32–34 The text messages are presently available only in English.

Study procedures and randomisation

The timeline of study procedures is shown in figure 2. Potentially eligible subjects with recent or upcoming virtual or in-person appointments at one of the BWH diabetes clinics are identified using a biweekly EHR screen. Patients’ endocrinologists are then provided with a list of their potentially eligible patients via email and asked to opt out any patients they do not wish to be approached for participation.

Figure 2

Timeline of study procedures.

Patients approved for enrolment are sent a mailed or electronic patient portal letter on their endocrinologist’s behalf inviting them to participate in the study and providing them with a contact number to enrol directly. Patients are then contacted by phone. Those subjects who provide consent to participate are sent a baseline questionnaire administered and collected through REDCap electronic data capture tools housed at BWH (online supplemental appendix table 1; written consent form) and mailed Pillsy electronic pill bottles for their medications. REDCap is a secure, web-based software platform that supports data capture for research studies.35 36 Electronic pill bottles have been widely used in prior research on adherence and have shown high concordance with other adherence measurement methods.37 38 Subjects are asked to use these devices in place of regular pill bottles or pillboxes for their eligible oral diabetes medications. Data from the bottles are transmitted through the patients’ smartphones via a latent downloaded app that has no features enabled (Android or iOS) and is used only for measurement purposes.

After receiving the pill bottles, patients are randomised in a 1:1 ratio to the intervention or control arm using a simple random number generator. In order to improve baseline participant balance between the two treatment arms, patients are block-randomised based on (1) baseline level of self-reported adherence, specifically <1 dose or ≥1 dose missed in the last 30 days,31 and (2) baseline HbA1c of <9.0% or ≥9.0%. These adherence and HbA1c cutpoints are based on prior literature and clinically relevant thresholds and are used to create four total blocks.31 39
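The allocation scheme described above can be illustrated with a minimal sketch. It assumes permuted blocks within each of the four adherence-by-HbA1c groups; the block size of 4, the class and function names, and the seed handling are hypothetical and are not taken from the protocol.

```python
import random
from typing import Dict, List, Optional, Tuple

Stratum = Tuple[str, str]

def stratum(missed_doses_30d: int, hba1c: float) -> Stratum:
    """Map baseline values to one of the four groups named in the text."""
    adherence = "missed<1" if missed_doses_30d < 1 else "missed>=1"
    control = "hba1c<9.0" if hba1c < 9.0 else "hba1c>=9.0"
    return adherence, control

class BlockRandomiser:
    """1:1 allocation using shuffled blocks kept separately per group."""

    def __init__(self, block_size: int = 4, seed: Optional[int] = None):
        # block_size is an assumption; the protocol does not state it.
        if block_size % 2:
            raise ValueError("block_size must be even for a 1:1 ratio")
        self.block_size = block_size
        self.rng = random.Random(seed)
        self.pending: Dict[Stratum, List[str]] = {}

    def assign(self, missed_doses_30d: int, hba1c: float) -> str:
        key = stratum(missed_doses_30d, hba1c)
        if not self.pending.get(key):
            # Start a fresh block with equal numbers of each arm.
            block = ["intervention", "control"] * (self.block_size // 2)
            self.rng.shuffle(block)
            self.pending[key] = block
        return self.pending[key].pop()
```

Under these assumptions, every completed block within a group contains equal numbers of intervention and control assignments, which is what keeps the arms balanced on baseline adherence and HbA1c.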

At the end of the 6-month follow-up, patients are contacted by text message (and then by phone if non-responsive) to complete a follow-up questionnaire (online supplemental table 2) and ensure the complete synchronisation of their electronic pill bottles. The follow-up questionnaire includes items about self-reported adherence,31 number of diabetes medications, diabetes-related hospitalisations and satisfaction with the text messaging programme. Patients in both arms receive a US$50 gift card for participation at the end of the study.



The core component of the intervention is a reinforcement learning programme that personalises daily text message outreach based on data from the electronic pill bottles (figure 3). Beginning the morning after randomisation, the programme predicts which text message will be most likely to cause a patient to take their medications; the corresponding text message is then sent. The effectiveness of each message is assessed the next morning based on whether the medication doses were actually taken. The control arm receives no text messaging programme.

Figure 3

Reinforcement learning platform.

Text messages and classification scheme

The text messages that are delivered to patients are based on existing behavioural science principles of how message content influences patient behaviour and improves patient self-efficacy.40–42 Based on feedback from qualitative interviews with patients,43 we selected five factors to incorporate into the messages: (1) framing, classified as neutral, positive or negative, (2) observed feedback, whereby the text message includes the number of days in the previous week that patients had evidence of medication-taking (ie, 0–7), (3) social reinforcement (ie, mentioning loved ones in the text), (4) the nature of content, either providing a medication reminder or information about medications or lifestyle and (5) reflection, where the texts are designed to invoke introspection, such as by including a reflective question.7 18 19 25 44 45

Informed by prior trials led by the study team, evidence-based texts in publicly-available materials, and the patient qualitative interviews, we designed text messages incorporating varying components of the five selected factors (table 1).25 46–48 For instance, a text message containing positive framing, observed feedback, social reinforcement, reminder content and without reflection would comprise one set of factors (eg, table 1, text 8). Each factor set in the trial had at least two text messages that corresponded with that set. Altogether, we developed and included 128 messages containing 47 unique combinations of factors.

Table 1

Example text messages and factor classifications

Reinforcement learning text messaging programme

The reinforcement learning algorithm is hosted on a Health Insurance Portability and Accountability Act (HIPAA)-compliant Microsoft Azure server and integrates three components: (1) electronic pill bottle data obtained on a daily basis from the Pillsy server, (2) patient data from REDCap updated on a daily basis, including age, sex, number of medications, baseline HbA1c and patient activation, used as fixed predictors in the algorithm and (3) the reinforcement learning prediction model algorithm itself housed by Microsoft Personaliser, a publicly-available platform.49 Even though they are ‘fixed predictors’, the patient data are updated on a daily basis to incorporate new patients as they are enrolled and to update the number of medications for the adherence calculations, for example, if a diabetes medication for which a given patient is using an electronic pill bottle is discontinued or a new one is added.

Each day, adherence from the prior day is calculated by dividing the number of times a patient opened the pill bottle by the number of doses prescribed (ie, once or two times per day as assessed during REDCap data collection at baseline). Adherence values, ranging from 0 to 1, are the ‘reward’ events sent to Microsoft Personaliser that train the algorithm to achieve the highest possible sum of these adherence rewards over time. For patients on more than one medication, the values for each are averaged. To avoid erroneously classifying repeated bottle openings for a single dose as representing multiple doses, we only count a maximum of 1 opening event per ~3 hour period. To reduce the chance of training the model on incorrect data, reward events for the prior day are not sent to Personaliser if the electronic pill bottles appear to be disconnected that day. Once the pill bottles are reconnected, the feedback loop is complete, and the model is updated. If no reward value is sent to Personaliser after 2 days, a default reward value of 0 is assigned to that training event.
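As an illustration of the reward calculation just described, the following sketch computes one patient's daily reward from bottle-opening timestamps. It is not the study's code: the function names, the medication keys, the exact 3-hour window, the explicit cap at 1 and the reading of "after 2 days" as two or more days are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

MIN_GAP = timedelta(hours=3)  # "~3 hour" window; exact value assumed

def count_doses(openings: List[datetime],
                min_gap: timedelta = MIN_GAP) -> int:
    """Count openings, skipping any within min_gap of the last counted one."""
    counted = 0
    last: Optional[datetime] = None
    for ts in sorted(openings):
        if last is None or ts - last >= min_gap:
            counted += 1
            last = ts
    return counted

def daily_reward(openings_by_med: Dict[str, List[datetime]],
                 doses_by_med: Dict[str, int]) -> float:
    """Average the per-medication adherence, each capped at 1 (assumed)."""
    scores = []
    for med, prescribed in doses_by_med.items():
        taken = count_doses(openings_by_med.get(med, []))
        scores.append(min(taken / prescribed, 1.0))
    return sum(scores) / len(scores)

def reward_to_send(reward: float, connected: bool,
                   days_without_reward: int) -> Optional[float]:
    """Hold back rewards while disconnected; default to 0 after 2 days."""
    if connected:
        return reward
    if days_without_reward >= 2:  # threshold interpretation is assumed
        return 0.0
    return None  # nothing is sent yet
```

For example, a patient prescribed a twice-daily medication who opens that bottle at 08:00, 08:30 and 20:00, and who never opens the bottle of a second once-daily medication, would score 1.0 and 0.0 respectively, giving a reward of 0.5 for that day.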

The reinforcement learning algorithm initially suggests random text message factors and observes individual feedback and subsequent adherence to medication.29 Over time, the algorithm begins to predict which factors should be included in the message that a patient receives. In addition to the adherence reward, the algorithm also incorporates other predictors including baseline characteristics, the number of days since each text message factor was last sent (to avoid sending similar messages many times in a row), and an indicator of whether a patient took their medication ‘early’ (ie, in the same calendar day but prior to the text message prediction of that morning). The most appropriate message for each day is determined by computing the predicted responses and applying Boltzmann sampling,50 an approach sometimes referred to as a ‘contextual bandit’ method.26 51 52 When no adherence reward is received (because no data have been received from a patient’s pill bottle), text messages are predicted based on the adherence rewards available to date. Throughout the trial, 10% of the daily predictions are chosen at random in order to continue to train the model. As in prior work, the algorithm can decide that no text message be sent to a specific patient on a given day.25
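The selection step can be sketched as follows. This is a generic illustration of Boltzmann (softmax) sampling combined with a 10% random-choice rate, not the internals of the hosted platform; the function name, the temperature value and the option labels (including a "no_message" option) are hypothetical.

```python
import math
import random
from typing import Dict, Optional

def boltzmann_choice(predicted: Dict[str, float],
                     temperature: float = 0.1,
                     explore: float = 0.10,
                     rng: Optional[random.Random] = None) -> str:
    """Pick one option given predicted rewards for each candidate.

    With probability `explore`, choose uniformly at random; otherwise
    sample with probability proportional to exp(reward / temperature).
    """
    rng = rng or random.Random()
    options = list(predicted)
    if rng.random() < explore:
        return rng.choice(options)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(predicted.values())
    weights = [math.exp((predicted[o] - top) / temperature) for o in options]
    return rng.choices(options, weights=weights, k=1)[0]

# Hypothetical predicted adherence for three candidate factor sets.
today = boltzmann_choice(
    {"positive+reminder": 0.72, "neutral+feedback": 0.65, "no_message": 0.60},
    rng=random.Random(0),
)
```

A lower temperature concentrates choices on the option with the highest predicted reward, while a higher one spreads them more evenly; the temperature used in the trial is not stated in the text.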

The text messages are sent up to daily using Microsoft Dynamics 365 SMS Texting, a HIPAA-compliant third-party platform managed through BWH. While the platform allows for two-way communication for patients in the intervention arm to stop the daily text message reminders, patients are not specifically encouraged to respond in order to enhance the potential for scalability to other settings. The overall programme was pilot tested among non-participant volunteers for 3 weeks before the start of the study in order to address operational issues; these data were not used to train the algorithm.

In addition to the messages specified by the reinforcement learning algorithm, intervention subjects also receive an introductory text on the day of randomisation, a simple reminder text to synchronise their electronic pill bottles if they have not been connected for more than 7 days (sent every 3 days until connected or until 30 days have elapsed), and a final text at the end of follow-up with a link to the final questionnaire (Appendix Table 2). If the pill bottles have still not been synchronised for >14 days and >28 days, subjects will also receive one call from study staff on these days to inquire about any syncing issues.
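The synchronisation follow-up rules lend themselves to a small decision function. The sketch below is one possible reading of the schedule: it assumes the first reminder goes out on day 8 of disconnection and that the calls are placed on days 15 and 29; the function name and action labels are hypothetical.

```python
from typing import List

def sync_actions(days_disconnected: int) -> List[str]:
    """Return the outreach due today for a bottle that is not syncing."""
    actions: List[str] = []
    # Reminder texts: more than 7 days disconnected, every 3 days,
    # stopping once 30 days have elapsed (start day assumed to be day 8).
    if 7 < days_disconnected <= 30 and (days_disconnected - 8) % 3 == 0:
        actions.append("reminder_text")
    # One staff call once >14 days and once >28 days (days 15 and 29 assumed).
    if days_disconnected in (15, 29):
        actions.append("staff_call")
    return actions
```

Because the same rules apply in both arms, a function of this kind could be shared between the intervention and control workflows.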

Control arm

The control arm receives the same simple introductory, synchronisation and end of study text messages as sent to the intervention arm in the same sequence (Appendix Table 3). As in the intervention arm, if the pill bottles have still not been synced for >14 days and >28 days, subjects will receive one call from study staff on these days to inquire about any synching issues. Patients in the control arm do not receive any other intervention.


The trial’s primary outcome is medication adherence assessed in the 6 months after randomisation (table 2). Medication adherence will be measured by averaging daily adherence (as described above) for each medication across each patient beginning the day after randomisation until 183 days after randomisation.

Table 2

Study outcomes

Secondary outcomes include change in glycaemic control as assessed using HbA1c, and self-reported adherence at the end of follow-up. HbA1c values will be collected from routine measurements recorded in the EHR system; we will use the value closest to each patient’s 6-month end of follow-up, up to 1 month after randomisation. In routine care, HbA1c is measured approximately every 3–6 months, so we expect only modest missingness, as we have also observed in prior work.46

Self-reported adherence will be assessed as the proportion of patients who are adherent according to a validated three-item self-report measure, administered in the follow-up questionnaire, and prior literature.31 We will also descriptively measure implementation outcomes informed by the Reach, Effectiveness, Adoption, Implementation, Maintenance (RE-AIM) framework,53 including representativeness of patients in the study, text messaging opt-out rates, any feedback from patients and rates of pill bottle disconnectedness, which will inform considerations for how to scale the intervention to other settings.

Analytic plan and sample size

We will report means and frequencies of prerandomisation variables separately by intervention and control arm, comparing these values using absolute standardised differences. The outcomes will be evaluated using intention-to-treat principles among all randomised participants.
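For readers unfamiliar with the balance metric, a minimal sketch of the absolute standardised difference follows. It uses the common pooled-SD form for continuous variables and the analogous form for proportions; the function names are hypothetical, and the protocol does not specify which variant the authors will use.

```python
import math
from statistics import mean, variance
from typing import Sequence

def asd_continuous(a: Sequence[float], b: Sequence[float]) -> float:
    """|mean difference| divided by the square root of the average variance."""
    pooled_sd = math.sqrt((variance(a) + variance(b)) / 2)
    return abs(mean(a) - mean(b)) / pooled_sd

def asd_binary(p_a: float, p_b: float) -> float:
    """Same idea for two proportions, using p(1 - p) as the variance."""
    pooled_sd = math.sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    return abs(p_a - p_b) / pooled_sd
```

Unlike a p value, this quantity does not depend on sample size, which is why it is often preferred for describing baseline balance in small trials. The sketch does not handle the degenerate case in which both arms have zero variance.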

In the primary analysis, we will evaluate adherence and glycaemic control using generalised estimating equations with an identity link function and normally distributed errors. We will also adjust for the block randomised design. We do not expect any missing data for the primary outcome, but may have up to 25% missingness for the glycaemic control outcome.46 54 If >10% of participants have missing HbA1c data, we will repeat our analyses using multiple imputation.54 55 A similar approach will be taken for self-reported adherence, except using a log link function and Poisson distributed errors to generate relative risks of the proportion of adherent patients in the intervention versus control arms.56 In secondary analyses, we will control for any differences in baseline variables between the arms despite randomisation.

As a sensitivity analysis, we will censor patients in the analysis when they have stopped using the electronic pill bottle for >30 days. We will also evaluate the change in HbA1c from baseline until the end of follow-up and differences in self-reported adherence separately for the three items that make up the self-reported scale we are using. For glycaemic control (HbA1c) and self-reported adherence, we will also conduct complete case analyses. Similarly, subgroup analyses will include stratification by age, sex, race/ethnicity, baseline HbA1c, baseline self-reported adherence and number of study medications.

Our study should be sufficiently powered to detect clinically meaningful differences in the primary outcome. With 60 subjects, we estimated that we would have the power to detect a 10% difference in average adherence over the 6-month follow-up period between the two arms, assuming an SD=12.5%, power=0.8 and α=0.05. With this sample size, we would also be able to detect an HbA1c difference of 1.0% between arms (assuming SD=1.3) and 50% relative difference in self-reported adherence.54
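The stated assumptions can be checked with a simple normal-approximation calculation for a two-sided comparison of two means. This is a sketch only: it assumes 30 patients per arm (60 randomised 1:1) and a z test, which may differ from the method the authors used, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_power(delta: float, sd: float, n_per_arm: int,
                     alpha: float = 0.05) -> float:
    """Approximate power to detect a mean difference `delta`.

    Uses the normal approximation with a common SD and equal arm sizes,
    ignoring the negligible contribution of the opposite tail.
    """
    z = NormalDist()
    se = sd * sqrt(2 / n_per_arm)
    return z.cdf(abs(delta) / se - z.inv_cdf(1 - alpha / 2))

# Primary outcome: 10 percentage-point difference, SD 12.5.
adherence_power = two_sample_power(delta=10.0, sd=12.5, n_per_arm=30)
# Secondary outcome: HbA1c difference of 1.0, SD 1.3.
hba1c_power = two_sample_power(delta=1.0, sd=1.3, n_per_arm=30)
```

Under this approximation, both values come out above the 0.8 target (roughly 0.87 and 0.85), which is consistent with the sample size statement above.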

On trial completion, we will cluster intervention patients by their response to different text message factors and evaluate the ability to predict these cluster phenotypes using baseline information before randomisation. Based on prior work and the general 1:10 rule of thumb for predictor parameters, we expect to elucidate at least two unique patient characteristics for the clusters.25 57 We have further classified each text message as quantitative (ie, containing numbers), social reinforcement with specific reference to their doctor, or containing lifestyle information, which we will also incorporate as post-hoc prediction factors to evaluate responsiveness to individual texts; in total, there will be at least 5000 individual text messages sent. This exploratory prediction modelling on trial completion could provide a more accurate ‘starting point’ from which future programmes could begin to adapt and further personalise message content.

Patient and public involvement

We conducted 20 qualitative interviews with patients with type 2 diabetes at the outset of designing the trial.43 Their experiences and preferences were used to help design and inform the text messaging programme and recruitment mechanisms and to refine the research question and outcome measures. We also plan to involve patients in the dissemination plans and distribute the study results to participants in the study.


Interventions for health behaviour change appear to be most effective when tailored to the needs and behavioural tendencies of individuals. Reinforcement learning is a machine learning method that can be used to discover individuals’ patterns of responsiveness and then personalise cues accordingly. Despite its promise for improving the tailoring of communication, reinforcement learning has not yet been used to support medication-taking behaviours. Accordingly, we launched the REINFORCE trial to test the impact of a reinforcement learning-based text messaging programme on medication adherence for patients with type 2 diabetes.

Prior work using reinforcement learning indicates its early promise to improve health outcomes. For instance, in a three-arm trial testing the impact on exercise of different text messaging approaches for individuals with type 2 diabetes, reinforcement learning resulted in significantly larger improvements in daily activity and glycaemic control than non-personalised weekly or daily texting strategies. Other trials of reinforcement learning-based interventions have shown similar successes, increasing, for example, physical activity by more than 20% versus non-adaptive approaches (p<0.001).58 59 We hypothesise that ‘reinforcement learning’ has transdiagnostic implications and thus could apply to other health behaviours, such as weight loss, diet, exercise, self-monitoring and medication use for patients with diabetes, the latter of which we are testing here.

Randomised trials have demonstrated the effectiveness of text messaging to support adherence to medication.11 12 14 15 60–66 However, these approaches have only been modestly successful, possibly because they have not personalised the content and presentation (ie, framing) of the messages that patients receive.60 Text messages can be delivered at low cost and are widely available—even for patients who have difficulty accessing care—adding to their promise for improving population health, particularly if sufficiently optimised.12

There are several limitations to this study that should be acknowledged. First, while electronic pill bottles allow for a highly accurate measure of actual pill consumption,38 67 monitoring could theoretically influence adherence. While these observer effects decrease over time,37 68 to further minimise this bias, we are using electronic pill bottles in both arms. Patients may also not have an HbA1c lab value during follow-up for the secondary outcome, but we are using commonly used multiple imputation methods to address this. These findings may also not generalise to patients with prediabetes or gestational diabetes or to those without reliable access to a smartphone and/or wireless internet. The post-hoc prediction analyses may also be limited by small sample size and are exploratory. Finally, this trial does not include a ‘generic’ text messaging arm. Within pragmatic funding constraints, this design choice was motivated by our goal of testing the potential efficacy of a reinforcement learning text messaging intervention. If our trial were to find no beneficial effect compared with control, then it is highly unlikely that there would be benefit compared with generic text messaging either. Conversely, if the intervention is shown to be successful, a large trial would be required for validation, and this study would include other types of text messaging programmes.

In conclusion, the REINFORCE trial will evaluate the effect of using a machine learning method to personalise text message content to support medication adherence for patients with type 2 diabetes. If the intervention is effective, this approach could be tested and reproduced in other clinical environments and for a broader set of health behaviours. Regardless of outcome, the trial will also provide insight into how reinforcement learning could be adapted at scale to improve other self-management interventions.

Ethics and dissemination

The trial is approved by the institutional review board of Mass General Brigham and registered with (NCT04473326) (see Trial Protocol attachment). The authors will be responsible for performing the study analyses, writing the first draft of the manuscript, making substantive edits and submitting its final contents for publication. Data analysts at the end of the study will be blinded to arm assignment; patients are not blinded due to the nature of the interventions. No data monitoring committee was deemed necessary by the human subjects’ oversight boards. Findings will be disseminated through peer-reviewed journals, reports to the funding organisation and scientific conferences. Study data will be made available pending appropriate agreements given the nature of the human subjects’ data.

Ethics statements

Patient consent for publication


The authors wish to thank the Digital Care Transformation team at Brigham and Women's Hospital, the team responsible for managing Microsoft Dynamics 365 SMS Texting, and other individuals who helped establish the Microsoft Azure setup.


  • Twitter @jlauffen

  • Contributors JCL had overall responsibility for the trial design and drafted the trial protocol and manuscript. NKC, as co-principal investigator, had overall responsibility for the trial design and trial protocol and helped draft the trial protocol and manuscript. EY-T, PAK, MED, LGB, CPF, ESS, EK, KH, JJB, RAB and NH contributed meaningfully to trial or intervention design and implementation as well as the manuscript. All authors contributed to the refinement of the study protocol and approved the final manuscript.

  • Funding Research reported in this publication was supported by the National Institute on Aging of the National Institutes of Health under Award Number P30AG064199 to BWH (Choudhry PI). Dr Lauffenburger was supported by a career development grant (K01HL141538) from the NIH.

  • Disclaimer The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

  • Competing interests EYT is an employee of Microsoft. RAB is now an employee at Vytalize Health. There are no other reported competing interests.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.