
Can purchasing information be used to predict adherence to cardiovascular medications? An analysis of linked retail pharmacy and insurance claims data
  1. Alexis A Krumme1,
  2. Gabriel Sanfélix-Gimeno2,
  3. Jessica M Franklin1,
  4. Danielle L Isaman1,
  5. Mufaddal Mahesri1,
  6. Olga S Matlin3,
  7. William H Shrank3,
  8. Troyen A Brennan3,
  9. Gregory Brill1,
  10. Niteesh K Choudhry1
  1. Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
  2. Center for Public Health Research (CSISP-FISABIO) and REDISSEC, Valencia, Spain
  3. CVS Health, Woonsocket, Rhode Island, USA
  1. Correspondence to Dr Niteesh K Choudhry; nkchoudhry{at}


Objective To evaluate whether retail purchasing data improve adherence prediction over approaches using healthcare insurance claims alone.

Design Retrospective cohort study.

Setting and participants A cohort of patients who received prescription medication benefits through CVS Caremark, used a CVS Pharmacy ExtraCare Health Care (ECHC) loyalty card, and initiated a statin medication in 2011.

Outcome We evaluated associations between retail purchasing patterns and optimal adherence to statins in the 12 subsequent months.

Results Among 11 010 statin initiators, 43% were optimally adherent at 12 months of follow-up. Greater numbers of store visits per month and dollar amount per visit were positively associated with optimal adherence, as was making a purchase on the same day as filling a prescription (p<0.0001 for all). Models to predict adherence using retail purchase variables had low discriminative ability (C-statistic: 0.563), while models with both clinical and retail purchase variables achieved a C-statistic of 0.617.

Conclusions While the use of retail purchases may improve the discriminative ability of claims-based approaches, these data alone appear inadequate for adherence prediction, even with the addition of more complex analytical approaches. Nevertheless, associations between retail purchasing behaviours and adherence could inform the development of quality improvement interventions.


This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial.


Strengths and limitations of this study

  • Using a unique data set combining prescription claims data with retail purchase data, this study aims to improve adherence prediction accuracy where traditional claims-based approaches have typically performed poorly.

  • This study uses both investigator-specified and empirical predictor selection to evaluate and maximise prediction accuracy.

  • This study uses complete dispensing information for each patient, so minimal misclassification of the outcome and of other medication use covariates is expected.

  • Members with loyalty cards may not use them at every purchase, and these data do not capture retail purchases patients make at non-CVS Pharmacy locations.


Introduction

Suboptimal adherence to evidence-based medications for coronary artery disease and other chronic conditions can lead to potentially avoidable morbidity, mortality and health spending.1 ,2 Accurately predicting which patients will become non-adherent could help identify the patient groups that would most benefit from interventions to maintain adherence.3–5 Traditional approaches to prediction generally use insurance claims data from commercial or government-sponsored insurance programmes to generate clinical and demographic variables measured at or before treatment initiation. Other methods to identify non-adherence rely on patient self-report, pill counts or electronic pill monitors, but these are not useful for identifying non-adherence before it occurs.6 ,7 While many of the variables available in insurance databases, such as age and race, are correlated with adherence, they provide weak discrimination between adherers and non-adherers.8–12

Good adherence has repeatedly been linked to other healthy behaviours, such as greater use of preventive healthcare services, influenza vaccination and compliance with recommended screening programmes through a phenomenon known as the ‘healthy user effect’.13–15 Thus, the use of richer data sources that capture activities reflecting such healthy attributes and behaviours might improve adherence prediction over approaches using healthcare claims alone. Such rich ‘big data’ sources have had success in predicting similarly complex individual behaviours. For example, mobile phone data have been used to describe individual mobility patterns and infer friendship networks, while transactional data from credit and debit cards have been used to predict consumption patterns.16–20

The extent to which enriching insurance claims data with in-pharmacy purchasing information improves the prediction of medication filling and refilling behaviours has not previously been evaluated. Using a unique data set combining prescription claims data with retail purchase data of individuals at CVS Pharmacy locations, we explored the association between purchasing behaviours in the year before initiation of a statin medication and adherence in the subsequent year. Our approach employed a combination of investigator-specified purchasing predictors and those selected by a high-dimensional propensity score (hd-PS) algorithm.21 ,22 We additionally compared logistic regression results with those from models enhanced with boosting, a machine-learning technique that can identify non-linear associations between predictors and outcome as well as deep interactions among predictors, which may be useful for modelling behaviour as complex as medication adherence.23 ,24 Enhanced predictive ability from this linked data set could improve the targeting of patients who stand to benefit from adherence-improving interventions, while the co-occurrence of in-pharmacy purchasing and filling behaviours could inform the development of novel interventions.


Methods

Data source

We used a data set linking prescription claims and retail purchase data for individuals who received prescription medication insurance benefits through CVS Caremark, a pharmacy benefits manager that provides coverage to more than 65 million members annually, and who made purchases at CVS Pharmacy stores using a CVS Pharmacy ‘ExtraCare Health Care’ (ECHC) loyalty card. Retail purchasing data consisted of item-level data on each purchase, including date of visit, dollar amount paid, quantity, and whether the product was on sale or a CVS brand item. Data also included information from CVS Pharmacy's internal product hierarchy, a system for classifying products according to merchandise group (eg, consumer health), category (eg, pain relievers) and subcategory (eg, naproxen). Prescription claims data included information on all paid prescription claims, including brand and generic drug name, dose, days supply, patient-paid amount and insurer-paid amount, as well as patient gender, age group and region of residence. Zip code-level data on education level, median income and racial distribution, derived by CVS Health from 2010 Census data, were additionally provided.

All purchasing data were linked to prescription claims data based on a unique masked patient identifier. To safeguard the anonymity of the participants included in the analysis, the linked data set used by the research team did not contain any direct patient identifiers. The Institutional Review Board of Brigham and Women's Hospital approved this study.

Study population

The study population consisted of patients who received prescription medication benefits through CVS Caremark and who also received a CVS Pharmacy ECHC loyalty card from their health plan. Although insurance members are enrolled individually, loyalty cards do not require identification to be used and can therefore be shared, typically within a household.

From this cohort, we identified individuals newly initiating a statin or statin combination drug between 1 January 2011 and 31 December 2011. A patient's index date was defined as the date of the first statin prescription filled during this study window. Patients were required to have continuous insurance enrolment for 180 days before and 1 year after the index date (figure 1). New use was defined as having no statin filled in the 180 days before the index date. Finally, to enrich the data from which purchasing predictors were drawn, we restricted our cohort to patients with at least three unique store visits in which the ECHC card was used in the year before the first statin prescription.
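The eligibility rules above can be expressed as a short filter. The sketch below is a minimal illustration, not the authors' code. It assumes hypothetical inputs (a list of statin fill dates, a single continuous enrolment interval and a list of ECHC card visit dates per patient), treats 'unique store visits' as distinct calendar days, and uses 365 days for each 1-year window.

```python
from datetime import date, timedelta

def index_date_if_eligible(statin_fills, enrol_start, enrol_end, card_visits,
                           window=(date(2011, 1, 1), date(2011, 12, 31)),
                           washout_days=180, followup_days=365, min_visits=3):
    """Return the index date if a patient meets the cohort criteria, else None.

    All argument names are hypothetical. Enrolment is assumed to be a single
    continuous interval [enrol_start, enrol_end].
    """
    in_window = sorted(d for d in statin_fills if window[0] <= d <= window[1])
    if not in_window:
        return None
    index = in_window[0]  # first statin fill in the 2011 study window
    washout_start = index - timedelta(days=washout_days)
    # New use: no statin filled in the 180 days before the index date.
    if any(washout_start <= d < index for d in statin_fills):
        return None
    # Continuous enrolment for 180 days before and 1 year after the index date.
    if enrol_start > washout_start or enrol_end < index + timedelta(days=followup_days):
        return None
    # At least three distinct card-visit days in the year before the index date.
    year_start = index - timedelta(days=365)
    visit_days = {d for d in card_visits if year_start <= d < index}
    if len(visit_days) < min_visits:
        return None
    return index
```

Returning the index date rather than a plain boolean lets the same call both filter the cohort and anchor the baseline and follow-up windows used later.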

Figure 1

Study design and data sources. ECHC, ExtraCare Health Care.

Adherence measures

We measured medication adherence using prescription fill data from prescription claims. By stringing together all paid fills, we created a statin ‘supply diary’ indicating, for each day of the year after the index date, whether the patient had a statin available. From this diary, we calculated the proportion of days covered (PDC), defined as the number of days with medication available divided by 365. We defined full (optimal) adherence as a PDC ≥0.8, which corresponds to the level of use above which patients with coronary artery disease derive clinically relevant benefit from statins25 and is the threshold employed by most quality measures.26 ,27
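The supply diary and PDC calculation above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's code: fills are hypothetical (fill_date, days_supply) pairs, and when a refill arrives before the previous supply runs out, the new supply is assumed to begin on the first day not already covered (carry-forward), a common convention that the text does not specify.

```python
from datetime import date, timedelta

def supply_diary(fills, index_date, window_days=365):
    """Return one boolean per follow-up day: True if a statin is on hand.

    fills: iterable of (fill_date, days_supply) pairs for paid claims.
    Assumption: overlapping supply from an early refill is carried forward,
    i.e. it starts on the first day not already covered by earlier fills.
    """
    covered = [False] * window_days
    next_free_day = 0  # first follow-up day not yet covered
    for fill_date, days_supply in sorted(fills):
        start = max((fill_date - index_date).days, next_free_day)
        end = min(start + days_supply, window_days)  # truncate at end of follow-up
        for day in range(start, end):
            covered[day] = True
        next_free_day = start + days_supply
    return covered

def pdc(fills, index_date, window_days=365):
    """Proportion of days covered: days with supply divided by window length."""
    return sum(supply_diary(fills, index_date, window_days)) / window_days

def fully_adherent(fills, index_date, threshold=0.8):
    """Full (optimal) adherence: PDC at or above the 0.8 threshold."""
    return pdc(fills, index_date) >= threshold
```

For example, a 30-day fill on the index date, an early 30-day refill on day 24 and a 90-day fill on day 151 give 150 covered days (PDC about 0.41), so that patient would not be classified as fully adherent.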

Adherence predictors

Investigator-specified clinical and demographic variables: We defined nine clinical characteristics based on prescription claims incurred during the 180 days before the index date (table 1). These included the number of unique medications taken and the presence of maintenance medications likely to influence adherence to a cardiovascular medication, namely antihypertensive, antiplatelet, oral antidiabetic, chronic obstructive pulmonary disease and asthma, heart failure, antidepressant, antiarrhythmic, anticoagulant and osteoporosis medications. Characteristics of the index prescription included whether it was a generic or branded medication and the dollar amount the patient paid at the pharmacy (copayment amount). Demographic information included age group and sex. Zip code-level information included race, education and median household income.

Table 1

Cohort baseline characteristics

Investigator-specified purchasing variables: Using the retail purchasing data, we defined 12 variables that we hypothesised would be associated with adherence. These variables were developed by a team of clinicians and epidemiologists through manual review of the purchasing data for a random sample of 100 patients, 50 of whom were fully adherent to their statin and 50 of whom were not. The team first discussed candidate variables and then refined them, initially while blinded to the patients' observed adherence and subsequently using observed adherence to calibrate the variable definitions. In all instances, we used descriptors from the internal product hierarchy (merchandise group, category and subcategory) to classify individual products. Once the final list of variables was agreed on, the variables were generated for the entire cohort for use as potential predictors.

The final list of investigator-specified behaviours is presented in table 2. Behaviours 1, 2 and 3 were based on products within a given purchasing category. For example, we classified each product listed in the ‘food’ merchandise group as ‘healthy’ or ‘unhealthy’ based on product descriptions, hypothesising that unhealthy food purchases would be associated with worse adherence. Similarly, we classified all products in the ‘consumer healthcare’ merchandise group as being for preventive or symptomatic purposes, or both. Behaviours 4 and 11 recorded the number of visits in which any product in the ‘consumer healthcare’ merchandise group was purchased. Behaviours 5–9 used fields outside the product hierarchy to describe the amount of money spent, whether products were purchased on sale, and the number of monthly store visits with a purchase. Behaviour 10 combined retail purchasing data with 2010 Census data for the patient's zip code of residence, hypothesising that a higher burden of drug costs relative to neighbourhood income might influence adherence. Behaviour 12 additionally used purchasing and prescription drug data from the 180 days before statin initiation to determine whether prescription fills coincided with a retail purchase, hypothesising that patients who made a purchase on the same day as a fill would be more likely to be adherent.
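Several of these behaviours reduce to simple aggregations over visit records. The sketch below shows how visits per month, dollars per visit and same-day purchase-and-fill (behaviour 12) might be computed. The `Visit` record, the function name and the 365-day window for the first two features are hypothetical assumptions; only the 180-day window for same-day co-filling follows the text.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Visit:
    """Hypothetical record: one store visit on which the ECHC card was used."""
    visit_date: date
    dollars: float

def purchasing_features(visits, fill_dates, index_date,
                        lookback_days=365, cofill_days=180):
    """Compute three illustrative purchasing features before the index date."""
    start = index_date - timedelta(days=lookback_days)
    window = [v for v in visits if start <= v.visit_date < index_date]
    months = lookback_days * 12 / 365  # window length in average calendar months
    n = len(window)
    cofill_start = index_date - timedelta(days=cofill_days)
    visit_days = {v.visit_date for v in visits
                  if cofill_start <= v.visit_date < index_date}
    return {
        "visits_per_month": n / months,
        "dollars_per_visit": sum(v.dollars for v in window) / n if n else 0.0,
        # Behaviour 12: any prescription fill on a day with a retail purchase
        # within the 180-day pre-index window.
        "same_day_purchase_and_fill": any(d in visit_days for d in fill_dates),
    }
```

Because `visit_days` is restricted to the 180-day window, a fill that coincides with a visit earlier in the year does not count towards behaviour 12.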

Table 2

Retail purchasing behaviour association with statin adherence

High-dimensional propensity score approach: To make fuller use of the available data for adherence prediction without a priori hypotheses, we applied the hd-PS variable selection algorithm to the retail purchasing and prescription claims data incurred during the 180 days before statin initiation.21 ,22 When applied to prescription claims, the hd-PS algorithm creates binary variables indicating how frequently each unique medication was filled; when applied to purchasing data, it indicates how frequently products in each subcategory were purchased. The hd-PS algorithm can screen thousands of variables for an empirical association with a study exposure and ranks them on the expectation that a large number of highly ranked variables can collectively proxy for unmeasured confounders. Analogously, because our goal was prediction rather than confounder adjustment, we used the hd-PS approach to identify a large collection of variables that could act as proxies for underlying behavioural constructs predictive of adherence, as has been done previously.28 We ran hd-PS on two sets of data, first using all available prescription claims data and second using all products in CVS Pharmacy's retail product hierarchy, in each case selecting the 200 variables with the strongest associations with adherence.
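The frequency-based binary variables that hd-PS generates, and a ranking of them by association with the outcome, can be sketched as below. This is a simplified approximation, not the published hd-PS implementation or the authors' code. Roughly following the hd-PS convention, each code yields 'once', 'sporadic' (at or above the median count) and 'frequent' (at or above the 75th percentile) indicators, with percentiles computed among patients who have the code; the nearest-rank percentile and the ranking measure (absolute log relative risk of adherence with a 0.5 continuity correction) are assumptions, since the text does not state which association measure was used.

```python
import math
import statistics
from collections import Counter

def hdps_style_indicators(patient_codes):
    """Map each patient to a set of binary 'once'/'sporadic'/'frequent' features.

    patient_codes: dict of patient id -> list of codes, one entry per occurrence
    (e.g. a drug name per fill, or a product subcategory per purchase).
    """
    counts = {pid: Counter(codes) for pid, codes in patient_codes.items()}
    # Collect per-code counts among patients with at least one occurrence.
    per_code = {}
    for c in counts.values():
        for code, n in c.items():
            per_code.setdefault(code, []).append(n)
    thresholds = {}
    for code, ns in per_code.items():
        ns.sort()
        q75 = ns[math.ceil(0.75 * len(ns)) - 1]  # nearest-rank 75th percentile
        thresholds[code] = (statistics.median(ns), q75)
    features = {}
    for pid, c in counts.items():
        feats = set()
        for code, n in c.items():
            median, q75 = thresholds[code]
            feats.add(f"{code}_once")
            if n >= median:
                feats.add(f"{code}_sporadic")
            if n >= q75:
                feats.add(f"{code}_frequent")
        features[pid] = feats
    return features

def rank_by_outcome(features, outcome, top_k):
    """Return the top_k features by |log relative risk| of the outcome."""
    pids = list(outcome)
    candidates = set().union(*(features.get(p, set()) for p in pids))
    scores = {}
    for feat in candidates:
        a = b = c = d = 0  # 2x2 cells: feature present/absent by outcome yes/no
        for p in pids:
            has = feat in features.get(p, set())
            if has and outcome[p]:
                a += 1
            elif has:
                b += 1
            elif outcome[p]:
                c += 1
            else:
                d += 1
        # 0.5 continuity correction keeps the ratio finite for empty cells.
        rr = ((a + 0.5) / (a + b + 1)) / ((c + 0.5) / (c + d + 1))
        scores[feat] = abs(math.log(rr))
    # Sort by score (descending), breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))[:top_k]
```

In the study's setting, `top_k` would be 200 and the codes would be medications in one run and product subcategories in the other.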

Statistical analyses

We first evaluated associations between the 12 investigator-specified purchasing variables and optimal adherence during follow-up using logistic regression. We estimated univariable associations as well as a multivariable model adjusting for all purchasing variables, two prespecified interaction terms and demographic covariates, to assess the magnitude and direction of each association after accounting for other characteristics.
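For a single binary purchasing variable (for example, making a purchase on the same day as a prescription fill), a univariable logistic regression with an intercept is saturated, so its odds ratio equals the cross-product ratio of the 2×2 table and its Wald standard error is the square root of the summed reciprocal cell counts. The sketch below uses that identity to avoid a model-fitting library; the function name is hypothetical, and the multivariable models described above would require an actual regression routine.

```python
import math

def univariable_odds_ratio(predictor, outcome, z=1.96):
    """Odds ratio and Wald 95% CI for a binary predictor vs a binary outcome."""
    pairs = list(zip(predictor, outcome))
    a = sum(1 for x, y in pairs if x and y)          # exposed, adherent
    b = sum(1 for x, y in pairs if x and not y)      # exposed, non-adherent
    c = sum(1 for x, y in pairs if not x and y)      # unexposed, adherent
    d = sum(1 for x, y in pairs if not x and not y)  # unexposed, non-adherent
    if 0 in (a, b, c, d):
        raise ValueError("empty cell: odds ratio not estimable without correction")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)
```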

To compare the discriminative ability of different groups of adherence predictors, we estimated six models; in models with hd-PS predictors, the variables selected by the algorithm were included as covariates. Each model was evaluated on its ability to discriminate between patients who were and were not adherent during the year after initiation, as measured by the C-statistic, which ranges from 0.5 (a completely non-informative model) to 1.0 (perfect prediction).29 To avoid the ‘overoptimism’ bias that arises when prediction accuracy is evaluated in the same data used to estimate the model, we performed 10-fold cross-validation.30 Each model was estimated using both logistic regression and generalised boosting regression, a non-parametric data-mining technique capable of fitting highly predictive models with deep interactions among predictors.23 Logistic regression models were estimated using SAS (SAS, V.9.3, Cary, North Carolina, USA), and all models were re-estimated using the generalised boosting algorithm as implemented in the R package gbm.11 To identify the best-performing model(s), we compared the cross-validated C-statistic across all logistic and boosted regression models. Finally, we re-estimated our six prediction models with the addition of retail purchase predictors from the first 3 months after initiation to assess whether changes in retail patterns during this period provided any additional improvement in adherence prediction.
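The cross-validated C-statistic can be sketched in a few lines. Here the C-statistic is computed as the probability that a randomly chosen adherent patient receives a higher predicted score than a randomly chosen non-adherent patient, with ties counting one-half. The cross-validation helper fits a model on nine folds and predicts the held-out fold. Pooling out-of-fold predictions into a single C-statistic, rather than averaging per-fold values, is an assumption, as is the toy one-predictor 'model' (an adherence rate per predictor level) used in place of logistic or boosted regression; all function names are hypothetical.

```python
import random

def c_statistic(scores, outcomes):
    """Concordance: P(score of an adherent patient > score of a non-adherent one)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one adherent and one non-adherent patient")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5  # ties count one-half
    return concordant / (len(pos) * len(neg))

def cross_validated_c(X, y, fit, predict, k=10, seed=0):
    """Pool out-of-fold predictions from k-fold cross-validation into one C-statistic."""
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    preds = [0.0] * len(y)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = predict(model, X[i])  # model never saw patient i
    return c_statistic(preds, y)

# Toy stand-in for a real model: observed adherence rate per predictor level.
def fit_rate_by_level(X, y):
    totals = {}
    for x, outcome in zip(X, y):
        n, s = totals.get(x, (0, 0))
        totals[x] = (n + 1, s + outcome)
    rates = {x: s / n for x, (n, s) in totals.items()}
    return rates, sum(y) / len(y)

def predict_rate(model, x):
    rates, overall = model
    return rates.get(x, overall)  # unseen level: fall back to the overall rate
```

The pairwise loop is quadratic in cohort size; for a cohort of about 11 000 patients, a rank-based (Mann-Whitney) formula would give the same value much faster.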


Results

Patient characteristics

The cohort consisted of 11 010 patients who initiated a statin during the study period and used their loyalty card on at least three visits before initiation (figure 2). Included and excluded patients were similar in age and gender composition (59.3 vs 57.8 years; 40.3% vs 39.6% female) and in regional distribution, although a higher proportion of included patients were from the Northeast (28% vs 18%). Most patients were male (60.4%), and the average age was 59 years (table 1). Simvastatin (42.1%) and atorvastatin (22.5%) were the most commonly used statins. More than half of the patients (55.4%) were concomitantly taking an antihypertensive, and one-fifth (20.2%) were taking an oral antidiabetic at the time of statin initiation. On average, patients filled prescriptions for more than five unique medications in the year before the index date. At 12 months, 4691 (42.6%) patients were optimally adherent to their statin medication.

Purchasing patterns

Results for the investigator-specified purchasing variables are presented in table 2. The median number of monthly visits was 0.9 (IQR 1.2). During these visits, patients purchased a median of 5 (IQR 4) items and spent a median of $15 (IQR $11). On average, 28% of visits included an unhealthy food product and nearly half (45%) included a consumer healthcare product. The median patient purchased a symptomatic consumer healthcare product on 33% of visits (IQR 31%) and a preventive consumer healthcare product on 18% of visits (IQR 33%).

Nearly two-thirds (63.6%) of patients made a store purchase on the same day as a prescription fill in the 180 days before the index date. A median of 35% (IQR 25%) of the items purchased during the study period were on sale.

Relationship between potential predictors and adherence

In univariable analysis, greater numbers of visits per month and dollar amount per visit were positively associated with adherence, as were making a purchase on the same day as filling a prescription and a higher proportion of visits with preventive and symptomatic healthcare purchases. In contrast, unhealthy purchases were associated with lower adherence, as were higher statin copayments relative to median income in zip code of residence (p<0.0001); the latter association was also observed within strata of zip code median income (<$50 000; $50 000–$100 000; >$100 000). The proportion of visits with a CVS brand product was significantly associated with adherence, whereas the proportion with an item on sale was not (p<0.0001 vs p=0.78). Results from multivariable analyses adjusted for demographic and all purchasing behaviour variables, as well as several multiplicative interaction terms, were similar.

Prediction models

The results of the prediction models are shown in table 3. The two models using purchase variables, one with investigator-specified variables and the other with variables selected by the hd-PS, had similar predictive ability (C-statistic for both: 0.563). The model with investigator-specified clinical variables achieved a C-statistic of 0.599, while the model with both clinical and retail purchase variables had slightly greater discriminative ability (C-statistic: 0.617). Generalised boosting regression improved prediction slightly in five of six models; the highest C-statistic achieved was 0.621, in the model with clinical and retail purchase variables. Results from logistic regression models incorporating retail purchase data from the 3 months after initiation were similar (see online supplementary appendix A).

Table 3

Model prediction


Discussion

One of the central challenges in predicting adherence is the multifaceted nature of adherence itself. Adherence results from complex interactions between patients, providers and the healthcare system, as well as from behavioural characteristics unique to patients themselves.8 ,31–33 It is therefore not surprising that efforts to predict adherence using insurance claims data have generally yielded disappointing results.12 ,28 ,34 Pharmacies also have access to data on retail purchases that patients make at the ‘front of the store’, such as snacks and beverages, beauty products and over-the-counter medicines. We therefore hypothesised that retail purchasing transactions, confidentially linked to prescription claims, could be a rich source of data capturing adherence-related behaviours that are otherwise not observable in routinely collected data. In a cohort of more than 11 000 statin initiators, 43% of whom were optimally adherent at 12 months, we found that several variables generated from these data were associated with long-term medication use. Specifically, after multivariable adjustment for all other predictors, more monthly visits, a greater dollar amount spent per visit, a greater proportion of items that were store-branded, and making a purchase and filling a prescription on the same visit were significantly associated with optimal adherence to statin medications in the year after initiation. Although small in magnitude, these associations suggest that greater retail purchase activity, in the form of store visits and money spent, correlates with adherence, as does the purchase of lower-priced equivalent store brand products; this could be a manifestation of the well-documented ‘healthy user’ effect.

While the use of retail purchasing data may improve the discriminative ability of traditional claims-based approaches, these data alone appear inadequate for adherence prediction, even with more complex analytical approaches such as the hd-PS algorithm and boosting techniques. There are several potential reasons for this apparent discrepancy between statistically significant associations and poor discrimination. Most obviously, retail transaction data may not sufficiently capture the true underlying constructs reflective of medication adherence. Just as adherence behaviours cannot be explained by the health system use and disease state information captured in insurance claims data, retail purchase data that tabulate the frequency, quantity and types of purchases may not encode patterns or individual behaviours that accurately discriminate adherers from non-adherers.28 Additionally, information on personal behaviours such as diet and exercise, structural barriers such as language and health literacy, and the degree of social support, none of which was available in our analysis, may be critical to adherence prediction.

Several methodological features may also have impeded our ability to predict adherence. Because we used purchases made before statin initiation, we did not observe how purchasing behaviours may change following initiation, which in turn might help predict a patient's longitudinal adherence pattern. However, when we repeated our analyses including retail purchases made in the first 3 months after initiation as potential predictors, model discriminative ability was very similar, suggesting that these early postinitiation variables add little information. Additionally, the structure of the data set may have limited our ability to model the strongest predictors of medication adherence: investigator-specified purchasing variables relied only on the brief product description available in the data, while the predictors selected by the hd-PS algorithm had a median prevalence of only 1%. A richer data set with more information on product and pharmacy characteristics, including location, combined with analytical approaches such as time-varying prediction or network analysis, may better capture and characterise the hypothesised behaviours and thus achieve better adherence prediction. Although some investigator-specified purchasing variables may have overlapped in the products they classified (eg, CVS branded products tend to be mostly healthcare-related), this overlap would not have affected our models' discriminative ability.

Nevertheless, several of the observed associations between retail purchasing behaviours and adherence could inform the development of new quality improvement interventions by pharmacies or payers. For example, the associations of adherence with more frequent store visits and with retail purchasing on the same day as a medication fill may be markers of the convenience of the pharmacy, an individual's level of comfort in the store, or an underlying routine that adherent patients develop around pharmacy visits. Identifying patients with irregular store visits, or with store visits that do not coincide with a medication fill, may therefore present an opportunity to link medication taking with other daily activities such as shopping. Similarly, patients taking medications for chronic disease who frequently purchase unhealthy food could be targeted with interventions that address both adherence and dietary modification.

Several limitations of our study should be acknowledged. Members with loyalty cards may not use them at every purchase, and these data do not capture retail purchases patients make at non-CVS Pharmacy locations. Additionally, other family members may use an individual's loyalty card. The former limitations would lead to underestimation of store visit frequency and the latter to overestimation; both would misclassify other variables in either direction, which, from a prediction perspective, would reduce model discrimination. Finally, the behaviours of individuals who have obtained and regularly use a loyalty card may not be generalisable to all individuals using both the retail and pharmacy sections of the store, nor to patients whose Caremark coverage ended during follow-up.

In summary, we found several retail purchasing behaviours to be associated with adherence to statins in the year after initiation, some of which might be useful for the development of new quality improvement interventions by pharmacies or payers. In contrast, retail purchasing data similar to those available to us do not appear to add meaningfully to efforts to discriminate adherers from non-adherers.



  • Contributors All authors had access to study results and contributed meaningfully to the analysis. AAK, NKC, GS-G and JMF contributed to the study conception and design, interpretation of results, and manuscript drafting. GB prepared and analysed the data. DLI, WHS, TAB and OSM provided interpretation of results and critical manuscript revisions.

  • Funding This work was supported by an unrestricted grant from CVS Health to Brigham and Women's Hospital. All data analyses and outcomes assessment were performed independently of the study sponsor. WHS, TAB and OSM are employees of CVS Health and own stock in the company.

  • Competing interests TAB, OSM and WHS are employees of CVS Health and own stock options in the company.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Technical appendix, study protocol and statistical code available from researchers on request.
