Cohort profile
US veterans administration diabetes risk (VADR) national cohort: cohort profile
  1. Sanja Avramovic1,2,
  2. Farrokh Alemi1,
  3. Rania Kanchi3,
  4. Priscilla M Lopez3,
  5. Richard B Hayes3,
  6. Lorna E Thorpe3,
  7. Mark D Schwartz2,3
  1. 1Health Administration and Policy, George Mason University, Fairfax, Virginia, USA
  2. 2VA New York Harbor Healthcare System, New York, New York, USA
  3. 3Department of Population Health, New York University School of Medicine, New York, New York, USA
  1. Correspondence to Sanja Avramovic; savramov{at}


Purpose The veterans administration diabetes risk (VADR) cohort facilitates studies on temporal and geographic patterns of pre-diabetes and diabetes, as well as targeted studies of their predictors. The cohort provides an infrastructure for examination of novel individual and community-level risk factors for diabetes and their consequences among veterans. This cohort also establishes a baseline against which to assess the impact of national or regional strategies to prevent diabetes in veterans.

Participants The VADR cohort includes all 6 082 018 veterans in the USA enrolled in the veterans administration (VA) for primary care who were diabetes-free as of 1 January 2008 and who had at least two diabetes-free visits to a VA primary care service at least 30 days apart within any 5-year period since 1 January 2003, or who subsequently enrolled and were diabetes-free at cohort entry through 31 December 2016. Cohort subjects were followed from the date of cohort entry until censoring, defined as the date of incident diabetes, loss to follow-up (no VA encounters for more than 2 years), death or 31 December 2018.

Findings to date The incidence rate of type 2 diabetes in this cohort of over 6 million veterans followed for a median of 5.5 years (over 35 million person-years (PY)) was 26 per 1000 PY. During the study period, 8.5% of the cohort were lost to follow-up and 17.7% died. Many demographic, comorbidity and other clinical variables were more prevalent among patients with incident diabetes.

Future plans This cohort will be used to study community-level risk factors for diabetes, such as attributes of the food environment and neighbourhood socioeconomic status via geospatial linkage to residence address information.

  • general diabetes
  • health informatics
  • epidemiology

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Strengths and limitations of this study

  • A strength of this national cohort is that it has a large size, a high degree of long-term follow-up, and a comprehensive set of variables.

  • The veterans administration (VA) healthcare system is the nation’s largest integrated healthcare system, in which veterans are followed across all VA facilities and in-system providers.

  • Data are restricted to those collected in electronic health records during the course of clinical practice, leading to the possibility of confounding, selection bias and measurement error.

  • The veteran population is predominantly male and white, so findings may not generalise to minorities or to women.


Diabetes mellitus (diabetes) is a chronic disease that affects 34.2 million adults and children (10.5%) in the USA.1 As of 2018, diabetes was the seventh leading cause of death and one of the major contributors to heart disease and stroke.2 Adjusting for age and gender, all-cause mortality is 1.5 times greater for people with diabetes than for people without diabetes, and average healthcare costs are 2.3 times higher.3 Another 88 million American adults (34.5%) are estimated to have pre-diabetes and to be at risk of developing diabetes.4

The veterans administration (VA) cares for more than 8 million US veterans, of whom approximately 25% have diabetes.5 6 The annual mortality rate among veterans with diabetes is 5%—nearly double that of veterans without diabetes.7 8 It is likely that nearly 3 million other veterans have pre-diabetes. These high rates compared with the general population may be due to the increased proportions of overweight (37%) and obesity (41%) among veterans,9 their older age, lower socioeconomic status10 and possible exposure to herbicides such as Agent Orange.11

Behavioural prevention interventions can reduce the incidence of diabetes by 50%–70%,12 13 but scaling this up for population impact has been challenging due to the intensity and cost of the intervention and challenges of enrolling patients for such programmes.14–17

In response to these challenges, we developed the veterans administration diabetes risk (VADR) cohort, a national cohort of all US veterans enrolled at the Veterans Health Administration since 1 January 2008 who were diabetes-free at enrolment. The cohort was developed as part of the Diabetes Location, Environmental Attributes and Disparities network, a Centers for Disease Control and Prevention (CDC) funded research collaboration among Drexel University, Geisinger-Johns Hopkins, New York University School of Medicine and University of Alabama at Birmingham, with the CDC as a collaborative scientific partner in the network.18

The VADR cohort facilitates studies on temporal and geographic patterns of pre-diabetes and diabetes, as well as targeted studies of their predictors. For example, the cohort currently provides the infrastructure for the nationwide study examining community-level risk factors for diabetes incidence and management among veterans described above. This cohort also establishes a baseline against which to assess the impact of national or regional strategies to prevent diabetes in veterans. It also provides an analytic cohort to examine the dynamic relationship between the COVID-19 pandemic and diabetes outcomes.

Cohort description

VADR is the largest national cohort of diabetes-free adults in the USA. Established in 2017 as a dynamic cohort enabled by the VA national electronic health record (EHR), the cohort includes diabetes-free US veterans enrolled in primary care clinics at any VA facility as early as 1 January 2008 through 31 December 2016, and followed from cohort entry through 31 December 2018. VA primary care clinics operate in 170 VA Medical Centers and in more than 1000 Community-Based Outpatient Clinics across the USA.19 As a dynamic cohort, subject follow-up is ongoing, but this paper reports on the cohort from 1 January 2008 through 31 December 2018.

Building on published, validated criteria in EHRs,7 20 we defined diabetes using a query-based definition met by any of three criteria: (1) at least two encounters (inpatient or outpatient) with documentation of a type 2 diabetes ICD-9/10 code (ICD-9: 250.x; ICD-10: E11.x); (2) a documented prescription for a diabetes medication other than metformin or acarbose alone; or (3) at least one encounter with a diabetes ICD-9/10 code and two elevated (≥6.5%) glycosylated haemoglobin (Hgb A1C) laboratory test results (see online supplemental appendix 1 for complete definition).21 We excluded metformin or acarbose alone from the criteria because these drugs may be used for diabetes prevention in patients with pre-diabetes; including them may lead to misclassifying cases of pre-diabetes as diabetes.22 23 This definition for incident diabetes was used to exclude prevalent diabetes cases prior to cohort entry and to estimate diabetes incidence during the study period.
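The three-part definition above can be sketched as a small function. This is only an illustration under stated assumptions, not the authors' query (the complete definition is in online supplemental appendix 1): the `PatientRecord` layout, its field names and the function name are hypothetical, and a patient prescribed both metformin and acarbose but nothing else is assumed not to meet criterion 2.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, simplified patient summary; the field names are illustrative
# and do not correspond to actual VA CDW columns.
@dataclass
class PatientRecord:
    t2d_code_encounters: int = 0  # encounters coded ICD-9 250.x or ICD-10 E11.x
    diabetes_drugs: List[str] = field(default_factory=list)  # prescribed diabetes drugs
    a1c_results: List[float] = field(default_factory=list)  # Hgb A1C values, in percent


# Drugs that do not count on their own because they may be used for prevention.
PREVENTION_DRUGS = {"metformin", "acarbose"}
A1C_THRESHOLD = 6.5  # percent; "elevated" means >= 6.5


def meets_diabetes_definition(p: PatientRecord) -> bool:
    # Criterion 1: at least two encounters with a type 2 diabetes code.
    two_coded_encounters = p.t2d_code_encounters >= 2
    # Criterion 2: any diabetes medication other than metformin or acarbose.
    other_drug = any(d.lower() not in PREVENTION_DRUGS for d in p.diabetes_drugs)
    # Criterion 3: at least one coded encounter plus two elevated A1C results.
    elevated = sum(1 for v in p.a1c_results if v >= A1C_THRESHOLD)
    code_plus_labs = p.t2d_code_encounters >= 1 and elevated >= 2
    return two_coded_encounters or other_drug or code_plus_labs
```

Because the same function is applied before and after cohort entry, it serves both to exclude prevalent cases and to flag incident ones.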

For the analytic cohort, subjects were eligible if they were veterans with at least two diabetes-free visits to a VA primary care service, occurring at least 30 days apart, from 1 January 2003 to 31 December 2016. Cohort entry (baseline) was defined as either 1 January 2008 or the date of the second diabetes-free primary care visit for subjects entering after 1 January 2008. Eligible subjects were allowed to enter the cohort through 31 December 2016 to allow at least 2 years of follow-up during which subjects may be diagnosed with diabetes. Subjects were censored when they developed diabetes, died or were lost to follow-up (defined as having no encounters in the VA health system for more than 2 years). Once a patient was lost-to-follow-up, they were not eligible to re-enter the cohort. Encounters for follow-up included any visits to primary care, specialists, emergency departments, walk-in clinics, hospitalisations or nursing home stays at any VA facility. Person-years (PY) of follow-up for each subject were calculated as the interval between cohort entry date and censor date.
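The entry, censoring and person-year rules in this paragraph can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than statements from the paper: "more than 2 years" is approximated as 730 days, and a subject lost to follow-up is censored at the last encounter before the gap.

```python
from datetime import date
from typing import Iterable, Optional

STUDY_START = date(2008, 1, 1)
STUDY_END = date(2018, 12, 31)
LFU_GAP_DAYS = 2 * 365  # assumption: "more than 2 years" approximated as 730 days


def cohort_entry(second_visit: date) -> date:
    """Baseline is 1 January 2008 or the second diabetes-free visit, if later."""
    return max(STUDY_START, second_visit)


def lost_to_follow_up(entry: date, encounters: Iterable[date]) -> Optional[date]:
    """Return the last encounter before the first gap of more than 2 years, else None."""
    previous = entry
    for visit in sorted(d for d in encounters if entry < d <= STUDY_END):
        if (visit - previous).days > LFU_GAP_DAYS:
            return previous  # no re-entry after a qualifying gap
        previous = visit
    if (STUDY_END - previous).days > LFU_GAP_DAYS:
        return previous
    return None


def censor_date(entry: date, encounters: Iterable[date],
                diabetes: Optional[date] = None,
                death: Optional[date] = None) -> date:
    """Earliest of incident diabetes, death, loss to follow-up and study end."""
    candidates = [STUDY_END, lost_to_follow_up(entry, encounters), diabetes, death]
    return min(d for d in candidates if d is not None)


def person_years(entry: date, censored: date) -> float:
    """Interval between cohort entry and censor date, in years."""
    return (censored - entry).days / 365.25
```

A subject whose second diabetes-free visit was in 2005 enters on 1 January 2008; if the last encounter is in mid-2010, this sketch censors the subject at that encounter rather than at the end of the study.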

As shown in figure 1, the cohort was developed from a base population of 8 346 180 patients seen for at least one primary care visit between 1999, the earliest year for which EHR data were available on patients, and the start of the study period. The cohort was then restricted to patients seen in the 5 years prior to the study period start date, 1 January 2008. Patients were excluded if they had fewer than two primary care visits at least 30 days apart during that 5-year period and fewer than two primary care visits after cohort entry. After excluding patients with prevalent diabetes, the initial diabetes-free cohort included 2 968 763 patients. Another 3 113 255 diabetes-free patients met the same eligibility criteria after the start of the study period and entered the cohort between 1 January 2008 and 31 December 2016, resulting in a diabetes-free cohort of 6 082 018 patients.

Figure 1

Cohort flow diagram of diabetes-free cohort of US veterans, 2008–2016. EHR, electronic health record; LFU, lost to follow-up; VA, veterans administration.

Information on subjects in the cohort was updated daily as it was drawn from EHR at all VA facilities into the VA corporate data warehouse (CDW), based on all clinical services provided and documented by the VA to subjects over time. All data in the cohort were obtained through the VA Informatics and Computing Infrastructure (VINCI), a secure, high-performance interface with VA’s national CDW, available through VA’s Information Resource Center.24 The CDW contains data integrated from VA’s electronic medical record (Veterans Health Information Systems and Technology Architecture), including all administrative data (eg, all dates of encounters and diagnostic codes for outpatient and inpatient care), patient demographic characteristics, clinical data (eg, vital signs, health factors, pharmacy, laboratory, radiological, clinical notes) and healthcare utilisation factors as they accrue over time, as the CDW is refreshed daily.25

The main outcome variable was a new diagnosis of type 2 diabetes, measured using the definition described earlier.

Predictor variables and covariates

All continuous variables with repeated measures, including anthropometric measures, vital signs and laboratory values, were defined as the average of the two most recent measures prior to or at the time of cohort entry. If only one measure was taken prior to cohort entry, that was used as the baseline measure. The rate of missing data for all variables was measured.
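The baseline rule for repeated measures can be illustrated with a short helper. This is a sketch only: the function name and the `(date, value)` tuple layout are assumptions, not the structure of the underlying VA data.

```python
from datetime import date
from typing import List, Optional, Tuple


def baseline_value(measures: List[Tuple[date, float]], entry: date) -> Optional[float]:
    """Average of the two most recent measures taken on or before cohort entry.

    Falls back to the single available measure, and returns None when nothing
    was recorded before entry (a missing baseline value).
    """
    prior = sorted((m for m in measures if m[0] <= entry), key=lambda m: m[0])
    if not prior:
        return None
    recent = [value for _, value in prior[-2:]]
    return sum(recent) / len(recent)
```

Measures recorded after cohort entry are ignored, so later follow-up values cannot leak into the baseline.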

Demographic measures were captured at baseline, including age, gender, marital status and race/ethnicity. First address on file per patient in cohort was exported out of the VINCI environment, geocoded using ArcGIS26 and Python,27 and mapped to show number of patients in the cohort per census tract using QGIS.28

Glycaemia and body weight are important predictors of diabetes. We measured Hgb A1C as a continuous value and classified it as normal (<5.7%), pre-diabetes (5.7% to 6.4%) or diabetes (≥6.5%). We measured weight in pounds and body mass index (BMI), defined as (weight in kilograms)/(height in metres)². BMI was also classified as underweight (<18.5), normal (18.5 to <25), overweight (25.0 to <30) and obese (≥30.0).29
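The glycaemia and BMI categories can be expressed as two small classifiers. This is a sketch under stated assumptions: the function names are hypothetical, the upper categories are taken as inclusive (Hgb A1C of 6.5% or more, BMI of 30 or more), values between 6.4% and 6.5% are treated as pre-diabetes, and height is assumed to be available in metres.

```python
LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound in kilograms


def classify_a1c(a1c_percent: float) -> str:
    """Classify an Hgb A1C value given in percent."""
    if a1c_percent < 5.7:
        return "normal"
    if a1c_percent < 6.5:
        return "pre-diabetes"
    return "diabetes"


def bmi_from_pounds(weight_lb: float, height_m: float) -> float:
    """BMI = weight in kilograms divided by height in metres squared."""
    return (weight_lb * LB_TO_KG) / height_m ** 2


def classify_bmi(bmi: float) -> str:
    """Classify BMI into the four categories used in the cohort."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"
```

Half-open intervals keep every value in exactly one category, which avoids leaving boundary values such as a BMI of exactly 30.0 unclassified.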

Common comorbidities measured at baseline included established risk factors for diabetes such as obesity, hypertension, gestational diabetes, cardiovascular disease, chronic kidney disease, hyperuricaemia, fatty liver disease, polycystic ovary syndrome and hepatitis C. These and all other comorbidities were defined as having at least one ICD code in the EHR prior to entering the cohort. Hyperlipidaemia was defined as at least two encounters with ICD codes for hyperlipidaemia, total cholesterol >240 mg/dL or the use of lipid-lowering medications.30 Hypertension was defined as at least one ICD code for hypertension or at least two consecutive elevated blood pressure (BP) measurements within the 2 years prior to cohort entry.31 32 Elevated BP was defined using thresholds of both ≥130/80 and ≥140/90 mm Hg to comply with changes in hypertension guidelines over the course of the study period.33 34

Other clinical variables potentially related to diabetes incidence included BP (excluding measurements taken in the hospital, emergency department or at night); lipids (total cholesterol, high density lipoprotein, low density lipoprotein and triglycerides); hepatic transaminase enzymes (serum aspartate aminotransferase [AST or SGOT] and alanine aminotransferase [ALT or SGPT]); renal function (measured as estimated glomerular filtration rate, eGFR); smoking status (obtained from health factor files within CDW at cohort entry, classified as current, ever or never smokers) and Agent Orange exposure (obtained from the number of veterans with Agent Orange listed as a health factor in the medical record).11 Besides this select list, all documented diagnoses and treatments are available for the cohort.

Access to cohort data

Access to VA EHR data is limited to researchers with active VA appointments and an IRB-approved protocol. Once a researcher has a VA appointment and IRB approval, the VA has a comprehensive data infrastructure to support secure and remote access to data via the VINCI platform. Additionally, deidentified data sets can be established and shared with appropriate IRB approval and data use agreements. The authors encourage collaborations to leverage this cohort to examine how national or regional natural experiments may be related to diabetes incidence or diabetes outcomes.

Findings to date

The total PY for this national cohort with 6 082 018 veterans from all 50 states was 35 889 183 (median 5.5 PY, IQR: 2.6–9.8). As shown in table 1, the mean age of the cohort was 58 years at baseline, 36.4% were 65 or older, most were male (91.7%), more than two-thirds were non-Hispanic white (74.8%), 16.3% were non-Hispanic black and 6.1% were Hispanic. The majority (55.2%) were married or living with a partner.

Table 1

Cohort demographics and clinical characteristics at cohort entry by incident diabetes status

At baseline, the average Hgb A1C was 5.7% among the 40.7% of the cohort tested at entry, and of these, 41.5% had an Hgb A1C in the pre-diabetes range. The average weight was 196.9 pounds and average BMI was 28.8 (SD 5.4). At baseline, 40.6% were overweight and 36.1% were obese. Traditional clinical risk factors for diabetes were common in this cohort as 46.1% had hypertension, 44.1% had hyperlipidaemia and 42.6% were smokers. Other clinical risk factors for diabetes included ischaemic heart disease (16.4%), peripheral vascular disease (4.2%), heart failure (3.0%) and chronic kidney disease (2.5%). Most of these risk factors were present at baseline at higher rates among those who developed diabetes compared with those who did not during cohort follow-up.

Figure 2 shows the number of subjects in the cohort over time, from inception on 1 January 2008 through 31 December 2018. Almost half (48.8%) of the cohort entered at cohort inception on 1 January 2008, with the remainder entering during the study period through 31 December 2016. During cohort follow-up, 936 596 (15.4%) veterans developed diabetes, for an incidence rate of 26 per 1000 PY. Additionally, 518 489 (8.5%) were lost to follow-up, and 1 077 572 (17.7%) died during the study period.
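The headline figures follow directly from the reported counts and total person-years. The short calculation below reproduces them; the counts are taken from the text, while the variable and function names are illustrative.

```python
# Counts reported for the cohort.
cohort_size = 6_082_018
total_person_years = 35_889_183
incident_diabetes = 936_596
lost_to_follow_up = 518_489
deaths = 1_077_572


def rate_per_1000_py(events: int, person_years: int) -> float:
    """Incidence rate expressed per 1000 person-years."""
    return 1000 * events / person_years


def percent(part: int, whole: int) -> float:
    """Share of the cohort, in percent."""
    return 100 * part / whole


incidence = rate_per_1000_py(incident_diabetes, total_person_years)
print(f"Incidence rate: {incidence:.1f} per 1000 PY")
print(f"Developed diabetes: {percent(incident_diabetes, cohort_size):.1f}%")
print(f"Lost to follow-up: {percent(lost_to_follow_up, cohort_size):.1f}%")
print(f"Died: {percent(deaths, cohort_size):.1f}%")
```

Note that the incidence rate divides by person-years of follow-up, whereas the percentages divide by the number of subjects, so the two are not interchangeable.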

Figure 3 shows the geographic distribution of the number of patients per census tract. Most addresses (89%) could be geocoded; of those not geocoded, about half were PO boxes and the other half were missing. The majority of census tracts had between 20 and 80 patients.

Figure 2

Cohort trends, with cumulative numbers and percentage of patients, 2008 through 2018. PY, person-years.

Figure 3

Geographic distribution of VADR cohort. VADR, veterans administration diabetes risk.

Because cohort data were drawn from the VA EHR, which depends on documentation of services provided, some subjects had missing values for some variables at baseline. For example, the percentages of subjects with missing or unmeasured values at cohort entry were: gender (<0.01%); race/ethnicity (10.1%); marital status (7.5%); BMI (4.3%) and Hgb A1C (59.3%). Missing race/ethnicity data in the VA are a widely known problem.35 Screening for diabetes with Hgb A1C became more common after the recommendation was published in 2009.36

Strengths and limitations

A primary strength of this national cohort is its large size and long-term follow-up. The cohort includes a comprehensive set of demographic, anthropometric, clinical, treatment and other administrative variables, drawn from all inpatient and outpatient encounters, each of which is automatically updated over time. In addition to the select comorbidities identified in this paper, the cohort includes data related to all comorbidities. Future work will include calculation of a multimorbidity index to measure the impact of medical history on emergence of diabetes.37

As the nation’s largest integrated healthcare system, the VA follows veterans across all VA facilities, even after they move or change VA facilities or providers within the system. Additionally, data on veterans who are Medicare or Medicaid beneficiaries and seek healthcare outside of the VA will be included by merging the study cohort with data from the Centers for Medicare and Medicaid Services. Finally, home addresses are available and were geocoded in order to study the effect of community-level characteristics and the impact of moving over time on incident diabetes in future work using this cohort.

The cohort has a few limitations. It relies on data documented during the course of clinical practice in EHRs and thus causal inferences face difficulties associated with unmeasured confounding, selection bias and measurement error. Selection biases may arise as lower healthcare utilisers are more likely to be lost to follow-up or excluded, and higher utilisers may be more likely to meet criteria for key exposure and outcome variables. This is partially mitigated by the several-year, longitudinal follow-up.

The veteran population is predominantly male and white, and so the findings may not generalise to minorities or to women. Nonetheless, our large cohort ensures a sufficient and growing sample of women veterans (504 002) and patients from major ethnic/racial groups (889 465 NH black veterans, 331 817 Hispanic veterans), providing the ability to study diabetes incidence among these subgroups and improving the generalisability of our findings to non-veteran populations.


The VADR cohort is an important example of how large retrospective cohorts can be developed using EHRs, designed with methodologic and statistical approaches to increase generalisability and validity. Such large cohorts offer more information and greater ability to examine associations within substrata than smaller cohorts. Follow-up is ongoing and presented here through 31 December 2018. While the main outcome of interest was incidence of type 2 diabetes in this cohort, the infrastructure is well suited to support studies of diabetes management and management of other chronic conditions using incident cases of diabetes, particularly as retention has been shown to be good. During the study period, only 8.5% of the cohort were lost to follow-up and 17.7% died. Additional methodologic work is needed to address biases unique to EHR-based observational studies, including cohort selection bias and non-ignorable missing data.

Patient and public involvement

This cohort study was conducted without engagement or coproduction by patients or the public.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Twitter @DrAlemi

  • Contributors SA has full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Concept and design: MS, LET, RH, SA and FA. Acquisition, analysis or interpretation of data: SA, PL, RK and MS. Drafting of the manuscript: SA, MS, RK, PL and LET. Critical revision of the manuscript for important intellectual content: all authors, SA, FA, RK, PL, RH, LET and MS. Statistical analysis: RK. All authors, SA, FA, RK, PL, RH, LET and MS, attest that they meet the full authorship criteria.

  • Funding This study was funded by the Centers for Disease Control and Prevention (5 U01DP006299-02-00; PI: LET).

  • Map disclaimer The depiction of boundaries on this map does not imply the expression of any opinion whatsoever on the part of BMJ (or any member of its group) concerning the legal status of any country, territory, jurisdiction or area or of its authorities. This map is provided without any warranty of any kind, either express or implied.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data may be obtained from a third party and are not publicly available. Access to VA electronic health records data is limited to researchers with active VA appointments and an IRB-approved protocol. Once a researcher has a VA appointment and IRB approval, the VA has a comprehensive data infrastructure to support secure and remote access to data via the VINCI platform. Additionally, deidentified data sets can be established and shared with appropriate IRB approval and data use agreements. The authors encourage collaborations to leverage this cohort to examine how national or regional natural experiments may be related to diabetes incidence or diabetes outcomes.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.