Abstract
Objective To develop and validate rheumatoid arthritis (RA) risk models based on family history, epidemiologic factors and known genetic risk factors.
Methods We developed and validated models for RA based on known RA risk factors, among women in two cohorts: the Nurses’ Health Study (NHS, 381 RA cases and 410 controls) and the Epidemiological Investigation of RA (EIRA, 1244 RA cases and 971 controls). Model discrimination was evaluated using the area under the receiver operating characteristic curve (AUC) in logistic regression models for the study population and for those with positive family history. The joint effect of family history with genetics, smoking and body mass index (BMI) was evaluated using logistic regression models to estimate ORs for RA.
Results The complete model including family history, epidemiologic risk factors and genetics demonstrated AUCs of 0.74 for seropositive RA in NHS and 0.77 for anti-citrullinated protein antibody (ACPA)-positive RA in EIRA. Among women with positive family history, discrimination was excellent for complete models for seropositive RA in NHS (AUC 0.82) and ACPA-positive RA in EIRA (AUC 0.83). Positive family history, high genetic susceptibility, smoking and increased BMI had an OR of 21.73 for ACPA-positive RA.
Conclusions We developed models for seropositive and seronegative RA phenotypes based on family history, epidemiological and genetic factors. Among those with positive family history, models using epidemiologic and genetic factors were highly discriminatory for seropositive and seronegative RA. Assessing epidemiological and genetic factors among those with positive family history may identify individuals suitable for RA prevention strategies.
- Rheumatoid Arthritis
- Epidemiology
- Gene Polymorphism
Introduction
Rheumatoid arthritis (RA) develops in individuals at increased genetic risk after certain environmental exposures.1 ,2 Epidemiologic factors associated with RA include cigarette smoking, alcohol intake, excess body weight, low socioeconomic status and female reproductive factors.3–17 Genome-wide association studies and meta-analyses have identified RA-associated alleles and an interaction between HLA-DRB1 and smoking.18–30 Individuals with a family history (FH) of autoimmunity are at particularly increased RA risk, probably owing to shared environment and genetics.31–33
RA prevention remains an elusive goal given its relatively low prevalence and unclear transitions between preclinical phases and clinical disease.34 ,35 Preclinical RA prevention efforts targeted at people at increased risk may overcome these challenges. The identification of high-risk individuals using RA risk models is therefore an important goal.36 RA models incorporating genetic and epidemiologic factors have been developed.37 However, these models did not incorporate FH, a potent RA risk factor.31 ,32 ,34 Previous risk models have evaluated only autoantibody-positive RA or have used a limited set of epidemiologic factors.37–40 Studies of RA clinical prediction rules limited to patients with symptomatic, undifferentiated arthritis have used clinical, epidemiologic, genetic and autoantibody factors, but this population is closer to the development of RA than are preclinical, asymptomatic cohorts.35 ,41–44
Our goal was to develop and validate risk models incorporating FH, genetic and epidemiological factors, for RA and its serological subtypes among asymptomatic individuals. We aimed to evaluate model performance among those with and without a FH. We quantified the joint effects of FH with high-risk genetics and epidemiological factors and hypothesised that, among those with a positive FH, models would be highly discriminatory for RA.
Materials and methods
Study design and populations
We developed models in a nested case–control study in the Nurses’ Health Study (NHS). NHS is a prospective cohort of 121 700 female nurses in the USA aged 30–55 years at baseline in 1976. Of these, 32 826 (27%) provided blood and another 33 040 (27%) provided buccal samples. Women who self-reported RA were screened for RA symptoms; chart review confirmed RA according to the 1987 American College of Rheumatology (ACR) classification criteria.45 ,46 Seropositivity was defined as positive rheumatoid factor or anti-citrullinated peptide antibody (ACPA) by chart review after RA diagnosis, or by assay among a subset of cases with plasma collected before onset.47 Genotyped cases and healthy controls were matched 1:1 at the index date of diagnosis by age, menopausal status and postmenopausal hormone use. Women of non-white race or missing FH were excluded.
We validated our models in the Epidemiological Investigation of RA (EIRA), a Swedish population-based case–control study that enrolled patients with RA aged 18–70 years at diagnosis between May 1996 and December 2009. RA was diagnosed by a rheumatologist and met the 1987 ACR classification criteria.46 ACPA assays were performed on all cases at enrolment. Cases were matched with controls for age, sex and region at index date of diagnosis.26 A subset was randomly selected for genotyping. Participants with kinship, of non-white race, or missing FH were excluded.
FH assessment
In NHS, women completed a single question on FH of RA or systemic lupus erythematosus (SLE) in first-degree relatives in 2008. We dichotomised responses as any or no FH of RA or SLE in first-degree relatives.
In EIRA, FH of RA in first-degree relatives among RA cases and controls was determined through the Swedish Patient and Multi-Generation registers, described in detail elsewhere.1 We dichotomised data as any or no FH of RA in first-degree relatives.
Epidemiologic factors
Selection of epidemiologic factors
In our RA risk models, we included epidemiologic factors that were significantly associated with RA in previous studies and our datasets.3–5 ,7 ,9–13 ,48 Our group previously developed RA models using epidemiologic factors, genetics and gene–environment interactions (GEIs).37 The primary model in those analyses consisted of risk factors (cigarette smoking, alcohol, education and parity) easily ascertained, and significantly contributing to the overall model. Based on recent literature, we included body mass index (BMI) for these analyses.6 ,9 ,48 ,49
Covariates
Age was included as a continuous variable. Categorical variables were defined as follows: smoking as never, <10, 10 to <20, or ≥20 pack-years; cumulative average alcohol intake as none, 1 to <5, 5 to <10, 10 to <20 or ≥20 g/day; education as high school or some college versus college graduate or more (husband's education in NHS); parity as nulliparous or parous. BMI was dichotomised at 25 kg/m2 (underweight or normal versus overweight or obese, according to WHO).50 For NHS, data were updated through biennial questionnaires until the index date. For EIRA, data were collected at the index date and pertained to exposures before RA onset.
Genetic risk scores and gene–environment interaction
RA risk alleles were combined to form genetic risk scores (GRS), weighted by the natural logarithm of published ORs for RA in genome-wide association studies or meta-analyses (see online supplementary table S1).40 We included 39 independent RA risk alleles (8 HLA-DRB1 and 31 non-HLA alleles) validated at the time of genotyping that were available in both datasets for our complete GRS.
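As an illustration of the weighting described above, a weighted GRS can be sketched as the sum of risk-allele counts, each multiplied by the natural logarithm of its published OR. The allele names, ORs and counts below are hypothetical placeholders, not values from supplementary table S1, and the function name is ours rather than anything used in the study.

```python
import math

def weighted_grs(allele_counts, published_ors):
    """Weighted genetic risk score: sum of (risk-allele count x ln(published OR))."""
    if allele_counts.keys() != published_ors.keys():
        raise ValueError("allele counts and ORs must cover the same alleles")
    return sum(count * math.log(published_ors[name])
               for name, count in allele_counts.items())

# Hypothetical alleles and ORs, for illustration only.
example_ors = {"allele_A": 2.0, "allele_B": 1.5, "allele_C": 1.1}
# Number of risk alleles carried (0, 1 or 2) by one hypothetical participant.
example_counts = {"allele_A": 1, "allele_B": 2, "allele_C": 0}

score = weighted_grs(example_counts, example_ors)
```

Under this sketch, GRS-HLA and GRS-non-HLA would simply be the same sum restricted to the 8 HLA-DRB1 alleles and the 31 non-HLA alleles, respectively.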
Since HLA-DRB1 and smoking were shown to interact in previous studies, we used two GRS: one for 8 HLA-DRB1 alleles (GRS-HLA) and another for 31 non-HLA alleles (GRS-non-HLA), when HLA×smoking was considered.23 ,26 Genotyping and quality control procedures for NHS and EIRA have been previously described in detail.20 ,23 ,27 ,28
Statistical analysis
RA risk models for women
Risk models for women were developed in NHS and validated in EIRA. We estimated the area under the receiver operating characteristic curve (AUC) and 95% CIs for each set of model components using logistic regression, and interpreted discrimination using Hosmer and Lemeshow's rules51 (AUC ≥0.7 acceptable; ≥0.8 excellent). We performed separate analyses for seropositive and seronegative RA in NHS and for ACPA-positive and ACPA-negative RA in EIRA. We used the following model components: FH, epidemiologic factors (E), genetics (G), FH+E, FH+E+G and FH+E+G+GEI (the complete model). Models were compared using the integrated discrimination index (IDI), a measure of overall improvement in sensitivity and ‘1−specificity’ between models in the same case–control dataset.49 The IDI is not comparable across populations owing to different event rates.52 IDI can be more sensitive to addition of new variables than AUC and is more stable as a function of the baseline model than AUC.52 Reclassification of cases and controls between models within datasets was assessed using the continuous net reclassification improvement (cNRI), a positive value indicating correct reclassification of cases as higher risk and controls as lower risk by the new model compared with the original model.53
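The three performance measures named above can be sketched from predicted probabilities alone. The text does not give formulas, so the code below assumes the commonly used definitions: AUC as the probability that a randomly chosen case has a higher predicted risk than a randomly chosen control, IDI as the mean gain in predicted risk among cases minus the mean gain among controls, and cNRI as the net proportion of cases moving up plus the net proportion of controls moving down. All predicted risks shown are made-up numbers, not study data.

```python
def auc(case_scores, control_scores):
    """P(random case scores higher than random control); ties count as 0.5."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def mean(values):
    return sum(values) / len(values)

def idi(old_cases, new_cases, old_controls, new_controls):
    """Mean gain in predicted risk among cases minus mean gain among controls."""
    gain_cases = mean(new_cases) - mean(old_cases)
    gain_controls = mean(new_controls) - mean(old_controls)
    return gain_cases - gain_controls

def continuous_nri(old_cases, new_cases, old_controls, new_controls):
    """Net upward movement among cases minus net upward movement among controls."""
    def net_up(old, new):
        up = sum(n > o for o, n in zip(old, new))
        down = sum(n < o for o, n in zip(old, new))
        return (up - down) / len(old)
    return net_up(old_cases, new_cases) - net_up(old_controls, new_controls)

# Hypothetical predicted risks from an original and a new model, same participants.
old_cases = [0.5, 0.6, 0.4, 0.7]
new_cases = [0.7, 0.8, 0.3, 0.9]
old_controls = [0.5, 0.3, 0.4, 0.2]
new_controls = [0.3, 0.2, 0.5, 0.1]

auc_old = auc(old_cases, old_controls)
auc_new = auc(new_cases, new_controls)
idi_value = idi(old_cases, new_cases, old_controls, new_controls)
cnri_value = continuous_nri(old_cases, new_cases, old_controls, new_controls)
```

The pairwise AUC loop is quadratic in sample size, which is acceptable for a sketch but would normally be replaced by a rank-based calculation in a statistics package. The confidence intervals and significance tests reported in the tables are not reproduced here.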
RA risk models for women stratified by FH
We stratified analyses based on any or no FH and used logistic regression models to estimate AUC and 95% CI. After stratification by FH, we used the following model components: E, G, E+G and E+G+GEI (the complete model).
RA risk models for men
We performed analyses for men in EIRA using the same methods. For men in EIRA, E models did not include parity, but were otherwise identical to models for women.
Joint effect of FH with genetics, smoking and BMI
We focused on two modifiable risk factors for RA, smoking and BMI, to assess the risk of RA among subgroups stratified by FH and GRS. We dichotomised GRS based on the 75th centile of the GRS distribution of controls in each study. Smoking was dichotomised as never/≤10 or >10 pack-years and BMI as <25 or ≥25 kg/m2. The joint effect of FH with genetics, smoking and BMI was examined using logistic regression models to estimate ORs and 95% CI for each RA phenotype, with the reference of no FH and low genetic risk, never/low smoking, or normal/underweight BMI. Logistic regression models were used to estimate ORs and 95% CIs for RA phenotypes from multiple risk factors (positive FH, smoking >10 pack-years, BMI ≥25 kg/m2 and high GRS). All models were adjusted for alcohol intake, education, parity and matching factors (age, menopausal status and postmenopausal hormone usage for NHS; age and region for EIRA).
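The dichotomisation and joint grouping described above can be sketched as follows. Several choices here are assumptions, because the text does not specify them: the quantile definition used for the 75th centile, whether a GRS exactly at the cutoff counts as high, and the group labels. The odds ratio function is a crude 2x2 calculation with a Woolf (log-based) 95% CI, shown only to illustrate how a joint-exposure group is contrasted with the reference group; it is not the adjusted logistic regression used in the study.

```python
import math
import statistics

def control_cutoff(control_grs):
    """75th centile of the GRS among controls (inclusive quartile method, an assumption)."""
    return statistics.quantiles(control_grs, n=4, method="inclusive")[2]

def joint_group(fh_positive, grs, cutoff):
    """Label one of four FH x GRS groups; 'no FH / low GRS' is the reference."""
    fh = "FH" if fh_positive else "no FH"
    level = "high GRS" if grs > cutoff else "low GRS"  # strict '>' is an assumption
    return f"{fh} / {level}"

def crude_odds_ratio(exposed_cases, exposed_controls,
                     reference_cases, reference_controls, z=1.96):
    """Unadjusted OR versus the reference group, with a Woolf 95% CI."""
    odds_ratio = (exposed_cases * reference_controls) / (exposed_controls * reference_cases)
    se_log_or = math.sqrt(1 / exposed_cases + 1 / exposed_controls
                          + 1 / reference_cases + 1 / reference_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical control GRS values and counts, for illustration only.
cutoff = control_cutoff([1.0, 2.0, 3.0, 4.0, 5.0])
group = joint_group(True, 4.5, cutoff)
odds_ratio, lower, upper = crude_odds_ratio(20, 10, 30, 60)
```

Taking the cutoff from controls only, as the text describes, keeps the definition of high genetic risk independent of case status within each study.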
Results
Population characteristics
Among women in NHS, there were 221 patients with seropositive RA, 160 patients with seronegative RA and 410 controls. Among women in EIRA, there were 733 patients with ACPA-positive RA, 511 patients with ACPA-negative RA and 971 controls. Among men in EIRA, there were 295 patients with ACPA-positive RA, 213 patients with ACPA-negative RA and 390 controls. Characteristics of the women in NHS and EIRA at the index date are shown in table 1. Positive FH was more common in NHS (34% of cases; 10% of controls) than EIRA (10% of cases; 4% of controls), probably owing to study differences in FH ascertainment.
Model validation and performance among women
AUCs for RA risk models for women in NHS and EIRA are shown in table 2. FH models had AUCs of 0.64 (95% CI 0.60 to 0.69)/0.66 (95% CI 0.61 to 0.71) in NHS for seropositive/seronegative RA and 0.58 (95% CI 0.55 to 0.60)/0.53 (95% CI 0.50 to 0.57) for ACPA-positive/ACPA-negative RA in EIRA. E models had higher AUCs for autoantibody-positive RA: 0.64 (95% CI 0.60 to 0.69) in NHS and 0.69 (95% CI 0.67 to 0.72) in EIRA, than for autoantibody-negative RA. G models had modest discrimination for RA serotypes with AUCs of 0.62 (95% CI 0.58 to 0.67) in NHS for seropositive RA and 0.70 (95% CI 0.68 to 0.73) in EIRA for ACPA-positive RA. AUCs for complete models (FH+E+G+GEI) were 0.74 (95% CI 0.70 to 0.78) for seropositive RA in NHS and 0.77 (95% CI 0.75 to 0.80) for ACPA-positive RA in EIRA and lower in autoantibody-negative RA.
Among women with positive FH in NHS, E models had AUCs of 0.79 for seropositive and seronegative RA (table 2). E models for women in EIRA with positive FH had AUCs of 0.77 (95% CI 0.68 to 0.86) and 0.79 (95% CI 0.67 to 0.90) for ACPA-positive and ACPA-negative RA, respectively. Among women in NHS with positive FH, G models had modest AUCs: 0.65 (95% CI 0.53 to 0.76) and 0.62 (95% CI 0.51 to 0.74) for seropositive/seronegative RA. For women with positive FH in EIRA, the G model for ACPA-positive RA had a higher AUC than NHS (0.73, 95% CI 0.64 to 0.83). The complete models (E+G+GEI) were highly discriminatory in both studies. AUCs were 0.82 (95% CI 0.74 to 0.90) and 0.83 (95% CI 0.74 to 0.91) for seropositive and seronegative RA in NHS. For women in EIRA with positive FH, complete models had excellent discrimination, with AUCs of 0.83 (95% CI 0.76 to 0.91) for ACPA-positive RA and 0.78 (95% CI 0.67 to 0.90) for ACPA-negative RA.
Receiver operating characteristic (ROC) curves are shown in figure 1 for seropositive/ACPA-positive RA. Online supplementary figure S1 shows ROC curves for seronegative/ACPA-negative RA.
Performance of complete models
Comparisons of the complete (FH+E+G+GEI) model with each single-factor model (FH, E or G) for autoantibody-positive RA showed improved discrimination by IDI (0.08–0.20) that was highly significant (table 3), suggesting marked model improvement. IDI improvements of 0.03–0.10 between complete and FH+E models suggest that genetics significantly improved discrimination. A positive cNRI that was highly statistically significant suggests improved reclassification compared with all other models except the FH+E+G model. The lack of improvement in IDI or cNRI when adding GEI to the FH+E+G model shows little benefit in discrimination from including GEI.
Among those with positive FH, IDI showed significantly improved discrimination for the complete model (E+G+GEI) compared with G models in NHS (0.22) and EIRA (0.15). Among women with positive FH, significantly positive cNRI values (0.73–0.84) suggest that complete models improved reclassification compared with G models.
The addition of GEI to complete models improved reclassification by cNRI (0.35–0.42) but only slightly improved discrimination by IDI (0.01–0.03), which was not statistically significant, suggesting only marginal improvement with GEI.
Complete comparisons for RA risk models are shown in online supplementary tables S4–S6.
Model performance among men
AUCs for models among men in EIRA are shown in online supplementary tables S2 and S3. The complete model (FH+E+G+GEI) among men in EIRA had excellent discrimination for ACPA-positive RA (AUC 0.80, 95% CI 0.76 to 0.83). ROC curves are shown in figure 1 for ACPA-positive RA and online supplementary figure S1 for ACPA-negative RA.
Joint effect of FH with genetics, smoking and BMI
The joint effects of FH and GRS are shown in table 4. In NHS, positive FH and high-risk genetics had an OR of 10.30 (95% CI 4.98 to 21.67) for seropositive RA. In EIRA, positive FH/high GRS had an OR of 13.04 (95% CI 6.56 to 25.91).
Positive FH and high smoking had ORs of 8.42 (95% CI 4.06 to 17.46) for seropositive RA in NHS and 5.43 (95% CI 2.79 to 10.59) in EIRA for ACPA-positive RA compared with no FH and low smoking. Positive FH and high BMI had ORs for seropositive/ACPA-positive RA of 7.44 (95% CI 3.66 to 15.13) and 2.43 (95% CI 1.15 to 5.14), respectively.
The ORs for RA from multiple risk factors are shown in table 5. In NHS, women with positive FH, high smoking and high BMI had an OR of 9.42 (95% CI 4.59 to 19.35) for seropositive RA, which increased to 20.89 (95% CI 9.04 to 48.29) with the addition of high GRS. Women in EIRA had similarly raised ORs for ACPA-positive RA with positive FH, high smoking, high BMI and high GRS (OR 21.73, 95% CI 10.69 to 44.19). Compared with autoantibody-positive RA, multiple positive risk factors conferred relatively less risk for seronegative (OR 8.03, 95% CI 3.29 to 19.63) and ACPA-negative RA (OR 3.23, 95% CI 1.48 to 7.06).
Discussion
We developed and validated models for RA serotypes among women enrolled in studies where epidemiologic factors were assessed in the asymptomatic, preclinical period before RA onset. RA risk models were highly discriminatory among those with positive FH, with AUCs of 0.82 in NHS and 0.83 in EIRA. We found that complete models incorporating FH, epidemiologic factors and genetics improved discrimination between RA cases and controls, especially for autoantibody-positive RA. We found that women with positive FH, high-risk genetics, high smoking and high BMI had up to 22-fold increased odds for ACPA-positive RA. Our models used easily obtained clinical information (FH, smoking, BMI, alcohol consumption, parity and education) and validated RA genetic markers.
Models using combinations of FH, epidemiologic factors and genetics improved discrimination by IDI compared with models using components alone. Reclassification, measured by cNRI, was improved in complete models. Models using only FH did not discriminate well, despite the potent association of FH with RA, probably owing to the low prevalence of FH, especially in EIRA, where IDI was highest for the complete model compared with the FH model.31 E models generally had better discrimination for RA than G and FH models. This highlights both the importance of epidemiologic factors in RA pathogenesis and the modest discrimination provided by genetic factors. Several studies have recently evaluated the performance of RA models with genetics, but have used a limited number of environmental factors, typically smoking, for ACPA-positive RA. Genetic models in these studies provided less discrimination than models that also used smoking.38 ,39 EIRA complete models for ACPA-positive RA (AUC 0.77) performed better than NHS models for seropositive RA (AUC 0.74), perhaps owing to more homogeneous classification by ACPA in EIRA.
Among those with positive FH, RA models had excellent discrimination for autoantibody-positive RA (AUCs 0.82–0.83). This suggests that evaluating epidemiological and genetic factors among those with positive FH may be able to identify asymptomatic individuals at increased RA risk. Among women with positive FH, the complete model showed significant improvement in discrimination compared with G models. Epidemiologic factors may be especially important in the aetiology of RA among those at high risk. In a recent study, among a population with arthralgias and RA-related autoantibodies, those who smoked and were overweight were sevenfold more likely to develop RA.36 In our study, those who smoked, had high BMI, positive FH and high-risk genetics had 22-fold higher odds for RA. Furthermore, these findings suggest that using these risk models among an asymptomatic population may be useful in screening for high-risk subjects to enrol in RA prevention trials. Since many of the epidemiologic factors in our RA risk models are modifiable and E models had higher AUCs in FH-positive models than G models, this also suggests that a proportion of RA may be preventable. Prior reports suggest that cigarette smoking may account for 25–35% of population-attributable RA risk perhaps owing to interaction with HLA-DRB1.14 ,16 Our findings offer more evidence that epidemiologic factors are important to the aetiology of RA, even among those with positive FH.
We acknowledge limitations in our study. Our models were developed and validated among women without RA symptoms and do not deal with the progression of symptomatic, undifferentiated arthritis to RA. Exploratory analyses using men with similar models had excellent discrimination for ACPA-positive RA (AUC 0.80), but this needs replication. ACPA testing was not performed on all NHS samples owing to lack of plasma for those with buccal samples. ACPA testing before RA onset was unavailable in EIRA. Thus, we were unable to include preclinical ACPA in our models.35
Autoantibody-negative RA models generally performed worse than autoantibody-positive RA models. AUCs for complete models were modest for seronegative RA in NHS (0.70) and ACPA-negative RA in EIRA (0.66). Epidemiologic factors have different associations for seropositive and seronegative RA, often with weakened or null associations in seronegative RA compared with seropositive RA.6 ,8 ,54 In our study, the odds for autoantibody-negative RA with multiple risk factors were only increased three- to eight-fold compared with 21–22-fold increased odds for autoantibody-positive RA (table 5). There may be less heritability in ACPA-negative RA, which might also explain the underperformance of risk models for autoantibody-negative RA.1 Our study used 39 genetic RA risk alleles validated at the time of our genotyping, fewer than the currently reported 101 loci.55 These newly discovered single nucleotide polymorphisms have modest ORs, so it is unlikely that they would change discrimination appreciably.18 ,55 Autoantibody-negative RA risk models performed better among FH-positive women (AUCs 0.83 in NHS and 0.78 in EIRA). The higher AUC in NHS might reflect some misclassification by serostatus for RA cases diagnosed before the development of ACPA testing.
The NHS and EIRA study designs were also different. In NHS, we performed a nested case–control study within a prospective cohort of US women. Many patients with RA in the NHS were diagnosed prior to routine clinical ACPA testing, so there is potential for misclassification of serological status. Women in NHS were followed up prospectively before RA diagnosis, so epidemiologic data were collected without differential bias between cases and controls. EIRA is a Swedish case–control study and all cases were classified by ACPA at diagnosis. Epidemiologic data before RA diagnosis were assessed retrospectively, introducing the potential for recall bias.
Finally, FH ascertainment differed between the studies. In NHS, FH was collected by self-report, usually after RA diagnosis, and included SLE. In EIRA, FH was validated using register linkages that provided close to complete coverage of FH irrespective of its temporal association to the index case/control, but only for birth cohorts covered by the Multi-Generation registers. The prevalence of FH among patients with RA in previous studies ranged from 7% to 22%, with higher FH prevalence in studies using self-reporting (18–22%).30 Since NHS data on FH were collected by self-report, included SLE and were usually collected after RA diagnosis, the high prevalence of FH in this study (34%) is probably overestimated. However, controls also had a high prevalence of FH (10%), suggesting that overestimation of FH occurred in both cases and controls. Since women in NHS were advanced in age when FH was collected, family members might have been more likely to develop RA or SLE than in other studies. The prevalence of FH in EIRA (10% in cases, 4% in controls) might have underestimated the true FH prevalence, though not the relative prevalence of FH between cases and controls for the reasons mentioned above. Despite these differences, our models performed similarly in both studies, enhancing the generalisability of our models.
In conclusion, models based on FH, RA risk factors (smoking, BMI, alcohol consumption, education and parity) and validated RA genetic markers, classified RA risk well among women. Among those with positive FH, RA risk models using known risk factors and genetics provided excellent discrimination between patients with RA and controls. Our results suggest that using risk models with epidemiologic and genetic factors among those with FH may enable identification of individuals suitable for RA prevention strategies.
Acknowledgments
We thank May Al-Daabil, MD for her assistance in reviewing medical records in the NHS. We thank Lori Chibnik, PhD for critical manuscript review. Finally, we thank all the participants and staff of the Nurses’ Health Study in the USA and the Epidemiological Investigation of Rheumatoid Arthritis in Sweden for their contributions.
References
Footnotes
Handling editor Tore K Kvien
JAS and C-YC contributed equally.
Correction notice This paper has been corrected since it was first published online. Following publication, the authors discovered that family history of RA or lupus was missing for 247 subjects (of 1038 subjects) in their analysis. After removing subjects with missing data, the statistical analysis yields slightly different risk estimates and AUCs; however, the interpretation and summary of the results do not change. The numbers throughout the text and tables are therefore different in many places. Data in the supplement documents have also been revised.
Contributors All authors fulfil all conditions required for authorship. Study design: JAS, C-YC, KHC and EWK. Acquisition of data: JAS, JA, LK, LA, KHC and EWK. Statistical analysis: JAS, C-YC and XJ. Analysis and interpretation of data: JAS, C-YC, XJ, JA, LTH, SM, LK, LA, KHC and EWK. Manuscript preparation: JAS, C-YC and EWK. All authors approved the final draft of the manuscript.
Funding This work was supported by grants from the National Institutes of Health (grants CA087969, CA049449, CA050385 and CA067262) and the National Institute of Arthritis and Musculoskeletal and Skin Diseases (grants AR049880, AR052403 and AR047782). The EIRA study was supported by grants from the Swedish Medical Research Council, from the Swedish Research Council for Health, Working Life and Welfare (FORTE), from King Gustaf V's 80-year foundation and from the Swedish Rheumatism Foundation.
Competing interests None.
Ethics approval All aspects of this study were approved by the Partners Healthcare and Karolinska Institutet institutional review boards.
Provenance and peer review Not commissioned; externally peer reviewed.