

Description of an incidence-based model for assessing comorbidity patterns in disease natural history
  1. Victor A Kiri1,2
  1. 1Faculty of Pharmaceutical Sciences, University of Port Harcourt, Choba, Nigeria
  2. 2FV&JK Consulting Ltd, Guildford, UK
  1. Correspondence to Professor Dr Victor A Kiri; victor.kiri{at}


Background Patients with a chronic disease often suffer from other diseases called comorbidities, which can be important factors in the assessment of risks associated with the disease and its management. However, comorbidities can pose important methodological issues because factors such as time, age, duration and the disease can influence their impact on the risk of interest.

Methods To identify comorbidities of a chronic disease, it is common practice to construct two separate cohorts of patients (a set with the disease and another as a random sample of patients free of the disease) and compare the event rates for each candidate comorbidity over a specific period between the two, while accounting for factors which may confound the results. We describe an incidence-based alternative approach that exploits the longitudinal properties of observational databases to track incident event rates along the natural history of the chronic disease. We illustrate it in a retrospective cohort of patients with chronic obstructive pulmonary disease (COPD) aged 50 and above; each patient with COPD was matched with another without COPD on certain confounding factors.

Results We obtained 24 079 matched pairs. We found that chronic conditions such as lung cancer, asthma, fracture and osteoporosis were more common in patients with COPD. We also found evidence of time-varying associations.

Conclusions Our findings in COPD suggest that time is an important factor and comorbidity studies which are based on information in a single fixed period (such as first year postdiagnosis of COPD) are more likely to report spurious associations.


Strengths and limitations of this study

  • Explored the longitudinal properties of the data to obtain comparable estimates of incident event rates in each cohort.

  • Tracked the trend in incident events along the natural history of the disease.

  • Reduced likelihood of spurious associations compared with the traditional single-point estimation approach which is based on a single observation window.

  • The lack of control for the likely effect of smoking on the results due to the limited scope of information.

  • The underlying attendance patterns of the patients could affect the probability of diagnosis of the comorbid conditions of interest.


Comorbidity is defined as any disease which coexists with a chronic disease of interest and the level of comorbid disorders may depend on the chronic disease type. Comorbidities are important for several reasons. First, the safety profile and the potential for adverse effects associated with a given therapy may depend on the extent and severity of pre-existing comorbidities in the particular patient population. Second, the effectiveness of the therapy may vary among the patients because its benefits may be affected by the types of pre-existing comorbidities. For instance, there is evidence that patients suffering from asthma, particularly those who also have chronic obstructive pulmonary disease (COPD) have an increased risk of death from causes other than COPD.1 In such situations, it is clearly clinically relevant to know whether the increased risk is related to the severity of the primary disease, its treatment or the comorbidity. In general, comorbidity remains an unresolved issue in the morbidity and mortality of patients living with chronic diseases.

Since comorbidities may occur more frequently in patients with a particular chronic disease than in those of similar demographic characteristics who are free of the disease, information on the common comorbidities associated with the chronic disease such as background incidence rates can enhance pharmacovigilance and risk management activities, especially for events which may otherwise be falsely classified as safety signals associated with the drug.

Of course, information about comorbidity is also important in clinical practice. In a given chronic disease, such information can influence the quality of life of the patient as well as decisions on treatment.2–4 There are many examples in pharmacoepidemiological studies where lack of adequate control of the possible influence of comorbidity has resulted in effect estimates confounded by disease severity and other forms of bias.5–9 Observational databases with rich longitudinal information such as the UK Clinical Practice Research Datalink (CPRD) and many of the US claims databases as well as those in some European Union (EU) countries can serve as useful resources for obtaining the incidence and prevalence rates of medical events in patients with a particular chronic disease. In such studies, it is standard practice to compare the estimated rates with those obtained from a control population which often is a random sample of the population that is free of the chronic disease. In most situations, matching on factors such as age and gender which are generally known to influence the type, proportion and impact of comorbidities is often used to facilitate comparability between the two populations as cohorts.10–12 However, the use of an unmatched control population is not uncommon, despite the risk that by so doing, we may lose the ability to adequately control the confounding factors in our assessment of the association between comorbid events and the chronic disease.

Matching on the propensity scores is a popular approach for handling confounding factors in the assessment of the safety or effectiveness of an intervention in observational studies. However, the methodology may not be appropriate for comorbidity studies of the kind under description as these do not involve any intervention. In this setting, the propensity score becomes the probability of a patient being diagnosed with the chronic disease of interest and as such, in any matched pair, both the patient diagnosed with the disease and his/her counterpart would have the same chance of experiencing events which are associated with the disease. Because comorbidities are factors associated with the chronic disease, adopting the propensity score methodology in such studies is an avoidable error.12 Instead, it may be more sensible to use an appropriate sampling strategy to match each patient with the disease with another patient free of the disease on one or two factors identified as potential key confounders such as age and gender in this setting.10–13

Another important dimension to comorbidity assessment is time, which often plays a major role in disease severity. We think that its influence can also be assessed by studying the natural history of the disease. Thus, to assess whether a particular comorbid condition is a risk factor for the chronic disease of interest, it may be useful to consider the pattern of the event in relation to the natural history of the chronic disease. In practice, this can be done by estimating the relevant event rates (ie, ideally as incidence rates) over time such that they span the periods prior to diagnosis of the chronic disease and afterwards. Indeed, the use of incident events in preference to prevalent cases may provide a more incisive insight into the nature of the relationship between the comorbid condition and the chronic disease although the effectiveness of this approach may depend on the number of years for which reliable historical data are available.

In this paper, we will recap the conventional approach for identifying comorbidities which may be associated with any particular chronic disease. We will then describe an innovative incidence-based methodology for identifying patterns of associations between comorbidities and the chronic disease along its natural history which we consider as a more viable alternative. By way of illustration, we will also reproduce some of the results reported elsewhere in a previous application of the new approach in COPD based on the UK CPRD population (formerly, the GPRD).14


Conventional approach: usually involves distinct patient populations in a matched cohort design in the following format:

  1. One set of patients who have a record of diagnosis or consultation for a chronic disease X in an a priori specified calendar year of interest and a random sample of patients who according to their medical records, are free of the disease.

  2. Both sets are from the same database population with each member also satisfying certain prespecified inclusion/exclusion study criteria.

  3. The date of the diagnosis/consultation for disease X in the specified calendar year—regardless of whether it is a pre-existing disease or a new condition—is taken as the index date and this is also assigned to the matched control so as to ensure same start of follow-up for each pair.

  4. Matching is usually on important measurable variables (ie, likely confounding factors) identified as key to facilitating comparability between the two cohorts.10,12

Age and gender are the most commonly used factors in this regard. The two cohorts may also be matched on other variables such as the duration of historical records at index date. Indeed, depending on the primary purpose of the study, the pool of eligible controls for each case may be restricted to only those whose last records span at least as long as that of the case so as to minimise the impact of between-pair differences in loss to follow-up.12

Incidence-based trend analytical approach: This involves a prespecified study period that spans over a reasonable number of years (ie, d), instead of the conventional method which either uses a single calendar year to identify patients with the chronic disease X or assesses event rates only in the postdiagnostic period. In this sense, the new approach is also different from the incidence-based methodology described elsewhere.13

  1. The study period consists of two separate phases: an earlier period of duration d1 years for the identification of the incident cases of disease X and a subsequent period of d2 years postdiagnosis. The total period d for trend analysis is thus d=d1+d2.

  2. Cohort X consists exclusively of patients newly diagnosed with condition X over the study period (ie, incident diagnosis) and the incident diagnosis date is defined as index date. Patients with any record of diagnosis/consultation for disease X outside of the study period are excluded.

  3. Each member of this cohort is then matched to a patient from a random sample of those in the database who are free of disease X during their entire medical history (ie, X=0). The matched control is assigned the same index date.

  4. As in the conventional approach, the matching variables include age and gender.

  5. However, unlike the former approach, each case is additionally matched with its control on the total completed years of medical records preindex and postindex date to ensure that the control is followed up for as long as the case exists—each having the same duration for the trend analysis.

Indeed, an aspect of the incidence-based approach has been successfully applied to assess the risk of cataract among patients with idiopathic thrombocytopaenic purpura in the CPRD.15
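The matching steps of the incidence-based design can be sketched in code. The following is a minimal illustration only, not the author's implementation: the `Record` fields, the use of whole calendar years, exact 1:1 matching and the function names are all assumptions made for the example, and the inputs are assumed to have already been restricted to incident cases of disease X and to patients free of X throughout their records.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical patient record; field names are illustrative assumptions."""
    patient_id: int
    birth_year: int
    gender: str
    first_year: int  # first calendar year with medical records
    last_year: int   # last calendar year with medical records


def completed_years(rec, index_year):
    """Completed years of records before and after a given index year."""
    return (index_year - rec.first_year, rec.last_year - index_year)


def match_cases(cases, controls, seed=0):
    """Match each incident case to one disease-free control.

    cases: list of (Record, index_year) for newly diagnosed patients.
    controls: list of Record for patients free of disease X.
    Matching is exact on birth year, gender and completed years of
    records pre- and post-index; the control inherits the index year.
    Cases with no eligible control are dropped.
    """
    rng = random.Random(seed)
    available = list(controls)
    rng.shuffle(available)  # random order stands in for random sampling
    pairs = []
    for case, index_year in cases:
        target = (case.birth_year, case.gender,
                  completed_years(case, index_year))
        for i, ctrl in enumerate(available):
            key = (ctrl.birth_year, ctrl.gender,
                   completed_years(ctrl, index_year))
            if key == target:
                pairs.append((case.patient_id, ctrl.patient_id, index_year))
                del available[i]  # each control is used at most once
                break
    return pairs
```

In a real database, additional matching factors (for example, general practice, as in the COPD illustration) would simply extend the matching key.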

Data analysis

For each year i of the study period (i=1, 2…d, with i=1 for the earliest observed year) and for each candidate comorbid event k, we estimate the incidence rate per 1000 person-years (IRik) for each cohort as well as the corresponding 95% CI in a conditional logistic regression model involving relevant individual characteristic measures as explanatory covariates.16 We also estimate the rate ratio (RR) and its corresponding 95% CI using the conditional logistic regression approach to account for the matching, which is often ignored at some cost in the analysis of matched cohort data.17,18
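To make the quantities concrete, the sketch below computes a crude incidence rate per 1000 person-years and a crude rate ratio with an approximate 95% CI for a single year and event. It deliberately swaps in a simpler technique than the one described above: an unconditional Wald interval on the log scale, which ignores the matching and any covariates, rather than conditional logistic regression. The function names and the example counts are assumptions for illustration.

```python
import math


def incidence_rate_per_1000(events, person_years):
    """Crude incidence rate per 1000 person-years."""
    return 1000.0 * events / person_years


def crude_rate_ratio(events_x, py_x, events_0, py_0, z=1.96):
    """Crude rate ratio (disease cohort vs disease-free cohort).

    Returns (rr, lower, upper) using a Wald interval on the log scale,
    with SE(log RR) = sqrt(1/events_x + 1/events_0). This is NOT the
    conditional logistic regression estimate; it ignores the matching.
    Both event counts must be greater than zero.
    """
    rr = (events_x / py_x) / (events_0 / py_0)
    se_log_rr = math.sqrt(1.0 / events_x + 1.0 / events_0)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


if __name__ == "__main__":
    # Hypothetical counts for one year i and one event k.
    print(incidence_rate_per_1000(30, 2000))
    print(crude_rate_ratio(30, 2000, 15, 2000))
```

An interval that excludes 1 would be read as a statistically significant difference for that year, subject to the caveat that the crude interval does not reflect the matched design.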

To assess trends in RRs along the natural history of disease X, we fit a linear regression to the annual rate ratios on a logarithmic scale for the candidate comorbid event k and estimate the average annual percentage change over the periods prior to and postindex date and separately also for the overall period of evaluation (ie, d years). The resulting slope of each regression line is assessed for statistical significance.
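The trend step can be sketched as an ordinary least-squares fit of log(RR) on year, with the average annual percentage change obtained as 100 × (exp(slope) − 1). This is a minimal sketch under stated assumptions: the function name and return keys are hypothetical, the annual RRs are treated as equally weighted points, and the returned t statistic (on n − 2 degrees of freedom) is left to be compared against a t distribution by the reader.

```python
import math


def log_linear_trend(years, rate_ratios):
    """Fit log(RR) = intercept + slope * year by ordinary least squares.

    Returns the slope, its standard error, the t statistic for a zero
    slope (n - 2 degrees of freedom) and the average annual percentage
    change in the rate ratio.
    """
    n = len(years)
    if n < 2 or n != len(rate_ratios):
        raise ValueError("need at least two paired (year, RR) points")
    y = [math.log(rr) for rr in rate_ratios]
    mean_x = sum(years) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(years, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    se = float("nan")
    t_stat = float("nan")
    if n > 2:
        rss = sum((v - (intercept + slope * x)) ** 2
                  for x, v in zip(years, y))
        se = math.sqrt(rss / (n - 2) / sxx)
        if se > 0:
            t_stat = slope / se
    return {
        "slope": slope,
        "se": se,
        "t_stat": t_stat,
        "annual_pct_change": 100.0 * (math.exp(slope) - 1.0),
    }
```

To mirror the analysis described above, the function would be called three times for each event: once on the years before the index date, once on the years after it, and once on the full d-year period.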


By way of illustration of the new methodology, we have reproduced the details of a previous application in the UK CPRD over a 10-year period in which we evaluated the incident patterns of medical events from a list of candidates of a priori interest, thought to have possible associations with COPD.12–14 Comorbidity was defined as any event resulting from any consultation with a general practitioner which is significantly more common in patients with COPD. Thus, this illustration does not constitute a study of COPD.

We used a retrospective cohort of patients aged 50+ with a diagnosis of COPD. Each patient with COPD was matched to another patient without COPD on year of birth, gender, general practice and completed years of medical records up to at least a year after the index date for COPD between 1990 and 1998, the index date of the patient with COPD having been assigned to the matched non-COPD counterpart. We then estimated the annual incidence rates per 1000 person-years for each event in each cohort over the 10-year period as well as the corresponding annual RRs and their 95% CIs such that RR>1 indicates a higher rate in COPD. The age group is the same as in the previously reported COPD studies conducted on the database.12–14


A total of 24 079 patients with COPD were each matched with a patient without COPD (figure 1).

Figure 1

Selection of chronic obstructive pulmonary disease (COPD) incident cases and controls from the Clinical Practice Research Datalink (CPRD).

The annual event rates in COPD and the corresponding annual RRs are shown in tables 1 and 2, respectively.

Table 1

Annual incidence rates of certain conditions per 1000 person-years in patients with chronic obstructive pulmonary disease (COPD)*

Table 2

Annual incidence rate ratios of certain conditions per 1000 person-years in patients with chronic obstructive pulmonary disease (COPD) and patients without COPD*

According to these results, many of the smoking-related chronic conditions were more common in patients with COPD than in those free of the disease.19,20 Patients with COPD were consistently at higher risk of suffering from conditions such as lung cancer, asthma, other respiratory diseases, fracture, osteoporosis, thoracic, mediastinal, cardiac, nervous system and psychiatric disorders as early as several years before diagnosis of COPD. However, we found no evidence of association between COPD and conditions such as pneumonia, glaucoma, ear and labyrinth disorders, reproductive system, breast disorders and vascular diseases other than angina and cardiac disorders, although there were apparent signs of annual elevation in risk over time for some of the conditions. The pattern for angina was particularly inconsistent in terms of statistical significance: the levels were significantly higher in the patients with COPD only for the immediate 1-year period before and after the diagnosis of COPD, which highlights the unreliable nature of methods which rely solely on events in the first year of diagnosis of COPD.13

Indeed, we also found evidence of time-varying associations. For example, the annual levels for skin-related events were significantly and consistently higher among patients with COPD only after the chronic disease had been diagnosed—thus suggesting possible association with either treatment or severity of COPD or both. It is worthy of note that an assessment based strictly on data in the post-COPD diagnosis period would have offered a single conclusion, namely an association between the condition and COPD regardless of severity and treatment.


In this paper, we have described the features of an incidence-based methodology for identifying potential comorbid conditions for any particular chronic disease. The methodology exploits the longitudinal properties of observational databases to track incident event rates along the natural history of the chronic disease, as it involves the periods prior to its formal diagnosis and beyond. The results of its application in COPD, as previously described in detail elsewhere, revealed significant time-dependent associations between the chronic disease and certain conditions. We found evidence that in patients with COPD, the likelihood of diagnosis of certain comorbid events was highest in the immediate 1-year period before and after diagnosis of the chronic disease, perhaps due to the diagnostic-related activities experienced by these patients. If true, then a methodology which relies solely on data in the first year postdiagnosis of COPD is much more likely to suggest associations which may be spurious compared with our approach.

These findings may have implications for the interpretation of comorbidity studies which are based exclusively on data in the immediate year postdiagnosis of any chronic disease of interest. Our results also suggest that the trends approach, which maintains the longitudinal quality of the data in the assessment of comorbidity associations with a chronic disease, may be more reliable than the traditional single estimate approach. Indeed, the new approach offers a facility for enhancing our understanding of the natural history of the chronic disease in relation to the burden of comorbidity in the management of patients living with the condition. With appropriate data, the method may also be useful to pharmacovigilance activities for any particular drug of interest, as it offers longitudinal results which may be used to put information from spontaneous reports into an appropriate context. This can be done by assessing the incident patterns of the event in two separately matched cohorts: (1) exposed versus unexposed persons in one and (2) patients with the chronic disease versus those free of the disease in the other.

We acknowledge the existence of alternative methods for obtaining matched cohorts in the natural history of disease studies and we have provided our reasons for excluding the propensity score approach. In the setting of exploration of possible associations between a chronic disease and comorbidities, we believe that the propensity score is exactly the same as the disease risk score—a probability estimate of a patient's likelihood of disease occurrence which has never been used for such natural history of disease studies.21–23 Outside of this setting, we think that propensity score matched cohorts could be useful for assessing factors associated with actual clinical practice in a chronic disease—such as the management of such patients in terms of resource usage independent of other sources of resource use (ie, confounding factors including comorbidities, among others).

A potential limitation of the new methodology, though common in natural history of disease studies conducted in general practice databases, is the possibility that the underlying behaviour and attendance patterns of the patients at the practices could affect the probability of diagnosis of the events. For example, because patients with COPD may have higher rates of doctor consultations than those without COPD (ie, for routine checks, treatment of acute exacerbations as recommended in guidelines, among many other disease-related reasons), some events may have a higher likelihood of diagnosis in the COPD group.24 Clearly, a notable limitation of the COPD illustration was the lack of control for the likely effect of smoking status, which was due to the limited scope of information on smoking in the CPRD at the time of the study. Thus, smoking could indeed account in part for the observed differences between the two groups. Furthermore, the requirement of having at least 1-year follow-up might also introduce some bias in event estimates because of the possibility of significant differences between the two original cohorts in the proportion of patients with the comorbidities of interest over that period.13

The strengths of our methodology include the provision for exploiting the longitudinal properties of observational databases to obtain comparable estimates of event rate ratios as well as the provision for estimating the incidence patterns of such events over time which may facilitate a much clearer understanding of the nature of their associations with the disease.


The author is grateful to Dr Kourtney Davis for her encouragement and is indebted to GlaxoSmithKline R&D for the excellence in research award granted to him for the development of the incidence-based design while he was its employee. The author is also grateful to the reviewers for their useful suggestions. Finally, this work is humbly dedicated to the memory of Dr George Visick, a former research colleague at GlaxoSmithKline R&D whose untimely death remains hard to bear.




  • The results used to illustrate the design have been previously presented at conferences by the American Thoracic Society (ATS) and the International Society of PharmacoEpidemiology (ISPE) with the approval of GlaxoSmithKline R&D, the sponsor.

  • Funding The author was an employee of GlaxoSmithKline Research and Development during the period of the research.

  • Competing interests The author consults for the pharmaceutical industry on epidemiological methods.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.
