

Study protocol: pneumonia and inhaled corticosteroid treatment patterns in chronic obstructive pulmonary disease – a cohort study using sequence analysis (PICCS)
  1. Allan Klitgaard1,2,
  2. Rikke Ibsen3,
  3. Ole Hilberg1,2,
  4. Anders Løkke1,2
  1. 1Department of Internal Medicine, Lillebaelt Hospital, Vejle, Denmark
  2. 2Department of Regional Health Research, University of Southern Denmark, Odense, Denmark
  3. 3I2minds, Aarhus, Denmark
  1. Correspondence to Dr Allan Klitgaard; alksorensen{at}


Introduction Treatment with inhaled corticosteroids (ICS) is widely used in chronic obstructive pulmonary disease. The main effects include a reduction in the number of exacerbations and, for some patients, a reduction in mortality. Unfortunately, the treatment is also linked to an increased risk of pneumonia, and very little is known about which patients experience this increased risk. There is a need to identify patient characteristics associated with an increased risk of pneumonia in patients treated with ICS.

Methods and analysis This is a register-based cohort study that uses the nationwide Danish registers. Data from several registers for the years 2008–2018 will be merged at the individual level using the personal identification numbers that are unique to every citizen in Denmark. Clusters based on pneumonia incidence and ICS treatment patterns will be explored with sequence analysis over a 3-year follow-up period.

Ethics and dissemination This is a register-based study and research ethics approval is not required according to Danish Law and National Ethics Committee Guidelines. The results will be submitted to peer-reviewed journals and reported at appropriate national and international meetings.

  • Chronic airways disease
  • Respiratory infections
  • Epidemiology
  • Emphysema
  • Adult thoracic medicine

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:




  • A limitation of register-based studies is that the relevant variables are limited to data that have already been recorded, and it is difficult to add other data for the purposes of an individual study. The definition of moderate pneumonia in our study, for instance, is constructed from redeemed prescriptions of medication typically used for these events.

  • This study is based on a large population drawn from high-quality Danish registers. This provides large amounts of data, resulting in high statistical power and limited selection bias.

  • The registers have been used for many years, allowing us to study long-term trends in diseases.

  • The study design allows for exploration of associations between pneumonia and inhaled corticosteroid treatment over time.

  • The validity of the registers used in this study has been investigated and deemed high.


Inhaled corticosteroid (ICS) treatment is one of the medical treatments used in chronic obstructive pulmonary disease (COPD).1 2 The main effect of ICS is a reduction in acute exacerbations of COPD, and the treatment has shown statistically significant effects on quality of life and mortality.3 However, the treatment is associated with side effects such as increased risk of oral candidiasis, diabetes, osteoporosis, and cataract.4–9 Arguably, the most important side effect is the increased risk of pneumonia,7 10 as reflected in the Global Strategy for the Diagnosis, Management, and Prevention of Chronic Obstructive Pulmonary Disease (GOLD) 2020 report.1

It is difficult to predict which patients will benefit from ICS treatment and which will experience side effects. A Spanish study concluded that pulmonologists rarely agree on when to prescribe or withdraw ICS.11 However, during the past few years, a growing body of evidence has emerged to identify certain phenotypes within COPD that are more likely to benefit from ICS treatment: a history of frequent and/or severe exacerbations, a high blood eosinophil count, and a history of asthma are all associated with an increased probability of beneficial ICS treatment effects.1 12–15 Unfortunately, little is known about which patients with COPD are at increased risk of pneumonia during ICS treatment. In a pooled analysis of five clinical trials, a low body mass index was the only significant risk factor for ICS-induced pneumonia.16

COPD is recognised as a complex and heterogeneous disease, and patient phenotypes within COPD have been studied using various approaches.17 18 Currently accepted clinical phenotypes include the eosinophilic phenotype and the emphysema phenotype.19 Comorbidity-based phenotypes have been described in cluster analysis studies.20 21 Phenotypes based on cluster analysis have been explored using cross-sectional data,20 22 23 but no studies have explored phenotypes within COPD using cluster analysis of longitudinal data. This may be achieved using sequence analysis, an approach that a study from Scotland used to understand multimorbidity.24 Sequence analysis is a non-parametric method commonly used in the social sciences to analyse trajectories and social processes.25 26 The method has the advantage of providing a holistic view of trajectories, describing how processes evolve over time and when transitions occur.

We hypothesise that, using sequence analysis, individual disease trajectories within COPD can be divided into groups of typical trajectories (clusters) based on how similar they are. The purpose of the present study is to explore whether patients with COPD can be grouped into clusters according to ICS treatment patterns and pneumonia incidence.

Methods and analysis

Study design and research questions

This study is a nationwide cohort study covering 2008 to 2018. The study will use data from the Danish national databases, merged using the unique personal identification number (PIN) assigned to all citizens in Denmark by the Danish Civil Registration System (CRS). The resulting dataset will be used to answer the following research questions:

  1. Can patients with COPD be grouped according to their ICS treatment pattern and pneumonia incidence?

  2. Is there a difference in patient characteristics between these groups?

Data sources

This is an entirely register-based study that uses data from the extensive Danish national registries (figure 1). Data are collected from the CRS, the Danish National Patient Register (DNPR), the Danish National Prescription Register (NPR), Statistics Denmark (DST) and the Danish Register for Chronic Obstructive Pulmonary Disease (DrKOL).

Figure 1

Overview of registers used in this study. ICD-10, International Classification of Diseases 10th Revision.

The Civil Registration System

The CRS uses a unique PIN that has been assigned to every citizen in Denmark since 1968.27 This PIN allows for individual-level linkage across all registers, as it is used for registration in all health registers. It has been validated, and it is considered a vital tool in Danish epidemiological research.28

The Danish National Patient Register

DNPR contains information on patients’ contacts with Danish hospitals, such as admissions and outpatient visits. It contains information on all hospitalised somatic patients from 1977 and on all outpatients, emergency room patients, and psychiatric patients from 1995. In 2019, a new data model was introduced. Variables include treatment, diagnosis by International Classification of Diseases 10th Revision (ICD-10) codes, and length of stay.29 The use of ICD-10 diagnosis codes to identify pneumonia in hospitalised patients and to construct the Charlson Comorbidity Index (CCI) has been validated.30 31

The Danish National Prescription Register

NPR contains data from 1994 onwards on all medication sold in Denmark in pharmacies or prescribed by a doctor or hospital. It does not contain medication prescribed during hospitalisation. It includes variables such as drug name, dose, price, and dates of redeemed prescriptions, with drugs identified by anatomical therapeutic chemical (ATC) codes. The validity of the data is well established.32

Statistics Denmark

DST contains various registers with longitudinal demographic information from the 1970s onwards.27 These include immigration and emigration, vital status, partner status, and place of residence. DST contains income data since 1980, educational status since 1981, and occupational status since 1976.

The Danish register for COPD

DrKOL is a nationwide clinical database that contains clinical information on every patient with COPD who has visited a pulmonary outpatient clinic in Denmark since 2008.33

Study population

All patients in Denmark with an ICD-10 diagnosis code of COPD (J44) and a first registration in DrKOL between 2008 and 2015 will be included (table 1). We consider clinical data from DrKOL vital to this study, which restricts our study period to 2008 onwards. Data in this study are available until the end of 2018. To ensure a 3-year follow-up period after the first registered measurement, only patients with COPD whose first measurement falls between 1 January 2008 and 31 December 2015 will be included. Patients with missing or incomplete information on clinical measures will be excluded.
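
A minimal sketch of the inclusion rules above, written in Python for illustration only (the protocol states that R will be used). The record layout, the field names, and the assumption that FEV1, BMI, and MRC are the clinical measures that must be complete are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical, simplified record; field names are illustrative only.
@dataclass
class Candidate:
    pin: str
    has_j44: bool                    # ICD-10 COPD diagnosis (J44) registered
    first_drkol: Optional[date]      # date of first registration in DrKOL
    fev1_pct: Optional[float]        # clinical measures assumed to be required
    bmi: Optional[float]
    mrc: Optional[int]


FIRST_INCLUSION = date(2008, 1, 1)
LAST_INCLUSION = date(2015, 12, 31)  # data end in 2018, leaving 3 years of follow-up


def is_included(c: Candidate) -> bool:
    """Return True if the patient meets the (sketched) inclusion criteria."""
    if not c.has_j44 or c.first_drkol is None:
        return False
    if not (FIRST_INCLUSION <= c.first_drkol <= LAST_INCLUSION):
        return False
    # Exclude patients with missing clinical measures.
    return all(v is not None for v in (c.fev1_pct, c.bmi, c.mrc))
```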

Table 1

Patient inclusion and exclusion

Definition of baseline variables

All variables are derived from the registers. Baseline variables are allocated into two groups: health status variables and sociodemographic variables. An overview of the baseline variables included in this study, including the register from which the data are derived, is shown in table 2.

Table 2

Overview of baseline variables included in the study

Age, forced expiratory volume in 1 second (FEV1), and body mass index (BMI) are included in the dataset as continuous variables. FEV1 % of predicted will be divided into four groups (mild, moderate, severe, and very severe) according to GOLD guidelines.34 BMI is categorised according to WHO groups. Age will be converted into a categorical variable with five groups by age in years (40–49, 50–59, 60–69, 70–79 and 80+). The Medical Research Council (MRC) Dyspnoea scale is included in the dataset as an ordinal variable (range 1–5)35 and will subsequently be converted to the modified MRC (mMRC) scale and dichotomised into high/low dyspnoea according to GOLD guidelines.34

Comorbidity is evaluated using ICD-10 codes from DNPR and will be recorded as CCI within 3 years prior to the inclusion date and as the proportion of patients with comorbidity within each WHO category. CCI is divided into four groups (CCI=0, CCI=1, CCI=2 and CCI>2). A previous asthma diagnosis is defined as an ICD-10 diagnosis code of asthma (J45) within 3 years prior to index. Medication at baseline is defined by redeemed prescriptions for inhaled medication within the 4 months prior to index.36 Medication is defined categorically as long-acting muscarinic antagonist (LAMA), long-acting beta-agonist (LABA), LABA+ICS, LABA+LAMA and LABA+LAMA+ICS.

We include the number of moderate and severe pneumonias and non-pneumonic exacerbations in the year prior to baseline. Pneumonias are defined in the same way as the outcome variable ‘pneumonia’ (see below). A moderate non-pneumonic exacerbation is defined as a redeemed prescription of oral corticosteroids for short-term use (prednisolone, ATC H02AB06; prednisone, ATC H02AB07). A severe non-pneumonic exacerbation is defined as a hospitalisation with a primary ICD-10 diagnosis of J96 and a secondary diagnosis of J44. Events with any of the ICD-10 codes used to define ‘pneumonia’ are excluded from this definition.
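
The categorisations of age, FEV1, BMI, MRC and CCI described above can be expressed as small helper functions. This is a sketch, not the authors' code: the GOLD cut-offs (80%, 50%, 30%), the four broad WHO BMI categories, the MRC-to-mMRC mapping (mMRC = MRC − 1), and the GOLD threshold of mMRC ≥2 for high dyspnoea are taken from the standard definitions rather than stated in the protocol, and the function names are hypothetical.

```python
def age_group(age: float) -> str:
    """Five age groups in years: 40-49, 50-59, 60-69, 70-79, 80+."""
    if age < 40:
        raise ValueError("age below the lowest protocol group (40-49)")
    for upper, label in ((50, "40-49"), (60, "50-59"), (70, "60-69"), (80, "70-79")):
        if age < upper:
            return label
    return "80+"


def gold_grade(fev1_pct_pred: float) -> str:
    """GOLD spirometric grade from FEV1 % of predicted."""
    if fev1_pct_pred >= 80:
        return "mild"          # GOLD 1
    if fev1_pct_pred >= 50:
        return "moderate"      # GOLD 2
    if fev1_pct_pred >= 30:
        return "severe"        # GOLD 3
    return "very severe"       # GOLD 4


def bmi_group(bmi: float) -> str:
    """Assumed broad WHO categories: underweight, normal, overweight, obese."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def mmrc_from_mrc(mrc: int) -> int:
    """MRC grades 1-5 map to mMRC grades 0-4."""
    if not 1 <= mrc <= 5:
        raise ValueError("MRC grade must be between 1 and 5")
    return mrc - 1


def high_dyspnoea(mmrc: int) -> bool:
    """GOLD threshold: mMRC >= 2 counts as high dyspnoea."""
    return mmrc >= 2


def cci_group(cci: int) -> str:
    """Charlson Comorbidity Index groups: 0, 1, 2, >2."""
    return str(cci) if cci <= 2 else ">2"
```
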
An exacerbation and a pneumonia must be at least 4 weeks (28 days) apart to be considered separate events; if they are closer, they are counted as one pneumonia. Likewise, a prescription of event-relevant medication and a hospitalisation must be at least 4 weeks apart to be considered separate events; if they are closer, only the severe event is counted. Both pneumonias and non-pneumonic exacerbations are categorised by number of events (none, 1 moderate, ≥2 moderate, ≥1 severe). Cohabitation status will be defined as either living alone or cohabiting.
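
The 28-day separation rule and the event categories can be sketched as follows. The protocol does not say how chains of closely spaced events are handled, so this hypothetical sketch assumes that any event less than 28 days after the previous one joins the same episode, that an episode containing a pneumonia counts as a pneumonia, and that an episode containing a hospitalisation counts as severe. The function names and event labels are illustrative.

```python
from datetime import date
from typing import List, Tuple

Event = Tuple[date, str, str]   # (date, "pneumonia" | "exacerbation", "moderate" | "severe")
MIN_GAP_DAYS = 28               # events must be >= 28 days apart to count separately


def merge_events(events: List[Event]) -> List[Tuple[str, str]]:
    """Collapse events closer than 28 days into episodes of (kind, severity)."""
    episodes: List[list] = []   # each entry: [last date, kind, severity]
    for d, kind, severity in sorted(events):
        if episodes and (d - episodes[-1][0]).days < MIN_GAP_DAYS:
            episode = episodes[-1]
            episode[0] = d                    # chain from the latest event (assumption)
            if kind == "pneumonia":
                episode[1] = "pneumonia"      # pneumonia takes precedence
            if severity == "severe":
                episode[2] = "severe"         # hospitalisation takes precedence
        else:
            episodes.append([d, kind, severity])
    return [(kind, severity) for _, kind, severity in episodes]


def categorise(episodes: List[Tuple[str, str]], kind: str) -> str:
    """Category for one event kind: none, 1 moderate, >=2 moderate, >=1 severe."""
    severities = [s for k, s in episodes if k == kind]
    if "severe" in severities:
        return ">=1 severe"
    n_moderate = severities.count("moderate")
    if n_moderate >= 2:
        return ">=2 moderate"
    return "1 moderate" if n_moderate == 1 else "none"
```

For example, an oral corticosteroid prescription followed 10 days later by an antibiotic prescription for pneumonia yields a single moderate pneumonia and no exacerbation.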

Definition of states

Patients are followed for 3 years starting from baseline. The follow-up period is divided into 36 monthly time units. For each month, patients are assigned to one of four states:

  1. Dead.

  2. Pneumonia.

  3. ICS.

  4. Nothing.

A patient can have only one of the four states in a given month, and the states are ranked 1–4 in order of priority: if more than one applies, the highest-ranked state is assigned. Dead is an end (absorbing) state, so a patient who dies is assigned the state ‘dead’ in every remaining month of the follow-up period. If a patient both has pneumonia and collects ICS in the same month, the patient is assigned the state ‘pneumonia’ for that month. The state ‘nothing’ applies when, in a given month, the patient is not dead, does not have pneumonia, and does not use ICS.
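
The priority rule for monthly states can be sketched as below. It assumes 0-based month indices and that the month of death is itself assigned ‘dead’; the inputs (month of death or None, and sets of months with pneumonia and with ICS) and the function name are hypothetical.

```python
from typing import List, Optional, Set

N_MONTHS = 36


def monthly_states(death_month: Optional[int], pneumonia_months: Set[int],
                   ics_months: Set[int], n_months: int = N_MONTHS) -> List[str]:
    """Assign one state per month using the ranking dead > pneumonia > ICS > nothing."""
    states = []
    for m in range(n_months):
        if death_month is not None and m >= death_month:
            states.append("dead")        # absorbing: stays dead for the rest of follow-up
        elif m in pneumonia_months:
            states.append("pneumonia")   # outranks ICS in the same month
        elif m in ics_months:
            states.append("ics")
        else:
            states.append("nothing")
    return states
```
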

ICS treatment

Treatment with ICS is defined as redeeming a prescription for ICS medication (table 3). Months with ICS are defined using the number of defined daily doses (DDDs) dispensed and the expedition (dispensing) date.

Table 3

Codes used for defining ICS treatment

Compliance is defined as:

Compliance = DDDs dispensed at an expedition / number of days between that expedition date and the next expedition date.

If compliance is ≥60%, there is no break in treatment between the two expeditions. If compliance is <60%, the treatment period ends at the first expedition date plus the number of DDDs dispensed, and there is a break until the next expedition date. Patients can collect more DDDs than they need for one period and then use them over an extended period. We assume that the collected DDDs correspond to the use of ICS over time; if a patient collects more DDDs than needed to cover the period between two expedition dates, the surplus is accumulated and carried over to the following period. After the last collection of ICS, the end date of ICS use is the last collection date plus the number of DDDs.
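
A sketch of the ICS coverage rule in Python. It assumes that time is measured in days since baseline, that one DDD covers one day, that dispensings on the same day are pooled, that carried-over DDDs count towards compliance for the next interval, and that one monthly time unit is 365.25/12 days. None of these details, nor the function names, come from the protocol.

```python
import math
from typing import Dict, List, Set, Tuple

COMPLIANCE_THRESHOLD = 0.60          # threshold stated in the protocol
MONTH_DAYS = 365.25 / 12             # assumed length of one monthly time unit


def ics_periods(dispensings: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Covered periods [start, end) in days since baseline from (day, DDDs) dispensings."""
    merged: Dict[int, float] = {}
    for day, ddd in dispensings:          # pool dispensings on the same day
        merged[day] = merged.get(day, 0.0) + ddd
    items = sorted(merged.items())

    periods: List[Tuple[int, float]] = []
    start = None
    carry = 0.0                           # surplus DDDs from earlier dispensings
    for i, (day, ddd) in enumerate(items):
        if start is None:
            start = day
        supply = ddd + carry
        if i + 1 < len(items):
            gap = items[i + 1][0] - day
            if supply / gap >= COMPLIANCE_THRESHOLD:
                carry = max(0.0, supply - gap)     # no break; keep any surplus
                continue
            periods.append((start, day + supply))  # supply runs out, then a break
            start, carry = None, 0.0
        else:
            periods.append((start, day + supply))  # last collection date + DDDs
    return periods


def ics_months(periods: List[Tuple[int, float]], n_months: int = 36) -> Set[int]:
    """Monthly time units (0-based) that overlap any covered period."""
    months: Set[int] = set()
    for start, end in periods:
        first = int(start // MONTH_DAYS)
        last = math.ceil(end / MONTH_DAYS) - 1     # end is exclusive
        months.update(m for m in range(first, last + 1) if 0 <= m < n_months)
    return months
```

For example, 90 DDDs on day 0 followed by another 90 on day 100 gives a compliance of 0.9, so the two periods are joined; the second interval (90 DDDs over 200 days, 0.45) then ends with a break.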

Pneumonia

Pneumonia includes both severe and moderate pneumonia. A patient is assigned the monthly state ‘pneumonia’ if, in the given month, the patient either redeems a prescription for pneumonia medication (ATC codes) or is admitted to hospital or has an emergency room (ER) contact with a pneumonia ICD-10 diagnosis registered as an A (primary) or B (secondary) diagnosis (table 4).
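
A sketch of how the monthly pneumonia flag could be derived from register records. The code sets are illustrative placeholders (ICD-10 J12–J18 and two example antibiotic ATC codes), not the contents of table 4; the record layout, the contact-type labels, and the 365.25/12-day month are likewise assumptions.

```python
from datetime import date
from typing import Iterable, Set, Tuple

# Illustrative code sets only; the study's actual codes are listed in table 4.
PNEUMONIA_ICD10_PREFIXES = ("J12", "J13", "J14", "J15", "J16", "J17", "J18")
PNEUMONIA_ATC = {"J01CA04", "J01CR02"}  # e.g. amoxicillin, amoxicillin with clavulanic acid
MONTH_DAYS = 365.25 / 12                # assumed length of one monthly time unit


def month_index(baseline: date, d: date) -> int:
    """0-based monthly time unit of date d relative to baseline."""
    return int((d - baseline).days // MONTH_DAYS)


def pneumonia_months(
    baseline: date,
    prescriptions: Iterable[Tuple[date, str]],       # (redemption date, ATC code)
    contacts: Iterable[Tuple[date, str, str, str]],  # (date, ICD-10, diagnosis type, contact type)
    n_months: int = 36,
) -> Set[int]:
    """Months in which the patient qualifies for the state 'pneumonia'."""
    dates = [d for d, atc in prescriptions if atc in PNEUMONIA_ATC]
    dates += [
        d for d, icd, dtype, ctype in contacts
        if icd.startswith(PNEUMONIA_ICD10_PREFIXES)
        and dtype in {"A", "B"}                # A or B diagnosis
        and ctype in {"admission", "er"}       # hospital admission or ER contact
    ]
    months = {month_index(baseline, d) for d in dates}
    return {m for m in months if 0 <= m < n_months}
```
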

Table 4

Codes used for defining pneumonia

Statistical analysis plan

Sequence analysis will be used to explore and identify relevant clusters.25 26 We use single-channel sequence analysis with one sequence per patient. Each patient has a distinct sequence over the 36-month follow-up period, built from the four possible monthly states (dead, pneumonia, ICS, and nothing). Patients who have the state ‘nothing’ throughout the follow-up period, and patients who have the state ‘ICS’ throughout, will be manually assigned to two clusters, ‘Only nothing’ and ‘Only ICS’. Furthermore, we assume that patients who die during follow-up should be considered terminal, and all these patients are assigned to one cluster named ‘Terminal’. This prevents them from distorting the clusters formed by patients who survive all 36 months.
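
The manual pre-assignment can be sketched as a simple rule applied before optimal matching. The function name and the state labels are hypothetical; patients for whom it returns None go on to the two-step cluster analysis.

```python
from typing import Optional, Sequence


def manual_cluster(seq: Sequence[str]) -> Optional[str]:
    """Return a manually assigned cluster, or None if the sequence goes to OM/PAM."""
    if "dead" in seq:
        return "Terminal"                 # any death during follow-up
    if all(s == "nothing" for s in seq):
        return "Only nothing"
    if all(s == "ics" for s in seq):
        return "Only ICS"
    return None
```
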

The sequence analysis is a two-step analysis that includes all patients who have not been manually assigned to a cluster. The first step is to group sequences based on how similar their trajectories are. Here, we use optimal matching (OM) to measure dissimilarity; OM is the method most often used to assess the dissimilarity between all pairs of sequences.25 The OM stage produces a dissimilarity matrix, which can be used to distinguish typical trajectories. At the OM stage, costs must be chosen for the three possible operations (substitution, insertion, and deletion) that allow two sequences to match. These costs are set by a substitution matrix (SM) for substitution operations and an indel value for insertion and deletion operations. In this OM analysis, substitution costs are derived from the matrix of observed transition rates between states, a data-driven approach to defining the SM.37 We use this approach because we do not assume that ‘all states are equally different’ (cost=1). The indel cost is set to a constant of 1, and the distances are not normalised, since all sequences have the same length. Different indel values will be tested in sensitivity analyses. In the second step, we use a partitioning around medoids (PAM) algorithm to find the clusters in the dissimilarity matrix. PAM groups patients with similar sequence trajectories and allocates each patient to the nearest cluster.38 The optimal number of clusters will be determined by an internal validation test.39 R will be used for the analyses.40 41
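
The two steps can be illustrated with a compact Python sketch: substitution costs from observed transition rates (cost = 2 − P(b|a) − P(a|b), the convention of the ‘TRATE’ option in the R package TraMineR), an OM distance with a constant indel cost of 1, and a simple PAM variant (greedy BUILD, then first-improvement SWAP). It is not the authors' code and omits the internal validation used to choose the number of clusters. In R, comparable steps are available through TraMineR's `seqsubm(method = "TRATE")` and `seqdist(method = "OM")` and through `cluster::pam`; the protocol does not state which packages are used.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence, Tuple

Costs = Dict[Tuple[str, str], float]


def trate_costs(seqs: Sequence[Sequence[str]], states: Sequence[str], cval: float = 2.0) -> Costs:
    """Substitution costs from observed transition rates: cval - P(b|a) - P(a|b)."""
    pairs: Counter = Counter()
    origins: Counter = Counter()
    for s in seqs:
        for a, b in zip(s, s[1:]):        # consecutive months t -> t+1
            pairs[(a, b)] += 1
            origins[a] += 1

    def rate(a: str, b: str) -> float:
        return pairs[(a, b)] / origins[a] if origins[a] else 0.0

    return {(a, b): 0.0 if a == b else cval - rate(a, b) - rate(b, a)
            for a, b in product(states, repeat=2)}


def om_distance(x: Sequence[str], y: Sequence[str], sub: Costs, indel: float = 1.0) -> float:
    """Optimal matching: minimum total cost of indels and substitutions turning x into y."""
    prev = [j * indel for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * indel] + [0.0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = min(prev[j] + indel,                              # delete x[i-1]
                         cur[j - 1] + indel,                           # insert y[j-1]
                         prev[j - 1] + sub[(x[i - 1], y[j - 1])])      # substitute
        prev = cur
    return prev[-1]


def pam(dist: List[List[float]], k: int) -> Tuple[List[int], List[int]]:
    """Simple PAM: greedy BUILD, then SWAP until no swap lowers the total cost."""
    n = len(dist)

    def total_cost(meds: List[int]) -> float:
        return sum(min(dist[i][m] for m in meds) for i in range(n))

    medoids: List[int] = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(medoids + [c])))

    best = total_cost(medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                cost = total_cost(trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels
```

With many 36-month sequences the pairwise distance matrix grows quadratically with cohort size, so a real analysis would rely on optimised implementations rather than this sketch.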

Descriptive baseline variables will be presented for the total population and by cluster. Continuous variables will be presented both as categorical variables, as described above, and as continuous variables with mean and SD.

Differences in baseline characteristics between the clusters will be analysed with a multivariable logistic regression model with cluster as the outcome variable and the ‘Only nothing’ cluster as the reference. We include all baseline variables except medication at baseline and mMRC scale. mMRC is highly correlated with FEV1 % of predicted, and we consider the latter more important; medication at baseline is not considered relevant for the model. Categorical variables are included as described above. Age is included as a continuous variable, whereas BMI and FEV1 % of predicted are included as categorical variables because of an expected non-linear relationship. The model will be tested for multicollinearity and influential outliers. Because of the long inclusion period (2008–2015), we will also examine the model for time dependency by including the patient's index year as a covariate. A p value <0.05 will be considered statistically significant.

Furthermore, we will explore differences between clusters in the average number of months spent in each of the states ‘nothing’, ‘ICS’ and ‘pneumonia’ during the 36-month follow-up period.
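
This comparison amounts to averaging, within each cluster, the number of months each patient spends in each state. A minimal sketch, with hypothetical names and state labels:

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def mean_months_per_state(
    seqs: Sequence[Sequence[str]],
    labels: Sequence[str],
    states: Sequence[str] = ("nothing", "ics", "pneumonia"),
) -> Dict[str, Dict[str, float]]:
    """Average number of months per patient in each state, by cluster."""
    totals: Dict[str, Counter] = defaultdict(Counter)
    sizes = Counter(labels)                  # number of patients per cluster
    for seq, label in zip(seqs, labels):
        totals[label].update(seq)            # count months per state
    return {label: {s: totals[label][s] / sizes[label] for s in states}
            for label in sizes}
```
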

No subgroup analyses are planned. We do not expect missing data, because we include only patients with complete registration in DrKOL at baseline, and all other variables are derived from complete registers.

Patient and public involvement

Patients were not involved in the development of the research question or the design of this study.

Ethics and dissemination

This study will be conducted in accordance with the Declaration of Helsinki. The study is entirely register based, and research ethics approval is not required according to Danish Law and National Ethics Committee Guidelines.

Storage and management of data

Data are obtained from and stored at DST, and all analyses will be carried out via secure remote access to Forskermaskinerne (a virtual desktop) at DST. Data management, data control, and data analysis will be carried out by experienced statisticians from i2minds.

Dissemination of results

The results of this study will be published in international peer-reviewed journals with an emphasis on open access. The results will also be shared with relevant public stakeholders such as patient associations and medical and academic communities within pulmonary medicine.


This study has strengths and limitations. Its largest strength is the use of data from a large, nationwide, real-world population of patients with COPD, in a cohort that allows for complete follow-up. On the other hand, we can only use data that have already been entered into the registers. This affects our definitions of moderate pneumonia and moderate exacerbation. We define moderate pneumonia by redeemed prescriptions, because we do not have access to clinical diagnoses made in the primary care setting. We do, however, believe this to be a feasible strategy. The distinction between pneumonia and acute exacerbation of COPD (AECOPD) caused by a lower respiratory tract infection (LRTI) is not clear,42–44 and the two may be seen as a continuum of severity rather than distinctly different entities.45 This is complicated by the fact that pneumonia is often clinically defined as the presence of typical symptoms and a new pulmonary infiltrate on chest radiograph, but primary care practitioners often do not obtain chest radiographs.46 Furthermore, chest radiography may have limited sensitivity and positive predictive value for identifying pneumonia,47 and patients with AECOPD can have pulmonary infiltrates on chest radiographs without clinical symptoms of pneumonia.48 Even randomised controlled trials of inhaled medication for COPD use widely different definitions of pneumonia, and many have not used radiological confirmation of the diagnosis.49 In summary, distinguishing between AECOPD and pneumonia is difficult both clinically and in research. Our proposed method identifies both pneumonia and AECOPD suspected to be caused by bacterial LRTI, while it likely excludes AECOPD of non-bacterial aetiology. Importantly, it does so in a real-world clinical outpatient setting, where the distinction between AECOPD and pneumonia is often unclear.

Ethics statements

Patient consent for publication



  • Contributors AK: conceptualisation, methodology, writing—original draft, writing—review and editing, funding acquisition. RI: conceptualisation, methodology, writing—original draft, writing—review and editing. OH: conceptualisation, methodology, funding acquisition. AL: conceptualisation, methodology, writing—review and editing, funding acquisition.

  • Funding This work is supported by the Region of Southern Denmark, the University of Southern Denmark, and an unrestricted grant from Boehringer Ingelheim, grant number AGR-2018-731-5845.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.