Objective Diagnosis of community-acquired pneumonia (CAP) in the elderly is often delayed because of atypical presentation and non-specific symptoms, such as appetite loss, falls and disturbance in consciousness. The aim of this study was to investigate the external validity of existing prediction models and the added value of the non-specific symptoms for the diagnosis of CAP in elderly patients.
Design Prospective cohort study.
Setting General medicine departments of three teaching hospitals in Japan.
Participants A total of 109 elderly patients who consulted for upper respiratory symptoms between 1 October 2014 and 30 September 2016.
Main outcome measures The reference standard for CAP was chest radiograph evaluated by two certified radiologists. The existing models were externally validated for diagnostic performance by calibration plot and discrimination. To evaluate the additional value of the non-specific symptoms to the existing prediction models, we developed an extended logistic regression model. Calibration, discrimination, category-free net reclassification improvement (NRI) and decision curve analysis (DCA) were investigated in the extended model.
Results Among the existing models, the model by van Vugt demonstrated the best performance, with an area under the curve of 0.75 (95% CI 0.63 to 0.88); the calibration plot showed good fit despite a significant Hosmer-Lemeshow test (p=0.017). Among the non-specific symptoms, appetite loss had a positive likelihood ratio of 3.2 (2.0–5.3), a negative likelihood ratio of 0.4 (0.2–0.7) and an OR of 7.7 (3.0–19.7). Addition of appetite loss to the model by van Vugt led to improved calibration (Hosmer-Lemeshow p=0.48), an NRI of 0.53 (p=0.019) and higher net benefit by DCA.
Conclusions Information on appetite loss improved the performance of an existing model for the diagnosis of CAP in the elderly.
- general medicine (see internal medicine)
- infectious diseases
- primary care
- respiratory infections
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
This was the first study to investigate the external validity of existing prediction models for the diagnosis of community-acquired pneumonia in the elderly.
To evaluate the additional value of appetite loss to existing prediction models, we developed an extended logistic regression model, which was evaluated by net reclassification improvement and decision curve analysis.
To explore the external validity of our findings, a similar investigation with a larger sample size in a primary care setting should be conducted.
Introduction
In elderly patients, community-acquired pneumonia (CAP) could be more severe than that in younger patients and could result in a poor prognosis when treatment is delayed.1 Therefore, several predictors of mortality in elderly patients with CAP have been reported to identify individuals at risk for poor outcomes and to initiate early intervention.2 However, in the elderly, the diagnosis of CAP is often delayed because of their atypical presentation and underlying comorbidities.1 3 In particular, the common symptoms of cough, sputum production, fever, chills, rigors and chest pain could be absent and be replaced by non-specific deterioration of their general condition, which can manifest as symptoms such as appetite loss, falls and consciousness disturbance.1 Likewise, physical examination is less reliable in the elderly than in younger individuals.4 For example, asymptomatic elderly patients may have chest examination findings of crackles, which could be age related.5 In addition to such challenges in the diagnosis of CAP, routine use of chest X-ray (CXR) for all elderly patients with respiratory symptoms is time consuming and might not be cost-effective.
Although several prediction models based on signs and symptoms are available for the diagnosis of CAP,6–12 none had been developed or validated specifically for elderly patients. Because of the atypical presentation of CAP in the elderly, the performance of these models could be poor. We hypothesised that addition of the non-specific symptoms that are characteristic of CAP in elderly patients may improve the diagnostic performance of these existing models.
Accordingly, we investigated the external validity of these existing models and evaluated the value of adding the non-specific symptoms for the diagnosis of CAP in elderly patients.
Materials and methods
This was a prospective observational study that followed the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) statement for prediction model studies.13 Written informed consent was obtained from all the subjects.
This study was conducted between 1 October 2014 and 30 September 2016 at the general medicine departments of three teaching hospitals: Shirakawa Kosei General Hospital in Fukushima, Japan (471-bed capacity); Kimitsu Chuo Hospital in Chiba, Japan (661-bed capacity); and Ashigarakami Hospital in Kanagawa, Japan (296-bed capacity).
We included outpatients aged ≥65 years who presented with a chief complaint of cough or sputum production. Exclusion criteria were: (1) cough or sputum production lasting longer than 1 month; (2) current use of antibiotics; and (3) residence outside the community (patients who resided in nursing homes or were transferred from another hospital were excluded because the epidemiology has been reported to differ among CAP, healthcare-associated pneumonia and hospital-acquired pneumonia).14
Using a structured collection form, physicians collected data on examination of eligible patients. When patients themselves were unable to answer the questions, physicians collected data from their caregivers. The items in the form included the predictors in existing models: (1) history (cough, sputum production, sore throat, coryza, dyspnoea, fever, chills, night sweats, myalgia, pleurisy, diarrhoea, duration of symptoms, and history of bronchial asthma, chronic obstructive pulmonary disease, chronic heart failure or ischaemic heart disease); and (2) signs (blood pressure, pulse rate, oxygen saturation, respiratory rate, body temperature, diminished breath sounds, wheezes, crackles and rhonchi).6–11 We added items on appetite loss, falls and consciousness disturbance. Appetite loss was based on self-assessment of 24-hour food intake in proportion to the usual intake.
The reference standard for CAP diagnosis was CXR in the posteroanterior and lateral views or in the anteroposterior view for patients who could not stand up. Two certified radiologists (WM and HY) who were blinded to the history and physical examination data independently assessed the CXRs. Any disagreement was resolved by discussion.
Existing prediction models for CAP
Existing models were identified based on a previous external validation study on the prediction models for CAP in the primary care setting.12 Inclusion criteria for these existing models were (1) use of logistic regression for the diagnosis of CAP in adult patients; (2) predictors were items about history and physical examination; (3) the diagnosis of CAP was made by CXR or CT.
The existing models in their original form and without adjustment of intercept and coefficients were externally validated for diagnostic performance by calibration plot and discrimination. The calibration plots were tested with the Hosmer-Lemeshow (HL) test, and a p value <0.05 indicated a lack of good fit.13 Discrimination was assessed by the area under the curve (AUC).
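The two validation measures named above can be sketched in a few lines. This is a minimal illustration, not the Stata procedure used in the study: the function names `auc` and `hosmer_lemeshow_chi2` are hypothetical, and grouping into 10 bins of roughly equal size is an assumption. Converting the statistic to a p value needs a chi-square distribution function, which is omitted here.

```python
from math import fsum


def auc(probs, outcomes):
    """Area under the ROC curve, computed as the probability that a random
    case receives a higher predicted probability than a random non-case
    (ties count as 0.5)."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow_chi2(probs, outcomes, groups=10):
    """Hosmer-Lemeshow chi-square statistic. Patients are sorted by predicted
    probability and split into `groups` bins (an assumed choice); each bin
    contributes (observed - expected)^2 / (n * mean_p * (1 - mean_p))."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = fsum(p for p, _ in chunk)   # expected number of cases
        observed = sum(y for _, y in chunk)    # observed number of cases
        mean_p = expected / len(chunk)
        denom = len(chunk) * mean_p * (1 - mean_p)
        if denom > 0:
            chi2 += (observed - expected) ** 2 / denom
    return chi2
```

A larger statistic indicates a larger gap between predicted and observed frequencies, which is why a significant HL test is read as a lack of fit.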
The diagnostic performance of each item in the existing models, as well as that of the non-specific symptoms (appetite loss, falls and consciousness disturbance), was assessed in terms of sensitivity, specificity, positive likelihood ratio (LR+), negative likelihood ratio (LR−), and diagnostic odds ratio (DOR), all of which were reported as point estimates with 95% CI. DOR ranged from zero to infinity and was calculated as the ratio of LR+ to LR−, with higher values indicative of better discriminative performance; a value of 1 indicated that the test did not discriminate between people with and without the disease.15 The cut-off points for continuous variables were determined according to existing prediction models16 or the values with the best sensitivity and specificity in the receiver operating characteristic (ROC) curve. The diagnostic performance of the logistic regression model that included appetite loss as the sole predictor was evaluated by calibration plot.
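The relationships among these measures can be illustrated from a 2×2 table. The sketch below is hedged: the function name `diagnostic_metrics` and the counts in the usage note are hypothetical and are not data from this study, and the 95% CIs reported in the paper are not computed.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Point estimates from a 2x2 table of test result against reference
    standard. Confidence intervals are deliberately omitted."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    if specificity == 1 or specificity == 0 or sensitivity == 1:
        # LR+ or LR- would be undefined or zero, so the DOR cannot be formed.
        raise ValueError("likelihood ratios undefined for this table")
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": lr_pos / lr_neg,  # diagnostic odds ratio = LR+ / LR-
    }
```

With hypothetical counts of 30 true positives, 20 false positives, 10 false negatives and 40 true negatives, `diagnostic_metrics(30, 20, 10, 40)` gives a DOR equal to (30×40)/(20×10), which shows that LR+/LR− is the same quantity as the cross-product odds ratio of the table.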
To evaluate the additional value of these non-specific symptoms to the existing prediction models, we developed an extended logistic regression model without changing the coefficients of the existing models. Calibration, discrimination, category-free net reclassification improvement (NRI) and decision curve analysis (DCA) were applied to the extended model. The NRI was the sum of the net proportion of events reclassified correctly and the net proportion of non-events reclassified correctly.17 The DCA was a graphical approach to evaluate the prediction models based on the principle that the relative harm of false positives and false negatives can be expressed in terms of a probability threshold. Net benefit was obtained by subtracting the proportion of false positives from the proportion of true positives, weighted by the relative harm of false positive and false negative results.18 Net benefit was assessed at the probability threshold, which was decided by 10 physicians who worked in the participating hospitals and who were blinded to the results of our research.
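The two measures described above can be written out as follows. This is a sketch under stated assumptions rather than the analysis code of the study: the function names are hypothetical, a patient is treated as test positive when the predicted probability is at or above the threshold, and the weight applied to false positives is the usual threshold odds, pt/(1−pt).

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit at one probability threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def category_free_nri(old_probs, new_probs, outcomes):
    """Category-free NRI: net proportion of events whose predicted
    probability moved up, plus net proportion of non-events whose
    predicted probability moved down. The range is -2 to 2."""
    events = [(o, n) for o, n, y in zip(old_probs, new_probs, outcomes) if y == 1]
    nonevents = [(o, n) for o, n, y in zip(old_probs, new_probs, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    up_e = sum(n > o for o, n in events)
    down_e = sum(n < o for o, n in events)
    up_ne = sum(n > o for o, n in nonevents)
    down_ne = sum(n < o for o, n in nonevents)
    return (up_e - down_e) / len(events) + (down_ne - up_ne) / len(nonevents)
```

Because net benefit is expressed in units of true positives per patient, a difference of 0.02 between two models at the same threshold corresponds to two additional true positives per 100 patients with no increase in false positives.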
To assess the agreement in the CXR interpretations of the two radiologists, Cohen’s kappa coefficient (κ) was calculated.19 No imputation was applied for missing values because of the possibility of perfect prediction of categorical data. We did not estimate sample size a priori because there was no available consensus on the adequate sample size among the external validation studies on the prediction models or evaluation studies on the diagnostic performance of the added predictors.13 Statistical analyses were performed with a commercial software program (STATA, V.14.2 SE; StataCorp, College Station, Texas).
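For readers unfamiliar with the agreement statistic, Cohen’s kappa can be sketched as below. The function name `cohens_kappa` and the example readings in the usage note are hypothetical and are not the radiologists’ data.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who label the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    labels = set(rater_a) | set(rater_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    if expected == 1:
        raise ValueError("kappa undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)
```

For hypothetical readings `[1, 1, 0, 0]` and `[1, 0, 0, 0]`, observed agreement is 0.75 and chance agreement is 0.5, so kappa is 0.5. Kappa discounts the agreement that two readers would reach by chance alone, which is why it is lower than the raw proportion of matching reads.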
Results
A total of 109 patients were included in the study population. The baseline characteristics of the patients are listed in table 1; mean age was 76.7 years (SD 7.8) and 58.7% were men. CAP was diagnosed by CXR in 22% of the patients. There was moderate agreement (κ=0.58) between the two radiologists who read the CXRs.
External validation of existing models
Six existing prediction models for CAP diagnosis were externally validated in our cohort. The formula for each model is presented in the online supplementary table 1.
The diagnostic performance of each predictor item in the history and physical examination is presented in table 2. For each model, a calibration plot was drawn by comparing the predicted probability with the observed probability in our data set (figure 1). All existing models demonstrated significant p values in the HL test, indicating poor fit (table 3). On the other hand, the calibration plot for the model by van Vugt et al11 visually demonstrated good fit between the predicted probability and the observed prevalence of CAP. Discrimination was highest for the model by Singal et al7 (AUC=0.76; 95% CI 0.65 to 0.87), followed by the models by van Vugt et al11 (AUC=0.75; 95% CI 0.63 to 0.88); Diehr et al6 (AUC=0.75; 95% CI 0.64 to 0.86); and Heckerling et al8 (AUC=0.75; 95% CI 0.64 to 0.86) (table 3).
Diagnostic performance of the non-specific symptoms
The sensitivity, specificity, LR+ and LR− for appetite loss, falls and consciousness disturbance are listed in table 2. The cut-off point for appetite loss was determined at 50% by the ROC curve. The AUC of appetite loss for CAP diagnosis was 0.76 (95% CI 0.65 to 0.87). The calibration plot of the model for appetite loss showed better fit than the model by van Vugt et al,11 which showed the best calibration among the existing models (figure 2). A non-significant HL test suggested good fit of the model for appetite loss (table 3).
Added value of the non-specific symptoms
In the evaluation of the diagnostic performance of each non-specific symptom, falls and consciousness disturbance seemed less useful; therefore, we investigated the added value of appetite loss to the model by van Vugt et al,11 which showed the best performance among the existing models.
The formula for the extended model included the following: an intercept of −5.258, −0.135 for appetite, +0.446 for dyspnoea, +0.698 for absence of runny nose, +0.596 for diminished vesicular breathing, +1.404 for crackles, +0.961 for tachycardia and +0.980 for temperature >37.8°C. The extended model showed improved performance on the calibration plot, especially in the subgroup with high probability of CAP (figure 2), and good fit by the HL test (p=0.48; χ²=8.6). Compared with the original model, the extended model tended to have an improved AUC, but this change was not significant (0.77 (95% CI 0.65 to 0.89) vs 0.75 (95% CI 0.63 to 0.88); p=0.48). The NRI of the extended model was 0.53 (95% CI 0.08 to 0.97; p=0.019), which meant that 53% of patients correctly obtained increased probability of CAP.
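As an illustration of how the reported coefficients translate into a predicted probability, the linear predictor can be passed through the logistic function. This sketch rests on assumptions: the function and parameter names are hypothetical, the binary predictors are coded 0 or 1, and the text does not state the scale on which appetite enters the model, so the appetite value is used exactly as supplied.

```python
from math import exp

INTERCEPT = -5.258
COEFFICIENTS = {
    "appetite": -0.135,              # scale of this predictor is not stated in the text
    "dyspnoea": 0.446,
    "no_runny_nose": 0.698,
    "diminished_breathing": 0.596,
    "crackles": 1.404,
    "tachycardia": 0.961,
    "temperature_over_37_8": 0.980,
}


def extended_model_probability(**predictors):
    """Predicted probability of CAP from the extended model:
    1 / (1 + exp(-(intercept + sum of coefficient * predictor)))."""
    unknown = set(predictors) - set(COEFFICIENTS)
    if unknown:
        raise ValueError(f"unknown predictors: {sorted(unknown)}")
    linear = INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in predictors.items()
    )
    return 1 / (1 + exp(-linear))
```

Predictors that are not supplied are treated as 0. For example, `extended_model_probability(crackles=1, tachycardia=1)` returns the predicted probability for a patient with crackles and tachycardia only. The negative coefficient for appetite means that a higher appetite value lowers the predicted probability.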
Figure 3 illustrates the decision curves for the model by van Vugt et al11 and the extended model. Among the 10 physicians, the mean threshold for ordering CXR for the diagnosis of CAP in the elderly was 22.5% (SD 7.1%; range 10%–30%). At a threshold of 10%–30%, a higher net benefit was more frequently observed in the extended model than in the original model. At a threshold of 20%, the net benefit of the extended model increased by 0.02 compared with the original model.
Discussion
Our study assessed the external validity of existing models for the diagnosis of CAP in an elderly population at acute care hospitals in Japan. These existing models had been validated in a meta-analysis of individual data of patients who were at least 18 years old.12 Although the existing models were expected to have poor external validity in the elderly population, the model by van Vugt et al11 showed relatively good calibration and discrimination. In the evaluation of the diagnostic performance of the non-specific symptoms of CAP in the elderly, falls and consciousness disturbance showed high specificity but low sensitivity, leading to clinically irrelevant LR+ and LR−. On the other hand, information about appetite showed good performance for the diagnosis of CAP in the elderly. Nurse-assessed food consumption has been reported to predict bacteraemia with a sensitivity of 0.92 and negative predictive value of 0.98.20 Therefore, appetite loss could be an important predictor of infection. In the assessment of the added value of appetite loss to the model by van Vugt et al,11 calibration was improved, especially in the population with high probability; however, the improvement in discrimination was not significant. NRI was significant as more than 50% of the patients were correctly assessed to have increased probabilities. Results of the DCA showed higher net benefit of the extended model at threshold probabilities of 10%–30%. For example, applying the extended model at a threshold probability of 20% could correctly detect CAP in two additional patients per 100 patients.
The strength of our study was that this was the first study that evaluated the external validity of existing models for CAP in the elderly. Based on our results, the model by van Vugt et al11 can be applied in the elderly population despite its relatively poor calibration. To the best of our knowledge, this was the first study to investigate and identify the value of information on appetite loss, independently and in addition to the model by van Vugt et al,11 for the diagnosis of CAP in the elderly. Because routine use of CXR might not be time efficient or cost-effective, estimating the diagnostic probability using information on appetite loss could be an effective strategy in elderly patients with respiratory symptoms. The current study was conducted in hospital settings where CXR was easily available, but three of the six existing models were derived in the hospital setting.6–8 Therefore, we believe that there was a need to correctly estimate the diagnostic probability of CAP in both hospital and primary care settings.
This study had several limitations. First, our sample size was relatively small. Because this study was conducted in acute care hospitals, many patients were ineligible because they were already prescribed antibiotics in the primary care setting. Therefore, a similar investigation with a larger sample size in a primary care setting should be conducted. Second, the frequency of CAP was higher than that previously reported,12 probably because the number of patients with CAP was higher in the acute care setting than in the primary care setting. Therefore, our results should be applied carefully in the primary care setting. Healthcare in Japan is a free-access system that allows people to be examined and treated at the medical institutions (ie, clinics, secondary hospitals and university hospitals) of their choice.21 Therefore, the differences in patient characteristics between the hospital and primary care settings might be less prominent in Japan than those in other countries where patients have to see their primary care physicians first. Third, although CT has been reported to be superior in terms of diagnostic accuracy,22 CXR was used as the reference standard of CAP in this study because performing CT on all patients was neither feasible nor ethical due to difficult access, higher radiation exposure and cost.
In conclusion, the model by van Vugt et al11 demonstrated relatively high performance among the existing models for the diagnosis of CAP in the elderly. Information about appetite loss independently demonstrated diagnostic utility for CAP in the elderly and improved the performance of the model by van Vugt et al.11
Contributors TT had full access to all data in the study; he takes responsibility for the integrity of the data and accuracy of the data analysis and wrote the first draft. YY designed the study, interpreted the data and drafted the paper. KT, MO, WM and HY collected and interpreted the data. MH, JM and TA designed the study, and collected and interpreted the data. S Fukuma and S Fukuhara supervised the research and revised the work critically for important intellectual content.
Funding This work was supported by a Grant-in-Aid for Epidemiological Research of St Luke’s Life Science Institute.
Competing interests None declared.
Patient consent Obtained.
Ethics approval Kyoto University Graduate School and Faculty of Medicine, Shirakawa Kosei General Hospital, Kimitsu Chuo Hospital, and Ashigarakami Hospital.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.