Abstract
Objective To assess the association between features of acute sore throat and the growth of streptococci from culturing a throat swab.
Design Diagnostic cohort.
Setting UK general practices.
Participants Patients aged 5 or over presenting with an acute sore throat. Patients were recruited for a second cohort (cohort 2, n=517) consecutively after the first (cohort 1, n=606) from similar practices.
Main outcome Predictors of the presence of Lancefield A/C/G streptococci.
Results The clinical score developed from cohort 1 had poor discrimination in cohort 2 (bootstrapped estimate of area under the receiver operator characteristic (ROC) curve 0.65), due to the poor validity of the individual items in the second data set. Variables significant in multivariate analysis in both cohorts were rapid attendance (prior duration 3 days or less; multivariate adjusted OR 1.92 cohort 1, 1.67 cohort 2); fever in the last 24 h (1.69, 2.40); and doctor assessment of severity (severely inflamed pharynx/tonsils (2.28, 2.29)). The absence of coryza or cough and purulent tonsils were significant in univariate analysis in both cohorts and in multivariate analysis in one cohort. A five-item score based on Fever, Purulence, Attend rapidly (3 days or less), severely Inflamed tonsils and No cough or coryza (FeverPAIN) had moderate predictive value (bootstrapped area under the ROC curve 0.73 cohort 1, 0.71 cohort 2) and identified a substantial number of participants at low risk of streptococcal infection (38% in cohort 1, 36% in cohort 2 scored ≤1, associated with a streptococcal percentage of 13% and 18%, respectively). A Centor score of ≤1 identified 23% and 26% of participants with streptococcal percentages of 10% and 28%, respectively.
Conclusions Items widely used to help identify streptococcal sore throat may not be the most consistent. A modified clinical scoring system (FeverPAIN) which requires further validation may be clinically helpful in identifying individuals who are unlikely to have major pathogenic streptococci.
- Primary Care
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/
Article summary
Strengths and limitations of this study
- This was one of the largest studies until now to develop a clinical score for streptococcal infection.
- Two data sets were used to determine the most consistent variables.
- Bootstrapping techniques were used to limit overfitting, but the score requires further validation.
Background
Antibiotic resistance is a major public health problem driven largely by antibiotic prescribing in primary care1 ,2 and it is important to minimise antibiotic use in patients who will not benefit.3 However, antibiotics are still prescribed in the majority of patients with acute sore throat, the commonest upper respiratory infection to present in primary care.4
Management of acute sore throat is often based on features associated with Lancefield group A β-haemolytic streptococci (GABHS), and clinical scores to predict GABHS have shown promise5–7 including the simple ‘Centor’ criteria: 3 of 4 of pus, cervical nodes, a history of fever and no history of cough. These criteria are widely advocated in clinical practice guidance,8–12 have some validation in large routinely collected data sets13 and are probably reasonably calibrated.14 However, concern has been raised about their use in low prevalence settings such as primary care,14 and these criteria have low specificity9 leading to high rates of overall antibiotic use.9 Furthermore, small studies in typical primary care settings have suggested that other features might be useful in refining the criteria, such as shorter prior duration, severity of pain and muscle ache.7 ,15 The issue of which variables most strongly predict streptococcal infections is therefore still not settled.
We previously reported evidence that group C and G streptococci present in a similar manner to group A16 and found that some of the variables which comprise very commonly used clinical prediction rules (such as purulence) might not be significant, and other variables not commonly used might be important (such as speed of presentation and severity of inflammation). This suggests that confirmation is needed regarding which variables are important and the need to assess a wide range of potential variables in different data sets.
We compare findings from a new cohort with the original cohort16 regarding the predictors of the presence of pathogenic streptococci including groups A, C and G in throat swab cultures from patients presenting with sore throat in primary care.
Methods
This study was designed to assess not only the validation characteristics of five widely available rapid streptococcal antigen tests (not reported here), but also which clinical variables were associated with streptococcal infection. The inclusion criteria, clinical data collection and the collection and transport of swabs have been described previously16 but will be summarised.
Inclusion
The target group were patients aged 5 years and over presenting to primary care clinicians with an acute sore throat (<2 weeks), where the sore throat was the predominant clinical feature (or where the clinician felt that the pharyngitis was driving the illness presentation), and with an abnormality on examination (erythema or pus of the throat—similar to a previous study in primary care17). Exclusions were where the clinician judged there were other causes of sore throat (eg, aphthous ulceration, candida and drugs) or unable to consent (eg, dementia and uncontrolled psychosis).
Clinical data
Following informed consent, baseline clinical data were collected by the health professional. The clinical proforma collected information on age; gender; current smoking status; history of quinsy; data on symptom severity for the symptoms of sore throat, difficulty swallowing, fever, cough, coryza (‘runny nose’) headache, muscle ache, abdominal pain, diarrhoea, vomiting, earache (each symptom was rated 0 no problem; 1 slight problem; 2 moderately bad problem; 3 severe problem); and examination for oral temperature using Tempadot thermometers,18 the severity of tonsillar and pharyngeal inflammation, the presence of cervical glands, tonsillar exudate, fetor and palatal oedema. Patients then completed a daily symptom diary until symptom resolution (not reported here).
Throat swabs
A throat swab was sent to a central laboratory, where culture and sensitivity were performed for all significant pathogens in line with National Standard Operating Procedures.19 ,20 The mean time between specimen collection and receipt at the laboratory was 2.9 days (data incomplete for 13 samples). A swab was inoculated onto a blood agar plate and a staph/strep agar plate (E&O Laboratories Ltd, Bonnybridge, Scotland) and spread for single colonies. Plates were incubated anaerobically for 48 h.19 ,20 Plates were read after 24 h incubation and negative cultures reincubated for an additional 24 h. Suspected β-haemolytic streptococcal isolates were identified by a visual analysis of colony morphology and Lancefield grouping (PathoDx Strep Grouping Kit, Oxoid, UK), in accordance with the National Standard Operating Procedures.19 ,20 Antibiotic sensitivities were conducted using disc diffusion techniques.21
Sample size
In order to determine the association of clinical variables with streptococcal infection, assuming that at least one-third of individuals would have streptococci (based on our first data set), and that variables in the streptococcal group were found in 30–80% of individuals, then to detect a variable with an OR of 2 required 407 individuals with complete results.
Analysis
Primary analysis
Our original intention was to use a traditional ‘sequential’ approach to the development and validation of clinical scores: to develop them in one data set and, because of the problem of overfitting in one data set, to then validate them in another data set. However, some variables included in the score from the first data set did not perform well in the second data set (severity of sore throat, cervical glands) and some variables not included in the first score were significant in the second data set (fever and pus). This poor consistency resulted in very poor discriminatory performance of the first clinical score when used in the second data set. Since one data set was clearly insufficient to identify consistent variables, we used both data sets to identify variables and used bootstrapping to overcome the problem of overfitting. We have shown previously from the first data set that patients with group C and G β-haemolytic strains presented with similar clinical features to individuals with group A β-haemolytic strains. We first assessed the independent clinical features associated with combined group A, C and G streptococci in both data sets. Clinical variables were included in a logistic regression model to assess their association with the presence of Lancefield group A, C and G streptococci. Forward selection was used: variables were included if significant at the 10% level and retained in multivariate analysis if they remained significant at the 5% level. Missing values were not imputed. Continuous variables were dichotomised using previous cut-offs (age 10 or less; prior duration longer than the median of 3 days7). Given the much higher asymptomatic carriage rates of streptococci in children,22 we did not include age in the final multivariate models.
Clinical scoring systems or clinical prediction rules are most likely to be useful if they are simple to remember and use, which suggests that few variables should be used—preferably using a simple count of the predictive variables. We estimated the increase in area under the receiver operator characteristic (ROC) curve starting with the most predictive variables, with the aim of maximising the area under the curve (AUC) without including unnecessary variables, and generated a basic model using variables that were significant in multivariate analysis in both data sets. However, a clinical score using very few variables will potentially limit the grading of risk (since there will be fewer categories) and variable performance of one item in different cohorts will unduly affect validity. Hence, we also generated a score to include variables that were significant in univariate analysis in both data sets and multivariate analysis in at least one of the data sets.
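As an illustration of what a simple count of predictive variables means in practice, the following minimal sketch counts the five items of the FeverPAIN score named in the abstract. The class and function names are hypothetical and chosen only for this example; the item definitions follow the text (rapid attendance is a prior duration of 3 days or less, and the last item is true only when both cough and coryza are absent).

```python
from dataclasses import dataclass


@dataclass
class SoreThroatFindings:
    """Hypothetical container for the five FeverPAIN items (illustrative names)."""
    fever_last_24h: bool             # Fever in the last 24 h
    purulent_tonsils: bool           # Purulence
    attended_within_3_days: bool     # Attend rapidly (prior duration 3 days or less)
    severely_inflamed_tonsils: bool  # severely Inflamed tonsils
    no_cough_or_coryza: bool         # True only when both cough and coryza are absent


def feverpain_score(findings: SoreThroatFindings) -> int:
    """Return the simple count (0 to 5) of the items that are present."""
    return sum([
        findings.fever_last_24h,
        findings.purulent_tonsils,
        findings.attended_within_3_days,
        findings.severely_inflamed_tonsils,
        findings.no_cough_or_coryza,
    ])


def is_low_score(score: int) -> bool:
    """A score of 0 or 1 is the low-score group (<=1) reported in the Results."""
    return score <= 1
```

For example, a patient with fever and rapid attendance but none of the other items would score 2 and would not fall in the low-score group.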
Because any new model developed from a single data set may be overfitted, bootstrapped estimates are provided for the area under the ROC curve for internal validation for the new model (see table 2).23 For the Centor criteria (an established model), non-bootstrapped estimates are provided.
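To make the idea of a bootstrapped area under the ROC curve concrete, the sketch below resamples patients with replacement and recomputes the AUC of a fixed score each time. It is a plain percentile bootstrap written with the standard library only, and is an assumption for illustration: the exact bootstrap procedure used in the study (for example, whether the model was refitted in each resample) is not described here. Outcomes are assumed to be coded 1 (streptococci present) or 0 (absent).

```python
import random


def auc(scores, outcomes):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Over all positive/negative pairs, counts how often the positive case
    has the higher score; ties count as half.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc(scores, outcomes, n_boot=200, seed=0):
    """Return (mean, 2.5th percentile, 97.5th percentile) of resampled AUCs."""
    n = len(scores)
    if not 0 < sum(outcomes) < n:
        raise ValueError("outcomes must contain both classes")
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [outcomes[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing a single outcome class
            estimates.append(auc([scores[i] for i in idx], ys))
    estimates.sort()
    mean = sum(estimates) / n_boot
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[min(n_boot - 1, int(0.975 * n_boot))]
    return mean, lower, upper
```

The pairwise AUC is quadratic in the number of patients, which is acceptable for cohorts of a few hundred; a rank-based implementation would be preferable for larger data sets.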
Calibration
We assessed calibration of the scores by assessing the differences between the observed and expected percentages of streptococci using the χ2 test.
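A comparison of observed and expected percentages can be sketched as a χ2 statistic summed over score groups. The function below is a Hosmer-Lemeshow-style illustration under stated assumptions: the grouping, the input format and the function name are hypothetical, the predicted probabilities must lie strictly between 0 and 1, and no degrees of freedom or p value are computed. The study's exact test specification is not given beyond the χ2 test.

```python
def calibration_chi2(groups):
    """Chi-squared statistic comparing observed and expected counts.

    groups: iterable of (n_patients, observed_positive, predicted_probability),
    one tuple per score group. Both the positive and the negative cell of
    each group contribute (O - E)^2 / E to the statistic.
    """
    stat = 0.0
    for n, observed_pos, p in groups:
        if not 0.0 < p < 1.0:
            raise ValueError("predicted probability must be strictly between 0 and 1")
        expected_pos = n * p
        expected_neg = n * (1.0 - p)
        observed_neg = n - observed_pos
        stat += (observed_pos - expected_pos) ** 2 / expected_pos
        stat += (observed_neg - expected_neg) ** 2 / expected_neg
    return stat
```

A statistic near zero indicates that observed and predicted percentages agree closely in every group; larger values indicate poorer calibration.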
Secondary analyses
We also present the results of two alternative analyses: (A) the sequential approach of generating a prediction rule in one data set and validating it in the second and (B) the combined approach of using a combined data set for greater power.
We also explored other approaches to variable selection such as using the criterion of being significant in univariate analysis in both data sets (which resulted in the same variable selection), exploring the impact of variable omission and substitution, and assessing the discrimination comparing the model having the exact logistic coefficients for each variable with the simple clinical score (ie, a score which comprised a simple count of the variables).
Results
The recruitment rate was estimated during the recruitment of the first cohort: the median recruitment (the number of patients/months recruiting) was 4.7 patients/month, close to the expected rate from national data.24 Patients from higher recruiting doctors (higher than the median; average 11.8 patients/month) compared with lower recruiting doctors (an average 2.6 patients/month) had a very similar number of features that predict streptococcal infections (respectively a mean of 3.3 features and 3.4 features using the streptococcal score developed from the first data set), suggesting little or no recruitment bias based on clinical characteristics in these practices. Both cohorts used very similar practices: for the first data set, we recruited patients from 15 practices, and for the second data set, 12 of these 15 practices participated.
Patients were recruited from January 2007 until October 2008 (96% of patients were recruited after January 2008 when the first data set was completed). All 517 patients recruited in the second data period had some usable data, and complete data were available in 460 patients. In the second data set, pathogenic streptococci were found in 207 patients (40%), mainly A (143), C (30) and G (20) with some B (9), D (2) and F (3). These are very similar figures to the first data set.16
Primary analyses
The independent variables associated with Lancefield group A, C or G streptococci in the second data set are shown in table 1, with the univariate and multivariate ORs also reported from the first data set for ease of comparison. The clinical features predicting the significant presence of group A, C or G β-haemolytic streptococci in multivariate analysis in both data sets were rapid attendance (a short prior illness duration of 3 days or less; multivariate adjusted OR in the first data set 1.92; 1.67 in the second data set), fever in the last 24 h (ORs 1.69 and 2.40, respectively) and doctor assessment of severity of inflammation (severely inflamed tonsils: 2.28; 2.29). Additional variables significant in univariate analysis in both data sets as well as in multivariate analysis in at least one of the data sets were items suggesting a purely pharyngeal illness (the absence of coryza and the absence of cough), purulent tonsils and muscle aches. ‘Absence of coryza’ performed only marginally better than ‘absence of cough’ in the two data sets, so based on the similarity of these items and their performance, the helpful concept for clinicians of a purely oropharyngeal illness (ie, when both cough and coryza are absent) and the prior extensive use of ‘absence of cough’ in the Centor criteria, the consensus among the study team was to use the combined variable ‘absence of cough or coryza’ which also performed marginally better than either alone.
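For readers less familiar with ORs, the sketch below shows how a univariate (unadjusted) OR and an approximate 95% CI can be computed from a 2×2 table using the standard log-OR (Woolf) method. The counts in the usage note are hypothetical and are not taken from the study; the multivariate adjusted ORs quoted above come from logistic regression and cannot be reproduced from a single 2×2 table.

```python
import math


def odds_ratio(a, b, c, d):
    """Unadjusted odds ratio with an approximate 95% CI (Woolf method).

    a: feature present, streptococci present
    b: feature present, streptococci absent
    c: feature absent, streptococci present
    d: feature absent, streptococci absent
    All four cells must be non-zero.
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper
```

With hypothetical counts of 40, 60, 25 and 75, the odds of streptococci are 40/60 when the feature is present and 25/75 when it is absent, giving an OR of 2.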
Table 2 shows the incremental performance in terms of area under the ROC curve as successive variables are added to the models in both data sets. There is modest improvement in AUC after the first three variables are added, and no improvement when the sixth variable (muscle aches) is added. However, if a basic score (model 3) is used, the grading of risk at lower scores is crude as few patients can be categorised as lower risk: only 19% of the first data set and 22% of the second data set score 0 and 15% and 22%, respectively, of these groups have streptococci (see online supplementary appendix 1 for the full table).
A Centor score of ≤1 was found in 23% of the first cohort and 26% of the second cohort, and streptococci were isolated in 10% and 28% of these groups, respectively (see table 3). By comparison, the extended five-point score of Fever, Purulence, Attend rapidly (3 days or less), severely Inflamed tonsils and No cough or coryza (FeverPAIN; model 5 from table 2) provides a finer grading of risk, and significantly more patients in both cohorts can be categorised as being at low risk of streptococcal infection (<20% chance of streptococci, see table 3): using the modified FeverPAIN score, more than 30% of patients scored ≤1 (first data set 38%; second data set 36%) and fewer of these patients (13% and 18%, respectively) had streptococci, as shown graphically in figure 1.
Calibration
FeverPAIN calibrated well in both data sets, with no significant differences between the percentages of observed and predicted presence of streptococci. The calibration of the Centor criteria was good in the first data set but poor in the second data set with significant differences between the percentages of observed and predicted presence of streptococci at low scores (see online supplementary appendix 2).
The ‘sequential’ approach of developing the first score and then testing it in the second data set demonstrated poor performance of the first score in the second data set. The approach of using a combined data set to provide more power generated an eight-item unwieldy score and obscured the major differences in performance between the data sets (see online supplementary appendix 3 for details).
Discussion
This study provides evidence to confirm that streptococcal sore throats are currently common in primary care and that Lancefield groups C and G make up a quarter of streptococcal sore throats. The study also confirms that the best predictors of streptococcal infection may not include some of the features traditionally used, and that traditional scoring systems may have limited clinical utility in identifying individuals who have a low likelihood of streptococcal infection, that is, those who do not need antibiotics.
Strengths and limitations of the study
These data sets are some of the largest from a typical primary care setting to have assessed the importance of the range of streptococci, and to have explored the range of potential clinical predictors of streptococcal infection. There were few missing data (less than 5% for any analysis), and little evidence of recruitment bias either in recruitment rates or clinical characteristics. The conventional approach of developing and validating a diagnostic model is to develop it in one data set and test it in another. However, the variability of performance of variables in these data sets and the poor discriminatory performance of the first score when used in the second data set suggest that such an approach is unlikely to provide the most reliable method of variable selection for a clinical prediction rule, supported by similar findings in the development of clinical prediction rules for other acute infections.25 This suggests that the choice of variables to include in clinical prediction models preferably should be based on multiple cohorts at different times and/or different settings. The alternative approach of combining data sets to increase power generated an eight-variable score with improved discrimination but is unwieldy for clinical purposes. The combined data set also hid the considerable variability between data sets in performance of both individual variables and also of the first score. 
Further support for the poor clinical utility of the first score comes from the trial,26 which demonstrates that using the first score does not significantly improve outcomes, similar to a previous trial of the Centor criteria which also demonstrated no impact on antibiotic use.27 Over and above the most basic model (short prior duration, severe inflammation, fever), the choice of additional variables to include (pus and ‘absence of cough and coryza’) was determined by consensus, including a consideration of the strength of prior evidence, but omission of key variables or substitution did not have major effects on the discrimination. Although we have provided bootstrapped estimates of the area under the ROC curve to limit overfitting, nevertheless the proposed model should have further validation.15
Main findings in the context of previous literature
Group A β-haemolytic streptococci have dominated previous literature due to their association with major non-suppurative adverse outcomes—particularly rheumatic fever and glomerulonephritis.5 Hence, the clinical predictors of group A infection5 ,6 ,7—especially pus, cervical nodes, a history of fever and no history of cough—have been widely used in clinical guidelines.8 ,10 ,22 Trials using these as inclusion criteria may have larger effect sizes for antibiotics than trials using less selected patients—although the validity of historical comparisons is questionable.28 We were unable to confirm the importance of cervical glands as a predictor of streptococcal infection in the second data set, and in the first data set we were unable to confirm the importance of purulence.7 ,9 From these two data sets, the features that may be most important are the speed of presentation (ie, symptoms developing rapidly resulting in a short prior duration of illness), the severity of tonsillar inflammation and fever. These variables have been identified in studies from typical primary care settings,7 ,15 but previous studies have been limited by the lack of multivariate analysis or limited power.
Clinical utility
Scoring systems are most helpful clinically for reducing antibiotic use if they identify as large a group as possible of individuals unlikely to have Streptococcus. From these data sets, the Centor criteria are likely to identify relatively few such individuals who do not have streptococci: only 23% in the first data set and 26% in the second data set had a score ≤1, and in the second data set the percentage of streptococci was high (28%). A low count (≤1) using a modified score (FeverPAIN) identified more than 35% of patients in both data sets who are unlikely to have streptococci (between 13% and 18%).
Conclusion
Items traditionally used to help identify presentations of streptococcal sore throat in primary care may not be reliable. Conventional clinical scoring systems may not be very helpful clinically in identifying individuals who are unlikely to have major pathogenic streptococci. A modified clinical rule developed for targeting Lancefield group A, C and G streptococci requires further validation, but should enable clinicians to target those at high risk of streptococcal infections and identify more than one third of those presenting with sore throat as being at low (<20%) risk of streptococcal infection.
Acknowledgments
The authors are grateful to all the patients and Health Care Professionals who have contributed their time, effort and helpful insights to make the PRImary care Streptococcal Management (PRISM) study possible.
References
Supplementary materials
Supplementary Data
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Files in this Data Supplement:
- Data supplement 1 - Online appendix
Footnotes
- Collaborators PRISM Investigators. University of Southampton: PL, IW, MM, MM, EC, James Raftery, David Turner, Rafael Pinedo-Villanueva, JK, JB, Karen Middleton, Gerry Leydon; University of Oxford: Richard Hobbs, Richard McManus, DM, Paul Glasziou, Sue Smith, Diane Coulson; Health Protection Agency: CMN, Peter Hawtin.
- Contributors Razia Meer-Baloch (senior trial manager) was involved in day-to-day coordination of the Birmingham study centre, and commented on drafts of the manuscripts. EC developed the protocol, and contributed to the quantitative analysis and drafting of the manuscript. Paul Glasziou developed the protocol for funding, contributed to the management of the clinical studies and commented on the manuscript. FDRH developed the protocol for funding, contributed to the management of all studies, supervised the Birmingham study centre and contributed to the drafting of the manuscript. JK and JB developed the protocol, provided day-to-day overall management of the study, coordinated recruitment in the lead study centre and coordination of other centres and commented on drafts of the manuscript. Gerry Leydon developed the protocol for funding, contributed to management and commented on drafts of the manuscript. PL had the original idea for the protocol, led the protocol development and the funding application, supervised the running of the lead study centre and coordination of centres, contributed to the analysis and led the drafting of the manuscript. DM developed the protocol for funding, supervised the running of clinical studies in the Oxford centre and contributed to the analysis and the drafting of the manuscript. Richard McManus developed the protocol for funding, contributed to the management of all studies, supervised the Birmingham Network and contributed to the drafting of the manuscript. Lisa McDermott developed the protocol and commented on drafts of the manuscript. CM and Gemma Lasseter developed the protocol and contributed to the management and write-up of the study. Peter Hawtin developed the protocol for funding, contributed to the design and running of the in vitro and diagnostic phases of the study. Karen Middleton provided administrative support, developed data management protocols, coordinated data entry and commented on drafts of the manuscript. MM developed the protocol for funding and contributed not only to the management of the study but also to the analysis and drafting of the manuscript. Mark Mullee developed the protocol for funding, contributed to the management of the study, supervised data management, led the quantitative analysis and contributed to the drafting of the manuscript. IW developed the protocol for funding and also contributed to the management of the study and the drafting of the manuscript. Sue Smith, Mary Selwood and Diane Coulson provided day-to-day coordination of the Oxford study centre and commented on drafts of the manuscript.
- Funding This project was funded by the National Institute for Health Research Health Technology Assessment (HTA) Programme (project number 05/10/01) and will be published in full in the Health Technology Assessment journal. Further information is available at www.nets.nihr.ac.uk/projects/hta/051001. This report presents independent research commissioned by the National Institute for Health Research (NIHR). The views and opinions expressed by authors in this publication are those of the authors and do not necessarily reflect those of the NHS, the NIHR, NETSCC, the HTA programme or the Department of Health.
- Competing interests None.
- Ethics approval This study was approved by a Multicentre Research Ethics Committee (number 06/MRE06/17).
- Patient consent Obtained.
- Provenance and peer review Not commissioned; internally peer reviewed.
- Data sharing statement The authors are happy to share data and collaborate with other investigators as appropriate (eg, in larger merged individual patient data studies).