Test accuracy of faecal calprotectin for inflammatory bowel disease in UK primary care: a retrospective cohort study of the THIN data

Objective To estimate the test accuracy of faecal calprotectin (FC) for inflammatory bowel disease (IBD) in the primary care setting using routine electronic health records.
Design Retrospective cohort test accuracy study.
Setting UK primary care.
Participants 5970 patients (≥18 years) without a previous IBD diagnosis and with a first FC test between 1 January 2006 and 31 December 2016. We excluded multiple tests and tests without numeric results in units of µg/g.
Intervention FC testing for the diagnosis of IBD. Disease status was confirmed by a recorded diagnostic code and/or a drug code of an IBD-specific medication at three time points after the FC test date.
Main outcome measures Sensitivity, specificity, and positive and negative predictive values for the differential of IBD versus non-IBD and IBD versus irritable bowel syndrome (IBS) at the 50 and 100 µg/g thresholds.
Results 5970 patients met the inclusion criteria and had at least 6 months of follow-up data after FC testing. 1897 had an IBS diagnosis, 208 had an IBD diagnosis, 31 had a colorectal cancer diagnosis, 80 had more than one diagnosis and 3754 had no subsequent diagnosis. Sensitivity, specificity, and positive and negative predictive values were 92.9% (88.6% to 95.6%), 61.5% (60.2% to 62.7%), 8.1% (7.1% to 9.2%) and 99.6% (99.3% to 99.7%), respectively, at the threshold of 50 µg/g. Raising the threshold to 100 µg/g missed less than 7% additional IBD cases. Longer follow-up had no effect on test accuracy. Overall, uncertainty was greater for specificity than sensitivity. General practitioners’ (GPs’) referral decisions did not follow the anticipated clinical pathways in national guidance.
Conclusions GPs can be confident in excluding IBD on the basis of a negative FC test in a population with low pretest risk but should interpret a positive test with caution. The applicability of national guidance to general practice needs to be improved.

• Authors should include in the statistical analysis the ROC curve analysis and the likelihood ratios. On account the ratio post-pre PPV 8.1%/4.5% for IBD detection, the + LR should be below 2, thus with an extremely low discriminative capacity.
• Analysis restricted to differentiate IBD from IBS. This analysis has little value, as long as GPs will have to evaluate patients with abdominal symptoms and the IBS will be based on clinical persistent symptoms with no organic origin. In fact, it would be relevant if authors produce an analysis of the diagnostic accuracy of the calprotectin in patients with persistent symptoms. In this sense, it would be extremely relevant to know the PPV in relation with the symptoms persistence that produced the calprotectin determination.
• In the impact of test results on referral decisions, the authors are including an interpretation of the results. They should include any interpretation in the discussion section.
• Figure 2: It should be clearly improved, and the AUC added

GENERAL COMMENTS
General remark: The study tries to estimate the test accuracy of faecal calprotectin (FC) for inflammatory bowel disease (IBD) in the primary care setting. There are some consistent evaluated data. However, Likelihood ratios and their interpretation are not described. These have several properties that make them more useful clinically than other measures of diagnostic test performance. Sensitivity and specificity are calculated for the test when the disease statuses of the patients are known. Positive and negative predictive values are affected by changes in the prevalence of disease. Therefore, the interpretation of a test is highly dependent on the context in which it is used. Likelihood ratios do not have the drawbacks of mentioned measures of test performance. It would be essential, to calculate and interpret positive and negative Likelihood ratios.
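The prevalence dependence described in this remark can be illustrated with a short calculation. The sketch below is not part of the manuscript or the review: it assumes the sensitivity (92.9%) and specificity (61.5%) reported in the abstract for the 50 µg/g threshold stay fixed, and recomputes the predictive values at the two prevalences mentioned in this review history (3.5% and 4.5%) plus a purely hypothetical 20% prevalence chosen to represent a higher-risk population. The function name `predictive_values` is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Point estimates from the abstract (50 µg/g threshold); held fixed here as an assumption.
SENS, SPEC = 0.929, 0.615

results = {}
# 3.5% and 4.5% appear in this document; 20% is a hypothetical higher-risk setting.
for prevalence in (0.035, 0.045, 0.20):
    ppv, npv = predictive_values(SENS, SPEC, prevalence)
    results[prevalence] = (ppv, npv)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under these assumptions the PPV rises from about 8% to about 10% and then to about 38% as prevalence increases, while sensitivity and specificity are unchanged, which is the context dependence the remark refers to.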
Specific remarks:
• Abstract: IBS should be explained for the first time.
• Introduction: Given the aim of the study, it is important to indicate the prevalence of inflammatory bowel disease (IBD) in UK and in the world.

Reviewer 1
My main concern is derived from the IBD diagnosis criteria. What is the accuracy of the gold standard used for IBD diagnosis in comparison with the endoscopic diagnosis?
We completely agree that the reference standard is a limitation of test accuracy studies using routine data as we do not know the accuracy of lists of drugs and codes. We have discussed this limitation. However, using a coded IBD diagnosis as confirmation of disease is based on the knowledge that IBD diagnoses are not generally coded without confirmatory testing. In clinical practice the diagnosis of IBD will, in general, be made after a colonoscopy. We recognise this bias and concluded that our pragmatic test accuracy study reflects how FC testing works in real life rather than proving the accuracy of FC testing in categorising IBD and IBS patients, which has been done before.
We added the following paragraph to strengthen the discussion around verification using routine data:
In fact, the authors do not include which are the IBD specific prescriptions. They have to take into account that certain prescriptions are not specific of IBD: azathioprine, infliximab.
We only included IBD specific prescriptions in the definition of IBD. Azathioprine and infliximab were not included. We have included the drug code list in the appendix for clarification.
I am worried also of the criteria used for the IBS diagnosis.
IBS coding might be less reliable than IBD coding since there is no diagnostic marker at present to confirm IBS, which is a diagnosis by exclusion (i.e. by ruling out other conditions). Therefore, there may be more IBS codes missing than IBD codes. However, the nondiseased group in our study is non-IBD rather than IBS (which is not applicable to primary care), so missing codes are still correctly classified as non-IBD. We apologise that the reporting was not sufficiently clear. The study first reports diagnoses recorded at any time after FC testing to characterise the study population. The test accuracy analyses only considered IBD diagnoses within 6 months, 12 months and 24 months, respectively, as reported in the methods.
We have changed the following sentence to clarify what we report in the characterisation of the FC tested population:
1,897 (32%) had an IBS diagnosis, 208 (3.5%) an IBD diagnosis, 31 (0.5%) a colorectal cancer (CRC) diagnosis, 78 (1%) had an IBD and an IBS diagnosis, 2 had an IBD and a CRC diagnosis and 3,754 (63%) had no diagnosis recorded at any time after the FC test (see online Supplementary Table 2).
In fact, I would suggest to delete the analysis at the 24 month period, as long as only 44% of the initial patients are evaluated and the follow-up is too long.
The three different follow-up times were included to explore the effect of late diagnosis on test accuracy. We have shown that late diagnosis has little impact on test accuracy, which supports our study findings. We have included the 24-month follow-up based on 1) the study by Varicka et al. 2012, who report an IQR of 3-24 months from symptom onset to CD diagnosis and 2) patient accounts from our PPI group who reported long delays in diagnosis ("a hellish 10 or 18 months"). We prefer to include the analysis for completeness.
Authors should include in the statistical analysis the ROC curve analysis and the likelihood ratios. On account the ratio post-pre PPV 8.1%/4.5% for IBD detection, the + LR should be below 2, thus with an extremely low discriminative capacity.
We have added the LRs to the analysis for 6 months (PPV 8.1% and prevalence 3.5%) and report the following paragraph. The LR+ 2.41 and PPV 8.1% refer to results at 6 months. The prevalence at 6 months was 3.5% (see table 1 in the manuscript). The prevalence of 4.5% is for the analysis at 12 months.
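The relationship between the figures quoted in this response can be checked with a short calculation. The sketch below is an illustration only, not the authors' analysis code: it takes the point estimates of sensitivity (92.9%) and specificity (61.5%) from the abstract and the 6-month prevalence of 3.5% stated above, derives both likelihood ratios, and converts pretest to post-test probability through odds. The function names are hypothetical, and small differences from the reported values are expected because the inputs are rounded.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test result."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Rounded point estimates taken from the abstract and the response above.
SENSITIVITY = 0.929   # at the 50 µg/g threshold, 6-month follow-up
SPECIFICITY = 0.615
PREVALENCE = 0.035    # pretest probability of IBD at 6 months

lr_pos, lr_neg = likelihood_ratios(SENSITIVITY, SPECIFICITY)
prob_ibd_if_positive = post_test_probability(PREVALENCE, lr_pos)  # equals the PPV
prob_ibd_if_negative = post_test_probability(PREVALENCE, lr_neg)  # equals 1 - NPV

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"P(IBD | positive) = {prob_ibd_if_positive:.1%}")
print(f"P(IBD | negative) = {prob_ibd_if_negative:.1%}")
```

With these inputs the LR+ comes out at about 2.41, and the post-test probability after a positive result is about 8%, in line with the reported PPV. Pairing the reviewer's 4.5% (the 12-month prevalence) with the 6-month PPV would understate the LR+, which is the point the response makes.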
We prefer to keep to our initial decision not to report the AUCs as they do not add anything to the interpretation of the test accuracy of FC testing in primary care. Only the test accuracy of FC testing at specific clinically relevant thresholds is useful for clinicians and their decision making.
Analysis restricted to differentiate IBD from IBS. This analysis has little value, as long as GPs will have to evaluate patients with abdominal symptoms and the IBS will be based on clinical persistent symptoms with no organic origin.
We completely agree. It was important for us to report the impact of an analysis of IBD versus IBS on the test accuracy because UK national guidance was based on estimates from such analysis and they recommend FC testing for the differential of IBD versus IBS.
In fact, it would be relevant if authors produce an analysis of the diagnostic accuracy of the calprotectin in patients with persistent symptoms. In this sense, it would be extremely relevant to know the PPV in relation with the symptoms persistence that produced the calprotectin determination.
We agree that a test accuracy study of eligible patients based on persistent abdominal symptoms would be useful. We included a sensitivity analysis of patients with eligible symptoms (see Table 3). The different size in study populations (569 vs 5970 patients) revealed little overlap between tested and eligible patients. Furthermore, identifying patients with persistent symptoms is not feasible as this cannot be easily operationalised using routine electronic health care records of coded clinical data.
In the impact of test results on referral decisions, the authors are including an interpretation of the results. They should include any interpretation in the discussion section.
Thank you, we have moved the relevant paragraph below into the discussion.
Figure 2: It should be clearly improved, and the AUC added
Please see our response about AUCs above.

Reviewer 2
Sensitivity and specificity are calculated for the test when the disease statuses of the patients are known. Positive and negative predictive values are affected by changes in the prevalence of disease. Therefore, the interpretation of a test is highly dependent on the context in which it is used. Likelihood ratios do not have the drawbacks of mentioned measures of test performance. It would be essential, to calculate and interpret positive and negative Likelihood ratios.
We have added the following paragraph reporting the LRs to the results.
Abstract: IBS should be explained for the first time.
Thank you, we have added the explanation at the first mention of IBS.
Introduction: Given the aim of the study, it is important to indicate the prevalence of inflammatory bowel disease (IBD) in UK and in the world.
We have added the following paragraph on IBD prevalence in the UK and worldwide to the introduction.