
Common evidence gaps in point-of-care diagnostic test evaluation: a review of horizon scan reports
  1. Jan Y Verbakel1
  2. Philip J Turner1
  3. Matthew J Thompson2
  4. Annette Plüddemann1
  5. Christopher P Price1
  6. Bethany Shinkins3
  7. Ann Van den Bruel1,4

  Affiliations:
  1. Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
  2. Primary Care Innovation Lab, Department of Family Medicine, University of Washington, Seattle, Washington, USA
  3. Test Evaluation Group, AUHE, Leeds Institute of Health Sciences, University of Leeds, Leeds, UK
  4. Julius Center for Health Sciences and Primary Care, University of Utrecht, Utrecht, The Netherlands

  Correspondence to: Dr Jan Y Verbakel; jan.verbakel{at}phc.ox.ac.uk

Abstract

Objective Since 2008, the Oxford Diagnostic Horizon Scan Programme has been identifying and summarising evidence on new and emerging diagnostic technologies relevant to primary care. We used these reports to determine the sequence and timing of evidence for new point-of-care diagnostic tests and to identify common evidence gaps in this process.

Design Systematic overview of diagnostic horizon scan reports.

Primary outcome measures We obtained the primary studies referenced in each horizon scan report (n=40) and extracted details of the study size, clinical setting and design characteristics. In particular, we assessed whether each study evaluated test accuracy, test impact or cost-effectiveness. The evidence for each point-of-care test was mapped against the Horvath framework for diagnostic test evaluation.

Results We extracted data from 500 primary studies. Most diagnostic technologies underwent clinical performance (ie, ability to detect a clinical condition) assessment (71.2%), with very few progressing to comparative clinical effectiveness (10.0%) and a cost-effectiveness evaluation (8.6%), even in the more established and frequently reported clinical domains, such as cardiovascular disease. The median time to complete an evaluation cycle was 9 years (IQR 5.5–12.5 years). The sequence of evidence generation was typically haphazard and some diagnostic tests appear to be implemented in routine care without completing essential evaluation stages such as clinical effectiveness.

Conclusions Evidence generation for new point-of-care diagnostic tests is slow, tends to focus on accuracy and overlooks other test attributes such as impact, implementation and cost-effectiveness. Evaluating tests within this dynamic cycle, and feeding data from clinical effectiveness back to refine analytical and clinical performance, are key to improving the efficiency of point-of-care diagnostic test development and its impact on clinically relevant outcomes. While the ‘road map’ of steps needed to generate evidence is reasonably well delineated, we provide evidence on the complexity, length and variability of the actual process that many diagnostic technologies undergo.

  • Point-of-care systems
  • Diagnosis
  • Primary care
  • Framework
  • Evidence-based medicine
  • Horizon scanning reports

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/


Strengths and limitations of this study

  • This study provides the first data on evidence gaps in point-of-care diagnostic test evaluation in primary care, addressing an important clinical need.

  • We extracted data from multiple consistently conducted horizon scan reports.

  • Our approach might overlook relevant research, but the systematic evidence gaps identified across reports suggest that our findings are robust.

  • Our analyses are limited to evidence available up to each horizon scan report's publication date.

Introduction

Primary care is becoming increasingly complex due to a rise in patients with multimorbidity and polypharmacy, the pressure of short consultation times and the fragmented nature of primary and secondary care. Delayed or missed diagnoses are the most common reason for malpractice claims.1 There is therefore a huge demand for innovations that enable efficient and accurate diagnostic assessment within a general practitioner (GP) consultation, and the development of point-of-care diagnostic tests is consequently a hotbed of activity.2 These tests have the potential to improve the efficiency of diagnostic pathways considerably, providing results within the time frame of a single consultation and thereby influencing immediate patient management decisions.

A potential barrier to this innovative activity, however, is the slow and haphazard pathway to adoption for new healthcare technologies.3 This is particularly the case for diagnostic tests, whose uptake varies widely between settings and whose speed of adoption is notably inconsistent.4 One possible cause of this inefficiency is the slow generation of evidence of efficacy relevant to the target clinical settings.

To provide an efficient means of identifying, summarising and disseminating the evidence for emerging diagnostic technologies relevant to primary care settings, the Oxford Diagnostic Horizon Scan Programme was established in 2008 (currently funded by the National Institute for Health Research (NIHR) Oxford Diagnostic Evidence Co-operative).5 New technologies are identified through systematic literature searches and interactions with clinicians and the diagnostics industry, and are then prioritised using a defined list of criteria.6 Evidence is gathered through systematic searches of the published literature, supplemented by information from manufacturer or trade websites and web search engines. This evidence is used to summarise the analytical and diagnostic accuracy of the point-of-care test,7 its impact on patient outcomes and health processes, its cost-effectiveness and current guidelines for its use within routine care in the UK. The reports, indexed in the TRIP database8 and freely available from the Horizon Scan Programme’s website (www.oxford.dec.nihr.ac.uk), are actively disseminated to the NIHR Health Technology Assessment Programme, the National Institute for Health and Care Excellence, clinical researchers and commissioners of healthcare services, and highlight any further research requirements to facilitate evidence-based adoption decisions.

To date, 40 horizon scan reports have been completed, all following an identical protocol.

These horizon scan reports provide a unique opportunity to describe the evidence trajectory of new point-of-care diagnostic tests relevant to primary care settings and identify common evidence gaps.

Methods

This is a descriptive study of all 40 horizon scan reports published to date by the Oxford Diagnostic Horizon Scan Programme. For each horizon scan report, we extracted the year of publication and the disease area (classified by clinical domain of the International Classification of Primary Care, Revised Second Edition (ICPC-2-R)17) (see online supplementary file 1). We subsequently reviewed all studies included in the horizon scan reports (including systematic reviews) and extracted data on year of publication, study size, point-of-care test device(s) and the test’s intended role. The intended roles were defined as ‘triage’, in which the new test is used at the start of the clinical pathway; ‘replacement’, in which the new test replaces an existing test, either as a faster equivalent or as a substitute for a non-point-of-care laboratory test; or ‘add-on’, in which the new test is performed at the end of a clinical pathway.9 Depending on the role, different types of evidence are required before a new point-of-care test can be adopted in routine care.10
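Purely as an illustration of the kind of record this extraction yields (the programme's actual extraction sheet is not reproduced here, and all class and field names below are hypothetical), the extracted fields could be represented as follows:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IntendedRole(Enum):
    """Intended role of the point-of-care test in the clinical pathway."""
    TRIAGE = "triage"            # used at the start of the clinical pathway
    REPLACEMENT = "replacement"  # replaces an existing (often laboratory) test
    ADD_ON = "add-on"            # performed at the end of the clinical pathway


@dataclass
class PrimaryStudy:
    """One primary study referenced by a horizon scan report (hypothetical fields)."""
    report_id: str               # horizon scan report citing the study
    publication_year: int
    sample_size: Optional[int]
    device: str                  # point-of-care test device evaluated
    intended_role: IntendedRole
    primary_care_setting: bool   # GP surgery, outpatient, walk-in or emergency dept
```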

Supplementary Material

Supplementary material 1

We extracted data on study design and primary outcomes and used the dynamic evidence framework developed by Horvath et al,11 as shown in figure 1, to classify the type of evidence, defined as (1) analytical performance, (2) clinical performance, (3) clinical effectiveness, (4) comparative clinical effectiveness, (5) cost-effectiveness and (6) broader impact.

Figure 1

Horvath et al’s11 cyclical framework for the evaluation of diagnostic tests. This framework illustrates the key components of the test evaluation process. (1) Analytical performance is the aptitude of a diagnostic test to conform to predefined quality specifications. (2) Clinical performance examines the ability of the biomarker to conform to predefined clinical specifications in detecting patients with a certain clinical condition or in a physiological state. (3) Clinical effectiveness focuses on the test’s ability to improve health outcomes that are relevant to an individual patient, also allowing comparison (4) of effectiveness between tests. (5) A cost-effectiveness analysis compares the changes in costs and health effects of introducing a test to assess the extent to which the test can be regarded as providing value for money. (6) Broader impact encompasses the consequences (eg, acceptability, social, psychological, legal, ethical, societal and organisational consequences) of testing beyond the above-mentioned components.

Analytical performance is the aptitude of a diagnostic test to conform to predefined quality specifications.12 13 Clinical performance examines the ability of the biomarker to conform to predefined clinical specifications in detecting patients with a particular clinical condition or in a physiological state.14 Clinical effectiveness focuses on the test’s ability to improve health outcomes that are relevant to the individual patient.14 A cost-effectiveness analysis compares the changes in costs and in health effects of introducing a test to assess the extent to which the test can be regarded as providing value for money. Broader impact encompasses the consequences (eg, acceptability, social, psychological, legal, ethical, societal and organisational consequences) of testing beyond the above-mentioned components.

For point-of-care tests that had evidence on each of these components, we calculated the median time (in years) for a technology to complete the evaluation cycle.
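A minimal sketch of this calculation is shown below, assuming each test's studies have been labelled with a Horvath framework component and a publication year; the data structure and function names are assumptions for illustration, not the authors' analysis code.

```python
from statistics import median, quantiles

# Horvath framework components a test must have evidence for to complete the cycle
COMPONENTS = {
    "analytical performance", "clinical performance", "clinical effectiveness",
    "comparative clinical effectiveness", "cost-effectiveness", "broader impact",
}


def cycle_time_years(studies):
    """Years from the first to the last publication for a single test,
    or None if the test never gathered evidence on all six components.

    `studies` is an iterable of (component, publication_year) pairs.
    """
    covered = {component for component, _ in studies}
    if not COMPONENTS <= covered:
        return None  # at least one evaluation component has no evidence
    years = [year for _, year in studies]
    return max(years) - min(years)


def summarise(per_test_studies):
    """Median and IQR of cycle time across tests that completed the full cycle
    (requires at least two completed cycles for the quartile calculation)."""
    times = [t for t in (cycle_time_years(s) for s in per_test_studies) if t is not None]
    q1, _, q3 = quantiles(times, n=4)  # quartile cut points
    return median(times), (q1, q3)
```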

We assessed whether each study was conducted in a setting relevant to primary care, defined as GP surgeries (clinics), outpatient clinics, walk-in (or urgent care) centres and emergency departments. Data extraction was piloted on 20 reports by BS and checked by JYV and PJT, after which improvements were made to the final data extraction sheet. Data from the included studies were then single-extracted by one of three authors (JYV, BS and PJT).

Results

We screened 40 horizon scan reports and extracted data from the 500 papers (including 41 systematic reviews) referenced by these reports (table 1). Ten horizon scan reports examined a point-of-care test relevant to cardiovascular disease, six to respiratory diseases and five to each of endocrine/metabolic diseases, digestive diseases and general/unspecified diseases. A further nine horizon scan reports examined a health problem relevant to a range of other disease areas. The intended role of the test was triage in 14 (35%), replacement in 20 (50%) and add-on in 6 (15%) of the 40 horizon scan reports.

Table 1

Baseline characteristics of horizon scan reports by clinical domain

We found a median of nine primary studies (IQR 7–15.8) per horizon scan report, with a median time between first and last publication of 10 years (IQR 6.8–13.3 years). Across all horizon scan reports, on average only 19% (95% CI 11.4% to 20.7%) of studies were performed in primary care (figure 2).

Figure 2

Setting (%) of the studies by disease area (according to the International Classification of Primary Care, Second Edition coding).
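For illustration, a binomial confidence interval for such a proportion could be computed as below; the Wilson score method is an assumption (the report does not state which interval method was used), and this simple pooled calculation would not reproduce the per-report average quoted above.

```python
from math import sqrt


def wilson_ci(successes: int, total: int, z: float = 1.96):
    """Wilson score 95% confidence interval for a proportion."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = (z / denom) * sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return centre - half_width, centre + half_width


# Hypothetical example: if roughly 19% of the 500 primary studies were set in primary care
lo, hi = wilson_ci(successes=95, total=500)
print(f"{95 / 500:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```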

Of all studies, 30.4% (n=152) assessed analytical performance of the diagnostic technology, providing evidence for this component in 25 (62.5%) of the 40 horizon scan reports.

Clinical performance was evaluated in 71.2% (n=356) of all studies, while only 18.2% (n=91) of studies evaluated clinical effectiveness of the diagnostic technology. A further 10.0% (n=50) compared clinical effectiveness of two or more point-of-care tests, and only 8.6% (n=43) of the 500 papers evaluated cost-effectiveness (figure 3).

Figure 3

Test evaluation component by disease area in absolute number (n) of studies.

Clinical performance was often assessed before (in 16 tests) or at the same time as (in 12 tests) analytical performance, and was not assessed at all for 6 tests. Broader impact, such as acceptability, was tested before evidence on clinical effectiveness or cost-effectiveness was available in 11 horizon scan reports.

Figure 4 shows the number of years between the horizon scan report and original paper publication date for each evaluation component, split by intended role of the point-of-care test. The size of the bubbles represents the number of studies proportionate to all studies for the intended role, clearly depicting the emphasis on clinical performance and paucity of clinical effectiveness studies. Furthermore, tests acting as a triage instrument tend to spend more time on evidence generation than tests replacing an existing one or add-on tests performed at the end of the clinical pathway.

Only seven (17.5%; 95% CI 7.3% to 32.8%) horizon scan reports included evidence for all evaluation components, with a median time to complete the evaluation cycle of 9 years (IQR 6–13 years). Of these, tests acting as a triage instrument (three reports) took a median of 15 years (IQR 10–19 years), while tests replacing an existing test (four reports) took a median of 9 years (IQR 5–11 years) (figure 4).

Figure 4

Number of years between horizon scan report and original paper publication date by the intended role for each evaluation component. Size of bubbles represents number of studies proportionate to all studies for the intended role. BNP, B-natriuretic peptide; CRP, C reactive protein; FOBT, faecal occult blood test; HbA1c, glycated haemoglobin; hCG, human chorionic gonadotropin; hFABP, heart-type fatty acid-binding protein; INR, international normalised ratio; TSH, thyroid-stimulating hormone; WBC, white cell count.
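A bubble plot of this kind could be sketched along the following lines; this is illustrative only, and the input file and column names are hypothetical rather than the authors' actual dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical extraction sheet: one row per primary study, with the evaluation
# component, the intended role of the test, and the number of years between the
# study's publication and the horizon scan report.
df = pd.read_csv("extracted_studies.csv")  # columns: component, role, years_before_report

for role, group in df.groupby("role"):
    # One bubble per (component, years) combination; bubble size is proportional
    # to that combination's share of all studies for the intended role.
    counts = group.groupby(["component", "years_before_report"]).size().reset_index(name="n")
    sizes = 2000 * counts["n"] / len(group)
    plt.scatter(counts["years_before_report"], counts["component"],
                s=sizes, alpha=0.5, label=role)

plt.xlabel("Years between study publication and horizon scan report")
plt.ylabel("Evaluation component")
plt.legend(title="Intended role")
plt.tight_layout()
plt.show()
```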

Even for diagnostic technologies intended to replace existing tests, the category accounting for nearly half of all studies (49.4%; n=247), there was a clear imbalance between studies focusing only on analytical or clinical performance (87.4%) and the few advancing to clinical effectiveness (21.1%).

The sequence of evidence generation over time for the seven horizon scan reports which had completed the evaluation cycle varied widely, as shown in figure 5. The size of the bubbles represents the proportion of studies for each evaluation component. The grey arrow shows the sequence we would expect, starting at analytical performance (at 12 o’clock) and completing at broader impact analysis (at 10 o’clock). The arrows and numbered bubbles represent the actual time sequence of evidence generation.

Figure 5

Sequence of evidence generation for all seven horizon scan reports completing the full evaluation cycle. INR, international normalised ratio.

Very few point-of-care test evaluations seem to follow the expected sequence. In fact, only the report on point-of-care C reactive protein testing generally followed a linear temporal sequence from analytical performance to broader impact. Some diagnostic technologies, such as point-of-care international normalised ratio (INR) testing, had evidence generated for the broader impact component before any other component, suggesting that some diagnostic technologies are adopted in routine clinical care before any evidence on clinical performance or effectiveness is published.

Discussion

Main findings

Our findings suggest that most point-of-care diagnostic tests undergo clinical performance assessment, but very few progress to evaluation of their broader impact or cost-effectiveness, even in the more established and frequently reported clinical domains, such as cardiovascular disease. Some point-of-care tests even skip essential stages such as clinical effectiveness, yet are still implemented in routine care. We present a novel way to visualise gaps in evidence generation, using bubble plots and a dynamic cycle illustration.

Strengths and limitations

Our study provides novel data on common evidence gaps in the evaluation of new point-of-care tests for a wide range of clinical conditions. The extensive library of existing horizon scan reports and the methodological rigour with which they are produced provided an ideal opportunity to review the pathway of evidence for novel point-of-care diagnostic technologies relevant to primary care settings. The topics of the reports result from a comprehensive approach to identifying new or emerging diagnostic tests, including literature searches and interaction with the diagnostics industry and clinicians, prioritising technologies relevant to primary care. We have, however, no measure of how reproducible this prioritisation process is, and it potentially risks over- or under-representation of particular disease areas. Evidence from reports on other clinical topics might yield different findings. Furthermore, our review is limited to evidence available up to the publication date of each report, thus potentially overlooking evidence generated subsequently. Our approach might arguably ignore relevant (unpublished) research, for example studies performed by industry during test development, but the commonalities in the evidence gaps across the reports suggest that our findings are robust.

Data from the 500 included studies were single-extracted by one of three authors (JYV, BS and PJT), making it impossible to test for inter-rater agreement.

Comparison with the existing literature

Previous evidence has shown that the adoption of a diagnostic technology is often insufficient to achieve a benefit, and in most cases, a change of care process is essential.15 The market for point-of-care tests is growing rapidly,16 and there is a clear demand from primary care clinicians for these tests to help them diagnose a range of conditions.17 18 Critical appraisal of new diagnostic technologies is considered essential to facilitate implementation.11 Several evaluation frameworks have been identified,19 most of which describe the evaluation process as a linear process, similar to the staged evaluation of drugs.

Considering the interactions between different evaluation components and the need for certain tests to re-enter the evaluation process after updates to the underlying technology,20 it may be more realistic to assess these components in a cyclic and repetitive process.11

Implications

The slow adoption of novel point-of-care tests may result from the paucity of technologies following the expected sequence of evidence generation. Specifically, there is a need to shift the emphasis from examining the clinical performance of point-of-care tests to comparative clinical effectiveness and broader impact assessment. We recommend a structured, dynamic approach, presenting the results in a visually appealing manner for both industry (during development and the pursuit of regulatory approval) and research purposes. Policy-makers and guideline developers should be aware of this cyclical nature; assuming test evaluation is a linear process results in a less efficient evidence generation pathway. For example, assessing cost-effectiveness early in the development of a novel point-of-care test can help determine where exactly it fits in the clinical pathway and thus ensure that the evidence subsequently generated is relevant to that population and setting.

Conclusions

Considering that evidence generation for new tests takes on average 9 years, test developers need to be aware of the time and investment required. While the ‘road map’ of steps needed to generate evidence is reasonably well delineated, we provide evidence on the complexity, length and variability of the actual process that many diagnostic technologies undergo.

Acknowledgments

This article presents independent research part funded by the NIHR. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health. The authors would like to thank all original horizon scan report authors.

References


Footnotes

  • Contributors BS, MJT and AVdB conceived the study. JYV, PJT and BS did data extraction. JYV designed and performed the analyses, which were discussed with PJT, MJT, AP, CPP, BS and AVdB. JYV drafted this report and JYV, PJT, MJT, AP, CPP, BS and AVdB codrafted and commented on the final version. All authors had full access to all of the data (including statistical reports and tables) in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. JYV affirms that the manuscript is an honest, accurate and transparent account of the study being reported; that no important aspects of the study have been omitted. All authors read and approved the final manuscript.

  • Funding JYV, PJT, MJT, AP, CPP, BS and AVdB are supported through the National Institute for Health Research (NIHR) Diagnostic Evidence Co-operative Oxford at Oxford Health Foundation Trust (award number IS_DEC_0812_100).

  • Competing interests MJT has received funding from Alere to conduct research and has provided consultancy to Roche Molecular Diagnostics. He is also a co-founder of Phoresia which is developing point-of-care tests. All the other authors have no competing interests to disclose.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement All data for these analyses are included in the manuscript or online appendices. No additional data available.