Abstract
Objectives In most European countries, innovative medical devices are not managed according to cost–utility methods, the reason being that national agencies do not generally evaluate these products. The objective of our study was to investigate the cost-utility profile of prostheses for hip replacement and to calculate a value-based score to be used in the process of procurement and tendering for these devices.
Methods The first phase of our study was aimed at retrieving the studies reporting the values of QALYs, direct cost, and net monetary benefit (NMB) from patients undergoing total hip arthroplasty (THA) with different brands of hip prosthesis. The second phase was aimed at calculating, on the basis of the results of cost–utility analysis, a tender score for each device (defined according to standard tendering equations and adapted to a 0–100 scale). This allowed us to determine the ranking of each device in the simulated tender.
Results We identified a single study as the source of information for our analysis. Nine device brands (cemented, cementless, or hybrid) were evaluated. The cemented prosthesis Exeter V40/Elite Plus Ogee, the cementless device Taperloc/Exceed, and the hybrid device Exeter V40/Trident had the highest NMB (£152 877, £156 356, and £156 210, respectively) and the best value-based tender score.
Conclusions The incorporation of value-based criteria in the procurement process can contribute to optimising the value for money for THA devices. According to the approach described herein, the acquisition of these devices does not necessarily converge on the product with the lowest cost; in fact, more costly devices should be preferred when their increased cost is offset by the monetary value of the increased clinical benefit.
- hip prosthesis
- net monetary benefit
- markov models
- tender
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
In most European countries, cost–effectiveness or cost–utility methods are not used to manage the procurement of medical devices;
To optimise the management of prostheses for hip replacement, the method described calculates the net monetary benefit and develops a score suitable for performing a tender;
Quality-adjusted life years play an essential role in this method of value-based procurement;
One strength of the method proposed is that devices generating an increased clinical benefit can be ranked better than those characterised by lower cost and standard benefit;
The main limitation of this approach is that reliable data are needed concerning outcomes and costs of individual devices.
Introduction
Although the methodology of clinical research has advanced considerably for pharmaceuticals over recent decades, similar methodological progress has not occurred for devices (especially class III devices).1 The methods of research for regulatory purposes are more advanced for drugs than for medical devices, and this difference is even more pronounced for cost–effectiveness and cost–utility.
In many countries, national drug agencies are systematically involved in the application of value-based methods to govern medicines, particularly innovative drugs. In contrast, innovative medical devices, in most countries, are kept out of this type of governance, the reason being that no national agencies take responsibility for managing devices.1
The aim of our study was to investigate the cost–utility profile of prostheses for hip replacement and to calculate, for each device, a score suitable for the process of procurement and tendering. In our view, original experiences in which cost–utility is applied to real-life procurement can help to increase the value for money of devices. The main reason for undertaking the present study was the widespread belief that, in European hospitals that use tenders for device procurement, the acquisition process frequently follows the same approach applied to non-medical products; as a result, tenders generally select the device with the lowest cost and fail to prioritise more costly devices, even when the magnitude of the increased clinical benefit justifies the increased cost.
A few preliminary experiences conducted by our group2–5 have shown that the magnitude of the clinical benefit associated with individual devices can be adequately handled in the procurement process if the calculation of net monetary benefit (NMB) is incorporated into the tendering methods. In this report, we describe the application of this original approach to a series of implantable devices used in the UK for total hip arthroplasty (THA).
Methods
Identification of the data source for our analysis
The first phase of our study was aimed at retrieving the values of quality-adjusted life years (QALYs), direct cost and NMB from patients undergoing THA with various brands of hip prostheses and identifying the most suitable dataset for application of modelling tools. For this purpose, we carried out a literature search using the PubMed database with the following key words: ‘(cost(titl) OR economic(titl)) AND hip AND (replacement OR arthroplasty OR prosthesis) AND Markov’. Among the articles identified through this search, we selected those satisfying the following criteria (‘eligible articles’): cost–effectiveness or cost–utility analysis based on Markov modelling; evaluation of at least three different brands of THA device with separate information on direct costs, QALYs, and NMB per patient for individual devices. Finally, the single article to be used as the data source for our analysis was chosen from among the eligible papers by consensus of the authors.
Handling the data of direct cost, outcome and quality-adjusted survival in the study selected as information source for our analysis
The minimum information needed to carry out our analysis was represented by the values of QALYs estimated for the patients treated with the different devices, the rate of device revision, and the healthcare cost incurred by the patients. We planned to extract this information from the selected study in duplicate; two co-authors (AM and ST) separately performed this extraction and consequently identified the appropriate values for each item.
Calculation of NMB
NMB is defined as follows3 4 6:
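\[ \mathrm{NMB} = (\text{QALYs} \times \text{threshold}) - \text{device cost} - \text{OTRC} \qquad (1) \]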
where the clinical benefit of the device (expressed in QALYs per patient) is converted into a monetary benefit (expressed in £) by using a predetermined cost–utility threshold (£20 000 as in the study by Pennington et al); the cost of the device is expressed in £; the other treatment-related costs (OTRCs) are represented by a series of items that should be qualitatively the same across all treatments under examination. These OTRCs do not include the cost of the device, but always include the costs, other than the device cost, incurred in the short term (eg, accessories, etc). In addition, depending on the specific disease condition and the type of economic information actually available, these OTRCs may also include the costs incurred by the patients in the long term.
Finally, the equation of NMB can also be expressed by replacing the device cost and the OTRC with a single negative value, defined as the sum of the device cost plus the OTRC. Our analyses presented below have adopted this approach.
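As an illustration of the calculation described above, the following minimal sketch computes the NMB per patient with the device cost and the OTRC combined into a single cost term. The function name and the input figures (QALYs, device cost, OTRC) are hypothetical and are not taken from the study by Pennington et al; only the £20 000 threshold comes from the text.

```python
# Threshold stated in the text (GBP per QALY).
THRESHOLD_GBP_PER_QALY = 20_000


def net_monetary_benefit(qalys, device_cost, otrc, threshold=THRESHOLD_GBP_PER_QALY):
    """Return NMB per patient: QALYs x threshold minus (device cost + OTRC)."""
    total_cost = device_cost + otrc  # single cost term, as adopted in the analysis
    return qalys * threshold - total_cost


# Hypothetical inputs, chosen only to show the arithmetic.
nmb_example = net_monetary_benefit(qalys=8.0, device_cost=1_500, otrc=6_000)
print(f"NMB per patient: GBP {nmb_example:,.0f}")
```

With these assumed inputs, the monetary value of the benefit is 8.0 × £20 000 = £160 000, from which the total cost of £7500 is subtracted.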
Estimation of the tender-based scores
The second phase of our study was aimed at calculating a value-based tender score for each device (defined according to standard tendering equations and adapted to a 0–100 scale). This allowed us to determine the ranking of each device in the simulated tender. In particular, to determine the value of QALYs and NMB per patient, we planned to directly use the values of QALYs and NMB reported in the selected paper; or alternatively, to recalculate these values using the computer programme (if publicly available) used in the selected study; or alternatively, to rewrite the Markov model using the language of a commercial software (Treeage Pro version 11, Treeage Inc, Williamstown, Massachusetts, USA) to allow us to recalculate these values.
Thereafter, the values of NMB for each individual device were used to generate a ranking across the comparators. This ranking was initially expressed in monetary units and then converted into a 0–100 scale (‘tender-based score’) where 0 is the score assigned to the worst comparator and 100 is the score assigned to the best one. Comparators associated with an intermediate ranking on the NMB scale were converted into an intermediate score on the same 0–100 scale (ie, a score greater than 0 and lower than 100 and based on a nonlinear proportionality). For administrative reasons, this score on a 0–100 scale is mandatory in European tenders.7
The following equation describes the calculation of the score8:
(2)
The above equation is available for online use at the following address: http://www.osservatorioinnovazione.net/tenders/nmb20000.php.
It should be noted that the above equation transforms the device-specific values of NMB into the corresponding device-specific values of the tender score. However, the rankings of the devices remain unchanged after this conversion from NMB to tender score.
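The two properties stated above (anchoring at 0 and 100, and preservation of rankings) can be illustrated with a short sketch. It does not reproduce the scoring equation of this paper, which is described as nonlinear. As a stand-in assumption it uses a simple linear min-max rescaling, and the function name is hypothetical. The three NMB values are those reported in the abstract for the best device of each class and are placed in one lot only for illustration.

```python
def tender_scores(nmb_by_device):
    """Map NMB values onto a 0-100 scale (ASSUMED linear min-max rescaling)."""
    lowest = min(nmb_by_device.values())
    highest = max(nmb_by_device.values())
    if highest == lowest:
        # All comparators tie, so every device receives the top score.
        return {device: 100.0 for device in nmb_by_device}
    span = highest - lowest
    return {
        device: 100.0 * (nmb - lowest) / span
        for device, nmb in nmb_by_device.items()
    }


# NMB values (GBP) quoted in the abstract; grouping them in one lot is illustrative.
nmb = {
    "Exeter V40/Elite Plus Ogee": 152_877,
    "Taperloc/Exceed": 156_356,
    "Exeter V40/Trident": 156_210,
}
scores = tender_scores(nmb)

# Any monotonic conversion leaves the ranking unchanged.
rank_by_nmb = sorted(nmb, key=nmb.get, reverse=True)
rank_by_score = sorted(scores, key=scores.get, reverse=True)
print(rank_by_nmb == rank_by_score)
```

Any strictly increasing conversion, linear or not, would leave the ordering of the devices unchanged; only the spacing between intermediate scores depends on the specific equation.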
Results
Identification of the source of data needed for our analysis
After selecting a total of 51 eligible papers from our literature analysis (date of the last PubMed search: 15 June 2017), we identified the studies by Pennington et al 9 and Pulikottil-Jacob et al 10 as potentially suitable data sources for our analysis. Of these two studies, we selected the article by Pennington et al 9 because this paper evaluated a larger number of device brands than the study by Pulikottil-Jacob et al.10 In fact, there were nine brands versus five brands, respectively, in these studies.9 10 In more detail, the study by Pennington et al 9 used a Markov model to evaluate the cost–utility profile, expressed in terms of NMB, of the following nine brands of THA prostheses: Exeter V40/Contemp, Exeter V40/Duration, Exeter V40/Elite Plus Ogee (cemented prostheses), Corail/Pinnacle, Accolade/Trident, Taperloc/Exceed (cementless prostheses), Exeter V40/Trident, Exeter V40/Trilogy and CPT/Trilogy (hybrid prostheses). The horizon was lifetime, the yearly discount rate was 3.5%, and the willingness-to-pay threshold was set at £20 000 per QALY.
Another choice made in the present study regards the values of utility that Pennington and coworkers9 separately reported for the nine device brands (which were in turn pooled into three classes of devices, namely cemented, cementless and hybrid, with three different brands in each of these three classes of devices). First, the range of mean values of utility for these nine brands showed a limited variability (in men aged 70: minimum value=0.801, maximum=0.832; in women aged 70: minimum value=0.778, maximum=0.812); more importantly, in the study by Pennington and coworkers9 there were no statistics to compare the utilities across these nine brands of devices and no statistics to compare the values across the three different brands within each device class.6 Therefore, as a specific assumption of our research, we did not accept the hypothesis that these three classes of device had different intra-group utilities and instead we assigned to each of these three classes a single value of utility with no intra-group differences. This value of utility, assumed to be the same within each class, was calculated as the average of the three brand-related means weighted for their respective number of patients.
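The pooling of brand-level utilities into a single class-level value can be sketched as follows. The brand means and patient numbers below are hypothetical (the means were merely chosen inside the 0.801–0.832 range reported for men aged 70), and the function name is an assumption made for illustration.

```python
def pooled_utility(brand_means, brand_patients):
    """Patient-weighted average of brand-level mean utilities."""
    if len(brand_means) != len(brand_patients):
        raise ValueError("one patient count is needed per brand mean")
    total_patients = sum(brand_patients)
    weighted_sum = sum(u * n for u, n in zip(brand_means, brand_patients))
    return weighted_sum / total_patients


# Hypothetical values for the three brands of one device class.
class_utility = pooled_utility(
    brand_means=[0.80, 0.82, 0.83],
    brand_patients=[1_000, 3_000, 1_000],
)
print(round(class_utility, 3))
```

Under these assumed inputs, the brand with the most patients pulls the class-level value towards its own mean, which is the intended effect of weighting.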
Since the modelling software used by Pennington et al 9 was not publicly available, we rewrote the Markov simulation procedure using the language of Treeage. Our computer programme can be downloaded as indicated in Messori.11
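For readers unfamiliar with this type of simulation, the sketch below shows how discounted QALYs accumulate in a Markov cohort with yearly cycles. It is neither the model by Pennington et al nor our Treeage programme: the three-state structure (unrevised, revised, dead), the transition probabilities, the utility decrement after revision, and the 30-year horizon are all hypothetical simplifications. Only the 3.5% yearly discount rate comes from the text.

```python
DISCOUNT_RATE = 0.035  # yearly discount rate stated in the text


def discounted_qalys(utility, p_revision, p_death, years, revision_disutility=0.1):
    """Accumulate discounted QALYs per patient over yearly cycles (toy model)."""
    unrevised, revised = 1.0, 0.0  # cohort fractions; the remainder is dead
    total = 0.0
    for year in range(years):
        weight = 1.0 / (1.0 + DISCOUNT_RATE) ** year
        # QALYs earned in this cycle by the patients who are alive.
        total += weight * (
            unrevised * utility + revised * (utility - revision_disutility)
        )
        # Yearly transitions: death first, then revision among survivors.
        survivors_unrevised = unrevised * (1.0 - p_death)
        newly_revised = survivors_unrevised * p_revision
        revised = revised * (1.0 - p_death) + newly_revised
        unrevised = survivors_unrevised - newly_revised
    return total


# Hypothetical comparison of two devices that differ only in revision rate.
low_revision = discounted_qalys(0.8, p_revision=0.01, p_death=0.03, years=30)
high_revision = discounted_qalys(0.8, p_revision=0.03, p_death=0.03, years=30)
print(low_revision > high_revision)
```

In this toy setting a lower revision rate yields more QALYs per patient, which is the mechanism through which device performance enters the NMB.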
Estimation of the tender-based scores
Because we did not use the utility values from the study by Pennington et al,9 but we recalculated the means for these values assuming no variations within each device class, the values of QALYs and NMB were also recalculated for each device according to the Markov model11 and Equation (1), respectively.
The results are reported in table 1 (for cemented prostheses), table 2 (for cementless prostheses) and table 3 (for hybrid prostheses). Overall, these results were close to those originally published by Pennington et al in their Table III.9
Table 1 Model parameters for cemented prostheses and estimated values of QALYs, NMB and tender score (reference population: men aged 70 years)
Table 2 Model parameters for cementless prostheses and estimated values of QALYs, NMB and tender score (reference population: men aged 70 years)
Table 3 Model parameters for hybrid prostheses and estimated values of QALYs, NMB and tender score (reference population: men aged 70 years)
Regarding cemented prostheses, our results in terms of ranking were identical to those published by Pennington et al 9 (ie, same rankings for the three devices in Pennington’s analysis and in ours). For cementless prostheses, the device that ranked first in our analysis (Taperloc/Exceed) ranked second in Pennington’s analysis. Likewise, for hybrid prostheses, the device ranking first in our analysis (Exeter V40/Trident) was the second in Pennington’s analysis.
Finally, table 4 shows the values of tender score calculated under the assumption of including all nine devices in a single tender lot. This analysis is of interest because the differences across the nine devices in QALYs per patient were not minimal. As in tables 1–3, the differences between our results of table 4 and those published by Pennington et al depend on the fact that our Markov model was slightly different from that used by Pennington et al. Furthermore, our analysis in table 4 assumed the same utility within each of the three device classes.
Table 4 Model parameters estimated for all nine devices assuming a single tender lot (reference population: men aged 70 years)
Discussion
In nearly all countries, decisions about the procurement of medical devices continue to be based on the ‘traditional’ work of administrative offices in which outcomes are not managed through any specific clinical index or, at best, are managed through unstandardised scores developed at the local level. Hence, clinical results do not play any substantial role in making procurement decisions, because administrative algorithms do not differentiate between medical devices and materials not designed to yield a clinical benefit.
The experience described in this paper is of interest from several viewpoints. First, while most economic methods described herein are very similar to those used in previous cost–utility studies,9 10 the originality of our work lies in linking clinical outcomes with administrative decisions (namely, the procurement decisions adopted for these devices).
Overall, the results of our analysis confirm the clinical results reported in the original clinical studies. Accordingly, the pharmacoeconomic ranking from our study was headed by Exeter V40/Elite Plus Ogee for cemented prostheses, by Corail/Pinnacle and Taperloc/Exceed for cementless prostheses, and by Exeter V40/Trident and Exeter V40/Trilogy for hybrid prostheses.
Similarly, Exeter V40/Elite Plus Ogee (cemented prostheses), Corail/Pinnacle and Taperloc/Exceed (cementless prostheses), and Exeter V40/Trident and Exeter V40/Trilogy (hybrid prostheses) showed the best values of NMB, thus indicating that these devices have a more favourable cost–utility than the others. The same result was also given by the tender scores.
One important point of our methodology is that there were no mandatory decisions concerning how, and if, the tender scores should be applied in practice. For example, surgeons typically prefer having several types of prostheses to choose from for different types of patients. While tenders of course always generate a ranking among the devices under examination, this does not imply that only the best-ranking device should be purchased; decision-makers, depending on the specific context, can extend the procurement to some of the devices that ranked at second place or beyond.
From a practical point of view, one important issue is that our model of tender score estimation can separate the device cost from the other sources of direct cost per patient. Unfortunately, Pennington et al 9 did not report the unit cost for any of the nine devices; hence, our simulations could not present any examples showing how the unit cost of devices can influence the tender score. Of course, the computational algorithm (available online at http://www.osservatorioinnovazione.net/tenders/nmb20000.php) permits the cost of each device to be entered separately; this feature is essential for the prospective application of ‘our method’.
Finally, it should be recalled that, in the classic analysis based on cost–utility ratio (CUR), only two comparators are directly managed. For example, if A is the innovative therapy and B is the standard therapy (and assuming that all values of cost and quality-adjusted survival are normalised to one patient), $\mathrm{CUR}_{A\,\mathrm{vs}\,B}$ is defined as: $\mathrm{CUR}_{A\,\mathrm{vs}\,B} = (\mathrm{cost}_A - \mathrm{cost}_B)/(\mathrm{QALY}_A - \mathrm{QALY}_B)$. After this calculation, $\mathrm{CUR}_{A\,\mathrm{vs}\,B}$ is evaluated against the predefined cost–utility threshold (T) (eg, £30 000 in the UK or around $100 000 in the US) to decide if using A as opposed to B has a favourable cost–utility (CUR < T) or an unfavourable cost–utility (CUR > T). The problem is that, while tenders generally evaluate three or more comparators, the design of the above equation manages just a single comparison, that is, only two comparators. This methodological point is discussed more thoroughly in Messori A4 and Messori A.13
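The contrast between the pairwise CUR and the device-specific NMB can be made concrete with a small sketch. The three devices, their costs and their QALYs are hypothetical, and the £20 000 threshold is the one used in our analysis rather than the £30 000 quoted in the example above.

```python
from itertools import combinations

THRESHOLD = 20_000  # GBP per QALY, as used in the analysis

# Hypothetical (total cost per patient in GBP, QALYs per patient).
devices = {
    "A": (9_000, 8.2),
    "B": (7_500, 8.0),
    "C": (8_000, 8.1),
}


def cur(first, second):
    """Cost-utility ratio of `first` versus `second` (one pairwise comparison)."""
    (cost_1, qaly_1), (cost_2, qaly_2) = devices[first], devices[second]
    return (cost_1 - cost_2) / (qaly_1 - qaly_2)


# The CUR needs one ratio per pair of comparators.
pairwise_cur = {pair: cur(*pair) for pair in combinations(devices, 2)}

# The NMB gives one value per device, so a ranking follows directly.
nmb = {name: qalys * THRESHOLD - cost for name, (cost, qalys) in devices.items()}
ranking = sorted(nmb, key=nmb.get, reverse=True)

print(pairwise_cur)
print(ranking)
```

With three comparators the CUR already requires three separate ratios, none of which is a ranking by itself, whereas the NMB assigns each device a single value that can be sorted.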
Apart from the complexity of Markov models, the mathematical aspects of the procurement model described herein are simple. Despite this, implementing this approach in routine procurement decisions raises several practical issues that deserve discussion. First, the role of ‘small differences’ in the comparison across the various device scores should be interpreted correctly. For example, the values of NMB for individual devices were very similar to one another, and consequently the differences between the device-specific values of NMBs were small, particularly when they were expressed as percentages in relation to the size of NMB. This situation is extremely common in cost–utility analyses, irrespective of whether the main index is the incremental cost–utility ratio (ICUR) or the NMB. Rather, it should be stressed that the use of sums and subtractions, as required by the NMB, is advantageous from this point of view because ICURs determine a much worse situation (ie, with a clear instability in mathematical terms) when their denominator is represented by ‘small numbers’; ICURs in fact can generate a meaningless result tending to infinity when their denominator is a very small incremental benefit close to zero.
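The numerical instability mentioned above can be shown with a few lines of arithmetic. In this sketch the extra cost of £500 and the series of shrinking QALY gains are hypothetical; the threshold is the £20 000 used in our analysis.

```python
THRESHOLD = 20_000  # GBP per QALY
EXTRA_COST = 500    # hypothetical incremental cost in GBP

rows = []
for extra_qalys in (0.1, 0.01, 0.001, 0.0001):
    icur = EXTRA_COST / extra_qalys                        # ratio: grows without bound
    incremental_nmb = extra_qalys * THRESHOLD - EXTRA_COST  # difference: stays bounded
    rows.append((extra_qalys, icur, incremental_nmb))

for extra_qalys, icur, incremental_nmb in rows:
    print(f"dQALY={extra_qalys:<7} ICUR={icur:>12,.0f} incremental NMB={incremental_nmb:>8,.0f}")
```

As the incremental benefit approaches zero, the ICUR increases without limit, whereas the incremental NMB simply approaches the negative of the extra cost.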
There were numerous limitations of our study. The usefulness of tender scores can be questioned. While the scores determined on a 0–100 scale are recommended by current administrative regulations, the clinical value of the method described herein is maximised when the final conversion of NMB into a tender score is not performed. In more detail, when the information about ranking in utility is converted from NMB into the tender scores, these scores, in qualitative terms, always confirm the rankings suggested by the utility data.
The main advantage of our method is that the NMB and the tender scores incorporate the clinical messages generated by the available evidence. Furthermore, they comply with both pharmacoeconomic theory and administrative requirements.
The information needed for application of ‘our method’ in the real world may sometimes be unavailable.14 While this can be an important practical obstacle in implementing this approach, one solution is that the call for tender (like those commonly published in the Official Journal of the European Community) should specify which data must be provided by individual applicants. We recognise, however, that most joint registries do not record outcomes except revision rates. Furthermore, it is not clear who would pay for the collection of these data.
Handling uncertainty in the situations described in the present study is another point of controversy. One limitation of our study is that we did not perform any sensitivity analysis (deterministic analysis of variability) about the results generated by our method; likewise, no statistical variations (such as those based on confidence intervals) were estimated around our results (stochastic analysis of variability). Conducting these analyses of variability (on the basis of NMB values) raises no particular problem in terms of feasibility. The difficult question regards the practical implications that the results of these variability analyses should have in the procurement process. In fact, one should keep in mind that the final result of our method is an administrative procurement decision, which typically has an all-or-nothing nature. In this context, determining the role of variability in the process of tendering is probably an issue that should remain the responsibility of the institution or committee that originally promoted the tender.
Our literature search was simplified to a considerable extent. On the other hand, it should be stressed that the objective of our search was not to select a series of studies for inclusion in a systematic review or in a meta-analysis, but simply to select a single study for use as a source of information for our simulations. So, we recognise that we assigned a marginal priority to the technical design of our literature search (including the evaluation of the quality of eligible papers).
Last but not least, the generalisability of these findings to other technologies or to procurement processes in different settings is another point that will require further consideration. At present, we have completed similar experiences in the field of knee replacement, mesh repair of ventral hernias, and thrombectomy devices for acute ischaemic stroke (Messori et al, unpublished observations; Trippoli et al, unpublished observations) with very promising results.
In conclusion, our analysis on simulated tenders indicated that the NMB has a good performance in capturing the differences in utility among different devices; more importantly, the method succeeded in assigning a ‘fair’ economic value to the increased utility demonstrated by the most effective devices. The main aim of our study was therefore met because we showed that the process of device ranking does not necessarily converge on the product with the lowest cost, but other devices can be preferred when their increased cost is offset by the monetary value of the increased clinical benefit.
All in all, our results demonstrated that, in the field of THA devices, incorporating the typical tools of cost–utility into the tendering process is feasible. In particular, bridging the methodology of NMB with the everyday practice of procurement can contribute to maximising the health returns generated by the in-hospital expenditures for these devices.
Footnotes
Contributors AM wrote the first draft of the manuscript; ST carried out the literature search; both authors contributed to subsequent drafts; CM guided the revision of the manuscript based on the comments from editors and reviewers.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Our data sharing material is represented by the programming steps incorporated into the webfile that can be executed at http://www.osservatorioinnovazione.net/tenders/nmb20000.php.