Objectives Preventing adverse events (AEs) after orthopaedic surgery is a field with great room for improvement. A Swedish instrument for measuring AEs after hip arthroplasty based on administrative data from the national patient register is used by both the Swedish Hip Arthroplasty Register and the Swedish Association of Local Authorities and Regions. It has never been validated and its accuracy is unknown. The aim of this study was to validate the instrument’s ability to detect AEs, and to calculate the incidence of AEs following primary hip arthroplasties.
Design Retrospective cohort study using retrospective record review with Global Trigger Tool methodology in combination with register data.
Setting 24 different hospitals in four major regions of Sweden.
Participants 2000 patients with either total or hemi-hip arthroplasty were recruited from the Swedish Hip Arthroplasty Register (SHAR). We included both acute and elective patients.
Primary and secondary outcome measures The sensitivity and specificity of the instrument. Adjusted cumulative incidence and incidence rate.
Results The sensitivity for all identified AEs was 5.7% (95% CI: 4.9% to 6.7%) for 30 days and 14.8% (95% CI: 8.2% to 24.3%) for 90 days, and the specificity was 95.2% (95% CI: 93.5% to 96.6%) for 30 days and 92.1% (95% CI: 89.9% to 93.8%) for 90 days. The adjusted cumulative incidence for all AEs was 28.4% (95% CI: 25.0% to 32.3%) for 30 days and 29.5% (95% CI: 26.0% to 33.8%) for 90 days. The incidence rate was 0.43 AEs per person-month (95% CI: 0.39 to 0.47).
Conclusions The AE incidence was high, and most AEs occurred within the first 30 days. The instrument sensitivity for AEs was very low for both 30 and 90 days, but the specificity was high for both 30 and 90 days. The studied instrument is insufficient for valid measurements of AEs after hip arthroplasty.
- adverse events
- hip arthroplasty
- global trigger tool
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Strengths and limitations of this study
The use of one of the most sensitive methods for identifying AEs (retrospective record review with Global Trigger Tool methodology).
The multi-centre study design, which includes a large sample size comprising both acute and elective patients.
The use of the Swedish personal number in combination with the national register ensured that no admissions were missed.
Our results are only generalisable to healthcare systems where International Classification of Disease codes are used to measure AEs.
Adverse events (AEs) following surgery are a major challenge in the field of orthopaedics. Hip arthroplasty is one of the most successful procedures in modern medicine, and the technical improvements since Charnley arthroplasty have been minor.1
Preventing AEs is a field with great room for improvement. Complication rates after hip arthroplasty are between 3.4% and 27%.2–4 However, comparison of AE rates should be done with caution.5 Two reasons for this are that (1) there are no globally accepted definitions of AEs after hip arthroplasty6 and (2) there are many different methods for identifying AEs, which complicates comparisons.7
The method that has been proven to be most sensitive compared with others is retrospective record review (RRR) by trained reviewers.8–10 Another method for identifying and measuring AEs is by using administrative data and International Classification of Diseases (ICD) codes.11
The Swedish Hip Arthroplasty Register (SHAR) issues a yearly report that includes the AE rate after hip arthroplasty.12 This AE rate is generated from an instrument that uses administrative data with a set of selected AE ICD-10 codes (see online supplementary table A1), which are found in the Swedish National Patient Register (NPR).13 Thus this report is not based on SHAR data but on NPR data, and the same instrument is used by the Swedish Association of Local Authorities and Regions in a publicly accessible web application named Healthcare in Numbers (HIN).14 The major difference between HIN and SHAR concerns the definition of the population. HIN is based on NPR procedure codes and SHAR is based on the hospitals' recording of interventions into the register.
The instrument only uses codes that are registered during discharge from readmissions. AEs that occur during the index admission are not included.
Despite this widespread usage, we know nothing of its sensitivity and specificity. While NPR’s primary ICD-codes are known to be accurate (but with some variation between diagnoses),15 we do not know the accuracy for secondary codes. We also do not know how well this set of codes and their selection are suited for detecting AEs.
The aim of this study was to validate the instrument’s ability to detect AEs, and to calculate the incidence of AEs following primary hip arthroplasties.
This is a retrospective multi-centre cohort study on prospectively collected data from medical records and register data from SHAR and NPR.
The calculated sample size was estimated to be 2000 patients, assuming 5%–10% inconclusive records, using an alpha level of 0.05 and a power minimum of 80%. The main assumptions were that the instrument's rate of failure to register a correct ICD-10 code for an AE was 15% (the assumption concerning sensitivity), and that the rate of incorrectly coded non-events was 5% (the assumption concerning specificity).
The study comprises hip arthroplasty patients from four major county councils in Sweden (Stockholm, Skåne, Västra Götaland and Västerbotten) in 24 different hospitals (six university hospitals, five central county council hospitals, seven county council hospitals and six private hospitals that have agreements/contracts with the county councils; one of the private hospitals treats both acute and elective patients). Patients underwent surgery between January 2009 and December 2011.
All patients 18 years of age or older whose data were recorded in the SHAR for either a hemi or total hip arthroplasty were eligible for inclusion. Both acute surgery for hip fractures and elective surgery for degenerative joint disease were included.
To increase the probability of selecting medical records with an AE and avoiding excess RRR on records without AEs, we used a weighted sample. Twenty different selection groups for acute and elective arthroplasties were created as follows (see online supplementary table A2).
We constructed three groups with lengths of primary stay in percentiles divided as 0%–50%, 51%–80% and 81%–100%. The three groups were further divided based on whether there was an ICD-10 code indicating an AE in the NPR (see online supplementary table A3). Overall, six groups were generated.
A selection was made for patients who had readmissions in the NPR. The readmission groups were divided in readmission within 2–30 days and within 31–90 days after surgery. The two groups were further divided based on whether there was an ICD-10 code indicating an AE in the NPR, generating a total of four groups.
This created a total of ten selection groups and we sampled according to the table (see online supplementary table A2) both from acute and elective patients yielding a total of 20 groups.
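The construction of the selection groups described above can be sketched as a simple enumeration. This is an illustrative Python sketch only (the study itself used R); the group labels and the function name `build_selection_groups` are assumptions made for the example, and the sampling quotas per group, which are given in supplementary table A2, are not reproduced.

```python
from itertools import product

# Labels paraphrase the text; they are illustrative, not the study's own names.
STAY_BANDS = ["stay 0-50th pct", "stay 51-80th pct", "stay 81-100th pct"]
READMISSION_WINDOWS = ["readmitted day 2-30", "readmitted day 31-90"]
AE_CODE_FLAGS = [False, True]  # is an ICD-10 code indicating an AE present in the NPR?
PATIENT_TYPES = ["acute", "elective"]


def build_selection_groups():
    """Enumerate the groups as (patient_type, criterion, has_ae_code) tuples."""
    criteria = STAY_BANDS + READMISSION_WINDOWS          # 3 + 2 = 5 criteria
    per_type = list(product(criteria, AE_CODE_FLAGS))    # 5 * 2 = 10 groups
    return [(ptype, crit, flag)
            for ptype in PATIENT_TYPES
            for crit, flag in per_type]                  # 10 * 2 = 20 groups


groups = build_selection_groups()
print(len(groups))
```

The enumeration makes the arithmetic explicit: six length-of-stay groups plus four readmission groups give ten groups, doubled across acute and elective patients.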
Patient and public involvement
This is a register and record-based retrospective study with no patient involvement.
From the SHAR we collected data on the primary procedures that were cross-linked with data from the NPR, using the Swedish personal identity numbers. From the NPR, we collected data on all admissions from the primary procedure and 90 days post-operatively. With the NPR data, we could create a timeline with all admissions for each patient. This timeline was used as a template to know which admissions to review with the RRR. The NPR data also contained ICD codes that were used in the validation of the instrument. Death data that was used in the validation of the instrument were available from the national death register (NDR). Medical records were obtained as paper copies or were reviewed on location at the hospital.
Review teams and the RRR method
The review team consisted of 10 reviewers with record review experience ranging from novice to expert (see online supplementary table A4). The more experienced reviewers performed both stage one and two of the review. All reviewers received an obligatory 1-day training by two of the senior researchers (MG and MU).
We used the Swedish adaptation of the Global Trigger Tool (GTT),16 named marker-based record review,17 as the RRR method for collecting AE data. A study-specific manual was created and included definitions, inclusion criteria, exclusion criteria and all alterations and clarifications from the GTT.
An AE was defined as suffering, physical harm or disease as well as death related to the index admission and as a condition that was not an inevitable consequence of the patient’s disease or treatment.
Based on the terminology in the Swedish Patient Safety Act,18 a preventable AE was defined as an event that could have been prevented if adequate actions had been taken during the patient’s contact with healthcare.
The index admission was defined as the orthopaedic admission when the patient had hip arthroplasty surgery. If the patient was discharged directly to a geriatric or rehabilitation clinic, this admission was also considered to be a part of the index admission.
AEs related to acts of either omission or commission were included.
Inclusion and exclusion criteria
We included and performed RRR on all inpatient care and all unplanned outpatient care in all Swedish hospitals from the index admission date up to 90 days after surgery. We included AEs that occurred during index admission and AEs that occurred during readmissions that originated from the index admission. AEs that were identified during unplanned outpatient visits at a hospital (accidents and emergencies visits) and originated from the index admission were also included.
We excluded AEs that were unrelated to the index admission and AEs that originated from the care of another AE. For example, if a patient was admitted because of a periprosthetic joint infection and sustained a fracture from falling in the ward, the infection was included as an AE, and the fracture was not included. We did not include planned outpatient visits at hospitals or planned or unplanned outpatient visits outside of hospitals, such as with a general practitioner.
The GTT consisted of a two-stage review process.
Review stage 1
All medical records, including notes from different professionals, were reviewed. The reviewers screened the record, searching for any of the 38 pre-defined triggers that indicated a potential AE. The triggers were divided into five modules: general triggers (n=18), laboratory triggers (n=5), surgical triggers (n=7), medication triggers (n=3) and intensive care triggers (n=5) (see online supplementary table A5).
A summary of the RRR and all identified triggers with a free text description of the trigger/event were documented in a database (Microsoft Access 2007). All records with a potential AE went forward to review stage 2.
Review stage 2
All identified triggers deemed as positive for a potential AE were assessed in stage 2.
Each potential AE was then assessed for whether it was caused by the healthcare service using a 4-point Likert scale graded as follows: (1) the AE was not caused by the index admission, (2) the AE was probably not caused by the index admission, (3) the AE was probably caused by the index admission and (4) the AE was caused by the index admission.
AEs graded as 1 or 2 were excluded and AEs graded as 3 or 4 were included, and the reviewer made a full assessment that included evaluations of preventability, type of AE (71 different types in 15 different categories), severity and whether or not the AE was ICD-10 coded.
Preventability was assessed using a similar 4-point Likert scale as follows: (1) the AE was not preventable, (2) the AE was probably not preventable, (3) the AE was probably preventable and (4) the AE was preventable. AEs that were graded 3 or 4 were classified as preventable.
The severity of the AEs was evaluated using a slightly modified version of the National Coordinating Council for Medication Error Reporting and Prevention (NCC MERP) index.19 NCC MERP index categories E–I were included, and the categories indicated the following: (E) contributed to or resulted in temporary harm, (F) contributed to or resulted in temporary harm that required outpatient or inpatient care or prolonged hospitalisation, (G) contributed to or resulted in permanent harm, (H) required intervention necessary to sustain life within 60 min and (I) contributed or resulted in the patient’s death.
Reliability and validity
Inter-rater reliability was evaluated through the double review of 6% of the records to assess the agreement between the primary reviewers’ judgements concerning whether at least one trigger or potential AE was identified in the record, whether the record was to be forwarded to secondary review, whether the reviewer identified the same specific event and whether this event was a potential AE.
The review process was monitored by an RRR expert (MU) who also was available for questions from the reviewers. The completeness and adherence to the study manual in stages 1 and 2 were monitored closely. All questions or discrepancies were given as written feedback to the reviewers for resolution. If needed, clarifying discussions were held with the respective reviewer.
The instrument is based on a set of 13 specific ICD codes and one code category (I-codes: diseases of the circulatory system) defining AEs (see online supplementary table A1). Five of the specific codes and the code category have to be used as the primary diagnosis, and the remaining eight can be used as either a primary or a secondary code. In the validation of the instrument, the test was defined as positive for an AE if the patient had:
Any of these code criteria in any readmission within 90 days after surgery (data source=NPR). Or
A death date after discharge from the primary admission and within 90 days after surgery (data source=NDR).
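The two criteria above can be expressed as a small decision rule. The following Python sketch is illustrative and rests on several assumptions: the specific codes are listed only in supplementary table A1, so the code sets below are clearly hypothetical placeholders; the function names are invented for the example; and a readmission is dated here by its admission date, which the text does not specify.

```python
from datetime import date

# HYPOTHETICAL placeholders: the real codes are in supplementary table A1.
PRIMARY_ONLY_CODES = {"PLACEHOLDER_P1", "PLACEHOLDER_P2"}  # five codes in the real set
ANY_POSITION_CODES = {"PLACEHOLDER_S1", "PLACEHOLDER_S2"}  # eight codes in the real set


def admission_matches(primary_code, secondary_codes):
    """True if one readmission fulfils the code criteria."""
    # Primary-only codes and the I-code category count only as primary diagnosis.
    if primary_code in PRIMARY_ONLY_CODES or primary_code.startswith("I"):
        return True
    # The remaining codes count in either primary or secondary position.
    all_codes = [primary_code, *secondary_codes]
    return any(code in ANY_POSITION_CODES for code in all_codes)


def instrument_positive(surgery_date, index_discharge_date, readmissions,
                        death_date=None, window_days=90):
    """readmissions: list of (admission_date, primary_code, secondary_codes)."""
    for admitted, primary, secondary in readmissions:
        days = (admitted - surgery_date).days
        # ASSUMPTION: the 90-day window is applied to the admission date.
        if 0 <= days <= window_days and admission_matches(primary, secondary):
            return True
    if death_date is not None:
        days = (death_date - surgery_date).days
        # Death counts only after discharge from the primary admission.
        if death_date > index_discharge_date and days <= window_days:
            return True
    return False
```

Note that a circulatory (I) code recorded only as a secondary diagnosis does not make the test positive under this rule, which mirrors the primary-diagnosis requirement stated above.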
We used the results from the RRR as gold standard when we performed the sensitivity and specificity analysis. To give a nuanced study of the performance of the instrument, we divided the AEs found with RRR into four categories.
All AEs (all found AEs with causality Likert scale ≥3).
Preventable AEs (all AEs with preventability Likert scale ≥3).
Major AEs (preventable AEs with NCC MERP ≥F).
Selected AEs (AE types that correspond to the set of ‘AE’ ICD-codes).
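How the four categories relate to the review grades can be sketched as follows. This is a minimal Python illustration, not the study's own code; the function name `categorise_ae` and its argument names are assumptions, and the flag `type_in_code_set` stands in for the mapping between AE types and the instrument's ICD codes, which is not reproduced in the text.

```python
NCC_MERP_ORDER = "EFGHI"  # severity categories used in the study, mildest first


def categorise_ae(causality, preventability, severity, type_in_code_set):
    """Return the set of analysis categories an assessed event falls into.

    causality, preventability: 4-point Likert grades (1 to 4).
    severity: NCC MERP category letter, 'E' to 'I'.
    type_in_code_set: True if the AE type corresponds to the instrument's codes.
    """
    categories = set()
    if causality < 3:
        return categories  # graded 1 or 2: excluded, not counted as an AE
    categories.add("all")
    if preventability >= 3:
        categories.add("preventable")
        # Major AEs are preventable AEs with severity F or worse.
        if NCC_MERP_ORDER.index(severity) >= NCC_MERP_ORDER.index("F"):
            categories.add("major")
    if type_in_code_set:
        categories.add("selected")
    return categories
```

For example, an event graded 3 for causality, 3 for preventability and F for severity would fall into the all, preventable and major categories.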
We did two different validations for the four AE categories:
AEs found (with RRR) during both index and readmissions versus the instrument (only readmissions).
AEs found (with RRR) during only readmissions versus the instrument.
We performed the two separate validations for all AE categories for all patients and with the subsets of acute and elective patients. The rationale for the multiple validations was to test different nuances of the instrument.
Adjusted sensitivity and specificity were calculated for both 30 and 90 days. The sensitivity and specificity were calculated in each sample group and multiplied by the group proportion (population group/total population). The products of all groups were summed, and the result was the adjusted sensitivity and specificity for the population.
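The weighting described above can be sketched in a few lines. This Python example is illustrative only (the study used R); the function name, the dictionary keys and the counts in the usage example are made up for the illustration. How groups without any gold-standard positives or negatives were handled is not stated in the text, so the sketch assumes that such a group contributes zero.

```python
def adjusted_sens_spec(groups):
    """groups: list of dicts with sample counts tp, fn, tn, fp and
    population (number of patients in that selection group overall)."""
    total_population = sum(g["population"] for g in groups)
    sensitivity = 0.0
    specificity = 0.0
    for g in groups:
        weight = g["population"] / total_population  # group proportion
        positives = g["tp"] + g["fn"]  # gold-standard (RRR) positives
        negatives = g["tn"] + g["fp"]  # gold-standard (RRR) negatives
        # ASSUMPTION: a group with an empty denominator contributes 0.
        group_sens = g["tp"] / positives if positives else 0.0
        group_spec = g["tn"] / negatives if negatives else 0.0
        sensitivity += group_sens * weight
        specificity += group_spec * weight
    return sensitivity, specificity


# Made-up counts for two groups, purely to show the arithmetic.
example = [
    {"tp": 2, "fn": 8, "tn": 45, "fp": 5, "population": 900},
    {"tp": 6, "fn": 4, "tn": 8, "fp": 2, "population": 100},
]
sens, spec = adjusted_sens_spec(example)
print(round(sens, 2), round(spec, 2))
```

With these invented counts, the large group dominates the result, which is the intended effect of weighting by the population share rather than the sample share.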
The adjusted cumulative incidence for 30 and 90 days was calculated by dividing the number of patients with an AE in each group by the group sample size, generating a rate for that group. This rate was multiplied by the group proportion (population group/total population). The products of all 10 groups were summed to provide the adjusted cumulative incidence. The same method was used to calculate the adjusted cumulative incidence of preventable AEs and major AEs.
We used the selection group tables for acute and elective patients separated for the analysis of sensitivity and specificity for acute and elective patients and the two tables pooled together for the analysis of all patients.
The incidence rate was calculated by taking the total number of AEs identified within 30 days after surgery in each selection group, dividing it by the sample group size and then multiplying it by the group proportion. The sum of these products over all groups was the incidence rate in AEs/person-month.
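The adjusted cumulative incidence and the incidence rate follow the same weighting pattern and can be sketched together. This is an illustrative Python sketch with assumed names and invented numbers; it also assumes, as the unit AEs/person-month suggests, that each patient contributes one person-month for the 30-day window.

```python
def adjusted_incidence(groups):
    """groups: list of dicts with
       sample_size       patients reviewed in the group
       patients_with_ae  reviewed patients with at least one AE
       ae_count          AEs found in the group (within 30 days for the rate)
       population        patients in the group in the whole population
    Returns (adjusted cumulative incidence, adjusted AEs per person-month)."""
    total_population = sum(g["population"] for g in groups)
    cumulative = 0.0
    rate = 0.0
    for g in groups:
        weight = g["population"] / total_population  # group proportion
        cumulative += g["patients_with_ae"] / g["sample_size"] * weight
        rate += g["ae_count"] / g["sample_size"] * weight
    return cumulative, rate


# Invented numbers for two groups, only to show the calculation.
example = [
    {"sample_size": 100, "patients_with_ae": 20, "ae_count": 30, "population": 800},
    {"sample_size": 100, "patients_with_ae": 60, "ae_count": 120, "population": 200},
]
cumulative, rate = adjusted_incidence(example)
print(round(cumulative, 2), round(rate, 2))
```

The rate can exceed the cumulative incidence because a patient with several AEs is counted once in the incidence but once per AE in the rate.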
Cohen’s kappa was calculated for inter-rater reliability between the primary reviewers.20 Bootstrap samples (n=2000) were used to calculate the 95% CIs.
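For readers unfamiliar with these statistics, Cohen's kappa and a bootstrap confidence interval can be sketched as below. The study used the R packages irr and boot; this standard-library Python version is only an illustration, and the choice of a percentile interval, the fixed seed and the function names are assumptions, since the text does not state which bootstrap interval type was used.

```python
import random


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters' categorical judgements on the same records."""
    n = len(ratings_a)
    labels = set(ratings_a) | set(ratings_b)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal proportions per label.
    expected = sum(
        (ratings_a.count(label) / n) * (ratings_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def bootstrap_ci(ratings_a, ratings_b, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI, resampling records (rating pairs) with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(ratings_a, ratings_b))
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(pairs) for _ in pairs]
        a, b = zip(*sample)
        estimates.append(cohens_kappa(list(a), list(b)))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * n_boot)]
    upper = estimates[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Invented ratings for ten records (1 = trigger found, 0 = not found).
rater_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
print(round(cohens_kappa(rater_1, rater_2), 2))
print(bootstrap_ci(rater_1, rater_2))
```

In the invented example the raters agree on 8 of 10 records while chance agreement is 0.5, so kappa is (0.8 - 0.5)/(1 - 0.5) = 0.6.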
We used R (V. 3.5.2) and packages dplyr, boot, irr, htmlTable and Gmisc.
The study population consisted of 21 774 patients. We included 2000 patients weighted according to the selection group table (see online supplementary table A2). Two patients were excluded. The first patient had no available medical record, a short primary admission, no readmissions and was unlikely to have sustained an AE. The second patient had a hip fracture treated with internal fixation, presumably because of a faulty registration in the SHAR. After exclusion, 1998 patients with a total of 5422 inpatient admissions and outpatient visits in 69 hospitals were reviewed and included in the analysis (figure 1).
The study cohort comprised 667 acute hip fracture patients and 1331 elective patients, and 63% of the patients were women. The hip fracture group comprised more women, had older patients and involved a longer length of stay during the index admission (table 1).
Identified AEs and rate of ICD-10 codes
In total, we found 2116 AEs in 1171 (58.6%) patients. Of these, 1605 AEs (75.9%) in 975 (48.8%) patients were classified as preventable AEs, 1066 AEs (50.4%) in 744 (37.2%) patients were classified as major AEs and 1206 (57.0%) in 829 (41.5%) patients were classified as selected AEs. The 667 acute patients sustained 981 (46.4%) of these and the elective patients sustained 1135 (53.6%). The acute patients sustained 758 (47.3%) of the preventable AEs and 431 (40.4%) of the major AEs.
Of the 2116 found AEs, an ICD-10 code for the AE was found in 1145 (54.1%) records, in 879 (54.8%) of the 1605 preventable AEs, in 787 (73.8%) of the 1066 major AEs and in 758 (62.9%) of the 1206 selected AEs.
The majority of AEs occurred during the index admission (n=1260, 59.5%), and 443 (35.2%) of them had an ICD-10 code. The number of AEs that occurred during readmission within 30 days after surgery was 590 (27.9%) and 476 (80.7%) had an ICD-10 code. The number of AEs that occurred during readmission within 90 days after surgery was 856 (40.5%), and 702 (82.0%) had an ICD-10 code.
The group of AEs that had the highest rate of ICD-10 codes was thrombosis and embolism, at 91.6%. AEs related to the surgical procedure, such as dislocation, had the second highest rate (76.1%), and bleeding that did not occur during the operation had the third highest rate (75.7%). The group of AEs that had the lowest rate of codes was pressure ulcers (5.3%), followed by skin and superficial vessel damage (6.3%) and neurological AEs (14.6%) (table 2).
The AE types that had the highest rate of available ICD codes were acute myocardial infarction and stroke, with 100% available codes, followed by dislocation (98.5%), periprosthetic joint infection (96.0%), pulmonary embolism (95.3%) and fracture caused by falling (90.2%). Ten different individual types of AEs were not coded at all (see online supplementary table A6).
Adjusted cumulative incidence and incidence rate
The adjusted cumulative incidence for patients sustaining at least one AE was 28.4% for 30 days and 29.5% for 90 days (table 3). The acute patients had a higher incidence than the elective patients with 51.4% compared with 17.2% for 30 days and 52.1% compared with 18.6% for 90 days. The incidence of preventable AEs and major AEs was also higher for the acute patients compared with the elective, both for 30 and 90 days.
The incidence rate for all AEs was 0.43 AEs per person-month (95% CI: 0.39 to 0.47). For preventable AEs, the incidence rate was 0.32 (95% CI: 0.29 to 0.35), and for major AEs, the incidence rate was 0.22 (0.20 to 0.25).
Adjusted sensitivity and specificity
Adjusted sensitivity and specificity for all AEs were 5.7% and 95.2%, respectively, at 30 days, and 14.8% and 92.1%, respectively, at 90 days (table 4). This was the comparison that used the widest definition of AEs, covering those found from surgery until 90 days post-operatively. The sensitivity and specificity for the narrowest definition of AE, which compared only readmissions, were 3.0% and 93.5%, respectively, at 30 days, and 26.6% and 90.5%, respectively, at 90 days.
The acute patients had higher sensitivity but lower specificity compared with the elective patients, for all classes of AEs, for both 30 and 90 days.
The inter-rater reliability values of the primary reviewers’ judgements concerning whether at least one trigger or potential AE was identified in the record were κ=0.828 and 0.965, respectively. The inter-rater reliability for whether the record was to be forwarded to secondary review was κ=0.965. The inter-rater reliability values for the identification of a specific event or whether that event was a potential AE were κ=0.65 and 0.873, respectively.
In this retrospective multi-centre cohort study using RRR on 1998 patients who had undergone hip arthroplasty surgery, we validated an instrument based on ICD-codes from NPR. We found a high incidence of AEs, and more than every fourth patient sustained an AE. The incidence was higher for the acute patients and every other acute patient sustained an AE, compared with almost every fifth elective patient. Almost two-thirds of the AEs occurred during the index admission and the difference between AEs within 30 days and 90 days was below two percentage points.
We found a low overall rate of coded AEs for all and preventable AEs (55%) and a higher rate for major AEs (73%).
We validated different nuances of the instrument and found that sensitivity was low, and at best every fourth patient with an AE is detected. We found that for all different nuances the specificity was high with the best result of 97%. Maas et al 21 compared ICD-codes with record review and also found low sensitivity and high specificity. When we compared the AEs found (with RRR) during readmissions with the instrument, the sensitivity was lower for all AEs within 30 days. This was due to the smaller total number of true positives and their distribution across fewer selection groups in the readmissions versus instrument comparison.
The definition of AEs in this study is wide and can be considered as excessive by some. The rationale behind the choice of GTT as the method for identifying AEs was not that we wanted the instrument to fail or to imply that hip arthroplasty is a dangerous procedure. When we decided to do a record review validation, we wanted to use the method that has proven to identify the most AEs to ensure that we had the highest quality data possible. The range of severity of the found AEs is wide and it is easier to remove irrelevant AEs from a dataset than the opposite.
As expected, our definition and method for measuring AEs yielded higher rates than for example Huddleston et al,3 who used data abstraction from Medicare records and found a 30 day AE rate of 5.8% after total hip arthroplasty. Studies on AEs in mixed orthopaedic patients using the GTT have shown rates of 15%–30%.22 23
Preventability can be hard to assess in RRR. To ensure concordant assessments, some AEs, such as falls, prosthetic dislocation and pressure ulcers, were always classed as preventable in the study. The combination of our inclusive definition of preventability and structured RRR might explain why the rate of preventable AEs in elective patients was more than double what Jorgensen et al 24 found in their study on total knee and hip replacements. However, our incidence of preventable AEs is in accordance with another national GTT study in orthopaedic care.23
The use of administrative data for measuring AEs after orthopaedic surgery has been studied by Sebastien et al.25 The authors compared the Agency for Healthcare Research and Quality's Patient Safety Indicators (AHRQ-PSI), an ICD code-based instrument, with the American College of Surgeons National Surgical Quality Improvement Programme (ACS-NSQIP), a system that uses trained surgical clinical reviewers and well-defined criteria to identify AEs. In their study on mixed orthopaedic patients, the AHRQ-PSI revealed an AE rate of 1%, and the ACS-NSQIP revealed an AE rate of 22%. The authors concluded that the instruments were unable to adequately assess AEs in orthopaedic surgery. Best et al 26 compared the ACS-NSQIP with administrative data for AEs after surgery and found similar results to this study, a sensitivity of more than 50% in only 23% of the selected AEs. Classen et al 9 also compared the AHRQ-PSI with the GTT and found that the AHRQ-PSI fared very poorly.
The examined instrument is used to compare the quality of care in different Swedish hospitals, and this is one of the quality indicators that determines economic reimbursement to the hospitals. Given the instrument's low sensitivity for detecting AEs, the validity of these comparisons is questionable. The instrument algorithm is also used by Healthcare in Numbers and by the Swedish Knee Arthroplasty Register to measure AEs following total knee arthroplasty.27 The use of the ICD-instrument for knee arthroplasties has not yet been validated, but our results from the elective hip patients imply that the use of the instrument might be questionable.
The largest obstacle to using administrative data with ICD-10 codes for measuring all AEs after hip arthroplasty is the low overall coding rate: a correct ICD-10 code was present for only about half of the AEs. Furthermore, we found that the majority of the AEs, including one-fifth of the dislocations, occurred during the index admission, so excluding the index admissions in an instrument will decrease the sensitivity.
Strengths and limitations
To our knowledge, this is the largest study on AEs after hip arthroplasty that uses RRR and the only study that includes both acute hip fracture patients and elective surgery patients, thereby including both total and hemi hip arthroplasties. The study contains a large study population and a multicentre design with a wide range of patients of all ages and types of hospitals. The 90-day follow-up is long enough to detect all acute and subacute AEs. The Swedish personal identity numbers and the NPR enabled us to review admissions, and this, in combination with the RRR method, decreased the risk of missing an AE to approximately zero and resulted in high-quality data on the AEs. All kappa values were classified as near-perfect agreement except for one that was classified as good agreement, indicating the good quality of the RRR.
The study period of 90 days after surgery in this study makes this analysis a study on short-term AEs and does not address late-onset AEs, such as aseptic loosening, one of the most common causes of revision surgery.28 The baseline data on the patients are from the registers, and information on patient factors, such as comorbidities and physical status, is lacking. Therefore, this study cannot identify risk factors for AEs. In addition, our results are only generalisable to healthcare systems where ICD codes are used to measure AEs. The weighted sample did not include type of hospital and we can therefore not calculate incidence for the different types of hospitals.
The conclusions from this study are that the incidence of AEs after hip arthroplasty is high and that the tested instrument cannot measure this correctly. Furthermore, because of the low reliability of the ICD-10 codes, an improved instrument needs to be based on robust variables, possibly in combination with ICD-10 codes, and also include AEs identified during index admission and a wider range of AE types.
The authors thank Marie Ax, Susanne Hansson, Ammar Jobory, Zara Hedlund, Mirta Stupin, Tim Hansson, Lovisa Hult-Ericson and Christina Jansson for valuable help in carrying out the study. We would also like to thank all department managers for access to the medical records and Per Nydert for help with the study database. Finally we would like to thank Christoffer C Jørgensen, who performed a splendid review of the manuscript, which improved this article tremendously.
Contributors MM collected, analysed and interpreted the data and contributed to the drafting of the work. MU contributed to the design of the study, collected data, contributed to the drafting of the work and revised the manuscript critically. CR contributed to the design of the study and the drafting of the work and revised the manuscript critically. OR contributed to the design of the study and to critically revising the work. AH collected data and critically revised the work. BS collected data and critically revised the work. KS collected data and critically revised the work. DS collected data and critically revised the work. MG contributed to the design of the study; collected, analysed and interpreted the data; contributed to the drafting of the work; and revised the manuscript critically. OS contributed to the design of the study, collected data, contributed to the drafting of the work and revised the manuscript critically. All authors have approved the final version of the manuscript and agree to be accountable for all aspects of the work.
Funding This study was funded by institutional grants from the Karolinska Institutet, Department of Clinical Sciences, Danderyd Hospital, from the regional agreement on medical training and clinical research (ALF) between Stockholm County Council and Karolinska Institutet, and from LÖF, the Swedish patient insurance programme. The grant providers were not involved in any part of the study, in the writing of the manuscript or in the decision to submit the manuscript for publication.
Competing interests None declared.
Ethics approval Ethical approval was provided by the Regional Ethics Committee of Gothenburg (516-13 and T732-13). Permission for data access for the reviewers was granted by the head of each respective unit. The patients did not provide an informed consent to the record review.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.
Patient consent for publication Not required.