
Original research
Pairing regression and configurational analysis in health services research: modelling outcomes in an observational cohort using a split-sample design
  1. Edward J Miech1,2,
  2. Anthony J Perkins3,
  3. Ying Zhang4,
  4. Laura J Myers1,2,
  5. Jason J Sico5,6,
  6. Joanne Daggy3,
  7. Dawn M Bravata1,2
  1. Quality Enhancement Research Initiative (QUERI) and Health Services Research and Development (HSR&D), Roudebush VA Medical Center, Indianapolis, Indiana, USA
  2. Center for Health Services Research, Regenstrief Institute Inc, Indianapolis, Indiana, USA
  3. Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, Indiana, USA
  4. Department of Biostatistics, University of Nebraska Medical Center, Omaha, Nebraska, USA
  5. Neurology Service, VA Connecticut Healthcare System, West Haven, Connecticut, USA
  6. Department of Neurology, Yale School of Medicine, New Haven, Connecticut, USA
  Correspondence to Dr Edward J Miech; edward.miech{at}va.gov

Abstract

Background Configurational methods are increasingly being used in health services research.

Objectives To use configurational analysis and logistic regression within a single data set to compare results from the two methods.

Design Secondary analysis of an observational cohort; a split-sample design involved randomly dividing patients into training and validation samples.

Participants and setting Patients who had a transient ischaemic attack (TIA) in US Department of Veterans Affairs hospitals.

Measures The patient outcome was the combined endpoint of all-cause mortality or recurrent ischaemic stroke within 1 year post-TIA. The quality-of-care outcome was the without-fail rate (proportion of patients who received all processes for which they were eligible, among seven processes).

Results For the recurrent stroke or death outcome, configurational analysis yielded a three-pathway model identifying a set of (validation sample) patients where the prevalence was 15.0% (83/552), substantially higher than the overall sample prevalence of 11.0% (relative difference, 36%). The configurational model had a sensitivity (coverage) of 84.7% and specificity of 40.6%. The logistic regression model identified six factors associated with the combined endpoint (c-statistic, 0.632; sensitivity, 63.3%; specificity, 63.1%). None of these factors were elements of the configurational model. For the quality outcome, configurational analysis yielded a single-pathway model identifying a set of (validation sample) patients where the without-fail rate was 64.3% (231/359), nearly twice the overall sample prevalence (33.7%). The configurational model had a sensitivity (coverage) of 77.3% and specificity of 78.2%. The logistic regression model identified seven factors associated with the without-fail rate (c-statistic, 0.822; sensitivity, 80.3%; specificity, 84.2%). Two of these factors were also identified in the configurational analysis.

Conclusions Configurational analysis and logistic regression represent different methods that can enhance our understanding of a data set when paired together. Configurational models optimise sensitivity with relatively few conditions. Logistic regression models discriminate cases from controls and provide inferential relationships between outcomes and independent variables.

  • neurology
  • statistics & research methods
  • stroke



This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


Strengths and limitations of this study

  • Logistic regression and configurational methods (coincidence analysis; CNA) were applied to the same data to examine similarities and differences in results.

  • The split-sample approach to development and validation of models is a key methodological strength.

  • The results are based on data from the Department of Veterans Affairs and may not generalise to other healthcare systems.

Introduction

Configurational Comparative Methods (CCMs) have been used in a wide variety of disciplines since at least the 1990s and have recently started to gain traction in the general medical research literature1–4 as well as within implementation science.5 6 CCMs draw on mathematical approaches that are fundamentally different from those used in regression modelling, which is commonly used in health services research. Specifically, CCMs draw on Boolean algebra and set theory to identify specific combinations of conditions that lead to an outcome of interest and to determine whether multiple solution paths yield the same outcome (ie, equifinality).7–9

Although CCMs and logistic regression offer the potential for synergistic understanding of complex clinical situations, few studies in the medical literature10 have used both approaches within a single data set.11–14 The objective of the current study was to use both CCMs and logistic regression to independently derive and validate two models (one for mortality or recurrent stroke and the other for quality of care) among patients who had a transient ischaemic attack (TIA). Two outcomes were chosen because they provided different methodological challenges. The combined endpoint of death or recurrent stroke is relatively uncommon among patients who had a TIA15 16 and therefore presented the problem of predicting rare but important events; this may, for example, limit logistic regression modelling due to constraints on the number of outcome events per independent variable.17 18 The quality of care metric was available for the majority of patients but few robust predictors of quality at the patient level have been previously identified.19 In contrast, if a small set of key variables were strongly associated with an outcome, it would be expected that both regression and configurational methods would produce similar findings, limiting the potential insights available from comparing results across methods. Furthermore, if a variable is only weakly associated with an outcome, then the inconsistent relationship between configurations and an outcome could hinder the identification of a solution pathway from configurational methods. Across methods we sought to examine similarities and differences in factor selection (ie, variables or configurations that were included in the final models) as well as compare sensitivity, specificity, c-statistics and positive and negative predictive values.

Methods

This analysis was part of the Protocol-guided Rapid Evaluation of Veterans Experiencing New Transient Neurological Symptoms (PREVENT) project to improve quality of TIA care in Veterans Health Administration (VA) facilities.15 20 21 We identified patients with TIA who were cared for in any VA Emergency Department (ED) or inpatient setting based on primary discharge codes for TIA (International Classification of Diseases (ICD)-10 codes G45.0, G45.1, G45.8, G45.9 and I67.848) during the period between October 2016 and September 2017. The unit of analysis was the patient who had a TIA.

Patient and public involvement statement

This analysis did not include patient or public involvement.

Data sources

Electronic health record data were obtained from the VA Corporate Data Warehouse (CDW).22 23 CDW data included: inpatient and outpatient data files (eg, clinical encounters with associated diagnostic and procedure codes) in the 5 years pre-event to identify medical history,24 healthcare utilisation and receipt of procedures (Current Procedural Terminology, Healthcare Common Procedures Coding System and ICD-9 and ICD-10 procedure codes). CDW data were also used for vital signs, laboratory data, allergies, imaging, orders, medications and clinical consults. Mortality status was obtained from the VA Vital Status File.25 Recurrent stroke events were identified using a combination of VA CDW data and fee-basis data (which describes healthcare services that were paid for by the VA but that were obtained by Veterans in non-VA facilities).

Outcomes

The combined endpoint of all-cause mortality or recurrent ischaemic stroke within 1 year post-discharge from the index TIA event was the primary patient outcome. Recurrent ischaemic stroke events included ED visits or hospitalisations and were identified on the basis of ICD-10 codes (I63, I66, I67.89, I97.81 and I97.82).

The quality of care outcome was the ‘without-fail’ rate (also referred to as defect-free26 27 care), which is an ‘all-or-none’ measure of care quality.28 It was calculated as the proportion of Veterans who had a TIA who received all of the processes of care for which they were eligible from among seven processes: brain imaging, carotid artery imaging, neurology consultation, hypertension control, anticoagulation for atrial fibrillation, antithrombotics and high/moderate potency statins.29 30 Processes of care were ascertained from electronic health record data using validated algorithms.30 31 The without-fail rate was based on guideline32 33 recommended processes of care and has been associated with improved outcomes.34 Given the all-or-none nature of the without-fail rate, it can be relatively difficult to change and even small improvements in the absolute rate may reflect substantial changes in practice.28 For the regression analyses modelling the without-fail rate, quality measures were recoded such that pass=1, not eligible=0 and fail=0 to avoid reducing sample size by eliminating ineligible patients.
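
For illustration only, the recoding of a single process-of-care measure might look like the following R sketch; the data frame and column names are hypothetical placeholders rather than variables from the study data set.

```r
# Hypothetical recoding of one of the seven processes of care for the
# regression analyses: pass = 1; fail or not eligible = 0.
dat$statin_pass <- as.integer(dat$statin_measure == "pass")
```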

Analytical overview

We analysed this same data set with configurational analysis and logistic regression modelling. We randomly divided the overall data set (n=3079) into a ~70% training sample (2192/3079) and ~30% validation sample (887/3079).35 The training sample was independently analysed by a configurational analyst (EJM) and a biostatistician (AJP); this split-sample approach was used to enhance within-method validity. For the combined endpoint of all-cause mortality or recurrent ischaemic stroke within 1 year post-discharge from the index TIA event, we included both baseline patient characteristics (eg, age) as well as processes of care (eg, hypertension control) in the modelling. The without-fail model included only processes of care. Model performance was tested using the validation sample.
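
As a minimal sketch of the split-sample step, assuming a data frame named tia holding the analytic cohort (a placeholder name) and an arbitrary random seed, the roughly 70/30 division could be reproduced in R as follows; the study's actual split may have been generated differently.

```r
# Hypothetical reproduction of the ~70/30 random split; 'tia' stands in for
# the analytic data set of 3079 patients and the seed is illustrative only.
set.seed(1234)
n <- nrow(tia)                                      # n = 3079 in the study
train_idx  <- sample(seq_len(n), size = round(0.7 * n))
training   <- tia[train_idx, ]                      # ~70% (2192/3079 in the study)
validation <- tia[-train_idx, ]                     # ~30% (887/3079 in the study)
```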

Configurational analysis

Configurational analyses were conducted with Coincidence Analysis—a relatively new approach within the broader family of CCMs6—using the R package ‘cna’.36

Definitions

Variables were baseline characteristics of patients (eg, history of hypertension), which could be expressed on a dichotomous or continuous scale. A configuration is a bundle of specific conditions (eg, a history of hypertension was present). Consistency, or positive predictive value, is the number of cases covered by the solution that have the outcome of interest divided by all cases covered by the solution. Coverage, or sensitivity, is the number of cases covered by the solution that have the outcome of interest divided by all cases with the outcome of interest. Complexity is the number of discrete conditions in a configuration. Ambiguity describes a situation where more than one model generated by the configurational analysis fits the data equally well.
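
In crisp-set terms, consistency and coverage reduce to simple proportions. The following toy R sketch (illustrative vectors, not study data) makes the definitions concrete.

```r
# Toy illustration: 'covered' = 1 if a case instantiates the solution,
# 'outcome' = 1 if the case has the outcome of interest.
consistency <- function(covered, outcome) sum(covered & outcome) / sum(covered)  # positive predictive value
coverage    <- function(covered, outcome) sum(covered & outcome) / sum(outcome)  # sensitivity

covered <- c(1, 1, 1, 0, 0, 1)
outcome <- c(1, 1, 0, 1, 0, 1)
consistency(covered, outcome)  # 0.75: 3 of 4 covered cases have the outcome
coverage(covered, outcome)     # 0.75: 3 of 4 cases with the outcome are covered
```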

Analytic steps

We began with a multi-step data reduction approach that has been described previously.1 2 37–39 Briefly, we used the ‘minimally sufficient conditions’ function in ‘cna’ to examine all 48 candidate factors (eg, patient characteristics, medical history, characteristics of the index cerebrovascular event, vital signs, laboratory data, medications and processes of care) in the analysis with the outcome of interest across all 2192 cases in the training sample and to identify bundles of conditions with the strongest connections to the outcome condition. Factors in the analysis that were not already categorical or ordinal were binned; for example, age was categorised into 5-year increments (eg, 55–59, 60–64, 65–69 years). We performed this process separately for the two outcomes of interest: mortality or recurrent stroke within 1 year; and the without-fail rate. When analysing these combinations of conditions, we considered all 1-condition, 2-condition and 3-condition bundles instantiated in the data set (meaning patients with these specific combinations of conditions were present within the sample) that satisfied the consistency threshold.
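
A minimal sketch of this step with the R ‘cna’ package is shown below, assuming a recent package version; the data frame name, outcome factor name and maxstep setting are illustrative placeholders rather than the study's actual specification.

```r
library(cna)

# Build a configuration table from a binned, multi-value version of the
# training sample; 'train_binned' and the outcome factor 'DS' (death or
# recurrent stroke) are placeholder names.
train_ct <- configTable(train_binned, type = "mv")

# Enumerate minimally sufficient conditions for the outcome, limited to
# bundles of at most three conditions (first element of maxstep).
sol <- cna(train_ct, outcome = "DS=1", con = 0.145, cov = 0.15,
           maxstep = c(3, 1, 3))
msc(sol)   # condition table of candidate 1-, 2- and 3-condition bundles
```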

We used a dual minimum threshold to identify patient characteristics to use in model iteration: a prevalence threshold of ≥0.145 (via the ‘consistency’ function available in the R ‘cna’ package using multi-value cna) and a coverage score of ≥0.15. These cutoffs were selected to ensure individual configurations were clinically relevant. Specifically, given that the overall outcome rate of death or stroke at 1 year post-TIA was 11.3% (349/3079), a prevalence threshold of ≥0.145 identified configurations with a mortality or stroke rate at least three percentage points higher (ie, 14.5% vs 11.3%) in absolute terms than the overall population, or ≥25% higher in relative terms. For the without-fail rate, the overall outcome rate was 34.4% (1058/3079) and the prevalence threshold was set at ≥50%, a rate that was at least 15 percentage points higher in absolute terms (ie, 50% vs 34.4%), or ≥40% higher in relative terms. In this sense, the configurational analysis sought to identify distinct ‘phenotypes’ of patients who had substantially different outcome rates (as a group) than the overall sample. The coverage threshold of ≥0.15 ensured that the configurations applied to at least 15% of individuals with the outcome and was used to avoid overfitting.
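
Continuing the hedged sketch above, the dual minimum threshold could be applied by filtering the condition table returned by msc() on its consistency (con) and coverage (cov) columns.

```r
# Apply the dual minimum threshold to the candidate configurations: keep only
# bundles with consistency >= 0.145 and coverage >= 0.15 (death/stroke outcome).
cand <- msc(sol)
cand <- cand[cand$con >= 0.145 & cand$cov >= 0.15, ]
cand[order(-cand$con, -cand$cov), ]   # inspect, strongest candidates first
```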

We next generated a ‘condition table’ to list and organise the output. In a condition table, rows list configurations of conditions that meet a specified prevalence threshold, and column variables include outcome status, condition, consistency, coverage and complexity. We generated condition tables by specifying a prevalence threshold of 1.0 (ie, 100%). If we did not find any potential configurations that met our initial dual threshold (ie, prevalence threshold of 1.0 and a coverage score of ≥0.15), we then iteratively lowered the specified prevalence threshold by 0.05 (eg, from 1.0 to 0.95) and repeated the process of generating a new condition table. We continued this process until at a given prevalence threshold it was possible to identify at least two potential configurations (or ‘phenotypes’) of patient characteristics that met the specified prevalence threshold as well as the ≥15% coverage level. Using this approach, we inductively analysed the training sample and identified a subset of five candidate difference-making factors to use in the subsequent modelling phase.
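
The iterative lowering of the threshold can be rendered schematically as a loop over the same candidate table; this is a sketch of the procedure described above (using the clinical-outcome floor of 0.145), not the authors' code.

```r
# Start at a consistency ('prevalence') threshold of 1.0 and lower it in
# steps of 0.05 until at least two configurations meet it together with
# >= 0.15 coverage.
con_threshold <- 1.0
repeat {
  hits <- cand[cand$con >= con_threshold & cand$cov >= 0.15, ]
  if (nrow(hits) >= 2 || con_threshold <= 0.145) break
  con_threshold <- con_threshold - 0.05
}
hits   # candidate 'phenotypes' retained for model building
```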

We next developed candidate models with these five factors by iteratively applying the model-building function within the ‘cna’ software package in R using multi-value cna. We assessed models based on their overall consistency and coverage, as well as potential model ambiguity.40 We selected a final model based on these same criteria.
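
A hedged sketch of the model-building step is shown below: the configuration table is restricted to the retained candidate factors (hypothetical names) and ‘cna’ is asked for atomic and complex solution formulas.

```r
# Restrict the configuration table to the five retained candidate factors plus
# the outcome; factor names are hypothetical placeholders.
keep <- c("TIAHX", "HTNHX", "NSAID", "HASBLED", "DEMENTIA", "DS")
sol2 <- cna(configTable(train_binned[, keep], type = "mv"),
            outcome = "DS=1", con = 0.145, cov = 0.15)

asf(sol2)   # atomic solution formulas (single pathways to the outcome)
csf(sol2)   # complex solution formulas (candidate multi-pathway models)
```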

Logistic regression

Multivariable logistic regression was conducted using SAS Enterprise Guide V.7.11. Models were constructed using forward and backward selection procedures in the HPLOGISTIC procedure using the Schwarz Bayesian Criterion. Patient clinical characteristics as well as processes of care were included in the modelling. The final models for the backward and forward procedures identified the same set of variables for each outcome. To calculate sensitivity and specificity, we chose a cut-point of the estimated probabilities at which the distance between the receiver operating characteristic (ROC) curve and the point of perfect classification (sensitivity 1, specificity 1) was minimised in the ROC diagram for the training sample. We used a predicted probability of 0.096 as the cut-point for the clinical outcome, and 0.490 for the quality of care model. In this way, each patient was dichotomised as yes versus no for risk of the outcome.
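
The regression modelling was performed in SAS; purely as an illustrative analogue, a similar workflow in R might use stepwise selection with a BIC penalty and the ‘pROC’ package to locate the cut-point. The data frame, outcome and variable names below are placeholders.

```r
library(pROC)

# Stepwise selection penalised by the Bayesian information criterion
# (k = log(n)), analogous in spirit to SAS HPLOGISTIC with the Schwarz
# Bayesian Criterion; 'training' and 'death_stroke' are placeholder names.
null_fit <- glm(death_stroke ~ 1, data = training, family = binomial)
full_fit <- glm(death_stroke ~ ., data = training, family = binomial)
final    <- step(null_fit, scope = formula(full_fit), direction = "both",
                 k = log(nrow(training)), trace = 0)

# Choose the training-sample cut-point closest to perfect classification
# (sensitivity 1, specificity 1) on the ROC curve.
roc_train <- roc(training$death_stroke, predict(final, type = "response"))
coords(roc_train, x = "best", best.method = "closest.topleft")
```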

Model comparisons

The sensitivity (coverage), specificity, positive predictive value, negative predictive value and the c-statistic were examined and compared between the methods for both outcomes. For the logistic regression, the first area under the ROC curve (c-statistic) was calculated with all the variables in the model, using the continuous predicted probability. As described above, for the comparison of the two methods, we used a cut-point on the predicted probability that jointly maximised the sensitivity and specificity. We created a new variable describing the predicted outcome (1 if p>cut-point; 0 otherwise). We then performed logistic regression using only that variable as the independent variable. This variable was also used to calculate sensitivity and specificity. Similarly, for the configurational analysis, we created a predicted outcome variable based on the configurational groupings and used it as the independent variable in a logistic regression to obtain a c-statistic.
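
As a sketch of this comparison step (reusing the hypothetical objects from the earlier sketches), the test characteristics and the c-statistic of a dichotomised predictor could be computed as follows.

```r
# Dichotomise the validation-sample predicted probabilities at the chosen
# cut-point (0.096 for the clinical outcome) and derive test characteristics.
pred <- as.integer(predict(final, newdata = validation, type = "response") > 0.096)
obs  <- validation$death_stroke

tab  <- table(pred = factor(pred, levels = 0:1), obs = factor(obs, levels = 0:1))
sens <- tab["1", "1"] / sum(tab[, "1"])   # sensitivity (coverage)
spec <- tab["0", "0"] / sum(tab[, "0"])   # specificity
ppv  <- tab["1", "1"] / sum(tab["1", ])   # positive predictive value (consistency)
npv  <- tab["0", "0"] / sum(tab["0", ])   # negative predictive value

# c-statistic of the binary predicted-outcome variable via logistic regression
c_model <- glm(obs ~ pred, family = binomial)
pROC::auc(obs, fitted(c_model))
```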

Results

The overall sample consisted of 3079 Veterans between the ages of 24 and 99 years (median age, 70 years; IQR 64–78) who presented at a VA medical facility with a TIA between October 2016 and September 2017. The baseline characteristics of the patients within the training and validation samples are provided in online supplemental file 1 and the process of care data are provided in online supplemental file 2. All patients had complete data both for the outcomes and potential explanatory factors, which included specific TIA processes of care as well as risk factors for recurrent stroke or death.

Patient outcome: death or recurrent stroke at 1 year

Configurational results

Among the training sample patients, the prevalence of the combined endpoint of death or recurrent stroke at 1 year post-TIA was 11.5% (251/2192). Configurational analysis yielded a three-pathway model comprising five conditions, in which the prevalence of death or stroke was 14.5% (193/1330). The configurational analysis identified the following three pathways: (1) having a history of TIA AND a history of hypertension AND not being prescribed a non-steroidal anti-inflammatory drug (NSAID); (2) having a HAS-BLED score41 (a measure of bleeding risk) of ≥3; or (3) having a history of dementia (table 1).

Table 1

Modelling results for death or recurrent stroke at 1-year post-TIA

Among patients in the validation sample, the death or stroke rate 1 year post-TIA was 11.0% (98/887) overall, and 15.0% (83/552) for patients within the three-pathway configurational model, a rate 36% higher in relative terms than the overall rate. This performance in the validation sample was better than in the training sample, where the rate for the same three-pathway configurational model was 26% higher in relative terms than the overall rate (ie, 14.5% compared with 11.5%). The configurational model had a coverage (sensitivity) of 84.7% in the validation sample, identifying 83 out of 98 patients with the outcome of death or recurrent stroke at 1 year; this outperformed the 76.9% coverage score (193/251) in the training sample (table 1). The configurational model had a specificity of 41.4% in the training sample and 40.6% in the validation sample (table 2).

Table 2

Test characteristics of the logistic regression and configuration models for death or recurrent stroke rate at 1-year post-TIA

Logistic regression results

The logistic regression model identified six factors that were associated with the combined endpoint of death or recurrent stroke at 1 year post-TIA (table 1): age, Charlson Comorbidity Index,42 the modified APACHE (Acute Physiology And Chronic Health Evaluation) score,43 current smoking status, palliative care or hospice and history of stroke. None of these six factors were elements of the configurational model. The c-statistic for the primary model was 0.747 for the training sample and 0.691 for the validation sample (table 1). The c-statistics for the logistic models used to calculate sensitivity and specificity (table 2) were 0.6888 for the training sample and 0.632 for the validation sample. The sensitivity was 75.3% in the training sample and 63.3% in the validation sample (table 2). The specificity was 62.3% in the training sample and 63.1% in the validation sample.

Quality of care outcome: the without-fail rate

Configurational results

Among the training sample patients, the prevalence of the without-fail rate was 34.6%. The configurational analysis (table 3) yielded a single-pathway model consisting of the conjunction of two processes—discharged on a high or moderate potency statin AND neurology consultation—where the without-fail rate was 67.3% (567/843). The final configurational model included 567 of the 759 patients with the outcome (ie, 74.7% coverage; table 3).

Table 3

Modelling results for without-fail rate

Among the validation sample patients, the without-fail rate was 33.7%. When applied to the validation sample, the single-pathway configurational model yielded a without-fail rate of 64.3% (231/359), which was nearly twice the observed sample prevalence. This model covered 231 of the 299 cases with the outcome (ie, 77.3% coverage; table 3). The configurational model had a specificity of 80.7% in the training sample and 78.2% in the validation sample (table 4).

Table 4

Test characteristics of the logistic regression and configuration models for without-fail rate at 1-year post-TIA

Logistic regression results

The logistic regression model identified seven factors that were associated with the without-fail rate: carotid artery imaging, hypertension medication intensification, hypertension control, discharged on statin, discharged on high or moderate potency statin, antithrombotics by hospital Day 2, and neurology consultation (see table 3). Two of these factors were also identified in the configurational analysis: discharged on a high or moderate potency statin and neurology consultation. The c-statistics were higher for this model of quality than for the patient outcome model. In the primary model, the c-statistic was 0.842 for the training sample and 0.841 for the validation sample (table 3). In the model used to calculate sensitivity and specificity, the c-statistic was 0.823 for the training sample and 0.822 for the validation sample (table 4). The sensitivity was 76.7% in the training sample and 80.3% in the validation sample. The specificity was 87.9% in the training sample and 84.2% in the validation sample.

Discussion

This study analysed one of the largest sample sizes used to date in a published configurational analysis, is one of the first to use a split-sample design featuring training and validation samples and is also one of the first to directly compare configurational and logistic regression results using identical data. The models developed by applying logistic regression and configurational analysis within the training sample were confirmed when tested against the validation sample. This was true for both the 1-year death or recurrent stroke outcome and the without-fail quality-of-care outcome. The results of this study demonstrate that configurational analyses and logistic regression, when applied to the same data set, can expand our understanding of the data. Key differences in the findings from the two methods as they were applied in the current study included: the focus of optimisation; the goal of making stochastic inferences versus empiric insights; and the possibility of conjunctivity.

Logistic regression models include variables to infer the presence or absence of the outcome and maximise the likelihood of the observed data under a parametrically well-structured model. The configurational models, by contrast, identified ‘phenotypes’ where particular groups of individuals sharing a specific bundle of characteristics had outcome rates substantially different from that of the overall sample. The logistic regression model is useful for making statistical inferences about variables’ effects on the binary outcome of interest, though it can also be applied to predict the outcome if a cut-off probability threshold is provided. In contrast, the configurational models pinpointed specific combinations of factor values that linked directly to the positive outcome of interest.

An expected pattern in results is that configurational analysis has an advantage over logistic regression in prediction of a dichotomous outcome when prevalence is low. This pattern was evident in the model of recurrent stroke or death at 1 year post-TIA (with a prevalence of 11.5% in the training sample), where the sensitivity in the validation sample was higher in the configurational model (84.7% (95% CI: 76.0% to 91.2%)) than in the logistic regression model (63.3% (95% CI: 52.9% to 72.8%)). Both approaches had equivalent c-statistics (configurational model, 0.626 (95% CI: 0.587 to 0.666); logistic model, 0.632 (95% CI: 0.581 to 0.683)). However, this advantage may diminish if the prevalence of the outcome is not rare, which was evident in the model using the quality outcome (with a prevalence of 34.6% in the training sample), where the sensitivity in the validation sample was similar in both approaches (configurational model, 77.3% (95% CI: 72.1% to 81.9%); logistic model, 80.3% (95% CI: 75.3% to 84.6%)), and the c-statistics were also similar (configurational model, 0.777 (95% CI: 0.748 to 0.801); logistic model, 0.822 (95% CI: 0.795 to 0.849)).

The models of the 1-year recurrent stroke or death rate differed dramatically, with no overlap between the factors included in the logistic regression model and the conditions in the configurational model. This observation may be attributed to correlations between variables. For example, the finding that increasing age was negatively correlated with taking NSAIDs (r=−0.215, p<0.0001; online supplemental file 3) may partially account for why age was included in the logistic model whereas not taking NSAIDs appeared in one of the pathways in the configurational model. In contrast, the models of the without-fail rate were overlapping. The configurational results were more parsimonious, though the logistic regression models could be further developed if parsimony were of particular interest.

The configurational results for the quality outcome (table 3) provide an example of Boolean conjunctivity, where a bundle of conditions that appear together is jointly sufficient for the outcome. Conjunctivity is an attractive characteristic of configurational methods and is particularly relevant to studies in healthcare settings given the inherent complexity within clinical medicine and health services research. In other words, for some complex phenomena it is expected that a combination of conditions, rather than a single factor alone, explains the outcome.

As described above, configurational methods differ from regression methods in terms of the underlying mathematical foundations, the focus on configurations of conditions (ie, factor values) versus variables, and the results in the output.44 The use of configurational methods is increasing within health services research in general and in implementation science in particular.37 The pairing of logistic regression and configurational methods may be particularly fruitful for implementation science in terms of describing difference-making patterns and identifying factors associated with an outcome at a particular site, especially if the outcome is uncommon or when there are few sites. Configurational methods are also increasingly used in mixed methods analyses; because the link to individual cases persists throughout a configurational analysis, investigators can examine qualitative data from key illustrative cases.45

Because regression methods have been widely used in health services research, investigators have experience in applying them and best practices have emerged to address common methodological difficulties. Future research, conducted either on real-world data or in simulations,46 should compare findings from configurational methods with regression analyses to advance our understanding of how configurational methods will perform in the following situations which are common in healthcare data: (1) the strength of the association between a variable and an outcome depends on the presence of another variable (eg, if implementation success is related to champion characteristics only in the presence of leadership support for a programme); (2) a rare characteristic is robustly associated with an outcome (eg, patients presenting with coma are at markedly increased risk of mortality; however, coma is an uncommon clinical presentation); (3) variables that are at least modestly associated with an outcome are correlated; (4) missing data, especially for factors that are at least modestly associated with an outcome; (5) limited diversity, especially for configurations that are related to an outcome (eg, few older persons included in a data set where the outcome is mortality); and (6) nested data (eg, patients within sites). Although regression analyses identify the same variables as being associated with an outcome whether modelling the presence or absence of an outcome, configurational methods can produce different results depending on whether a positive or negative outcome is being modelled.45 Future research should evaluate situations when this key difference between methods is most pronounced and hence most likely to provide novel insights.

Several limitations of this study should be noted. First, the results are based on data from the Department of Veterans Affairs, and therefore may not generalise to other healthcare systems.

Second, the outcomes used in this study were chosen to provide variation in prevalence rates and associations between variables and outcomes; however future studies could consider data sets with different characteristics (eg, varying sample sizes).

Third, the process of care variables were originally coded in three categories: pass among those eligible, fail among those eligible, and not eligible. However, patients who were not eligible for processes of care were generally the most critically ill patients (eg, hospice); being not eligible for a process was a strong predictor of mortality. By combining the fail and not eligible categories in the regression analyses, we were able to retain all patients and, as expected, hospice was associated with the combined endpoint of death or recurrent stroke.

Fourth, to calculate sensitivity and specificity, we chose a cut-point of the estimated probabilities at which the distance between the ROC curve and the point of perfect classification was minimised; different thresholds could have been used (eg, to optimise sensitivity). For example, one option would have been to use the observed outcome prevalence as the cut-point. Another approach would have been to use 0.5, which would be unlikely to perform well with rare outcomes. An alternative would have been to target a specific sensitivity (eg, 80%), in which case we would have used lower cut-points for both outcomes, at the expense of specificity. In contrast, we could have targeted a given specificity (eg, 80%), in which case we would have used a higher predicted probability cut-point and sensitivity would have been reduced.
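
For illustration, the alternative of targeting a fixed sensitivity could be explored with the ‘pROC’ package, reusing the hypothetical ROC object from the earlier sketch; the target value of 0.80 mirrors the example above.

```r
# Find the cut-point on the training ROC curve that achieves a target
# sensitivity of 0.80 and read off the specificity it would imply.
coords(roc_train, x = 0.80, input = "sensitivity",
       ret = c("threshold", "sensitivity", "specificity"))
```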

Fifth, previous work has demonstrated that conjuncts in configurational methods are not synonymous with interactions in regression.44 We did not systematically explore interactions within the logistic regression modelling.

Finally, we presented an example of how logistic regression and configurational methods could be used on the same data to glean different information. The analytical approaches are fundamentally different; we do not intend to suggest that one method is better than another. Future studies should consider both circumstances where other methods (eg, decision-tree analysis) can be used with configurational methods, and situations when alternative methods might be used in series rather than in parallel (eg, for variable selection or for dichotomising continuous variables).

Conclusions

Configurational analysis and logistic regression represent fundamentally different analytical methods. Configurational models optimise sensitivity with relatively few conditions and allow for equifinality. Logistic regression models provide inferential relationships between binary outcomes and independent variables as well as clinically useful measures to interpret effects (eg, ORs). Pairing these two diverse approaches offers a major new analytical option to health services researchers interested in leveraging multiple methodological perspectives to explore and model complex phenomena with greater nuance and understanding.

Data availability statement

No data are available. These data must remain on US Department of Veterans Affairs servers. Investigators interested in working with these data are encouraged to contact the corresponding author.

Ethics statements

Patient consent for publication

Ethics approval

The study was approved by the human subjects committee at the Indiana University School of Medicine Institutional Review Board and the Richard L. Roudebush VA Medical Center Research and Development Committee.

References

Supplementary materials

Footnotes

  • Twitter @edmiech

  • Contributors All authors read and approved the final manuscript. EJM and AJP: had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. DMB: obtained funding and was responsible for the design and conduct of the PREVENT study which is the data source used in the analyses; participated in data analysis conceptualisation, interpretation of the results and drafting and revising the manuscript. LJM: obtained the PREVENT data which is the data source used in the analyses and participated in data analysis conceptualisation. EJM and AJP: planned and executed the data analysis, participated in interpretation of the results and drafting and revising the manuscript. YZ and JD: participated in the interpretation of the results and the framing of the manuscript especially with regard to the mathematical and statistical foundations of the methods and the statistical applications of both methods. JJS: participated in interpretation of results and manuscript editing. EJM is responsible for the overall content as the guarantor.

  • Funding This work was supported by the Department of Veterans Affairs (VA), Health Services Research and Development Service (HSRD), Expanding Expertise Through E-health Network Development Quality Enhancement Research Initiative (QUERI; QUE HX0003205-01). The funding agency had no role in the design and conduct of the study; collection, management, analysis or interpretation of the data; preparation, review or approval of the manuscript; and decision to submit the manuscript for publication.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.