Protocol for a systematic review on the methodological and reporting quality of prediction model studies using machine learning techniques

Introduction
Studies addressing the development and/or validation of diagnostic and prognostic prediction models are abundant in most clinical domains. Systematic reviews have shown that the methodological and reporting quality of prediction model studies is suboptimal. Owing to the increasing availability of larger, routinely collected and complex medical data, and the rising application of artificial intelligence (AI) or machine learning (ML) techniques, the number of prediction model studies is expected to increase even further. Prediction models developed using AI or ML techniques are often labelled as a 'black box', and little is known about their methodological and reporting quality. Therefore, this comprehensive systematic review aims to evaluate the reporting quality, the methodological conduct, and the risk of bias of prediction model studies that applied ML techniques for model development and/or validation.

Methods and analysis
A search will be performed in PubMed to identify studies developing and/or validating prediction models using any ML methodology and across all medical fields. Studies will be included if they were published between January 2018 and December 2019, predict patient-related outcomes, use any study design or data source, and are available in English. Screening of search results and data extraction from included articles will be performed by two independent reviewers. The primary outcomes of this systematic review are: (1) the adherence of ML-based prediction model studies to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, and (2) the risk of bias in such studies as assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). A narrative synthesis will be conducted for all included studies. Findings will be stratified by study type, medical field and prevalent ML methods, and will inform necessary extensions or updates of TRIPOD and PROBAST to better address prediction model studies that used AI or ML techniques.

Ethics and dissemination
Ethical approval is not required for this study because only available published data will be analysed. Findings will be disseminated through peer-reviewed publications and scientific conferences.

Systematic review registration
PROSPERO, CRD42019161764.
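The publication window and language criteria stated in the Methods can be checked mechanically before manual screening. The following is a minimal sketch under stated assumptions: the `Record` fields, the language codes and the function name are hypothetical and are not part of the protocol, and the remaining criteria (for example, whether a study predicts a patient-related outcome) still require judgement by the two reviewers.

```python
from dataclasses import dataclass
from datetime import date

# Publication window taken from the protocol: January 2018 to December 2019.
WINDOW_START = date(2018, 1, 1)
WINDOW_END = date(2019, 12, 31)

# Assumed language codes; how language is encoded depends on the export used.
ENGLISH_CODES = {"eng", "english"}


@dataclass
class Record:
    """Hypothetical bibliographic record used only for this illustration."""
    identifier: str
    published: date
    language: str


def passes_automatic_criteria(record: Record) -> bool:
    """Return True if the record meets the date and language criteria.

    This does not decide inclusion; it only filters out records that
    cannot be eligible before manual screening.
    """
    in_window = WINDOW_START <= record.published <= WINDOW_END
    in_english = record.language.strip().lower() in ENGLISH_CODES
    return in_window and in_english
```

Records that fail this check can be excluded without full-text review; all other records proceed to screening by the reviewers.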

Differences or similarities in definitions with the development study are described
Yes / No / NA
(Mentioning of any differences in all four (setting, eligibility criteria, predictors and outcome) is required to score Yes. If it is explicitly mentioned that there were no differences in setting, eligibility criteria, predictors and outcomes, score Yes. For incremental value reports, in case additional predictors are not added to a previously developed prediction model but rather added to conventional predictors in a newly fitted model, score Not applicable.)

In which domains are differences?
Setting / Eligibility criteria / Predictors / Outcomes / No differences were reported / Other
If other, please specify __________________________________ (list using (;) to separate if more than 1.)

Is there a diagram/drawing to clarify the methods used?
Yes / No

Supplemental material. BMJ Publishing Group Limited (BMJ) disclaims all liability and responsibility arising from any reliance placed on this supplemental material which has been supplied by the author(s).

Source of data
Development
The study design or source of data is reported
Yes / No
(E.g. prospectively designed, existing cohort, existing RCT, registry/medical records, case control, case series. This needs to be explicitly reported; reference to this information in another article alone is insufficient.)

External validation
The study design or source of data is reported
Yes / No
(E.g. prospectively designed, existing cohort, existing RCT, registry/medical records, case control, case series. This needs to be explicitly reported; reference to this information in another article alone is insufficient.)

External validation
The study design or source of data is reported
Yes / No
(E.g. prospectively designed, existing cohort, existing RCT, registry/medical records, case control, case series. This needs to be explicitly reported; reference to this information in another article alone is insufficient.)

Development
The study setting is reported
Yes / No
(E.g. 'surgery for endometrial cancer patients' is considered to be enough information about the study setting.)

External validation
The study setting is reported
Yes / No
(E.g. 'surgery for endometrial cancer patients' is considered to be enough information about the study setting.)

External validation
The study setting is reported
Yes / No
(E.g. 'surgery for endometrial cancer patients' is considered to be enough information about the study setting.)

Development
What is the setting for the model?
Primary care / Secondary care / Tertiary care / General population / Other
(Primary care = GPs, dentists and pharmacists (often first point of care). Secondary care = hospital- or clinic-based care, which can be planned (e.g. cataract operation) or emergency (e.g. fracture). Tertiary care = highly specialised treatments (e.g. transplant, hip replacement).)

External validation
What is the setting for the model?
Primary care / Secondary care / Tertiary care / General population / Other
(Primary care = GPs, dentists and pharmacists (often first point of care). Secondary care = hospital- or clinic-based care, which can be planned (e.g. cataract operation) or emergency (e.g. fracture). Tertiary care = highly specialised treatments (e.g. transplant, hip replacement).)

External validation
What is the setting for the model?
Primary care / Secondary care / Tertiary care / General population / Other
(Primary care = GPs, dentists and pharmacists (often first point of care). Secondary care = hospital- or clinic-based care, which can be planned (e.g. cataract operation) or emergency (e.g. fracture). Tertiary care = highly specialised treatments (e.g. transplant, hip replacement).)

If other, please specify __________________________________

Development
The number of centres involved is reported
Yes / No
(If the number is not reported explicitly, but can be concluded from the name of the centre/centres, or if clearly a single centre study, score Yes.)

External validation
The number of centres involved is reported
Yes / No
(If the number is not reported explicitly, but can be concluded from the name of the centre/centres, or if clearly a single centre study, score Yes.)

External validation
The number of centres involved is reported
Yes / No
(If the number is not reported explicitly, but can be concluded from the name of the centre/centres, or if clearly a single centre study, score Yes.)

How many centres involved? __________________________________
How many centres involved? __________________________________
How many centres involved? __________________________________

Development
The geographical location (at least country) of centres involved is reported
Yes / No
(If no geographical location is specified, but the location can be concluded from the name of the centre(s), score Yes.)

External validation
The geographical location (at least country) of centres involved is reported
Yes / No
(If no geographical location is specified, but the location can be concluded from the name of the centre(s), score Yes.)

External validation
The geographical location (at least country) of centres involved is reported
Yes / No
(If no geographical location is specified, but the location can be concluded from the name of the centre(s), score Yes.)

If yes, what was the geographic location of the data collection?
Europe / North America / Latin America / Asia / Africa / Oceania
(Multiple answers are possible.)

Development
Actions to blind assessment of outcome to be predicted are reported
Yes / No
(If it is clearly a non-issue (e.g. all-cause mortality or an outcome not requiring interpretation), score Yes. In all other instances, an explicit mention is expected.)

External validation
Actions to blind assessment of outcome to be predicted are reported
Yes / No
(If it is clearly a non-issue (e.g. all-cause mortality or an outcome not requiring interpretation), score Yes. In all other instances, an explicit mention is expected.)

External validation
Actions to blind assessment of outcome to be predicted are reported
Yes / No
(If it is clearly a non-issue (e.g. all-cause mortality or an outcome not requiring interpretation), score Yes. In all other instances, an explicit mention is expected.)

Development
It is clearly described whether predictor assessments were blinded for outcome
Yes / No
(For predictors for which it is clearly a non-issue (e.g. automatic blood pressure measurement, age, sex) and for instances where the predictors were clearly assessed before outcome assessment, score Yes. For all other predictors an explicit mention is expected.)

External validation
It is clearly described whether predictor assessments were blinded for outcome
Yes / No
(For predictors for which it is clearly a non-issue (e.g. automatic blood pressure measurement, age, sex) and for instances where the predictors were clearly assessed before outcome assessment, score Yes. For all other predictors an explicit mention is expected.)

External validation
It is clearly described whether predictor assessments were blinded for outcome
Yes / No
(For predictors for which it is clearly a non-issue (e.g. automatic blood pressure measurement, age, sex) and for instances where the predictors were clearly assessed before outcome assessment, score Yes. For all other predictors an explicit mention is expected.)

Development
It is clearly described whether predictor assessments were blinded for the other predictors
Yes / No

External validation
It is clearly described whether predictor assessments were blinded for the other predictors
Yes / No

External validation
It is clearly described whether predictor assessments were blinded for the other predictors
Yes / No

Development
It is explained how the sample size was arrived at
Yes / No
(Is there any mention of sample size, e.g. whether this was done on statistical grounds or practical/logistical grounds (e.g. an existing study cohort or data set of an RCT was used)?)

External validation
It is explained how the sample size was arrived at
Yes / No
(Is there any mention of sample size, e.g. whether this was done on statistical grounds or practical/logistical grounds (e.g. an existing study cohort or data set of an RCT was used)?)

External validation
It is explained how the sample size was arrived at
Yes / No
(Is there any mention of sample size, e.g. whether this was done on statistical grounds or practical/logistical grounds (e.g. an existing study cohort or data set of an RCT was used)?)

Model building
Instructions
- Please extract the models in the order they are presented in the article.
- If more than 10 models were developed for the main outcome, only refer to the first 10.
- If a comparison with logistic regression was made, please include this model in the final count and extract information.
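The selection rule in these instructions can be sketched in code. This is an illustrative reading only, not part of the form: the function name, the name-based check for logistic regression, and the decision to append a logistic regression model that appears after the tenth model are all assumptions, since the instructions do not spell out that case.

```python
MAX_MODELS = 10  # limit stated in the instructions


def models_to_extract(models, is_logistic=None):
    """Return the models to extract, in the order presented in the article.

    `models` is a list of model names in order of presentation.
    `is_logistic` is an optional predicate; by default a model counts as
    logistic regression if its name contains "logistic" (an assumption).
    """
    if is_logistic is None:
        is_logistic = lambda name: "logistic" in name.lower()

    # Keep the first 10 models in order of presentation.
    selected = list(models[:MAX_MODELS])

    # Assumed reading: a logistic regression comparator beyond the first 10
    # is still added to the final count.
    selected.extend(m for m in models[MAX_MODELS:] if is_logistic(m))
    return selected
```

For an article presenting 11 ML models followed by a logistic regression comparator, this sketch returns 11 models: the first 10 plus the logistic regression model.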
How many models were developed for the primary outcome?
__________________________________
(This should reflect the number of models you are going to extract on (primary outcome and primary timepoint). If more than 10 models were developed, only refer to the first 10. If a comparison with logistic regression was made, please include this model in the final count.)

External validation
It is described how predictions for individuals (in the validation set) were obtained from the model
Yes / No
(E.g. using the coefficients of the original reported model being validated, with or without the intercept, and/or using updated or refitted model coefficients, or using a nomogram, spreadsheet or web calculator.)

Model 1
The type

The approach used for predictor selection before modelling is described
Yes / No / NA
('Before modelling' means before any univariable or multivariable analysis of predictor-outcome associations. If no predictor selection before modelling is done, score Not applicable. If it is unclear whether predictor selection before modelling is done, score No. If it is clear there was predictor selection before modelling but the method was not described, score No.)

The approach used for predictor selection during modelling is described
Yes / No / NA
(E.g. univariable analysis, stepwise selection, bootstrap, Lasso. 'During modelling' includes both univariable or multivariable analysis of predictor-outcome associations. If no predictor selection during modelling is done (so-called full model approach), score Not applicable. If it is unclear whether predictor selection during modelling is done, score No. If it is clear there was predictor selection during modelling but the method was not described, score No.)

What was the model building strategy?
Stepwise

Model Performance
Instructions
- Please extract the models in the order they are presented in the article.
- If more than 10 models were developed for the main outcome, only refer to the first 10.
- If a comparison with logistic regression was made, please include this model in the final count and extract information.
How many models were developed for the primary outcome?
__________________________________
(This should reflect the number of models you are going to extract on (primary outcome and primary timepoint). If more than 10 models were developed, please refer to the first 10 models. If a logistic regression model was fitted, please also extract data from this model.)

If yes, did the authors make use of leading words to reject those non-predictive models reported?
Yes / No
(E.g. the effect is said to be significant, although the 95% confidence interval of the adjusted odds ratio crosses 1; or words like "trend", "borderline", "significance", "statistically significant" are used.)
Please copy the statement below __________________________________________

Is uncertainty reported in the abstract?
Yes / No
(The use of any verbs such as "may" or "could", or any words such as "likely to" or "maybe".)

Limitations are reported in the abstract
Yes / No
Please copy the statement below __________________________________________

Any additional comment about the "abstract" section of this article?
__________________________________________
(If there is something in the "Abstract" section that does not fit into the questions of this form, please use this space to detail. Also use this space to detail anything you are unsure about.)

No information
(Yes/probably yes: If inclusion and exclusion of participants was appropriate, so participants correspond to unselected participants of interest. No/probably no: If participants are included who would already have been identified as having the outcome and so are no longer participants at suspicion of disease (diagnostic studies) or at risk of developing outcome (prognostic studies), or if specific subgroups are excluded that may have altered the performance of the prediction model for the intended target population. No information: When there is no information on whether inappropriate inclusions or exclusions took place.)

Development
Risk of bias introduced by Participants
Low ROB / High ROB / Unclear ROB
(Low risk of bias: If the answer to all signaling questions is "Yes" or "Probably yes," then risk of bias can be considered low. If ≥1 of the answers is "No" or "Probably no," the judgment could still be "Low risk of bias" but specific reasons should be provided why the risk of bias can be considered low. High risk of bias: If the answer to any of the signaling questions is "No" or "Probably no," there is a potential for bias, except if defined at low risk of bias above. Unclear risk of bias: If relevant information is missing for some of the signaling questions and none of the signaling questions is judged to put this domain at high risk of bias.)

External validation
Risk of bias introduced by Participants
Low ROB / High ROB / Unclear ROB
(Low risk of bias: If the answer to all signaling questions is "Yes" or "Probably yes," then risk of bias can be considered low. If ≥1 of the answers is "No" or "Probably no," the judgment could still be "Low risk of bias" but specific reasons should be provided why the risk of bias can be considered low. High risk of bias: If the answer to any of the signaling questions is "No" or "Probably no," there is a potential for bias, except if defined at low risk of bias above. Unclear risk of bias: If relevant information is missing for some of the signaling questions and none of the signaling questions is judged to put this domain at high risk of bias.)

Development
Were relevant model performance measures evaluated appropriately?
Yes / Probably yes; No / Probably no; No information
(Yes/probably yes: If both calibration and discrimination are evaluated appropriately (including relevant measures tailored for models predicting survival outcomes). No/probably no: If both calibration and discrimination are not evaluated, or if only goodness-of-fit tests, such as the Hosmer-Lemeshow test, are used to evaluate calibration, or if for models predicting survival outcomes performance measures accounting for censoring are not used, or if classification measures (like sensitivity, specificity, or predictive values) were presented using predicted probability thresholds derived from the data set at hand. No information: Either calibration or discrimination are not reported, or no information is provided as to whether appropriate performance measures for survival outcomes are used (e.g. references to relevant literature or specific mention of methods, such as using Kaplan-Meier estimates), or no information on thresholds for estimating classification measures is given.)

External validation
Were relevant model performance measures evaluated appropriately?
Yes / Probably yes; No / Probably no; No information
(Yes/probably yes: If both calibration and discrimination are evaluated appropriately (including relevant measures tailored for models predicting survival outcomes). No/probably no: If both calibration and discrimination are not evaluated, or if only goodness-of-fit tests, such as the Hosmer-Lemeshow test, are used to evaluate calibration, or if for models predicting survival outcomes performance measures accounting for censoring are not used, or if classification measures (like sensitivity, specificity, or predictive values) were presented using predicted probability thresholds derived from the data set at hand. No information: Either calibration or discrimination are not reported, or no information is provided as to whether appropriate performance measures for survival outcomes are used (e.g. references to relevant literature or specific mention of methods, such as using Kaplan-Meier estimates), or no information on thresholds for estimating classification measures is given.)
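To make the distinction between discrimination and calibration in this signaling question concrete, the sketch below computes one common measure of each for a binary outcome: the c-statistic and the observed:expected (O:E) ratio. It is an illustration only. The function names are hypothetical, the O:E ratio summarises calibration-in-the-large rather than calibration across the full risk range, and neither function accounts for censoring, so they would not be appropriate as written for survival outcomes.

```python
def c_statistic(outcomes, predictions):
    """Discrimination: proportion of event/non-event pairs in which the
    event has the higher predicted risk (ties count as 0.5)."""
    events = [p for y, p in zip(outcomes, predictions) if y == 1]
    non_events = [p for y, p in zip(outcomes, predictions) if y == 0]
    if not events or not non_events:
        raise ValueError("Need at least one event and one non-event.")

    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def observed_expected_ratio(outcomes, predictions):
    """Calibration-in-the-large: observed events divided by the sum of
    predicted risks. A value of 1 means the average prediction matches
    the observed event rate."""
    expected = sum(predictions)
    if expected == 0:
        raise ValueError("Sum of predicted risks is zero.")
    return sum(outcomes) / expected


# Toy data for illustration only (not taken from any study).
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(c_statistic(y, p))              # 3 of 4 pairs concordant -> 0.75
print(observed_expected_ratio(y, p))  # 2 observed / 1.65 expected
```

A model can discriminate well while being poorly calibrated, and vice versa, which is why the guidance asks for both to be evaluated.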

Development
Were model overfitting, under-fitting, and optimism in model performance accounted for?
Yes / Probably yes; No / Probably no; No information

Development
Risk of bias introduced by the analysis
Low ROB / High ROB / Unclear ROB
(Low risk of bias: If the answer to all signaling questions is "Yes" or "Probably yes," then risk of bias can be considered low. If ≥1 of the answers is "No" or "Probably no," the judgment could still be low risk of bias, but specific reasons should be provided why the risk of bias can be considered low. High risk of bias: If the answer to any of the signaling questions is "No" or "Probably no," there is a potential for bias. Unclear risk of bias: If relevant information about the analysis is missing for some of the signaling questions but none of the signaling question answers is judged to put the analysis at high risk of bias.)

External validation
Risk of bias introduced by the analysis
Low ROB / High ROB / Unclear ROB
(Low risk of bias: If the answer to all signaling questions is "Yes" or "Probably yes," then risk of bias can be considered low. If ≥1 of the answers is "No" or "Probably no," the judgment could still be low risk of bias, but specific reasons should be provided why the risk of bias can be considered low. High risk of bias: If the answer to any of the signaling questions is "No" or "Probably no," there is a potential for bias. Unclear risk of bias: If relevant information about the analysis is missing for some of the signaling questions but none of the signaling question answers is judged to put the analysis at high risk of bias.)

Overall assessment of ROB

Development
Overall risk of bias
Low risk of bias / High risk of bias / Unclear risk of bias
(Low ROB: If all domains were rated low risk of bias. If a prediction model was developed without any external validation, and it was rated as low risk of bias for all domains, consider downgrading to high risk of bias. Such a model evaluation can only be considered as low risk of bias if the development was based on a very large data set and included some form of internal validation. High ROB: If ≥1 domain is judged to be at high risk of bias. Unclear ROB: If an unclear risk of bias was noted in ≥1 domain and it was low risk for all other domains.)

External validation
Overall risk of bias
Low risk of bias / High risk of bias / Unclear risk of bias
(Low ROB: If all domains were rated low risk of bias. If a prediction model was developed without any external validation, and it was rated as low risk of bias for all domains, consider downgrading to high risk of bias. Such a model evaluation can only be considered as low risk of bias if the development was based on a very large data set and included some form of internal validation. High ROB: If ≥1 domain is judged to be at high risk of bias. Unclear ROB: If an unclear risk of bias was noted in ≥1 domain and it was low risk for all other domains.)

Any additional comments about ROB on this article?
__________________________________________
(If there is something in the "PROBAST" section that does not fit into the questions of this form, please use this space to detail. Also use this space to detail anything you are unsure about.)

BMJ Open doi: 10.1136/bmjopen-2020-038832 :e038832.
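The domain-level and overall risk-of-bias rules given in the guidance notes above can be sketched as two small functions. This is an illustration under stated assumptions, not an official PROBAST implementation: the answer codes ("Y", "PY", "N", "PN", "NI"), the function names and the boolean flags are hypothetical, and the guidance's "consider downgrading" is a reviewer judgement that the sketch simply applies by default.

```python
LOW, HIGH, UNCLEAR = "Low ROB", "High ROB", "Unclear ROB"

YES_ANSWERS = {"Y", "PY"}   # Yes / Probably yes
NO_ANSWERS = {"N", "PN"}    # No / Probably no
VALID_ANSWERS = YES_ANSWERS | NO_ANSWERS | {"NI"}  # NI = No information


def domain_rob(answers, justified_low=False):
    """Judge one domain from its signaling-question answers.

    `justified_low` stands for the reviewer recording specific reasons why
    the domain is still low risk despite a "No"/"Probably no" answer.
    """
    if not answers:
        raise ValueError("At least one signaling question is required.")
    unknown = set(answers) - VALID_ANSWERS
    if unknown:
        raise ValueError(f"Unknown answer codes: {sorted(unknown)}")

    if any(a in NO_ANSWERS for a in answers):
        return LOW if justified_low else HIGH
    if all(a in YES_ANSWERS for a in answers):
        return LOW
    # Some information is missing and nothing points to high risk.
    return UNCLEAR


def overall_rob(domains, externally_validated=True,
                large_data_with_internal_validation=False):
    """Combine domain judgements into an overall judgement."""
    if HIGH in domains:
        return HIGH
    if UNCLEAR in domains:
        return UNCLEAR
    # All domains are low. The guidance says to consider downgrading a
    # development-only model unless it used a very large data set with
    # some form of internal validation; applied here as a default.
    if not externally_validated and not large_data_with_internal_validation:
        return HIGH
    return LOW
```

For example, `domain_rob(["Y", "NI"])` gives "Unclear ROB", and `overall_rob(["Low ROB"] * 4, externally_validated=False)` gives "High ROB" unless the large-data flag is set.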