Table 1

Significant gaps in knowledge needed to develop prospective real-world screening trials or evaluation (adapted from Houssami et al [13])

Knowledge gap or limitations of published studies	Addressed by this study?	Description of how addressed in our study
Few studies use commercially available AI systems.	Partly	The AI algorithm used in this study [10] underlies a triage product that is FDA-approved and commercially available in the USA.
Studies have used relatively small datasets, often consisting of mammograms from several hundred women (rarely several thousand). Larger validation datasets are required.	Yes	A large validation dataset including 109 000 women will be used.
The same or selected subsets of the same datasets were used to train and validate models. Validation using independent, external datasets is required.	Yes	The study dataset is external to and independent from the datasets used to train the algorithm.
Datasets were commonly enriched with malignant lesions, with studies often selecting images containing suspicious abnormalities. Studies are required in unselected screening populations.	Yes	The study dataset is a consecutive, unselected population drawn from a real-world, biennial, population-based breast screening programme (BreastScreen WA). The dataset is not enriched with cancers. The prevalence and disease spectrum of screen-detected and interval cancers are representative of population breast screening.
There is a paucity of studies reporting conventional screening metrics (CDR and recall rate).	Yes	The inclusion of unique, consecutive screening episodes will allow estimation of CDR and recall rate (these metrics cannot be accurately derived from case-control, cancer-enriched datasets).
There are limited data on AI versus human interpretation. Future studies should compare AI to radiologists’ performance or report the incremental improvement for AI algorithms in combination with radiologists.	Yes	The comparative accuracy of AI and radiologists will be estimated in terms of AUC-ROC, sensitivity and specificity. Incremental rates of cancer detection and recall will be estimated for double-reading with and without AI.
There are no studies on women’s or societal perspectives on the acceptability of AI.	No	This is beyond the scope of the present study. A parallel stream of social and ethical research by some of the study investigators will explore the acceptability of AI.
Future studies should include images from digital breast tomosynthesis, given the rapid adoption of this technology.	No	This is beyond the scope of the present study. Digital breast tomosynthesis is not currently used in Australian publicly funded population breast screening programmes.

AI, artificial intelligence; AUC-ROC, area under the receiver operating characteristic curve; CDR, cancer detection rate; FDA, Food and Drug Administration.
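
As an illustration of why consecutive, unselected screening episodes allow CDR and recall rate to be estimated directly, the following is a minimal sketch in Python. It is not the study's analysis code. The `Episode` record, its field names and the `screening_metrics` function are hypothetical, and the metric definitions are assumptions: sensitivity is taken as screen-detected cancers divided by screen-detected plus interval cancers, and specificity as the proportion of cancer-free episodes that were not recalled.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One screening episode (hypothetical record layout, for illustration)."""
    recalled: bool                # woman was recalled for assessment
    screen_detected_cancer: bool  # cancer diagnosed following recall
    interval_cancer: bool         # cancer diagnosed after a negative screen


def screening_metrics(episodes):
    """Compute conventional screening metrics from consecutive episodes.

    Assumed definitions (not taken from the study protocol):
      CDR         = screen-detected cancers per 1000 episodes
      recall rate = recalled episodes / all episodes
      sensitivity = screen-detected / (screen-detected + interval cancers)
      specificity = cancer-free episodes not recalled / cancer-free episodes
    """
    n = len(episodes)
    if n == 0:
        raise ValueError("at least one screening episode is required")

    recalled = sum(e.recalled for e in episodes)
    detected = sum(e.screen_detected_cancer for e in episodes)
    interval = sum(e.interval_cancer for e in episodes)

    # Cancer-free episodes: neither screen-detected nor interval cancer.
    cancer_free = [
        e for e in episodes
        if not (e.screen_detected_cancer or e.interval_cancer)
    ]
    true_negatives = sum(not e.recalled for e in cancer_free)
    cancers = detected + interval

    return {
        "cdr_per_1000": 1000 * detected / n,
        "recall_rate": recalled / n,
        # None when the denominator is zero, rather than a misleading value.
        "sensitivity": detected / cancers if cancers else None,
        "specificity": true_negatives / len(cancer_free) if cancer_free else None,
    }
```

In a cancer-enriched, case-control dataset the denominator `n` no longer reflects the screened population, so `cdr_per_1000` and `recall_rate` computed this way would not be meaningful.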