Original research
How accurate are digital symptom assessment apps for suggesting conditions and urgency advice? A clinical vignettes comparison to GPs

Other responses

  • Published on:
    Response to comment of Schmieding and colleagues.

    We thank Schmieding and colleagues for commenting on our study. As noted, we took great effort to compare different symptom checkers fairly.

    Vignette studies have advantages but, as discussed at length in the manuscript, they also have some inherent limitations. For this reason, as described, we are carrying out real patient studies to investigate the results further.

    We support the efforts of Schmieding and colleagues to work towards transparency and a standardization of methods in the field of technology-supported clinical decision support: the literature is advancing in this direction. We agree that the study by Semigran et al. (2015) is important, but it evaluated a relatively small number of vignettes and relied on a small number of researchers to enter them, which is unlikely to reflect real-world symptom assessment app use.

    There was discussion of the limitations of our study in the manuscript which we do not further elaborate here, except with respect to two of the points made by Schmieding and colleagues.

    Firstly, regarding open access to the repertoire of case vignettes: although the vignettes in our study are not open access, researchers can request access to them as described in the Data Sharing Agreement, which is similar to the approach described for other recently reported studies (Richens et al., 2020). Although there are some advantages in releasing all the vignettes, as done by Semigran et al. (2015), there is a degree to whi...

    Conflict of Interest:
    Employee of Ada Health GmbH.
  • Published on:
    Lessons learnt from Ada’s study - Towards a framework to evaluate symptom checkers
    • Malte L Schmieding, Physician / Researcher Institute of Medical Informatics, Charité - Universitätsmedizin Berlin
    • Other Contributors:
      • Marvin Kopka, Student (Human Factors Master of Science)
      • Markus A Feufel, Researcher


    The study by Ada undertook great effort to compare different symptom checkers competitively. Some of the study’s limitations are unavoidable, such as the authors’ conflict of interest or the limits of vignette-based assessments in general. Other limitations, however, could be avoided in the future. To advance the field of technology-supported clinical decision support, we reason that more transparency and a standardization of methods are needed in at least four areas: (1) open access to the repertoire of case vignettes used to benchmark symptom checker performance; (2) reporting of the performance variation among those who entered the vignettes into the symptom checkers, and clear guidelines on how to handle the ambiguities that occur when data from vignettes are entered into symptom checkers; (3) transparent reporting of the performance results, for instance, using confusion tables with absolute numbers and percentages of the correct and the observed triage recommendations; and (4) full reporting of results showing each app’s assessment of each case vignette, to make the analyses reproducible and allow for secondary analysis of the data, as was commendably done by Semigran et al. (2015).
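
    As an illustration of point (3), the following is a minimal sketch of a confusion table that reports, for each correct (gold-standard) triage level, the absolute number and row percentage of each observed recommendation. The triage level names and the example data are hypothetical and are not taken from the study; the function names are likewise assumptions made for this sketch.

```python
from collections import Counter

# Hypothetical triage levels, ordered from least to most urgent.
# These labels are illustrative only and are not the study's categories.
LEVELS = ["self-care", "non-urgent", "urgent", "emergency"]


def confusion_table(gold, observed, levels=LEVELS):
    """Cross-tabulate gold-standard against observed triage recommendations.

    Returns {gold_level: {observed_level: (count, row_percentage)}}.
    """
    if len(gold) != len(observed):
        raise ValueError("gold and observed must have the same length")
    counts = Counter(zip(gold, observed))
    table = {}
    for g in levels:
        row_total = sum(counts[(g, o)] for o in levels)
        row = {}
        for o in levels:
            n = counts[(g, o)]
            # Percentage within the gold-standard row; 0.0 for empty rows.
            pct = 100.0 * n / row_total if row_total else 0.0
            row[o] = (n, pct)
        table[g] = row
    return table


def render(table, levels=LEVELS):
    """Format the table as plain text: one line per gold-standard level."""
    lines = ["gold -> observed: count (row %)"]
    for g in levels:
        cells = ", ".join(
            f"{o}: {n} ({pct:.1f}%)" for o, (n, pct) in table[g].items()
        )
        lines.append(f"{g}: {cells}")
    return "\n".join(lines)


# Made-up example: six vignettes entered into one hypothetical app.
GOLD = ["emergency", "emergency", "urgent", "urgent", "urgent", "self-care"]
OBSERVED = ["emergency", "urgent", "urgent", "urgent", "non-urgent", "self-care"]

table = confusion_table(GOLD, OBSERVED)
print(render(table))
```

    Reporting both counts and percentages per row lets a reader see at a glance whether errors fall towards under-triage or over-triage, and how many vignettes each percentage rests on.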

    In the following we elaborate these arguments in detail. Ultimately, we think that increasing transparency and standardization of assessment methods will not only advance the field, but also help to address the current discrepancy between industry-funded stud...

    Conflict of Interest:
    MLS was an employee of medx GmbH (now Ada Health GmbH) from 2014 to 2015.
  • Published on:
    Response to comment of Aleksandar Ćirković and rebuttal of the suggestion of additional limitations

    This is a response to a comment on the paper dated 20 January 2021.

    Dear Dr. Ćirković,

    We thank Ćirković for welcoming our study. Before answering the points raised, I first mention a scientific communication, currently under peer review, between myself and the editor of the journal that published the article by Ćirković cited in his comment [1]. The full text of this Letter to the Editor has been available as a preprint in the public domain since 18 Dec 2021 [2].

    I address in turn below the points raised by Ćirković in his comment on our BMJ Open paper:

    1. TRIPOD and PROBAST checklists - a search of the clinical literature using PubMed and Google Scholar identifies no studies that have used these checklists in the evaluation of vignette studies or studies of symptom assessment applications. We reviewed the literature on symptom assessment apps in our paper, and I have looked again at each of these papers: none of them used the TRIPOD or the PROBAST checklist, nor did a recent paper in this domain from an independent academic group [3], nor did Ćirković himself in [1]. The TRIPOD checklist (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) is designed for studies of diagnostic and prognostic models for clinicians. Although it is an interesting suggestion, the use of such a checklist is not compulsory, and not having adopted it is not a study limitation in itself. The P...

    Conflict of Interest:
    Employee of Ada Health GmbH.
  • Published on:
    Five missing limitations to this evaluation

    The study is a welcome addition to the still sparse body of evaluations of self-diagnosis tools that, unlike expert systems, are to be put into patients’ hands directly. However, there are some limitations missing from the Discussion section that I’d like to add. Firstly, as these self-diagnosis tools can have a significant impact on public health [1], it will be crucial for regulating authorities and lawmakers to create a framework that correctly distributes the liabilities among the participants in the system and ensures a positive ratio of benefit vs risk; this will only be possible if there is enough robust information available on the respective systems/programs/apps to thoroughly assess them. As this novel field is just unfolding, there are plenty of obstacles on the way to a truly objective assessment of AI-driven software, as Nagendran et al recently demonstrated here in the BMJ. [2] They used, among others, the TRIPOD and PROBAST checklists for evaluating the validity of the respective studies. A quick check of this study with both would give a TRIPOD adherence of 18 of 24 relevant checklist items and a high risk of bias when using the PROBAST checklist. The release of an updated TRIPOD checklist with a focus on AI-driven software has also been announced and will then be relevant for the evaluation of this study. [3] Thus, incomplete adherence to the present and unknown adherence to the presumably relevant future TRIPOD statements and a high risk of bias a...

    Conflict of Interest:
    None declared.
  • Published on:
    Response to comment of Oscar Garcia-Esquirol and rebuttal of the suggestion of study bias.

    This is a response to a comment on the paper dated 7 January 2021.

    Dear Oscar Garcia-Esquirol, Cofounder and Chief Medical Director at Mediktor, one of the symptom checkers evaluated in the study,

    Thank you for your comment on this paper.

    Your letter directed strong criticism at the paper [1] with respect to bias. I will answer these points in turn below. Our study included a rigorous design process conducted by experienced clinical researchers, data scientists and health policy experts, with the methodology and analysis peer-reviewed by independent and experienced primary care physicians and medical informatics experts at universities in the UK, and at Brown University in the US. To ensure a fair comparison, our team used a large number of clinical vignettes, which were generated from a mix of real patient experiences gleaned from the UK’s NHS 111 telephone triage service and from the many years’ combined experience of the research team. The gold-standard triage level used in the study was set independently of vignette creation, vignette review and vignette diagnosis gold-standard setting: this was done by a separate panel of three experienced primary care practitioners using a tie-breaker panel method based on the matching process set out by [2].

    A strength of this paper is that it not only compares a range of symptom assessment apps to each other, but also compares their performances to that of practicing GPs. While the...

    Conflict of Interest:
    Employee of Ada Health GmbH
  • Published on:
    Lack of scientific rigour

    I read with great interest the article “How accurate are digital symptom assessment apps for suggesting conditions and urgency advice? A clinical vignettes comparison to GPs” published by Gilbert S, Mehl A, Baluch A, et al. It evaluated the effectiveness of eight symptom checkers, including Mediktor, of which I am part.

    While I wholeheartedly agree with the statement the study researchers make, citing Chambers D et al., that “rigorous studies are required to show that these apps provide safe and reliable information,” I believe this article is a far cry from being such a rigorous study. I consider it to be little more than an advertising campaign by Ada, since the study falls prey to multiple biases, which I will describe below. These biases invalidate the article as a scientific publication.

    If we examine the foundations upon which the study rests, we can classify these biases thus:

    1) Biases due to the study variables:

    Creating vignettes that simulate clinical cases introduces a bias in the observational conditions of the study variables (SV). The diagnostic accuracy of symptom checkers (SC), which were created to assess patients in real conditions, is evaluated under artificial conditions. This further violates the nature of the measurement, since the simulated cases were created by staff with a direct relationship to Ada, one of the symptom checkers analysed in the study.
    When comparing the SCs’ capability to perform triage, the re...

    Conflict of Interest:
    Cofounder and Chief Medical Director at Mediktor, one of the symptom checkers evaluated in the study.