Standard echocardiography versus handheld echocardiography for the detection of subclinical rheumatic heart disease: a systematic review and meta-analysis of diagnostic accuracy

Objective To summarise the accuracy of handheld echocardiography (HAND) which, if shown to be sufficiently similar to that of standard echocardiography (STAND), could usher in a new age of rheumatic heart disease (RHD) screening in endemic areas. Design Systematic review and meta-analysis. Data sources PubMed, Scopus, EBSCOHost and ISI Web of Science were initially searched on 27 September 2017 and again on 3 March 2020 for studies published from 2012 onwards. Eligibility criteria Studies assessing the accuracy of HAND compared with STAND when performed by an experienced cardiologist in conjunction with the 2012 World Heart Federation criteria among populations of children and adolescents living in endemic areas were included. Data extraction and synthesis Two reviewers independently extracted data and assessed the methodological quality of included studies against review-specific Quality Assessment of Diagnostic Accuracy Studies (QUADAS)-2 criteria. A meta-analysis using the hierarchical summary receiver operating characteristic model was conducted to produce summary results of sensitivity and specificity. Forest plots and scatter plots in receiver operating characteristic space in combination with subgroup analyses were used to investigate heterogeneity. Publication bias was not investigated. Results Six studies (N=4208) were included in the analysis. For any RHD detection, the pooled results from six studies were as follows: sensitivity: 81.56% (95% CI 76.52% to 86.61%) and specificity: 89.75% (84.48% to 95.01%). Meta-analytical results from five of the six included studies were as follows: sensitivity: 91.06% (80.46% to 100%) and specificity: 91.96% (85.57% to 98.36%) for the detection of definite RHD only and sensitivity: 62.01% (31.80% to 92.22%) and specificity: 82.33% (65.15% to 99.52%) for the detection of borderline RHD only. 
Conclusions HAND displayed good accuracy for detecting definite RHD only and modest accuracy for detecting any RHD but demonstrated poor accuracy for the detection of borderline RHD alone. Findings from this review provide some evidence for the potential of HAND to increase access to echocardiographic screening for RHD in resource-limited and remote settings; however, further research into feasibility and cost-effectiveness of wide-scale screening is still needed. PROSPERO registration number CRD42016051261.

INTRODUCTION
Rheumatic heart disease (RHD) is an acquired permanent heart valve condition which results from an atypical immune reaction to group A streptococcal (GAS) infection typically occurring in childhood. [1,2] Disease progression leading to chronic RHD can result in irreversible heart valve damage, cardiac failure, and premature death. [3,4] RHD is, however, a preventable and treatable chronic condition which most often affects disadvantaged populations. [3,5] Significantly, RHD can remain asymptomatic for many years, particularly during the initial stages, thereby hindering the timely implementation of penicillin prophylaxis. [6] Echocardiographic screening to identify those with subclinical disease has been advocated as a means to support secondary prevention and potentially slow disease progression to overt clinical RHD. [7,8] Yet the feasibility of wide-scale echocardiographic screening remains hindered by high costs and the scarcity of trained personnel. [9] Alternative RHD screening tests which are both accurate and affordable are therefore needed in many endemic areas.
Handheld echocardiography (HAND) is a non-invasive, highly portable and comparatively inexpensive device which has been presented in recent publications as a promising alternative to standard echocardiography (STAND), despite some limitations such as a lack of spectral Doppler capabilities. [10,11] For HAND to be considered a suitable replacement for STAND, the device's accuracy needs to be similar to that of STAND.
We conducted a systematic review and meta-analysis of studies assessing the diagnostic accuracy of HAND for the detection of RHD in children and adolescents. The findings of this guidelines. [12] The protocol for this review is registered with the International Prospective Register of Systematic Reviews (PROSPERO) under the registration number CRD42016051261 and has been published in BMJ Open.

Data sources and study eligibility
Studies were considered eligible for inclusion if the following criteria were met: 1) the accuracy of HAND compared to STAND when performed by an experienced cardiologist and in conjunction with the 2012 World Heart Federation (WHF) criteria was evaluated, and 2) the sample consisted of populations of children and adolescents living in endemic areas.
Only primary observational studies of either a cross-sectional, cohort or diagnostic case-control design were considered. Descriptive studies such as case studies and case series were excluded, as were studies reporting on the same data. Studies using non-handheld devices as the index test or criteria other than the 2012 WHF criteria in combination with STAND as the reference test were also excluded.
We conducted systematic electronic literature searches of four sources (PubMed, Scopus, EBSCOHost and ISI Web of Science) using predefined tailor-made strategies. No restrictions in terms of language were applied; however, searches were limited to articles published from 2012 onwards. The titles and/or abstracts of all identified articles were screened independently by two reviewers. During this process, and on the basis of predefined eligibility criteria, all clearly ineligible studies were excluded. Discrepancies regarding eligibility were resolved through discussion and consensus. Some authors were contacted for additional information on published data.

Data extraction and management
Using a predefined data extraction form, two reviewers independently extracted information on metrics of diagnostic accuracy: numbers of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) as well as other covariates relating to study characteristics, population, reference and index test details, test outcome and number of missing or unavailable test results from all included studies.
The accuracy measures, sensitivity and specificity, were calculated from the numbers of TP, FP, TN and FN in accordance with standard convention. Data extraction conflicts were resolved through discussion and with the assistance of a third reviewer where necessary. Information garnered through the data extraction process was used to determine each study's quality as well as for synthesising evidence. A review-specific Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool was used to assess the risk of bias and concerns regarding applicability of all included studies. [13] The tool, encompassing four domains, was tailored to meet the specific requirements of this review. Two reviewers independently assessed the risk of bias in all included studies according to review-specific QUADAS-2 criteria. Discrepancies were resolved through discussion until consensus was reached, and the assistance of a third reviewer was enlisted when necessary.
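The calculation of sensitivity and specificity from the extracted two-by-two counts can be sketched as follows. This is a minimal illustration only: the class name `TwoByTwo` and the example counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-classifying HAND (index test) against STAND (reference)."""
    tp: int  # HAND positive, STAND positive
    fp: int  # HAND positive, STAND negative
    tn: int  # HAND negative, STAND negative
    fn: int  # HAND negative, STAND positive

    def sensitivity(self) -> float:
        # Proportion of reference-positive participants that HAND also flags.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of reference-negative participants that HAND also clears.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=40, fp=30, tn=270, fn=10)
print(f"sensitivity = {example.sensitivity():.2f}")  # 40 / (40 + 10)
print(f"specificity = {example.specificity():.2f}")  # 270 / (270 + 30)
```

Note that the sketch does not guard against a study with no reference-positive or no reference-negative participants, in which case the corresponding measure is undefined.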

Statistical analysis and data synthesis
The Hierarchical Summary Receiver Operating Characteristic (HSROC) model was used for meta-analysis as it accounts for variations in test thresholds. Data were analysed according to three categorisations of RHD: any RHD (definite or borderline), definite RHD only and borderline RHD only. The any RHD category was selected as the main meta-analysis as it had the most complete data. We were unable to extract metrics of diagnostic accuracy for the definite and borderline RHD only categories from Beaton, 2016 and therefore excluded this study from these meta-analyses. We chose to use Nurse A's results for Mirabel, 2015 since Nurse A and Nurse B both interpreted the same HAND images, which prevented the pooling of data.
Data from Zühlke, 2016 were included in the analysis and synthesis of data even though the age range of participants fell outside the predefined range for eligibility. It was determined that this study should be included, regardless, since the data overall were quite few and the variation in age was not significant enough to warrant exclusion. However, data from Zühlke, 2016 were excluded from all summary estimates of disease prevalence since this study used a nested case-control design which predetermines disease prevalence by design.
Heterogeneity was examined for the main meta-analysis only. We were only able to investigate the relationship between test accuracy and echocardiographer expertise through subgroup analysis. A sensitivity analysis was performed instead of subgroup analysis for the categorical covariates HAND protocol and geographic location due to the skewed distribution of studies within each subgroup. We were unable to perform meta-regression for the covariates age and sex due to insufficient and inadequately reported data. [14] We were also unable to conduct a sensitivity analysis on risk of bias since no studies were found to have a high risk of bias. All plots were generated using the Review Manager (RevMan) software package, version 5.3. [15] Meta-analysis was performed using SAS® software, version 9.4. [16] We did not investigate publication bias as methods of assessing publication bias for studies of diagnostic accuracy are still being developed. While the Deeks test has been suggested for use in diagnostic accuracy studies, the test has low power for detecting asymmetry in funnel plots, particularly when a large amount of heterogeneity is present. [14]

Results of the search
Results of the literature search are reported in accordance with the PRISMA Statement and the study selection process is illustrated in Fig 1. [17] All electronic searches were performed by two independent reviewers on 27 September 2017. Combined, the search yielded a total of ninety-two records, of which nine were duplicates. A total of sixty-seven were [17] The same search was re-run on 3 March 2020 to check for any additional eligible studies.
Only one potentially eligible study [18,19] was found, but it was excluded because it was an abstract-only publication with no full text available for review.

Included studies
A summary of notable characteristics of all included studies [20-25] is shown in Table 1. One study did not avoid a case-control design; however, cases and controls were sampled from the same population. Research has shown that case-control studies which use alternative diagnosis controls, controls from non-endemic areas or confirmed disease-free (healthy) controls tend to overestimate specificity. [26] Significantly, all but two studies were conducted in Africa. Screening was performed in RHD endemic areas among children and adolescents, with most studies being school-based.
Combined, the six studies included a total of 4208 participants, of whom 54% were female.
All included studies used the same make of handheld device: the Vscan machine (General [21,23] Frame rates range from 25 to 30 Hz for greyscale imaging and 12 to 16 Hz for colour Doppler. [20,24] Vscan machines are, however, limited by a lack of spectral Doppler capabilities. [21]

Excluded studies
Ten studies [9,10,27-34] were excluded during full-text screening. Reasons for exclusion included abstract-only publications, the use of ineligible reference or index tests, the use of duplicate data, not specifying the test threshold a priori, and not being a study of diagnostic accuracy.

Methodological quality of included studies
Overall, only two of the six included studies were assessed as having a low risk of bias while the risk in the remaining four was unclear. Two studies had participant selection bias concerns. Of these, both failed to adequately describe participant enrolment methods whilst one also did not avoid a case-control design. The risk of bias in terms of flow and timing was unclear in two studies. Of these, one study did not include all participants in the analysis due to technical difficulties while the time interval between the index and reference test was unclear in the other. Overall, time intervals between index and reference tests were poorly described. Likewise, reporting of quality control of the index test was uniformly poor across all included studies. Concerns regarding applicability were low in all six studies.

For any RHD
A total of six evaluations of HAND for any RHD were performed with data from six studies and a total of 4208 participants. Pooled prevalence of any RHD from five included studies was 12% (95% confidence interval (CI): 6%-19%). The forest plot revealed little variation in estimates of sensitivity and specificity and the HSROC plot revealed moderate accuracy of the test. Meta-analytical sensitivity and specificity (95% CI) of data at mixed thresholds were 82% (77%-87%) and 90% (85%-95%) respectively.

For definite RHD
A total of five evaluations of HAND for definite RHD were performed with data from five studies and a total of 3588 participants. Pooled prevalence of definite RHD from four included studies was 6% (95% CI: 2%-12%). The forest plot revealed some variation in estimates of specificity while estimates of sensitivity were largely homogeneous with the exception of a single outlier. The HSROC plot indicated good accuracy of the test. Meta-analytical sensitivity and specificity (95% CI) of data at mixed thresholds were 91% (81%-100%) and 92% (86%-98%) respectively.

For borderline RHD
A total of five evaluations of HAND for borderline RHD were performed with data from five studies and a total of 3685 participants. Pooled prevalence of borderline RHD from four included studies was 20% (95% CI: 6%-39%). The forest plot revealed some variation in estimates of specificity while estimates of sensitivity were largely homogeneous with the exception of a single outlier. The HSROC plot indicated poor accuracy of the test. Meta-analytical sensitivity and specificity (95% CI) of data at mixed thresholds were 62% (32%-92%) and 82% (65%-100%) respectively.

Investigations of heterogeneity
Heterogeneity or variation between studies was investigated both visually as well as through subgroup analysis for the main meta-analysis only. We were only able to perform this analysis for the any RHD category as the data were too few to enable model convergence for the definite and borderline RHD only categories.

Covariates in the models
We were only able to use one of the five pre-specified covariates to investigate heterogeneity due to insufficient and inadequately reported data. HAND echocardiographer expertise (expert vs non-expert) was investigated as a possible source of heterogeneity through subgroup analysis. Half of all included studies evaluated the accuracy of HAND when performed and interpreted by trained non-experts while the other half assessed its accuracy in the hands of experts.

Subgroup and sensitivity analyses
A subgroup analysis was performed to investigate variations in echocardiographer expertise as a potential source of heterogeneity. Since no studies were found to have a high risk of bias, we did not explore the effect of excluding such studies on the accuracy of summary estimates. Sensitivity analyses were, however, conducted to investigate the effect of removing a single study on summary estimates of sensitivity and specificity for the covariates geographic location and HAND protocol.
A subgroup analysis for the covariate echocardiographer expertise, as shown in higher for any RHD detection using HAND when tests were performed and interpreted by experts compared to non-experts.
Sensitivity analyses (see table 2) were performed to investigate the effect of excluding a) the single high-income country study and b) the study which employed a single view protocol on the accuracy of summary estimates. We found that both sensitivity (81.17% vs 80.4%) and specificity (90.07% vs 87.15%) increased compared to the overall analysis when only low- and middle-income country studies were considered, whereas both sensitivity (80.98% vs 85.35%) and specificity (87.46% vs 88.8%) decreased in comparison with the overall analysis when only studies which employed multiple view protocols were considered.

Summary of main findings
We evaluated the accuracy of HAND for three distinct disease categories and found that, overall, the test was both sensitive and specific for detecting definite RHD only and moderately accurate for detecting any RHD but demonstrated insufficient accuracy for detecting borderline RHD alone.
Findings from this review provide some evidence for the potential of HAND to increase access to echocardiographic screening for RHD in resource-limited and remote settings. A summary of the accuracy estimates produced by meta-analysis using the HSROC method is included in table 3.

Strengths
We have evaluated and summarised the accuracy of HAND for the detection of RHD in endemic areas, making the review relevant to current global agendas. This review also serves to highlight the existing gaps in evidence for which further research could be beneficial. We did not impose any limits in terms of language during the literature search so as to minimise the chance of missing studies. Data extraction was performed by two independent reviewers, thereby reducing the risk of bias.

Either a GE Vivid-I or Q or Philips CX-50 ultrasound machine (1 study)

Importance
HAND is being used as first line replacement for STAND in disease screening programmes for RHD, as it is comparably inexpensive, quick, user friendly, easy to interpret, and may have similar sensitivity to STAND.

Studies
Cross-sectional (n=4), spiked cohort (n=1) and nested case-control (n=1) studies. More than half (n=4) of all included studies did not explicitly state the study design used and were thus assigned a study design based on other reported characteristics and participant enrolment methods used.

Quality concerns
Poor reporting of study design, participant characteristics and pre-test probability were common concerns. For the majority of studies the risk of bias was unclear in terms of 'patient selection' and 'flow and timing'. Concerns regarding applicability were low in all included studies.

Limitations
There were a number of shortcomings of this review, which included the following. Eligibility: We were unable to include studies which used STAND in conjunction with criteria other than the 2012 WHF criteria as the reference standard, which limited the number of studies eligible for inclusion.
Quality of included studies: Insufficient reporting of participant characteristics and study methods including study design, participant selection and test timing restricted our ability to adequately assess risk of bias and investigate potential sources of heterogeneity.
Paucity of data: Insufficient and inadequately reported data as well as the presentation of aggregate data limited the scope of our investigations of heterogeneity while the small number of included studies prevented us from performing meta-regression. Overall, the findings from this review may lack power due to the small sample size.

Applicability of findings to the review question
Concerns regarding the applicability of included studies to the review question were considered low according to review-specific QUADAS-2 criteria. Since all but one were conducted in low- or middle-income countries, and all studies with one exception were conducted in field settings, the results of this review are applicable for use in endemic areas for which screening programmes are frequently targeted. However, our limited assessment

CONCLUSION
This review provides a summary of the accuracy of HAND for the detection of RHD. In populations of children and adolescents living in RHD endemic areas, HAND is both sensitive and specific for detecting definite RHD. The device is less accurate in detecting any RHD and demonstrates substandard accuracy for the detection of borderline RHD only. Nonetheless, this test may hold value as a replacement for first line screening due to its high sensitivity for definite RHD detection and adequate accuracy for any RHD detection.

Implications for practice
We have summarised the accuracy of HAND when used as a screening tool; however, the device's potential value in terms of diagnostics has yet to be established. We therefore posit that HAND could be recommended as an acceptable replacement test for first line screening Another key consideration is the applicability of these findings for recommendations to integrate screening into routine clinical practice. A recent publication has reviewed the cost-

Competing interests
None declared.

Patient consent for publication
Not required.

Data availability statement
Results may feed into evidence-based guidelines and should the findings of this review warrant a change in clinical practice, a summary report will be disseminated among leading clinicians and healthcare professionals in the field. PROSPERO registration number CRD42016051261.

INTRODUCTION
Background
Rheumatic heart disease (RHD) is a permanent heart valve condition resulting from an abnormal immune reaction to group A streptococcal infection typically occurring in childhood. [1] If left untreated, disease progression can result in irreversible heart valve damage, cardiac failure, stroke and premature death. [2,3] Significantly, RHD is a preventable and treatable chronic condition which mostly affects disadvantaged populations across the world. [2] Even though the disease has mostly been eradicated in North America and Europe, barring a few indigent pockets, it remains prolific in areas of the Middle East, the South Pacific, Africa as well as Central and South Asia. [2] The continued persistence of RHD contributes to considerable amounts of preventable morbidity and mortality, particularly among adolescents and young adults. [4] This adds additional strain to what are often already overburdened health systems with endemic regions, which are typically poorly resourced, bearing the brunt of the disease. [1,5] Furthermore, the accurate detection of subclinical RHD in children and adolescents remains hampered by the cost of diagnostic machinery and scarcity of trained personnel. [6] Alternative RHD screening tests, which are both accurate and affordable, are therefore needed in many endemic areas. The value of such a screening test is that significantly more cases of subclinical RHD might be detected, thereby reducing the time to commencement of secondary prophylaxis and thus, in turn, improving long-term outcomes. [7]
Recently, handheld echocardiography has become widely available with a variety of clinical uses. [8] Similarly, diagnostic accuracy has already been demonstrated in a number of studies assessing its value as a screening tool, despite some limitations such as lack of Doppler capabilities. Due to the non-invasive, safe, portable and relatively inexpensive nature of handheld echocardiography, the device has been presented in recent publications as a promising alternative to standard echocardiography in resource-limited and remote settings. [4,8] To test this assertion, the diagnostic accuracy of handheld echocardiography needs to be evaluated using a systematic approach. This review, therefore, proposes to evaluate the accuracy of handheld echocardiography for the detection of RHD in children and adolescents within a screening setting. We seek to generate new quantitative evidence for clinicians and guideline developers to establish evidence-based guidelines for diagnosing RHD with handheld echocardiography. Ultimately, this will improve the management of patients with RHD, as effective treatment of subclinical RHD requires accurate and timely diagnosis.

Strengths and limitations of this study
► We will evaluate the accuracy of handheld echocardiography for detecting subclinical rheumatic heart disease in endemic areas, making the proposed review relevant to current global agendas.
► We will not impose a search filter or any limits in terms of language during the literature search so as to minimise the chance of missing studies.
► Data extraction will be performed by two independent reviewers, thereby reducing the risk of bias.
► Accuracy measures (sensitivity and specificity) may be influenced by underestimated burden of disease estimates (incidence and prevalence) due to the scarcity of good quality epidemiological data.
► Variation in diagnostic criteria for handheld echocardiography may affect data synthesis.

Primary objective
To determine the diagnostic accuracy of handheld echocardiography compared with standard echocardiography (two-dimensional (2D), continuous-wave and colour-Doppler echocardiography) performed by an experienced imager in conjunction with the 2012 World Heart Federation (WHF) criteria for the detection of any RHD in children and adolescents.

Secondary objective
To investigate potential sources of variation in relation to age, gender, geographical location, echocardiographic criteria and echocardiographer expertise in detecting subclinical RHD with handheld echocardiography.

METHODS AND ANALYSIS
The protocol was prepared according to the Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) guidelines. A PRISMA Protocol checklist is completed and included in online supplementary appendix 1. [9]

Inclusion and exclusion criteria
We will include all primary observational studies which compare the diagnostic accuracy of handheld echocardiography to the reference standard: standard echocardiography performed by an experienced imager and in conjunction with the 2012 WHF criteria. Eligible studies can be of a cross-sectional, cohort or diagnostic case-control design, provided both cases and controls have been sampled from the same population. Studies which report on, or contain the data necessary to extract information on the proportions of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) will be included. Studies which enrolled only those with a confirmed RHD diagnosis will be excluded on account of the potential for overestimation of sensitivity. Descriptive studies such as case studies/series will also be excluded from this review. Studies in which we are unable to generate two-by-two tables, as well as different studies which report on duplicate data, will not be considered for inclusion in this review.
We will consider for inclusion all studies in which samples of study participants are either a randomly or consecutively selected series of individuals from populations in which RHD is prevalent worldwide. For the purposes of this review, children and adolescents will be defined as being between the ages of 5 and 17 years (age range: ≥5 years to <18 years). More specifically, participants will be considered children if they are between 5 and 9 years of age and adolescents if they are between 10 and 17 years of age.
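The age definitions above amount to a simple classification rule, sketched below. The function name `age_group` is hypothetical, and the sketch assumes that age is supplied in years, so that a participant aged 9.5 years still counts as a child.

```python
from typing import Optional


def age_group(age_years: float) -> Optional[str]:
    """Classify a participant by the review's age bands.

    Returns 'child' for ages 5 to <10, 'adolescent' for ages 10 to <18,
    and None when the age falls outside the eligible range (>=5 to <18).
    """
    if 5 <= age_years < 10:
        return "child"
    if 10 <= age_years < 18:
        return "adolescent"
    return None


for age in (4, 5, 9, 10, 17, 18):
    print(age, age_group(age))
```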
We will include studies evaluating the accuracy of handheld echocardiography for RHD detection. There will be no restrictions regarding the type of handheld device used or the aptitude of the person performing the cardiac ultrasound; however, these data will be recorded and analysed accordingly. Studies will be deemed eligible for inclusion if the reference standard constituted the interpretation of echocardiographic findings using the 2012 WHF criteria when echocardiographic assessment by 2D, continuous-wave and colour-Doppler echocardiography was performed by a cardiologist or cardiac sonographer. We will exclude all studies published before 2012 to omit any study which does not use standard echocardiography in conjunction with the 2012 WHF criteria as the reference standard. We will consider all studies which evaluate any RHD (definite and borderline) as the condition of interest for inclusion in this review. All case definitions will be consistent with the 2012 WHF criteria. [10]

Are there concerns that HAND, its conduct, or interpretation differ from the review question?
Are there concerns that the target condition as defined by STAND does not match the review question?
*Criteria for grading risk of bias: If all indicator questions for a single domain are answered 'yes', then the risk of bias will be judged as being 'low'; if any indicator question is answered 'no', then the potential for bias will be flagged and the review authors will be required to judge the risk of bias with the assistance of the senior author (MEE); if all or most indicator questions were answered 'no', then the risk of bias will be judged as being 'high'; and indicator questions can only be answered as 'unclear' when the data are insufficient to allow for the formulation of a judgement. Adapted from Whiting et al. [11] IT, index test; RS, reference standard.

Search strategy
A comprehensive electronic literature search of PubMed, Scopus, Web of Science and EBSCOhost will be conducted to identify relevant literature. No restrictions in terms of language will be applied during the search. Searches will, however, be limited to only include articles published from 2012 up until the present. All sources will be systematically searched using a combination, where relevant, of both free text words and Medical Subject Heading terms. Search strategies will be tailored to meet the requirements of each electronic database as in table 1 below.
Search terms will include synonyms for 'rheumatic heart disease', 'echocardiography' and 'handheld'. A list of all articles identified through the literature search will be compiled and references managed using Mendeley software. In addition, a manual search of all eligible articles' reference lists, articles citing eligible articles as well as relevant review articles will be carried out to identify any additional literature not identified by the comprehensive electronic literature search. Abstracts from any relevant conference proceedings will also be searched for among appropriate websites and followed up on if eligibility requirements are sufficiently met. Finally, experts in the field will be contacted for additional information where necessary.
selection of studies for inclusion
The titles and/or abstracts of all articles identified by the literature search will be screened independently by two reviewers. Based on the predefined inclusion and exclusion criteria any clearly ineligible studies will be excluded. Following this, the full-text versions of all potentially eligible studies will then be reviewed by two independent reviewers to assess their eligibility. Any discrepancies regarding eligibility will be resolved through discussion and consensus with a third reviewer.

data extraction and management
Using a predefined data extraction form, two reviewers will independently extract the following information from all studies meeting the criteria for inclusion:
► Study identifiers: author(s), year of publication, journal;
► Study characteristics: study design, study country/setting/context, study population/participants, sample size, participant recruitment procedures, participant demographics and RHD prevalence (pretest probability);
► Reference standard and index test details:
- General: test positive or negative;
- Specific: individual findings on cardiac ultrasound;
- Expertise of person(s) performing and/or interpreting tests: expert versus non-expert;
- Diagnostic criteria: test threshold(s);
- Number of missing or unavailable test results.
► Diagnostic test outcome measures: sensitivity, specificity, positive and negative predictive values, number of TP, FP, TN and FN.
If necessary, any disagreements will be resolved through discussion with a third reviewer until a consensus is reached. Any data missing from the reports of included studies will be requested from study authors. In cases where studies have used different diagnostic criteria for handheld echocardiography, attempts will be made to standardise them to mirror the 2012 WHF criteria as closely as possible.
The information garnered through the data extraction process will be used to determine each study's quality as well as for synthesising evidence.

risk of bias and quality assessment
The Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool (see table 2) will be used to assess the risk of bias and concerns regarding applicability of all included studies. 11 The tool encompasses four domains which have been tailored to meet the specific requirements of the review. Two reviewers will independently assess the risk of bias in all included studies according to the revised QUADAS-2 criteria. Any discrepancies will be resolved through discussion until consensus is reached and with the assistance of a third reviewer if necessary. Both text and graphics will be used to demonstrate the results.

subgroup and sensitivity analyses
Subgroup analysis may be performed, considering specific characteristics of the studies, such as echocardiography protocol, training background of the examiner, age and geographical location.
We will conduct a sensitivity analysis to investigate the effect of variations in criteria on the overall accuracy of diagnosis. In addition, we will explore the effect of excluding studies with a high risk of bias on the accuracy of summary estimates, sensitivity and specificity. We will not investigate publication bias.

statistical analysis and data synthesis
We will first analyse data descriptively by plotting the sensitivity and specificity (including 95% CIs) of all included studies in both forest plots and receiver operating characteristic (ROC) space. These plots will be generated using the Review Manager software package. 12 If there are sufficient data, we will conduct a meta-analysis to produce summary results of sensitivity and specificity. Because we anticipate that studies will have different positivity thresholds due to the use of different sets of diagnostic criteria, we will pool the results using the hierarchical summary receiver operating characteristic (HSROC) method. Meta-analysis will be performed using SAS V.9.4/STATA V.14.2 software. 13 We will also explore, through meta-regression, the relationship of test accuracy with categorical or continuous covariates such as test threshold. 14 Investigations of heterogeneity will begin by visually examining the forest and ROC plots for heterogeneity in sensitivity and specificity. We will then analyse the possible sources of heterogeneity as covariates in the statistical models. Potential sources of heterogeneity to be investigated as categorical variables include: age (children vs adolescents), sex (male vs female), geographical location (high vs low/middle-income countries), diagnostic criteria (single vs multiple views and different thresholds) and echocardiographer expertise (expert vs non-expert).
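As a rough illustration of the descriptive step, the per-study sensitivity and specificity with 95% CIs that feed a forest plot could be computed from 2x2 counts along the following lines. The Wilson score interval, the function names and the counts are assumptions made for this sketch; the protocol does not state which interval method the plotting software applies.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


def study_row(tp, fp, fn, tn):
    """Sensitivity and specificity with intervals for one study's 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "sens_ci": wilson_ci(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "spec_ci": wilson_ci(tn, tn + fp),
    }


# Hypothetical counts, not taken from any included study.
row = study_row(tp=40, fp=30, fn=10, tn=420)
print(row)
```

Each row produced this way corresponds to one line of a forest plot; the pooled HSROC estimates require a hierarchical model and are not reproduced by this sketch.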

Presenting and reporting of results
The study selection process will be summarised in the form of a flow diagram detailing the reasoning behind all exclusions. Results will be reported in accordance with the PRISMA guidelines. 15

dissemination
The planned review will provide a summary of the diagnostic accuracy of handheld echocardiography. Results may feed into evidence-based guidelines and will therefore be disseminated to members of the WHF criteria working group. Should the findings of this review warrant a change in clinical practice, a summary report will be circulated among leading clinicians and healthcare professionals in the field.
Contributors LJZ and MEE conceived the study idea and all the authors contributed to the conception and design of the protocol. LHT developed and wrote the first draft of the protocol. All authors have reviewed and accepted the final version of the protocol and have given their permission for publication. All authors contributed to editing subsequent versions of the draft. LHT and LHA will perform the literature searches as well as extract data and LHT and EAO will conduct the data analysis. All authors are in agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Funding LJZ and LHT receive funding from the Medtronic Foundation through support to RHD Action. LJZ and LT receive funding from the National Research Foundation of South Africa (NRFSA).
Competing interests None declared.
Patient consent Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Open Access This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial.

Unclear (any other combination of answers)
Low concern (participants from endemic areas) High concern (tourists, non endemic areas) Unclear concern 4h. Is there concern that the included patients do not match the review question? (Please indicate the level of concern) Comment:

1a. Is this a primary study that examines the accuracy of tests for rheumatic heart disease (RHD)?
Tick the appropriate box. If answered 'no', EXCLUDE the paper. This item is meant to enable the quick distinction between potentially relevant articles and articles that clearly have a different scope. If the article does not clearly evaluate the accuracy of tests as a primary or secondary objective you can tick 'no' and stop the data extraction exercise. If you are unsure please tick 'unclear', comment on why you are unsure and then proceed with the data extraction exercise.

1b. If the article does not evaluate accuracy tests for RHD, please describe what kind of article it is?
Please describe what type of publication it is (e.g. narrative overview article, a purely prevalence study etc.)

Please indicate the target condition being evaluated in the study.
[All studies should evaluate both borderline and definite in accordance with the 2012 WHF criteria]

3d. Number of participants
Please write in the box provided the number of participants and/or the number of samples included in the study. Some studies may describe separately the entire cohort and a subset of that cohort who received both the index test and reference standard. In this instance, please report only the number of participants who received both tests (i.e. the subset of participants who received both HAND and STAND). You may also state that the study consisted of a large cohort with a subset within that cohort in the space provided for 'number of samples'.

3e. Proportion or number of participants by gender
Please write in the box provided the proportion/percentage or number of participants per sex of those who received both tests (i.e. those that received both HAND and STAND). For example: 50% of participants were female, or 590 females were in the study. This information can usually be found in the results section, but if the sex of participants was not specified please tick the box 'Sex not specified'.
the proportion of which received the reference standard etc. In addition, the number of true and false positives and true and false negatives are also displayed. If necessary please draw a flow diagram for the primary study in the space provided on page 6 of the data extraction form.
An example of a flow diagram is as shown below.

PARTICIPANT SELECTION
These questions have been designed to help assess the risks of bias in the study.

4a. Please cite here the selection criteria
Please list the inclusion and exclusion criteria which were applied when recruiting study participants in the spaces provided. Inclusion criteria might also include the characteristics of included participants. For example "all children attending primary schools in the area were eligible for inclusion". If no criteria were reported, please tick 'Not reported' and if the criteria were unclear please tick "Unclear" and explain your answer.

4b. Stage of disease
Participants recruited into the study may be with or without symptoms. Please indicate the disease stage of participants at the time of enrolment by ticking the appropriate box. If the study does not clearly report the clinical status of participants, please tick the box marked 'unclear' and comment in the space provided.

4c. What was the study design?
Please indicate the design of the study by ticking one of the choices provided.
We will not include case-control studies which include healthy controls, alternative diagnosis controls or controls from non-endemic areas. Research has shown that these types of studies have a tendency to overestimate accuracy measures. Healthy controls are those who have been confirmed as being disease-free; alternative diagnosis controls are controls that have similar symptoms to those of the disease under study but do not have the condition of interest; and controls from non-endemic areas are those from areas in which the condition of interest is not highly prevalent. If the study design is not reported or is unclear please tick the appropriate box and if necessary add a comment.

4d. Was a case-control design avoided?
 Yes: If the authors report using any study design apart from a case-control one.
 No: If the authors report using a case-control study design.
 Unclear: If the authors do not explicitly report the study design used.

4e. Was a consecutive or random sample of participants enrolled?
 Yes: If the authors report random sampling or consecutive enrolment of participants.
 No: If participants were selected, for example based on previous (reference or index) test results.
 Unclear: If not reported or insufficient information given to make a decision.

4g. Could the selection of participants have introduced bias?
 Low: If both questions 4d and 4e were answered 'yes' or if at most one was answered 'unclear'.
 High: If one or more of questions 4d and 4e were answered 'no'.
 Unclear: Any other combination of answers, for example if both questions 4d and 4e were answered 'unclear'.
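The grading rule for item 4g can be written out as a small function, which makes the handling of mixed answers explicit. This is a minimal sketch; the function name and the lower-case answer strings are assumptions for illustration and are not part of the data extraction form. A 'no' is checked first because the form assigns 'High' whenever one or more answers are 'no'.

```python
def selection_bias_risk(q4d, q4e):
    """Grade item 4g from the answers to signalling questions 4d and 4e."""
    answers = [q4d.strip().lower(), q4e.strip().lower()]
    allowed = {"yes", "no", "unclear"}
    for answer in answers:
        if answer not in allowed:
            raise ValueError(f"unexpected answer: {answer!r}")
    # High: one or more of 4d and 4e answered 'no'.
    if "no" in answers:
        return "high"
    # Low: both 'yes', or at most one 'unclear' alongside a 'yes'.
    if answers.count("unclear") <= 1:
        return "low"
    # Unclear: any other combination, i.e. both answered 'unclear'.
    return "unclear"


print(selection_bias_risk("yes", "unclear"))
```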
4h. Is there a concern that the included participants do not match the review question?
 Low concern: If study participants reside within RHD endemic areas as they will include those at risk of infection, those who are infected but asymptomatic as well as those who are infected and have symptoms.
 High concern: If study participants don't reside in endemic areas. For example; tourists, healthy controls or controls with alternative diagnoses.
 Unclear: If there is insufficient information to make a decision.

5a. Was there an appropriate interval between index test and reference standard?
 Yes: If the participants were examined using both the reference standard and index test at the same time or within a two week time period.
 No: If the time period between index and reference standard was more than two weeks.
 Unclear: If there is no or insufficient information on time period.

5b. Did all participants receive a reference standard?
(Focus on participants included in the 2*2 table)
 Yes: If the whole study sample or a random selection of the sample or a selection of the sample with consecutive series receive verification using the reference standard.
 Unclear: If there is no or insufficient information on the reference standard used.

5d. Were all participants included in the analysis?
 Yes: If all the participants that were included in the study, were also included in the analysis.
 No: If some participants / results are missing in the analysis.
 Unclear: If there is no or insufficient information to make a decision.

5e. Could the conduct or interpretation of participant flow & timing have introduced bias?
 Low: If all questions were answered 'yes' or at least three were answered 'yes' and the other 'unclear'.
 High: If two or more of questions 5a-5d were answered 'no'.

 No: If the person who conducted echocardiographic screening using the handheld device does not have any prior experience or expertise in interpreting echocardiographic images and was not given any training with regards to interpreting results using a pre-specified diagnostic protocol.
 Unclear: If there is insufficient information to make a judgment.
 Not reported: If there is no information reported on this item.

6d. What was the initial level of expertise of the HAND interpreter?
Please state the HAND interpreter's initial level of clinical expertise. For example were they a clinician, nurse or community health worker etc. If not reported please tick the appropriate box and if necessary comment in the space provided.

6e. Was quality control done?
To ensure reliability or good quality of results a sub-set of the sample population may be cross-

6g. If a threshold was used, was it pre-specified?
 Yes: If the authors report the use of a primary, pre-specified, cut-off value or threshold. A prespecified threshold also includes statements such as "the test was scored according to manufacturer's instructions".
 No: If multiple cut-off values were tested and the best one chosen afterwards.
 Unclear: If only one cut-off value was used, but this was not explicitly reported in the methods section.
 Not reported: If no information was reported on this item.

6h. Could the conduct or interpretation of the index test have introduced bias?
 Low: If questions 6b-6g were all answered 'yes'.
 High: If one or more of questions 6b-6g were answered 'no'.
 Unclear: Any other combination of answers. For example if one or more questions were answered 'unclear'.

Please specify the threshold or cut-off value in terms of mitral regurgitation (MR) and/or aortic regurgitation (AR) in the second column (Threshold). If multiple different thresholds were used, please report the results for each threshold separately (i.e. each one on a different row).

General comments
At the end of the data extraction form a comment box has been provided for general comments about the paper you have evaluated. Please use this box when and if necessary.
Supposing the means are given by M1, M2 and M3 and the SDs are S1, S2 and S3, let N1, N2 and N3 represent the respective numbers of observations for each group.
(6) handling of different reference standards.

-9
Meta-analysis D2
Report statistical methods used for meta-analyses if performed. 8-9

Additional analyses 16
Describe the methods of the additional analyses (eg, sensitivity or subgroup analyses, meta-regression), if done, indicating which were prespecified. 8-9

Study selection 17
Provide the numbers of studies screened, assessed for eligibility, included in the review, and included in the meta-analysis if applicable, with reasons for exclusions at each stage, ideally with a flow diagram. 9 -10

Study characteristics 18
For each included study, provide citations and present key characteristics including (1)

Strengths and limitations of this study:
 Language restrictions were not imposed during the literature search to minimise the chance of missing studies.
 Data extraction was performed by two independent reviewers thereby reducing the risk of bias.
 Insufficient reporting limited our ability to adequately assess risk of bias and investigate potential sources of heterogeneity.
 The small number of included studies prevented us from performing meta-regression.

INTRODUCTION
Rheumatic heart disease (RHD) is an acquired permanent heart valve condition which results from an atypical immune reaction to group A streptococcal (GAS) infection typically occurring in childhood. [1,2] Disease progression leading to chronic RHD can result in irreversible heart valve damage, cardiac failure, and premature death. [3,4] RHD is, however, a preventable and treatable chronic condition which most often affects disadvantaged populations. [3,5] Significantly, RHD can remain asymptomatic for many years, particularly during the initial stages, thereby hindering the timely implementation of penicillin prophylaxis. [6] Echocardiographic screening to identify those with subclinical disease has been advocated as a means to support secondary prevention and potentially slow disease progression to overt clinical RHD. [7,8] Yet the feasibility of wide-scale echocardiographic screening remains hindered by high costs and the scarcity of trained personnel. [9] Alternative RHD screening tests which are both accurate and affordable are therefore needed in many endemic areas.
Handheld echocardiography (HAND) is a non-invasive, highly portable and comparatively less expensive device which has been presented in recent publications to be a promising alternative to standard echocardiography (STAND), despite some limitations such as a lack of spectral Doppler capabilities. [10,11] For HAND to be considered a suitable replacement for STAND, the device's accuracy needs to be similar to that of STAND.
We conducted a systematic review and meta-analysis of studies assessing the diagnostic accuracy of HAND for the detection of RHD in children and adolescents. The findings of this

Patient and public involvement
No patient involved.

Data sources and study eligibility
Studies were considered eligible for inclusion if the following criteria were met: 1) the accuracy of HAND compared to STAND when performed by an experienced cardiologist and in conjunction with the 2012 World Heart Federation (WHF) criteria was evaluated, and 2) the sample consisted of populations of children and adolescents living in endemic areas.
Only primary observational studies of either a cross-sectional, cohort or diagnostic case-control design were considered. Descriptive studies such as case studies and case series were excluded, as were studies reporting on the same data. Studies using non-handheld devices as the index test or criteria other than the 2012 WHF criteria in combination with STAND as the reference test were also excluded.
included studies as well as relevant review articles was also conducted.
The titles and/or abstracts of all identified articles were screened independently by two reviewers. During this process, and on the basis of predefined eligibility criteria, all clearly ineligible studies were excluded. Discrepancies regarding eligibility were resolved through discussion and consensus. Some authors were contacted for additional information on published data.

Data extraction and management
Using a predefined data extraction form, two reviewers independently extracted
Accuracy measures: sensitivity and specificity were calculated using the numbers of TP, FP, TN and FN in accordance with standard convention. Data extraction conflicts were resolved through discussion and with the assistance of a third reviewer where necessary. Information garnered through the data extraction process was used to determine each study's quality as well as for synthesising evidence.
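The standard convention referred to above can be sketched as follows, with sensitivity as TP/(TP+FN) and specificity as TN/(TN+FP), taking STAND as the reference standard. The class name and the counts are hypothetical and serve only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-classification of HAND (index test) against STAND (reference)."""
    tp: int  # HAND positive, STAND positive
    fp: int  # HAND positive, STAND negative
    fn: int  # HAND negative, STAND positive
    tn: int  # HAND negative, STAND negative

    def sensitivity(self):
        # Proportion of STAND-positive participants detected by HAND.
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        # Proportion of STAND-negative participants correctly ruled out by HAND.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, not taken from any included study.
table = TwoByTwo(tp=45, fp=20, fn=5, tn=180)
print(f"sensitivity={table.sensitivity():.2f}, specificity={table.specificity():.2f}")
```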

Assessment of methodological quality
A review-specific Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool was used to assess the risk of bias and concerns regarding applicability of all included studies. [14] The tool, encompassing four domains, was tailored to meet the specific requirements of this review. Two reviewers independently assessed the risk of bias in all included studies according to review-specific QUADAS-2 criteria. Discrepancies were resolved through discussion until consensus was reached and the assistance of a third reviewer was enlisted when necessary.

Statistical analysis and data synthesis
A meta-analysis using the Hierarchical Summary Receiver Operating Characteristic (HSROC) model was conducted to produce summary results of sensitivity and specificity. Heterogeneity was examined for the main meta-analysis only. We were only able to investigate the relationship between test accuracy and echocardiographer expertise through subgroup analysis. A sensitivity analysis was performed instead of subgroup analysis for the categorical covariates HAND protocol and geographic location, owing to the skewed distribution of studies within each subgroup. We were unable to perform meta-regression for the covariates age and sex because of insufficient and inadequately reported data. [17] We were also unable to conduct a sensitivity analysis on risk of bias since no studies were found to have a high risk of bias. All plots were generated using the Review
The same search was re-run on 3 March 2020 to check for any additional eligible studies.

Results
Only one potentially eligible study [21,22] was found, but it was excluded on the basis of being an abstract-only publication with no full text available for review.

Included studies
A summary of notable characteristics of all included studies [23-28] is shown in table 1. One study did not avoid a case-control design; however, cases and controls were sampled from the same population. Research has shown that case-control studies which use alternative
Significantly, all but two studies were conducted in Africa. Screening was performed in RHD endemic areas among children and adolescents, with most studies being school based.
Combined, all six studies included a total of 4208 participants of which 54% were female.
All included studies used the same make of handheld device; the Vscan machine (General

For any RHD
A total of six evaluations of HAND for any RHD were performed with data from six studies and a total of 4208 participants. Pooled prevalence of any RHD from five included studies was 12% (95% confidence interval (CI): 6%-19%). The forest plot revealed little variation in estimates of sensitivity and specificity and the HSROC plot (see supplementary file 2 for all plots) revealed moderate accuracy of the test. Meta-analytical sensitivity and specificity (95% CI) of data at mixed thresholds were 82% (77%-87%) and 90% (85%-95%) respectively.
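To give a sense of what these pooled figures would mean in a screening setting, predictive values can be derived from sensitivity, specificity and prevalence using Bayes' theorem. The sketch below plugs in the rounded point estimates for any RHD quoted above (82%, 90% and 12%). It ignores the uncertainty around each estimate, and the resulting predictive values are an illustration only, not results reported by the review; the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given pre-test probability."""
    for value in (sensitivity, specificity, prevalence):
        if not 0.0 < value < 1.0:
            raise ValueError("inputs must lie strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded pooled point estimates for any RHD, treated as fixed values.
ppv, npv = predictive_values(sensitivity=0.82, specificity=0.90, prevalence=0.12)
print(f"PPV={ppv:.3f}, NPV={npv:.3f}")
```

Because the positive predictive value depends strongly on prevalence, the same calculation at a lower pre-test probability would yield a lower value.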

For definite RHD
A total of five evaluations of HAND for definite RHD were performed with data from five studies and a total of 3588 participants. Pooled prevalence of definite RHD from four included studies was 6% (95% CI: 2%-12%). The forest plot revealed some variation in estimates of specificity while estimates of sensitivity were largely homogeneous with the exception of a single outlier. The HSROC plot indicated good accuracy of the test. Meta-analytical sensitivity and specificity (95% CI) of data at mixed thresholds were 91% (81%-100%) and 92% (86%-98%) respectively.

For borderline RHD
A total of five evaluations of HAND for borderline RHD were performed with data from five studies and a total of 3685 participants. Pooled prevalence of borderline RHD from four included studies was 20% (95% CI: 6%-39%). The forest plot revealed some variation in estimates of specificity while estimates of sensitivity were largely homogeneous with the exception of a single outlier. The HSROC plot indicated poor accuracy of the test. Meta-analytical sensitivity and specificity (95% CI) of data at mixed thresholds were 62% (32%-92%) and 82% (65%-100%) respectively.

Investigations of heterogeneity
Heterogeneity or variation between studies was investigated both visually as well as through subgroup analysis for the main meta-analysis only. We were only able to perform this analysis for the any RHD category as the data were too few to enable model convergence for the definite and borderline RHD only categories.

Covariates in the models
We were only able to use one of the five pre-specified covariates to investigate heterogeneity due to insufficient and inadequately reported data. HAND echocardiographer expertise (expert vs non-expert) was investigated as a possible source of heterogeneity through subgroup analysis. Half of all included studies evaluated the accuracy of HAND when performed and interpreted by trained non-experts while the other half assessed its accuracy in the hands of experts.

Subgroup and sensitivity analyses
A subgroup analysis was performed to investigate variations in echocardiographer expertise as a potential source of heterogeneity. Since no studies were found to have a high risk of bias, we did not explore the effect of excluding such studies on the accuracy of summary estimates. Sensitivity analyses were, however, conducted for the covariates geographic location and HAND protocol to investigate the effect of removing a single study on summary estimates of sensitivity and specificity.

Summary of main findings
We evaluated the accuracy of HAND for three distinct disease categories and found that, overall, the test was both sensitive and specific for detecting definite RHD only and moderately accurate for detecting any RHD but demonstrated insufficient accuracy for detecting borderline RHD alone.
Findings from this review provide some evidence for the potential of HAND to increase access to echocardiographic screening for RHD in resource-limited and remote settings. A summary of the accuracy estimates produced by meta-analysis using the HSROC method is included in table 3.

Table 3. Summary of findings.

Patients/Population
People residing in areas endemic for RHD (6 out of 6 studies)

Prior testing with echo
Yes (2 studies), No (4 studies)

Settings
5 out of 6 screening studies were based in field settings (communities and schools), while 1 study was half hospital registry follow-up and half school based.
4 of the 6 studies were conducted in Africa, with 3 of those from Uganda.

Index test(s)
General Electric (GE) Vscan handheld machine (6 out of 6 studies)

Reference standard
Standard echocardiography (2D, continuous-wave, and colour-Doppler echocardiography) performed by an experienced imager and in conjunction with the 2012 WHF criteria (6 out of 6 studies).

Strengths
We have evaluated and summarised the accuracy of HAND for the detection of RHD in endemic areas, making the review relevant to current global agendas. This review also serves to highlight the existing gaps in evidence for which further research could be beneficial. We did not impose any limits in terms of language during the literature search so
; Either a GE Vivid-I or Q or Philips CX-50 ultrasound machine (1 study)

Importance
HAND is being used as a first-line replacement for STAND in disease screening programmes for RHD, as it is comparatively inexpensive, quick, user friendly, easy to interpret, and may have similar sensitivity to STAND.

Studies
Cross-sectional (n=4), spiked cohort (n=1) and nested case-control (n=1) studies. More than half (n=4) of all included studies did not explicitly state the study design used and were thus assigned a study design based on other reported characteristics and participant enrolment methods used.

Quality concerns
Poor reporting of study design, participant characteristics and pre-test probability were common concerns. For the majority of studies the risk of bias was unclear in terms of 'patient selection' and 'flow and timing'. Concerns regarding applicability were low in all included studies.
as to minimise the chance of missing studies. Data extraction was performed by two independent reviewers thereby reducing the risk of bias.

Limitations
This review had a number of shortcomings:
Eligibility: We were unable to include studies which used STAND in conjunction with criteria other than the 2012 WHF criteria as the reference standard, which limited the number of studies eligible for inclusion.
Quality of included studies: Insufficient reporting of participant characteristics and study methods including study design, participant selection and test timing restricted our ability to adequately assess risk of bias and investigate potential sources of heterogeneity.
Paucity of data: Insufficient and inadequately reported data as well as the presentation of aggregate data limited the scope of our investigations of heterogeneity while the small number of included studies prevented us from performing meta-regression. Overall, the findings from this review may lack power due to the small sample size.

Applicability of findings to the review question
Concerns regarding the applicability of included studies to the review question were considered low according to review-specific QUADAS-2 criteria. Since all but one of the studies were conducted in low- or middle-income countries, and all but one were conducted in field settings, the results of this review are applicable to the endemic areas at which screening programmes are frequently targeted. However, our limited assessment

CONCLUSION
This review provides a summary of the accuracy of HAND for the detection of RHD. In populations of children and adolescents living in RHD endemic areas, HAND is both sensitive and specific for detecting definite RHD. The device is less accurate in detecting any RHD and demonstrates substandard accuracy for the detection of borderline RHD only. Nonetheless, this test may hold value as a replacement for first line screening due to its high sensitivity for definite RHD detection and adequate accuracy for any RHD detection.

Implications for practice
We have summarised the accuracy of HAND when used as a screening tool; however, the device's potential value in terms of diagnostics has yet to be established. We therefore posit that HAND could be recommended as an acceptable replacement test for first line screening
Another key consideration is the applicability of these findings for recommendations to integrate screening into routine clinical practice. A recent publication reviewed the cost-effectiveness of screening in high-risk populations [39] and determined that screening all indigenous Australian 5 to 12 year-olds in half of their communities in alternate years was cost-effective if RHD can be detected at least 2 years earlier. However, this result was sensitive to a number of assumptions, including local costs and context. Other cost-effectiveness models have also suggested modestly improved outcomes at lower cost. [40] Neither of these studies included the significant cost reduction from using HAND instead of STAND; hence, we highly recommend adding a cost-effectiveness analysis to proposed new screening studies.
Finally, our findings demonstrate comparable results by non-experts, this has also been demonstrated in several other reports [31,41], but again there are no detailed costeffectiveness analyses using non-experts and HAND.

Implications for research
The findings of this review highlight the need for a new set of evidence-based guidelines tailored to the capabilities of HAND in order to maximise the device's diagnostic potential.
Further studies assessing the diagnostic accuracy of HAND when using a standardised protocol are needed, as is further research into the feasibility, cost-effectiveness and consequences of implementing wide-scale screening programmes. Furthermore, the development of standardised training programmes for non-experts is recommended.

All other relevant data are included in the article or uploaded as supplementary information.

Objectives 4
Provide an explicit statement of questions being addressed in terms of participants, index test, and target conditions. 5

Protocol and registration 5
Indicate where the review protocol can be accessed (e.g. web address) and provide trial registration number if available. 6

Eligibility criteria 6
Specify study characteristics (participants, setting, index test, reference standards, target conditions, and study design) and report characteristics (e.g., years considered, language, publication status) used as criteria for eligibility and providing rationale. 6-7

Information sources 7
Describe all information sources (e.g. databases with dates of coverage, contact with study authors to identify additional studies) in the search and the date last searched. 6, 9 & 10

Search 8
Present full search strategies for all electronic databases and other sources searched, including any limits used so that they can be repeated. See supplementary files

Study selection 9
State the process for selecting studies (e.g., screening, eligibility, whether included in systematic review, and, if applicable, included in the meta-analysis). 6-8

Data collection process 10
Describe methods of data extraction from reports (e.g., piloted forms, independently, in duplicate) and any processes for obtaining and confirming data from investigators. 7

Definitions for data extraction 11
Provide definitions used in data extraction and classifications of target conditions, index tests, reference standards, and other characteristics (e.g. study design, clinical setting). See supplementary files for editors

Risk of bias and applicability 12
Describe methods used for assessing risk of bias in individual studies and concerns regarding the applicability to the review question. 8

Diagnostic accuracy measures 13
State the principal diagnostic accuracy measures reported (e.g. sensitivity, specificity) and state the unit of assessment (e.g. per patient vs per lesion). 7

Synthesis of results 14
Describe the methods of handling the data, combining the results of the studies and describing the variability between studies. This could include, but is not limited to (1) handling of multiple definitions of the target condition, (2) handling of multiple thresholds of test positivity, (3) handling multiple index test readers, (4) handling of indeterminate test results, (5) grouping and comparing tests, and (6) handling of different reference standards. -9

Meta-analysis D2
Report statistical methods used for meta-analyses if performed. 8-9

Additional analyses 16
Describe the methods of the additional analyses (e.g. sensitivity or subgroup analyses, meta-regression), if done, indicating which were prespecified. 8-9

Study selection 17
Provide the numbers of studies screened, assessed for eligibility, included in the review, and included in the meta-analysis if applicable, with reasons for exclusions at each stage, ideally with a flow diagram. 10-11

Risk of bias and applicability 19
Present evaluation of risk of bias and concerns regarding applicability for each study. 13

Results of individual studies 20
For each analysis in each study (e.g. unique combination of index test, reference standard, and positivity threshold), report 2 × 2 data (TP, FP, FN, TN) with estimates of diagnostic accuracy and confidence intervals, ideally with a forest plot or a receiver operating characteristic curve. See supplementary files

Synthesis of results 21
Describe test accuracy, including variability; if meta-analysis was done, include results and confidence intervals. 14-15

Additional analysis 23
Give results of additional analyses, if done (e.g. sensitivity or subgroup analyses, meta-regression, analysis of index test, failure rates, proportion of inconclusive results, and adverse events).

Summary 24
Summarize the main findings including the strength of evidence. -19

Limitations 25
Discuss limitations from included studies (e.g. risk of bias and concerns regarding applicability) and from the review process (e.g., incomplete retrieval of identified research).

Strengths and limitations of this study:
• Language restrictions were not imposed during the literature search to minimise the chance of missing studies.
• Data extraction was performed by two independent reviewers, thereby reducing the risk of bias.
• Insufficient reporting limited our ability to adequately assess risk of bias and investigate potential sources of heterogeneity.
• The small number of included studies prevented us from performing meta-regression.

INTRODUCTION
Rheumatic heart disease (RHD) is an acquired permanent heart valve condition which results from an atypical immune reaction to group A streptococcal (GAS) infection typically occurring in childhood. [1,2] Disease progression leading to chronic RHD can result in irreversible heart valve damage, cardiac failure, and premature death. [3,4] RHD is, however, a preventable and treatable chronic condition which most often affects disadvantaged populations. [3,5] Significantly, RHD can remain asymptomatic for many years, particularly during the initial stages, thereby hindering the timely implementation of penicillin prophylaxis. [6] Echocardiographic screening to identify those with subclinical disease has been advocated as a means to support secondary prevention and potentially slow disease progression to overt clinical RHD. [7,8] Yet the feasibility of wide-scale echocardiographic screening remains hindered by high costs and the scarcity of trained personnel. [9] Alternative RHD screening tests which are both accurate and affordable are therefore needed in many endemic areas.
Handheld echocardiography (HAND) is a non-invasive, highly portable and comparatively inexpensive device which has been presented in recent publications as a promising alternative to standard echocardiography (STAND), despite some limitations such as a lack of spectral Doppler capabilities. [10,11] For HAND to be considered a suitable replacement for STAND, the device's accuracy needs to be similar to that of STAND.
We conducted a systematic review and meta-analysis of studies assessing the diagnostic accuracy of HAND for the detection of RHD in children and adolescents. The findings of this

Patient and public involvement
No patient involved.

Data sources and study eligibility
Studies were considered eligible for inclusion if the following criteria were met: 1) the accuracy of HAND compared to STAND when performed by an experienced cardiologist and in conjunction with the 2012 World Heart Federation (WHF) criteria was evaluated, and 2) the sample consisted of populations of children and adolescents living in endemic areas.
Only primary observational studies of either a cross-sectional, cohort or diagnostic case-control design were considered. Descriptive studies such as case studies and case series were excluded, as were studies reporting on the same data. Studies using non-handheld devices as the index test or criteria other than the 2012 WHF criteria in combination with STAND as the reference test were also excluded.

Assessment of methodological quality
A review-specific Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool was used to assess the risk of bias and concerns regarding applicability of all included studies. [14] The tool, encompassing four domains, was tailored to meet the specific requirements of this review. Two reviewers independently assessed the risk of bias in all included studies according to review-specific QUADAS-2 criteria. Discrepancies were resolved through discussion until consensus was reached and the assistance of a third reviewer was enlisted when necessary.

Statistical analysis and data synthesis
A meta-analysis using the Hierarchical Summary Receiver Operating Characteristic (HSROC) model was conducted to produce summary results of sensitivity and specificity. Heterogeneity was examined for the main meta-analysis only. We were only able to investigate the relationship between test accuracy and echocardiographer expertise through subgroup analysis. A sensitivity analysis was performed instead of subgroup analysis for the categorical covariates HAND protocol and geographic location, due to the skewed distribution of studies within each subgroup. We were unable to perform meta-regression for the covariates age and sex due to insufficient and inadequately reported data. [17] We were also unable to conduct a sensitivity analysis on risk of bias since no studies were found to have a high risk of bias. All plots were generated using the Review
The same search was re-run on 3 March 2020 to check for any additional eligible studies.
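For readers unfamiliar with the HSROC model named in the statistical analysis, the following is a sketch of its usual formulation (the Rutter and Gatsonis parameterisation). The review does not state which parameterisation or software defaults were used, so the symbols below are assumptions introduced for illustration only.

```latex
% y_{ij}: number of positive index-test results in study i, disease group j
% n_{ij}: number of participants in study i, disease group j
% X_{ij} = +0.5 for reference-standard positive, -0.5 for reference-standard negative
\begin{align}
  y_{ij} &\sim \mathrm{Binomial}\left(n_{ij}, \pi_{ij}\right) \\
  \operatorname{logit}\left(\pi_{ij}\right) &= \left(\theta_i + \alpha_i X_{ij}\right)\exp\left(-\beta X_{ij}\right) \\
  \theta_i &\sim \mathcal{N}\left(\Theta, \sigma_\theta^2\right), \qquad
  \alpha_i \sim \mathcal{N}\left(\Lambda, \sigma_\alpha^2\right)
\end{align}
```

Here $\theta_i$ is the positivity threshold of study $i$, $\alpha_i$ its accuracy (log diagnostic odds ratio), and $\beta$ a shape parameter allowing asymmetry of the summary curve. For the reference-standard positive group $\pi_{ij}$ is the sensitivity, and for the negative group it is one minus the specificity.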

Results
Only one potentially eligible study [21,22] was found, but it was excluded because it was an abstract-only publication with no full text available for review. Significantly, all but two studies were conducted in Africa. Screening was performed in RHD endemic areas among children and adolescents, with most studies being school based.

Included studies
Combined, the six studies included a total of 4208 participants, of whom 54% were female.
All included studies used the same make of handheld device: the Vscan machine (General  [10,37], not specifying the test threshold a priori [9], and not being a study of diagnostic accuracy [36].

For any RHD
A total of six evaluations of HAND for any RHD were performed with data from six studies and a total of 4208 participants. Pooled prevalence of any RHD from five included studies was 12% (95% confidence interval (CI): 6%-19%). The forest plot revealed little variation in estimates of sensitivity and specificity and the HSROC plot (see supplementary file 2 for all plots) revealed moderate accuracy of the test. Meta-analytical sensitivity and specificity (95% CI) of data at mixed thresholds were 82% (77%-87%) and 90% (85%-95%) respectively.

For definite RHD
A total of five evaluations of HAND for definite RHD were performed with data from five studies and a total of 3588 participants. Pooled prevalence of definite RHD from four included studies was 6% (95% CI: 2%-12%). The forest plot revealed some variation in estimates of specificity while estimates of sensitivity were largely homogeneous with the exception of a single outlier. The HSROC plot indicated good accuracy of the test. Meta-analytical sensitivity and specificity (95% CI) of data at mixed thresholds were 91% (81%-100%) and 92% (86%-98%) respectively.
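To illustrate what the pooled estimates above could mean in a screening setting, the sketch below converts sensitivity, specificity and prevalence into positive and negative predictive values using Bayes' rule. Predictive values are not reported in this review; the function name `predictive_values` is hypothetical, and the inputs are the rounded point estimates quoted above. Because the pooled prevalence was derived from fewer studies than the accuracy estimates, the output should be read as a rough illustration rather than a result of the review.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence.

    All arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded pooled point estimates quoted in the text (illustrative inputs only).
any_ppv, any_npv = predictive_values(0.82, 0.90, 0.12)
def_ppv, def_npv = predictive_values(0.91, 0.92, 0.06)

print(f"Any RHD:      PPV = {any_ppv:.1%}, NPV = {any_npv:.1%}")
print(f"Definite RHD: PPV = {def_ppv:.1%}, NPV = {def_npv:.1%}")
```

Under these assumed inputs, a negative HAND result is far more informative than a positive one, which is the usual pattern for a screening test applied at low prevalence.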

Investigations of heterogeneity
Heterogeneity or variation between studies was investigated both visually as well as through subgroup analysis for the main meta-analysis only. We were only able to perform this analysis for the any RHD category as the data were too few to enable model convergence for the definite and borderline RHD only categories.

Covariates in the models
We were only able to use one of the five pre-specified covariates to investigate heterogeneity due to insufficient and inadequately reported data. HAND echocardiographer expertise (expert vs non-expert) was investigated as a possible source of heterogeneity through subgroup analysis. Half of all included studies evaluated the accuracy of HAND when performed and interpreted by trained non-experts while the other half assessed its accuracy in the hands of experts.

Summary of main findings
We evaluated the accuracy of HAND for three distinct disease categories and found that, overall, the test was both sensitive and specific for detecting definite RHD only and moderately accurate for detecting any RHD but demonstrated insufficient accuracy for detecting borderline RHD alone.
What is the diagnostic accuracy of handheld echocardiography in detecting any RHD (definite or borderline)?

Patients/Population
People residing in areas endemic for RHD (6 out of 6 studies)

Prior testing with echo
Yes (2 studies), No (4 studies)

Settings
5 out of 6 screening studies were field setting (communities and schools) based while 1 study was half hospital registry follow-up, half school based. 4 of the 6 studies were conducted in Africa with 3 of those from Uganda.

Index test(s)
General Electric (GE) Vscan handheld machine (6 out of 6 studies)

Reference standard
Standard echocardiography (2D, continuous-wave, and colour-Doppler echocardiography) performed by an experienced imager and in conjunction with the 2012 WHF criteria (6 out of 6 studies).

Importance
HAND is being used as a first-line replacement for STAND in disease screening programmes for RHD, as it is comparatively inexpensive, quick, user friendly and easy to interpret, and may have similar sensitivity to STAND.

Studies
Cross-sectional (n=4), spiked cohort (n=1) and nested case-control (n=1) studies. More than half (n=4) of all included studies did not explicitly state the study design used and were thus assigned a study design based on other reported characteristics and participant enrolment methods used.

Strengths
We have evaluated and summarised the accuracy of HAND for the detection of RHD in endemic areas, making the review relevant to current global agendas. This review also serves to highlight the existing gaps in evidence for which further research could be beneficial. We did not impose any limits in terms of language during the literature search so as to minimise the chance of missing studies. Data extraction was performed by two independent reviewers thereby reducing the risk of bias.

Limitations
This review had a number of shortcomings, including:
Eligibility: We were unable to include studies which used STAND in conjunction with criteria other than the 2012 WHF criteria as the reference standard, which limited the number of studies eligible for inclusion.
Quality of included studies: Insufficient reporting of participant characteristics and study methods, including study design, participant selection and test timing, restricted our ability to adequately assess risk of bias and investigate potential sources of heterogeneity.

Applicability of findings to the review question
Concerns regarding the applicability of included studies to the review question were considered low according to review-specific QUADAS-2 criteria. Since all but one of the studies were conducted in low- or middle-income countries, and all but one were conducted in field settings, the results of this review are applicable to the endemic areas at which screening programmes are frequently targeted. However, our limited assessment of risk of bias and limited investigation of sources of heterogeneity such as age and gender, both due to insufficient reporting, may lessen the applicability of findings to the review question.
In the context of disease control programmes, being able to demonstrate variation in test accuracy associated with factors such as age and gender would be beneficial for policy makers. Fully understanding included studies' risk of bias would also assist in objectively assessing the strength of evidence. For these reasons, prospective authors of diagnostic test accuracy studies are urged to make use of the Standards for Reporting of Diagnostic Accuracy Studies (STARD) guidelines [38] when reporting methods of study design and conduct.

CONCLUSION
This review provides a summary of the accuracy of HAND for the detection of RHD. In populations of children and adolescents living in RHD endemic areas, HAND is both sensitive and specific for detecting definite RHD. The device is less accurate in detecting any RHD and demonstrates substandard accuracy for the detection of borderline RHD only. Nonetheless, this test may hold value as a replacement for first line screening due to its high sensitivity for definite RHD detection and adequate accuracy for any RHD detection.

Implications for practice
We have summarised the accuracy of HAND when used as a screening tool; however, the device's potential value in terms of diagnostics has yet to be established. We therefore posit that HAND could be recommended as an acceptable replacement test for first-line screening in endemic areas, provided a standardised set of device-specific diagnostic criteria is developed.
Another key consideration is the applicability of these findings for recommendations to integrate screening into routine clinical practice. A recent publication reviewed the cost-effectiveness of screening in high-risk populations [39] and determined that screening all indigenous Australian 5 to 12 year-olds in half of their communities in alternate years would be cost-effective if RHD can be detected at least 2 years earlier. However, this result was sensitive to a number of assumptions, including local costs and context. Other cost-effectiveness models have also suggested modestly improved outcomes at lower cost. [40] Neither of these studies included the significant cost reduction from using HAND instead of STAND; hence, we highly recommend adding a cost-effectiveness analysis to proposed new screening studies.
Finally, our findings demonstrate comparable results by non-experts. This has also been demonstrated in several other reports [31,41], but again there are no detailed cost-effectiveness analyses using non-experts and HAND.

Competing interests
None declared.

Patient consent for publication
Not required.

Provenance and peer review
Not commissioned; externally peer reviewed.
