Do private providers give patients what they demand, even if it is inappropriate? A randomised study using unannounced standardised patients in Kenya

Introduction Low and varied quality of care has been demonstrated for childhood illnesses in low-income and middle-income countries. Some quality improvement strategies focus on increasing patient engagement; however, evidence suggests that patients demanding medicines can favour the selection of resistant microbial strains in the individual and the community if drugs are inappropriately used. This study examines the effects on quality of care when patients demand different types of inappropriate medicines.
Methods We conducted an experiment where unannounced standardised patients (SPs), locally recruited individuals trained to simulate a standardised case, present at private clinics. Between 8 March and 28 May 2019, 10 SPs portraying caretakers of a watery diarrhoea childhood case scenario (in absentia) conducted N=200 visits at 200 private, primary care clinics in Kenya. Half of the clinics were randomly assigned to receive an SP demanding amoxicillin (an antibiotic); the other half, an SP demanding albendazole (an antiparasitic drug often used for deworming), with other presenting characteristics the same. We used logistic and linear regression models to assess the effects of demanding these inappropriate medicines on correct and unnecessary case management outcomes.
Results Compared with 3% among those who did not demand albendazole, the dispensing rate increased significantly to 34% for those who did (adjusted OR 0.06, 95% CI 0.02 to 0.22, p<0.0001). Providers did not give different levels of amoxicillin between those demanding it and those not demanding it (adjusted OR 1.73, 95% CI 0.51 to 5.82). Neither significantly changed any correct management outcomes, such as treatment or referral elsewhere.
Conclusion Private providers appear to account for both business-driven benefits and individual health impacts when making prescribing decisions. Additional research is needed on provider knowledge and perceptions of profit and individual and community health trade-offs when making prescription decisions after patients demand different types of inappropriate medicines.
Trial registration numbers American Economic Association Registry (#AEARCTR-0000217) and Pan African Clinical Trial Registry (#PACTR201502000770329).
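As a rough aid to reading the albendazole odds ratio in the abstract, the sketch below computes an unadjusted odds ratio from the two rounded dispensing rates (3% and 34%). It is only an illustration: the authors report an adjusted OR from a regression model, and the function names here are hypothetical. The reciprocal of the unadjusted ratio comes out close to the reported 0.06, which would be consistent with the demanding group serving as the reference category; that reading is an assumption, not something the abstract states.

```python
def odds(p: float) -> float:
    """Convert a proportion into odds."""
    return p / (1.0 - p)


def odds_ratio(p_group: float, p_reference: float) -> float:
    """Unadjusted odds ratio of p_group relative to p_reference."""
    return odds(p_group) / odds(p_reference)


# Rounded rates taken from the abstract; treated as exact for illustration only.
rate_demanded = 0.34      # albendazole dispensed when it was demanded
rate_not_demanded = 0.03  # albendazole dispensed when it was not demanded

or_demand_vs_not = odds_ratio(rate_demanded, rate_not_demanded)
or_not_vs_demand = odds_ratio(rate_not_demanded, rate_demanded)

print(f"Unadjusted OR, demanded vs not demanded: {or_demand_vs_not:.2f}")
print(f"Unadjusted OR, not demanded vs demanded: {or_not_vs_demand:.2f}")
```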

I enjoyed reading this paper and learning about aspects of treatment and prescription practices in Kenyan clinics. The description of the experiment is clear and the materials provide a comprehensive picture of the study procedures and results.
All of my comments except (1) and (2) are expositional or requests for clarification and do not affect the reported results. It would be helpful to discuss potential limitations of the SP method for studying patient demand for specific drugs.
(1) Effect of the AHME program: It would be helpful to see AHME treatment assignment shown in the balance table. Additionally, could the quality aspects of AHME influence prescription behavior and response to patient demand? If we would expect important differences, would it make sense to estimate the response of providers to patient demand separately for treated and untreated clinics?
(2) Total number of medicines are reported but may include vitamins or supplements. Can the data speak to the incidence of polypharmacy, i.e., unambiguous treatment for two or more different underlying conditions (such as a parasitic and a bacterial infection)?
(3) Standard errors and number of SP visits per clinic: please clarify how standard errors were computed. In places the text seems to suggest that several SPs visited the same clinic (e.g. notes Table 3, p. 38 point 5 on SP sequencing). It is stated that standard errors are clustered at the "clinic and individual standardized patient levels". However, it appears that the reported results include only one observation per clinic.
(4) Description of the policy context: I read Figure A.1 with interest, as the national guidelines for diarrhea management seem to indicate awareness of the problem of antimicrobial overuse. It would help the reader put the experiment into context to learn more about the policy background in Kenya at the time of the intervention. How were current AMR policies disseminated and enforced? What did providers know about AMR, overall and in the treatment arms of AHME? I am not well acquainted with the situation in Kenya, but e.g. a report by the Global Antibiotic Resistance Partnership raises alarms about AMR for diarrheal infections (GARP 2011), and Kenya seemed to have a national AMR policy in place at the time of the study (Govt of Kenya, 2017). It would also be helpful to discuss to what extent the detailed policy context might influence the study results. AMR initiatives tend to focus on the most widely used drugs, so perhaps providers were more aware that they should resist patient demand when amoxicillin is demanded rather than albendazole. As a downside, it could be possible that the providers suspected an SP in one case but not the other.
(5) Standardized-patient method: the SP method has clear advantages related to the accuracy of measurement that the authors describe well in several places (pages 5, 7, appendix p32). The authors also mention limitations related to the types of conditions that can be tested on p12. However, there are other caveats discussed in the literature, such as the possibility of testers behaving in a way that confirms the study hypotheses (e.g. Currie, Lin & Zhang 2011, Currie, Lin & Meng 2014, Aujla et al. 2021). Discussions of audit studies on discrimination (Goldberg, 1996; Heckman, 1998) also point out that an observed supply response to specific auditor characteristics or trained behavior may not translate to differences in market outcomes. Applied to the SP context, we cannot know whether providers in general overprescribe antiparasitics but not antibiotics in response to patient demand, because we do not know how often real patients demand these drugs. Relatedly, we do not know what conclusions providers draw about patients who demand a drug if this is uncommon behavior (see also point on SP detection above). I believe the authors are referencing this issue in lines 36-40 on p12, but it could be more clearly discussed. Such a discussion does not take away from the significance of the finding that the SP simply naming albendazole can raise the rate at which it is prescribed by over 30 percentage points.
Minor comments: -Personally I consider AMR an extremely important policy issue. However, I see a slight tension between the argument on p.5/6 that diarrhea is an important case because it is a major cause of mortality and morbidity, and the focus of the study on the overtreatment aspect of quality of care. The authors might consider reframing a little.
-Consider re-ordering 1st paragraph of "Data Sources" to define pre-demanding/post-demanding before stating observation counts.
- Table 2, last 4 rows: it appears to me that the order of magnitude for these is off (unless perhaps these numbers are per patient?).
-Typo on p.12, l.15: our study is about the overprescription and overuse of antimalarials in Mali.
-Appendix Figure A.1 has a couple of typos in the first box.
One concern is that after two months, the SPs might forget some of the scripts. Was there any quality control before the fieldwork?
P37 Line 24. This study described three time points when the SP was assigned to the "demanding dawa ya minyaoo" group. Will the first one affect the providers' behavior, say, change the prescription itself?
P37 Line 39. The same concern.
P8 Line 8. It is a good way to conduct the provider survey among those who saw the SPs to explore the know-do gap.
Outcomes
P8 Line 14. What is the difference between prescribe and dispense?
Results
P9 Line 27. Why measure the person waiting in the waiting room when the SP arrived?
Discussion
P10 Line 16-30. The authors described the result of anecdotal narratives during debriefs with the supervisors of the SP fieldwork. It might be easier to understand the method used in this study. If I were the author, I would reorganize this section. To be specific, move these to "results" as part of the qualitative research method.
P19. I failed to find Figure 1. Clinic sample and SP randomized study design.
P22. In Table 3. Summary statistics of SP visits, under "Visit characteristics", it seems that there are some indicators with missing data, e.g. "Minutes spent with provider". Just wondering, is this because the SP failed to collect this information, or is there some other reason?

VERSION 1 -AUTHOR RESPONSE
Reviewer 1 comments and author responses: Dr. Anja Sautmann, World Bank Group Comments to the Author: This paper compares SP visits to 200 private clinics in Kenya in which the tester demands a prescription of either amoxicillin or albendazole for a case of childhood diarrhea. The paper finds that relative demand for albendazole leads to a significant increase in prescription rates by 32 percentage points, whereas use of amoxicillin was not significantly affected.
I enjoyed reading this paper and learning about aspects of treatment and prescription practices in Kenyan clinics. The description of the experiment is clear and the materials provide a comprehensive picture of the study procedures and results.
All of my comments except (1) and (2) are expositional or requests for clarification and do not affect the reported results. It would be helpful to discuss potential limitations of the SP method for studying patient demand for specific drugs.
We are appreciative that you took time to review our manuscript. We would like to kindly note that our quantitative results changed slightly based on incorporating the helpful feedback and comments from you and the other reviewer. Changes were made specifically to: variables to assess balance in Table 1, clustering of standard errors, missing data, and the addition of a polypharmacy outcome. Overall, reported findings retain the same conclusions. Your comments and questions are addressed point-by-point below.
(1) Effect of the AHME program: It would be helpful to see AHME treatment assignment shown in the balance table. Additionally, could the quality aspects of AHME influence prescription behavior and response to patient demand? If we would expect important differences, would it make sense to estimate the response of providers to patient demand separately for treated and untreated clinics?
Thank you for this comment and raising these important questions. We respond to each of your points below in turn.
First, we agree that it is helpful to show balance for the AHME treatment assignment. We tested if there was any AHME treatment difference between the clinics assigned to receive an SP demanding albendazole or amoxicillin. We find in the analytic sample that 60% of the 102 clinics assigned albendazole and 46% of the 98 clinics assigned amoxicillin were also assigned to receive the AHME program (p-value = 0.05). We continue to include an AHME treatment indicator (1=assigned AHME treatment; 0, otherwise) in all our analyses to control for the effects of the independently randomized program. We are hesitant to include this variable in the main balance table as this would warrant a longer description of the program and the inclusion of the AHME treatment variable in the main tables (we will get back to this in a moment). We include the following text and Appendix Table A1 (p. S5): The SP experiments were randomly assigned independent of the AHME treatment assignment. The table below shows the balance of AHME assignment across the SP demanding experiment assignments for our analytic sample. We include an AHME treatment indicator for analyses based on the clinic assignment to the AHME treatment or control group.
Second, the reviewer also raises very important considerations regarding the AHME program and the random assignment of clinics to the program. We also asked ourselves whether quality aspects of AHME influenced prescription behavior and response to patient demand while writing this manuscript.
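The AHME balance check reported above (60% of 102 albendazole clinics versus 46% of 98 amoxicillin clinics, p-value = 0.05) can be approximated with a two-proportion z-test. The sketch below is a minimal illustration, not the authors' procedure: the response does not say which test was used, and the counts 61 and 45 are assumptions reconstructed from the rounded percentages.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z statistic, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Assumed counts: about 60% of 102 and about 46% of 98 (rounded to whole clinics).
z_stat, p_val = two_proportion_z_test(61, 102, 45, 98)
print(f"z = {z_stat:.2f}, two-sided p = {p_val:.3f}")
```

Under these assumed counts the p-value lands near the reported 0.05, which is the borderline imbalance that motivates keeping the AHME indicator as a control.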
Based on the reviewer's comments, we present the following tables to show the effect of the AHME program ("AHME treatment" indicator) in the same analyses in which we present the demanding experiment, without and with interactions:
(a) N=400 observations (pre-demanding and post-demanding) without AHME and Demanding interactions
(b) N=400 observations (pre-demanding and post-demanding) with AHME and Demanding interactions
(c) N=200 observations (post-demanding only) without AHME and Demanding interactions
(d) N=200 observations (post-demanding only) with AHME and Demanding interactions
The above tables (a) and (b) are now Appendix Table B1 (a) and (b) on pp. S17-S18.
The above tables (c) and (d) are now Appendix Table B2 (a) and (b) on pp. S19-S20.
(Also note: The analyses have been updated to correspond to the reviewer's comment (2) related to polypharmacy and (3) related to clustering; see below.)
After much thought, we elected to not report or show the effects of demanding separately for AHME treated and untreated clinics in the main text for three reasons: (1) quality aspects of AHME do not influence prescription behavior and response to patient demand, (2) the program did not directly intend to influence provider prescribing behavior, and (3) the quality aspects as they relate to the program are outside the scope of this study. Because of this, we refer to the balance results in the main text and refer the reader to Appendix Tables A1, B1, and B2. We present a more thorough discussion of the effects of the AHME program on quality in a separate manuscript in preparation.
(2) Total number of medicines are reported but may include vitamins or supplements. Can the data speak to the incidence of polypharmacy, i.e., unambiguous treatment for two or more different underlying conditions (such as a parasitic and a bacterial infection)?
We thank the reviewer for this insightful comment and question. We do feel that we can better address the unambiguous treatment specifically for both a parasitic and a bacterial infection. However, since the average number of medicines given out was 2.38 (pooled SP visits), we report a dichotomous variable for whether the visit resulted in any antibiotic and any antiparasitic, including antimalarials.
Specifically, in Table 3 (p. 22), we add the variable "Dispensed/ prescribed: antibiotics and antiparasitics", which refers to whether the provider gave any antibiotic and any antiparasitic (defined in the table notes). We also included this outcome in Figure 2 and Appendix Tables B1 and B2.
(3) Standard errors and number of SP visits per clinic: please clarify how standard errors were computed. In places the text seems to suggest that several SPs visited the same clinic (e.g. notes Table 3, p. 38 point 5 on SP sequencing). It is stated that standard errors are clustered at the "clinic and individual standardized patient levels". However, it appears that the reported results include only one observation per clinic.
We thank the reviewer for bringing this to our attention. The reviewer is correct in that we have one visit conducted by one SP per clinic. We have considered this carefully and, following Abadie, Athey, Imbens, and Wooldridge (2017) "When should you adjust standard errors for clustering?", we aimed to cluster standard errors at the level of SP demanding treatment assignment. In this case, the treatment (or independent variable) is the assignment of an SP to a clinic. Therefore, in analyses where we include one SP observation per clinic, we do not cluster standard errors (Figure 2 and Appendix Table B2; where N=200 observations). In analyses where we include 2 observations per clinic, we clustered standard errors at the clinic level (see Appendix Table B1; where N=400 observations).
We read through the text carefully to make sure we have removed any places in the text that: (i) describe clustered standard errors except in Appendix Table B1; and (ii) suggest more than one SP ever visited a single clinic (e.g., notes in Table 3 and Figure 2).
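The clustering choice described above can be illustrated with a small sketch. It computes the slope of a simple linear regression and a cluster-robust (CR0, no small-sample correction) variance for that slope, summing score contributions within each clinic. When every observation is its own cluster, the same formula reduces to the heteroskedasticity-robust (HC0) variance, which is why clustering is not needed with one observation per clinic. The data, variable names, and the linear specification are hypothetical; the authors' models are logistic and linear regressions with covariates that are not reproduced here.

```python
from collections import defaultdict


def slope_and_cluster_variance(x, y, clusters):
    """OLS slope of y on x (with intercept) and its CR0 cluster-robust variance."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in x_dev)
    slope = sum(d * (yi - y_bar) for d, yi in zip(x_dev, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Sum the score (demeaned x times residual) within each cluster.
    scores = defaultdict(float)
    for d, e, g in zip(x_dev, resid, clusters):
        scores[g] += d * e
    variance = sum(s * s for s in scores.values()) / (sxx * sxx)
    return slope, variance


# Hypothetical toy data: 4 clinics, each with a pre- (0) and post-demanding (1) row.
post = [0, 1, 0, 1, 0, 1, 0, 1]
dispensed = [0, 1, 0, 0, 0, 1, 1, 1]
clinic = [1, 1, 2, 2, 3, 3, 4, 4]

# Two observations per clinic: cluster at the clinic level.
slope, var_clustered = slope_and_cluster_variance(post, dispensed, clinic)
# One observation per cluster: identical to the unclustered robust variance.
_, var_unclustered = slope_and_cluster_variance(post, dispensed, list(range(len(post))))

print(slope, var_clustered ** 0.5, var_unclustered ** 0.5)
```

In this toy example the clustered and unclustered variances differ because residuals within a clinic are not independent; with a single row per clinic there is nothing to sum within a cluster, so the two coincide.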
(4) Description of the policy context: I read Figure A.1 with interest, as the national guidelines for diarrhea management seem to indicate awareness of the problem of antimicrobial overuse. It would help the reader put the experiment into context to learn more about the policy background in Kenya at the time of the intervention.
How were current AMR policies disseminated and enforced? What did providers know about AMR, overall and in the treatment arms of AHME? I am not well acquainted with the situation in Kenya, but e.g. a report by the Global Antibiotic Resistance Partnership raises alarms about AMR for diarrheal infections (GARP 2011), and Kenya seemed to have a national AMR policy in place at the time of the study (Govt of Kenya, 2017).
It would also be helpful to discuss to what extent the detailed policy context might influence the study results. AMR initiatives tend to focus on the most widely used drugs, so perhaps providers were more aware that they should resist patient demand when amoxicillin is demanded rather than albendazole. As a downside, it could be possible that the providers suspected an SP in one case but not the other.
We are very appreciative of this comment and thank you for sharing references to the GARP report and the GoK AMR policy. We agree that antimicrobial overuse is a very important problem, and in fact we discussed this before submission amongst ourselves for the reasons you put forth.
Given the tension of this, as you point out, we discussed further. We have cited the GARP report and AMR policy and describe how our study is not able to speak to whether providers were given or exposed to training on AMR (references 46 and 47).
Further, with our data, we do not find ourselves in the position to respond to how AMR policies were disseminated and enforced across the country. AMR policies were also outside the scope of the AHME program aims. We reviewed the current national policy document issued by the Government of Kenya and it mainly provides high level guidance for reducing the burden of antibiotic resistance. Since this study was implemented only shortly after the release of the policy document, we feel not enough time has passed to assess change related to the policy.
We can stress the fact that there is a know-do gap and feel that our study can shed light on this by providing information on the actual actions of providers when faced with an SP demanding inappropriate medications. Providers seem to be sufficiently aware that amoxicillin is not appropriate for most cases of acute diarrhea that they do not give it.
Given all of these points, we have worked to address the reviewer's important comment by adding the following text to the discussion (p. 11, line 39 to p. 12, line 17; see below):
Our findings also have implications on the literature on overdispensing of antimicrobial therapy and understanding quality of care outcomes that are related to antimicrobial resistance (AMR). Particularly, in Kenya there have been alarms raised about AMR for diarrheal infections and the Government has launched a national AMR policy just before our study was implemented.(46,47) Although our study does not have enough data to address the AMR issue more deeply, we show how demanding an inappropriate medicine can result in higher rates of mismanagement of childhood illnesses than demanding other inappropriate medicines, which has implications for antimicrobial stewardship efforts, training on the consequences of overprescription, and quality improvement interventions. Our study is not able to speak to whether providers were given training on AMR but is able to shed light on how providers seem to be aware that certain medicines are inappropriate for most cases of childhood acute diarrhea. We highlight what happens if a provider gives medicines demanded by a mother or caretaker of a child.
(5) Standardized-patient method: the SP method has clear advantages related to the accuracy of measurement that the authors describe well in several places (pages 5, 7, appendix p32). The authors also mention limitations related to the types of conditions that can be tested on p12. However, there are other caveats discussed in the literature, such as the possibility of testers behaving in a way that confirms the study hypotheses (e.g. Currie, Lin & Zhang 2011, Currie, Lin & Meng 2014, Aujla et al. 2021).
Thank you for this comment. We agree and have worked to make this clearer. We have added the following text to the Results section (p. 10, lines 33-37): It is important to recognize that SPs are not real patients and can behave in a way that confirms the study hypotheses, which has been discussed in previous SP studies.(6,7,41) However, if this were the case for our study, the effects would likely be non-differential with respect to the type of medicine being demanded. Instead, an increased rate is only observed after demanding albendazole, and not after demanding amoxicillin.
Discussions of audit studies on discrimination (Goldberg, 1996; Heckman, 1998) also point out that an observed supply response to specific auditor characteristics or trained behavior may not translate to differences in market outcomes. Applied to the SP context, we cannot know whether providers in general overprescribe antiparasitics but not antibiotics in response to patient demand, because we do not know how often real patients demand these drugs.
We appreciate this link to the literature on the use of audit studies on discrimination. The reviewer raises an important discussion point. We have added the references suggested by the reviewer and have added the following text to the Results section (p. 10, line 38 to p. 11, line 12): Third, given that we only examine the interaction between providers and SPs, we do not report on the role of care-seeking behavior and thus interpret findings conditional on patients seeking care. Further, since SPs are not real patients, what was found with SPs may not exactly reflect what happens with real patients, nor are we able to report on how satisfied real patients would have been given these prescription patterns. Similar to discussions in audit studies on discrimination, provider behaviors captured in this study as a response to SP features or trained characteristics may not translate to actual practice behaviors with real patients.(42,43) Our study does not conduct a detection survey to measure the extent of provider suspicion, but other SP studies with detection surveys find very low detection rates (0-5%).(8,23,44) In the study on which we based our childhood diarrhea SP case, Daniels et al. (2017) administered a structured questionnaire two weeks after the completion of SP fieldwork in Nairobi, Kenya and found that despite providers having detected SPs in 9 instances, none of these actually matched the study's SP visits. As described earlier, what the SP method allows us to do which other methods cannot is to identify what happens across multiple providers when providers are presented with SPs randomly assigned to demand different inappropriate medicines, with the same presentation otherwise.
Relatedly, we do not know what conclusions providers draw about patients who demand a drug if this is uncommon behavior (see also point on SP detection above). I believe the authors are referencing this issue in lines 36-40 on p12, but it could be more clearly discussed. Such a discussion does not take away from the significance of the finding that the SP simply naming albendazole can raise the rate at which it is prescribed by over 30 percentage points.
Thank you for this comment. We agree and worked to strengthen the limitations and discussion. We note that multiple SPs visited different providers, and no providers raised suspicion. We hope we have adequately addressed this by including the following text to Appendix A3 (pp. S10-S11): We developed and finalized the SP script and demanding experiment together with a group of five field supervisors from Kenya, 40 individuals from Kenya who were recruited and hired to be standardized patients for this study (and approximately 60 more who were recruited and underwent partial training but not hired), and a technical advisory group of 4 health care providers who at the time of the study advised on national guidelines and actively trained cadres of health care providers. All of these individuals played a role in days of discussions and exercises during training on what medicines were trusted in the community and whether people in the community are open to using them. SPs and supervisors were involved in piloting the demanding of inappropriate medicines in the field. The team together acknowledged that amoxicillin and albendazole were common medicines, and their selection for study was not done arbitrarily. Further, we conducted the SP pilot with SPs demanding these two medicines before the actual study. The selection of these two medicines in the script above were the result of the training and piloting process.
From the experience before fieldwork for this study, the SP recruits, supervisors, and our technical advisory group did not find that it was uncommon for patients in Kenya to ask for specific medicines they are familiar with. In particular, amoxicillin and albendazole are commonly prescribed drugs in the study setting, and thus presumed patients demanding either of those would not be seen as suspicious. It should be noted that the SP scripts were developed while taking into account local habits and behaviors in order to minimize the risk of SPs being identified as simulated, standardized patients.
We do want to emphasize that SPs are not a perfect methodology, so what was found with SPs doesn't fully reflect what happens with real patients, as the reviewer mentions. We have taken additional care to make sure that this is clear (see excerpts above).
Minor comments: -Personally I consider AMR an extremely important policy issue. However, I see a slight tension between the argument on p.5/6 that diarrhea is an important case because it is a major cause of mortality and morbidity, and the focus of the study on the over-treatment aspect of quality of care. The authors might consider reframing a little.
We agree and refer the reviewer to our response to comment (4) above. We want to also emphasize again that we do think that both childhood illnesses and AMR issues in public health are important. We don't think there is enough data to address the AMR issue more deeply, but we do want to link the mismanagement of childhood illnesses to its implications for AMR and underline the consequences of overprescription, issues that warrant additional research. We have added on p. 11, line 43 to p. 12, line 4: Although our study does not have enough data to address the AMR issue more deeply, we show how demanding an inappropriate medicine can result in higher rates of mismanagement of childhood illnesses than demanding other inappropriate medicines, which has implications for antimicrobial stewardship efforts, training on the consequences of overprescription, and quality improvement interventions.
Thank you. We have revised these lines as such (p. 6, line 43 to p. 7, line 3): The SP method has the advantage that the researchers know the true condition of the 'patient' which is not possible when examining data derived from real patients. SP data particularly allows for providers across different facilities to be compared against the exact same patient scenario and is thus increasingly considered the gold standard for measuring provider practice across a sample of providers that lack standardized health records.
-Consider re-ordering 1st paragraph of "Data Sources" to define pre-demanding/post-demanding before stating observation counts.
Thank you for this suggestion. We agree with the reviewer that this reorganization is clearer. The text now reads as such (with the underlined text reflecting the sentence that was previously later in the paragraph but has now been moved; p. 6, lines 14-17): Between March 8 and May 28, 2019, 200 successful SP visits were conducted at 200 private Kenyan clinics. Data was captured at two moments during the interaction: "pre-demanding" includes actions before the SP demanded the assigned medicine, and "post-demanding" includes all actions by the completion of the visit. We analyze N=200 pre-demanding and N=200 post-demanding observations for the childhood diarrhea case scenario.
- Table 2, last 4 rows: it appears to me that the order of magnitude for these is off (unless perhaps these numbers are per patient?).
Thank you for raising this. We examined this more carefully and have updated the table with data presented from 2019 when this SP study was conducted in parallel with a clinic survey at the same clinic sample. (The previous table 3 contained data from the AHME program evaluation from 2014.)
- Table 3, notes: sentence "Single observations…" is unclear.
Thank you. We agree that this sentence is unclear. We have removed this sentence.
The purpose of the deleted sentence was to describe the merge of the SP data to the provider data, in that if different SP observations were seen by the same provider, the provider would show up in the same frequency. However, since there is one observation per clinic, it made sense to us to remove this unclear sentence.
We have examined comments from the supervisors and SPs for the visits that correspond to these missing observations. All SPs who conducted these visits had conversations with providers at the clinic but did not receive any medicines. The observations have been recoded to "0" referring to not having received each medicine.
-Typo on p.12, l.15: our study is about the overprescription and overuse of antimalarials in Mali.
Thank you for pointing this out, and we apologize for misspeaking. We have updated this to say that the findings reported by Lopez et al. (2020) "assessed whether patients' demands influence overprescription and overuse of antimalarials in Mali" (p. 11, lines 32-33).
-Appendix Figure A.1 has a couple of typos in the first box.
Thank you for catching these. We reviewed the content in all of the boxes and also ran the text through spellcheck to correct typos.
Reviewer 2 comments and author responses: Dr. Xiaohui Wang, Lanzhou University Comments to the Author: This study examined the role of patient demand for inappropriate care using the methods of USP and vignette data. Inappropriate care refers to the behavior of the providers. That is to say, whether the provider prescribes or dispenses the antibiotic amoxicillin or the deworming drug albendazole. This paper makes a valuable contribution to the study of quality of care using the SP method.
Here are some minor comments:
1. The format: please standardize the reference format in the text. (P10, Line 5 reference 23,23,34 vs. Line 9 reference 21,23)
Thank you for bringing this to our attention. We have redone the references and the reference formatting is now standardized throughout the text.

2. Inconsistent page numbers throughout the text
We have double checked the page numbers and have also changed the page number formatting of the supplemental file to S1-S22 (corresponding to pages 1 through 22) so that they are not confused with the main text pagination. We have maintained our line numbers associated with our file so that we can refer to where we made corrections more easily.

3. Please follow the CONSORT reporting guideline to organise the manuscript.
We have worked to incorporate several aspects that were missing based on the CONSORT reporting guidelines to organize the manuscript, including Figure 1. We have also added sample size calculations to Appendix A (pp. S11-12) and Appendix Table A2. We note that our study is a randomized experiment and not a randomized trial so no follow-up period exists. We have also uploaded the CONSORT reporting guideline checklist.
We have updated Figure 1 based on the Reviewer's suggestion to follow the CONSORT flow diagram as closely as possible. The updated figure is now: [Figure 1. Clinic sample and SP randomized study design]
4. Some comments regarding the main text are as follows:
P2 Abstract Methods L19. According to the manuscript, I assume you used unannounced SP visits instead of SP visits? Would you please clarify?
Yes, we apologize for the lack of clarity. We did utilize unannounced SP visits. We have clarified this in the title, abstract, and throughout the text.

Results
Line 28. This study aims to examine the effect on quality of care when patients demand different types of inappropriate medicines. But according to the results, the outcome is the dispensing rate of the two types of medicine. Is there any possibility to measure the quality of care more specifically?
Thank you for this comment. We have reviewed this and updated the abstract and discussion text to more clearly reflect that the objective of our study was to examine the role of patient demand for inappropriate care on prescribing and dispensing practices for childhood diarrhea in Kenya.
The abstract's results section reads (p. 1): Neither significantly changed any correct management outcomes, such as treatment or referral elsewhere.
We have also added the underlined text to the outcomes description in the methods section (p. 7, lines 18-19): Using SP and vignette data, we constructed binary measures for our main outcomes of interest: correct case management and whether any unnecessary medicines were prescribed or dispensed, since one aspect of quality of care is not only dispensing correct medicines but also not dispensing inappropriate medicines.
Is there any underlying incentive-induced difference if the patient requires the medicine at a different time point?
Thank you for this question. For this study, we assume the different times are balanced across each demanding arm. We have chosen to include the following text on p. S11 (3rd full paragraph) of the Appendix: It is quite possible that demanding a medicine when the provider is writing a prescription or about to dispense drugs could have an underlying incentive-induced difference. In this study, we assume that the different time points for demanding are balanced across each demanding arm.
Because of the reviewer's comment, we added "only" to clarify the following sentence in Data Sources in the Methods section (p. 6, line 21). Unfortunately, we are not able to ascertain from the data how frequently this occurred.
SP requests were done at the end of the visit or earlier only if it was necessary to avoid an unusual interaction.
P6 Line 35. Is the number (36) a typo? Or does this refer to reference 36?
We thank the reviewer for bringing this to our attention. "(36)" refers to reference 36. Also, we have redone the references and the reference formatting is now standardized throughout the text.

Data Sources
P37 Line 22. The SPs were locally recruited and trained in January 2019. The last day of the training was 01-Feb-19, while the visits were conducted from March 8 to May 28. One concern is that after two months, the SPs might forget some of the scripts? Any quality control before the fieldwork?
Quality control is an important concern for any fieldwork, including the SP method, so we thank the reviewer for raising this. Between the last day of training and the first day in the data, the team conducted a 2-week pilot, followed by 1 week of classroom training on the final cases. We actually began fieldwork on February 28, but the team halted SP fieldwork because of an unrelated issue. It then recommenced on March 8.
We recognize the concern raised by the reviewer and thus have decided to expand the description of the pilot and motivation for experiments in Appendix A3.3 on p. S9. The text now reads: Between February 5-15, 2019, the SPs piloted in Nairobi, and some teams also traveled out to three different areas in Kenya to ensure that we understood whether the experiments for the case needed to be adapted for different regions (since the clinic sample was spread across the country).
Given the experience during the pilot, we designed experiments for demanding unnecessary medicines. Figure A6 shows the case scenario narratives with scripts for the two experiments: demanding amoxicillin and demanding albendazole.
After the pilot and between fieldwork, the supervisors conducted refresher trainings in the classroom on the cases and did quality checks on the programmed SP exit questionnaire. Throughout fieldwork, the supervisors also conducted sessions where the case and experiments were reviewed again as a team to ensure there was no evolution of presentation in any given SP.
P37 Line 24. This study described three time points when the SP was assigned to the "demanding dawa ya minyaoo" group. Will the first one affect the providers' behavior? Say, change the prescription itself?
P37 Line 39 The same concern.
Thank you for this question. It is quite possible that demanding a medicine when the provider is writing a prescription or about to dispense drugs could have an underlying incentive-induced difference. Examining this, though very interesting, is outside the scope of this particular study. We assume that the different time points for demanding are balanced across each demanding arm.
(Unfortunately, we are not able to ascertain from the data how frequently each occurred.) A future study could examine these incentives more rigorously.
Because of the reviewer's comment, we added "only" to clarify the following sentence in Data Sources in the Methods section (p. 6, line 20).
SP requests were done at the end of the visit or earlier only if it was necessary to avoid an unusual interaction.
Also, we have chosen to include the following text on p. S10 (2nd full paragraph) of the Appendix: When we began piloting the demanding experiment before fieldwork, we did not have the first two time points ((i) when the provider writes a prescription or is about to dispense drugs, (ii) when the provider asks what the patient wants). We only had the third (at the end of the interaction). However, the pilot anecdotally demonstrated to us that some providers did (i) and (ii) in the same moment, and for the SPs, it was unusual and out of their character to not respond if they came in "wanting the medicine they demanded".
P8 Line 8. It is a good way to conduct the provider survey among those who saw the SPs to explore the know-do gap.
Thank you.
Outcomes
P8 Line 14. What is the difference between prescribe and dispense?
We use 'prescribe' to capture any situation where the provider may have written a prescription, including when the SP may not have actually received the medicine (e.g., if there was a stockout). Similarly, we use 'dispense' to capture any situation where the provider may have given the medicine, including when the provider may not have written a prescription. Together, 'dispense/prescribe' allows us to capture the intent of the provider on giving the medicine to the patient, regardless of whether the SP walked away with it.
Because of the reviewer's question, we have chosen to include the following text on p. 7 (lines 27-32) in the outcomes subsection of methods for clarity: We define 'prescribe/dispense' as a term to capture the intent of the provider on giving the medicine to the patient, regardless of whether the SP walked away with it: 'prescribe' can capture a situation where the provider may have written a prescription, including when the SP may not have actually received the medicine (e.g., a stockout) and 'dispense' captures a situation where the provider may have given the medicine, including when the provider may not have written a prescription.
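As an illustration of the definition above, a minimal sketch of how a combined 'prescribe/dispense' indicator could be coded per visit is given below. The `Visit` record, its field names, and the three example visits are hypothetical and are not taken from the study data or the authors' analysis code.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    # Hypothetical fields, for illustration only.
    prescribed: bool  # provider wrote a prescription for the medicine
    dispensed: bool   # provider handed the medicine to the SP


def prescribed_or_dispensed(visit: Visit) -> int:
    """Return 1 if the provider prescribed or dispensed the medicine, else 0."""
    return int(visit.prescribed or visit.dispensed)


# Made-up example visits: prescription only (e.g., a stockout),
# medicine given without a prescription, and neither.
visits = [Visit(True, False), Visit(False, True), Visit(False, False)]
outcomes = [prescribed_or_dispensed(v) for v in visits]
```

Coding the indicator as a logical "or" counts a stockout visit with a written prescription the same as a visit where the medicine was handed over, which matches the stated aim of capturing provider intent.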

Results
P9 Line 27. Why measure the number of people waiting in the waiting room when the SP arrived?
We measure the number of people waiting when the SP arrives to capture how busy the clinic was. Because we do not utilize administrative data, we do not have medical records (which may not exist in this setting and if they did exist, they may not be of the same quality) to summarize utilization numbers. By measuring the patients in the waiting room, we have a proxy for utilization.
Because of the reviewer's question, we have chosen to include the red text below on p. 8 (lines 38-39) for clarity: On average, there were approximately 1.55 (95% CI: 1.16-1.94) individuals waiting in the waiting room when the SP arrived (to capture how busy the clinic was in lieu of utilization data)…
Discussion
P10 Line 16-30. The authors described the result of anecdotal narratives during debriefs with the supervisors of the SP fieldwork. It might be easier to understand the method used in this study. If I were the author, I would reorganize this section. To be specific, move these to "results" as part of the qualitative research method.
We thank the reviewer for this comment. After some deliberation, we have decided to remove the anecdotal narratives from this paper, since this study did not have a qualitative component with methods rigorous enough to present. We keep in the discussion that providers may be trading off clinical benefits and risks with profits, but instead of supporting it with the narratives, we suggest that a future study could examine this in more depth. We felt that adding it to the results as part of the qualitative research method was outside the scope of the original research question for this manuscript.
P19. I failed to find Figure 1. Clinic sample and SP randomized study design.
We include the original Figure 1 from our submission for the Reviewer:
We note for the reviewer that we have updated Figure 1 based on the Reviewer's suggestion to follow CONSORT reporting guidelines. The updated figure is now:
P22. In Table 3. Summary statistics of SP visits, under "Visit characteristics", it seems that there are some indicators with missing data, e.g. "Minutes spent with provider". Just wondering, is this because the SP failed to collect this information, or is there another reason?
Thank you for raising this data issue. From your comment, we examined the 11 observations which were consistently missing data (because of a skip pattern in the questionnaire). We examined the comments entered by the supervisors and SPs for the visits that correspond to the 11 missing observations (which we will note is a small percentage of the total sample; 5.5% of total observations). All the missing observations had conversations with staff or providers at the clinic but did not receive any medicines. Minutes spent with the provider and other variables have been recoded (values highlighted) to match the comments.
This has now been corrected in Table 3 for the majority of the variables on pp. 22-23. For the other variables:
- Provider is female: we were not able to recover this variable for 4 observations
- Provider age group: the data was not entered for the 11 visits below
- Provider qualification: the data was not entered for the 11 visits below
- Provider knowledge of diarrhea correct management: of the 200 visits, we were only able to return and find providers corresponding to 140 of the visits.
- Provider did a good job explaining: the data was not entered for the 11 visits below
Reviewer: 1
Competing interests of Reviewer: None.

GENERAL COMMENTS
Thank you for your very professional work. I enjoyed reading the revised manuscript. The majority of the comments and concerns from my side were well addressed. Here are two further minor suggestions:
1. Figure 1 could be further improved. How about combining the content of excluded reasons with the successful visits to make it more clear?
2. I assume an arrow pointing from the data collection frame to analysis was missing.
This paper undoubtedly adds new knowledge of health services in the real healthcare setting in Kenya. Thanks to the author and her team for sharing their experience and findings using the USP method to assess health workers' behavior.