Examining the effectiveness of parental strategies to overcome bedwetting: an observational cohort study

Objective To examine whether a range of common strategies used by parents to overcome bedwetting in 7½-year-old children (including lifting, restricting drinks before bedtime, regular daytime toilet trips, rewards, showing displeasure and using protection pants) are effective in reducing the risk of bedwetting at 9½ years. Design Prospective cohort study. Setting General community. Participants The starting sample included 1258 children (66.7% boys and 33.2% girls) who were still bedwetting at 7½ years. Outcome measure Risk of bedwetting at 9½ years. Results Using propensity score-based methods, we found that two of the parental strategies used at 7½ years were associated with an increased risk of bedwetting at 9½ years, after adjusting the model for child and family variables and other parental strategies: lifting (risk difference=0.106 (95% CI 0.009 to 0.202), ie, there is a 10.6% (0.9% to 20.2%) increase in risk of bedwetting at 9½ years among children whose parents used lifting compared with children whose parents did not use this strategy) and restricting drinks before bedtime (0.123 (0.021 to 0.226)). The effect of using the other parental strategies was in either direction (an increase or decrease in the risk of bedwetting at 9½ years), for example, showing displeasure (−0.052 (−0.214 to 0.110)). When we re-analysed the data using multivariable regression analysis, the results were mostly consistent with the propensity score-based methods. Conclusion These findings provide evidence that common strategies used to overcome bedwetting in 7½-year-olds are not effective in reducing the risk of bedwetting at 9½ years. Parents should be encouraged to seek professional advice for their child’s bedwetting rather than persisting with strategies that may be ineffective.


Strengths and limitations of this study:
• Major strengths of this study are the availability of prospective data on bedwetting and a wide range of covariates associated with both bedwetting and the parental strategies in a large birth cohort.
• Use of propensity score-based methods with observational data makes it easier to assess whether observed confounding has been adequately eliminated.
• We did not separately examine whether the parental strategies are differentially effective for children with non-monosymptomatic and monosymptomatic nocturnal enuresis.
• We did not have information on the onset or duration of the strategies to prevent bedwetting and, therefore, we were only able to assess strategies that were reported by parents as being used currently.
• There were very small numbers of parents using medication or bedwetting alarms so we were unable to examine the effectiveness of these interventions in this study.

INTRODUCTION
Attainment of bladder control is a major milestone in child development that marks the end of a period of toilet training that is sometimes prolonged and stressful for parents and children. Nocturnal enuresis is the term used by the International Children's Continence Society (ICCS) to describe bedwetting in children aged 5 years or older after ruling out organic causes [1]. It is most common for children to become dry during the day before they remain dry at night, [2] with children usually attaining night-time bladder control between the ages of four and six years [3]. Although a significant proportion of children continue to wet the bed at school age, only a small proportion wet the bed at least twice a week, the frequency required for DSM-V diagnosis of nocturnal enuresis. For example, bedwetting was reported in 15.5% of 7½ year olds in the Avon Longitudinal Study of Parents and Children (ALSPAC), but only 2.6% wet the bed twice a week or more at this age [4]. With increasing age, bedwetting becomes more socially unacceptable and is often met with intolerance and frustration from parents [5]. Bedwetting also places considerable practical and financial burdens on the family in terms of the extra workload of washing bed linen and the cost of protective pants [6]. The psychosocial implications of bedwetting among school age children include worries about participating in sleepovers and school trips, fear of detection, teasing from peers, a sense of being different from others, emotional distress and low self-esteem [6]. Achieving bladder control is, therefore, important for a child's health and wellbeing.
It is a common belief among parents that bedwetting will eventually resolve with age and, as a result, many parents delay seeking treatment for bedwetting until it is having a considerable impact on the child and family [7]. There is evidence from randomised and quasi-randomised trials that treatment of bedwetting with an alarm or medication can be effective [8,9], but many parents are unaware that effective treatments for bedwetting are available [10,11]. Before seeking medical advice, parents often employ a range of simple strategies aimed at preventing bedwetting, the most common being restricting drinks before bed, lifting (removing the sleeping child from bed to empty the bladder in the toilet or potty), rewarding for being dry, regular daytime toilet trips, using protection pants and showing displeasure [10]. Caldwell et al. [12] conducted a systematic review of randomised (or quasi-randomised) trials of simple behavioural interventions that are often used by parents as first line interventions for bedwetting. The studies compared the interventions with an appropriate comparison group, comprising no active treatment, other behavioural interventions, or drugs either alone or in combination with other interventions. They found evidence that simple behavioural interventions (e.g. rewards, lifting) were more effective in promoting dryness than no intervention, but were inferior to treatments such as alarms and medication. The authors of the review noted, however, that most trials were small and of poor methodological quality and that caution should be exercised in interpreting the findings.
Although the best method for drawing causal inferences about interventions is a randomized controlled trial (RCT), this type of study design is not always feasible or ethical. When an RCT is not possible, observational studies are the only available type of evidence.
Causality is difficult to assess in such studies because of the possibility of confounding, where one or more variables influence both the exposure and the outcome, so that there appears to be a causal link between them. One method to assess the effectiveness of an intervention whilst adjusting for known and measured confounders is to use methods based on propensity scores. The aim of this study is to apply propensity score-based methods to observational data from a birth cohort to examine the effectiveness of a range of parental strategies aimed at preventing childhood bedwetting and to compare this with results using regression methods.

Participants
The sample comprised participants from the Avon Longitudinal Study of Parents and Children. Detailed information about ALSPAC is available on the study website (http://www.bristol.ac.uk/alspac), which includes a fully searchable dictionary of available data (http://www.bris.ac.uk/alspac/researchers/data-access/data-dictionary). Pregnant women resident in the former Avon Health Authority in south-west England, having an estimated date of delivery between 1/4/91 and 31/12/92 were invited to take part, resulting in a cohort of 14,541 pregnancies and 13,973 singletons/twins (7,217 boys and 6,756 girls) alive at 12 months [13]. Ethical approval for the study was obtained from the ALSPAC Law and Ethics committee and local research ethics committees. Written informed consent was obtained after the procedure(s) had been fully explained.
Bedwetting at 7½ and 9½ years: The starting sample for this study comprised children who were still bedwetting at 7½ years. Parents were asked "How often does your child wet him/herself during the night?" and were given the response options 'Never'; 'Occasional accident but less than once a week'; 'About once a week'; '2-5 times a week'; 'Nearly every day'; and 'More than once a day'. A total of 8,151 parents responded to this questionnaire and 1,258 children (15.4%) were still wetting the bed at 7½ years, comprising 840 (66.7%) boys and 418 (33.2%) girls. Of the 1,258 children, 215 (69.8% boys and 30.2% girls) wet the bed 'at least twice a week'. At age 9½ years the questions on frequency of bedwetting were repeated. Bedwetting data were provided for 8,101 children and 788 (9.7%) wet the bed at this age (120 of these children wet the bed at least twice a week). The proportion of children with bedwetting at 7½ years whose parents provided data at age 9½ was 83% (1,045 out of 1,258) and 47% (n=493) were still wetting the bed at this age.

Parental strategies to prevent bedwetting at 7½ years
A questionnaire was administered to parents when the study children were aged 7½ that contained a list of strategies aimed at preventing bedwetting (see table 2) that were originally elicited from parents [10]. Parents were asked: "Which of the following methods have you tried in the past or are you using now to try and help your child stop wetting the bed?" We restricted our analysis to strategies that parents reported 'using now' because there was no information on the onset or duration of strategies used in the past. The questionnaire explained that these strategies are not necessarily effective in preventing bedwetting.

Rationale for using propensity score methods in this study
Baseline characteristics of children whose parents used particular strategies to prevent bedwetting (referred to as 'treated' hereafter) might be expected to differ systematically from those whose parents did not use the strategy ('non-treated'), giving rise to confounding. It is, therefore, necessary to account for these differences when estimating the effect of the parental strategy ('treatment') on bedwetting. This could be achieved through multivariable regression (adjusting for differences in baseline variables between treated and untreated participants). Propensity score matching (PSM) has been proposed as a more appropriate method to minimise confounding when estimating treatment effects using observational data [14]. A detailed description of our rationale for using propensity score-based methods in our study is provided in the online supplementary materials.

Statistical analysis
We used PSM-based methods to assess the effectiveness of the parental strategies to prevent bedwetting and compared this with estimates derived using logistic regression. Both of these analyses aimed to estimate the effect of each parental strategy (used by parents of children who wet the bed at 7½ years) on risk of bedwetting at 9½ years.
Propensity score-based model: We assessed the effectiveness of each strategy by examining the difference in risk of bedwetting at 9½ years between children receiving the parental strategy ('treated') versus those children not receiving the strategy ('non-treated').
The estimate we computed was the average treatment effect for the treated (ATT) [15] (see supplementary material for an explanation of our choice of ATT as the measure of treatment effect). In order to put the results of PSM analyses in context we report estimates obtained both from unadjusted models (without any confounders) and from models adjusted for the set of measured confounders. We implemented the inverse probability weighting (IPW) method using the propensity score because there is evidence that this method is less biased than other methods based on propensity scores [15,16]. In addition, reweighting based on propensity scores, as opposed to matching on propensity scores, does not exclude any case from the analysis. The baseline variables we selected into the propensity score model were those that were associated with both the treatments (parental strategies used at 7½ years) and the outcome (bedwetting at 9½ years).
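The weighting step can be sketched as follows. This is a minimal illustration, not the authors' STATA routine: it assumes the propensity scores have already been estimated, uses normalised weights (treated cases weighted 1, non-treated cases weighted p/(1-p)), and runs on hypothetical toy data. The function name `att_ipw` is introduced here for illustration only.

```python
def att_ipw(outcome, treated, propensity):
    """Risk difference for the treated using ATT weights.

    Treated cases get weight 1; non-treated cases get weight p / (1 - p),
    which reweights them to resemble the treated group on the confounders
    that went into the propensity score.
    """
    sum_t = n_t = sum_c = weight_c = 0.0
    for y, g, p in zip(outcome, treated, propensity):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if g == 1:
            sum_t += y
            n_t += 1.0
        else:
            w = p / (1.0 - p)
            sum_c += w * y
            weight_c += w
    # Observed risk in the treated minus weighted risk in the non-treated.
    return sum_t / n_t - sum_c / weight_c


# Hypothetical toy data (not ALSPAC values).
y = [1, 0, 1, 1, 0, 0, 1, 0]                    # bedwetting at follow-up
g = [1, 1, 1, 1, 0, 0, 0, 0]                    # strategy used (1) or not (0)
p = [0.8, 0.6, 0.5, 0.5, 0.5, 0.2, 0.5, 0.2]    # assumed propensity scores
att = att_ipw(y, g, p)
```

Because no case receives a weight of zero, every child contributes to the estimate, which is the property contrasted with matching in the paragraph above.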
We used a two-stage procedure to derive the list of baseline variables to include in the model. Firstly, we examined two logistic regression models: in model 1 the outcome variable was predicted from all theoretically relevant variables relating to children and their environment (see table S1 in the supplementary materials for the list of all variables included). In model 2, the parental strategy was the outcome variable. In the second stage, we applied a threshold for selection of variables to the propensity score model. The variables we included were those that had associations (expressed in terms of odds ratios) with the outcome variables in both models of <0.90 or >1.20, thus allowing us to detect at least 'weak' associations [17]. We estimated models with and without adjustment for the confounders. A detailed list of the confounding variables is provided in table S1. Parents frequently used multiple strategies concurrently (indicated by strong correlations between subsets of parental strategies; see table 2). For this reason, at the second stage of analysis, we repeated the IPW models for every strategy, adjusting for confounders and for strategies that were correlated with the given strategy.
We also assessed the effectiveness of the parental strategies to prevent bedwetting by using multivariable logistic regression to adjust for confounders. We applied logistic regression models to the same set of baseline variables used in the propensity score model and reported the results in the form of adjusted risk differences (rather than odds ratios) for ease of comparison with the results from the propensity score-based model. We estimated risk differences through the STATA logit command followed by the adjrr command [18]. The latter command returned estimates equivalent to average treatment effects (see supplementary materials). In order to obtain robust estimates of the coefficients and their standard errors we used loops developed in STATA in exactly the same way as for the propensity score model.
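One common way to turn a fitted logistic model into an adjusted risk difference is to average the predicted risks with the exposure set to 1 for everyone and then to 0 for everyone. The sketch below illustrates that idea only; it is not claimed to reproduce the internals of adjrr, and the coefficients, variable names and data are hypothetical.

```python
import math


def predicted_risk(intercept, coefs, row):
    """Predicted probability from a logistic model for one participant."""
    linear = intercept + sum(coefs[name] * value for name, value in row.items())
    return 1.0 / (1.0 + math.exp(-linear))


def adjusted_risk_difference(intercept, coefs, rows, exposure):
    """Average predicted risk if all were exposed minus if none were exposed."""
    risk_exposed = 0.0
    risk_unexposed = 0.0
    for row in rows:
        # Keep each participant's covariates, change only the exposure.
        risk_exposed += predicted_risk(intercept, coefs, {**row, exposure: 1})
        risk_unexposed += predicted_risk(intercept, coefs, {**row, exposure: 0})
    return (risk_exposed - risk_unexposed) / len(rows)


# Hypothetical fitted coefficients (log odds ratios) and two participants.
coefs = {"lifting": 0.5, "daytime_wetting": 1.0}
rows = [
    {"lifting": 1, "daytime_wetting": 0},
    {"lifting": 0, "daytime_wetting": 1},
]
rd = adjusted_risk_difference(-0.5, coefs, rows, "lifting")
```

Expressing the result on the risk-difference scale is what allows a side-by-side comparison with the propensity score-based estimates.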

Missing data
Many of the potential confounders we considered had some missing data, thus reducing the sample size available for analyses. The number of missing cases on confounding variables ranged from 0 (gender) to 659 (mother's social class based on occupation) (see table S1). There were no missing data on exposure (treatment) variables.
There were 213 missing cases on the outcome variable (bedwetting at 9½ years).
The exact numbers describing the amount of missing data per variable are presented in the supplementary materials (table S1). To deal with the missing data, we used multiple imputation by chained equations [19].

RESULTS
Figure 1 shows the prevalence of each strategy used by parents of 7½ year old children and the total number of strategies employed by parents. The most common strategies for preventing bedwetting were restricting drinks, rewarding and lifting. Only a very small number of parents used medication or bedwetting alarms, so we were unable to examine the effectiveness of these interventions. Over 50% of parents did not use any of the strategies to prevent bedwetting, whilst over 26% of parents used only one strategy. Table 2 shows the associations between the strategies. The strongest correlations were found between lifting and restricting drinks and between regular daytime toilet trips and rewards. Table 3 shows the average treatment effects on treated for each parental strategy used at 7½ years on risk of bedwetting at 9½ years derived from the propensity score based analysis. The table also shows the average treatment effects derived from the logistic regression model. The coefficients in table 3 are estimated differences between the risk of bedwetting at 9½ years after receiving a given 'treatment' (parental strategy) and the risk of bedwetting if the children had remained 'untreated'.

Table 3. Estimated differences in the risk of bedwetting at 9½ years in children whose parents used each strategy compared with those who did not use the strategy.
Column headings: Average treatment effect on treated (95% CI) based on inverse probability weighting using propensity score; Average treatment effect / adjusted risk difference (95% CI) based on logistic regression analysis; Empty (unadjusted) model a.
a Empty model column for propensity score based methods analysis shows unadjusted differences in risk of bedwetting between 'treated' and 'untreated' children. These coefficients are equivalent to bivariate regressions including bedwetting at 9½ years as an outcome variable and each strategy as a single predictor.
b Model adjusted for child and family variables represents average treatment effects on treated children (ATETs). These coefficients are estimated differences in risks of bedwetting in weighted samples. These are differences in risks adjusted for child and family variables that accounted for the differences between treated and untreated groups and between those with and without bedwetting at age 9½. The list of child and family variables was derived separately for every strategy on the basis of odds ratios (ORs) from regression analyses providing evidence of an association (ORs <0.90 or >1.20) both with the given strategy and with bedwetting at age 9½. A detailed list of child and family variables included in the model for each parental strategy is provided in the online supplementary material (see tables S2-S7).
c Model adjusted for child and family variables and other parental strategies. This is the column with coefficients adjusted for the child and family variables and for other parental strategies that were highly correlated (coefficient of tetrachoric correlation >0.45) with the given strategy.
d Empty model: results of univariable logistic regression analyses including bedwetting at 9½ years as an outcome variable and the given strategy as a single predictor. To ensure direct comparisons with the output of the analysis using propensity score based methods, the results of regression analyses are expressed here in terms of risk differences instead of ORs.
e Model adjusted for child and family variables. This includes estimated differences in risk of bedwetting for the given strategy obtained from logistic regression, also including child and family variables.

The unadjusted propensity score based analysis shows that the parental strategies used at 7½ years are associated with an increase in the risk of bedwetting at 9½ years. Showing displeasure was the only strategy that was associated with a small decrease in the risk of bedwetting in the unadjusted model. After applying the propensity score model adjusting for the child and family variables and then further adjusting for the other strategies, the adjusted treatment effects provided evidence that lifting and restricting drinks are associated with an increase in the risk of bedwetting at 9½ years. The adjusted results for the other strategies indicated that the effect could be in either direction (either an increase or decrease in the risk of bedwetting).

Logistic regression results
The results obtained from the logistic regression analysis were mostly consistent with the analysis using the propensity score-based methods, i.e. both lifting and restricting drinks were associated with an increase in the risk of bedwetting at 9½ years in the fully adjusted model. In the logistic regression analysis, using rewards was also associated with an increased risk of bedwetting at 9½ years.

DISCUSSION
We examined a range of common strategies used by parents to prevent bedwetting and found that when these strategies were used with 7½ year-old children who wet the bed, they were not effective in reducing the risk of bedwetting at 9½ years. Parental strategies including lifting and restricting drinks before bedtime were associated with an increased risk of subsequent bedwetting. These were among the most common parental strategies used to prevent bedwetting in our study and there is evidence in a review that these strategies are frequently used by parents across different countries [10].
This is the first study, to our knowledge, to apply propensity score-based methods to examine the effectiveness of parental strategies to prevent bedwetting using observational data from a large birth cohort. Austin [14] discusses several reasons for preferring the use of propensity score based methods to regression models when estimating treatment effects using observational data. In particular, it is easier to assess whether observed confounding has been adequately eliminated using propensity score based methods. It is important to note some caveats associated with PSM-based methods and these are discussed in the supplementary materials. Another major strength of this study is the availability of a wide range of confounders in the ALSPAC dataset that are associated with bedwetting and with the parental strategies to prevent bedwetting. We did not separately examine whether the parental strategies are differentially effective for children with non-monosymptomatic and monosymptomatic nocturnal enuresis in the propensity score-based models. We did, however, include a range of covariates in the analyses including frequency of bedwetting (high frequency = twice or more/week), daytime wetting, urgency and voiding postponement. All of these variables were important confounders in most of the models. We did not have information on the onset or duration of the strategies to prevent bedwetting and, therefore, we were only able to assess strategies that were reported by parents as being used currently. Because this was a community-based, rather than a clinical, sample, there were very small numbers of parents using medication or bedwetting alarms. We were, therefore, unable to examine the effectiveness of these interventions in this study. Additionally, there was no information on which strategies parents initiated themselves compared with those recommended by health care workers or other sources (e.g. the child's grandparents).

Supplementary online material
Rationale for applying propensity score methods
The main purpose of propensity score-based methods is to draw causal inferences about effects of treatments from observational data. In order to do this certain assumptions are required. In observational data the process of selection to the 'treated' and 'non-treated' groups is not random. Direct comparisons of the outcomes in both groups are likely to be biased if 'treated' and 'non-treated' groups differ systematically on certain variables (both measured and unmeasured) that may also be associated with the outcome. It is, therefore, not possible to establish to what extent the observed results are effects of the 'treatment' versus other important variables that remained uncontrolled in the process of selection of cases into the groups. Dealing with this problem of confounding is a central problem of studies based on observational data. As Rosenbaum (2005)¹ states, the key issues in such studies are to remove 'overt biases' (related to measured/observed confounders) and to address, in some way, uncertainty about hidden biases (related to unmeasured confounders).
Multivariable regression and, more recently, propensity score-based methods can be used to minimise effects of confounders when estimating treatment effects using observational data. These methods adjust the raw results to take into account the non-random mechanism of selection of cases to the 'treated' versus 'non-treated' groups by controlling for differences between both groups. Propensity score-based methods are based on the assumption that differences between 'treated' and 'non-treated' groups can be explained in terms of observable variables and bias should, therefore, be minimised by ensuring similarity of both groups on these variables. This is achieved using propensity score-based methods either by matching each case from the 'treated' group to at least one similar case from the 'non-treated' group or by reweighting both groups in order to construct a synthetic sample in which both groups are similar with respect to the observed confounders. Similarity of the groups is assessed in terms of the probability of receiving the 'treatment' as estimated from the observed variables of the cases. This probability, called the 'propensity score', is then used to create or reshape the sample so that the distributions of variables of study participants are similar in both groups. If this condition is fulfilled then treated and untreated groups are balanced on measured confounders. However, in order to compare their results directly it is also important to assume that all of the important confounders were measured and included in the analysis. It is only under these conditions that estimates of causal effects are assumed to be unbiased.
Application of propensity score-based methods in the current study
We used propensity score-based methods to assess the effectiveness of parental strategies aimed at preventing bedwetting. Within our study, 'treated' refers to those children receiving each parental strategy at 7½ years and 'non-treated' refers to those not receiving the strategy. The propensity score-based analysis involves three general steps: (1) constructing the propensity scores; (2) balancing the sample based on their propensity scores; and (3) estimating effects of the treatment (parental strategy). At each of these steps in the analysis, decisions need to be made that influence the results and final conclusions. Below, we present the key decisions we made in conducting the propensity score-based analysis:

a) Average treatment effect versus average treatment effect on treated:
One of the most crucial decisions in the analysis concerns the indicator of the impact of the treatment. The options are the average treatment effect (ATE) and the average treatment effect on treated (ATT). They are both rooted within the counterfactual framework, which assumes that each participant included in the study has two potential outcomes, i.e. the outcome when treatment was applied and the outcome when treatment was not applied. In reality, only one of these outcomes is truly observed, and the other is unobserved (counterfactual). The ATE is defined as the difference between averages of these two potential outcomes at the population level. The ATT is the difference between potential outcomes calculated for the treated group. The aim of the analysis was to estimate the effect of each parental strategy aimed at preventing bedwetting at 7½ years as a difference in risk of bedwetting at 9½ years between those children receiving the strategy (treated) versus those children not receiving the strategy (non-treated). We assessed the effectiveness of each parental strategy by means of the average treatment effect for the treated (ATT) because we were interested in the average effect of treatment on those children who ultimately received the treatment rather than in the estimation of the average treatment effect at the population level (ATE).
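In the usual potential-outcomes notation, the two estimands described above can be written compactly. Here $Y_1$ and $Y_0$ denote the potential outcomes with and without the strategy and $G$ the treatment group indicator ($G=1$ for treated); the expectation operator $E[\cdot]$ is standard notation added for this illustration.

```latex
\mathrm{ATE} = E[\,Y_1 - Y_0\,], \qquad
\mathrm{ATT} = E[\,Y_1 - Y_0 \mid G = 1\,]
```

With a binary outcome such as bedwetting at 9½ years, both quantities are risk differences; they coincide only if the average effect of the strategy is the same in treated and non-treated children.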

b) Selection of propensity score based method:
The four most frequently reported methods for using propensity scores in the estimation of the effect of treatment are propensity score matching, stratification on the propensity score, inverse probability weighting and covariate adjustment using propensity scores. Austin (2011)² provides a detailed discussion of the strengths and limitations of these methods. In brief, the method based on inverse probability weighting is favoured, whilst the other methods were found to be more prone to bias in Monte Carlo studies. We used inverse probability of treatment weighting (IPTW), in which a synthetic balanced sample is constructed (participant variables are independent of the assignment to treated/non-treated groups). We estimated the propensity scores by logistic regression.
c) Selection of baseline variables to the propensity score model:
There are at least four possible strategies for variable selection and there is a lack of consensus in the literature on which strategy is the most appropriate.² The available strategies are: (1) the propensity model should include all measured, theoretically relevant variables; (2) the model should include variables that are associated with the treatment; (3) the model should include variables that are associated with the outcome; (4) the model should include only those variables that are associated both with treatment and outcome. The final strategy seems to be the favoured approach as regards theoretical arguments and results of simulation studies.² Therefore, we chose this model, including only those variables that are associated both with treatment (parental strategy) and outcome (bedwetting at 9½ years). We used a two-stage procedure to derive the list of baseline variables to include in the model. Firstly, we conducted two logistic regression analyses in the first completed dataset (m=1). In the first regression model, the outcome variable was predicted from all theoretically relevant variables relating to children and their family (see table S1 for the list of all variables included). In the second regression model, each parental strategy was the outcome variable. Secondly, we applied a threshold for selection of variables to the propensity score model: we included only the variables that had associations (expressed in terms of odds ratios) with the outcome variables in both models of <0.90 or >1.20.³ We enriched the propensity score models by including the parental strategies that were highly correlated with the given strategy.
Propensity score based diagnostic methods are designed to examine whether the distributions of the baseline variables are the same or similar enough in treated and non-treated groups. There are several specific diagnostic procedures that are recommended. Among them the most frequently used are those examining the standardized differences between means of baseline variables after matching/weighting and those comparing their variances. Before analysing the treatment effects, we assessed the adequacy of the propensity score model by checking the overlap in propensity scores. We based our diagnostic checks on the examination of differences between means of baseline variables in the weighted sample. Categorical variables with more than two categories were examined using sets of binary variables. Following the literature, we assumed that an absolute standardized mean difference <0.1 indicates negligible differences between groups. Since the model computations were conducted on 50 datasets obtained from the multiple imputation procedure (see section below for details), we performed the diagnostic checks in 5 randomly selected datasets (m=1,4,5,37,50). See tables S2 to S7 for the results of the diagnostic checks to assess the adequacy of the propensity score models.
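The balance check on a single baseline variable can be sketched as follows. This is an illustrative calculation of a weighted standardized mean difference, assuming a pooled standard deviation built from the weighted group variances; tebalance summarize may standardize differently, and the data and function names are hypothetical.

```python
import math


def weighted_mean_var(values, weights):
    """Weighted mean and (population-style) weighted variance."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var


def standardized_mean_difference(x, treated, weights):
    """Difference in weighted means divided by the pooled standard deviation."""
    xt = [v for v, g in zip(x, treated) if g == 1]
    wt = [w for w, g in zip(weights, treated) if g == 1]
    xc = [v for v, g in zip(x, treated) if g == 0]
    wc = [w for w, g in zip(weights, treated) if g == 0]
    mean_t, var_t = weighted_mean_var(xt, wt)
    mean_c, var_c = weighted_mean_var(xc, wc)
    pooled_sd = math.sqrt((var_t + var_c) / 2.0)
    if pooled_sd == 0.0:
        return 0.0  # the variable is constant in both groups
    return (mean_t - mean_c) / pooled_sd


# Hypothetical binary covariate (e.g. daytime wetting) for 4 treated, 4 non-treated.
x = [1, 1, 0, 0, 1, 0, 0, 0]
g = [1, 1, 1, 1, 0, 0, 0, 0]
raw = standardized_mean_difference(x, g, [1.0] * 8)
weighted = standardized_mean_difference(x, g, [1, 1, 1, 1, 3, 1, 1, 1])
```

In this toy example the unweighted difference exceeds the 0.1 threshold, while the weighted difference does not, which is the pattern a well-specified propensity score model is expected to produce.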

e) Dealing with missing data and estimating the treatment effects:
The magnitude of missing data per variable is presented in table S1. To deal with the missing data, we used multiple imputation by chained equations within ICE (Imputation by Chained Equations) STATA package v.14. We generated 50 imputed datasets using 10 cycles of regression switching. We used a range of auxiliary variables that do not themselves form part of the analytical model. The criteria for selecting variables into the imputation model for each variable were: (i) theoretical and empirical evidence from prior published research; (ii) data-driven selection: a variable was selected if, in the current dataset, it was strongly associated with observed values of the imputed variable or with missingness on that variable; (iii) additionally, the imputation models for every variable included the parental strategies and bedwetting at 9½ years.
Conducting the propensity score analysis on multiply imputed datasets required a decision about the exact routine by which the treatment effects are calculated. There are two possibilities: (1) first estimate the propensity score in each of the 50 completed datasets, average the propensity score for every subject across all datasets and then estimate treatment effects based on those aggregated propensity scores; (2) estimate propensity scores and treatment effects separately in every completed dataset and then aggregate the treatment effects obtained according to Rubin's rules. Published simulation studies suggest that the first strategy provides biased results 4; therefore, we used the second strategy. To the best of our knowledge, there is no STATA package that deals with treatment effects in the context of multiple imputation, so we developed our own syntax using loops to estimate treatment effects and standard errors separately within each imputed dataset and exported the results to Excel. At the next stage, we aggregated the exported results by applying Rubin's rules for estimating the effects and standard errors from the multiply imputed datasets 5. For the estimation we used the STATA built-in command teffects ipw and for the diagnostic procedures we used tebalance summarize. Models for medication and bedwetting alarm could not be estimated because very few children received these interventions.
We checked the plausibility of the imputation process using the midiagplots STATA command after the datasets from the ICE package had been exported. Because of the large number of generated datasets, the diagnostics were based only on a subset of five randomly selected datasets (m=1,4,5,37,50) obtained from the MI procedure. The diagnostics were conducted under the assumption that the missing data mechanism was missing at random (MAR). We compared the distributions of observed, imputed and completed values of each variable. The diagnostics revealed that they did not differ greatly, which suggested that the imputation model was correctly specified. (The figures and tables including those diagnostic statistics are available on request.)
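The aggregation step described above can be illustrated with a minimal sketch of Rubin's rules. This is not the authors' STATA syntax; the function name `pool_rubin` and the example numbers are hypothetical, and the sketch stops at the pooled estimate and standard error (it does not compute the degrees of freedom needed for a t-based confidence interval).

```python
import math

def pool_rubin(estimates, std_errors):
    """Pool per-dataset estimates and standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")
    # Pooled point estimate: simple average across imputed datasets
    pooled = sum(estimates) / m
    # Within-imputation variance: average of the squared standard errors
    within = sum(se ** 2 for se in std_errors) / m
    # Between-imputation variance: sample variance of the estimates
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    # Total variance adds a finite-m correction to the between component
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# Hypothetical risk differences and standard errors from three imputed datasets
rd = [0.10, 0.12, 0.11]
se = [0.05, 0.05, 0.05]
pooled_rd, pooled_se = pool_rubin(rd, se)
print(round(pooled_rd, 3), round(pooled_se, 4))
```

With 50 imputed datasets the same function would simply receive 50 estimates and 50 standard errors per strategy.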
Caveats of using propensity score based methods
Propensity score based methods are a group of methods used to estimate treatment effects from observational data by balancing treatment and non-treatment groups on the set of measured confounders (X). It has been shown 6 that this can be achieved by using propensity scores instead of all combinations of variables from X.
All propensity score methods rely on several assumptions, of which the most crucial is the conditional independence assumption (CIA). It consists of two parts: (a) the potential result of being treated ($Y_1$) and the potential result of remaining untreated ($Y_0$) are independent of the treatment group (G), after controlling for a vector of measured confounders (X): $(Y_0, Y_1) \perp G \mid X$; and (b) every case (with any characteristic on vector X) has a positive chance of being included in either treatment group, that is, 0<P(G=1|X)<1. Part (a) of the CIA assumes that there are no unmeasured confounders. If vector X does not include all important confounders (those related both to the outcome and the treatment), (a) is not satisfied and causal inferences might be biased. Unfortunately, this is the untestable part of the CIA and needs to be assumed. Part (b) of the CIA implies that there must be an overlap between treated and non-treated groups with regard to their characteristics on vector X, which means that any person with any characteristic on X within the treatment group will have their counterpart in the non-treatment group 1. In contrast to (a), part (b) of the CIA concerning the overlap of treated and non-treated groups is testable and can be checked by comparing the distributions of propensity scores for both groups. We did this by graphical examination of the propensity scores estimated for both treatment groups (see figures S2-S7). This assumption was fulfilled to a satisfactory extent, since the differences we observed in the distributions of propensity scores in treated and non-treated groups were minor.
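The overlap check in this study was graphical. As a crude numerical companion, and purely as an illustration, one could compute the share of treated cases whose propensity score falls outside the range observed among the non-treated. The function name and the scores below are hypothetical.

```python
def share_treated_outside_support(ps_treated, ps_untreated):
    """Fraction of treated cases with no untreated case in their score range."""
    lo, hi = min(ps_untreated), max(ps_untreated)
    outside = [p for p in ps_treated if p < lo or p > hi]
    return len(outside) / len(ps_treated)

# Hypothetical propensity scores for four treated and four untreated children
treated_scores = [0.35, 0.50, 0.62, 0.90]
untreated_scores = [0.10, 0.30, 0.55, 0.70]
share = share_treated_outside_support(treated_scores, untreated_scores)
print(share)  # one of four treated scores (0.90) exceeds the untreated maximum
```

A range comparison of this kind is sensitive to single extreme scores, which is one reason plots of the full distributions are usually more informative.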
It should also be added that, since in the current paper we relied on the ATT, both parts of the CIA are adopted in their weaker, less stringent versions. The weaker version of the CIA requires for (a) only $Y_0 \perp G \mid X$ to hold (which remains untestable) and for (b) that cases from the treatment group have their counterparts in the untreated group (overlap in the opposite direction is no longer necessary). As shown in figures S2-S7, this milder version of the assumption generally held in the data. After applying propensity score methods, assessing the balance achieved between the compared groups is the final diagnostic step. The assessment of balance obtained after applying inverse propensity score weighting returned satisfactory results.
[...] when the study children were 24 months. The TTS comprises statements describing specific behaviours and mothers were asked to rate how often their child behaves in that way on a scale ranging from 1 (almost never) to 6 (almost always). The scale comprises nine temperament traits, but we restricted our analysis to five traits we found were associated with bedwetting in an earlier study ([...]).
Assessing the adequacy of the propensity score models
When we re-analysed the data using multivariable regression analysis, the results were mostly consistent with the propensity score-based methods.

Conclusion:
These findings provide evidence that common strategies used to overcome bedwetting in 7½-year-olds are not effective in reducing the risk of bedwetting at 9½ years. Parents should be encouraged to seek professional advice for their child's bedwetting rather than persisting with strategies that may be ineffective.
Keywords: Bedwetting, prevention, child, cohort study, ALSPAC.

Strengths and limitations of this study:
• Major strengths of this study are the availability of prospective data on bedwetting and a wide range of covariates associated with both bedwetting and the parental strategies in a large birth cohort.
• Use of propensity score-based methods with observational data makes it easier to assess whether observed confounding has been adequately eliminated.
• We did not separately examine whether the parental strategies are differentially effective for children with non-monosymptomatic and monosymptomatic enuresis.
• We did not have information on the onset or duration of the strategies to overcome bedwetting and, therefore, we were only able to assess strategies that were reported by parents as being used currently.
• There were very small numbers of parents using medication or bedwetting alarms so we were unable to examine the effectiveness of these interventions in this study.

[...] (ICCS) to describe bedwetting in children aged 5 years or older after ruling out organic causes [1]. It is most common for children to become dry during the day before they remain dry at night [2], with children usually attaining night-time bladder control between the ages of four and six years [3]. Although a significant proportion of children continue to wet the bed at school age, only a small proportion wet the bed at least twice a week, the frequency required for DSM-V diagnosis of enuresis. For example, bedwetting was reported in 15.5% of 7½ year olds in the Avon Longitudinal Study of Parents and Children (ALSPAC), but only 2.6% wet the bed twice a week or more at this age [4]. With increasing age, bedwetting becomes more socially unacceptable and is often met with intolerance and frustration from parents [5]. Bedwetting also places considerable practical and financial burdens on the family in terms of the extra workload of washing bed linen and the cost of protective pants [6]. The psychosocial implications of bedwetting among school age children include worries about participating in sleepovers and school trips, fear of detection, teasing from peers, a sense of being different from others, emotional distress and low self-esteem [6]. Achieving bladder control is, therefore, important for a child's health and wellbeing.
It is a common belief among parents that bedwetting will eventually resolve with age and, as a result, many parents delay seeking treatment for bedwetting until it is having a considerable impact on the child and family [7]. There is evidence from randomised and quasi-randomised trials that treatment of bedwetting with an alarm or medication can be effective [8,9], but many parents are unaware that effective treatments for bedwetting are available [10,11]. Before seeking medical advice, parents often employ a range of simple [...]
Although the best method for drawing causal inferences about interventions is a randomised controlled trial (RCT), this type of study design is not always feasible or ethical. When an RCT is not possible, observational studies are the only available type of evidence. Causality is difficult to assess in such studies because of the possibility of confounding, where one or more variables influence both the exposure and the outcome, so that there appears to be a causal link between them. One way to assess the effectiveness of an intervention whilst adjusting for known and measured confounders is to use methods based on propensity scores. The aim of this study is to apply propensity score-based methods to observational data from a birth cohort to examine the effectiveness of a range of parental strategies aimed at overcoming childhood bedwetting and to compare this with results using regression methods.

Parental strategies to overcome bedwetting at 7½ years
A questionnaire was administered to parents when the study children were aged 7½ that contained a list of strategies aimed at overcoming bedwetting that were originally elicited from parents [10]. Parents were asked: "Which of the following methods have you tried in the past or are you using now to try and help your child stop wetting the bed?" We restricted our analysis to strategies that parents reported 'using now' because there was no information on the onset or duration of strategies used in the past. The questionnaire explained that these strategies are not necessarily effective in overcoming bedwetting.

Rationale for using propensity score methods in this study
Baseline characteristics of children whose parents used particular strategies to overcome bedwetting (referred to as 'treated' hereafter) might be expected to differ systematically from those whose parents did not use the strategy ('non-treated'), giving rise to confounding. It is, therefore, necessary to account for these differences when estimating the effect of the parental strategy ('treatment') on bedwetting. This could be achieved through multivariable regression (adjusting for differences in baseline variables between treated and non-treated participants). Propensity score matching (PSM) has been proposed as a more appropriate method to minimise confounding when estimating treatment effects using observational data [14]. A detailed description of our rationale for using propensity score-based methods in our study is provided in the online supplementary materials.

Statistical analysis
We used PSM-based methods to assess the effectiveness of the parental strategies to overcome bedwetting and compared the results with estimates derived using logistic regression. Both analyses aimed to estimate the effect of each parental strategy (used by parents of children who wet the bed at 7½ years) on the risk of bedwetting at 9½ years.
Propensity score-based model: We assessed the effectiveness of each strategy by examining the difference in risk of bedwetting at 9½ years between children receiving the parental strategy ('treated') versus those children not receiving the strategy ('non-treated').
The estimate we computed was the average treatment effect for the treated (ATT) [15] (see supplementary material for an explanation of our choice of ATT as the measure of treatment effect). In order to put the results of PSM analyses in context we report estimates obtained both from unadjusted models (without any confounders) and from models adjusted for the set of measured confounders.
We implemented the inverse probability weighting (IPW) method using the propensity score because there is evidence that this method is less biased than other methods based on propensity scores [15,16]. In addition, reweighting on propensity scores, as opposed to matching on propensity scores, excludes no case from the analysis. The baseline variables we selected into the propensity score model were those that were associated with both the treatments (parental strategies used at 7½ years) and the outcome (bedwetting at 9½ years).
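To make the weighting idea concrete, the following is a minimal sketch of an ATT risk difference computed with inverse probability weights from already-estimated propensity scores. It assumes normalised weights (treated cases weighted 1, non-treated cases weighted p/(1−p)), which is a common formulation but not necessarily the exact computation performed by the STATA command used in the study. The function name and data are hypothetical.

```python
def att_ipw_risk_difference(outcome, treated, pscore):
    """ATT risk difference using inverse probability weights.

    Treated cases keep weight 1. Non-treated cases get weight p / (1 - p),
    which reweights them to resemble the treated group on the measured
    confounders summarised by the propensity score p.
    """
    n_treated = sum(treated)
    risk_treated = sum(y for y, g in zip(outcome, treated) if g == 1) / n_treated
    weights = [p / (1 - p) for g, p in zip(treated, pscore) if g == 0]
    y_untreated = [y for y, g in zip(outcome, treated) if g == 0]
    # Weighted mean of the outcome among the non-treated
    risk_untreated = sum(w * y for w, y in zip(weights, y_untreated)) / sum(weights)
    return risk_treated - risk_untreated

# Hypothetical data: 1 = bedwetting at follow-up, 1 = strategy used
outcome = [1, 0, 1, 1, 0, 0]
treated = [1, 1, 1, 0, 0, 0]
pscore = [0.6, 0.5, 0.7, 0.5, 0.2, 0.5]
rd_att = att_ipw_risk_difference(outcome, treated, pscore)
print(round(rd_att, 3))
```

Because every non-treated case contributes with some weight, no case is dropped, which is the property noted above as an advantage over matching.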
We used a two-stage procedure to derive the list of baseline variables to include in the model. Firstly, we examined two logistic regression models: in model 1 the outcome variable was predicted from all theoretically relevant variables relating to children and their environment (see table S1 in the supplementary materials for the list of all variables included). In model 2, the parental strategy was the outcome variable. In the second stage, we applied a threshold for selection of variables to the propensity score model. The variables we included were those that had associations (expressed in terms of odds ratios) with the outcome variables in both models of <0.90 or >1.20, thus allowing us to detect at least 'weak' [...]
Logistic regression model: We also examined the effectiveness of the parental strategies to overcome bedwetting by using multivariable logistic regression to adjust for confounders. We applied logistic regression models to the same set of baseline variables used in the propensity score model and reported the results in the form of adjusted risk differences (rather than odds ratios) for ease of comparison with the results from the propensity score-based model. We estimated risk differences through the STATA logit command followed by the adjrr command [18]. The latter command returned estimates equivalent to average treatment effects (see supplementary materials). In order to obtain robust estimates of the coefficients and their standard errors, we used loops developed in STATA in exactly the same way as for the propensity score model.
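The odds-ratio threshold used in the second stage of variable selection can be sketched as a simple filter. The variable names and odds ratios below are invented for illustration only; the function name is hypothetical.

```python
def select_confounders(or_outcome, or_treatment, lower=0.90, upper=1.20):
    """Keep variables whose odds ratio is <lower or >upper in BOTH models."""
    def passes(odds_ratio):
        return odds_ratio < lower or odds_ratio > upper

    return sorted(
        name
        for name in or_outcome
        if name in or_treatment
        and passes(or_outcome[name])
        and passes(or_treatment[name])
    )

# Hypothetical odds ratios from model 1 (outcome) and model 2 (one strategy)
or_bedwetting = {"daytime_wetting": 1.80, "urgency": 1.10, "sex": 0.75}
or_lifting = {"daytime_wetting": 1.35, "urgency": 1.40, "sex": 0.95}
selected = select_confounders(or_bedwetting, or_lifting)
print(selected)  # only daytime_wetting passes the threshold in both models
```

In this toy example, urgency fails in the outcome model and sex fails in the strategy model, so neither would enter the propensity score model.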

Missing data
Many of the potential confounders we considered had some missing data, thus reducing the sample size available for analyses. The number of missing cases on confounding variables ranged from 0 (gender) to 659 (mother's social class based on [...]

RESULTS
Table 1 describes the characteristics of the study participants. Figure 1 shows the prevalence of each strategy used by parents of 7½ year old children and the total number of strategies employed by parents. The most common strategies for overcoming bedwetting were restricting drinks, rewarding and lifting. Only a very small number of parents used medication or bedwetting alarms, so we were unable to examine the effectiveness of these interventions. Over 50% of parents did not use any of the strategies to overcome bedwetting, whilst over 26% of parents used only one strategy. Table 2 shows the associations between the strategies. The strongest correlations were found between lifting and restricting drinks and between regular daytime toilet trips and rewards.

The estimates provided in this table are average treatment effects for each strategy. They are risk differences, i.e. estimated differences between the risk of bedwetting at 9½ years after receiving a given 'treatment' (i.e. parental strategy) and the risk of bedwetting if the children had remained 'untreated'. We provide examples of how to interpret the risk differences below:
(1) The risk difference for 'restricting drinks' (0.123 [95% CI= 0.021 to 0.226]) shows that there is a 12.3% (2.1% to 22.6%) increase in risk of bedwetting at 9½ years among children whose parents used restricting drinks compared with children whose parents did not use this strategy.
(2) The risk difference for 'showing displeasure' (−0.052 [95% CI= −0.214 to 0.110]) shows that there is a 5.2% decrease in risk of bedwetting at 9½ years among children whose parents show displeasure, but the 95% confidence interval indicates that this result could be in either direction (between a 21% reduced risk and 11% increased risk).
* Indicates risk differences that provide evidence for an increase in bedwetting associated with the parental strategy.
a Empty model column for propensity score based methods analysis shows unadjusted differences in risk of bedwetting between 'treated' and 'untreated' children. These coefficients are equivalent to bivariate regressions including bedwetting at 9½ years as an outcome variable and each strategy as a single predictor.
b Model adjusted for child and family variables represents average treatment effects on treated children (ATETs). These coefficients are estimated differences in risks of bedwetting in weighted samples. These are differences in risks adjusted for child and family variables that accounted for the differences between treated and untreated groups and between those with and without bedwetting at age 9½. The list of child and family variables was derived separately for every strategy on the basis of odds ratios (ORs) from regression analyses providing evidence of an association (ORs <0.90 or >1.20) both with the given strategy and with bedwetting at age 9½. A detailed list of child and family variables included in the model for each parental strategy is provided in the online supplementary material (see tables S2-S7).
c Model adjusted for child and family variables and other parental strategies. This is the column with coefficients adjusted for the child and family variables and for other parental strategies that were highly correlated (coefficient of tetrachoric correlation: >.45) with the given strategy.

Propensity score based results (inverse probability weighting)
The unadjusted propensity score based analysis shows that the parental strategies used at 7½ years are generally associated with an increase in the risk of bedwetting at 9½ years. Showing displeasure was the only strategy associated with a small decrease in the risk of bedwetting in the unadjusted model. After applying the propensity score model adjusting for the child and family variables and then further adjusting for the other strategies, the adjusted treatment effects provided evidence that lifting and restricting drinks are associated with an increase in the risk of bedwetting at 9½ years. The adjusted results for the other strategies indicated that the effect could be in either direction (either an increase or decrease in the risk of bedwetting).

Logistic regression results
The results obtained from the logistic regression analysis were mostly consistent with the analysis using the propensity score-based methods, i.e. both lifting and restricting drinks were associated with an increase in the risk of bedwetting at 9½ years in the fully adjusted model. In the logistic regression analysis, using rewards was also associated with an increased risk of bedwetting at 9½ years.

DISCUSSION
We examined a range of common strategies used by parents to overcome bedwetting and found that when these strategies were used with 7½ year-old children who wet the bed, they were not effective in reducing the risk of bedwetting at 9½ years. Parental strategies including lifting and restricting drinks before bedtime were associated with an increased risk of subsequent bedwetting. These were among the most common parental strategies used to overcome bedwetting in our study and there is evidence in a review that these strategies are frequently used by parents across different countries [10].
This is the first study, to our knowledge, to apply propensity score-based methods to examine the effectiveness of parental strategies to overcome bedwetting using observational data from a large birth cohort. Austin [14] discusses several reasons for preferring propensity score-based methods to regression models when estimating treatment effects using observational data. In particular, it is easier to assess whether observed confounding has been adequately eliminated using propensity score-based methods. Some caveats associated with PSM-based methods should be noted; these are discussed in the supplementary materials. Another major strength of this study is the availability of a wide range of confounders in the ALSPAC dataset that are associated with bedwetting and with the parental strategies to overcome bedwetting.
We did not separately examine whether the parental strategies are differentially effective for children with non-monosymptomatic [...] score-based models. We did, however, include a range of covariates in the analyses including frequency of bedwetting (high frequency = twice or more/week), daytime wetting, urgency and voiding postponement. All of these variables were important confounders in most of the models. We did not have information on the onset or duration of the strategies to overcome bedwetting and, therefore, we were only able to assess strategies that were reported by parents as being used currently. Because this was a community-based, rather than a clinical, sample, there were very small numbers of parents using medication or bedwetting alarms. We were, therefore, unable to examine the effectiveness of these interventions in this study. Additionally, there was no information on which strategies parents initiated themselves compared with those recommended by health care workers or other sources (e.g. the child's grandparents).
Caldwell et al. [12] concluded from their systematic review that simple interventions such as lifting (or waking) could initially be tried as strategies to overcome bedwetting, since such methods are "safe and are better than doing nothing". However, they cautioned that most of the trials they reviewed were small and of poor methodological quality [12]. Our findings are consistent with the current advice given in the NICE guidelines stating: "Neither waking nor lifting children and young people with bedwetting, at regular times or randomly, will promote long-term dryness" [25]. Although lifting appears to be a sensible way of promoting night-time dryness, its effectiveness as an intervention for reducing or stopping bedwetting has previously been questioned. It has been suggested that lifting may inadvertently maintain bedwetting since this strategy encourages the child to empty the bladder without fully waking and, therefore, children are not learning to waken to the sensation of a full bladder [10,26].
However, we are not aware of any studies that have formally tested this proposed mechanism.

Supplementary online material
Rationale for applying propensity score methods
The main purpose of propensity score-based methods is to draw causal inferences about effects of treatments from observational data. In order to do this, certain assumptions are required. In observational data the process of selection to the 'treated' and 'non-treated' groups is not random. Direct comparisons of the outcomes in both groups are likely to be biased if 'treated' and 'non-treated' groups differ systematically on certain variables (both measured and unmeasured) that may also be associated with the outcome. It is, therefore, not possible to establish to what extent the observed results are effects of the 'treatment' versus other important variables that remained uncontrolled in the process of selection of cases into the groups. Dealing with this problem of confounding is a central problem of studies based on observational data. As Rosenbaum (2005) 1 states, the key issues in such studies are to remove 'overt biases' (related to measured/observed confounders) and to address uncertainty about hidden biases (related to unmeasured confounders).
Multivariable regression and, more recently, propensity score-based methods can be used to minimise effects of confounders when estimating treatment effects using observational data. These methods adjust the crude results to take into account the non-random mechanism of selection of cases to the 'treated' versus 'non-treated' groups by controlling for differences between both groups. Propensity score-based methods are based on the assumption that differences between 'treated' and 'non-treated' groups can be explained in terms of observable variables and bias should, therefore, be minimised by ensuring similarity of both groups on these variables. This is achieved using propensity score-based methods either by matching each case from the 'treated' group to at least one similar case from the 'non-treated' group or by reweighting both groups in order to construct a synthetic sample in which both groups are similar with respect to the observed confounders. Similarity of the groups is assessed in terms of the probability of receiving the 'treatment' as estimated from the observed variables of the cases. This probability, called the 'propensity score', is further used to create/or reshape the sample in which distributions of variables of study participants are similar in both groups. If this condition is fulfilled then treated and untreated groups are balanced on measured confounders. However, in order to compare their results directly it is also important to assume that all of the important confounders were measured and included in the analysis. It is only under these conditions that estimates of causal effects are assumed to be unbiased.
Application of propensity score-based methods in the current study
We used propensity score-based methods to assess the effectiveness of parental strategies aimed at preventing bedwetting. Within our study, 'treated' refers to those children receiving each parental strategy at 7½ years and 'non-treated' refers to those not receiving the strategy. The propensity score-based analysis involves three general steps: (1) constructing the propensity scores; (2) balancing the sample based on their propensity scores; and (3) estimating effects of the treatment (parental strategy). At each of these steps in the analysis, decisions need to be made that influence the results and final conclusions. Below, we present the key decisions we made in conducting the propensity score-based analysis.
One of the most crucial decisions in the analysis concerns the measure of the impact of the treatment. The options are the average treatment effect (ATE) and the average treatment effect on the treated (ATT). They are both rooted within the counterfactual framework, which assumes that each participant included in the study has two potential outcomes, i.e. the outcome when treatment was applied and the outcome when treatment was not applied. In reality, only one of these outcomes is truly observed, and the other is unobserved (counterfactual). The ATE is defined as the difference between the averages of these two potential outcomes at the population level. The ATT is the difference between the averages of the potential outcomes calculated for the treated group. The aim of the analysis was to estimate the effect of each parental strategy aimed at preventing bedwetting at 7½ years as a difference in risk of bedwetting at 9½ years between those children receiving the strategy (treated) versus those children not receiving the strategy (non-treated). We assessed the effectiveness of each parental strategy by means of the ATT because we were interested in the average effect of treatment on those children who ultimately received the treatment rather than in the estimation of the average treatment effect at the population level (ATE).
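In the potential-outcomes notation used in the caveats section of this supplement ($Y_1$ and $Y_0$ for the outcomes with and without treatment, $G$ for the treatment group), the two estimands described above can be written as follows. This is the standard formulation, given here only as a reading aid; the last line assumes a binary outcome coded 1 for bedwetting at 9½ years.

```latex
\begin{align}
\mathrm{ATE} &= E[Y_1 - Y_0] = E[Y_1] - E[Y_0], \\
\mathrm{ATT} &= E[Y_1 - Y_0 \mid G = 1] = E[Y_1 \mid G = 1] - E[Y_0 \mid G = 1], \\
\mathrm{ATT}_{\text{risk difference}} &= P(Y_1 = 1 \mid G = 1) - P(Y_0 = 1 \mid G = 1).
\end{align}
```

The term $E[Y_0 \mid G = 1]$ is the counterfactual part: the risk the treated children would have had without the strategy, which is never observed and has to be estimated from the reweighted non-treated group.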

b) Selection of propensity score based method:
The four most commonly reported methods for using propensity scores in the estimation of the effect of treatment are propensity score matching, stratification on the propensity score, inverse probability weighting and covariate adjustment using propensity scores. Austin (2011) 2 provides a detailed discussion of the strengths and limitations of these methods. In brief, the method based on inverse probability weighting is favoured, whilst the other methods were found to be more prone to bias in Monte Carlo studies. We used inverse probability of treatment weighting (IPTW), in which a synthetic balanced sample is constructed (participant variables are independent of the assignment to treated/non-treated groups). We estimated the propensity scores by logistic regression.
c) Selection of baseline variables to the propensity score model:
There are at least four possible strategies for variable selection and there is a lack of consensus in the literature on which strategy is the most appropriate. 2 The available strategies are: (1) the propensity model should include all measured, theoretically relevant variables; (2) the model should include variables that are associated with the treatment; (3) the model should include variables that are associated with the outcome; (4) the model should include only those variables that are associated both with treatment and outcome. The final strategy seems to be the favoured approach as regards theoretical arguments and results of simulation studies. 2 Therefore, we chose this approach, including only those variables that are associated both with treatment (parental strategy) and outcome (bedwetting at 9½ years). We used a two-stage procedure to derive the list of baseline variables to include in the model. Firstly, we conducted two logistic regression analyses in the first completed dataset (m=1). In the first regression model, the outcome variable was predicted from all theoretically relevant variables relating to children and their family (see table S1 for the list of all variables included). In the second regression model, each parental strategy was the outcome variable. Secondly, we applied a threshold for selection of variables to the propensity score model: we included only the variables that had associations (expressed in terms of odds ratios) with the outcome variables in both models of <0.90 or >1.20. 3 We enriched the propensity score models by including the parental strategies that were highly correlated with the [...]

d) Diagnostics:
Propensity score based diagnostic methods are designed to examine whether the distributions of the baseline variables are the same/or similar enough in treated and non-treated groups. There are several specific diagnostic procedures that are recommended. Among them the most frequently used are those examining the standardized differences between means of baseline variables after matching/weighting and those comparing their variances. Before analysing the treatment effects, we assessed the adequacy of the propensity score model by checking the overlap in propensity scores. We based our diagnostic checks on the examination of differences between means of baseline variables in the weighted sample. Categorical variables with more than two categories were examined using sets of binary variables. Following the literature we assumed that absolute standardized mean difference < 0.1 indicates negligible differences between groups. Since the model computations were conducted on 50 datasets obtained from the multiple imputation procedure (see section below for details) we performed the diagnostic checks in 5 randomly selected datasets (m=1,4,5,37,50). See tables S2 to S7 for the results of the diagnostic checks to assess the adequacy of the propensity score models.
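The balance criterion described above (absolute standardized mean difference below 0.1 in the weighted sample) can be sketched as follows. This is an illustrative calculation, not the output of the STATA diagnostic command used in the study; it assumes the pooled standard deviation is the square root of the average of the two weighted group variances, and all names and data are hypothetical.

```python
import math

def weighted_mean_var(values, weights):
    """Weighted mean and weighted (population-style) variance."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var

def standardized_mean_difference(x_treated, w_treated, x_untreated, w_untreated):
    """Difference in weighted means divided by the pooled standard deviation."""
    m1, v1 = weighted_mean_var(x_treated, w_treated)
    m0, v0 = weighted_mean_var(x_untreated, w_untreated)
    pooled_sd = math.sqrt((v1 + v0) / 2)
    return (m1 - m0) / pooled_sd

# Hypothetical binary covariate (e.g. daytime wetting) in each group
x_t, x_u = [1, 0, 1, 0], [1, 0, 0, 0]
ones = [1, 1, 1, 1]
att_weights_u = [3, 1, 1, 1]  # hypothetical p / (1 - p) weights for the non-treated

smd_raw = standardized_mean_difference(x_t, ones, x_u, ones)
smd_weighted = standardized_mean_difference(x_t, ones, x_u, att_weights_u)
print(round(smd_raw, 3), round(smd_weighted, 3))
```

In this toy example the raw difference is well above 0.1, while the weighted difference falls to zero, which is the pattern one hopes to see after weighting.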

e) Dealing with missing data and estimating the treatment effects:
The magnitude of missing data per variable is presented in table S1. To deal with the missing data, we used multiple imputation by chained equations, implemented in the ICE (Imputation by Chained Equations) package in Stata v.14. We generated 50 imputed datasets using 10 cycles of regression switching. The imputation models also used a range of auxiliary variables that do not themselves form part of the analytical model. The criteria for selecting variables into the imputation model for each variable were: (i) theoretical and empirical evidence from prior published research; (ii) data-driven selection: a variable was selected if, in the current dataset, it was strongly associated with the observed values of the imputed variable or with missingness on that variable; (iii) additionally, the imputation models for every variable included the parental strategies and bedwetting at 9½ years.
Conducting the propensity score-based analysis on multiply imputed datasets required a decision about the exact routine by which the treatment effects are calculated. There are two possibilities: (1) estimate the propensity score in each of the 50 completed datasets, average the propensity score for every subject across all datasets, and then estimate treatment effects based on those aggregated propensity scores; (2) estimate propensity scores and treatment effects separately in every completed dataset and then aggregate the treatment effects according to Rubin's rules. Published simulation studies suggest that the first strategy provides biased results [4]; therefore, we used the second strategy. To the best of our knowledge, there is no Stata package that deals with treatment effects in the context of multiple imputation, so we developed our own syntax using loops to estimate treatment effects and standard errors separately within each imputed dataset and exported the results to Excel. At the next stage, we aggregated the exported results by implementing Rubin's rules for estimating the effects and standard errors from the multiply imputed datasets [5]. For the estimation we used the Stata built-in command teffects ipw, and for the diagnostic procedures we used tebalance summarize. It was not possible to estimate models for medication and the bedwetting alarm because very few children received these interventions [4].
We checked the plausibility of the imputation process with the midiagplots Stata command after the datasets from the ICE package had been successfully exported. Because of the large number of generated datasets, the diagnostics were based on a subset of only five randomly selected datasets (m=1,4,5,37,50) obtained from the MI procedure. The diagnostics were conducted under the assumption that the missing data mechanism was missing at random (MAR).
We compared the distributions of observed, imputed and completed values of each variable. The diagnostics revealed that they did not differ greatly, which suggests that the imputation model was correctly specified. (The figures and tables including those diagnostic statistics are available on request.)
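The aggregation step based on Rubin's rules can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the Stata/Excel workflow used in this study; the function name pool_rubin and the example numbers are hypothetical. It assumes one treatment-effect estimate and one standard error per imputed dataset, and pools them as the mean estimate with total variance W + (1 + 1/m)B, where W is the average within-imputation variance and B is the between-imputation variance.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool per-dataset estimates and standard errors using Rubin's rules.

    estimates  : one treatment-effect estimate per imputed dataset
    std_errors : the matching standard errors
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")

    pooled = mean(estimates)                        # pooled point estimate
    within = mean([se ** 2 for se in std_errors])   # average within-imputation variance (W)
    between = variance(estimates)                   # between-imputation variance (B), divisor m - 1
    total = within + (1 + 1 / m) * between          # total variance

    return pooled, math.sqrt(total)


# Hypothetical risk differences from three imputed datasets (illustration only).
estimate, se = pool_rubin([0.10, 0.12, 0.08], [0.05, 0.05, 0.05])
```

The sketch returns only the pooled estimate and standard error; confidence intervals under Rubin's rules additionally require the adjusted degrees of freedom, which are omitted here.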
Caveats of using propensity score based methods
Propensity score based methods are a group of methods used to estimate treatment effects from observational data by balancing treatment and non-treatment groups on the set of measured confounders (X). As shown in [6], this can be achieved by using propensity scores instead of all combinations of variables from X.
All propensity score methods rely on several assumptions, of which the most crucial is the conditional independence assumption (CIA). It consists of two parts: (a) the potential outcome under treatment ($Y_1$) and the potential outcome under no treatment ($Y_0$) are independent of the treatment group ($G$) after controlling for a vector of measured confounders ($X$): $(Y_1, Y_0) \perp G \mid X$; and (b) every case (with any characteristic on vector $X$) has a positive chance of being included in either treatment group, that is, $0 < P(G=1 \mid X) < 1$. Part (a) of the CIA assumes that there are no unmeasured confounders. If vector $X$ does not include all important confounders (those related to both the outcome and the treatment), (a) is not satisfied and causal inferences might be biased. Unfortunately, this part of the CIA is untestable and needs to be assumed. Part (b) of the CIA implies that there must be an overlap between the treated and non-treated groups with regard to their characteristics on vector $X$, which means that any person with any characteristic on $X$ in the treatment group will have a counterpart in the non-treatment group [1]. In contrast to (a), part (b) of the CIA, concerning the overlap of treated and non-treated cases, is testable and can be checked by comparing the distributions of propensity scores for both groups. We did this by graphical examination of the propensity scores estimated for both treatment groups (see figures S2-S7). This assumption was fulfilled to a satisfactory extent, since the differences we observed in the distributions of propensity scores in the treated and non-treated groups were minor.
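As a numeric complement to this kind of graphical check, the overlap of propensity scores can also be summarised in a single figure. The following minimal Python sketch is an illustration only and was not part of the analysis reported here; the function name common_support_share and the example scores are hypothetical. It assumes the propensity scores have already been estimated, and it uses a crude minimum/maximum rule to report the share of treated cases whose score falls within the range observed among untreated cases.

```python
def common_support_share(ps_treated, ps_untreated):
    """Share of treated cases whose propensity score lies within the
    range of scores observed among untreated cases (min/max rule)."""
    if not ps_treated or not ps_untreated:
        raise ValueError("both groups must be non-empty")

    low, high = min(ps_untreated), max(ps_untreated)
    inside = sum(1 for p in ps_treated if low <= p <= high)
    return inside / len(ps_treated)


# Hypothetical propensity scores (illustration only).
share = common_support_share([0.2, 0.4, 0.6, 0.9], [0.1, 0.3, 0.7])
```

A share close to 1 would indicate that nearly all treated cases have potential counterparts among the untreated. The min/max rule is sensitive to single extreme scores, so it supplements rather than replaces inspection of the full distributions.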
It should also be added that, since the current paper relies on the ATT, both parts of the CIA are adopted in their weaker, less stringent versions. This weaker version of the CIA requires for (a) only $Y_0 \perp G \mid X$ to hold (which remains untestable) and for (b) that cases from the treatment group have their counterparts in the untreated group (overlap in the opposite direction is no longer necessary). As shown in figures S2-S7, this milder version of the assumption generally held in the data. After applying methods using propensity scores, the assessment of the balance achieved between the compared groups is the final diagnostic step. The assessment of balance obtained after applying inverse propensity score weighting returned satisfactory results.
when the study children were 24 months. The TTS comprises statements describing specific behaviours, and mothers were asked to rate how often their child behaves in that way on a scale ranging from 1 (almost never) to 6 (almost always).
The scale comprises nine temperament traits, but we restricted our analysis to five traits we found were associated with bedwetting
Assessing the adequacy of the propensity score models
State specific objectives, including any prespecified hypotheses 5