Cohort profile
Cohort profile: pathways to care among people with disorders of sex development (DSD)
  1. Michael Goodman1,
  2. Rami Yacoub1,
  3. Darios Getahun2,3,
  4. Courtney E McCracken4,
  5. Suma Vupputuri5,
  6. Timothy L Lash1,6,
  7. Douglas Roblin5,
  8. Richard Contreras2,
  9. Lee Cromwell4,
  10. Melissa D Gardner7,
  11. Trenton Hoffman1,
  12. Haihong Hu5,
  13. Theresa M Im2,
  14. Radhika Prakash Asrani1,
  15. Brandi Robinson4,
  16. Fagen Xie2,
  17. Rebecca Nash1,
  18. Qi Zhang1,
  19. Sadaf A Bhai1,
  20. Kripa Venkatakrishnan1,
  21. Bethany Stoller1,
  22. Yijun Liu1,
  23. Cricket Gullickson1,
  24. Maaz Ahmed1,
  25. David Rink1,
  26. Ava Voss1,
  27. Hye-Lee Jung1,
  28. Jin Kim1,
  29. Peter A Lee8,
  30. David E Sandberg7
  1. 1Epidemiology, Rollins School of Public Health, Atlanta, Georgia, USA
  2. 2Research and Evaluation, Kaiser Permanente Southern California, Pasadena, California, USA
  3. 3Health Systems Science, Kaiser Permanente Bernard J Tyson School of Medicine, Pasadena, California, USA
  4. 4Center for Research and Evaluation, Kaiser Permanente Georgia, Atlanta, Georgia, USA
  5. 5Mid-Atlantic Permanente Research Institute, Kaiser Permanente, Rockville, Maryland, USA
  6. 6Aarhus Universitet, Aarhus, Midtjylland, Denmark
  7. 7Susan B Meister Child Health and Evaluation Research Center, University of Michigan Medical School, Ann Arbor, Michigan, USA
  8. 8Division of Endocrinology, Department of Pediatrics, Penn State College of Medicine, Hershey, Pennsylvania, USA
  1. Correspondence to Dr Michael Goodman; mgoodm2{at}


Purpose The ‘DSD Pathways’ study was initiated to assess health status and patterns of care among people enrolled in large integrated healthcare systems and diagnosed with conditions comprising the broad category of disorders (differences) of sex development (DSD). The objectives of this communication are to describe methods of cohort ascertainment for two specific DSD conditions—classic congenital adrenal hyperplasia with 46,XX karyotype (46,XX CAH) and complete androgen insensitivity syndrome (CAIS).

Participants Using electronic health records, we developed an algorithm that combined diagnostic codes, clinical notes, laboratory data and pharmacy records to assign each cohort candidate a ‘strength-of-evidence’ score supporting the diagnosis of interest. A sample of cohort candidates underwent a review of the full medical record to determine the score cut-offs for final cohort validation.

Findings to date Among 5404 classic 46,XX CAH cohort candidates the strength-of-evidence scores ranged between 0 and 10. Based on sample validation, the eligibility cut-off for full review was set at the strength-of-evidence score of ≥7 among children under the age of 8 years and ≥8 among older cohort candidates. The final validation of all cohort candidates who met the cut-off criteria identified 115 persons with classic 46,XX CAH. The strength-of-evidence scores among 648 CAIS cohort candidates ranged from 2 to 10. There were no confirmed CAIS cases among cohort candidates with scores <6. The in-depth medical record review for candidates with scores ≥6 identified 61 confirmed cases of CAIS.

Future plans As the first cohort of this type, the DSD Pathways study is well-positioned to fill existing knowledge gaps related to management and outcomes in this heterogeneous population. Analyses will examine diagnostic and referral patterns, adherence to care recommendations and physical and mental health morbidities examined through comparisons of DSD and reference populations and analyses of health status across DSD categories.

  • general endocrinology
  • paediatric endocrinology
  • epidemiology
  • sexual medicine

Data availability statement

Data sharing not applicable as no data sets generated and/or analysed for this study. Once the initial data analyses are complete, we will be open to collaborations with outside investigators as permitted by the Institutional Review Boards (IRBs) of participating sites as well as by local, state and Federal laws and regulations. In particular, we will encourage collaborations with researchers whose expertise is under-represented on our research team. To become a collaborator, a researcher will be required to submit an application, which will undergo both a scientific and an IRB review. In view of the complexity of the database, interested investigators will be asked to form a collaborative arrangement with the DSD Pathways investigators rather than simply receive the data themselves. No additional data are available.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


Strengths and limitations of this study

  • Study strengths include systematic cohort identification without a need for participant opt-in and comprehensive ascertainment of diagnostic workup and treatments received at large integrated health systems.

  • The main limitation of the study is dearth of information on care received outside of the participating health plans.

  • An additional limitation is lack of data on patient-reported outcomes not captured in the health records.


Overview of DSD conditions

Disorders of sex development (DSD) are a heterogeneous group of congenital medical conditions characterised by atypical development of chromosomal, gonadal or anatomical sex.1–3 Whereas the acronym ‘DSD’ is typically used in medical practice to denote ‘disorders of sex development’, an alternative term is ‘differences of sex development’. In addition, individuals with these conditions may reject the term DSD in favour of the specific diagnosis, if available (eg, congenital adrenal hyperplasia), or instead prefer to self-identify as ‘intersex’.

The pathogenesis of DSD often involves a departure from typical sex determination or sex differentiation. Sex determination is the process whereby the bipotential gonad develops into a testis or an ovary.4–11 Sex differentiation is subsequently dependent on appropriately functioning gonads and responsiveness of tissue to hormone action. In males, sex differentiation involves regression of müllerian structures, stabilisation of wolffian structures, masculinisation of the external genitalia and descent of the testes to the scrotum. In females, ovarian development is associated with the absence of anti-müllerian hormone and testosterone synthesis, resulting in differentiation of the müllerian ducts into the internal female genitalia and the upper third of the vagina. The wolffian ducts regress in the absence of testosterone. When the genetic or hormonal mechanisms responsible for these processes are disrupted, the chromosomal, gonadal or anatomical characteristics of an organism become incongruent, resulting in a DSD.12

The current classification divides DSD into three main groups: (1) Sex Chromosome DSD, including various forms of sex chromosome aneuploidy or sex chromosomal mosaicism; (2) 46,XX DSD, involving disorders of ovarian development, androgen excess or non-hormonal DSD with a female-typical karyotype; and (3) 46,XY DSD, encompassing disorders of testicular development, androgen synthesis or action, and non-hormonal DSD in people with a male-typical karyotype.13

The most common cause of hormone-mediated virilising 46,XX DSD is classic congenital adrenal hyperplasia (CAH), an autosomal recessive condition with prevalence of about 1:14 000–18 000 live births, which is characterised by impaired biosynthesis of cortisol, most commonly due to congenital 21-hydroxylase deficiency.14 In 75% of those with a severe enzyme defect, deficiency in the production of cortisol is accompanied by a deficit in aldosterone, the salt-retaining hormone; this form of CAH is life-threatening due to potential hypovolaemia and shock.15 The 21-hydroxylase deficiency also results in an accumulation of cortisol precursors that are diverted to excess androgen biosynthesis.16 ‘Backdoor’ pathways resulting in androgen excess have also been described;17 however, the relevance of these pathways to classic 46,XX CAH is not clear. The features of classic 46,XX CAH that are responsible for its categorisation as a DSD are a urogenital sinus, varying degrees of clitoromegaly and labioscrotal fusion in women.18 19 By contrast, somatic sex development in men with classic CAH is not affected. The 11β-hydroxylase deficiency is responsible for CAH in approximately 5% of the patients. Although 11β-hydroxylase and 21-hydroxylase deficiencies are similar with respect to their effects on somatic development in females, 11β-hydroxylase deficiency is characterised by a tendency for salt retention and hypertension.20

A well-known example of 46,XY DSD is androgen insensitivity syndrome (AIS), which has prevalence of about 1:20 400 to 1:99 000 individuals.21 AIS is an X-linked disorder that affects persons with 46,XY karyotype and normal production of androgens.22 AIS is a consequence of genetic variants impairing the androgen receptor (AR) function. The most extreme case of AIS is complete androgen insensitivity syndrome (CAIS), which presents as a female phenotype with primary amenorrhoea in adolescence, or inguinal swellings (resembling bilateral hernia) in infancy.23

Knowledge gaps and challenges in DSD research

The recommendations for DSD management were first published approximately 15 years ago.3 24 Although the main principles of DSD care outlined in the original recommendations remain largely unchanged,1 25 their implementation has not been investigated in population-based studies. For example, it is recommended that the DSD diagnostic workup should begin with karyotype testing to determine the individual’s sex chromosome complement, followed by next-generation sequencing to identify genetic variants indicative of specific DSD diagnoses.4 26–30 Another example of current recommendations for evaluation and management of patients with DSD is involvement of a multidisciplinary team of providers representing diverse areas of expertise, including endocrinology, urology, gynaecology, genetics and mental health.31 32 The extent to which these recommendations are followed in day-to-day practice is not known due to the lack of large-scale studies investigating the types of diagnostic workup and patterns of care in an unselected set of patients with DSD in the USA.

The relative paucity of large-scale data leaves considerable room for controversy related to the application of the principles of DSD care outlined in the current guidelines.1 25 For example, whereas initial surgeries for DSD conditions characterised by atypical genitalia are commonly done in early childhood with the goal of achieving ‘gender-validating’ appearance and function,33 the point of view that such procedures should be delayed to allow patient participation in treatment decisions is receiving increasing consideration.34–38 Current literature indicates that initial gender assignment in patients with DSD does not guarantee stability of gender identity later in life. According to the available data, virtually all individuals with CAIS,39 and 89% of patients with 46,XX CAH who are raised as girls,40 self-identify as women in adulthood. The remaining 46,XX CAH group is composed of those reared as girls, but who subsequently change their identity, and those born with essentially male genitalia and reared as boys who develop and maintain a male gender identity.41

The current literature reports a number of DSD-related comorbidities, both in early childhood,42–47 and later in life;42 43 48–54 however, the available data are difficult to interpret for several reasons. First, the overall sample sizes in published studies are too small to allow assessment of comorbidities associated with specific DSD and are focused on relatively few diagnoses, specifically Klinefelter and Turner syndromes and 46,XX CAH. Second, people affected by DSD represent a hard-to-reach population, and to date most existing studies were assembled at specialised clinics. Although this approach provides good options for detailed data collection, the identification of study participants depends on referral routes and may exclude individuals who received care outside of established clinical centres. Third, recruitment for such studies relies on agreement from clinicians and requires participation opt-in. Finally, and perhaps most importantly, a determination of whether the presence of DSD affects the risk of other conditions is not possible due to a lack of comparisons with similar non-DSD populations. For all of the above reasons, more comprehensive evaluations of specific comorbidities using large cohorts of patients with DSD of different ages and matched reference groups from the same population base are required to fill the existing knowledge gaps.

Objectives of the present study

The ‘DSD Pathways’ study was designed to examine patterns of care and to address the existing knowledge gaps, using data from a cohort of patients with DSD identified among members of three large integrated healthcare systems: Kaiser Permanente Southern California (KPSC), Kaiser Permanente Georgia (KPGA) and Kaiser Permanente Mid-Atlantic States (KPMAS). The study addresses three specific areas of importance in DSD research: (1) patterns and guideline-concordance of care; (2) controversies in treatment; and (3) comorbidities and long-term health outcomes.

The present paper describes the main elements of the DSD Pathways study design, outlines methods of cohort ascertainment and data collection and discusses lessons learnt during the implementation of the early stages of this ongoing project. In this ‘cohort profile’ communication we offer detailed documentation of approaches used to assemble, validate and characterise the analytic cohorts for two specific DSD conditions of interest: classic 46,XX CAH and CAIS. We also offer an overall description of each of the two study populations that will provide data for a multitude of subsequent hypothesis-testing studies.

Cohort description

Study design and setting

The DSD Pathways study is an electronic health record (EHR)-based retrospective/prospective cohort study of persons affected by different types of DSD and enrolled at three participating sites: KPSC, which covers 12 counties across the Santa Barbara–Los Angeles–San Diego area; KPGA, which includes residents of Metro Atlanta and surrounding counties; and KPMAS, which operates in Maryland, Virginia and the District of Columbia. These health systems represent a geographically and demographically diverse population of over 5 million members. For example, 53% of KPGA enrollees and a large proportion (39%) of KPMAS enrollees, but only 8% of the KPSC enrollees, identify as non-Hispanic black. By contrast, the proportion of enrollees identifying as Hispanic ranges from 38% at KPSC to 5% at KPGA. Individuals and their families may become members of KP through an employer, through state or federal programmes such as Medicaid and Medicare or directly. The populations of KP enrollees have been shown to broadly represent their corresponding communities.55 56

The participating organisations are members of several research consortia including the Health Care Systems Research Network57 and the Mental Health Research Network.58 They share similarly structured databases termed ‘Virtual Data Warehouses’ with common data tables stored behind security firewalls at each site. The tables assign identical variable names and formats, which allows creating pooled analytical data sets59 and constructing EHR-based historical and prospective cohorts.60

The study is conducted in partnership with Emory University and the University of Michigan. Emory University serves as the data-coordinating centre whereas the University of Michigan provides insight into the multitude of scientific and clinical issues specific to DSD research.

Identification of CAH and CAIS cohort candidates

Figure 1 shows the four-step algorithm used to identify candidates for inclusion in the DSD Pathways cohort. In Step 1, a SAS programme (SAS Institute, Cary, North Carolina, USA) was used to search the EHR of KPGA, KPSC and KPMAS members of all ages enrolled between 1 January 1988 and 31 December 2017 to identify two types of evidence supporting DSD status: presence of specific keywords in free-text clinical notes (available since 2006), and relevant International Classification of Diseases 9th and 10th edition (ICD-9 and ICD-10) codes, which are available from the late 1980s (online supplemental tables 1–3). All members of participating health plans who had at least one diagnostic code or keyword of interest were included in the initial group of DSD cohort candidates.
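The Step 1 screen amounts to a simple inclusion rule: a member enters the candidate pool if any diagnostic code or any free-text keyword matches. The following is a minimal sketch in Python rather than the SAS used by the study. The `MemberRecord` structure and the keyword list are hypothetical placeholders, because the actual code and keyword lists are in the online supplemental tables and are not reproduced here; the two codes shown are the examples cited elsewhere in the text.

```python
from dataclasses import dataclass, field

# Placeholder lists (assumption): the study's real codes and keywords are in
# the online supplemental tables. E25.0 and 259.51 are the example codes
# cited in the text; the keywords are illustrative only.
DSD_CODES = {"E25.0", "259.51"}
DSD_KEYWORDS = ("congenital adrenal hyperplasia", "androgen insensitivity")


@dataclass
class MemberRecord:
    """Hypothetical container for one health plan member's EHR evidence."""
    member_id: str
    diagnosis_codes: set = field(default_factory=set)
    notes: list = field(default_factory=list)  # free-text clinical notes


def is_step1_candidate(record: MemberRecord) -> bool:
    """Return True if the record has at least one code or keyword of interest."""
    if record.diagnosis_codes & DSD_CODES:
        return True
    # Case-insensitive substring match across all notes and keywords.
    return any(
        keyword in note.lower()
        for note in record.notes
        for keyword in DSD_KEYWORDS
    )
```

Because the rule accepts a single match of either kind, it is deliberately sensitive rather than specific, which is consistent with the large initial candidate pool narrowed in later steps.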

Figure 1

Ascertainment of classic 46,XX CAH and CAIS cohort candidates. The figure depicts application of sequential steps comprising the case identification algorithm. CAH, congenital adrenal hyperplasia; CAIS, complete androgen insensitivity syndrome; DSD, disorders (differences) of sex development; EHR, electronic health record; ICD-9/ICD-10, International Classification of Diseases 9th and 10th edition.

The initial list of all possible cohort candidates was then used for Step 2, which involved a more targeted search focusing on two conditions of interest: classic 46,XX CAH and CAIS. The diagnostic codes and keywords used to identify candidates for inclusion in each of the two cohorts are listed in table 1.

Table 1

Diagnostic codes and keywords used to identify classic 46,XX CAH and CAIS cohort candidates

Table 2

Strength-of-evidence scoring of classic 46,XX CAH and CAIS cohort eligibility

In Step 3, a separate programme extracted de-identified strings of text that included 100 characters before and 50 characters after each keyword of interest. Each text string was examined by two trained reviewers whose task was to confirm that the keywords were used to identify the condition of interest in the patient in question. In performing this task, the reviewers were instructed to characterise each candidate as ‘eligible’, ‘possibly eligible’ or ‘not eligible’ for inclusion in the condition-specific cohort. The criteria for eligibility assignment are included in online supplemental tables 4 and 5. Following initial assessment of eligibility, disagreements among reviewers were adjudicated by a committee that included the project coordinator (RY) and three investigators (MGo, DES and PAL).
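The text window described for Step 3 can be illustrated with a short keyword-in-context routine. This is a sketch only: the function name and the case-insensitive matching are assumptions, and de-identification, which the study applied to the extracted strings, is not shown.

```python
import re


def extract_snippets(note: str, keyword: str, before: int = 100, after: int = 50) -> list:
    """Return one text window per case-insensitive occurrence of keyword.

    Each window holds up to `before` characters preceding the keyword, the
    keyword itself, and up to `after` characters following it. Windows are
    truncated at the start or end of the note.
    """
    snippets = []
    for match in re.finditer(re.escape(keyword), note, flags=re.IGNORECASE):
        start = max(0, match.start() - before)
        end = min(len(note), match.end() + after)
        snippets.append(note[start:end])
    return snippets
```

A note that mentions the keyword several times yields several windows, each of which would be presented to the reviewers separately.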

The final step (Step 4) of cohort identification involved another round of linkages with EHR data to obtain additional evidence supporting the condition of interest (figure 1). For CAH, this supporting evidence included relevant ICD codes, laboratory confirmation of disease status (ie, 17-hydroxyprogesterone (17-OHP) level above 2000 ng/dL), pharmacy records consistent with glucocorticoid replacement therapy and examination of text strings containing keywords indicative of masculinised genitalia (online supplemental table 6). For CAIS, supporting evidence included 46,XY karyotype ascertained from laboratory reports or additional text string review, relevant ICD codes and genetic testing for AR variants as documented in the EHR. All additional text string reviews in Step 4 were carried out using the same approach as in Step 3.

Summary of evidence on cohort eligibility

Following the four-step data collection, information on each cohort candidate was integrated to summarise the strength-of-evidence in support of the condition in question. For classic 46,XX CAH and CAIS, each data element was assigned a score ranging from 0 to 2 points in order of increasing strength-of-evidence. The points across data elements were then summed to obtain the overall ‘strength-of-evidence’ score, which ranged from 0 to 10 in order of increasing certainty regarding the diagnosis of interest (table 2).

The classic 46,XX CAH cohort candidates received a maximum of 10 points (2 points for each of the following data elements): (1) the initial text string review designated this person as having CAH based on clinical note excerpts; (2) the records included at least one diagnostic code specific to CAH (eg, ICD-10 code E25.0); (3) the second text string review confirmed presence or history of masculinised genitalia; (4) the laboratory reports included at least one 17-OHP level above 2000 ng/dL; and (5) pharmacy records indicated chronic use of oral glucocorticoids (such as hydrocortisone) most commonly used in CAH treatment.

A similar approach was used when summarising evidence in support of the CAIS diagnosis. The five lines of evidence for CAIS point assignment included: (1) the initial text string review confirmed the AIS diagnosis mentioned in the clinical notes; (2) the records included a diagnostic code specific to AIS (eg, ICD-9 code 259.51); (3) a separate text string review confirmed presence of keywords indicating AR variant or evidence of AR genetic testing; (4) presence of 46,XY karyotype was documented in clinical notes; and (5) pharmacy records indicated chronic use of feminising hormone therapy most commonly used in CAIS treatment (table 2).
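The scoring described above is a sum over five data elements, each worth 0 to 2 points. A minimal sketch follows. The element names are invented labels for the five lines of evidence, and the restriction to whole-number points (0, 1 or 2) is an assumption; the criteria for awarding intermediate points are given in table 2 and are not reproduced here.

```python
# Hypothetical labels for the five data elements of each condition.
CAH_ELEMENTS = (
    "text_review_cah",
    "cah_diagnostic_code",
    "masculinised_genitalia_text",
    "elevated_17ohp",
    "chronic_glucocorticoid_use",
)
CAIS_ELEMENTS = (
    "text_review_ais",
    "ais_diagnostic_code",
    "ar_variant_or_testing_text",
    "karyotype_46xy",
    "feminising_hormone_therapy",
)


def strength_of_evidence(points: dict, elements: tuple) -> int:
    """Sum per-element points (0, 1 or 2 each) into a 0-10 score."""
    total = 0
    for name in elements:
        value = points.get(name, 0)  # missing evidence contributes 0
        if value not in (0, 1, 2):
            raise ValueError(f"{name}: points must be 0, 1 or 2, got {value}")
        total += value
    return total
```

For example, a CAH candidate with full points for the diagnostic code and glucocorticoid use, and nothing else, would score `strength_of_evidence({"cah_diagnostic_code": 2, "chronic_glucocorticoid_use": 2}, CAH_ELEMENTS)`, that is 4.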

Validation of classic 46,XX CAH and CAIS status

Once the DSD Pathways cohort candidates were assigned a strength-of-evidence score, we performed the final eligibility validation using an in-depth medical chart review. The purpose of validation was to confirm the two diagnoses of interest: classic 46,XX CAH and CAIS. The validation was initially carried out in random samples of 10 candidates drawn from each strength-of-evidence score stratum. The total validation sample for 46,XX CAH included 110 cohort candidates, 10 from each of the 11 score values. The corresponding validation sample for CAIS included 81 cohort candidates: a random sample of 10 each with scores 2–8 and 11 total cohort candidates with scores 9–10, because not all of the highest score values had at least 10 candidates. The proportions of persons with confirmed classic 46,XX CAH or CAIS in each stratum-specific sample were used to identify a score cut-off for validating all cohort candidates. As classic 46,XX CAH is typically diagnosed prior to menarche, sample validation was performed separately for participants under the age of 8 years and those who were older. The sample validation of the CAIS cohort was conducted for all ages.
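The stratified draw used for sample validation can be sketched as follows. The function name, the input format and the fixed random seed are assumptions made for illustration. Strata with fewer candidates than requested are taken in full, which simplifies the handling of the sparse highest CAIS scores described above.

```python
import random
from collections import defaultdict


def sample_by_score(candidates: dict, per_stratum: int = 10, seed: int = 0) -> dict:
    """Draw up to per_stratum candidate IDs at random from each score stratum.

    candidates maps candidate ID -> strength-of-evidence score.
    Returns a dict mapping each score to the list of sampled IDs.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for candidate_id, score in candidates.items():
        strata[score].append(candidate_id)
    return {
        score: rng.sample(sorted(ids), min(per_stratum, len(ids)))
        for score, ids in sorted(strata.items())
    }
```

With 11 score values and at least 10 candidates in each, the routine returns 110 IDs in total, matching the size of the CAH validation sample.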

Once the random sample validation study identified a cut-off score below which an in-depth medical records review was deemed futile, all persons with the score at or above the cut-off were included in the final validation. The validation criteria for classic 46,XX CAH and CAIS are included in online supplemental tables 7 and 8.

Patient and public involvement

Involving patients in the conduct of this study was not possible because the data are de-identified and required no patient contact. The overall study design and its objectives were presented at the 2018 meeting of the World Professional Association for Transgender Health (WPATH). WPATH engages a wide range of stakeholders including members of the gender minority community as well as professionals in the fields of medicine, psychology, law, social work and public health.

Findings to date

Initial broad search of the EHR (Step 1) identified 602 693 individuals with at least one diagnostic code or keyword consistent with possible DSD. Following a more specific search of keywords within this population (Step 2), 5404 were 46,XX CAH cohort candidates, and 648 were CAIS cohort candidates.

Among 46,XX CAH cohort candidates, 2499 (46%) were deemed eligible based on the initial text string review (Step 3). After adding information on ICD codes, genital appearance, laboratory tests and pharmacy records indicative of glucocorticoid therapy (Step 4), the majority of cohort candidates (90%) received a strength-of-evidence score of ≤5, whereas a score of ≥8 was assigned to just 4% of cohort candidates.

Following random sample validation of 110 cases by chart review, no cases of classic 46,XX CAH were identified among persons of any age with a strength-of-evidence score of <7. The sample with a score of 7 contained two classic 46,XX CAH cases among cohort candidates under the age of 8 years, but none in the older age group. Random samples of 10 cases with scores 8 through 10 produced four, five and eight confirmed classic 46,XX CAH cases, respectively.

Based on the results of random sample validation, the eligibility cut-off for full validation was set at the strength-of-evidence score of ≥7 among children under the age of 8 years and ≥8 among older cohort candidates. The final validation of all cohort candidates who met the above criteria identified a total of 115 classic 46,XX CAH cases (table 3). The positive predictive values for strength-of-evidence scores 7 through 10 were 0.16 (95% CI 0.06 to 0.33), 0.33 (95% CI 0.23 to 0.43), 0.48 (95% CI 0.36 to 0.61) and 0.87 (95% CI 0.76 to 0.93), respectively; and the overall positive predictive value of the EHR search algorithm with scores in the range 0 through 10 was 0.37 (95% CI 0.32 to 0.43).
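A positive predictive value is the number of confirmed cases divided by the number of candidates reviewed, and a confidence interval can be attached to that proportion. The text does not state which interval method was used; the sketch below assumes the Wilson score interval purely for illustration, and the function name is hypothetical.

```python
import math


def ppv_with_wilson_ci(confirmed: int, reviewed: int, z: float = 1.959964) -> tuple:
    """Return (ppv, lower, upper) using the Wilson score interval.

    confirmed: candidates confirmed as cases on chart review.
    reviewed:  candidates whose charts were reviewed.
    z:         normal quantile; the default corresponds to a 95% interval.
    """
    if reviewed <= 0 or not 0 <= confirmed <= reviewed:
        raise ValueError("need reviewed > 0 and 0 <= confirmed <= reviewed")
    p = confirmed / reviewed
    denom = 1 + z**2 / reviewed
    centre = (p + z**2 / (2 * reviewed)) / denom
    half_width = z * math.sqrt(p * (1 - p) / reviewed + z**2 / (4 * reviewed**2)) / denom
    return p, centre - half_width, centre + half_width
```

Unlike the simple normal approximation, this interval is asymmetric around the point estimate when the proportion is far from 0.5, which matters for strata with few reviewed charts.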

Table 3

Cohort eligibility by ‘strength-of-evidence’ score using cutoffs from sample validation

The corresponding sample validation for CAIS produced no confirmed cases among 50 cohort candidates with strength-of-evidence scores <6. The in-depth medical record review for all 162 candidates with scores ≥6 identified 61 confirmed cases of CAIS. The overall positive predictive value of the search algorithm for those with scores 2 through 10 was 0.30 (95% CI 0.24 to 0.37). As the strength-of-evidence score increased from the cut-off value of 6 to the maximum of 10, so did the corresponding positive predictive value for CAIS (table 3).
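The cut-offs reported for the two conditions can be written as a single decision rule. In this sketch the function name and the condition labels are hypothetical, and the point in time at which age is measured is left to the caller because it is not specified here.

```python
def eligible_for_full_review(condition: str, score: int, age_years: float) -> bool:
    """Apply the reported score cut-offs for in-depth chart review.

    CAH:  score >= 7 for children under 8 years, score >= 8 otherwise.
    CAIS: score >= 6 at all ages.
    """
    if condition == "CAH":
        cutoff = 7 if age_years < 8 else 8
    elif condition == "CAIS":
        cutoff = 6  # same cut-off regardless of age
    else:
        raise ValueError(f"unknown condition: {condition}")
    return score >= cutoff
```

Under this rule a CAH candidate with a score of 7 is reviewed at age 5 but not at age 12, whereas a CAIS candidate with a score of 6 is reviewed at any age.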

Table 4 summarises the characteristics of the 46,XX CAH and CAIS cohorts. The majority of confirmed cohort members (84%) were from the KPSC site. Among classic 46,XX CAH participants, 27% were non-Hispanic white, 11% were non-Hispanic black and 47% were Hispanic; for the remaining 15%, race/ethnicity was characterised as Asian/Pacific Islander, Native American, ‘mixed’ or ‘unknown’ (these groups are reported together to avoid presenting numbers <5). The corresponding proportions among CAIS cohort members were 23% for non-Hispanic white, 10% for non-Hispanic black, 43% for Hispanic and about 25% for the category other/mixed or unknown. With respect to calendar year of first DSD evidence, nearly 54% of the participants with classic 46,XX CAH and 84% of CAIS cohort members were first identified after 2006 when full-text EHR data became available. At the time of the first documented evidence of the condition of interest (index date), 54% of patients with classic 46,XX CAH and only 8% of CAIS study participants were between ages 0 and 7 years. By contrast, the proportion of cohort members who were over the age of 25 years at index date was substantially lower in the classic 46,XX CAH group (16%) than in the CAIS group (54%).

Table 4

Characteristics of the classic 46,XX CAH and CAIS cohorts

Next steps and future directions

Ascertainment of other DSD conditions

With ascertainment of the classic 46,XX CAH and CAIS cohorts completed, we will turn our attention to other DSD diagnoses, including various sex chromosome anomalies and 46,XY DSD (other than CAIS). The most common examples of sex chromosome DSD (SC-DSD) are Turner (45,X and variants) and Klinefelter (47,XXY and variants) syndromes, with prevalence of 1:2000–2500 live female births and 1:500–1000 live male births, respectively.42 61 SC-DSD may also present with cell line mosaicism where karyotype differs from cell to cell (eg, 45,X/46,XY: mixed gonadal dysgenesis, ovotesticular DSD; 46,XX/46,XY: chimeric, ovotesticular DSD).62 63

The approach for SC-DSD will be somewhat different from that used to assign the strength-of-evidence score for 46,XX CAH and CAIS. In determining eligibility for inclusion in the SC-DSD cohort, greater weight will be assigned to the karyotype information as documented in the EHR. Conversely, any cohort candidate whose EHR indicates an unequivocally normal 46,XX or 46,XY karyotype will be excluded from further consideration, regardless of other lines of evidence.

46,XY DSD of interest (other than CAIS) include a variety of conditions developing as a consequence of disorders of testicular development or androgen synthesis or action.64–66 Disorders of testicular development present on a spectrum. In complete testicular dysgenesis (Swyer syndrome), the person presents with female-typical external genitalia and internal reproductive structures. In partial testicular dysgenesis, the phenotype ranges from clitoromegaly to ambiguous genitalia to isolated hypospadias. Remnants of the müllerian duct may also persist. Impaired metabolism of androgens due to enzyme deficiencies (eg, 5α-reductase type 2 deficiency67 or 17β-hydroxysteroid dehydrogenase type 3 deficiency68 69) results in incomplete masculinisation of the external genitalia. These conditions are variably expressed, ranging from typical female external genitalia to a phallic structure with varying degrees of hypospadias, but because testicular production of anti-müllerian hormone by Sertoli cells remains intact, the müllerian ducts are absent.70–75 In contrast to CAIS, in which the individual is born with female-typical external genitalia, the presentation of partial androgen insensitivity syndrome is highly variable and may include penoscrotal hypospadias, micropenis and bifid scrotum.76–78

In the process of validating the CAIS cohort, we identified a number of patients with 46,XY karyotype who presented with genital atypia, potentially indicative of a DSD. We also performed a separate search relying on keywords indicative of genital atypia such as penoscrotal hypospadias or non-specific ‘ambiguous genitalia’ documented in the health records. Many of these patients were categorised as ‘46,XY DSD’, but further characterisation of their underlying condition will require additional in-depth review.

Selection of the reference cohorts and data integration

Selection of reference groups will depend largely on the DSD category under investigation. We expect that all DSD cohort members will be matched to up to 10 male and 10 female KP enrollees without evidence of DSD status.

Using the previously described approach,79 referents will be matched to each member of the final validated DSD cohort on year of birth (within 5-year groups for adults and 2-year groups for children and adolescents), race/ethnicity, KP site and membership year at the ‘index date’. For persons with classic 46,XX CAH and CAIS, index date is defined as the date of the first recorded evidence of DSD status in the EHR. To ensure comparable follow-up, members of the referent cohorts will only be included if they are enrolled on that day. A cluster ID for each matched group will be assigned to allow stratified analyses (eg, by DSD subtype or treatment received). In addition, we will consider matching patients with DSD with individuals who have other chronic conditions (eg, type 1 diabetes mellitus as a reference category for 46,XX CAH cohort) requiring routine evaluations and daily treatment. Another potentially informative reference group will be transgender people identified in one of our ongoing EHR-based studies.79 80
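The matching rules above can be illustrated with a short selection routine. This is a sketch under several stated assumptions rather than the study's procedure: the `Member` structure is hypothetical, birth-year groups are implemented as fixed bands, adulthood is taken to start at age 18, the referent pool is assumed to already exclude anyone with evidence of DSD, and referents are drawn at random without regard to reuse across cases.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Member:
    """Hypothetical enrolment record for one health plan member."""
    member_id: str
    sex: str              # "F" or "M" as recorded in the EHR
    birth_year: int
    race_ethnicity: str
    site: str
    enrol_start: date
    enrol_end: date


def birth_band(birth_year: int, age_at_index: int) -> int:
    """Group birth years into 5-year bands for adults, 2-year bands otherwise."""
    width = 5 if age_at_index >= 18 else 2  # adult threshold of 18 is an assumption
    return birth_year // width


def select_referents(case: Member, index_date: date, pool: list,
                     per_sex: int = 10, seed: int = 0) -> dict:
    """Pick up to per_sex female and per_sex male referents for one case."""
    rng = random.Random(seed)
    age = index_date.year - case.birth_year
    target_band = birth_band(case.birth_year, age)
    chosen = {}
    for sex in ("F", "M"):
        eligible = [
            m for m in pool
            if m.sex == sex
            and m.member_id != case.member_id
            and m.race_ethnicity == case.race_ethnicity
            and m.site == case.site
            and birth_band(m.birth_year, age) == target_band
            and m.enrol_start <= index_date <= m.enrol_end  # enrolled on index date
        ]
        chosen[sex] = rng.sample(eligible, min(per_sex, len(eligible)))
    return chosen
```

The case's `member_id` could serve as the cluster ID for the matched group, so that the case and its referents can be analysed together in stratified models.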

Patient identification numbers for both the DSD and the reference cohorts will be linked to multiple data sources to obtain ICD-9 and ICD-10 diagnostic codes for non-DSD comorbidities and healthcare utilisation. The pathways to care among patients with DSD will be examined through linkages to surgical history with corresponding pathology reports, diagnostic and imaging procedures, specialist visits and pharmacy records indicating hormone replacement regimens (box 1). These data will allow us to determine, for example, if the DSD study participants underwent evaluation and treatment by an interdisciplinary team.

Box 1

Data available for DSD cohorts

Data categories and specific elements

Demographic and membership characteristics

  • Age, sex and race/ethnicity.

  • Health plan site.

  • Area-based SES factors.

  • Enrolment/disenrolment intervals.

  • Insurance plan type.

General health indicators

  • Height/weight (BMI).

  • Smoking status.

  • Comorbidities.

Surgical procedures

  • CPT and/or ICD code.

  • Date of procedure.

  • Pathology report.

  • History of procedures (clinical notes).

Pharmacy records (hormone therapy, psych medications)

  • Medication prescribed.

  • Filled prescription for medication.

  • Dose.

  • Form.

  • Dates of prescription and fill.

  • Number of refills.

Visit-associated diagnoses

  • Neurological problems.

  • CVD.

  • Renal diseases.

  • Endocrine problems.

  • Mental health problems.

Cancer diagnoses

  • Stage.

  • Site.

  • Histology.

  • Date of diagnosis.

Laboratory results

  • Laboratory test.

  • Value.

  • Date.

Vital status

  • Date of death.

  • Cause of death.

BMI, body mass index; CPT, current procedural terminology; CVD, cardiovascular disease; DSD, disorders (differences) of sex development; ICD, International Classification of Diseases; SES, socioeconomic status.

Strengths and limitations

In this communication, we describe DSD Pathways, an ongoing observational study that to date includes 115 persons with classic 46,XX CAH and 61 individuals with CAIS. This health system EHR-based study is designed to examine the health status of people living with various types of DSD and to evaluate the receipt of care and its possible risks and benefits.

The DSD Pathways study aims to overcome four previously described methodological challenges facing DSD health research: (1) relatively low incidence and prevalence resulting in small samples and low statistical power; (2) lack of a population-based sampling frame, which precludes unbiased selection of study participants; (3) difficulty of systematic case ascertainment in population-based studies; and (4) limited understanding of real-life DSD care in a community setting. Each of these challenges, together with the related strengths and weaknesses of the DSD Pathways study, is discussed below.

Sample size and power

Adequate sample size can be feasibly achieved with the use of large well-defined populations that offer an adequate sampling frame. In practical terms, at least in the USA, this can be done by basing the study in large integrated health systems with millions of members and comprehensive EHR.79 The EHR data from the health systems allow assembling cohorts of hard-to-reach populations and ample options for selection of referent groups.

The DSD Pathways cohort will likely represent one of the largest studies of its kind available to date. Nevertheless, important analyses (eg, according to rare subtypes of DSD) may not be feasible due to sparse stratum-specific data.

Sampling frame

A distinguishing feature of the DSD Pathways study is its ability to create a cohort nested within a large community-based health plan. The use of EHR data ensures that all eligible individuals are included in the analyses, as participation does not require subject opt-in and is not dependent on referral patterns. The well-defined source population also allows selecting matched reference cohorts of people who have the same access to care, have the same demographic characteristics and reside in the same geographical areas, as well as possibly living with non-DSD conditions requiring similar continuous evaluation and treatment. On the other hand, the EHR-based design of this study means that participants are identified at different ages and with variable follow-up depending on their enrolment in and disenrolment from the KP plans.

DSD ascertainment

We demonstrated that by using standard codes, supplemented with analysis of digitised provider notes, it is possible to comprehensively identify patients with DSD among people enrolled in participating health plans. The use of keyword-containing text strings enhanced the validity of cohort ascertainment relative to approaches based on ICD codes alone. In conducting cohort ascertainment, we reviewed up to three clinical note excerpts for 6052 people and performed full-record validation of DSD status for 512 cohort candidates. This review required considerable time and resources, but it is still more efficient and more comprehensive than the traditional unstructured chart review. A more efficient way of accomplishing this task may be to use natural language processing (NLP). We have successfully applied NLP when searching for transgender KP members;81 however, a similar search for persons with DSD is more challenging due to the heterogeneity of conditions and diverse terminology.

In performing cohort ascertainment, we sought to reduce the likelihood of including false positive cohort candidates. This approach likely excluded some eligible patients whose records contained insufficient or incomplete evidence of the diagnosis in question; in other words, some truly eligible persons may be among the cohort candidates who received a strength-of-evidence score below the validation cut-off. We justified this approach on the grounds that high specificity should take precedence over sensitivity if the goal is to reduce threats to internal validity.82
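The trade-off between specificity and sensitivity at a validation cut-off can be illustrated with a short Python sketch. The score scale, the cut-off values and the seven scored candidates below are invented purely for illustration and are not study data; in practice the true eligibility status is unknown for most candidates, which is why the choice of cut-off rests on the reasoning above rather than on a direct calculation of this kind.

```python
def sensitivity_specificity(scored, cutoff):
    """Compute (sensitivity, specificity) when candidates with score >= cutoff are accepted.

    `scored` is a list of (score, truly_eligible) pairs.
    """
    tp = sum(1 for s, truth in scored if s >= cutoff and truth)
    fn = sum(1 for s, truth in scored if s < cutoff and truth)
    tn = sum(1 for s, truth in scored if s < cutoff and not truth)
    fp = sum(1 for s, truth in scored if s >= cutoff and not truth)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Hypothetical candidates: (strength-of-evidence score, truly eligible?)
candidates = [(1, False), (2, False), (2, True), (3, False), (3, True), (4, True), (5, True)]

lenient = sensitivity_specificity(candidates, cutoff=3)  # accepts one false positive
strict = sensitivity_specificity(candidates, cutoff=4)   # no false positives, more misses
```

In this toy example, raising the cut-off from 3 to 4 removes the single false positive at the cost of missing one additional eligible candidate, which mirrors the priority given to specificity.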

Assessment of real-life care

Although the data on diagnostic evaluation and treatment received within the KP system are of high quality, one of the main limitations of the DSD Pathways data is the relative paucity of information on care received outside the KP system. We attempted to address this limitation by obtaining as much information as possible from the free-text notes. For example, when a karyotype analysis report was not available, we conducted a free-text search to identify instances in which a karyotype was mentioned in the notes. The broadening of EHR data collection at KP now offers an opportunity to access records both within and outside the participating health plans. As this data capture was implemented relatively recently, it will be important to continue expanding the cohort to include more recent years and to extend the follow-up of current participants.
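A free-text search for karyotype mentions of the kind described above might look like the following Python sketch. The regular expression and the function name are assumptions made for illustration and do not reproduce the study's search strings; the pattern covers only the 46,XX and 46,XY notations and does not handle negation or family history, so any hits would still need manual review.

```python
import re

# Matches "46,XX" or "46,XY", allowing optional spaces around the comma and any letter case.
KARYOTYPE_RE = re.compile(r"\b46\s*,\s*X([XY])\b", re.IGNORECASE)


def find_karyotype_mentions(note):
    """Return normalised karyotype strings (eg, "46,XY") found in a clinical note."""
    return [f"46,X{m.group(1).upper()}" for m in KARYOTYPE_RE.finditer(note)]
```

For instance, `find_karyotype_mentions("Karyotype: 46, XY confirmed by outside lab")` returns `["46,XY"]`, whereas a note without any karyotype notation returns an empty list.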


Although the body of literature addressing health issues facing persons with DSD has been growing, due in large part to the development of clinical research networks,83 84 limited data are available on the general health status or the pathways to care in an unselected population of patients with DSD. To date, most data on morbidity and care outcomes in DSD populations come from specialised centres.85–100 Of those, the largest studies are based in Europe;94–100 whereas US clinical studies tend to be relatively small.88–93 Although these studies are characterised by high-quality data, they are dependent on referral patterns without a defined sampling frame. For this reason, the DSD Pathways study is well positioned to fill existing knowledge gaps and make important contributions to the current literature.

We recognise that a DSD cohort identified through an integrated healthcare system may not have comprehensive clinical diagnostic and treatment information on each study participant. Weighing against this concern is the demonstrated ability to collect real-world data on a large cohort of DSD subjects and referents obtained from the same underlying population. Moreover, as KP provides ‘one-stop’ delivery of care, the likelihood of capturing full details of DSD care is increased.

Lessons learnt while conducting this project may provide direction for future DSD research. The methodology can be implemented at other healthcare institutions with EHR, particularly in organisations participating in the Health Care Systems Research Network, which is based on a total population of almost 20 million.101 102 With extended follow-up and expanded cohort size, the data will permit additional analyses of rare health endpoints across a wider range of diagnostic and therapeutic interventions.

Data availability statement

Data sharing not applicable as no data sets generated and/or analysed for this study. Once the initial data analyses are complete, we will be open to collaborations with outside investigators as permitted by the Institutional Review Boards (IRBs) of participating sites as well as by local, state and Federal laws and regulations. In particular, we will encourage collaborations with researchers whose expertise is under-represented on our research team. To become a collaborator, a researcher will be required to submit an application, which will undergo both a scientific and an IRB review. In view of the complexity of the database, interested investigators will be asked to form a collaborative arrangement with the DSD Pathways investigators rather than simply receive the data themselves. No additional data are available.

Ethics statements

Patient consent for publication

Ethics approval

All activities described in this manuscript were reviewed and approved by the Institutional Review Boards (IRBs) of the participating institutions, with a waiver of the requirement for informed consent.



  • Contributors MGo is the author responsible for the overall content as the guarantor. DES and MGo prepared the original draft of the manuscript. RN, TH, QZ and RPA conducted data analyses and put together tables and figures. LC, RC, FX and HH were responsible for the preparation and application of data collection programmes and ascertainment of study variables. DG, CEM and SV led study implementation at participating Kaiser Permanente sites and were actively involved in study planning and design. RY, MGa, BR and TMI were responsible for the day-to-day project management at each site, especially record retrieval and validation of cohort eligibility. PAL served as a paediatric endocrinology consultant and offered expertise in the development of the study algorithms and in validation of cohort eligibility. DRo and TLL provided methodological input on various aspects of study design, including identification of sources of bias and ways of addressing threats to validity. SAB, KV, BS, YL, CG, MA, DRi, AV, H-LJ and JK were responsible for review and categorisation of free-text notes. All authors provided critical review of the manuscript for important intellectual content and approved the final version.

  • Funding This research was supported by grant R01HD092595 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.
