Validation of intellectual disability coding through hospital morbidity records using an intellectual disability population-based database in Western Australia
  1. Jenny Bourke,
  2. Kingsley Wong,
  3. Helen Leonard
  1. Department of Epidemiology, Telethon Kids Institute, University of Western Australia, Perth, Australia
  1. Correspondence to Dr Helen Leonard; helen.leonard{at}

Abstract

Objectives To investigate how well intellectual disability (ID) can be ascertained using hospital morbidity data compared with a population-based data source.

Design, setting and participants All children born in 1983–2010 with a hospital admission in the Western Australian Hospital Morbidity Data System (HMDS) were linked with the Western Australian Intellectual Disability Exploring Answers (IDEA) database. The International Classification of Diseases hospital codes consistent with ID were also identified.

Main outcome measures The characteristics of those children identified with ID through either or both sources were investigated.

Results Of the 488 905 individuals in the study, 10 218 (2.1%) were identified with ID in either IDEA or HMDS: 1435 (14.0%) were identified in both databases, 8305 (81.3%) only in the IDEA database and 478 (4.7%) only in the HMDS dataset. Of those unique to the HMDS dataset, about a quarter (n=124) had died before 1 year of age and most of these (75%) before 1 month. Children with ID who were also coded as such in the HMDS data were more likely to be aged under 1 year at first admission, to be female and non-Aboriginal, and to have a severe level of ID, compared with those not coded in the HMDS data. The sensitivity of using HMDS to identify ID was 14.7%, whereas the specificity was much higher at 99.9%.

Conclusion Hospital morbidity data are not a reliable source for identifying ID within a population, and epidemiological researchers need to take these findings into account in their study design.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:

Strengths and limitations of this study

  • The greatest strength of this study was the availability of a population-based source of intellectual disability (ID).

  • The statewide data linkage system allowed this database to be linked to other population datasets such as hospital morbidity.

  • Through data linkage, the study was able to investigate characteristics of children known to have ID according to whether or not they were identified with ID within hospital morbidity data.

  • One limitation is that for some conditions associated with ID and used to identify ID in hospital codes, not all children will necessarily meet the criteria for ID.

  • The International Classification of Diseases (ICD)-9/10 coding system is limited in its provision of delineation of some genetic syndromes; however, the integration of Orphanet coding into ICD-11 will allow many more genetic ID syndromes to be specifically identified in hospital morbidity data.

Introduction

Intellectual disability (ID) is characterised by globally impaired cognitive functioning and significant deficits in adaptive functioning, manifest before the age of 18 years.1 Comorbid medical or psychiatric conditions are common in people with ID,2 leading to increased hospitalisations. The increased risk of admission has been shown to range from twofold for those with ID associated with autism up to 10-fold for those with severe ID.3 For conditions typically managed through ambulatory (outpatient) care, people with ID have been shown to have a sixfold increase in risk of hospitalisations compared with those without ID.4 Epilepsy is one of the most common health conditions in this population with a prevalence of around 20%2 5 and is one of the main reasons for hospital admission.4 Specific disorders consistent with ID such as Down syndrome are often associated with multiple medical conditions (eg, cardiac defects, ear disease and respiratory infections) that frequently require hospitalisation.6 Mental health disorders are also more prevalent in individuals with ID,7 and hospitalisation is common.4 8

Children and young adults with ID, however, form a heterogeneous group, and reliable population-based cohorts are not often available. Researchers investigating ID may draw their samples from health data, from other administrative datasets relating to education or service provision, or from household surveys.9 Studies have also used health-related datasets, including insurance claims, to identify ID and investigate specific causes of hospitalisation in this population.4 10

In Western Australia, the Intellectual Disability Exploring Answers (IDEA) database is a population-based register of children with ID, with ascertainment from both disability service providers and education sources.11 It is a research infrastructure that can be linked to other population datasets such as hospital morbidity data.12 The current study aims to investigate how well the Western Australian Hospital Morbidity Data System (HMDS), which contains all admissions to private and public hospitals, recorded ID using the designated International Classification of Diseases (ICD) codes compared with the IDEA database and thus assess the usefulness of hospitalisation data as a source of ID status.

Methods

The study cohort was restricted to children and young adults born between 1983 and 2010 and who were identified with ID in either the HMDS or the IDEA database over this period. Individuals were defined as having an ID in the HMDS if they were assigned any of the following ICD diagnostic codes during hospitalisation: mental retardation (ICD-9-CM 317–319; ICD-10-AM F70–F79), Down syndrome (trisomy 21) (ICD-9-CM 758.0; ICD-10-AM Q90.0–Q90.2, Q90.9), Edwards/Patau syndrome (trisomy 18/13) (ICD-9-CM 758.1, 758.2; ICD-10-AM Q91.0–Q91.7), trisomy 9/8 (ICD-9-CM 758.5; ICD-10-AM Q92.0–Q92.5), chromosomal deletions (ICD-9-CM 758.3; ICD-10-AM Q93.3–Q93.5), fragile X syndrome (ICD-9-CM 759.83; ICD-10-AM Q99.2), neurofibromatosis (ICD-9-CM 237.7; ICD-10-AM Q85.0), tuberous sclerosis (ICD-9-CM 759.5; ICD-10-AM Q85.1), Prader-Willi syndrome (ICD-9-CM 759.81; ICD-10-AM Q87.14) and Marfan syndrome (ICD-9-CM 759.82; ICD-10-AM Q87.4). ICD coding in the hospital morbidity dataset is completed by clinical coders who abstract relevant information from the patient’s medical record and decide which diagnoses and procedures meet the criteria for coding as per Australian and WA Coding Standards.
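
The code-based case definition above can be sketched as follows. This is a minimal, hypothetical Python illustration, not the study's extraction program: the dictionary `ID_CODE_RANGES`, the functions `is_id_code` and `flagged_with_id`, and the choice to store codes without dots and match them by leading characters are all assumptions. It shows how an individual would be flagged if any one of their diagnosis codes falls within the listed ICD-9-CM or ICD-10-AM codes.

```python
# Hypothetical sketch: does a diagnosis code match the ID case definition?
# Code lists follow the Methods; storing them without dots and matching on
# leading characters (so F70-F79 also covers F79.0) are assumptions.
from typing import Iterable

ID_CODE_RANGES = {
    "ICD-9-CM": [
        ("317", "319"),      # mental retardation
        ("7580", "7580"),    # Down syndrome
        ("7581", "7582"),    # Patau/Edwards syndrome (trisomy 13/18)
        ("7585", "7585"),    # trisomy 9/8
        ("7583", "7583"),    # chromosomal deletions
        ("75983", "75983"),  # fragile X syndrome
        ("2377", "2377"),    # neurofibromatosis
        ("7595", "7595"),    # tuberous sclerosis
        ("75981", "75981"),  # Prader-Willi syndrome
        ("75982", "75982"),  # Marfan syndrome
    ],
    "ICD-10-AM": [
        ("F70", "F79"),      # mental retardation
        ("Q900", "Q902"),    # Down syndrome
        ("Q909", "Q909"),    # Down syndrome, unspecified
        ("Q910", "Q917"),    # Edwards/Patau syndrome
        ("Q920", "Q925"),    # trisomy 9/8
        ("Q933", "Q935"),    # chromosomal deletions
        ("Q992", "Q992"),    # fragile X syndrome
        ("Q850", "Q850"),    # neurofibromatosis
        ("Q851", "Q851"),    # tuberous sclerosis
        ("Q8714", "Q8714"),  # Prader-Willi syndrome
        ("Q874", "Q874"),    # Marfan syndrome
    ],
}


def normalise(code: str) -> str:
    """Strip whitespace and dots and upper-case, e.g. ' q90.0' -> 'Q900'."""
    return code.strip().replace(".", "").upper()


def in_range(code: str, low: str, high: str) -> bool:
    """True if the code's first len(low) characters lie in [low, high].

    Both bounds in each pair have the same length, so comparing only that
    many leading characters lets a range also match more specific codes.
    """
    if len(code) < len(low):
        return False
    return low <= code[: len(low)] <= high


def is_id_code(code: str, version: str) -> bool:
    """Check one code against the ranges for its ICD version."""
    normalised = normalise(code)
    return any(in_range(normalised, low, high)
               for low, high in ID_CODE_RANGES[version])


def flagged_with_id(codes: Iterable[str], version: str) -> bool:
    """A person is flagged if ANY of their hospital diagnosis codes matches."""
    return any(is_id_code(code, version) for code in codes)


print(flagged_with_id(["J18.9", "Q90.0"], "ICD-10-AM"))  # True
print(is_id_code("759.8", "ICD-9-CM"))                   # False
```

Because a single matching code is enough, one admission coded Q90.0 flags a child for the whole study period, mirroring the "any of the following ICD diagnostic codes" rule above.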

Individuals diagnosed with an ID in the IDEA database, considered the ‘gold standard’ for ID diagnosis in the Western Australian population, have a confirmed IQ <70 with adaptive behaviour deficits. The IDEA database and the HMDS data were linked to investigate the proportion of children confirmed with ID through IDEA who were also identified as having an ID from any one of their HMDS ICD codes. Maternal race (Aboriginal or non-Aboriginal), gender (male or female) and date of birth were obtained by linkage to the Midwives’ Notification System. Information on deaths was obtained by linkage to the WA Mortality database, and children who had died before 1 year of age were identified.
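
The linkage comparison can be illustrated with set operations on linked person identifiers. In this hypothetical sketch, `overlap_groups` and `proportions` are illustrative names, and the inputs are assumed to be anonymised linkage keys already shared by both datasets; how those keys are created is not modelled.

```python
# Hypothetical sketch of the overlap classification after linkage. Inputs are
# assumed to be linkage keys already shared by IDEA and the HMDS.


def overlap_groups(idea_ids, hmds_id_ids):
    """Count people with ID in both sources, in IDEA only and in HMDS only."""
    idea, hmds = set(idea_ids), set(hmds_id_ids)
    return {
        "both": len(idea & hmds),
        "idea_only": len(idea - hmds),
        "hmds_only": len(hmds - idea),
    }


def proportions(groups):
    """Express each group as a percentage (1 decimal) of everyone with ID."""
    total = sum(groups.values())
    return {name: round(100 * n / total, 1) for name, n in groups.items()}


# Toy identifiers (made up): persons 3 and 4 appear in both sources.
print(overlap_groups([1, 2, 3, 4], [3, 4, 5]))
# {'both': 2, 'idea_only': 2, 'hmds_only': 1}
```

Applied to the counts reported in the Results (1435 in both sources, 8305 in IDEA only and 478 in HMDS only), `proportions` returns 14.0%, 81.3% and 4.7%.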

Age at first admission (<1, 1–2, 3–5, 6–12 and >12 years), gender (male or female), race (non-Aboriginal or Aboriginal) and level of ID (mild, moderate or severe) of individuals with an ID in the IDEA dataset were compared between those who were and were not identified in the HMDS. The main cause of ID was determined by medical personnel at the Disability Services Commission from medical records and recorded in the IDEA database using the Heber codes.13 Cases with no information on cause of ID were classified as ‘unassessed’. The main cause was further grouped into broad categories based on biomedical or other causes14 in order to investigate whether the cause of ID differed between those identified and not identified with ID from the ICD codes in the HMDS dataset. Categorical variables were reported as proportions and compared using Pearson’s χ2 test for independence. Analyses were performed using Stata V.13.1.
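
For reference, Pearson’s χ2 test compares observed cell counts O with the counts E expected if the row and column variables were independent: χ2 = Σ (O − E)2 / E, where E = (row total × column total) / N, with (r − 1)(c − 1) degrees of freedom. The sketch below computes this in plain Python; the function names and the toy table are hypothetical and are not data from this study, which used Stata. Only the 1-degree-of-freedom p-value is computed, via the identity P(χ2 > x) = erfc(√(x/2)); for larger tables a library routine such as `scipy.stats.chi2_contingency(table, correction=False)` would normally be used.

```python
import math


def pearson_chi2(table):
    """Pearson's chi-squared statistic and degrees of freedom.

    `table` is a list of rows of observed counts (r x c). No continuity
    correction is applied, and all expected counts are assumed positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_1df(stat):
    """Upper-tail p-value for 1 degree of freedom: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(stat / 2))


# Toy 2 x 2 table (hypothetical counts, not study data):
# rows = coded / not coded with ID in HMDS, columns = female / male.
stat, dof = pearson_chi2([[10, 20], [20, 10]])
print(round(stat, 3), dof, round(p_value_1df(stat), 4))
```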

Results

A total of 1 548 478 records representing admissions for 488 905 individuals were identified. Among them, 10 218 (2.1%) were identified as having an ID and 478 687 (97.9%) as not having ID in either the HMDS or the IDEA database. The children known to IDEA who were hospitalised (n=9740) represented 92% of all children with an ID in the IDEA database (9740/10 593). Of those with ID, 1435 (14.0%) were identified in both sources, 8305 (81.3%) only in the IDEA database and 478 (4.7%) only in the HMDS dataset (figure 1). Of all children identified in the HMDS dataset through the ICD codes (N=1913), 75% (n=1435) had their ID confirmed through IDEA. Death before the age of 1 year had occurred in 160/10 218 (1.6%) of the individuals identified with ID in either source, with the majority (n=124, 78%) of these unique to HMDS. Limited to those who survived past 1 year of age, the sensitivity of using HMDS to identify ID was 14.6%, whereas the specificity was much higher at 99.9%. The positive and negative predictive values were 79.9% and 98.3%, respectively.
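
The accuracy measures above follow from a 2 × 2 table of the HMDS flag against IDEA as the reference standard. This minimal sketch assumes the usual definitions (true positive: in IDEA and flagged in HMDS; false positive: flagged in HMDS only; false negative: in IDEA only; true negative: in neither source); the function names are hypothetical.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of reference-standard (IDEA) cases that HMDS also flags."""
    return tp / (tp + fn)


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from a 2 x 2 table.

    tp: in IDEA and flagged in HMDS   fp: flagged in HMDS only
    fn: in IDEA only                  tn: in neither source
    """
    return {
        "sensitivity": sensitivity(tp, fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Sensitivity among IDEA children who survived past 1 year of age:
print(f"{sensitivity(1412, 9704 - 1412):.1%}")  # 14.6%
# Sensitivity using all hospitalised IDEA children:
print(f"{sensitivity(1435, 8305):.1%}")         # 14.7%
```

The survivor counts come from the text (9704 surviving hospitalised IDEA children, of whom 1412 were also coded in HMDS). The unrestricted calculation gives 14.7%, which appears to be the figure quoted in the abstract. Specificity and negative predictive value additionally need the number of surviving individuals in neither source, which is not reported here, so they are not recomputed.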

Figure 1

Identification of ID in children born in 1983–2010 and hospitalised in Western Australia using linkage to the IDEA database and the HMDS. HMDS, Hospital Morbidity Data System; ID, intellectual disability; IDEA, Intellectual Disability Exploring Answers.

We compared the characteristics of the 9704 individuals who were registered in the IDEA database (and thus known to have an ID), survived past 1 year of age and were admitted to hospital, according to whether they were identified with ID from the ICD codes in the HMDS (table 1).

Table 1

Characteristics of children born between 1983 and 2010 in Western Australia who survived past 1 year of age, were identified with ID through the IDEA database and were admitted to hospital, according to their ID diagnosis status in the HMDS database

Children with ID who were also coded with ID in the HMDS data were more likely to be less than 1 year of age at first admission compared with children with ID not coded in the HMDS data (79.2% vs 68.0%). They were also more likely to be female (44.6% vs 33.8%), be non-Aboriginal (92.2% vs 85.7%) and have a severe level of ID (21.6% vs 6.2%).

Children in the IDEA database with a biomedical cause of their ID were more likely to have also been coded with ID in the HMDS dataset (table 2).

Table 2

Cause of ID as determined in the IDEA database for children who survived to 1 year of age and were either identified/not identified with ID through HMDS codes

The causes in IDEA most likely to have also been identified with ID in any of the HMDS ICD codes were Down syndrome (94.2%), tuberous sclerosis (90.6%), Prader-Willi syndrome (87.0%), neurofibromatosis (70.6%), muscular dystrophy (57.1%) and fragile X (51.6%). Those least likely to have been identified with ID were those with an unassessed cause (2.7%), autism (3.0%), Asperger’s (3.9%), foetal alcohol syndrome (8.0%) and other associated conditions such as intrauterine growth restriction (2.9%) and prematurity (5.6%) (table 2). Additionally, 30% of children who had been identified with any epilepsy diagnosis in the IDEA database, regardless of their main cause of ID, were identified with ID in the hospital dataset (not shown in table 2). Of the children who were identified through both IDEA and HMDS and survived past 1 year of age (n=1412), 623 had an ICD code for ‘mental retardation’. For the remaining 789, agreement on diagnosis between IDEA and the ICD codes for particular disorders was 80%–98% for Down syndrome, trisomy 18/13, trisomy 9/8, chromosomal deletions, fragile X syndrome, tuberous sclerosis and Prader-Willi syndrome, and lower for neurofibromatosis (63%) and Marfan syndrome (12.5%).

Children identified with ID in the HMDS dataset who were not in the IDEA database and had survived past 1 year of age were investigated according to the ICD codes used to identify ID in HMDS (table 3).

Table 3

Children born between 1983 and 2010 in Western Australia who were identified with ID through ICD codes in the HMDS database but not in the IDEA database, by death status and ID diagnosis in HMDS

The majority of those not in IDEA who survived past 1 year of age had been assigned an ICD code aligned to mental retardation (n=138, 39.0%), neurofibromatosis (n=79, 22.3%) or Down syndrome (n=45, 12.7%) (table 3). Among the 124 (25.9%) individuals who had died before 1 year of age, 75% had died before 1 month, and the most common diagnoses were trisomy 18/13 (n=80, 64.5%), Down syndrome (n=25, 20.2%) and trisomy 9/8 (n=10, 8.1%). If all additional cases identified through ICD codes but not in the IDEA database (n=478) are assumed to have ID, the completeness of ascertainment in IDEA would be 95.7%. If, in addition, those who died before 1 year of age (n=124, most of whom died before 1 month) are assumed to have been impossible to ascertain, the completeness of IDEA would be 96.8%.
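
The two completeness figures can be reproduced as simple proportions. This sketch assumes, as the figures imply, that the denominator is the full IDEA count of 10 593 reported in the Results plus the additional HMDS-only cases; `completeness` is a hypothetical helper name.

```python
def completeness(n_register: int, n_missed: int) -> float:
    """Proportion of all identified cases that the register captured."""
    return n_register / (n_register + n_missed)


IDEA_TOTAL = 10593   # all children with ID in the IDEA database
HMDS_ONLY = 478      # identified only through HMDS ICD codes
EARLY_DEATHS = 124   # HMDS-only cases who died before 1 year of age

# Scenario 1: every HMDS-only case is a true case missed by IDEA.
print(f"{completeness(IDEA_TOTAL, HMDS_ONLY):.1%}")                 # 95.7%
# Scenario 2: early deaths could not have been ascertained, so drop them.
print(f"{completeness(IDEA_TOTAL, HMDS_ONLY - EARLY_DEATHS):.1%}")  # 96.8%
```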

Discussion

Data from Western Australia suggest that hospital morbidity data may be an inadequate source for identifying ID in epidemiological studies, with a sensitivity of less than 15%. After excluding children who died before 1 year of age, ID of syndromic or monogenic aetiology, such as that associated with Down syndrome, neurofibromatosis and fragile X syndrome, was the most likely to be identified in hospital sources as well, and ID of unknown cause the least likely. Females and children first admitted before 1 year of age were also more likely to be identified, whereas Aboriginal children and those with a mild–moderate level of ID were less likely to be identified.

The greatest strength of this study was the availability of a population-based source of ID, the IDEA database, which has used both disability service and education sources to maintain high ascertainment over the last 30 years.15 It has already been used as a data source for multiple data linkage studies investigating determinants16–18 and outcomes3 19 associated with ID. One limitation is the lack of information on cause of ID for cases ascertained only through education sources, as medical information is obtained through the referral process to disability services. Another limitation is that for several conditions only a proportion of children have an ID, in contrast to conditions such as Down syndrome, where almost all children are affected. Nevertheless, we elected to use the ICD codes for these diagnoses to identify ID in the HMDS in order to capture the maximum possible number of children with ID. By assigning ID status to all children with these conditions in hospital morbidity records, we may have overestimated the number with ID. For example, ID is diagnosed in approximately half of individuals with tuberous sclerosis,20 and while almost all of those with Prader-Willi syndrome will have cognitive deficits, up to 40% may fall within the borderline range.21 About one-third of children with neurofibromatosis have been reported to have general learning difficulties associated with borderline or lower IQ,22 and children with Marfan syndrome may have only a slightly increased risk of ID.23 Children diagnosed with autism spectrum disorder have been found to have an ID in approximately 30%–60% of cases, although this proportion has been shown to be decreasing in more recent years.17 24 25 Because removing these conditions from our HMDS search list could only have reduced the number of IDEA-confirmed cases flagged in HMDS, it would have slightly lowered the sensitivity of using HMDS to identify ID, while possibly increasing its positive predictive value.

Children with a cause of ID commonly known to be associated with ID, such as Down syndrome or Prader-Willi syndrome, were most likely to be identified with ID in the hospital data, possibly because these conditions had been specifically designated in the ICD search codes for ID, unlike children for whom no clear cause had been recorded in the IDEA database. The inability of ICD codes to specifically identify relatively rare conditions is also problematic if relying on such codes to identify ID. For example, Williams syndrome, known to be highly associated with ID,26 is identified with a Q89.8 ICD-10 code which is not itself specific for Williams syndrome and was not used in our search strategy as it would also identify children possibly without ID, such as those with Stickler syndrome. Perhaps as a consequence, children with Williams syndrome were poorly identified as having ID in the hospital codes, with only 16% being coded as such. Recent versions of ICD-10-AM provide a finer delineation of genetic syndromes and thus allow better differentiation of syndromes with ID from those without. The integration of Orphanet coding into ICD-11 will allow many more genetic ID syndromes to be specifically identified in hospital morbidity data.27 This has become a matter of urgency given the accelerated identification of these genetic causes over the last decade, particularly since the introduction of next generation sequencing.28–30

Many children who would be expected to develop ID by virtue of their diagnosis experience serious and life-threatening comorbidities and as a consequence may die early. As we have shown, about one-third of those not identified in the IDEA database had died, nearly three-quarters before 1 month of age and the majority by 1 year. In these cases, it would be unlikely that families would have sought registration for disability services before their child died, and hence they would not have been included within the IDEA database. The remaining cases identified with ID through the hospital ICD codes but who were not in IDEA represent potential missed ascertainment within IDEA; however, this number is relatively small, effectively reducing the completeness of IDEA to 96% if these cases had met eligibility for inclusion in IDEA. There is the possibility that some of these, most likely those with neurofibromatosis, tuberous sclerosis, Marfan syndrome or Prader-Willi syndrome, may have a milder cognitive deficit and not meet the criteria for ID.

We found one Canadian study that had formed its cohort from individuals with an ID code in at least one hospital morbidity record, but found that as many as half of the multiple records for these individuals did not code ID as a comorbidity.8 It was therefore likely that other individuals with ID had been missed from their cohort because of inconsistent coding of ID as a comorbidity. The authors acknowledged that, similar to our own findings, those who had been identified with ID were likely to be more severely affected. Linked data studies in New South Wales, Australia have provided further evidence of the need for multiple sources of ascertainment of ID,31 using ICD codes for ID within health datasets together with disability services, birth and mortality linkages to identify individuals with ID.

Practical considerations for clinical care suggest that hospital coding which does not record ID as a comorbidity may affect the way in which services are delivered to this particularly vulnerable population. Better coding practices for ID would enable researchers to investigate directly whether care or procedures are compromised for individuals with ID, and would facilitate the development of ID-related policies and service planning. The hospital experiences of people with ID, who are known to have higher rates of hospitalisation than the rest of the population,3 have been described as relying heavily on carers for in-hospital patient assistance, with failures to provide appropriate care and a lack of knowledge and discharge planning by medical staff.32

The reliance on hospital morbidity data, as well as other administrative datasets, to identify ID in a population for research purposes has been shown to provide varied results.9 Overall, we would not recommend that researchers use hospital morbidity datasets alone as a source of identification of ID.

Conclusion

Through linkage to a hospital morbidity dataset, this study has shown that hospital data do not adequately identify individuals with ID when compared with the population-based IDEA database. A high proportion of those uniquely identified in hospital morbidity data had either died early or had a condition not necessarily associated with ID. It is important for hospital codes to reflect the ID status of patients, primarily so that their specific needs are recognised, but also to improve the ascertainment of ID through this source. Clearly, with such a high proportion of individuals with ID not being recognised as such, coding practices that identify ID need to be better implemented.

Acknowledgements

The authors acknowledge the Disability Services Commission, the Telethon Kids Institute, the Western Australia Department of Education, the Catholic Education Office and the Association of Independent Schools of Western Australia for assistance with data collection for the IDEA database.

References

  1.
  2.
  3.
  4.
  5.
  6.
  7.
  8.
  9.
  10.
  11.
  12.
  13.
  14.
  15.
  16.
  17.
  18.
  19.
  20.
  21.
  22.
  23.
  24.
  25.
  26.
  27.
  28.
  29.
  30.
  31.
  32.


  • Contributors All authors contributed to the initial design of the manuscript. JB and HL were responsible for the drafting of the manuscript. KW was responsible for analysis and contributed to the writing of the final draft. All authors contributed to the final writing of the manuscript and checked for important intellectual content.

  • Funding This study was partly funded by the Australian National Health and Medical Research Council (NHMRC) programme grant no 572742.

  • Competing interests None declared.

  • Patient consent Not required.

  • Ethics approval The study was reviewed and approved by the government of Western Australia Department of Health, Human Research Ethics Committee (project no 2011/64).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Data are only available through ethical approval from the Western Australian Department of Health, Human Research Ethics Committee in collaboration with the authors.