Predicting risk of hospitalisation: a retrospective population-based analysis in a paediatric population in Emilia-Romagna, Italy
  1. Daniel Z Louis1,
  2. Clara A Callahan1,
  3. Mary Robeson1,
  4. Mengdan Liu1,
  5. Jacquelyn McRae2,
  6. Joseph S Gonnella1,
  7. Marco Lombardi3,
  8. Vittorio Maio1,2
  1. 1 Center for Medical Research in Medical Education and Health Care, Sidney Kimmel Medical College at Thomas Jefferson University, Philadelphia, Pennsylvania, USA
  2. 2 Jefferson College of Population Health, Thomas Jefferson University, Philadelphia, Pennsylvania, USA
  3. 3 Risk Management and Clinical Governance, Parma Local Health Authority, Parma, Italy
  1. Correspondence to Dr Vittorio Maio; vittorio.maio{at}


Objectives Develop predictive models for a paediatric population that provide information for paediatricians and health authorities to identify children at risk of hospitalisation for conditions that may be impacted through improved patient care.

Design Retrospective healthcare utilisation analysis with multivariable logistic regression models.

Data Demographic information linked with utilisation of health services in the years 2006–2014 was used to predict risk of hospitalisation or death in 2015 using a longitudinal administrative database of 527 458 children aged 1–13 years residing in the Regione Emilia-Romagna (RER), Italy, in 2014.

Outcome measures Models designed to predict risk of hospitalisation or death in 2015 for potentially avoidable problems were developed and evaluated using the C-statistic, calibration across levels of predicted risk, and sensitivity, specificity and positive predictive value.

Results Of the 527 458 children residing in RER in 2014, 6391 children (1.21%) were hospitalised for selected conditions or died in 2015. A total of 49 486 children (9.4% of the population) were classified in the ‘At Higher Risk’ group using a threshold of predicted risk >2.5%. The observed risk of hospitalisation (5%) for the ‘At Higher Risk’ group was more than four times that of the overall population. We observed a C-statistic of 0.78 indicating good model performance. The model was well calibrated across categories of predicted risk.

Conclusions It is feasible to develop a population-based model using a longitudinal administrative database that identifies the risk of hospitalisation for a paediatric population. The results of this model, along with profiles of children identified as high risk, are being provided to the paediatricians and other healthcare professionals providing care to this population to aid in planning for care management and interventions that may reduce their patients’ likelihood of a preventable, high-cost hospitalisation.

  • predictive risk modeling
  • paediatric population
  • population-based study

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:

Strengths and limitations of this study

  • This study included the entire paediatric population of the Emilia-Romagna Region of Italy, with a total of 527 458 children ages 1–13 years.

  • The study used an existing longitudinal administrative healthcare database with both the advantage of much lower cost than new data collection and the disadvantage of gaps and potential errors in administrative data.

  • The results of the study are being used to assist paediatricians and health authorities in managing high-risk children.


Healthcare systems have been moving from a passive approach of waiting for and reacting to patients’ problems to a more active model that includes identification of patients at risk, taking the initiative in offering care and actively seeking to avoid recurrence or progression of medical problems. With the ageing of populations worldwide, and high prevalence of chronic diseases, it is not surprising that these efforts have often focused on the elderly. Less attention has been paid to the paediatric population. However, despite the relatively low prevalence of chronic disease in children, there is evidence that children experience preventable hospitalisations.1 For example, a study of paediatric inpatient claims in the USA estimated that paediatric ‘ambulatory care sensitive’ conditions accounted for US$4.05 billion in hospital charges and over 1 million hospitalisation days in a 1-year period.2

Predictive risk modelling is a tool that can be used to estimate the risk of an outcome within the context of prespecified variables and uncertainty. Predictive risk modelling may offer an opportunity to better understand individuals who may be at higher risk for an undesirable outcome.3 A number of predictive risk modelling studies have been conducted in paediatrics; however, many of these studies have focused on children with specific medical problems or use data that are not routinely available in administrative databases.4–9

Under the auspices of the Italian National Health Service (NHS), the 21 regional governments are responsible for delivering healthcare through a network of geographically defined Local Health Authorities. Primary care physicians, including paediatricians, work for the Local Health Authorities as independent contractors. Every Italian is expected to enrol with a primary care physician (a paediatrician for those under age 14 years) who serves as the ‘gatekeeper’ for delivering primary care and coordinating specialty services for enrolled patients.10 This focus on primary care is ideal for the development and implementation of a proactive model of healthcare.

To further encourage coordinated care, the Regione Emilia-Romagna (RER) has established Patient-Centered Medical Homes. The identification of patients who would most benefit from outreach efforts is fundamental to achieving the goals of promoting population health and practising proactive medicine. The RER has therefore developed and implemented a population-based model to predict risk of hospitalisation or death for adult residents in the region.11 The results of the model are presented to physicians in Patient-Centered Medical Homes as patient profiles to support care management and the identification of patients who may benefit from additional outreach such as home healthcare, disease management or case management.

Current risk models used in RER focus on the adult population. This paper describes the development of predictive risk models for the paediatric population using the RER’s regional longitudinal administrative healthcare database to help identify children who are at risk of hospitalisation for conditions that may be affected through improved patient care.


Data source

The RER is a region of northern Italy that lies between the River Po and Apennine Mountains with approximately 4.5 million inhabitants. RER maintains a longitudinal healthcare database for all its residents. The RER database contains patient-level demographic data (age, gender, birth and death dates, location of residence and primary care physician/paediatrician) and utilisation data for inpatient (hospital discharge abstract data with International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis and procedure codes and admission/discharge dates), outpatient (laboratory, diagnoses and physician services, and pharmacy claims including WHO Anatomical Therapeutic Chemical (ATC)/Defined Daily Dose (DDD) system codes),12 specialty (therapeutic procedures, rehabilitation and specialist visits) and emergency room (ER) visits. Inpatient medications are not captured. Patients with disabilities or low family income are eligible for exemption of service copayment for specialty visits and outpatient prescriptions, which provide some socioeconomic information. Each resident is assigned an anonymous identifier so that utilisation can be tracked over time while maintaining patient privacy.13

Study cohort

In Italy, children aged 14 years are required to switch from a paediatrician to a primary care physician; therefore, we limited the study population to children 1–13 years old on 31 December 2014. The study population also was narrowed to exclude children who did not reside in RER for the entire year 2014. The study population was stratified into three age groups: 1–2 years old (on 31 December 2014); 3–5 years old; and 6–13 years old. Children less than 1 year old on 31 December 2014 were not included in the study population due to insufficient data for prediction of outcomes.
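The cohort rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the function and argument names (`age_stratum`, `resident_full_2014`) are hypothetical and do not come from the RER database, and age is taken as completed years on 31 December 2014.

```python
from datetime import date
from typing import Optional

INDEX_DATE = date(2014, 12, 31)  # reference date stated in the text


def age_on(birth_date: date, on: date = INDEX_DATE) -> int:
    """Completed years of age on the reference date."""
    had_birthday = (on.month, on.day) >= (birth_date.month, birth_date.day)
    return on.year - birth_date.year - (0 if had_birthday else 1)


def age_stratum(birth_date: date, resident_full_2014: bool) -> Optional[str]:
    """Return the age stratum label, or None if the child is excluded."""
    if not resident_full_2014:
        return None  # did not reside in RER for all of 2014
    age = age_on(birth_date)
    if 1 <= age <= 2:
        return "1-2"
    if 3 <= age <= 5:
        return "3-5"
    if 6 <= age <= 13:
        return "6-13"
    return None  # under 1 year, or 14 years and older
```

A child born on 15 January 2014 is under 1 year old on the index date and is therefore excluded, while a child born on 1 January 2001 is 13 and falls in the oldest stratum.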

Dependent variable

The outcome was defined as the occurrence of hospitalisation that could have potentially been prevented or delayed with appropriate patient care or death by any cause.11 We developed a list of hospitalisations that are potentially preventable with appropriate patient care using a three-step process. First, we conducted a literature search to evaluate paediatric studies that defined potentially avoidable disease in paediatrics that could require hospitalisation.14–16 We began with the listing of ICD-9-CM codes for ‘pediatric ambulatory care sensitive conditions’ identified in Shi and Lu.15 All hospitalisations in 2013 of children in the target age groups were classified using both ICD-9-CM codes and Disease Staging categories.17 18 The results were reviewed by the authors of this paper and compared with Shi and Lu’s list. A number of changes were made for this project. For example, the list of immunisation preventable conditions to be included in the dependent variable was expanded to include currently available vaccines. We included additional conditions, such as acute cystitis (ICD-9-CM code of 595.0) and hypoglycaemic coma (ICD-9-CM code of 251.0). Advanced stages of selected medical problems were added where stage one may not be avoidable but advanced stages can potentially be delayed or prevented through timely intervention, for example, stage 2 or 3 appendicitis and stage 2 or 3 sinusitis. While certainly not always preventable, we believed that inclusion of hospitalisations for certain types of trauma and toxicities (eg, acetaminophen toxicity, adverse drug reactions and burns) was appropriate especially for a paediatric population. These changes are summarised in online supplementary appendix 1.

Finally, we used disease staging categories for inclusion of relevant hospitalisations that would have been missed using solely primary ICD-9-CM codes. For example, if a child was hospitalised with a primary diagnosis of respiratory failure with asthma (ICD-9-CM code of 493) as the secondary diagnosis, then the disease staging category of asthma would include that admission that might have been missed by including only primary ICD-9-CM codes. This is summarised in online supplementary appendix 2.

Children hospitalised for these selected conditions or who died from any cause in 2015 were counted as being positive for the outcome.

Independent variables

A list of predictor variables was developed using the RER administrative data from 2006 to 2014. Independent variables included information such as: demographics, socioeconomic factors, diseases/conditions grouped by aetiology or body systems, mother’s medical history and pregnancy/birthing information, ER visits, potentially inappropriate prescriptions and antibiotic usage.

Demographic variables included age on 31 December 2014, gender and citizenship (Italian or non-Italian). Children from low-income families or with disabilities are exempt from copayments for prescriptions and specialty visits. This information was used as a potential predictor variable.

Using 2014 hospital discharge data, outpatient prescription information and specialty visit claims, we mapped diseases to groups defined primarily by the affected body system, with the exceptions of cancer, genetic conditions and trauma, which were based on aetiology.11 A total of 24 groups were defined. Disease staging diagnostic categories were used to map hospital admissions to the 24 body system/aetiology groups17 (see first column of table 1). Patients with cardiovascular diseases, chronic respiratory diseases, diabetes mellitus, epilepsy and disorders of the thyroid were identified using the ATC Classification System codes from outpatient prescriptions.19 Specialty visit records were also used for identifying medical conditions of some body systems. For example, if a child was admitted to the hospital for type 1 diabetes mellitus, or visited an endocrinologist, or had filled a prescription for insulin injection(s) (ATC code of A10AB), this patient would be identified as having an endocrine diagnosis in 2014.
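The endocrine example can be expressed as a simple "any of three sources" rule. The sketch below is illustrative only: the lookup sets and their labels are hypothetical stand-ins for the study's mapping tables, which are not reproduced in the text. Only the ATC prefix A10AB is taken from the paragraph above.

```python
from typing import Iterable

# Illustrative lookups; labels are assumptions, not the study's actual tables.
ENDOCRINE_STAGING = {"diabetes mellitus type 1"}  # hypothetical category label
ENDOCRINE_SPECIALTIES = {"endocrinology"}         # hypothetical specialty label
ENDOCRINE_ATC_PREFIXES = ("A10AB",)               # insulin injections, from the text


def has_endocrine_diagnosis(
    staging_categories: Iterable[str],
    specialty_visits: Iterable[str],
    atc_codes: Iterable[str],
) -> bool:
    """True if any 2014 data source points to an endocrine condition."""
    if any(cat in ENDOCRINE_STAGING for cat in staging_categories):
        return True  # hospital admission mapped to an endocrine category
    if any(spec in ENDOCRINE_SPECIALTIES for spec in specialty_visits):
        return True  # specialist visit
    # Outpatient prescription whose ATC code starts with a listed prefix
    return any(code.startswith(ENDOCRINE_ATC_PREFIXES) for code in atc_codes)
```

Matching on ATC prefixes rather than full codes lets one entry cover every product within a pharmacological subgroup.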

Table 1

Study population 2014

Severity level codes (critical (C), acute (A), urgent but deferred (U) and not urgent (N)) are assigned to individuals on discharge from the emergency department. We excluded ER visits that resulted in a hospital admission because diagnosis information was captured by hospital discharge data with more accurate information. We believe more frequent or severe ER visits may indicate a poor outcome; therefore, the number of ER visits by severity level was calculated for each patient.
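As a rough sketch of this predictor, the function below counts one child's ER visits by severity level and skips visits that ended in admission. The input layout (a list of `(severity, admitted)` pairs) is an assumption made for illustration and is not the RER record format.

```python
from typing import Dict, Iterable, Tuple

SEVERITY_LEVELS = ("C", "A", "U", "N")  # critical, acute, urgent but deferred, not urgent


def er_visit_counts(visits: Iterable[Tuple[str, bool]]) -> Dict[str, int]:
    """Count ER visits per severity level, excluding visits that led to admission."""
    counts = {level: 0 for level in SEVERITY_LEVELS}
    for severity, admitted in visits:
        if admitted:
            continue  # already captured by the hospital discharge data
        if severity in counts:
            counts[severity] += 1
    return counts
```

For example, two non-urgent visits, one urgent visit and one critical visit that led to admission would yield counts of 2 for N, 1 for U and 0 for C and A.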

There is evidence that the risks outweigh the benefits for certain medication usage in the paediatric population.20 For example, certain mood-altering medications such as citalopram, sertraline, fluvoxamine and any tricyclic antidepressants are not recommended in children of any age. Some medications can be harmful within specified ages. For example, loperamide is not indicated for children under 3 years old. For children who filled an outpatient prescription in 2014, we calculated their age at dispensation date and amount of medications they had filled, in order to identify patients with potentially inappropriate prescriptions in 2014. The number of antibiotic prescriptions used in 2014 was estimated since high utilisation of antibiotics has been linked to decreased gut microflora, decreased immune function and resistant strains of bacteria.21
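The age-dependent rule can be sketched as a lookup keyed on drug name. The drug lists below contain only the examples named in the paragraph above, not the study's full criteria, and the function name and the use of drug names rather than ATC codes are assumptions for illustration.

```python
# Examples named in the text; the study's complete list is not reproduced here.
NOT_RECOMMENDED_ANY_AGE = {"citalopram", "sertraline", "fluvoxamine"}
MINIMUM_AGE_YEARS = {"loperamide": 3}  # not indicated under 3 years old


def is_potentially_inappropriate(drug_name: str, age_at_dispensation: int) -> bool:
    """Flag a prescription using the child's completed years of age when it was filled."""
    name = drug_name.strip().lower()
    if name in NOT_RECOMMENDED_ANY_AGE:
        return True
    minimum_age = MINIMUM_AGE_YEARS.get(name)
    return minimum_age is not None and age_at_dispensation < minimum_age
```

Using age at dispensation rather than age at year end matters for children who cross a threshold during 2014: a loperamide prescription filled at age 2 is flagged, while one filled after the third birthday is not.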

For children ages 1–5 years, the models considered problems identified at birth as potential predictors using hospital discharge abstract data. About 86% of the newborns were healthy, with no serious medical problems noted on their birth records. Infants with diagnostic categories of premature birth with low birth weight, full-term infants with abnormal birth weight, and premature infants with very low or extremely low birth weight were classified as abnormal birth weight; all other conditions were considered as a group. The mothers’ delivery information, such as age at delivery, C-section and parity, was identified based on the mothers’ hospitalisation records and linked to children. Information about deliveries that occurred outside hospitals could not be captured.

Children ages 1–5 years old were also linked with information regarding their mothers’ medical history and drug use during pregnancy. There is evidence on the association between prenatal (up to 270 days before delivery) exposure to antibiotics and the development of asthma.22 We estimated the total exposure to any antibiotics during the prenatal period using the mother’s outpatient prescription claims. We included two categories of mother’s potentially inappropriate drug use, class D (potential risks outweigh the benefits) and X (contraindicated during pregnancy), since these drugs may be linked to harm to children. Mothers’ 3-year medical history before delivery was retrieved for identifying certain conditions such as abortion, diabetes and psychological conditions. For about 22% of children, we were not able to establish the mother–baby linkage.
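The 270-day prenatal window can be illustrated with a date filter over the mother's antibiotic dispensing dates. This is a sketch under assumptions: exposure is measured here as a count of dispensings (the text does not say whether counts or doses were used), and the window is taken to include the start day and exclude the delivery date itself.

```python
from datetime import date, timedelta
from typing import Iterable

PRENATAL_WINDOW_DAYS = 270  # window length stated in the text


def prenatal_antibiotic_count(delivery_date: date, dispensing_dates: Iterable[date]) -> int:
    """Count antibiotic dispensings in the 270 days before delivery.

    Boundary handling (start inclusive, delivery date exclusive) is an assumption.
    """
    window_start = delivery_date - timedelta(days=PRENATAL_WINDOW_DAYS)
    return sum(1 for d in dispensing_dates if window_start <= d < delivery_date)
```

For a delivery on 1 October 2013 the window opens on 4 January 2013, so a prescription filled on 3 January 2013 would not count toward prenatal exposure.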

We developed history variables with up to 5 years of data (pharmacy, specialty, hospital admission and ER visit) for children in age strata 3–5 years old and 6–13 years old. Children who had conditions in any year from 2009 to 2013 were flagged as having a utilisation history.


Logistic regression was used to estimate predicted probabilities for the occurrence of an inpatient hospital stay for the selected conditions, or death from any cause, for the individual patients. Since age and gender may be strongly correlated with children’s risk, we fit a total of six multivariable logistic regression models: female and male by age groups (1–2, 3–5 and 6–13 years old). All models were developed using SAS V.9.3 statistical software.
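For readers less familiar with logistic regression, the sketch below shows how a fitted model turns a child's predictor values into a predicted probability. It is not the study's SAS code, and the predictor name and all coefficient values in the usage note are hypothetical placeholders, not fitted estimates; the study's coefficients are in its supplementary appendix.

```python
import math
from typing import Mapping


def predicted_risk(
    intercept: float,
    coefficients: Mapping[str, float],
    features: Mapping[str, float],
) -> float:
    """Logistic-model probability: 1 / (1 + exp(-(b0 + sum(b_k * x_k))))."""
    linear = intercept + sum(
        beta * features.get(name, 0.0) for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))
```

In the stratified design described above, each of the six sex-by-age strata would have its own intercept and coefficient set, and a child is scored only with the model for his or her stratum. As a placeholder illustration, `predicted_risk(-4.4, {"er_visits_urgent": 0.5}, {"er_visits_urgent": 2})` combines an intercept with one weighted predictor; an intercept of about -4.4 on its own corresponds to a baseline risk near 1.2%.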

Model validation

The predictive accuracy of the models was evaluated using C-statistics (the area under the receiver operating characteristics curve), comparing the results of the ‘predicted’ to the ‘observed’ outcomes in 2015. We stratified patients into risk strata based on the predicted risk of hospitalisation or death. ‘At higher risk’ was defined as children with a predicted risk greater than 2.5%. ‘Higher than average’ was defined as children with a predicted risk of hospitalisation or death between the mean rate and 2.5%. The rest of the population was grouped into ‘Lower than average’. These risk strata were defined to yield a manageable number of patients to review for the typical paediatric panel of approximately 800 patients. Calibration of the model across these risk groups was assessed by comparing observed to predicted rates among the risk groups. We also report the sensitivity, specificity and positive predictive value (PPV) for the defined risk group cut-offs.
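Two of the validation steps described here can be sketched compactly: assigning a risk stratum from a predicted probability, and computing the C-statistic as the share of event/non-event pairs in which the event case received the higher predicted risk. This is an illustration, not the study's code. The handling of values exactly at a cut-off is an assumption, and the pairwise loop is quadratic, so it suits small examples rather than a population of half a million.

```python
from typing import Sequence


def risk_group(predicted: float, mean_rate: float, high_threshold: float = 0.025) -> str:
    """Assign one of the three risk strata defined in the text."""
    if predicted > high_threshold:
        return "At higher risk"
    if predicted > mean_rate:
        return "Higher than average"
    return "Lower than average"


def c_statistic(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Concordance: P(score of an event case > score of a non-event case), ties count 0.5."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in non_events
    )
    return concordant / (len(events) * len(non_events))
```

A C-statistic of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation of children with and without the outcome.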


Characterisation of risk groups

A total of 568 117 children ages 1–13 years resided in RER in 2014. We excluded from our analysis 40 659 children (7.2%) who did not reside in RER for the entire year, resulting in a population of 527 458 children. Of those, 6391 children (1.21%) were hospitalised for selected conditions or died in 2015. Table 1 displays the distribution of gender, age category, presence of selected chronic conditions, ER visits, selected prescription drug usage, copay exemption for income or disability and specialty visits for the eligible RER residents as of 31 December 2014.

Table 1 also compares the characteristics of the total selected paediatric population to the subgroups of the population classified by risk categories based on the model results. A total of 49 486 children (9.4% of the population) were classified in the ‘At higher risk’ group using a threshold of predicted risk >2.5%. The children predicted to be ‘At higher risk’ were more likely to be male (58.9%) compared with 51.5% in the total population. The two youngest age strata (1–2 and 3–5 years) had much higher proportions of children identified in the ‘At higher risk’ group than the children aged 6–13 years. For example, 18 112 (23%) of the children age 1–2 years were identified in the ‘At higher risk’ group. This age category includes 36% of the ‘At higher risk’ children, although it represents 15% of the total paediatric population. Children in the ‘At higher risk’ category were more likely to have each of the selected conditions. When looking at the highest prevalence conditions, 43.8% of children in the ‘At higher risk’ category had an ear, nose or throat problem, compared with 6.1% in the overall population, 5.5% had a gastrointestinal problem compared with 1.4% in the overall population, 4.3% had a neurological problem compared with 0.7% in the overall population, 14.7% had a respiratory problem compared with 3.9% in the overall population and 11.7% had a skin problem compared with 7.5% in the overall population.

Children identified as being ‘At higher risk’ were much more likely to have a history of ER visits and were more likely to have a history of 2, 3 or more antibiotic prescriptions. Overall, 14.6% of children had three or more antibiotic prescriptions, while in the ‘At higher risk’ category, 51.7% had a history of 3 or more antibiotic prescriptions. Children with exemptions from copayments due to either family income or disability were more likely to be identified as being ‘At higher risk’, as were children with a history of medical or surgical specialty visits.

Table 2 displays information about the delivery (for the children age 1–5 years) and medical history of the mother for those children where we were able to match to their mother’s record. First children, children who were delivered by caesarean section and children where an abnormal birth weight or other problems were noted at birth were more likely to be classified in the ‘At higher risk’ category. If the mother was prescribed a potentially inappropriate drug or an antibiotic during pregnancy, the child was more likely to be classified in the ‘At higher risk’ category. When examining a 3-year medical history of the mother, the mother’s asthma, cardiovascular disease, diabetes mellitus or mental health problems, or the record of a previous abortion, were all relatively frequent and more prevalent in the mothers of children predicted to be in the ‘At higher risk’ category.

Table 2

Birthing and medical history of mother*


The population was divided into three risk groups based on predicted probability of hospitalisation as defined above. We observed good calibration; each stratum’s predicted risks were similar to observed prevalence of hospitalisations or deaths (figure 1). Individuals in the ‘At higher risk’ group, with predicted risk greater than 2.5%, had 2683 predicted events based on the model results, and 2737 observed events. While the overall rate of hospitalisation or death for children ages 1–13 years was 1.21%, the predicted and observed risk of the ‘At higher risk’ group was over 5%.

Figure 1

Model calibration: predicted and observed prevalence of hospitalisation or death in 2015 by risk category.
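The calibration comparison for the ‘At higher risk’ group can be reproduced from the counts reported in the text (49 486 children, 2683 predicted events, 2737 observed events). The helper below is a sketch with a hypothetical name; the observed-to-predicted ratio is one common way to summarise calibration within a group and is not claimed to be the exact measure the authors used.

```python
from typing import Dict


def calibration_summary(group_size: int, predicted_events: float, observed_events: int) -> Dict[str, float]:
    """Compare predicted and observed event rates within one risk group."""
    return {
        "predicted_rate": predicted_events / group_size,
        "observed_rate": observed_events / group_size,
        "observed_to_predicted": observed_events / predicted_events,
    }


# Counts for the 'At higher risk' group as reported in the text
higher_risk = calibration_summary(49_486, 2_683, 2_737)
```

With these inputs both rates come out a little above 5% and the observed-to-predicted ratio is close to 1, which is consistent with the good calibration described above.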

Model performance among risk groups

We observed a C-statistic of 0.78 indicating good model performance (table 3). The sensitivity (proportion predicted to be at higher risk of those who had an event in 2015) was 0.43 and 0.70 for predicted risk categories of ‘At higher risk’ and ‘Higher than average’, respectively (table 4). In other words, among those who were hospitalised or died in 2015, 43% were predicted to have a risk of hospitalisation or death greater than 2.5% and 70% were predicted to have a risk higher than average. The specificity (proportion predicted to be at a ‘lower’ risk of those who did not have an event) was 0.91 and 0.72 for the predicted ‘At higher’ and ‘Higher than average’ risk categories, respectively; among those who were not hospitalised and did not die in 2015, 91% were not predicted to be ‘At higher risk’. The PPV (proportion with an event of those who were predicted to be at an elevated risk) was 0.06 and 0.03 for the ‘At higher’ and ‘Higher than average’ predicted risk categories, respectively. In other words, of those individuals who were estimated to have a >2.5% risk of hospitalisation or death, approximately 6% had an event in 2015. (Regression coefficients and significance levels of independent variables for the multivariable logistic regression models for each of the six age and gender strata are included in online supplementary appendix 3.)

Table 3

Observed and predicted events by risk group

Table 4

C-statistic, sensitivity, specificity and PPV
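The sensitivity, specificity and PPV reported for the ‘At higher risk’ cut-off follow directly from four counts given in the text: 527 458 children in total, 6391 with an event, 49 486 flagged as ‘At higher risk’ and 2737 events among those flagged. The function below is a minimal sketch with hypothetical names that rebuilds the two-by-two table from those counts.

```python
from typing import Dict


def classification_metrics(total: int, events: int, flagged: int, events_in_flagged: int) -> Dict[str, float]:
    """Derive sensitivity, specificity and PPV from marginal counts."""
    true_pos = events_in_flagged
    false_pos = flagged - true_pos   # flagged but no event
    false_neg = events - true_pos    # event but not flagged
    true_neg = total - true_pos - false_pos - false_neg
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
        "ppv": true_pos / (true_pos + false_pos),
    }


metrics = classification_metrics(527_458, 6_391, 49_486, 2_737)
```

Rounded to two decimals, these reproduce the published values of 0.43, 0.91 and 0.06. The low PPV alongside a reasonable sensitivity is what one would expect when the outcome affects only about 1.2% of the population.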


We have developed a population-based model that identifies risk of hospitalisation for potentially preventable problems in a paediatric population including all children aged 1–13 years living in the RER of Italy. The C-statistic of 0.78 indicates that the model performs well. By comparison, in a study predicting high-cost paediatric patients, Leininger et al 9 reported a C-statistic of 0.73. In their work in predictive risk modelling in the UK, Billings et al reported a C-statistic of 0.685 in one study 23 and C-statistics ranging from 0.731 to 0.780 in another.24 However, neither of these papers focused on a paediatric population. In a project also conducted in the Emilia-Romagna region of Italy but focused on the adult population, Louis et al 11 reported a C-statistic of 0.856. Given the similar organisation of the healthcare system and the similar database used for the adult and paediatric analyses, we believe that the somewhat lower C-statistic in the paediatric study results from the fact that hospitalisation is less frequent in children.

We believe that the definition of the dependent variable used in our models increases the likelihood that they are identifying patients whose risk may be reduced through proactive care. We have updated previously published criteria to include hospitalisations that may have been prevented by currently available vaccines, and we have used the logic of disease staging to include relevant hospitalisations that would have been missed using solely primary ICD-9-CM codes. Specifics of the selection criteria are available in the supplemental material.

The richness of the administrative data available in the RER allowed for a robust definition of the predictive variables. The RER data allow for the linkage of patients’ use of diverse inpatient and outpatient healthcare services over multiple years. In addition, the ability to link child and mother’s information allows the models to consider some of the mother’s medical history such as the presence of chronic disease and use of prescription drugs in the years prior to birth as well as complications that may have arisen at birth.

There are limitations to our models. The models were developed with administrative data that lack some of the clinical specificity that would be useful in assessing patient risk. Children who have not had the types of encounters included in the RER database would have potentially missing information. The RER database does not have encounter level diagnostic data available documenting visits with the primary care paediatrician. The administrative data have very limited information available about the patient and family socioeconomic status. Our models use prior utilisation among the predictor variables. With the administrative data, we cannot distinguish appropriate from inappropriate prior utilisation, which may bias our results. Despite their limitations, administrative data have many advantages for a project such as ours. They are relatively inexpensive to analyse and, in the case of the RER, include a large population over multiple years.

While the evidence was mixed, a systematic review suggests that hospitalisations can be prevented in children with medical complexity.1 The Local Health Authority of Parma has begun working with the primary care paediatricians caring for the patients identified by the models to develop individual ‘profiles’ of children identified as being at higher risk. Data in the profiles, along with the more detailed information available in the medical record, can be used by the paediatricians to assess what additional intervention, if any, may help to manage the child’s risk. For example, review of the profiles of higher risk children can help identify children whose parents might be contacted for a visit if they have not been seen recently. Summaries of filled prescriptions in the profiles can be reviewed for potential overuse, underuse or inappropriate use of medication. High-risk children with chronic illness might be referred to a specialist or provided with home healthcare.

The RER healthcare system offers several advantages in the goal of reducing potentially preventable hospitalisation. Every child is enrolled with a primary care paediatrician. The population is quite stable allowing for continuity of care. Through the Italian NHS, every child is entitled to healthcare with little or no cost at the point of service. While the primary care paediatricians are paid on a per-capita basis, the RER can negotiate incentive payments and monitor improvements in care that may help to reduce avoidable hospitalisations. If successful, the results of the models can be applied by other Local Health Authorities in the RER, other Italian regions and other countries with similar data availability.


We would like to thank Elena Saccenti, JD, Massimo Fabi, MD and Ettore Brianti, MD, from the Local Health Authority, Parma, Italy, for their assistance; and Claudio Cantadori, MD, Sergio Capobianco, MD, Ada Donadio, MD, Maura Morelli, MD and Michael Andreas Ramon Werth, MD, for their contribution in reviewing early results of our models.




  • Contributors DZL, VM, MaL and JSG were responsible for the conceptualisation of this project. MR and MeL were responsible for creation of the datasets used in this project. DZL, CAC, VM, MR and JSG were responsible for the definition of analytical variables. MR and MeL were responsible for modelling and statistical analysis. DZL managed the research team. CAC, VM, MaL, JM and JSG advised on the analyses and results. All authors contributed to the preparation of the manuscript.

  • Funding This research was funded by the Local Health Authority of Parma, Italy.

  • Competing interests DZL and JSG declare personal fees from Truven Health Analytics. CAC, MR, MeL, JM, MaL and VM declare no conflict of interests.

  • Patient consent Not required.

  • Ethics approval This study was conducted under the auspices of regulation of privacy of the Emilia-Romagna Region N.3 of 24 April 2006 (title: Processing of sensitive data) of act N.1 of 30 May 2014 still in force. In addition, this study was approved by the Institutional Review Board (IRB) of Thomas Jefferson University as an expedited retrospective database/record review. The IRB granted a waiver of informed consent.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data available. Data used for this research were retrieved from the Regional database of the Emilia-Romagna Region provided through a collaborative agreement between the Regional Health Care and Social Agency, Emilia-Romagna, Italy, the Health Care Authority, Emilia-Romagna, Italy, and Thomas Jefferson University.