

A population-level prediction tool for the incidence of first-episode psychosis: translational epidemiology based on cross-sectional data
  1. James B Kirkbride1,
  2. Daniel Jackson2,
  3. Jesus Perez3,
  4. David Fowler4,
  5. Francis Winton5,
  6. Jeremy W Coid6,
  7. Robin M Murray7,
  8. Peter B Jones1,8
  1. Department of Psychiatry, University of Cambridge, Herchel Smith Building for Brain & Mind Sciences, Cambridge, UK
  2. MRC Biostatistics Unit, Institute of Public Health, University of Cambridge, Forvie Site, Robinson Way, Cambridge, UK
  3. CAMEO, Cambridgeshire & Peterborough NHS Foundation Trust, Cambridge, UK
  4. Norfolk and Suffolk Partnership Trust, Hellesdon Hospital, Norwich, UK
  5. Suffolk Early Intervention Psychosis Service, Norfolk and Suffolk Partnership Trust, Stowmarket, Suffolk, UK
  6. Forensic Psychiatry Research Unit, Queen Mary's University London, St. Bartholomew's Hospital, London, UK
  7. Department of Psychosis Studies, Institute of Psychiatry, London, UK
  8. NIHR Collaboration for Leadership in Applied Health Research & Care, Cambridge, UK

Correspondence to Dr James Kirkbride; jbk25{at}


Objectives Specialist early intervention services (EIS) for people aged 14–35 years with first episodes of psychosis (FEP) have been commissioned throughout England since 2001. A single estimate of population need was used everywhere, but true incidence varies enormously according to sociodemographic factors. We sought to develop a realistically complex, population-based prediction tool for FEP, based on precise estimates of epidemiological risk.

Design and participants Data from 1037 participants in two cross-sectional population-based FEP studies were fitted to several negative binomial regression models to estimate risk coefficients across combinations of different sociodemographic and socioenvironmental factors. We applied these coefficients to the population at-risk of a third, socioeconomically different region to predict expected caseload over 2.5 years, where the observed rates of ICD-10 F10-39 FEP had been concurrently ascertained via EIS.

Setting Empirical population-based epidemiological data from London, Nottingham and Bristol predicted counts in the population at-risk in the East Anglia region of England.

Main outcome measures Observed counts were compared with predicted counts (with 95% prediction intervals (PI)) at EIS and local authority district (LAD) levels in East Anglia to establish the predictive validity of each model.

Results A model with age, sex, ethnicity and population density performed most strongly, predicting 508 FEP participants in EIS in East Anglia (95% PI 459, 559), compared with 522 observed participants. This model predicted correctly in 5/6 EIS and 19/21 LADs. All models performed better than the current gold standard for EIS commissioning in England (716 cases; 95% PI 664–769).

Conclusions We have developed a prediction tool for the incidence of psychotic disorders in England and Wales, made freely available online, to provide healthcare commissioners with accurate forecasts of FEP based on robust epidemiology and anticipated local population need. The initial assessment of some people who do not require subsequent EIS care means additional service resources, not addressed here, will be required.


Article summary

Article focus

  • Commissioners require precise information on the health needs of their local populations to effectively plan health services.

  • A failure to arm mental health commissioners with precise epidemiological data led to misestimation of actual activity in early intervention in psychosis services (EIS).

  • We sought to develop a prediction tool for the incidence of first episode psychosis (FEP), by applying precise estimates of epidemiological risk in various sociodemographic groups to the structure of the population at-risk in a different region, where the observed incidence had been concurrently ascertained.

Key messages

  • A model of psychosis incidence which included age, sex, ethnicity and population density yielded precise FEP predictions in our new region, outperforming the Department of Health in England's current gold standard for EIS commissioning.

  • While our model provides forecasts of the burden of FEP in different populations, the initial assessment of some people who do not require subsequent EIS care means additional service resources, not addressed here, will be required.

  • We have translated this model into a freely available prediction tool to facilitate evidence-based healthcare commissioning of socioculturally relevant services according to local need.

Strengths and limitations of this study

  • Our modelling approach used robust epidemiological data from two large studies of first episode psychosis in England to provide estimates of incidence in a third study region, producing accurate FEP forecasts.

  • While our models provide estimates of the expected clinical burden of FEP in the community, services may see a broader range of psychopathology consuming resources, or incepted rates may be influenced by supply-side organisational factors.

  • Owing to data availability, it was not possible to validate our prediction tool in settings outside of England and Wales, or for specific psychotic disorders. As data become available, we will extend the capability of our prediction tool, including into other settings and disorders.


Commissioners of health and social care require precise information on the health needs of their local populations,1 especially if parity of mental and physical health is to be realised.2 Mental health disorders alone represent the leading disease burden in the UK (22.8%).3 They contribute substantially to healthcare expenditure and societal costs even before physical ill health is taken into account. The Centre for Mental Health estimated the total costs of mental health to British health services and society at £105 billion in 2009/2010,4 a figure expected to double over the next 20 years.2 These are serious challenges compounded by a paucity of information on which to commission appropriate services. Early intervention in psychosis services (EIS) for people aged 14–35 years with a first episode of psychosis (FEP) offer a useful example of failure to map services to local need.

EIS are a major evidence-based innovation, systematically commissioned throughout England and Wales over the past decade.5 When EIS intervention is sustained, there is evidence that people with psychosis achieve better functional and social outcomes.6 ,7 Such services are also highly cost-effective.4 ,8 ,9 However, EIS were originally commissioned on an anticipated rate of 150 new cases of any psychotic disorder per 1 000 000 of the total population per year in the Department of Health's Mental Health Policy Implementation Guide (MH-PIG).5 In 2001 in England and Wales, 29.3% of the population were aged 14–35 years, meaning that the MH-PIG commissioned incidence rate was approximately 51 cases/100 000 person-years in the age range covered by EIS. Following their deployment, anecdotal reports began to emerge from EIS in different regions to suggest that a uniform figure for commissioning was simultaneously underestimating10 and overestimating11 the actual observed need in urban and rural populations, respectively. Recent epidemiological evidence of FEP incidence in rural communities in England has suggested that rates are somewhat lower than the uniform figure upon which services were commissioned,12 ,13 confirming previous calls that a ‘one-size-fits-all’ prescription for EIS implementation is unlikely to lead to the efficient allocation of finite mental health resources.14 ,15
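The conversion from the whole-population commissioning figure to a rate for the EIS age range is simple arithmetic. A minimal sketch follows, using the figures quoted above; the variable names are illustrative and the calculation assumes that all commissioned cases arise in the 14–35 year age range.

```python
# Convert the MH-PIG whole-population rate into a rate for the EIS age range.
# Figures are taken from the text; variable names are illustrative only.
cases_per_million_total = 150   # new cases per 1 000 000 total population per year
share_aged_14_35 = 0.293        # proportion of the 2001 population aged 14-35 years

# Assumption: every commissioned case falls within the 14-35 year age range.
cases_per_100k_total = cases_per_million_total / 10
rate_per_100k_aged_14_35 = cases_per_100k_total / share_aged_14_35

print(round(rate_per_100k_aged_14_35, 1))  # approximately 51 per 100 000 person-years
```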

Using rich epidemiological data on variation in the incidence of FEP according to major sociodemographic risk factors,16–19 we describe the development and validation of a population-level prediction tool capable of accurately estimating the expected incidence of psychiatric disorder, based on the sociodemographic structure of the population in a given region. Applied to FEP as proof-of-concept, we show that it is possible to closely predict the expected incidence in a given population, where the observed count of cases was within the prediction intervals (PI) forecast by our models. We applied our most precise prediction model to the population of England and Wales to provide health commissioners with a translational epidemiological prediction tool to underpin information-based service planning.


Our prediction models were based on epidemiological data from the Aetiology and Ethnicity in Schizophrenia and Other Psychoses (ÆSOP) and the East London First Episode Psychoses (ELFEP) studies,18 ,20 two methodologically similar population-based FEP studies. We fitted various count-based regression models with different combinations of sociodemographic and socioenvironmental factors, well established in the literature to be associated with the incidence of psychotic disorder.21 ,22 We first established the relative apparent validity of each model by estimating model-fit diagnostics to assess how well each model fitted the empirical data (henceforth, the prediction sample). We next sought to estimate the external validity of each model by applying model-based parameter coefficients to the population structure of a purposefully different region of England, East Anglia (henceforth, the validation sample). This out-of-sample prediction technique allowed us to obtain the expected incidence of disorder in this region forecast by each model, which we compared with observed rates simultaneously ascertained in this region via the ongoing Social Epidemiology of Psychoses in East Anglia (SEPEA) study.13 We performed various model-fit diagnostics to identify which, if any, model demonstrated utilisable predictive capability.

Empirical data underlying prediction models (prediction sample)

Case ascertainment (numerator)

The designs of the ÆSOP and ELFEP studies have been described in detail elsewhere,18 ,20 with features relevant to the present paper summarised here. Case ascertainment took place over 2 years in ELFEP (Newham: 1996–1998; Tower Hamlets & Hackney: 1998–2000) and the Southeast London and Nottingham centres of the ÆSOP study (1997–1999), and over the first 9 months of 1997 in Bristol (ÆSOP). All service bases were screened regularly for potential new contacts aged 16–64 years (18–64 in ELFEP) resident within these catchment areas. Leakage studies were conducted to identify participants missed by this initial screen, but meeting inclusion criteria for FEP.18 ,20 All participants who received an ICD-10 F10–39 diagnosis for psychotic disorder following assessment via the Schedules for Clinical Assessment in Neuropsychiatry were included in the incident sample, except those with an organic medical basis to their disorder or profound learning difficulty. Data on age-at-contact, sex and ethnicity were collected on included participants. We geocoded participants’ residential postcode at first contact to their corresponding local authority district (LAD) to allow us to model possible neighbourhood effects associated with the incidence of psychotic disorder, such as population density or socioeconomic deprivation.

Population at-risk

We estimated the population at-risk using the 2001 Census of Great Britain, adjusted for study duration, and stratified by age group (16–17, 18–19, then 5-year age bands), sex and ethnicity. Ethnicity was based on self-ascription according to 1 of 10 categories derived from the census: white British, non-British white, black Caribbean, black African, Indian, Pakistani, Bangladeshi, mixed white and black Caribbean, other mixed ethnic backgrounds and all other ethnicities.

Socioenvironmental variable estimation

We estimated LAD-level deprivation using the 2004 Index of Multiple Deprivation (IMD) in England, which estimated domains of deprivation using measures predominantly collected close to the time of our case ascertainment periods (see table 1).23 We z-standardised English LAD IMD scores to have a mean of zero and SD of 1, and extracted IMD z-scores for the 14 LADs in the ÆSOP and ELFEP studies. To inspect whether any particular deprivation domain was a better predictor of psychosis incidence than IMD, we also considered LAD-level income deprivation, employment deprivation and the extent of deprivation in our models (table 1). We estimated population density by dividing each LAD’s usual resident population by its area (in hectares), using ArcGIS V.9.3 software.
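The two derived variables described here can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run: the IMD scores and district sizes are hypothetical, and the text does not say whether a population or sample SD was used for standardisation (the population SD is assumed).

```python
from statistics import mean, pstdev

def z_standardise(scores):
    """Rescale scores to mean 0 and SD 1 across all supplied districts."""
    mu = mean(scores)
    sd = pstdev(scores)  # population SD: an assumption, not stated in the source
    return [(s - mu) / sd for s in scores]

def population_density(residents, area_hectares):
    """Usual residents per hectare."""
    return residents / area_hectares

# Hypothetical IMD scores and district size, for illustration only.
imd_scores = [12.0, 18.5, 25.0, 31.5, 38.0]
imd_z = z_standardise(imd_scores)
density = population_density(residents=200_000, area_hectares=4_000)
```

Standardising across all English LADs, and only then extracting the study districts, keeps the z-scores interpretable relative to the national distribution rather than to the study sample.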

Table 1

Description of included socioenvironmental variables*†

Observed data for external validation of prediction models (validation sample)

Observed participants and population at-risk data for our validation sample were obtained from the SEPEA study, an ongoing study of the incidence of psychotic disorders incepted over 3.5 years (2009–2013) through one of six EIS covering 20 LADs and a subsection of 1 LAD (the town of Royston, Hertfordshire) in Norfolk (three EIS: West Norfolk, Central Norfolk, and Great Yarmouth and Waveney), Suffolk (one EIS) and Cambridgeshire, Royston and Peterborough (CAMEO North and South EIS).13

Case ascertainment

To establish the incepted incidence of FEP as seen through EIS, entry criteria for the SEPEA study were:

  • Referral to an EIS in East Anglia for a suspected first episode of psychosis;

  • Aged 16–35 years at first referral to EIS (17–35 years in CAMEO services);

  • Resident within the catchment area at first referral;

  • First referral during case ascertainment period (2009–2013).

At 6 months after EIS acceptance, or discharge from the service, whichever was sooner, we asked the clinician responsible for care to provide an ICD-10 F10–39 psychiatric diagnosis using all information available. We excluded participants without a clinical FEP diagnosis, or participants presenting with an organic basis to their disorder or profound learning disability. For the remaining participants, basic sociodemographic and postcode information was recorded and classified in the same way as in the prediction sample. We included participants presenting to EIS during the first 2.5 years of the ongoing SEPEA study.

Population at-risk

We estimated the population at-risk of East Anglia using 2009 mid-year census estimates published by the Office for National Statistics (ONS) at the LAD level, by age group, sex and ethnicity.24 These estimates used the 2001 census base, adjusted for immigration, births and deaths each year. It was not possible to obtain 2009 mid-year estimates for the town of Royston, because data were only published at LAD level. Here, we used denominator data from the 2001 census data in order to estimate the population at-risk in Royston. We do not believe that this would have substantially invalidated our results as this town represented 0.6% of the overall population at-risk (n=9555) in the SEPEA study. Denominator data were multiplied by 2.5 to account for person-years of exposure in the validation sample.

Socioenvironmental variable estimation

For each LAD in the SEPEA study, we obtained corresponding socioenvironmental variables to those included in our prediction sample, using updated data collected as close to the SEPEA case ascertainment period as possible. Population density was estimated using 2009 mid-term population estimates. Our measures of deprivation were derived from IMD 2010,25 which was estimated in an analogous way to 2004 data, but collected from sources obtained immediately prior to the SEPEA study.

Statistical techniques

Dataset generation

We constructed a dataset for the regression analysis of count data by pooling data from the ÆSOP and ELFEP studies (the prediction sample). Data were stratified by age group, sex, ethnicity and LAD, such that each stratum (N=2536) represented the total count of FEP cases in a unique sociodemographic group for a given LAD, with a corresponding estimate of the population at-risk, treated as an offset in our models. Our socioenvironmental measures (population density, deprivation) were adjoined to the dataset for each LAD. Population at-risk data from the validation sample were stratified in the same way and retained in a separate database. Here, the count of cases, which we wished to predict, was entered as a vector of missing data which would be populated with predicted case estimates following prediction modelling.
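A stratified count dataset of this kind could be assembled roughly as below. The records, person-years and density values are hypothetical, and the field names are illustrative; the point is that every stratum with a population at-risk is retained, including those with zero cases.

```python
from collections import Counter

# Hypothetical case records: (age_group, sex, ethnicity, lad).
cases = [
    ("20-24", "M", "white British", "LAD_A"),
    ("20-24", "M", "white British", "LAD_A"),
    ("25-29", "F", "black African", "LAD_B"),
]

# Hypothetical person-years at risk per stratum (census count x study duration).
person_years = {
    ("20-24", "M", "white British", "LAD_A"): 5200.0,
    ("25-29", "F", "black African", "LAD_B"): 310.0,
    ("25-29", "F", "white British", "LAD_B"): 4100.0,
}

# LAD-level covariate joined onto every stratum in that district.
lad_density = {"LAD_A": 12.5, "LAD_B": 88.0}

case_counts = Counter(cases)
dataset = [
    {
        "stratum": stratum,
        "cases": case_counts.get(stratum, 0),  # strata without cases keep a zero count
        "person_years": py,                    # would enter a model as a log offset
        "density": lad_density[stratum[3]],
    }
    for stratum, py in person_years.items()
]
```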

Prediction models

We used the prediction sample data to fit negative binomial regression models to obtain parameter coefficients of incidence for the sociodemographic and socioenvironmental factors included in each model. We considered the internal and external predictive capabilities of six models, all of which contained age group, sex, an age–sex interaction term and ethnicity. Model 1 contained no further covariates. Model 2 also included IMD. We replaced IMD with either income, employment or the extent of deprivation, respectively, in models 3–5. Model 6 included population density. Initial exploration of the prediction sample data indicated the presence of possible overdispersion (the variance of the case counts (σ²=1.37) exceeded their mean (μ=0.4)), so negative binomial regression was preferred to Poisson regression since it explicitly models any overdispersion with an extra dispersion parameter.
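A crude check for overdispersion compares the variance of the stratum counts with their mean. The sketch below uses hypothetical counts and assumes the common NB2 variance function, var = μ + αμ², to back out a moment-based dispersion value; this is a marginal diagnostic only, not the dispersion parameter that a fitted regression would estimate.

```python
from statistics import mean, pvariance

def overdispersion_summary(counts):
    """Return mean, variance and a moment-based NB2 dispersion estimate."""
    mu = mean(counts)
    var = pvariance(counts)
    # NB2 variance function: var = mu + alpha * mu**2, so alpha = (var - mu) / mu**2.
    alpha = max(0.0, (var - mu) / mu ** 2)
    return mu, var, alpha

# Hypothetical stratum counts: mostly zeros with a few larger values.
counts = [0, 0, 0, 0, 0, 0, 1, 0, 0, 4]
mu, var, alpha = overdispersion_summary(counts)
```

Under a Poisson model the variance would equal the mean, so an alpha well above zero points towards a negative binomial specification.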

Apparent model validity and prediction

We assessed apparent model validity in three ways. First, we used Akaike's Information Criterion (AIC) to assess the respective overall fit of each model to the data. Second, we conducted K-fold cross-validation to assess each model's apparent validity to predict cases within the prediction sample. This method randomly allocated strata in the prediction sample into K subsets. Each model was then re-estimated on K-1 subsets (the training data) to predict the expected counts of cases in the Kth subset (the test data). This was repeated over K trials, such that each stratum in the dataset appeared exactly once as the test data. At the end of this process, we derived Lin's concordance correlation coefficient (CCC) and 95% CI to estimate the correlation between the predicted and observed counts of cases across all strata in the prediction sample. Finally, we estimated the root mean squared error (RMSE) to determine the average error between fitted and observed values from each model. Lower RMSE scores indicated a smaller prediction error. The RMSE is derived as $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, where $y_i$ and $\hat{y}_i$ are the observed and predicted counts of cases in the ith stratum, respectively, and n is the number of strata.
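Both agreement measures are straightforward to compute from paired observed and predicted counts. The sketch below uses the standard definitions of RMSE and Lin's CCC with 1/n moments; the example values are hypothetical, and the CI for the CCC is not reproduced.

```python
from math import sqrt
from statistics import mean

def rmse(observed, predicted):
    """Root mean squared error across strata."""
    n = len(observed)
    return sqrt(sum((y - yhat) ** 2 for y, yhat in zip(observed, predicted)) / n)

def lins_ccc(observed, predicted):
    """Lin's CCC: 2*cov / (var_x + var_y + (mean_x - mean_y)**2), using 1/n moments."""
    mx, my = mean(observed), mean(predicted)
    n = len(observed)
    var_x = sum((x - mx) ** 2 for x in observed) / n
    var_y = sum((y - my) ** 2 for y in predicted) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(observed, predicted)) / n
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)

# Hypothetical observed and predicted counts for five strata.
observed = [0, 1, 0, 3, 2]
predicted = [0.2, 0.8, 0.1, 2.5, 1.9]
```

Unlike Pearson's correlation, the CCC penalises systematic over- or under-prediction through the squared difference in means, which is why it suits a comparison of predicted with observed counts.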

We repeated K-fold cross-validation h times, generating K new random divisions of the data each time. We retained model-fit diagnostics across Kh iterations, and reported the mean of Lin's CCC and RMSE to provide summary cross-validation statistics for each model. We specified K=10 and h=20, as recommended for cross-validation to obtain precise model-fit diagnostics.26
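The repeated partitioning can be sketched as a generator of training and test indices. This shows only the splitting logic, with K=10, h=20 and N=2536 strata as in the text; the function name, the seed and the way folds are cut from the shuffled index are assumptions, and model refitting is left out.

```python
import random

def repeated_kfold(n_strata, k=10, h=20, seed=0):
    """Yield (train_idx, test_idx) pairs for h repeats of k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(h):
        idx = list(range(n_strata))
        rng.shuffle(idx)                       # new random division on every repeat
        folds = [idx[i::k] for i in range(k)]  # k near-equal subsets
        for test in folds:
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield train, test

splits = list(repeated_kfold(n_strata=2536, k=10, h=20))
```

Within each repeat, every stratum appears in exactly one test fold, so the Kh = 200 splits give 20 complete out-of-fold predictions per stratum over which the diagnostics can be averaged.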

External model prediction and validation

We retained parameter coefficients from each model (using the full prediction sample data) and applied these to the corresponding population at-risk in the validation sample dataset. This gave out-of-sample prediction estimates for the expected count of cases in each stratum of the validation sample, given the model. We summed expected counts across relevant strata to estimate the (1) total predicted count of cases in the SEPEA region, (2) predicted counts in each EIS and (3) predicted counts by LAD. These counts were further stratified by broad age group (16–35, 36–64 and 16–64 years). Because the census (denominator) data were unavailable for 35-year-olds alone (needed to estimate their contribution to predicted counts in the age range for EIS, 16–35 years), we assumed that the risk coefficient was the same across all ages within the 35–39-year-old age group. We apportioned predicted counts on a 1:4 ratio (35:36–39 years) to their respective broad age groups.
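For a count model with a log link and a person-years offset, the expected count in a stratum is the person-years at risk multiplied by the exponentiated linear predictor. The sketch below illustrates that step and the 1:4 apportioning of the 35–39-year band; the coefficient values and covariate names are hypothetical and are not the estimates reported in table 2.

```python
from math import exp

def expected_count(person_years, coefs, covariates):
    """Expected cases = person-years x exp(linear predictor), the usual log-link form."""
    linear_predictor = coefs["intercept"] + sum(
        coefs[name] * value for name, value in covariates.items()
    )
    return person_years * exp(linear_predictor)

def split_35_to_39(predicted):
    """Apportion a 35-39-year prediction 1:4 between age 35 and ages 36-39."""
    return predicted * 1 / 5, predicted * 4 / 5

# Hypothetical coefficients on the log-rate scale, for illustration only.
coefs = {"intercept": -8.0, "male": 0.4, "density": 0.01}
pred = expected_count(10_000, coefs, {"male": 1, "density": 20.0})
to_eis_range, to_older_range = split_35_to_39(pred)
```

The 1:4 split follows from assuming a constant risk across the five single years of age in the band, so one-fifth of the predicted count is assigned to 35-year-olds.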

To determine how well the MH-PIG5 figure of 51 new cases per 100 000 person-years for EIS performed as a predictive tool, we also estimated the predicted count of cases in the validation sample under this scenario, which we termed ‘Model 7’.

We derived 95% PIs for all summary predictions from first principles, since their derivation is not straightforward, nor routinely implemented by statistical software. PIs are similar to CIs, but account for SEs introduced in both the prediction and validation samples. We developed a bootstrap-like approach to obtain PIs from each model by simulating 1000 model-based realisations of the quantities we wished to predict, where we took the parameters to be the maximum likelihood estimates. We obtained the lower and upper bounds of the PIs as the corresponding quantiles of the simulated realisations (see appendix for full details).
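The simulation idea can be illustrated with a simplified sketch; the appendix remains the authoritative description. Here each stratum count is drawn from a negative binomial distribution, generated as a gamma-Poisson mixture with the parameters held at fixed values, the draws are summed, and the 2.5th and 97.5th percentiles of 1000 simulated totals are taken. The stratum means, dispersion value, seed and function names are all hypothetical.

```python
import random
from math import exp

def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold, k, product = exp(-lam), 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def negbin_draw(rng, mu, alpha):
    """Negative binomial draw as a gamma-Poisson mixture (mean mu, var mu + alpha*mu**2)."""
    lam = rng.gammavariate(1.0 / alpha, alpha * mu)
    return poisson_draw(rng, lam)

def prediction_interval(stratum_means, alpha, n_sims=1000, seed=1):
    """Empirical 95% interval for the total count over all strata."""
    rng = random.Random(seed)
    totals = sorted(
        sum(negbin_draw(rng, mu, alpha) for mu in stratum_means)
        for _ in range(n_sims)
    )
    lower = totals[int(0.025 * n_sims)]
    upper = totals[int(0.975 * n_sims) - 1]
    return lower, upper

# Hypothetical expected counts for 200 strata and a hypothetical dispersion value.
stratum_means = [0.5] * 200
lower, upper = prediction_interval(stratum_means, alpha=0.5)
```

Because the interval describes a future count rather than a model parameter, it is wider than a CI for the expected total would be.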

To assess each model's external predictive capabilities, we considered five markers of predictive accuracy. We compared the number of times the observed count of cases in the SEPEA study fell within the PIs estimated from each model for (1) the SEPEA region, (2) at the EIS level and (3) at the LAD level. We also derived EIS-level (4) and LAD-level (5) RMSE scores to estimate prediction error from each model in our validation sample. We ranked model performance (1: best and 7: worst) on these five measures, and estimated an overall mean rank to determine the overall predictive validity of each model.
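The ranking step can be sketched as follows. The diagnostic values are hypothetical and cover three models and three measures only; the handling of ties (tied models share the better rank) is an assumption, since the text does not describe it.

```python
def rank(values, higher_is_better):
    """Rank 1 = best; tied values share the best available rank (an assumption)."""
    ordered = sorted(values, reverse=higher_is_better)
    return [ordered.index(v) + 1 for v in values]

def mean_ranks(measures):
    """measures: list of (values_per_model, higher_is_better) tuples."""
    per_measure = [rank(vals, hib) for vals, hib in measures]
    n_models = len(per_measure[0])
    return [sum(r[m] for r in per_measure) / len(per_measure) for m in range(n_models)]

# Hypothetical diagnostics for three models, for illustration only.
measures = [
    ([4, 5, 3], True),            # EIS-level PIs containing the observed count
    ([17, 19, 15], True),         # LAD-level PIs containing the observed count
    ([12.0, 11.6, 14.2], False),  # EIS-level RMSE (lower is better)
]
overall = mean_ranks(measures)
```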

Observational data on first episode psychosis in our validation sample were not available for the age range 36–64 years, so external validation was restricted to the 16–35 year old age range. For completeness, however, we also reported the overall predicted count of cases for this age group from each model.

Extrapolation to the UK

Guided by our validation procedures, we identified which model had the greatest overall predictive validity, and proposed this as a candidate for FEP incidence prediction in England and Wales. We repeated out-of-sample prediction on the sociodemographic and socioenvironmental population characteristics of each LAD in England and Wales to obtain national-level and LAD-level predictions. Denominator data were obtained from the ONS 2009 mid-term estimates and stratified as previously described. Overall counts were derived for three broad age groups (16–35, 36–64 and 16–64 years), and for each of these, by sex and ethnicity. The 95% PIs were estimated as before. We visualised these data on maps and in tables to provide healthcare planners and commissioners with an easy-to-use tool to forecast the expected incidence of psychotic disorder in England and Wales. We have made this available as a free, open-use prediction tool, known as PsyMaptic (V.0.5) (Psychiatric Mapping Translating Innovations into Care). Counts of cases predicted by our model were compared with those obtained under the Department of Health's uniform rate in each LAD. We expressed these comparisons as ratios with 95% CIs derived using the same method as for standardised morbidity ratios (SMR). This approach was conservative because here we substituted the usual numerator in an SMR, the observed count, O, with a predicted count. Unlike an observed count, no sampling variation is present for the predicted count, only uncertainty due to the model from which the prediction was estimated. Since variance in the prediction is therefore much smaller than the variance normally present for the numerator (O), this led to conservative estimates of 95% CI. Ratios in LADs where the 95% CI did not span unity could therefore be interpreted as regions where there was strong evidence that the predictions from our model differed significantly from those predicted by the Department of Health's uniform rate.
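The ratio comparison can be illustrated with one common large-sample SMR interval, computed on the log scale. The text does not say which SMR interval was used, so the formula below is an assumption, and the counts are hypothetical.

```python
from math import exp, sqrt

def ratio_with_ci(predicted, uniform_expected, z=1.96):
    """Ratio of model-predicted to uniform-rate counts with a log-scale 95% CI.

    The predicted count takes the place of the observed count O in an SMR.
    """
    ratio = predicted / uniform_expected
    se_log = 1.0 / sqrt(predicted)  # Poisson-style SE, as if the count were observed
    return ratio, ratio * exp(-z * se_log), ratio * exp(z * se_log)

# Hypothetical LAD: 30 cases predicted by the model, 51 under the uniform rate.
ratio, lower, upper = ratio_with_ci(predicted=30.0, uniform_expected=51.0)
differs_from_uniform = not (lower <= 1.0 <= upper)
```

Treating the predicted count as if it carried full Poisson sampling variation widens the interval, which is the sense in which the comparison is conservative.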


All negative binomial regression models, out-of-sample prediction and estimation of 95% PI were conducted in R (V.2.15.1). Cross-validation and model-fit diagnostics were conducted in Stata (V.11). Prediction maps for England and Wales were created using StatPlanet Plus (V.3.0) visualisation software.27


Prediction sample

Our prediction models contained data on 1037 persons with a first episode psychosis in the ÆSOP (n=553; 53.3%) and ELFEP (n=484; 46.7%) studies, ascertained from over 2.4 m person-years at-risk. Twelve participants were excluded from the original ÆSOP sample because they were of no fixed abode and could not be geocoded to an LAD.18

The population at-risk in the prediction sample came from LADs with higher median levels of multiple and employment deprivation, extent of deprivation and population density than the population at-risk in the validation sample, though there were no statistically significant differences in median income deprivation between the two samples (see online supplementary table S1).

Parameter coefficients obtained from the full prediction sample following negative binomial regression are shown in table 2. As previously reported from these data,20 ,28 incidence rates were generally raised in ethnic minority groups compared with the white British population. Models 2–6 included a measure of LAD deprivation (models 2–5) or population density (model 6), which were all significantly associated with an increased incidence of psychotic disorder, after control for individual-level confounders. Each of these models produced a lower AIC score than a model fitted solely with individual-level covariates (model 1), indicating a better fit. Cross-validation suggested that all models achieved good CCC agreement between predicted and observed cases, with low RMSE values (table 2).

Table 2

Prediction models, covariates and fit: all clinically relevant psychoses (F10–39)

Validation sample

Observed participants

We identified 572 potential participants over the first 30 months of the SEPEA study, aged 16–35 years, who met initial acceptance criteria for EIS in East Anglia. We excluded 50 participants (8.7%) who did not meet clinical criteria for an ICD-10 psychotic disorder. This left an incidence sample of 522 participants from nearly 1.4 m person-years at-risk (37.4/100 000 person-years; 95% CI 34.3 to 40.7). A further 2.3 m person-years at-risk accrued in the same region for people aged 36–64 years over this period. Median levels of multiple, income and employment deprivation in the region did not differ significantly from the remainder of England, although the median population density and extent of deprivation in East Anglia were lower than elsewhere in England (see online supplementary table S1).
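A crude incidence rate and an approximate 95% CI can be computed as sketched below. The case count is taken from the text, but the exact person-years figure is not reported, so a round value standing in for "nearly 1.4 m" is assumed. The CI method is not stated either; a normal approximation on the log scale is used here, and it lands close to the reported interval.

```python
from math import exp, sqrt

def incidence_per_100k(cases, person_years, z=1.96):
    """Crude rate per 100 000 person-years with a log-scale normal-approximation CI."""
    rate = cases / person_years * 100_000
    se_log = 1.0 / sqrt(cases)
    return rate, rate * exp(-z * se_log), rate * exp(z * se_log)

# 522 cases are from the text; the person-years value is an assumed round figure.
rate, lower, upper = incidence_per_100k(cases=522, person_years=1_395_000)
```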

External model prediction and validation

The overall observed count of cases, aged 16–35 years, in the validation sample (n=522) fell within 95% PIs in four of seven models (models 3–6, table 3). Of these, the observed count (n=522) was closest to the point estimate for model 6 (508.5; 95% PI 449.0, 559.0), fitted with age group, sex, their interaction, ethnic group and LAD population density. The observed count of cases also fell within PIs from this model in five of six EIS in the study region, and 19 of 21 LADs, the most in any model (table 4). This model had the lowest error scores at the EIS (RMSE=11.6) and LAD (RMSE=6.1) levels of any model. Overall, model 6 was ranked highest across all external model-fit diagnostics (table 4). All models outperformed the Department of Health's uniform figure of 51 per 100 000 person-years (model 7), which generally overestimated cases in the validation sample (overall prediction: 715.7 cases; 95% PI 664.0, 769.0).

Table 3

Observed versus predicted cases in Social Epidemiology of Psychoses in East Anglia study for all clinically relevant psychoses, 16–35 years*

Table 4

External model validation diagnostics*

We reported predicted cases aged 36–64 years from our models (table 4), although we could not test these in the validation sample. Model 6 predicted an additional 262.9 cases aged 36–64 years over a 2.5-year period in East Anglia (95% PI 233.0, 297.0).

We inspected the stratum-specific external validity of our best-fitting model (model 6, see online supplementary table S2), which performed accurately for sex-specific predictions, but less well in age-specific and ethnicity-specific strata. Thus, our model tended to underpredict observed cases in people aged 16–19 years, but overpredicted cases observed in people over 25 years old. With respect to ethnicity, model predictions were consistent with observed FEP cases for people of non-British white, black African, Bangladeshi and mixed ethnicities. However, our model tended to underpredict observed rates in the white British group, and overpredicted rates in the black Caribbean, Indian and Pakistani populations.

Extrapolation to England and Wales

We predicted the expected count and incidence of first episode psychosis per annum in each LAD in England and Wales based on model 6, and visualised these data in maps and tables freely available online. Many maps can be visualised (eg, see online supplementary figure S1), including the overall predicted incidence counts and rates for each broad age group at the LAD level, and by sex. We will make PsyMaptic data available by ethnic group when we can improve the validity in ethnic-specific strata. According to our model, the annual number of new FEP cases in England and Wales would be 8745 (95% PI 8558, 8933), of which our model predicted 67.9% (n=5939; 95% PI 5785, 6102) would be seen through EIS. Only 176 (95% PI 151, 203) cases aged 16–64 years were forecast in Wales per annum. Assuming that our prediction model is accurate, it indicated that the Department of Health's current uniform rate of 51/100 000 person-years was higher than the predicted point estimates for rates forecast by our PsyMaptic model in 351 LADs (93%) in England and Wales, but was lower than that predicted by our model in Birmingham and several London boroughs (see online supplementary figure S2, left-hand map). Under a conservative approach, these differences achieved statistical significance in parts of London (where the Department of Health's model underestimated the need as predicted by PsyMaptic), and in some more rural parts of England and Wales (where the Department of Health's model overestimated the need; see online supplementary figure S2, right-hand map).


Principal findings

We have developed and tested several epidemiological prediction models to forecast FEP incidence in England and Wales, having taken into account regional differences in the sociodemographic and socioenvironmental profiles of different populations. Inspection of our data suggested that a model fitted with age group, sex, their interaction, ethnic group and LAD-level population density provided the greatest external predictive validity when compared with the observed FEP caseload ascertained through EIS in our validation sample. This model also had good apparent validity across the entire age range (16–64 years). All models outperformed the Department of Health's current gold standard for EIS commissioning,5 based on a uniform incidence rate. Our data suggested that the original figure used to commission EIS probably overestimated the true incidence of FEP in rural areas, and underestimated rates in urban settings. However, we acknowledge that commissioning decisions will need to be based on several additional factors, including the level of preclinical or non-psychotic psychopathology requiring assessment at initial referral to EIS, and variation in service organisation, remit and delivery.

Limitations and future development

Our prediction models were based on epidemiological data obtained from large, robust population-based FEP studies for people aged 16–64 years.18 ,19 The best-fitting model had good apparent validity over this age range, and good external validity over the age range 16–35 years. While 16–35 years covers the majority of adult onset psychosis cases seen in mental health services, we recognise that some EIS teams incept people from 14 years old. We were unable to extrapolate our models to this age range, given the current absence of incidence data for this group in England. Data from Scandinavia suggest that the incidence of such ‘early onset’ psychoses is absolutely low,29 although the rate may have increased over the last few decades, probably as a result of movement towards earlier detection. We were also unable to externally validate prediction models for people aged 36–64 years, because comparable observed incidence data were not available in our validation sample. We have no reason to believe our predictions will be invalid for this group, however, since the empirical data which underpinned our models were ascertained from the same two large, well-conducted studies as for data on the younger age group.18 ,19 ,28 Furthermore, published findings from these studies are consistent with the wider epidemiological literature on FEP in England and internationally.17 ,21 ,30 It will be important to validate the predictive capability of our model(s) in this age range, and we will seek to identify suitable samples to do so in future versions of PsyMaptic.

Our best-fitting overall model demonstrated excellent external validity for predicting sex-specific FEP cases in our validation region (ie, SEPEA). It performed less well across age-specific and ethnic-specific strata in this region. With respect to age, this discrepancy is most likely to be a function of EIS provision itself, which seeks to intervene as early as possible in the onset of psychosis. The effect of this will reduce median age at onset in comparison to studies conducted prior to the introduction of EIS, such as the ÆSOP and ELFEP studies upon which our models are based. Future versions of PsyMaptic will incorporate empirical data from post-EIS studies to improve age-specific predictions. The validity of our model in some ethnic groups also requires further refinement. Much of the prediction data underlying our models came from urban environments with large proportions of ethnic minority groups. The sociodemographic profile and sociocultural experiences of these groups may be very different to those of their counterparts in other, less urban, parts of England, thus altering psychosis risk in different ethnic groups. In our observed data, a larger proportion of cases were white British than predicted by our model. If ethnicity is a partial proxy for exposure to deleterious socioenvironmental experiences, such as the combined effect of social inequality, fragmentation, deprivation and population density,31 then simultaneously incorporating such factors into our models may improve their predictive validity by ethnicity. Alternatively, risk by ethnic group may be conditional upon (ie, interact with) environmental factors in urban areas (as with the ethnic density effect32 ,33), but whether such interactions exist in less urban regions is not known. Forthcoming SEPEA and PsyMaptic data will explore such possibilities.

All prediction models had reasonable apparent validity, although our proposed model performed slightly worse (most noticeably for AIC) than models which included deprivation (ie, models 2–4) instead of population density. Our decision to use model 6 as our proposed candidate for the prediction tool was supported by the fact that it produced the most accurate external forecasts of any model, despite considerable socioenvironmental differences between regions in our prediction and validation samples. We were unable to predict the expected incidence of psychotic disorder in geographical areas smaller than LADs, such as electoral wards, or to other parts of the UK, because appropriate denominator data were not published as mid-term census estimates. The 2011 census will provide small area and national data for the whole of the UK, scheduled for release in mid-2013. This will allow us to update our tool to the latest population estimates for the UK, and refine our PsyMaptic tool at a smaller geographical level for fine-grained healthcare commissioning. We will then be able to develop models to explore cross-level interactions, such as the association between individual ethnicity and neighbourhood-level ethnic density. Small area prediction models will require a multilevel approach, not attempted here, because obtaining predictions from multilevel random effects models is not straightforward and requires active statistical development.

We believe case ascertainment in our validation sample led to a reliable estimate of the incidence of psychotic disorder for people aged 16–35 years. EIS were the only mental health service for people aged 14–35 years experiencing a first episode of psychosis in East Anglia, minimising the potential for underascertainment in the population at-risk when derived from careful epidemiological design.13 We are confident that our validation sample also contained few false positive cases for any clinically relevant psychoses, since participants were excluded who failed to meet acceptance criteria for EIS, or who did not meet clinical diagnosis for psychotic disorder in the first 6 months following EIS acceptance. It is important to recognise that while our prediction models are based on diagnosed clinically relevant psychotic disorders, service commissioning will also need to account for additional preclinical or non-psychotic psychiatric morbidity presenting to EIS, particularly in services which operate early detection models or implement ‘watch-and-wait’ briefs. The SEPEA data used to validate our models do not predict (1) the number of ‘false positive’ subjects who may require psychiatric triage and assessment, even though they are not accepted by EIS or (2) the number of ‘true positive’ subjects accepted by services, but who did not meet epidemiological criteria for inclusion in the validation sample of the SEPEA study (ie, those living outside the catchment area at first contact, or those transferred from other services); these people will consume varying degrees of service resources which need to be considered in service planning.

We also note that pathways to care may affect the level of incidence observed in EIS, since many filters are likely to operate before subjects come to the attention of EIS. These will include local level service organisation and the relationship between Community Mental Health Teams, Child and Adolescent Mental Health and EIS. Furthermore, acceptance criteria for entry to EIS vary, which will have a downstream effect on the number of new cases of clinically relevant psychoses received in each team. Future versions of PsyMaptic will include forecasts for specific psychotic disorders, as standardised research-based diagnoses (using OPCRIT34) are currently being collected in the ongoing SEPEA study. Acceptance rates to EIS may also be influenced by local community awareness of such services. While our prediction models outperformed the current gold standard for EIS commissioning in England when restricted to clinically relevant caseloads, we recommend that our models are best interpreted as forecasts of the expected burden of first episode psychosis in given populations, not the total burden of resource consumption through EIS, given these issues.

We estimated PIs from first principles (DJ) since their derivation is an area of statistical development.35 We used a bootstrap-like methodology to produce 95% PI accounting for natural variation in the validation sample, but ignoring parameter uncertainty in the coefficients included in prediction models, which we assumed to be the true coefficients of risk in the population. Our approach therefore naturally led to slightly artificially narrow 95% PIs. This was not necessarily undesirable for the purpose of model validation and the precise prediction of expected counts, because we wished to apply stringent criteria. Ideally, 95% PIs should take into account both these sources of variation, although we note that parameter uncertainty is usually small compared with the natural variation of the quantities of interest. The addition of more empirical data in the prediction sample would not lead to narrower 95% PIs, though it would tend to move the point estimate of risk for each coefficient closer to the true value in the population. We do not believe we have misestimated the point estimates of risk across major sociodemographic groups, since our results accord with the wider literature.17 ,21 ,22 We sought independent confirmation that our development of 95% PI was correct (personal communication with Professor Ian White, MRC Biostatistics Unit). We recommend that all prediction point estimates from our PsyMaptic model are considered with their 95% PIs, which provide information about the natural variance in expected rates in the population.
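The following sketch illustrates one way a simulation-based 95% PI of this kind could be obtained. It is a parametric simulation under stated assumptions rather than the procedure used for PsyMaptic, whose details are not given here: the expected counts `mu`, the dispersion `alpha` and the function name `prediction_interval` are all hypothetical. As in the approach described above, the expected counts are treated as known, so parameter uncertainty is ignored and only natural variation in the counts is simulated.

```python
import numpy as np


def prediction_interval(mu, alpha, n_sims=10_000, level=0.95, seed=0):
    """Simulated prediction interval for a total count across strata.

    mu    : expected counts per stratum, treated as fixed (no parameter uncertainty)
    alpha : assumed negative binomial dispersion, variance = mu + alpha * mu**2
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)

    # Convert (mu, alpha) to NumPy's (n, p) parameterisation, whose mean is n * (1 - p) / p.
    n = 1.0 / alpha
    p = n / (n + mu)

    # Draw counts for every stratum in every simulation, then total across strata.
    totals = rng.negative_binomial(n, p, size=(n_sims, mu.size)).sum(axis=1)

    tail = (1.0 - level) / 2.0 * 100.0
    lower, upper = np.percentile(totals, [tail, 100.0 - tail])
    return lower, upper


# Hypothetical expected counts for three strata and a hypothetical dispersion.
mu = [12.0, 30.0, 8.0]
lower, upper = prediction_interval(mu, alpha=0.1)
print(f"Point prediction: {sum(mu):.0f}; 95% PI: {lower:.0f} to {upper:.0f}")
```

Because the expected counts are held fixed, an interval produced this way will be somewhat narrower than one that also propagated uncertainty in the regression coefficients, which is the behaviour described in the paragraph above.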

Meaning of the findings

If commissioners are to meet the Department of Health's vision to orientate health services around local need,1 ,2 ,5 differences in the demand for EIS and other mental and physical health services will need to be taken into account to allocate finite resources where they are most needed. The PsyMaptic prediction model provides proof-of-concept that when robust empirical epidemiological data are combined with accurate population at-risk estimates, this can be realised. As such, our modelling approach could have utility in many other settings and for many disorders. Our translational approach demonstrated good validity to predict the expected incidence of first episode psychosis, particularly through EIS, where 76% and 63% of all male and female adult-onset FEP cases, respectively, will typically present.18 Since their inception in 2002, EIS in England and Wales have reported both lower11 and higher10 caseloads than they were originally envisioned to manage,5 with shortfalls or excesses in anticipated demand for services aligned to the degree of urbanisation in the underlying catchment area. Others have noted that EIS provision in rural areas may be difficult to implement effectively,14 ,15 and while the MH-PIG acknowledged that “…(a)n understanding of local epidemiology is needed as the size of population covered will depend on a number of different factors” (ref. 5, p. 55), no further elaboration on how to achieve this was provided. We believe PsyMaptic provides a possible tool to overcome this challenge, improving the description and prediction of local population need beyond the MH-PIG and including individual-level and neighbourhood-level indicators of local need.17 From an aetiological perspective, we acknowledge that variables such as ethnicity or population density are likely to be markers for a suite of more complex, interactive social, genetic and environmental determinants of psychosis.36

Our models are not the first to be used to forecast mental illness needs in England and Wales,37 though we believe this is the first attempt to forecast incidence rather than prevalence in the community. We recommend that our prediction methodology is used in conjunction with the wide range of public health observatory data available,38 as well as the caveats presented above. PsyMaptic has been included with other indicators in the Joint Commissioning Panel for Mental Health's forthcoming guidance for commissioning of public mental health services.39 Ongoing monitoring and audit of EIS will be vital to ensure that services meet the fidelity criteria upon which they were originally commissioned,11 ,40 including ensuring that service capacity matches local need as closely as possible. As part of this process, we will need to externally validate our models in a wider range of settings, refining them based on empirical observation.

We note that advocacy expressed for EIS by healthcare professionals in England and Wales broadly correlates with demand for services as predicted by PsyMaptic.41 Though by no means universal, proponents of EIS tend to be located in major conurbations—such as London,42 Birmingham43 or Manchester7 ,44—where the demand for EIS will be highest, while those who suggest EIS resources could be used more effectively through other types of mental health service provision tend to work in more rural communities,15 ,41 where but a handful of young people would be expected to come to the attention of services each year. It is possible that both sides are correct and that more resources are required to help with the tide of psychotic illness in inner cities. Resources might be used more effectively in other ways, elsewhere, so long as the needs of the small number of young people who suffer an FEP each year are met; a dedicated specialist EIS may not be the most effective approach when anticipated demand will be very low.

Given the significant downstream economic savings associated with spending on EIS as estimated in an urban setting,8 PsyMaptic could be used to highlight regions where sufficient investment to appropriate mental health services would lead to the greatest economic gains in terms of mental healthcare expenditure (assuming sustained intervention also leads to improved social and clinical benefit for patients6 ,7). PsyMaptic can also be used to highlight regional variation in demand according to age and sex and, in future versions, by ethnicity. This will allow service planners to tailor provision around the sociocultural characteristics of their local populations. Our prediction tool for first episode psychosis, which translates robust empirical epidemiological data on psychosis risk to the population structure of different regions, offers a methodology for improving the allocation of finite mental health resources based on local need.


The authors are grateful to the clinical services and staff participating in the SEPEA study, and the MHRN for their support. We are grateful to Professor Ian White of the MRC Biostatistics Unit (University of Cambridge) for his guidance on developing our prediction interval methodology for negative binomial regression.


  • Contributors JBK was responsible for the concept, design, analysis, data extrapolation, interpretation of the data, as well as for drafting the report and developing the content for the websites. He was also the chief investigator of the SEPEA study, where the validation sample data were obtained. JBK gave final approval of this version of the manuscript to be published. DJ was responsible for developing and initially implementing the statistical approach with regard to prediction intervals from count-based regression. He also contributed to drafting and editing the final version of this manuscript. He gave final approval of this version of the manuscript to be published. JP is the principal collaborator on the SEPEA study in the CAMEO early intervention in psychosis services. He also contributed to editing the final version of this manuscript. He gave final approval of this version of the manuscript to be published. DF is the coprincipal collaborator on the SEPEA study in the Norfolk and Suffolk Foundation Trust. He also contributed to editing the final version of this manuscript. He gave final approval of this version of the manuscript to be published. FW is the coprincipal collaborator on the SEPEA study in the Norfolk and Suffolk Foundation Trust. He also contributed to editing the final version of this manuscript. He gave final approval of this version of the manuscript to be published. JC is the chief investigator of the ELFEP study and provided part of the empirical dataset underlying the prediction model for use in this project. He also contributed to editing the final version of this manuscript. He gave final approval of this version of the manuscript to be published. RMM is the cochief investigator of the ÆSOP study and provided part of the empirical dataset underlying the prediction model for use in this project. He also contributed to editing the final version of this manuscript. He gave final approval of this version of the manuscript to be published. PBJ is the cochief investigator of the ÆSOP study and provided part of the empirical dataset underlying the prediction model for use in this project. He contributed to the development of the prediction methodology and edited the final version of this manuscript. He gave final approval of this version of the manuscript to be published. JBK is the guarantor.

  • Funding Wellcome Trust (grant number WT085540) and NIHR (grant RP-PG-0606-1335).

  • Competing interests All authors declare that JBK has support from the Wellcome Trust for the submitted work (grant number WT085540) and PBJ has support from the NIHR (grant RP-PG-0606-1335).

  • Ethics approval Ethics approval to conduct the original ÆSOP and East London First Episode Psychosis studies was granted from local research ethics committees in their respective centres prior to the original date of onset of the study. We were granted ethical approval to conduct the work related to the present manuscript, including the use of data from the SEPEA study, from the Cambridgeshire 3 REC (09/H0306/39).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Extra data are available at our prediction website, PsyMaptic. We have made a large array of prediction data freely available to interested users, who can visualise prediction data in maps and tables for a range of psychotic disorder predictions by major sociodemographic subgroups and by LAD. Future versions of PsyMaptic will release data for other regions and at other geographical levels. Users can also download LAD-specific and national summaries of prediction data from our models in tabular form. Finally, additional explanatory material about how to use the data and the PsyMaptic website is also available there. Bespoke data not provided freely on our website may be requested from the authors at the cost of data extraction. The empirical data underlying the prediction models (ÆSOP and ELFEP) or the validation sample (SEPEA) are not freely available as the studies are ongoing, but papers detailing the main findings from these studies have previously been published elsewhere.
