A comparison of conditional autoregressive models used in Bayesian disease mapping

https://doi.org/10.1016/j.sste.2011.03.001

Abstract

Disease mapping is the area of epidemiology that estimates the spatial pattern in disease risk over an extended geographical region, so that areas with elevated risk levels can be identified. Bayesian hierarchical models are typically used in this context; they represent the risk surface using a combination of available covariate data and a set of spatial random effects. These random effects are included to model any overdispersion or spatial correlation in the disease data that has not been accounted for by the available covariate information. The random effects are typically modelled by a conditional autoregressive (CAR) prior distribution, and a number of alternative specifications have been proposed. This paper critiques four of the most common models within the CAR class, and assesses their appropriateness via a simulation study. The four models are then applied to a new study mapping cancer incidence in Greater Glasgow, Scotland, between 2001 and 2005.

Introduction

Modelling data that relate to contiguous spatial units, such as electoral wards or pixels, is a common problem in a number of statistical applications, including disease mapping (MacNab et al., 2006), geographical association studies (Lee et al., 2009), image analysis (Molina et al., 1999) and agricultural field trials (Besag and Higdon, 1999). The response variables in these applications typically display spatial dependence, that is, observations from units close together are more similar than those relating to units further apart. A number of statistical approaches have been adopted for modelling spatial correlation in such data, including geostatistical models (Biggeri et al., 2006), simultaneously autoregressive models (Kissling and Carl, 2008) and conditional autoregressive models (MacNab, 2003).

In this paper we focus on disease mapping, the aim of which is to map the spatial pattern in disease risk over a predefined study region. Bayesian hierarchical models are typically used in such analyses, where any spatial correlation in the disease data is modelled at the second level of the hierarchy by a set of random effects. These effects are most commonly represented by a conditional autoregressive (CAR) prior distribution, which is a type of Markov random field. A number of models have been proposed within this general class of CAR priors, including the intrinsic and convolution models (both Besag et al., 1991), as well as alternatives proposed by Cressie (1993) and Leroux et al. (1999). However, to our knowledge, no formal comparison has been made of the appropriateness of each of these prior models. Therefore this paper presents such a critique, by comparing both their theoretical properties and practical performance.

The remainder of this paper is organised as follows. Section 2 provides a background to Bayesian disease mapping, while Section 3 presents the four commonly used CAR prior models. Section 4 compares the performance of the four models via simulation, while Section 5 applies them to a new disease mapping study of cancer incidence in Greater Glasgow, UK. Finally, Section 6 contains a concluding discussion and areas for future work.

Section snippets

Disease mapping

In disease mapping studies the region of interest is split into $n$ contiguous small-areas, such as census tracts or electoral wards, and the aim of the study is to detect which areas exhibit elevated disease risks. The observed numbers of disease cases in each small-area are collectively denoted by $y = (y_1, \ldots, y_n)$, where $y_k$ denotes the number of cases in area $k$. To fairly assess which areas exhibit elevated levels of disease risk, the numbers of cases expected to occur in each small-area are
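The snippet above is cut off before the paper's own definition of the expected counts. One common choice in disease mapping is indirect standardisation, where each area's expected count is the sum, over population strata such as age groups, of the stratum population multiplied by a reference rate. The following is a minimal sketch under that assumption; the function names, strata, rates, populations and observed counts are all invented for illustration and are not taken from the paper.

```python
def expected_cases(populations, reference_rates):
    """Expected count for one area: sum over strata of population x reference rate."""
    if len(populations) != len(reference_rates):
        raise ValueError("one reference rate is needed per population stratum")
    return sum(p * r for p, r in zip(populations, reference_rates))


def observed_to_expected_ratio(observed, expected):
    """Ratio of observed to expected cases; values above 1 suggest elevated risk."""
    return observed / expected


# Hypothetical data: three age strata, two small areas.
reference_rates = [0.001, 0.004, 0.010]          # assumed cases per person, by stratum
area_populations = [[2000, 1500, 500],           # area 0
                    [1000, 1000, 1000]]          # area 1
observed = [15, 12]                              # made-up observed counts y_k

expected = [expected_cases(p, reference_rates) for p in area_populations]
ratios = [observed_to_expected_ratio(y, e) for y, e in zip(observed, expected)]
```

With these made-up inputs, area 0 has more observed cases than expected and area 1 has fewer. Raw ratios like these are unstable when expected counts are small, which is one motivation for smoothing them with a hierarchical model.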

Conditional autoregressive models

In disease mapping studies the random effects $\phi = (\phi_1, \ldots, \phi_n)$ are commonly modelled by the class of conditional autoregressive (CAR) prior distributions, which are a type of Markov random field model. These models are specified by a set of $n$ univariate full conditional distributions $f(\phi_k \mid \phi_{-k})$ for $k = 1, \ldots, n$, where $\phi_{-k} = (\phi_1, \ldots, \phi_{k-1}, \phi_{k+1}, \ldots, \phi_n)$, rather than by a single multivariate distribution $f(\phi)$. Spatial correlation between the random effects is determined by a binary $n \times n$ neighbourhood matrix $W$,
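To make the full-conditional specification concrete, the following is a minimal sketch that assumes the intrinsic CAR form of Besag et al. (1991): given all other random effects, the effect for area k is normal with mean equal to the average of its neighbours' effects and variance equal to a common variance parameter divided by the number of neighbours. The snippet above is truncated before the paper's own formulas, so the function names, the four-area map, the effect values and the variance parameter `tau2` are hypothetical.

```python
def neighbourhood_matrix(n, edges):
    """Binary, symmetric n x n matrix W; W[k][j] = 1 if areas k and j share a border."""
    w = [[0] * n for _ in range(n)]
    for k, j in edges:
        w[k][j] = 1
        w[j][k] = 1
    return w


def intrinsic_car_conditional(k, phi, w, tau2):
    """Mean and variance of phi[k] given all other effects, under the intrinsic CAR prior."""
    n_k = sum(w[k])  # number of neighbours of area k
    if n_k == 0:
        raise ValueError("area has no neighbours, so the conditional is undefined")
    mean = sum(w[k][j] * phi[j] for j in range(len(phi))) / n_k
    variance = tau2 / n_k
    return mean, variance


# Hypothetical map: four areas in a row, 0 - 1 - 2 - 3.
W = neighbourhood_matrix(4, [(0, 1), (1, 2), (2, 3)])
phi = [0.2, -0.1, 0.4, 0.0]  # illustrative random effects
mean, var = intrinsic_car_conditional(1, phi, W, tau2=0.5)
```

Here area 1 borders areas 0 and 2, so its conditional mean is the average of those two effects and its conditional variance is half of `tau2`. Areas with more neighbours get a smaller conditional variance, which is one of the properties that differs across the CAR specifications compared in the paper.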

Simulation study

In this section, we present a simulation study that compares the performance of the four CAR priors described in the previous section.

Application

This section presents a study mapping the spatial pattern of cancer risk in Greater Glasgow, Scotland, between 2001 and 2005.

Discussion

This paper has critiqued four of the most commonly used conditional autoregressive prior distributions in Bayesian disease mapping, which include the intrinsic and convolution models (both Besag et al., 1991), as well as the alternatives proposed by Cressie (1993) and Leroux et al. (1999). The performance of these models has been quantified by simulation, specifically assessing the accuracy with which they can estimate regression parameters and disease risk surfaces. The paper then applies each

Acknowledgements

The author gratefully acknowledges the valuable comments and suggestions made by two referees, all of which have greatly improved the focus and presentation of this paper. The data and shapefiles used in this paper were provided by the Scottish Government.

References (23)

  • S. Banerjee et al. Hierarchical modelling and analysis for spatial data (2004)
  • J. Besag et al. Bayesian analysis of agricultural field experiments. J R Stat Soc Ser B (1999)
  • J. Besag et al. Bayesian image restoration with two applications in spatial statistics. Ann Inst Stat Math (1991)
  • A. Biggeri et al. Disease mapping in veterinary epidemiology: a Bayesian geostatistical approach. Stat Methods Med Res (2006)
  • N. Cressie. Statistics for spatial data, revised ed. (1993)
  • L. Eberly et al. Identifiability and convergence issues for Markov chain Monte Carlo fitting of spatial models. Stat Med (2000)
  • P. Elliott et al. Spatial epidemiology: methods and applications (2000)
  • A. Gelman. Prior distributions for variance parameters in hierarchical models. Bayesian Anal (2006)
  • M. Jerrett et al. Spatial analysis of air pollution and mortality in Los Angeles. Epidemiology (2005)
  • W. Kissling et al. Spatial autocorrelation and the selection of simultaneous autoregressive models. Global Ecol Biogeogr (2008)
  • A. Lawson. Bayesian disease mapping: hierarchical modelling in spatial epidemiology (2008)