Estimating joinpoints in continuous time scale for multiple change-point models

doi:10.1016/j.csda.2006.07.044

Computational Statistics & Data Analysis

Volume 51, Issue 5, 1 February 2007, Pages 2420-2427

https://doi.org/10.1016/j.csda.2006.07.044

Abstract

Joinpoint models have been applied to cancer incidence and mortality data with continuous change points. The current estimation method, Lerman's grid search (LGS) [Lerman, P.M., 1980. Fitting segmented regression models by grid search. Appl. Statist. 29, 77–84], assumes that the joinpoints occur only at discrete grid points. However, it is more realistic to allow the joinpoints to take any value within the observed data range. Hudson [1966. Fitting segmented curves whose join points have to be estimated. J. Amer. Statist. Assoc. 61, 1097–1129] provides an algorithm to find the weighted least squares estimates of the joinpoint on the continuous scale. Hudson described the estimation procedure in detail for a model with only one joinpoint, but its extension to a multiple joinpoint model is not straightforward. In this article, we describe Hudson's method for the multiple joinpoint model in detail and discuss issues in its implementation. We compare the computational efficiencies of the LGS method and Hudson's method. Comparisons between the proposed estimation method and several alternative approaches, especially Bayesian joinpoint models, are also discussed. Hudson's method is implemented in C++ and applied to colorectal cancer incidence data for men under age 65 from the SEER nine registries.

Introduction

It is of great importance to describe the trend of cancer incidence and mortality data. The joinpoint regression model, which is composed of a few continuous linear phases, is often useful for describing changes in trend data. Suppose that for the observations $\{(x_{1}, y_{1}), \dots, (x_{n}, y_{n})\}$, $x_{1} ⩽ \dots ⩽ x_{n}$, the responses are $y_{i} = E (y | x_{i}) + e_{i}$, $i = 1, \dots, n$, with $E (e_{i}) = 0$ and $V (e_{i}) = σ_{i}^{2}$ for the random errors $e_{i}$. The joinpoint regression model assumes that, in each segment, $E (y | x)$ follows a linear model $E (y | x) = β_{k, 0} + β_{k, 1} x \quad \text{if } τ_{k - 1} < x ⩽ τ_{k}, \quad k = 1, \dots, K + 1, \qquad (1)$ where $τ_{0} = - \infty$, $τ_{K + 1} = \infty$ and $E (y | x)$ is continuous throughout $[x_{1}, x_{n}]$, such that $β_{k, 0} + β_{k, 1} τ_{k} = β_{k + 1, 0} + β_{k + 1, 1} τ_{k} \quad \text{for } k = 1, \dots, K. \qquad (2)$ As the response is continuous at the change points, we call model (1) the joinpoint model and the $τ_{k}$'s joinpoints (JPs). This model is also called the segmented-line regression model or piecewise linear model (Kim et al., 2004). An alternative parameterization of the JP model (1) is $E (y | x) = β_{1, 0} + β_{1, 1} x + \sum_{k = 1}^{K} δ_{k} (x - τ_{k})^{+},$ where $δ_{k} = β_{k + 1, 1} - β_{k, 1}$ and $(x - τ_{k})^{+} = x - τ_{k}$ if $x ⩾ τ_{k}$ and 0 otherwise. This parameterization automatically satisfies the continuity of $E (y | x)$ at each $τ_{k}$.
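The equivalence of the two parameterizations can be checked numerically. The following is a minimal sketch in Python, not code from the paper: the function names and the joinpoint and coefficient values are hypothetical and chosen only for illustration. It converts the hinge form into per-segment intercepts and slopes and evaluates the mean function both ways.

```python
def mean_segmentwise(x, taus, betas):
    """E(y|x) under model (1); betas[k] is the (intercept, slope) of segment k+1."""
    k = sum(1 for t in taus if x > t)  # segment with tau_{k-1} < x <= tau_k
    b0, b1 = betas[k]
    return b0 + b1 * x


def mean_hinge(x, taus, b10, b11, deltas):
    """E(y|x) under the alternative parameterization with (x - tau_k)^+ terms."""
    return b10 + b11 * x + sum(d * max(x - t, 0.0) for t, d in zip(taus, deltas))


def betas_from_hinge(taus, b10, b11, deltas):
    """Recover the per-segment (intercept, slope) pairs from the hinge form."""
    betas = [(b10, b11)]
    for t, d in zip(taus, deltas):
        b0, b1 = betas[-1]
        # The slope changes by delta_k; the intercept shifts by -delta_k * tau_k,
        # which is exactly what makes the two lines meet at x = tau_k.
        betas.append((b0 - d * t, b1 + d))
    return betas


# Hypothetical values, for illustration only.
taus = [3.0, 7.0]
b10, b11 = 1.0, 0.5
deltas = [-1.5, 2.0]
betas = betas_from_hinge(taus, b10, b11, deltas)
```

Because the intercept of each new segment is derived from the previous one, the continuity constraint at every $τ_{k}$ holds by construction, which is why no explicit constraint is needed in the hinge form.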

The current estimation method is the grid search (LGS) method proposed by Lerman (1980), which is implemented in the Joinpoint software developed by the U.S. National Cancer Institute (http://www.srab.cancer.gov/joinpoint). Although the LGS method can be refined so that the JPs may occur at the midpoint or quarter points between two data points, the computation time increases dramatically as the grid becomes finer. Hence, the LGS method is practical only when the JPs are restricted to the observed data points. Hudson (1966) described the continuous algorithm in detail for a one-JP model and discussed its extension to a model with more than two JPs, which is not straightforward. Our aims in this paper are to describe the details of the extension to a multiple JP model and to compare the computational efficiencies of the two fitting methods.
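To make the grid-search idea concrete, here is a minimal Python sketch for a single joinpoint. It is an illustration under simplifying assumptions (unweighted least squares, one JP, noise-free toy data) and is neither Lerman's exact procedure nor the Joinpoint software's implementation. The helper names `solve`, `rss_at` and `grid_search`, and the parameter `m` (number of points inserted between consecutive data points), are hypothetical.

```python
def solve(A, b):
    """Solve a small linear system A z = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z


def rss_at(tau, xs, ys):
    """Residual sum of squares of the continuous two-phase fit with the JP fixed at tau."""
    rows = [[1.0, x, max(x - tau, 0.0)] for x in xs]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    coef = solve(A, b)
    return sum((y - sum(c * v for c, v in zip(coef, r))) ** 2 for r, y in zip(rows, ys))


def grid_search(xs, ys, m=0):
    """Return the grid point with the smallest RSS.

    The grid holds the data points x_2, ..., x_{n-1} plus m equally spaced
    points between each pair of neighbours.
    """
    grid = []
    for a, b in zip(xs[1:-2], xs[2:-1]):
        grid += [a + (b - a) * j / (m + 1) for j in range(m + 1)]
    grid.append(xs[-2])
    return min(grid, key=lambda t: rss_at(t, xs, ys))


# Toy data with a true joinpoint at 4.5, which lies between two data points.
xs = [float(i) for i in range(10)]
ys = [1.0 + 0.5 * x if x <= 4.5 else 3.25 - (x - 4.5) for x in xs]
tau_coarse = grid_search(xs, ys, m=0)  # restricted to the data points
tau_fine = grid_search(xs, ys, m=1)    # midpoints added, so 4.5 is on the grid
```

Every grid point costs one regression fit, so for a single JP the work grows in proportion to `m + 1`; with several JPs every combination of grid points has to be tried, which is the growth in computation time noted above.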

Several alternative methods have been proposed to estimate the locations of change points for a single series in different contexts. For example, Quandt (1958) and Quandt and Ramsey (1978) proposed procedures for estimating a single change point without a continuity constraint on the response in economic settings; Hinkley (1969, 1971) discussed estimation and inference for the joinpoint in one-joinpoint models; and Smith (1975), Carlin et al. (1992), Slate and Turnbull (2000) and Tiwari et al. (2005) used Bayesian approaches to estimate the change points under different scenarios. Most of the available methods estimate a single change point or joinpoint. The method described in this paper estimates multiple joinpoints on the continuous scale, and hence can provide a better fit than a search restricted to grid points.

The rest of the paper is organized as follows: The model formulation and notation are described in Section 2 and Hudson's method for a one-JP model is reviewed in Section 3. In Section 4, Hudson's method is extended to a multiple JP model and the issues arising in the implementation are discussed. Then the multiple JP model is applied to colorectal cancer incidence data for men under age 65 from the SEER nine registries. The relative merits of different approaches are discussed in the final section.

Model formulation and notation

Let the $k$th segment be denoted by $S_{k} = \{x_{i} : τ_{k - 1} < x_{i} ⩽ τ_{k}\} = \{x_{i_{k - 1} + 1}, \dots, x_{i_{k}}\}$ for $i_{0} = 0$ and $i_{K + 1} = n$. For each segment $S_{k}$, $k = 1, \dots, K + 1$, we define $Y_{k} = \begin{pmatrix} y_{i_{k - 1} + 1} \\ \vdots \\ y_{i_{k}} \end{pmatrix}, \quad X_{k} = \begin{pmatrix} 1 & x_{i_{k - 1} + 1} \\ \vdots & \vdots \\ 1 & x_{i_{k}} \end{pmatrix}, \quad ε_{k} = \begin{pmatrix} e_{i_{k - 1} + 1} \\ \vdots \\ e_{i_{k}} \end{pmatrix}, \quad β_{k} = \begin{pmatrix} β_{k, 0} \\ β_{k, 1} \end{pmatrix},$ where $E (ε_{k}) = 0$, $Cov (ε_{k}) = Σ_{k}$ and the weight matrix is $W_{k} = Σ_{k}^{- 1}$. Let $Y = \begin{pmatrix} Y_{1} \\ \vdots \\ Y_{K + 1} \end{pmatrix}, \quad X = \begin{pmatrix} X_{1} & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & X_{K + 1} \end{pmatrix}, \quad β = \begin{pmatrix} β_{1} \\ \vdots \\ β_{K + 1} \end{pmatrix}, \quad ε = \begin{pmatrix} ε_{1} \\ \vdots \\ ε_{K + 1} \end{pmatrix}.$ Notice that $Y = (y_{1}, \dots, y_{n})'$ and $ε = (e_{1}, \dots, e_{n})'$. Then, the JP model (1) can be expressed as $Y = X β + ε,$ with constraints (2), where $E (ε) = 0$ and $Cov (ε) = Σ$. Let $τ = (τ_{1}, \dots, τ_{K})$. To fit this model, we find the estimates $(\hat{τ}, \hat{β})$ …

Review of Hudson's method: one JP $τ_{1}$

In this section, we first summarize Hudson's algorithm for the 1-JP model. The procedure to estimate ${\hat{τ}}_{1}$ is described as follows:

(a) For the partition $[x_{1}, x_{i}], [x_{i + 1}, x_{n}]$, $2 ⩽ i ⩽ n - 2$, fit the least squares (LS) regression for each segment. Let $Y_{1} = \begin{pmatrix} y_{1} \\ \vdots \\ y_{i} \end{pmatrix}, \quad Y_{2} = \begin{pmatrix} y_{i + 1} \\ \vdots \\ y_{n} \end{pmatrix}, \quad X_{1} = \begin{pmatrix} 1 & x_{1} \\ \vdots & \vdots \\ 1 & x_{i} \end{pmatrix}, \quad X_{2} = \begin{pmatrix} 1 & x_{i + 1} \\ \vdots & \vdots \\ 1 & x_{n} \end{pmatrix}.$ The unconstrained weighted LS estimates are ${\tilde{β}}_{k} = ({\tilde{β}}_{k, 0}, {\tilde{β}}_{k, 1})' = (X_{k}' W_{k} X_{k})^{- 1} X_{k}' W_{k} Y_{k}$, $k = 1, 2$.
(b) Let $\hat{τ}_{1}^{(i)}$ be the solution to the equation ${\tilde{β}}_{1, 0} + {\tilde{β}}_{1, 1} τ = {\tilde{β}}_{2, 0} + {\tilde{β}}_{2, 1} τ$. If $x_{i} ⩽ \hat{τ}_{1}^{(i)} < x_{i + 1}$, then $\hat{τ}_{1}^{(i)}$ is said to be in the “right” place. That
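Steps (a) and (b), as far as they are stated in this snippet, can be sketched in Python as follows. This is an illustrative sketch only: it assumes independent errors (diagonal weights), it covers just the two steps shown here and not the remainder of Hudson's procedure, and the function names and toy data are hypothetical.

```python
def wls_line(xs, ys, ws):
    """Weighted LS intercept and slope for one segment (diagonal weights ws)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def candidate_joinpoints(xs, ys, ws):
    """For each split i, intersect the two unconstrained segment fits.

    Returns tuples (i, tau, in_right_place), where i is the 1-based index of
    the last point in the first segment.
    """
    n = len(xs)
    out = []
    for i in range(2, n - 1):  # i = 2, ..., n - 2
        a0, a1 = wls_line(xs[:i], ys[:i], ws[:i])
        b0, b1 = wls_line(xs[i:], ys[i:], ws[i:])
        if a1 == b1:  # parallel lines never intersect
            continue
        tau = (a0 - b0) / (b1 - a1)  # solves a0 + a1*tau = b0 + b1*tau
        in_right_place = xs[i - 1] <= tau < xs[i]  # x_i <= tau < x_{i+1}
        out.append((i, tau, in_right_place))
    return out


# Toy data with a true joinpoint at 4.5 and equal weights.
xs = [float(i) for i in range(10)]
ys = [1.0 + 0.5 * x if x <= 4.5 else 3.25 - (x - 4.5) for x in xs]
ws = [1.0] * len(xs)
candidates = candidate_joinpoints(xs, ys, ws)
```

For the split `i = 5` both segments are fitted exactly, so the intersection falls at 4.5, between $x_{5} = 4$ and $x_{6} = 5$, and is flagged as being in the “right” place.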

Estimation of multiple JP model in continuous scale

For a $K$-JP model, there are $K + 1$ segments, $S_{1}, \dots, S_{K + 1}$, and $K$ JPs. The $k$th JP $τ_{k} \in [x_{i_{k}}, x_{i_{k} + 1})$ separates segments $S_{k}$ and $S_{k + 1}$. Recall that the unconstrained LS estimates that minimize $R (τ, β)$ in (4) are $\tilde{β} = ({\tilde{β}}_{1}, \dots, {\tilde{β}}_{K + 1})' = (X' W X)^{- 1} X' W Y.$ When the $e_{i}$, $i = 1, \dots, n$, are independent, $W$ is block diagonal and $(X' W X)^{- 1} = \begin{pmatrix} (X_{1}' W_{1} X_{1})^{- 1} & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & (X_{K + 1}' W_{K + 1} X_{K + 1})^{- 1} \end{pmatrix}, \quad X' W Y = \begin{pmatrix} X_{1}' W_{1} Y_{1} \\ \vdots \\ X_{K + 1}' W_{K + 1} Y_{K + 1} \end{pmatrix},$ and ${\tilde{β}}_{k} = (X_{k}' W_{k} X_{k})^{- 1} X_{k}' W_{k} Y_{k}$.
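The practical consequence of the block-diagonal structure is that the stacked system can be solved one segment at a time. The Python sketch below illustrates this on made-up data; it assumes independent errors with diagonal weights, and the function names and numbers are hypothetical rather than taken from the paper.

```python
def solve(A, b):
    """Solve a small linear system A z = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z


def segment_wls(xs, ys, ws):
    """Solve the 2x2 system (X_k' W_k X_k) beta_k = X_k' W_k Y_k for one segment."""
    sx = sum(w * x for w, x in zip(ws, xs))
    A = [[sum(ws), sx], [sx, sum(w * x * x for w, x in zip(ws, xs))]]
    b = [sum(w * y for w, y in zip(ws, ys)),
         sum(w * x * y for w, x, y in zip(ws, xs, ys))]
    return solve(A, b)


def stacked_wls(segments):
    """Solve the full system (X' W X) beta = X' W Y with block-diagonal X."""
    p = 2 * len(segments)
    rows, ys, ws = [], [], []
    for k, (sx, sy, sw) in enumerate(segments):
        for x, y, w in zip(sx, sy, sw):
            row = [0.0] * p
            row[2 * k], row[2 * k + 1] = 1.0, x  # columns of block X_k
            rows.append(row)
            ys.append(y)
            ws.append(w)
    A = [[sum(w * r[i] * r[j] for r, w in zip(rows, ws)) for j in range(p)]
         for i in range(p)]
    b = [sum(w * r[i] * y for r, y, w in zip(rows, ys, ws)) for i in range(p)]
    return solve(A, b)


# Three hypothetical segments, each given as (x values, y values, weights).
segments = [
    ([0.0, 1.0, 2.0, 3.0], [1.0, 1.4, 2.1, 2.4], [1.0, 2.0, 1.0, 2.0]),
    ([4.0, 5.0, 6.0], [2.2, 1.1, 0.3], [1.0, 1.0, 3.0]),
    ([7.0, 8.0, 9.0, 10.0], [0.9, 2.2, 2.8, 4.1], [2.0, 1.0, 1.0, 1.0]),
]
beta_full = stacked_wls(segments)
beta_by_segment = [b for seg in segments for b in segment_wls(*seg)]
```

Both routes give the same coefficients, but the per-segment route only ever solves 2 × 2 systems, so each segment fit can be computed independently of the others.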

The $k$th JP ${\hat{τ}}_{k}$ is obtained by solving the equation ${\tilde{β}}_{k, 0} + {\tilde{β}}_{k, 1} τ_{k} = {\tilde{β}}_{k + 1, 0} + {\tilde{β}}_{k + 1, 1} τ_{k}$ for $τ_{k}$. Let $T_{k}$ denote the location of $τ$

Application

In the “Annual Report to the Nation on the Status of Cancer”, jointly released by the National Cancer Institute (NCI), the American Cancer Society (ACS), the North American Association of Central Cancer Registries (NAACCR), and the Centers for Disease Control and Prevention (CDC), including the National Center for Health Statistics (NCHS), the rates of new cancer cases and deaths were reported for all cancers combined as well as for most of the top 10 cancer sites. The joinpoint regression

Discussion

In this paper, we discuss the computational details of estimating multiple joinpoints on a continuous time scale, and compare the computational efficiencies of two fitting methods, Hudson's method and Lerman's grid search method.

In summary, Hudson's method takes longer than the basic grid search, in which only the data points serve as grid points, but it is more efficient than a grid search with more than four points inserted between consecutive data points. To illustrate

References (15)

M.L. Brown et al. (1990). The presidential effect: the public health response to media coverage about Ronald Reagan's colon cancer episode. The Public Opinion Quarterly.
B.P. Carlin et al. (1992). Hierarchical Bayesian analysis of change point problems. Appl. Statist.
D.V. Hinkley (1969). Inference about the intersection in two-phase regression. Biometrika.
D.V. Hinkley (1971). Inference in two-phase regression. J. Amer. Statist. Assoc.
D.J. Hudson (1966). Fitting segmented curves whose join points have to be estimated. J. Amer. Statist. Assoc.
H.-J. Kim et al. (2000). Permutation tests for joinpoint regression with applications to cancer rates. Statist. Medicine.
H.-J. Kim et al. (2004). Comparability of segmented line regression models. Biometrics.

There are more references available in the full text version of this article.

