Original research
Validation of type 2 diabetes subgroups by simple clinical parameters: a retrospective cohort study of NHANES data from 1999 to 2014
  1. Jing Xie1,
  2. Hua Shao2,
  3. Tao Shan3,
  4. Shenqi Jing3,
  5. Yaxiang Shi4,
  6. Junjie Wang3,
  7. Jie Hu3,
  8. Yong Li5,
  9. Ruochen Huang3,
  10. Naifeng Liu1,6,
  11. Yun Liu3
  1. College of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, Jiangsu, China
  2. Department of Pharmacy, Southeast University Zhongda Hospital, Nanjing, Jiangsu, China
  3. Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, Jiangsu, China
  4. Department of Information, Southeast University Zhongda Hospital, Nanjing, Jiangsu, China
  5. Department of Cardiology, Jiangsu Province People's Hospital and Nanjing Medical University First Affiliated Hospital, Nanjing, Jiangsu, China
  6. Department of Cardiology, Southeast University Zhongda Hospital, Nanjing, Jiangsu, China
  1. Correspondence to Dr Naifeng Liu; liunf{at}seu.edu.cn; Dr Yun Liu; liuyun{at}njmu.edu.cn

Abstract

Objectives To verify whether a simplified method based on age, body mass index (BMI) and glycated haemoglobin (HbA1c) is feasible for classifying patients with type 2 diabetes (T2D), and to evaluate the ability of the resulting subgroups to predict several health and mortality outcomes.

Design Retrospective cohort study.

Setting The National Health and Nutrition Examination Survey 1999–2014 cycle.

Participants A total of 1960 participants with diabetes whose age at diagnosis was over 30 years.

Primary and secondary outcome measures Participants with T2D were assigned to the subgroups previously defined by Ahlqvist using five variables (age, BMI, HbA1c, and homoeostasis model assessment (HOMA) 2 estimates of β-cell function (HOMA2-B) and insulin resistance (HOMA2-IR)) and using three variables (age, BMI and HbA1c). The classification performance of the three-variable approach was evaluated by 10-fold cross-validation, with accuracy, precision and recall as evaluation criteria. Outcomes were assessed using logistic regression and Cox regression analysis.

Results Without HOMA measurements, severe insulin-resistant diabetes was difficult to identify, but the other subgroups were identified well. There was no significant difference between the five-variable and three-variable approaches in the ability to predict the prevalence of poor cardiovascular health (CVH), chronic kidney disease, non-alcoholic fatty liver disease and advanced liver fibrosis, or the risk of all-cause, cardiovascular disease and cancer-related mortality (p>0.05), except for the prevalence of poor CVH in mild age-related diabetes (p<0.05).

Conclusions A simple classification based on age, BMI and HbA1c, which is available for most individuals with T2D, could be used to stratify patients with T2D by several health and mortality risks. Because of its simplicity and practicality, more patients with T2D could benefit from subgroup-specific treatment paradigms.

  • diabetes & endocrinology
  • diabetic nephropathy & vascular disease
  • general diabetes

Data availability statement

Data are available upon reasonable request. The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Strengths and limitations of this study

  • Our study compares the three-variable classification with the five-variable classification, which Kahkoska’s study did not do. We also explored the classification performance for mild age-related diabetes and mild obesity-related diabetes, whereas Ahlqvist only reported how well participants initially assigned to severe insulin-resistant diabetes and severe insulin-deficient diabetes were identified.

  • Our study compares comprehensive risk prediction between the two sets of variables, covering the prevalence of poor cardiovascular health, chronic kidney disease, non-alcoholic fatty liver disease and advanced liver fibrosis at diagnosis, as well as the risks of all-cause, cardiovascular disease and cancer-related mortality many years later.

  • Our study explores the influence of sex and different stages of diabetes on the results.

  • Although participants diagnosed before the age of 30 were excluded, type 1 diabetes may still confound our results.

  • There is no information on the risk of complications, and the follow-up period for mortality is relatively short. Further studies are needed to understand the value of classifying type 2 diabetes based on simple variables.

Introduction

Type 2 diabetes (T2D) is a public health concern worldwide.1 It is generally acknowledged that T2D is a heterogeneous disease that progresses through multiple pathophysiological mechanisms2 and requires multidimensional and sustained treatment.3 Ahlqvist and colleagues proposed classifying diabetes into subgroups, an approach that offers a promising way to capture this heterogeneity.4 According to the method of Ahlqvist et al, T2D can be classified into four subgroups: severe insulin-deficient diabetes (SIDD), severe insulin-resistant diabetes (SIRD), mild obesity-related diabetes (MOD) and mild age-related diabetes (MARD). Studies have shown that the four subgroups differ in their risks of complications and may call for different intervention strategies. Kidney disease progresses more rapidly and non-alcoholic fatty liver disease (NAFLD) is more prevalent in SIRD,4 5 while retinopathy and neuropathy are more prevalent in SIDD.4 5 Patients in the SIRD subgroup showed greater glucose reduction with thiazolidinediones, and those in MARD showed greater glucose reduction with sulfonylureas.6 These findings may help to improve prevention and treatment and allow more targeted pharmacotherapy for diabetes.

This subclassification approach has been tested in many countries, including India,7 Japan8 and China,9 and the results suggest that the four T2D subgroups are stable and reproducible in non-European populations. However, the HOMA variables require fasting C-peptide or fasting insulin measurements, which are not routinely performed in clinical practice and are not well standardised, particularly in low-income and middle-income countries. The approach is therefore impractical for two-thirds of the world’s population with diabetes.10

Kahkoska et al used three simple clinical variables (age at diabetes diagnosis, glycated haemoglobin (HbA1c) and body mass index (BMI)) to validate T2D subgroups and evaluate their association with diabetes complications.11 They showed that three-variable clustering produced results similar to those obtained with the five-variable clustering approach in the All New Diabetics in Scania (ANDIS) study. Subsequently, Ahlqvist applied the same method in ANDIS and found that the simplified three-variable strategy performed well in identifying the SIDD subgroup but poorly in identifying the SIRD subgroup,12 which emphasised the importance of properly validating alternative clustering methods.

However, these studies have some limitations. Kahkoska’s study used only three variables and lacked a comparison with five variables. Ahlqvist only reported how well participants initially assigned to SIRD and SIDD were identified and did not report the classification performance for MARD and MOD. Furthermore, previous studies show that SIRD is the least prevalent T2D subgroup,12 so it remains to be determined whether a simplified classification that performs poorly for SIRD is still useful for overall classification, clinical risk prediction and therapeutic intervention.

Against this background, the objectives of this report are to (1) examine how the diabetes subgroups change when HOMA2 indicators are unavailable and assess how accurately each subgroup is identified; (2) explore whether sex and stage of diabetes (newly diagnosed vs already diagnosed) affect the results; and (3) test the utility of a simpler clustering method in predicting the prevalence of poor cardiovascular health (CVH), chronic kidney disease (CKD), NAFLD and advanced liver fibrosis, and the risk of cancer, cardiovascular disease (CVD) and all-cause mortality.

Materials and methods

Patient and public involvement

No patient was involved in the design and conduct of this study.

Study population

Data were obtained from the National Health and Nutrition Examination Survey (NHANES) (1999–2014; n=82 091). The NHANES is a nationally representative, population-based, multistage, cross-sectional study conducted and approved by the US National Center for Health Statistics (NCHS).13

Newly diagnosed diabetes was defined as answering ‘No’ to the question ‘Have you ever been told by a doctor or health professional that you have diabetes or sugar diabetes?’ together with a fasting plasma glucose (FPG) ≥7.0 mmol L−1 or an HbA1c level ≥6.5% (American Diabetes Association diagnostic criteria).9 Already diagnosed diabetes was defined as answering ‘Yes’ to the same question.

As the NHANES data do not record the type of diabetes, we could not determine whether participants had type 1 diabetes or T2D. To address this issue, we excluded participants diagnosed with diabetes before the age of 30.14 The data-cleaning flow is shown in online supplemental figure S1; the final sample size was 1960.
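The case definitions and the age-at-diagnosis exclusion above can be expressed as a small rule-based sketch. The function names and arguments are hypothetical (they are not NHANES variable names), and the code assumes that "before the age of 30" means an age at diagnosis below 30 years; it illustrates the rules rather than reproducing the authors' data-cleaning code.

```python
from typing import Optional


def diabetes_status(told_by_doctor: bool,
                    fpg_mmol_l: Optional[float],
                    hba1c_pct: Optional[float]) -> str:
    """Hypothetical helper: classify a participant from the questionnaire answer plus FPG/HbA1c."""
    if told_by_doctor:                      # 'Yes' to the doctor-diagnosis question
        return "already diagnosed"
    high_fpg = fpg_mmol_l is not None and fpg_mmol_l >= 7.0
    high_hba1c = hba1c_pct is not None and hba1c_pct >= 6.5
    if high_fpg or high_hba1c:              # 'No' to the question, but meets ADA criteria
        return "newly diagnosed"
    return "no diabetes"


def include_in_cohort(status: str, age_at_diagnosis: float) -> bool:
    """Hypothetical helper: keep diabetes diagnosed at age 30 or older (assumed cut-off)."""
    return status != "no diabetes" and age_at_diagnosis >= 30
```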

Outcomes definition

CVH was evaluated according to the American Heart Association (AHA) Life’s Simple 7 (LS7),15 which comprises smoking, weight, physical activity, diet, blood cholesterol, blood glucose and blood pressure. The AHA definitions of CVH for each metric are shown in table 1. Each LS7 component received a score of 0, 1 or 2 to reflect poor, intermediate or ideal health, respectively. An overall LS7 score ranging from 0 to 14 was calculated as the sum of the component scores and was categorised as poor (0–4) or ideal (5–14) CVH.16
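A minimal sketch of the LS7 scoring described above, assuming each of the seven component scores (0, 1 or 2) has already been derived from the AHA definitions in table 1. The component keys are hypothetical names chosen for illustration, and the category labels follow the 0–4 versus 5–14 split used for the poor CVH outcome.

```python
# Hypothetical component keys; each value is the 0/1/2 score derived from table 1.
LS7_COMPONENTS = ("smoking", "weight", "physical_activity", "diet",
                  "cholesterol", "glucose", "blood_pressure")


def ls7_score(scores: dict) -> int:
    """Sum the seven LS7 component scores (range 0-14)."""
    missing = [c for c in LS7_COMPONENTS if c not in scores]
    if missing:
        raise ValueError(f"missing LS7 components: {missing}")
    for c in LS7_COMPONENTS:
        if scores[c] not in (0, 1, 2):
            raise ValueError(f"{c} score must be 0, 1 or 2, got {scores[c]!r}")
    return sum(scores[c] for c in LS7_COMPONENTS)


def cvh_category(total: int) -> str:
    """Dichotomise the LS7 total: 0-4 is poor CVH, 5-14 is the comparison group."""
    return "poor" if total <= 4 else "ideal"
```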

Table 1

AHA definitions of cardiovascular health for each metric

CKD was defined as kidney damage with an estimated glomerular filtration rate (eGFR) of less than 60 mL min−1 per 1.73 m2. eGFR was calculated using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation based on serum creatinine.17
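The text does not state which version of the CKD-EPI creatinine equation was used. The sketch below assumes the widely used 2009 CKD-EPI equation (serum creatinine in mg/dL, with sex and race coefficients) and then applies the eGFR <60 threshold; the function names are hypothetical.

```python
def ckd_epi_2009(scr_mg_dl: float, age: float, female: bool, black: bool) -> float:
    """eGFR (mL/min/1.73 m2), assuming the 2009 CKD-EPI creatinine equation."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def has_ckd(egfr: float) -> bool:
    """CKD as operationalised in the text: eGFR below 60 mL/min/1.73 m2."""
    return egfr < 60
```

For example, a 50-year-old non-Black man with a serum creatinine of 1.0 mg/dL gets an eGFR of roughly 87 under this equation, well above the CKD threshold.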

The criteria for NAFLD were a US Fatty Liver Index (USFLI) score of ≥30, no excessive alcohol consumption (an average of ≤1 alcoholic drink per day for women and ≤2 for men), a negative hepatitis C antibody test and a negative hepatitis B surface antigen test.18 The USFLI score was calculated using equation (1).19

Advanced liver fibrosis was determined using two non-invasive markers of liver fibrosis, the fibrosis-4 (FIB-4) score and the NAFLD Fibrosis Score (NFS), with cut-off values of FIB-4 >2.67 or NFS >0.676.20 The NFS21 and the FIB-4 index22 were calculated using equations (2) and (3), respectively.

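$$\mathrm{USFLI} = \frac{e^{y}}{1 + e^{y}} \times 100, \quad y = -0.8073\,\text{non-Hispanic Black} + 0.3458\,\text{Mexican American} + 0.0093\,\text{age} + 0.6151\ln(\mathrm{GGT}) + 0.0249\,\text{waist circumference} + 1.1792\ln(\text{insulin}) + 0.8242\ln(\text{glucose}) - 14.7812 \tag{1}$$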

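$$\mathrm{NFS} = -1.675 + 0.037\,\text{age} + 0.094\,\mathrm{BMI} + 1.13\,\text{impaired fasting glucose/diabetes} + 0.99\,\frac{\mathrm{AST}}{\mathrm{ALT}} - 0.013\,\text{platelet} - 0.66\,\text{albumin} \tag{2}$$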

$$\text{FIB-4} = \frac{\text{age} \times \mathrm{AST}}{\text{platelet} \times \sqrt{\mathrm{ALT}}} \tag{3}$$

Here, ‘non-Hispanic Black’ and ‘Mexican American’ take the value 1 if the person is of that ethnicity and 0 otherwise; impaired fasting glucose/diabetes takes the value 1 if the participant has impaired fasting glucose or diabetes and 0 otherwise. Units are age (years), BMI (kg/m2), waist circumference (cm), glucose (mg/dL), insulin (pmol/L), GGT (U/L), AST (U/L), ALT (U/L), platelet (×10^9/L) and albumin (g/dL).
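The three liver indices and the two classification rules can be sketched as plain functions. This assumes the standard published forms of the USFLI, NFS and FIB-4 with the units listed above; the function and argument names are hypothetical, and the NAFLD rule simply combines the stated criteria for USFLI, alcohol intake and hepatitis serology.

```python
import math


def usfli(age, ggt, waist_cm, insulin_pmol_l, glucose_mg_dl,
          nh_black=0, mexican_american=0):
    """US Fatty Liver Index (0-100), equation (1)."""
    y = (-0.8073 * nh_black + 0.3458 * mexican_american + 0.0093 * age
         + 0.6151 * math.log(ggt) + 0.0249 * waist_cm
         + 1.1792 * math.log(insulin_pmol_l) + 0.8242 * math.log(glucose_mg_dl)
         - 14.7812)
    return 100 * math.exp(y) / (1 + math.exp(y))


def nfs(age, bmi, ifg_or_diabetes, ast, alt, platelets_1e9_l, albumin_g_dl):
    """NAFLD Fibrosis Score, equation (2)."""
    return (-1.675 + 0.037 * age + 0.094 * bmi + 1.13 * ifg_or_diabetes
            + 0.99 * ast / alt - 0.013 * platelets_1e9_l - 0.66 * albumin_g_dl)


def fib4(age, ast, alt, platelets_1e9_l):
    """FIB-4 index, equation (3)."""
    return age * ast / (platelets_1e9_l * math.sqrt(alt))


def nafld(usfli_score, drinks_per_day, female, hcv_ab_positive, hbsag_positive):
    """USFLI >= 30, no excessive alcohol, negative hepatitis C and B serology."""
    alcohol_limit = 1 if female else 2
    return (usfli_score >= 30 and drinks_per_day <= alcohol_limit
            and not hcv_ab_positive and not hbsag_positive)


def advanced_fibrosis(fib4_score, nfs_score):
    """Either marker above its cut-off."""
    return fib4_score > 2.67 or nfs_score > 0.676
```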

Mortality data for the NHANES (1999–2014) participants were provided by the National Center for Health Statistics (NCHS), which linked participants to death certificate records in the National Death Index through probabilistic record matching (NCHS Linked Mortality File), with follow-up to 31 December 2015. The mortality outcomes were all-cause, CVD and cancer-related mortality, based on the ICD-10 codes defined in the NHANES. Follow-up time was the period from the NHANES examination date to the date of death or the last date on which the participant was known to be alive.23 The 1960 participants were followed for a median of 98 (95% CI 96 to 101) months, estimated by the reverse Kaplan-Meier method.
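Reverse Kaplan-Meier estimation swaps the roles of the event and censoring indicators: a participant still alive at the end of follow-up is treated as the "event", and a death is treated as censoring. The sketch below is an unweighted, pure-Python illustration of that idea (it ignores the NHANES survey weights, and the function names are hypothetical); it returns the first time at which the reversed survival curve falls to 0.5 or below.

```python
def km_median(times, events):
    """Median of a Kaplan-Meier curve: first time at which S(t) <= 0.5."""
    pairs = sorted(zip(times, events))
    surv = 1.0
    n_at_risk = len(pairs)
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        d = n_t = 0
        while i < len(pairs) and pairs[i][0] == t:   # group tied times
            d += pairs[i][1]
            n_t += 1
            i += 1
        if d:
            surv *= 1 - d / n_at_risk
        n_at_risk -= n_t
        if surv <= 0.5:
            return t
    return None  # median not reached


def median_follow_up(times, died):
    """Reverse KM: being censored (alive at last contact) is the 'event'."""
    return km_median(times, [0 if d else 1 for d in died])
```

For instance, with follow-up times of 10 to 60 months and deaths at 10 and 50 months, the reversed curve first drops below 0.5 at 40 months, which is the median follow-up.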

Statistical analysis

To assess the reliability of k-means clustering based on Euclidean distance, we compared the silhouette coefficients of different clustering algorithms and visualised the subgroups using t-distributed stochastic neighbour embedding (t-SNE).
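For reference, the silhouette coefficient of a point i is s(i) = (b − a)/max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster; the overall coefficient is the mean of s(i). The pure-Python sketch below uses Euclidean distance and assigns 0 to singleton clusters (a common convention, assumed here); it is not the authors' implementation.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    n = len(points)
    scores = []
    for i in range(n):
        # Mean distance from point i to each cluster, excluding i itself.
        mean_dist = {}
        for c in clusters:
            others = [j for j in range(n) if labels[j] == c and j != i]
            if others:
                mean_dist[c] = sum(math.dist(points[i], points[j]) for j in others) / len(others)
        own = labels[i]
        if own not in mean_dist:          # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n
```

Values close to 1 indicate compact, well-separated clusters; values near 0 indicate overlapping clusters.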

Classification performance was assessed using 10-fold cross-validation.24 First, we calculated the centres of the MARD, MOD, SIDD and SIRD subgroups in the training set by k-means clustering based on age, BMI, HbA1c, HOMA2 estimates of β-cell function (HOMA2-B) and insulin resistance (HOMA2-IR). Second, in the testing set, we computed the distance from each participant’s five-variable profile to each of the four subgroup centres and assigned the participant to the subgroup with the nearest centre. Third, we applied the same procedure to the same testing set using only the distances based on age, BMI and HbA1c. Finally, we treated the five-variable assignments as the reference and the three-variable assignments as the predictions in the testing sets, and used the ‘confusionMatrix’ function in the R package caret to evaluate the performance of the three-variable classifier. Acknowledging the differences between our study and the study of Ahlqvist et al,4 we labelled the subgroups obtained with the three-variable approach with letters corresponding to the four subgroups proposed by Ahlqvist: Cluster A (MARD), Cluster B (MOD), Cluster C (SIDD) and Cluster D (SIRD). To visualise the redistribution of participants among subgroups, a Sankey diagram was plotted with the ‘sankeyNetwork’ function in the R package networkD3.
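The four cross-validation steps can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the input columns are ordered age, BMI, HbA1c, HOMA2-B, HOMA2-IR and have already been standardised; k-means is a plain Lloyd's algorithm with a single random start; and the three-variable assignment uses only the first three coordinates of the five-variable centres. Because k-means cluster numbers are arbitrary in each fold, the pooled summary here is the agreement rate (accuracy); per-subgroup metrics would additionally require mapping each fold's clusters to MARD, MOD, SIDD and SIRD. All function names are hypothetical.

```python
import numpy as np


def nearest_centre(X, centres):
    """Index of the nearest centre (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with one random start; returns the k centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = nearest_centre(X, centres)
        # Keep the old centre if a cluster loses all its members.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return centres


def confusion(pred, ref, k):
    """Rows = predicted (three-variable), columns = reference (five-variable)."""
    m = np.zeros((k, k), dtype=int)
    np.add.at(m, (np.asarray(pred), np.asarray(ref)), 1)
    return m


def cv_agreement(X5, k=4, n_folds=10, seed=0):
    """Pooled share of test participants given the same subgroup by both rules."""
    idx = np.random.default_rng(seed).permutation(len(X5))
    folds = np.array_split(idx, n_folds)
    agree = 0
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        centres = kmeans(X5[train], k, seed=seed)            # step 1: training centres
        ref = nearest_centre(X5[test], centres)              # step 2: five variables
        pred = nearest_centre(X5[test, :3], centres[:, :3])  # step 3: age, BMI, HbA1c
        agree += int(np.trace(confusion(pred, ref, k)))      # step 4: compare
    return agree / len(X5)
```

A participant whose HOMA2 values pull them towards one centre can land on a different centre once those two coordinates are dropped, which is the mechanism behind the redistribution described in the results.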

Additional analyses compared the five-variable and three-variable classifiers in their ability to predict poor CVH, CKD, NAFLD, advanced liver fibrosis, and all-cause, CVD and cancer-related mortality. Odds ratios (ORs) estimated by logistic regression were used to compare the prevalence of poor CVH, CKD, NAFLD and advanced liver fibrosis across classification approaches and subgroups. Long-term mortality risks across classification approaches and subgroups were estimated with Cox regression models and expressed as hazard ratios (HRs).
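Both ORs from logistic regression and HRs from Cox regression are obtained by exponentiating a model coefficient, and the 95% CIs reported in the results follow from the coefficient's standard error. A minimal sketch, assuming a Wald-type interval on the log scale (survey-weighted models would supply the design-based standard error); the helper names are hypothetical.

```python
import math


def ratio_with_ci(beta: float, se: float, z: float = 1.959964):
    """exp(beta) with a Wald 95% CI; applies to logistic (OR) and Cox (HR) coefficients."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def ci_includes_one(lower: float, upper: float) -> bool:
    """True when the interval is compatible with no difference (ratio of 1)."""
    return lower <= 1.0 <= upper
```

An interval that includes 1 corresponds to a non-significant difference at the 5% level, which is the criterion used in the results.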

Because laboratory methods changed over the long study period, we calibrated FPG for 2005–2014 and insulin for 2003–2014 to earlier NHANES surveys; HbA1c was not calibrated because NHANES recommendations indicate that this is unnecessary.25 26 We also used the fasting subsample weights recommended by the NHANES and the survey package (V.4.1-1) in R to account for the complex sampling design.
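The text does not spell out how the fasting subsample weights were combined across the eight 2-year cycles. A common approach in NHANES analytic guidance is sketched below as an assumption: the 4-year fasting weight (WTSAF4YR) is used for 1999–2002 and counts as two cycles, and the 2-year fasting weight (WTSAF2YR) is used from 2003 onwards, each rescaled by the number of pooled cycles. The function name is hypothetical.

```python
def combined_fasting_weight(cycle_start_year, wtsaf2yr=None, wtsaf4yr=None, n_cycles=8):
    """Hypothetical helper: pooled 1999-2014 fasting-subsample weight for one participant."""
    if cycle_start_year in (1999, 2001):
        # 1999-2002 is released with a 4-year weight covering two 2-year cycles.
        if wtsaf4yr is None:
            raise ValueError("4-year fasting weight required for 1999-2002")
        return 2 * wtsaf4yr / n_cycles
    if wtsaf2yr is None:
        raise ValueError("2-year fasting weight required from 2003 onwards")
    return wtsaf2yr / n_cycles
```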

Results

Subgroups characteristics

The study population included 1960 adults from the NHANES (1999–2014). The t-SNE visualisation of the four subgroups and the silhouette coefficients of different clustering algorithms indicated that k-means clustering based on Euclidean distance was reliable (online supplemental figure S2, online supplemental table S1).

As shown in figure 1, the characteristics of the four subgroups (MARD, MOD, SIDD and SIRD) obtained with the five-variable approach were consistent with previous reports, suggesting that the classification was stable. The SIRD subgroup was characterised by higher HOMA2-IR and HOMA2-B. The SIDD subgroup had the highest HbA1c and the lowest HOMA2-B. The MOD subgroup was younger and had a higher BMI, with intermediate blood glucose, β-cell function and insulin resistance. The MARD subgroup was older and had a lower BMI and HbA1c, with intermediate β-cell function and insulin resistance.

Figure 1

Distributions of characteristics in the four type 2 diabetes subgroups based on the five-variable and three-variable approaches. BMI, body mass index; HbA1c, glycated haemoglobin; HOMA2-B, homoeostasis model assessment 2 of β-cell function; HOMA2-IR, homoeostasis model assessment 2 of insulin resistance; MARD, mild age-related diabetes; MOD, mild obesity-related diabetes; SIDD, severe insulin-deficient diabetes; SIRD, severe insulin-resistant diabetes. total_5, All sample clustering based on 5-variable; total_3, All sample clustering based on 3-variable; female_5, Female sample clustering based on 5-variable; female_3, Female sample clustering based on 3-variable; male_5, Male sample clustering based on 5-variable; male_3, Male sample clustering based on 3-variable; newly_5, Newly diagnosed with T2D sample clustering based on 5-variable; newly_3, Newly diagnosed with T2D sample clustering based on 3-variable; already_5, Already diagnosed with T2D sample clustering based on 5-variable; already_3, Already diagnosed with T2D sample clustering based on 3-variable.

Cluster A was generally older than MARD, and newly diagnosed patients were older than patients with a history of diabetes. Cluster B was generally younger than MOD and had a higher BMI. Cluster C and SIDD were essentially the same on all variables, and BMI was generally higher in males than in females. Compared with SIRD, cluster D was younger in the newly diagnosed sample, and BMI was lower in the newly diagnosed than in the already diagnosed. HbA1c was essentially the same in cluster D and SIRD, although it was significantly lower in the newly diagnosed than in the already diagnosed. HOMA2-B and HOMA2-IR were significantly lower in cluster D than in SIRD, indicating that the high insulin resistance characteristic of SIRD is difficult to identify without the HOMA2 indices. By contrast, even without the HOMA2 indicators, clusters A, B and C had characteristics broadly similar to those of MARD, MOD and SIDD, respectively.

Classification performance

Online supplemental table S2 shows the performance of the three-variable classification compared with the five-variable classification. The overall classification accuracy was 0.74 (95% CI 0.72 to 0.76), and kappa was 0.62. Classification was slightly more accurate in females than in males (accuracy 0.73 vs 0.72) and more accurate in already diagnosed than in newly diagnosed diabetes (accuracy 0.77 vs 0.73).
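Cohen's kappa corrects the observed agreement for the agreement expected by chance from the row and column totals of the confusion matrix, κ = (p_o − p_e)/(1 − p_e). A minimal sketch, assuming rows are the three-variable (predicted) subgroups and columns are the five-variable (reference) subgroups; the function name is hypothetical.

```python
def cohen_kappa(m):
    """Cohen's kappa from a square confusion matrix given as a list of lists."""
    k = len(m)
    total = sum(sum(row) for row in m)
    observed = sum(m[i][i] for i in range(k)) / total
    row_totals = [sum(m[i]) for i in range(k)]
    col_totals = [sum(m[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of marginal proportions, summed over classes.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    return (observed - expected) / (1 - expected)
```

Because kappa discounts chance agreement, it is lower than raw accuracy whenever the subgroups are unevenly sized.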

The precision (positive predictive value) of clusters A, B, C and D was 0.93, 0.84, 0.95 and 0.16; their recall (sensitivity) was 0.76, 0.70, 0.89 and 0.42; and their balanced accuracy was 0.85, 0.83, 0.94 and 0.60, respectively. Apart from cluster D, the subgroups were classified well, and cluster C had the highest balanced accuracy (0.94). Results for males, females, the newly diagnosed and the already diagnosed were similar to those for the total sample, although cluster A was classified better in the already diagnosed than in the total sample. Overall, classification performance did not differ markedly by sex or stage of diabetes (table 2).
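The per-subgroup metrics above follow one-versus-rest definitions, where balanced accuracy is the mean of sensitivity and specificity, as in the output of caret's confusionMatrix. The sketch below assumes the same layout as caret (rows = predicted, columns = reference); the function name is hypothetical, and undefined ratios are returned as NaN rather than raising an error.

```python
import math


def _ratio(num, den):
    # NaN instead of ZeroDivisionError when a class is never predicted or observed.
    return num / den if den else math.nan


def per_class_metrics(m):
    """One-vs-rest metrics; m[i][j] counts predicted class i with reference class j."""
    k = len(m)
    total = sum(sum(row) for row in m)
    results = []
    for c in range(k):
        tp = m[c][c]
        fp = sum(m[c]) - tp                          # predicted c, reference other
        fn = sum(m[i][c] for i in range(k)) - tp     # reference c, predicted other
        tn = total - tp - fp - fn
        sensitivity = _ratio(tp, tp + fn)
        specificity = _ratio(tn, tn + fp)
        results.append({
            "precision": _ratio(tp, tp + fp),
            "recall": sensitivity,
            "specificity": specificity,
            "balanced_accuracy": (sensitivity + specificity) / 2,
        })
    return results
```

A small, rarely correct class such as cluster D can therefore have low precision while its balanced accuracy stays moderate, because specificity is driven by the much larger remaining classes.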

Table 2

Performance in classifying the four subgroups

The redistribution of participants is shown in a Sankey diagram (figure 2). Overall, 517 (26.38%) participants switched to a different subgroup, mainly from MARD and MOD to cluster D. The redistribution in the male, female, newly diagnosed and already diagnosed samples was similar to that in the total sample.

Figure 2

Redistribution of patients among the subgroups. MARD, mild age-related diabetes; MOD, mild obesity-related diabetes; SIDD, severe insulin-deficient diabetes; SIRD, severe insulin-resistant diabetes.

Risk-subgroups association

In the total sample, except for the MARD subgroup, there were no significant differences in the prevalence of poor CVH, CKD, NAFLD and advanced liver fibrosis between each pair of corresponding subgroups obtained with the three-variable and five-variable approaches (all 95% CIs of the ORs included 1). The prevalence of poor CVH in cluster A was significantly lower than in MARD (adj. OR 0.70, 95% CI 0.56 to 0.89). Mortality did not differ between any pair of corresponding subgroups classified by the two approaches (all 95% CIs of the HRs included 1). Results were broadly similar across sexes and stages of diabetes (table 3).

Table 3

Outcome risks for each subgroup: five-variable versus three-variable approach

Table 4 lists the ORs of diabetic complications among the subgroups based on the three-variable approach. Compared with cluster A, cluster B had the highest prevalence of poor CVH (adj. OR: 5.95, 95% CI 4.08 to 8.66), and clusters C and D also had a significantly higher prevalence of poor CVH. Cluster C had the lowest prevalence of CKD (adj. OR: 0.21, 95% CI 0.11 to 0.41), and cluster B also had a significantly lower prevalence. Cluster D had a significantly higher prevalence of NAFLD (adj. OR: 1.68, 95% CI 1.10 to 2.57), whereas the other subgroups did not differ significantly from cluster A. Among the four subgroups, cluster C had the lowest prevalence of advanced liver fibrosis (adj. OR: 0.31, 95% CI 0.17 to 0.57), while cluster D had the highest (adj. OR: 1.89, 95% CI 1.31 to 2.73).

Table 4

Different risks among the four subgroups

After a median follow-up of 98 (95% CI 96 to 101) months, the HRs of all-cause, CVD and cancer-related mortality among the subgroups are shown in table 4. Compared with cluster A, the adjusted HRs (95% CI) of all-cause mortality for clusters B, C and D were 0.42 (0.30 to 0.60), 0.38 (0.21 to 0.69) and 0.66 (0.48 to 0.93), respectively; cluster A therefore had the highest all-cause mortality. Cluster B had a significantly lower CVD mortality risk than cluster A (adj. HR: 0.35, 95% CI 0.15 to 0.81), whereas the other subgroups did not differ significantly. Cancer-related mortality did not differ significantly among the four subgroups (all 95% CIs of the HRs included 1).

In the male sample, the prevalence of NAFLD and advanced liver fibrosis in cluster D did not differ significantly from that in cluster A (adj. OR: 1.18, 95% CI 0.70 to 1.98, and 1.22, 95% CI 0.76 to 1.98, respectively), and the risk of CVD mortality did not differ significantly between cluster B and cluster A (adj. HR: 0.64, 95% CI 0.34 to 1.20). In the female sample, the prevalence of advanced liver fibrosis did not differ significantly between cluster C and cluster A (adj. OR: 0.71, 95% CI 0.30 to 1.67), and the mortality risk estimates for each subgroup were consistent with those in males. In the newly diagnosed sample, clusters C and D did not differ significantly from cluster A in the prevalence of poor CVH (the 95% CIs of the ORs included 1). Compared with cluster A, cluster D in this sample showed no significant difference in the prevalence of NAFLD (adj. OR: 1.30, 95% CI 0.68 to 2.48) but a significantly lower prevalence of advanced liver fibrosis (adj. OR: 0.40, 95% CI 0.17 to 0.94). In the already diagnosed sample, the prevalence of NAFLD in cluster D was not significantly higher than in cluster A (adj. OR: 1.61, 95% CI 0.88 to 2.95), and its all-cause mortality risk was not significantly lower (adjusted HR 0.78, 95% CI 0.52 to 1.16).

Discussion

In this study, we tested the reliability of stratifying T2D without HOMA2 indicators, as well as the utility of the simplified clustering method for predicting diabetes-related outcomes, including poor CVH, CKD, NAFLD, advanced liver fibrosis, and cancer-related, CVD-related and all-cause mortality.

Our study shows that without HOMA measurements, the MARD, MOD and SIDD subgroups can still be identified well. SIDD is characterised by a significantly higher HbA1c than the other subgroups, MARD by older age and MOD by higher BMI, so age, BMI and HbA1c are sufficient to identify these three subgroups. SIRD, by contrast, is difficult to identify because none of the three variables captures insulin resistance. We also attempted to add reliable and inexpensive surrogate markers of insulin resistance, such as the triglyceride-glucose index (TyG),27 28 TyG-body mass index (TyG-BMI),29 TyG-waist circumference (TyG-WC)29 and the triglyceride to HDL-C ratio (TG/HDL-C).30 However, none of these indicators matched the distribution of HOMA2-IR (online supplemental figure S3). In the NHANES 1999–2014, the correlations between TyG, TyG-BMI, TyG-WC, TG/HDL-C and HOMA2-IR were low. The correlations were higher in participants without diabetes than in those with diabetes; among those with diabetes, the correlation between TyG-WC and HOMA2-IR was the strongest (online supplemental table S3), but TyG-WC still could not replace HOMA2-IR. The relationship between these indices and HOMA2-IR may be affected by many factors, such as race, obesity and health status, which require further investigation. Furthermore, our study showed that the three-variable and five-variable classifications performed similarly in risk prediction.
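The surrogate indices mentioned above have simple closed forms. The sketch below assumes the commonly used definitions (TyG = ln[TG (mg/dL) × FPG (mg/dL)/2], TyG-BMI = TyG × BMI, TyG-WC = TyG × waist circumference) and, because the text does not name the correlation coefficient, uses Pearson's r as an illustrative choice; the function names are hypothetical.

```python
import math


def tyg(tg_mg_dl, fpg_mg_dl):
    """Triglyceride-glucose index (assumed standard definition)."""
    return math.log(tg_mg_dl * fpg_mg_dl / 2)


def tyg_bmi(tg_mg_dl, fpg_mg_dl, bmi):
    return tyg(tg_mg_dl, fpg_mg_dl) * bmi


def tyg_wc(tg_mg_dl, fpg_mg_dl, waist_cm):
    return tyg(tg_mg_dl, fpg_mg_dl) * waist_cm


def tg_hdl(tg_mg_dl, hdl_mg_dl):
    return tg_mg_dl / hdl_mg_dl


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

In an analysis like the one described, each surrogate would be computed per participant and correlated with HOMA2-IR separately in those with and without diabetes.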

In the total sample, cluster A had the lowest prevalence of poor CVH, while cluster B had the highest, and within cluster B, CVH was worse in males than in females. An increasing number of studies indicate that CVH is related to the risk of CVD31 32 and of non-CVD outcomes.33 34 Males in cluster B therefore warrant particular attention to prevent future CVD and non-CVD outcomes, and they should be encouraged to change their lifestyle to improve their CVH, for example by improving diet quality, changing dietary behaviours35 and increasing aerobic exercise.36 For CKD, cluster A had the highest prevalence, which may be related to older age.37 Hence, for patients newly diagnosed and assigned to cluster A, the impact of CKD on drug selection should be taken into account when formulating treatment plans. The prevalence of both NAFLD and advanced liver fibrosis was high in cluster D, especially among women, which may be related to insulin resistance.38 Patients in this subgroup should be prioritised for liver assessment. Cluster A had the highest risk of all-cause and CVD mortality, probably largely because of older age. More attention should therefore be paid to this subgroup, particularly to screening for CVD, even though it had a low prevalence of poor CVH. Patients classified into cluster B may also have a relatively high risk of cardiovascular events given their high prevalence of poor CVH, so screening for cardiovascular events is also warranted in this subgroup.

Our study has several strengths. First, we used the five-variable classification as a reference for comparison. Second, we compared comprehensive risk prediction, including the prevalence of poor CVH, CKD, NAFLD and advanced liver fibrosis at diagnosis, as well as the risk of mortality many years later. Third, we explored the effects of sex and stage of diabetes on the results. However, several limitations should be noted. First, although participants diagnosed before the age of 30 were excluded, type 1 diabetes may still confound our results. Second, because FIB-4 and NFS are based on metabolic variables, these scores may perform less well in patients with diabetes than in the general NAFLD population.39 Third, there is no information on the risk of complications, and the follow-up period for mortality is relatively short. Further studies are needed to better understand the value of classifying T2D by simple variables.

Conclusions

In conclusion, a simple classification based on age, BMI and HbA1c could be used to stratify patients with T2D by several health and mortality risks. Because this method is clinically feasible in nearly all individuals with T2D, it has great potential for wide application in clinical practice and could allow more patients with T2D to benefit from subgroup-specific treatment paradigms.

Ethics statements

Patient consent for publication

Ethics approval

All participants gave written informed consent prior to participation, and the methods were approved by the National Center for Health Statistics.

Acknowledgments

We thank all participants who volunteered as part of the NHANES. We also thank Weiwei Duan for his help with research methods, and Wanke Cao and Min Zhang for their help with the revision of the manuscript. We also thank Guangzhou Huati United Translation Service for its contribution to the language polishing of the manuscript.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Contributors NFL had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. JX, YL and NFL made substantial contributions to the conception and design of the work. TS, SJ, JH and RH collected and assembled the data. JX, JW and YL contributed to the analysis and interpretation of data for the work. JX, HS and YS drafted the manuscript. All authors gave final approval of the version to be published and agree to be responsible for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

  • Funding This work was supported by grants from the big data industry development pilot demonstration project of Ministry of Industry and Information Technology of China [grant numbers 2019243, 202084].

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.