Objectives To validate the performance of two prediction models (the Brock and Lee models) in differentiating minimally invasive adenocarcinoma (MIA) and invasive pulmonary adenocarcinoma (IPA) from preinvasive lesions among subsolid nodules (SSNs).
Design A retrospective cohort study.
Setting A tertiary university hospital in South Korea.
Participants 410 patients with 410 incidentally detected SSNs who underwent surgical resection for lesions of the pulmonary adenocarcinoma spectrum between 2011 and 2015.
Primary and secondary outcome measures Using clinical and radiological variables, the predicted probability of MIA/IPA was calculated from pre-existing logistic models (Brock and Lee models). Areas under the receiver operating characteristic curve (AUCs) were calculated and compared between models. Performance metrics including sensitivity, specificity, accuracy, positive predictive value (PPV) and negative predictive value (NPV) were also obtained.
Results For pure ground-glass nodules (n=101), the AUC of the Brock model in differentiating MIA/IPA (59/101) from preinvasive lesions (42/101) was 0.671. Sensitivity, specificity, accuracy, PPV and NPV at the optimal cut-off value were 64.4%, 64.3%, 64.4%, 71.7% and 56.3%, respectively. Sensitivity, specificity, accuracy, PPV and NPV according to the Lee criteria were 76.3%, 42.9%, 62.4%, 65.2% and 56.3%, respectively. No AUC was obtained for the Lee model because, for pure ground-glass nodules, this model proposes only a single nodule-size cut-off (≥10 mm). For part-solid nodules (n=309; 26 preinvasive lesions and 283 MIA/IPAs), the AUC was 0.746 for the Brock model and 0.771 for the Lee model (p=0.574). Sensitivity, specificity, accuracy, PPV and NPV were 82.3%, 53.8%, 79.9%, 95.1% and 21.9%, respectively, for the Brock model and 77.0%, 69.2%, 76.4%, 96.5% and 21.7%, respectively, for the Lee model.
Conclusions The performance of existing prediction models in differentiating MIA/IPA from preinvasive lesions among incidentally detected SSNs may be suboptimal. Thus, an alternative risk calculation model is needed for incidentally detected SSNs.
- subsolid nodule
- prediction model
- logistic model
- external validation
- Brock model
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Strengths and limitations of this study
This is the first study to externally validate the performance of pre-existing risk prediction models for incidentally detected pulmonary subsolid nodules.
This study performed head-to-head comparisons between the prediction models for the risk stratification of subsolid nodules.
The main limitation of this study is that only surgically resected lung nodules were analysed, which introduces selection bias.
The study population was relatively small for separate analyses of pure ground-glass nodules and part-solid nodules.
Pulmonary subsolid nodules (SSNs) represent a histological spectrum of adenocarcinoma and its preinvasive precursors, including atypical adenomatous hyperplasia (AAH) and adenocarcinoma in situ (AIS).1 SSNs are common findings on chest CT and have been increasingly detected in CT screening studies.2 3 Indeed, in one prospective screening study, 4.2% of participants had at least one pure ground-glass nodule (pGGN) and 5.0% had at least one part-solid nodule (PSN) at the baseline round of screening.2
With this prevalence in mind, numerous studies have justifiably focused on differentiating invasive adenocarcinomas from preinvasive lesions4–13 because invasive adenocarcinoma requires surgical resection with conventional lobectomy and lymph node dissection,14 whereas preinvasive lesions can be followed up conservatively with annual CT surveillance or resected less extensively (sublobar resection).15 Thus, the identification of invasive adenocarcinoma has been a major topic of interest for radiologists and clinicians.
In a quest to obtain quantitative risk prediction tools for pulmonary nodules, McWilliams et al 16 developed a prediction model (the Brock model) using various clinical and radiological features. The Brock model showed higher accuracy in estimating the likelihood of malignancy in pulmonary nodules than other existing models17 and has been externally validated in three independent screening populations.18–20 Nevertheless, because a substantial proportion of persistent SSNs may belong to the adenocarcinoma spectrum, the model's performance in differentiating invasive adenocarcinoma should also be validated to support the routine use of the model recommended by the British Thoracic Society (BTS).21
Lee et al 7 also developed a prediction model (the Lee model) using simple size metrics and morphological features to differentiate invasive adenocarcinomas appearing as SSNs. The model's accuracy for identifying invasive adenocarcinomas was reported to be excellent. However, this model has not been externally tested or validated either.
Therefore, we aimed to validate the performance of the two prediction models (Brock and Lee models) in differentiating minimally invasive adenocarcinomas (MIAs) and invasive pulmonary adenocarcinomas (IPAs) from preinvasive lesions among SSNs. The purpose of our study was to evaluate the feasibility of using the two models for the risk stratification of persistent SSNs.
We retrospectively reviewed the electronic medical records of our hospital and identified 1915 patients who had undergone surgical resection for lung cancer between 2011 and 2015. Among them, 1073 patients had pathological diagnoses within the pulmonary adenocarcinoma spectrum, including AAH, AIS, MIA and IPA.1 22 We then reviewed the thin-section CT images of these patients to include only those with SSNs (reviewers: JSK, JHL, SYA, REY, HL and HK); 548 patients whose lung cancers appeared as solid nodules on CT were excluded. We also excluded 76 patients with nodules smaller than 5 mm or larger than 3 cm and 39 patients for whom data on family history of lung cancer were unavailable. Consequently, 410 patients were included in this study. Among these patients, 18 had two nodules and one had three nodules; to remove within-subject correlation, a single nodule was randomly selected for each of these 19 patients. Therefore, a total of 410 nodules from 410 patients were analysed in the present study (figure 1). There were 174 men and 236 women (median age, 61 years; IQR: 54–69 years). As for nodule type, there were 101 pGGNs and 309 PSNs. IPA was found in 290 nodules, followed by MIA in 52, AIS in 51 and AAH in 17. The median nodule size was 15.8 mm (IQR: 11.8–20.9 mm) (table 1).
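The exclusion cascade and the one-nodule-per-patient step described above can be sketched in Python as follows. The counts are those reported in the text; the `(patient_id, nodule_id)` tuple layout, the hypothetical function name `one_nodule_per_patient` and the fixed random seed are illustrative assumptions, not the study's actual data handling.

```python
import random
from collections import defaultdict

# Exclusion cascade, using the counts reported in the text.
start = 1073  # resected lesions within the adenocarcinoma spectrum (AAH/AIS/MIA/IPA)
exclusions = {
    "solid nodules on CT": 548,
    "nodule <5 mm or >3 cm": 76,
    "family history of lung cancer unavailable": 39,
}
remaining = start
for reason, count in exclusions.items():
    remaining -= count
    print(f"excluded {count:>3} ({reason}); {remaining} patients remain")

# Histological breakdown of the 410 analysed nodules.
histology = {"IPA": 290, "MIA": 52, "AIS": 51, "AAH": 17}
invasive = histology["IPA"] + histology["MIA"]      # MIA/IPA
preinvasive = histology["AIS"] + histology["AAH"]   # AAH/AIS
print(f"invasive {invasive}, preinvasive {preinvasive} "
      f"({preinvasive / (invasive + preinvasive):.1%} preinvasive)")


def one_nodule_per_patient(nodules, seed=0):
    """Randomly keep one nodule per patient to remove within-subject correlation.

    `nodules` is a list of (patient_id, nodule_id) tuples (hypothetical layout).
    A seeded generator makes the selection reproducible.
    """
    rng = random.Random(seed)
    by_patient = defaultdict(list)
    for patient_id, nodule_id in nodules:
        by_patient[patient_id].append(nodule_id)
    return {pid: rng.choice(ids) for pid, ids in by_patient.items()}


example = [("P1", "N1"), ("P1", "N2"), ("P2", "N3"),
           ("P3", "N4"), ("P3", "N5"), ("P3", "N6")]
print(one_nodule_per_patient(example))
```

Seeding the generator is a design choice so that the random selection can be reproduced in a later reanalysis.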
Patient characteristics, including demographic data, were collected from the electronic medical records of Seoul National University Hospital. Patient age, sex, pathological diagnosis, family history of lung cancer and nodule location (lobe) were recorded. The thin-section CT images were also reviewed to obtain radiological information on the nodules (nodule type, nodule size, solid portion size, solid proportion, lobulation, spiculation and nodule count per scan) and the background lung parenchyma (presence of visually detected emphysema). These features were used as input variables for the Brock16 and Lee7 logistic regression models. Nodule size and solid portion size were measured as the maximum transverse diameter (mm) using an electronic calliper. Solid proportion (%) was calculated as the solid portion size divided by the nodule size. Nodule count was defined as the total number of non-calcified nodules at least 1 mm in diameter.16 Image review was conducted by three radiologists (JP, WHL and HK), and each nodule was analysed once by one of them. Details of the CT scanning protocols are described in the online supplementary material.
We previously analysed and reported the measurement variability of SSN size and solid portion size using two same-day repeat CT scans.23 The measurement variability range for the maximum transverse diameter of SSNs on lung window CT images was ±2.2 mm, and that for the solid portion was ±3.7 mm. Inter-reader agreement (κ) for nodule type ranged from 0.80 to 0.96. Therefore, we did not re-evaluate measurement variability or inter-reader agreement for nodule type in this study.
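A per-nodule record holding the inputs listed above could look like the following sketch. The class name, field names and the choice to store the solid proportion as a derived property rather than a measured field are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NoduleRecord:
    """Inputs collected per nodule (field names are hypothetical)."""
    age: float              # years
    female: bool
    family_history: bool    # family history of lung cancer
    emphysema: bool         # visually detected on CT
    lobe: str               # e.g. "RUL", "RML", "RLL", "LUL", "LLL"
    nodule_type: str        # "pGGN" or "PSN"
    size_mm: float          # maximum transverse diameter of the whole nodule
    solid_mm: float         # maximum transverse diameter of the solid portion (0 for pGGN)
    lobulation: bool
    spiculation: bool
    nodule_count: int       # non-calcified nodules >=1 mm on the same scan

    @property
    def solid_proportion(self) -> float:
        """Solid portion size divided by nodule size, expressed in percent."""
        return 100.0 * self.solid_mm / self.size_mm


n = NoduleRecord(age=61, female=True, family_history=False, emphysema=False,
                 lobe="RUL", nodule_type="PSN", size_mm=16.0, solid_mm=8.0,
                 lobulation=True, spiculation=False, nodule_count=2)
print(n.solid_proportion)  # 50.0
```

Because the proportion is computed per nodule, a median solid proportion need not equal the ratio of the median solid portion size to the median nodule size.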
To investigate whether the variables incorporated in the established models (Brock and Lee models) were significantly different between preinvasive (AAH and AIS) and invasive lesions (MIA and IPA), we first performed a univariate analysis. Categorical variables were analysed using the Pearson χ2 test or Fisher’s exact test, and continuous variables were analysed using the independent t-test or Mann-Whitney U test, as appropriate.
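For 2×2 tables, the two categorical tests named above can be written directly with the Python standard library. This is a sketch, not the software used in the study: the Pearson statistic is computed without continuity correction, and the two-sided Fisher p value sums the probabilities of all tables no more likely than the observed one (a common, but not the only, definition). In our recomputation from the counts given in the Results, this sketch gives p≈0.025 for lobulation and p≈0.006 for spiculation among PSNs (Pearson) and p≈0.040 for family history among pGGNs (Fisher), in line with the reported values.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]] (1 df).

    Rows are groups (invasive, preinvasive); columns are feature present/absent.
    Returns (statistic, two-sided p value).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for [[a, b], [c, d]] with fixed margins."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


# Counts from the Results: invasive vs preinvasive, feature present vs absent.
_, p_lob = pearson_chi2_2x2(118, 283 - 118, 5, 26 - 5)    # PSN lobulation
_, p_spic = pearson_chi2_2x2(122, 283 - 122, 4, 26 - 4)   # PSN spiculation
p_fam = fisher_exact_2x2(6, 59 - 6, 0, 42 - 0)            # pGGN family history
print(round(p_lob, 3), round(p_spic, 3), round(p_fam, 3))
```

Fisher's test is preferred when expected cell counts are small, as for family history among pGGNs, where no preinvasive case had a positive history.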
We then calculated the predicted probability from each logistic regression model. For the Brock model, the full model with spiculation was used, with input variables of age, sex, family history of lung cancer, emphysema, nodule size, nodule type, nodule location, nodule count per scan and spiculation.16 Regression coefficients and the model constant were taken from the original paper.16 Nodule size was power-transformed before entry, as described previously.16 Age and nodule count were centred at 62 years and 4 nodules, respectively.16 The predicted probability of each nodule, a continuous value from 0 to 1 (0% to 100%), was recorded. For the Lee model, two different methods were used. For pGGNs, a single nodule-size cut-off (≥10 mm) was used to identify invasive lesions, as stated by Lee et al.7 For PSNs, four variables (nodule size, solid proportion, lobulation and spiculation) were substituted into the regression formula reported by Lee et al.7
This logistic regression formula was originally developed to predict the probability of a preinvasive lesion. Therefore, the predicted probability of an invasive lesion was calculated as 1 minus the probability of being a preinvasive lesion. For the Lee model, predicted probabilities were thus obtained only for PSNs. No preprocessing of variables was performed.
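The Brock calculation described above can be sketched as follows. The coefficient values are those we understand to be published for the full model with spiculation (as reproduced, for example, in the BTS guideline appendix); treat them as an assumption to be checked against the original paper16 before any use. The power transformation subtracts 1.58113883, which equals (4/10) raised to the power −0.5, so the size term is zero for a 4 mm nodule. The function and argument names are hypothetical.

```python
import math

# ASSUMED coefficients of the Brock full model with spiculation;
# verify against McWilliams et al before relying on them.
BROCK_FULL_SPICULATION = {
    "intercept": -6.7892,
    "age_per_year": 0.0287,       # multiplies (age - 62)
    "female": 0.6011,
    "family_history": 0.2961,
    "emphysema": 0.2953,
    "size": -5.3854,              # multiplies the power-transformed size term
    "type_nonsolid": -0.1276,     # pure ground-glass nodule
    "type_part_solid": 0.3770,
    "upper_lobe": 0.6581,
    "count_per_nodule": -0.0824,  # multiplies (nodule count - 4)
    "spiculation": 0.7729,
}


def brock_probability(age, female, family_history, emphysema, size_mm,
                      nodule_type, upper_lobe, nodule_count, spiculation,
                      coef=BROCK_FULL_SPICULATION):
    """Predicted probability (0 to 1) from the Brock full model with spiculation."""
    # Power transformation of size, centred so that a 4 mm nodule contributes 0.
    size_term = (size_mm / 10.0) ** -0.5 - 1.58113883
    type_term = {"pGGN": coef["type_nonsolid"],
                 "PSN": coef["type_part_solid"],
                 "solid": 0.0}[nodule_type]
    logit = (coef["intercept"]
             + coef["age_per_year"] * (age - 62)
             + coef["female"] * female
             + coef["family_history"] * family_history
             + coef["emphysema"] * emphysema
             + coef["size"] * size_term
             + type_term
             + coef["upper_lobe"] * upper_lobe
             + coef["count_per_nodule"] * (nodule_count - 4)
             + coef["spiculation"] * spiculation)
    return 1.0 / (1.0 + math.exp(-logit))
```

For example, `brock_probability(61, True, False, False, 15.0, "PSN", True, 2, True)` returns a probability between 0 and 1 that can then be compared with a cut-off such as 10% (0.10). Boolean inputs are multiplied directly by their coefficients, so `True` contributes the coefficient and `False` contributes zero.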
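The complement step can be sketched as below. The published Lee coefficients are not reproduced here, so `coef` must be filled from the original paper;7 its key names, and the assumption that each variable enters the formula linearly in the units shown, are hypothetical and should be checked against that paper. The pGGN rule is the single size cut-off stated above.

```python
import math


def lee_invasive_probability_psn(size_mm, solid_proportion_pct, lobulation,
                                 spiculation, coef):
    """Probability that a part-solid nodule is invasive (MIA/IPA).

    The Lee formula models the probability of a *preinvasive* lesion, so the
    invasive probability is its complement. `coef` keys are hypothetical names;
    fill them with the published values (none are assumed here).
    """
    logit_preinvasive = (coef["intercept"]
                         + coef["size"] * size_mm
                         + coef["solid_proportion"] * solid_proportion_pct
                         + coef["lobulation"] * lobulation
                         + coef["spiculation"] * spiculation)
    p_preinvasive = 1.0 / (1.0 + math.exp(-logit_preinvasive))
    return 1.0 - p_preinvasive


def lee_pggn_is_invasive(size_mm):
    """Lee criterion for pure ground-glass nodules: size >= 10 mm suggests invasive."""
    return size_mm >= 10.0
```

Because the pGGN rule returns a yes/no call rather than a probability, it yields a single operating point, which is why no AUC can be computed for pGGNs with the Lee model.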
Using the predicted probabilities obtained from each model, receiver operating characteristic (ROC) curve analysis was performed to investigate the discriminative performance of the prediction models in diagnosing invasive lesions. Areas under the ROC curve (AUCs) were obtained, and an optimal cut-off value based on the Youden Index was recorded. We calculated the sensitivity, specificity, accuracy, positive predictive value (PPV) and negative predictive value (NPV) of each model at the optimal cut-off. For the Brock model, two different cut-offs were applied: (1) a threshold of 10% risk of malignancy, as suggested by the BTS,21 and (2) an optimal cut-off based on the Youden Index. For the Brock model, ROC analysis was performed separately for each nodule type and then for all SSNs combined. For the Lee model, ROC analysis was performed only for PSNs.
AUCs were compared between the models using DeLong's method.24 Because predicted probabilities for pGGNs were not available for the Lee model, AUCs were compared only for PSNs. Diagnostic accuracy was also compared between the models using the McNemar test.
Lastly, model calibration was assessed using the Hosmer-Lemeshow test with 10 groups based on deciles of predicted probability. All statistical analyses were performed using two commercial software programs (MedCalc V.12.3.0; MedCalc Software, Mariakerke, Belgium and SPSS V.19.0; IBM SPSS Statistics) and R software V.3.1.0 (http://www.R-project.org; PredictABEL package). A p value <0.05 was considered to indicate statistical significance.
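The ROC steps above can be illustrated with a small sketch: the AUC computed as the Mann-Whitney probability that a randomly chosen invasive lesion receives a higher predicted probability than a randomly chosen preinvasive lesion, and a Youden cut-off searched over the observed scores. Calling a nodule invasive when its score is greater than or equal to the threshold is an assumed convention; statistical packages may report the cut-off slightly differently (for example, as "greater than" a value). The toy data are invented.

```python
def auc_mann_whitney(scores, labels):
    """AUC = P(score of an invasive lesion > score of a preinvasive lesion); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def confusion_at(scores, labels, threshold):
    """Return (TP, FP, TN, FN) when score >= threshold is called invasive (label 1)."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        called_invasive = s >= threshold
        if called_invasive and y == 1:
            tp += 1
        elif called_invasive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_cutoff(scores, labels):
    """Threshold maximising J = sensitivity + specificity - 1 (first one wins on ties)."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion_at(scores, labels, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy data (not study data): 1 = MIA/IPA, 0 = AAH/AIS.
scores = [0.10, 0.20, 0.30, 0.40, 0.35, 0.80]
labels = [0, 0, 1, 1, 0, 1]
print(auc_mann_whitney(scores, labels))   # 8/9
print(youden_cutoff(scores, labels))
```

The quadratic pairwise loop is fine for a few hundred nodules; a rank-based formula gives the same AUC more efficiently for larger data sets.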
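For readers who want to reproduce the paired comparisons, a standard-library sketch of DeLong's test for two correlated AUCs and an exact McNemar test follows. It is an illustration, not MedCalc's implementation: MedCalc may use a different McNemar variant (for example, a chi-square approximation), and the toy scores below are invented.

```python
import math


def _placements(pos, neg):
    """DeLong structural components: V10 for each positive, V01 for each negative."""
    def psi(x, y):
        return 1.0 if x > y else 0.5 if x == y else 0.0
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance (denominator len - 1)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_paired(scores1, scores2, labels):
    """Two-sided DeLong test for two AUCs measured on the same nodules.

    Returns (auc1, auc2, z, p).
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    v10_1, v01_1 = _placements([scores1[i] for i in pos], [scores1[i] for i in neg])
    v10_2, v01_2 = _placements([scores2[i] for i in pos], [scores2[i] for i in neg])
    auc1, auc2 = sum(v10_1) / len(pos), sum(v10_2) / len(pos)
    var = ((_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / len(pos)
           + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / len(neg))
    if var <= 0:  # identical rankings: no detectable difference
        return auc1, auc2, 0.0, 1.0
    z = (auc1 - auc2) / math.sqrt(var)
    return auc1, auc2, z, math.erfc(abs(z) / math.sqrt(2))


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p value from the discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy example (invented scores, not study data).
labels = [0, 0, 1, 1, 0, 1, 1, 0]
model_a = [0.10, 0.20, 0.30, 0.40, 0.35, 0.80, 0.60, 0.05]
model_b = [0.20, 0.10, 0.50, 0.30, 0.45, 0.70, 0.25, 0.15]
print(delong_paired(model_a, model_b, labels))
print(mcnemar_exact(2, 7))
```

Both tests exploit the pairing: DeLong's variance subtracts the covariance between the two models' placement values, and McNemar uses only the nodules on which the two models disagree.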
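The decile-based Hosmer-Lemeshow test can be sketched as follows. Grouping into ten equal-sized slices of the sorted predictions and the default of groups minus 2 degrees of freedom are assumptions (conventions vary between packages, and some authors use more degrees of freedom when assessing an externally developed model); the p value helper uses the closed form of the chi-square survival function, which holds only for an even number of degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)**i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs, labels, groups=10, df=None):
    """Hosmer-Lemeshow statistic over equal-sized groups of sorted predicted risk.

    Returns (statistic, p value). Predicted probabilities must lie strictly
    between 0 and 1 so that every group's variance term is positive.
    """
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / len(chunk)
        stat += (observed - expected) ** 2 / (len(chunk) * mean_p * (1 - mean_p))
    if df is None:
        df = groups - 2
    return stat, chi2_sf_even_df(stat, df)


# Toy data: ten risk levels, 20 nodules each, with events exactly matching the risk.
probs, labels = [], []
for k in range(10):
    p = 0.05 + 0.1 * k
    events = 2 * k + 1                  # equals 20 * p
    probs += [p] * 20
    labels += [1] * events + [0] * (20 - events)
print(hosmer_lemeshow(probs, labels))   # statistic close to 0, p value close to 1
```

A small p value, as found for both models in this study, indicates that observed event rates in the risk groups depart from the predicted probabilities.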
Patient and public involvement
Patients and the public were not involved in developing the research question or outcome measures. No patients were involved in the design or conduct of the study. Dissemination of the study results to the participants was not practical given the retrospective nature of our study. Lastly, there were no patient advisers.
Pathological diagnoses of pGGNs and PSNs
Among 101 pGGNs, 42 were preinvasive and 59 were invasive lesions. As for the 309 PSNs, 26 were preinvasive and 283 were invasive lesions.
Comparisons between preinvasive and invasive lesions
For pGGNs, a family history of lung cancer was more frequently observed in patients with invasive lesions (6/59) than in those with preinvasive lesions (0/42; p=0.040). Invasive lesions (14.2±5.4 mm) were also significantly larger than preinvasive lesions (11.1±4.1 mm; p=0.002). In addition, patients with invasive lesions had a smaller nodule count per scan (invasive vs preinvasive lesions: median, 2 vs 4 nodules per scan; p=0.006). There were no significant differences in age, sex, presence of emphysema, nodule location, lobulation or spiculation (table 2).
For PSNs, nodule size, solid portion size and solid proportion were significantly larger in invasive lesions (invasive vs preinvasive lesions: median nodule size, 17.6 mm vs 13.6 mm, p<0.001; median solid portion size, 8.4 mm vs 4.6 mm, p<0.001; median solid proportion, 52.8% vs 36.8%, p=0.032). Lobulation and spiculation were more frequently observed in invasive lesions (invasive vs preinvasive lesions: lobulation, 118/283 vs 5/26, p=0.025; spiculation, 122/283 vs 4/26, p=0.006). There were no significant differences in age, sex, family history of lung cancer, presence of emphysema, nodule location and nodule count per scan (table 3).
SSN risk stratification using the Brock and Lee models
For pGGNs, the AUC of the Brock model's predicted probability for differentiating invasive from preinvasive lesions was 0.671 (95% CI: 0.571 to 0.762) (table 4). A cut-off of 10%, as suggested by the BTS, yielded a sensitivity, specificity, accuracy, PPV and NPV of 32.2%, 90.5%, 56.4%, 82.6% and 48.7%, respectively. A cut-off of 4.29%, the optimal threshold based on the Youden Index, yielded a sensitivity, specificity, accuracy, PPV and NPV of 64.4%, 64.3%, 64.4%, 71.7% and 56.3%, respectively. The Brock model showed poor calibration for pGGNs (p<0.001). The nodule size cut-off (10 mm) suggested by Lee et al 7 was also applied to our study population, yielding a sensitivity, specificity, accuracy, PPV and NPV of 76.3%, 42.9%, 62.4%, 65.2% and 56.3%, respectively. There were no significant differences in diagnostic accuracy between the Brock model and the Lee criteria (Brock cut-off 10% vs nodule size cut-off 10 mm, p=0.461; Brock cut-off 4.29% vs nodule size cut-off 10 mm, p=0.832).
For PSNs, the AUC of the Brock model for discriminating invasive from preinvasive lesions was 0.746 (95% CI: 0.694 to 0.794) (table 5). A cut-off of 10% yielded a sensitivity, specificity, accuracy, PPV and NPV of 82.3%, 53.8%, 79.9%, 95.1% and 21.9%, respectively. The optimal cut-off based on the Youden Index was 10.11%, very close to the threshold suggested by the BTS; therefore, performance metrics were not calculated separately for this cut-off. The AUC of the Lee model was 0.771 (95% CI: 0.720 to 0.817), and an optimal cut-off of 66.68% yielded a sensitivity, specificity, accuracy, PPV and NPV of 77.0%, 69.2%, 76.4%, 96.5% and 21.7%, respectively. Neither the AUCs nor the diagnostic accuracies differed significantly between the two models (p=0.574 and p=0.169, respectively). In addition, both models showed poor calibration (p<0.001).
In the pooled analysis of all SSNs, the AUC of the Brock model was 0.810 (95% CI: 0.769 to 0.846). Calibration of this model was also poor (p<0.001).
In this study, the AUCs of the established risk prediction models for differentiating invasive lesions among PSNs ranged from 0.746 to 0.771, with no significant difference between the two models. Diagnostic accuracies at the optimal cut-offs were 79.9% for the Brock model and 76.4% for the Lee model. For pGGNs, the diagnostic accuracy of the Brock model was 56.4%–64.4%, depending on the cut-off used, and that of the Lee criteria was 62.4%. For all SSNs combined, the Brock model showed an AUC of 0.810.
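As a consistency check, the five operating points reported above can be recomputed from confusion-matrix counts. The counts are back-calculated by us from the reported percentages and group sizes (pGGNs: 59 invasive, 42 preinvasive; PSNs: 283 invasive, 26 preinvasive); they are the only integer counts consistent with the reported sensitivities and specificities, but they are not taken from the paper's tables.

```python
def metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, PPV and NPV in percent."""
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "accuracy": 100 * (tp + tn) / (tp + fp + tn + fn),
        "PPV": 100 * tp / (tp + fp),
        "NPV": 100 * tn / (tn + fn),
    }


# name: ((TP, FP, TN, FN), reported sens, spec, acc, PPV, NPV in percent)
operating_points = {
    "pGGN, Brock >= 10%":      ((19, 4, 38, 40),   (32.2, 90.5, 56.4, 82.6, 48.7)),
    "pGGN, Brock >= 4.29%":    ((38, 15, 27, 21),  (64.4, 64.3, 64.4, 71.7, 56.3)),
    "pGGN, Lee size >= 10 mm": ((45, 24, 18, 14),  (76.3, 42.9, 62.4, 65.2, 56.3)),
    "PSN, Brock >= 10%":       ((233, 12, 14, 50), (82.3, 53.8, 79.9, 95.1, 21.9)),
    "PSN, Lee >= 66.68%":      ((218, 8, 18, 65),  (77.0, 69.2, 76.4, 96.5, 21.7)),
}

for name, (counts, reported) in operating_points.items():
    recomputed = list(metrics(*counts).values())
    # Allow 0.05 points for rounding: 27/48 is exactly 56.25%, reported as 56.3%.
    ok = all(abs(r - x) <= 0.051 for r, x in zip(recomputed, reported))
    print(f"{name:<25} {'matches' if ok else 'MISMATCH'}")
```

The low NPVs follow directly from the counts: among PSNs, only 14 of the 64 nodules below the Brock 10% cut-off were preinvasive.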
McWilliams et al 16 originally developed a lung cancer prediction model (the Brock model) using participants enrolled in a lung cancer screening study. Thus, the Brock model initially targeted pulmonary nodules detected on the first screening CT; incidentally detected nodules and surgical candidates were not its original target. At present, however, the BTS recommends the same diagnostic approach for incidentally detected nodules as for those detected at screening,21 and also recommends using the Brock model to calculate the risk of both solid nodules and SSNs.21 A cut-off of 10% predicted probability of malignancy is suggested to identify high-risk SSNs for biopsy or surgical resection.21 This quantitative approach aims to identify malignant SSNs with an acceptable false-positive rate. However, it must be noted that most persistent SSNs belong to one of the four categories of the adenocarcinoma spectrum: AAH, AIS, MIA and IPA. Therefore, the ability of the risk prediction model to discriminate lesions with invasive components (MIA and IPA) should also be tested using pathological diagnosis as the reference standard. Indeed, because clinical management differs substantially between preinvasive and invasive lesions, the ability to predict the invasiveness of SSNs would facilitate decisions on whether to perform annual CT surveillance, limited resection or conventional lobectomy.
According to our results, the performance of the prediction models for the risk stratification of SSNs was not optimal. For PSNs, the AUCs of the two models ranged from 0.746 to 0.771, with diagnostic accuracies of 76.4%–79.9%. In the original paper by Lee et al 7, the AUC of the model for PSNs was 0.905, and the study population of the present study was similar to theirs.7 An important reason for this drop in performance may be the spectrum effect, a common cause of heterogeneity in model performance.25 Variation in the assessment of CT morphological features (lobulation and spiculation) is another potential cause. Previous studies on distinguishing invasive adenocarcinomas appearing as SSNs have reported that logistic regression models built with size metrics, morphological features or texture features achieved AUCs ranging from 0.79 to 0.98.4–6 9 11 13 However, these models were not tested in an independent cohort or validated externally.
Another important finding of our study was that the PPV for identifying invasive lesions among PSNs was very high for both models (over 95%). In other words, more than 95% of nodules predicted to be invasive by these models were indeed invasive. A concern, however, is the high false-negative rate of these models. PSNs predicted to be preinvasive, that is, those with a low calculated risk, should be managed according to their solid portion size if they persist.26 Several studies have shown that the solid portion of PSNs correlates well with the pathological invasive component.12 27 28 The Fleischner Society guideline recommends that PSNs with solid components ≥6 mm be monitored with CT at 3–6-month intervals.26 PSNs with solid portions larger than 8 mm should be biopsied or surgically resected in consideration of invasive adenocarcinoma.26 The BTS also recommends that solid component size be considered to further refine the estimate of malignancy risk.21 In addition, a growing solid component is a sign of invasive adenocarcinoma, as described in both guidelines.21 26
The diagnostic accuracies of both the Brock model and the Lee criteria were even lower for pGGNs. Among the multiple clinical and radiological characteristics investigated in our study, only three variables (family history of lung cancer, nodule size and nodule count per scan) differed significantly between preinvasive and invasive lesions. This implies that other useful features are needed to develop new, better prediction models. Features such as nodule volume, mass or radiomic features may provide additional clues for this differentiation.4 29 In addition, changes in nodule characteristics on follow-up CT, such as an increase in nodule size or attenuation or the new development of a solid portion, may also be valuable.14 Alternatively, computational classification, including deep learning algorithms, which do not require hand-crafted features and learn directly from raw image pixels, may be another solution for the diagnosis of pGGNs.30
The Brock model has been externally validated in the cohorts of the Danish Lung Cancer Screening Trial19 and the National Lung Screening Trial18; the AUCs for discriminating malignant from benign nodules were 0.834 for the former and 0.963 for the latter. The AUC in the validation cohort of the original paper, the British Columbia Cancer Agency chemoprevention trial cohort, was 0.970.16 In addition, in an Australian lung cancer screening cohort, Zhao et al 20 tested the Brock model for the baseline evaluation of 52 SSNs and reported an AUC of 0.89. In contrast, the model performance in our study was lower than that reported in the literature. The main reason for this discrepancy may be that, unlike previous studies, we included only patients who underwent surgical resection of SSNs. Thus, the proportion of preinvasive lesions was small (16.6%), and most of our study population consisted of invasive lesions (83.4%). Such a high prevalence of invasive lesions would have affected our results. Nevertheless, the SSNs of interest in daily clinical practice may be closer to those in our study. In routine practice, transient SSNs, which are benign, do not require risk calculation because they are easily confirmed by short-term follow-up CT.26 In addition, small SSNs (<6 mm) are usually preinvasive and do not require CT surveillance. By contrast, particular attention should be paid to persistent SSNs ≥6 mm, especially those with solid components. As the role of biopsy or positron emission tomography is limited for SSNs,15 we reasoned that risk prediction models may help in planning more appropriate management. In this context, we applied the prediction models to surgically resected SSNs to validate their clinical utility.
There were several limitations to our study. First, as described earlier, our study was not conducted in a screening cohort. The prevalence of preinvasive lesions was low compared with that in the screening setting. Thus, the performance measures in this study should be interpreted carefully with respect to the target population, namely incidentally detected surgical candidates. Second, our retrospective study included a small number of patients, and analyses were conducted separately for pGGNs and PSNs. The separate analysis of pGGNs and PSNs resulted in a slight underestimation of the performance of the Brock model. Third, the optimal cut-offs for the models were derived from ROC analyses of our study population rather than of the original study populations from which the models were developed. Fourth, radiological nodule information was extracted from our heterogeneous CT dataset, in which acquisition parameters such as radiation dose, slice thickness and contrast enhancement were not uniform across the study population. However, these factors would have had little effect on the variables we used, and all CT scans had thin-section images (slice thickness ≤1.5 mm). Fifth, nodule size and solid portion size were measured as the longest transverse diameter, in accordance with the definitions of lesion size and solid proportion in the original papers. However, recent analyses have shown that using the average diameter as an input variable may improve model performance.31
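The prevalence argument above can be made concrete with Bayes' theorem: if sensitivity and specificity stayed fixed (itself an assumption, since the spectrum effect can change them), PPV and NPV would shift with the proportion of invasive lesions. The sketch uses the reported PSN sensitivity (82.3%) and specificity (53.8%) of the Brock model at the 10% cut-off; the 50% and 20% prevalences are hypothetical settings, not observed data.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV implied by sensitivity, specificity and prevalence (Bayes' theorem)."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


sens, spec = 0.823, 0.538  # Brock model, PSNs, 10% cut-off (reported values)
for prevalence in (283 / 309, 0.5, 0.2):  # study PSN prevalence, then hypothetical settings
    ppv, npv = predictive_values(sens, spec, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under these assumptions, the PPV falls and the NPV rises as the prevalence of invasive lesions drops, so the very high PPVs observed in this surgical cohort may not carry over to populations with more preinvasive lesions.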
In conclusion, the performance of the Brock model and Lee model for the differentiation of invasive lesions among SSNs was suboptimal. In particular, both models showed lower performance for pGGNs compared with that for PSNs. Thus, an alternative approach such as computer-aided classification should be developed for the preoperative diagnosis of invasive lesions among SSNs.
We would like to thank Chris Woo, BA, for editorial assistance.
Contributors HK and CMP contributed to conception and design; HK, SJ, JHL, SYA, R-EY, H-jL, JP, WHL, EJH, SML and JMG contributed to acquisition of data, or analysis and interpretation of data; HK, CMP and JMG were involved in drafting the manuscript or revising it critically for important intellectual content; all authors gave approval of the final version of the manuscript.
Funding This study was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (grant number: 2017R1A2B4008517).
Disclaimer The funder had no role in the study design; in the collection, analysis and interpretation of the data; in the writing of the report; or in the decision to submit the paper for publication.
Competing interests None declared.
Patient consent Not required.
Ethics approval This retrospective analysis was approved by the Institutional Review Board of Seoul National University Hospital (IRB No. 1705-116-855).
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement All data are available from the corresponding author.