Prognostic impact of deep learning-based quantification in clinical stage 0-I lung adenocarcinoma

Zhu, Ying; Chen, Li-Li; Luo, Ying-Wei; Zhang, Li; Ma, Hui-Yun; Yang, Hao-Shuai; Liu, Bao-Cong; Li, Lu-Jie; Zhang, Wen-Biao; Li, Xiang-Min; Xie, Chuan-Miao; Yang, Jian-Cheng; Wang, De-ling; Li, Qiong

doi:10.1007/s00330-023-09845-0

2023

Abstract

Objectives To evaluate the performance of an automatic deep learning (DL) algorithm for size, mass, and volume measurements in predicting the prognosis of lung adenocarcinoma (LUAD), and to compare it with manual measurements.

Methods A total of 542 patients with clinical stage 0-I peripheral LUAD and preoperative CT data of 1-mm slice thickness were included. Maximal solid size on axial image (MSSA) was evaluated by two chest radiologists. MSSA, volume of solid component (SV), and mass of solid component (SM) were evaluated by DL. Consolidation-to-tumor ratios (CTRs) were calculated. For ground glass nodules (GGNs), solid parts were extracted using different density thresholds. The prognosis prediction efficacy of DL was compared with that of manual measurements. A multivariate Cox proportional hazards model was used to identify independent risk factors.
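As an illustration of the threshold-based quantities named above, the following is a minimal sketch and not the authors' algorithm. It assumes that the nodule has already been segmented, that voxels at or above the threshold count as solid, that each ratio is the solid value divided by the whole-nodule value, and that mass is estimated with the commonly used approximation density ≈ (HU + 1000) / 1000 mg/mm^3. The function name, parameters, and output keys are hypothetical.

```python
def solid_component_metrics(hu_values, voxel_volume_mm3, threshold_hu=0.0):
    """Volume, mass, and solid-to-whole ratios for one segmented nodule.

    hu_values: HU value of every voxel inside the nodule mask (flat iterable).
    voxel_volume_mm3: volume of a single voxel in cubic millimetres.
    threshold_hu: voxels with HU >= this value are treated as solid
                  (an assumption; the abstract only mentions thresholds such as 0 HU).
    """
    hu = list(hu_values)
    if not hu:
        raise ValueError("the nodule mask contains no voxels")

    solid = [v for v in hu if v >= threshold_hu]

    def volume_mm3(voxels):
        return len(voxels) * voxel_volume_mm3

    def mass_mg(voxels):
        # Assumed approximation: density in mg/mm^3 is (HU + 1000) / 1000,
        # clamped at zero so values below -1000 HU add no negative mass.
        return sum(max(v + 1000.0, 0.0) / 1000.0 for v in voxels) * voxel_volume_mm3

    total_volume = volume_mm3(hu)
    total_mass = mass_mg(hu)
    solid_volume = volume_mm3(solid)
    solid_mass = mass_mg(solid)

    return {
        "solid_volume_mm3": solid_volume,
        "solid_mass_mg": solid_mass,
        "solid_volume_percent": 100.0 * solid_volume / total_volume,
        "solid_mass_percent": 100.0 * solid_mass / total_mass if total_mass else 0.0,
    }


if __name__ == "__main__":
    # Toy nodule: two ground-glass voxels and two solid voxels of 1 mm^3 each.
    print(solid_component_metrics([-600, -400, 50, 150], voxel_volume_mm3=1.0))
```

Changing threshold_hu changes which voxels count as solid, which is how the same nodule can yield different ratios at different density thresholds.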

Results The prognosis prediction efficacy of T-staging (TS) measured by radiologists was inferior to that of TS measured by DL. For GGNs, the MSSA-based CTR measured by radiologists ((R)MSSA%) could not stratify RFS and OS risk, whereas the MSSA-based CTR measured by DL using a 0 HU threshold ((2D-AI)MSSA(0HU)%) could do so at different cutoffs. The SM- and SV-based ratios measured by DL using a 0 HU threshold ((AI)SM(0HU)% and (AI)SV(0HU)%) effectively stratified survival risk regardless of the cutoff used and were superior to (2D-AI)MSSA(0HU)%. (AI)SM(0HU)% and (AI)SV(0HU)% were independent risk factors.

Conclusion The DL algorithm can replace human readers for more accurate T-staging of LUAD. For GGNs, (2D-AI)MSSA(0HU)%, rather than (R)MSSA%, could predict prognosis. (AI)SM(0HU)% and (AI)SV(0HU)% predicted prognosis more accurately than (2D-AI)MSSA(0HU)%, and both were independent risk factors.

Clinical relevance statement A deep learning algorithm could replace human readers for size measurements and could stratify prognosis better than manual measurements in patients with lung adenocarcinoma.