TY - GEN
T1 - Machine learning-powered prediction of recurrence in patients with non-small cell lung cancer using quantitative clinical and radiomic biomarkers
AU - Moon, Sehwa
AU - Choi, Dahim
AU - Lee, Ji Yeon
AU - Kim, Myoung Hee
AU - Hong, Helen
AU - Kim, Bong Seog
AU - Choi, Jang Hwan
N1 - Funding Information:
This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government, MSIP (grant no: NRF-2015M3A9A7029725, NRF-2017R1C1B5018287, and NRF-2017M2A2A6A02070522, URL: http://nrf.re.kr). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. This work was never submitted, published, or presented before.
Publisher Copyright:
© 2020 SPIE.
PY - 2020
Y1 - 2020
AB - Lung cancer is a fatal disease, non-small cell lung cancer (NSCLC) being the most prevalent type. One of the main purposes of researching NSCLC is identifying patients at high risk for recurrence after surgical resection so that specific and suitable treatments can be found for them. The classification of cancer by anatomic disease extent, that is, by tumor size (T stage) and nodal involvement (N stage), is the most widely accepted determinant of appropriate treatment and prognosis among practicing clinicians. However, TN stage-based risk prediction can be inaccurate, as there is moderate observer variability when reporting the size of the lesion. Here, we propose a lung cancer-recurrence prediction model using principal component analysis (PCA) and machine learning (ML) techniques and considering radiomic features and clinical data, including the TN stage. After being filtered by a statistical model, the principal components, including T- and N-stage data and the handcrafted radiomic features from CT images, were applied to various ML models (i.e., random forests, support vector machines, naive Bayesian classifiers, and both boosting). We studied not only recurrence overall but also recurrence within two years of surgical resection, since more than 80% of recurrences occur within this time frame. In both cases, the experimental results showed that combining radiomic features and clinical data improves the prediction of lung-cancer recurrence over that of models that only use TN stage data in terms of the mean 5-fold cross-validation accuracy, the receiver operating characteristic (ROC), the area under the ROC curve (AUC), and Kaplan-Meier curves. Finally, this model has been embedded in a website and is being prepared for the Ministry of Food and Drug Safety (MFDS) medical device registration and approval in South Korea.
KW - hand-crafted features
KW - machine learning
KW - non-small cell lung cancer
KW - radiomics
KW - recurrence risk
UR - http://www.scopus.com/inward/record.url?scp=85085516010&partnerID=8YFLogxK
U2 - 10.1117/12.2549962
DO - 10.1117/12.2549962
M3 - Conference contribution
AN - SCOPUS:85085516010
T3 - Progress in Biomedical Optics and Imaging - Proceedings of SPIE
BT - Medical Imaging 2020
A2 - Hahn, Horst K.
A2 - Mazurowski, Maciej A.
PB - SPIE
T2 - Medical Imaging 2020: Computer-Aided Diagnosis
Y2 - 16 February 2020 through 19 February 2020
ER -