Vol 26, No 6 (2021)
Research paper
Published online: 2021-08-12

open access

A data mining based clinical decision support system for survival in lung cancer

Beatriz Pontes1, Francisco Núñez2, Cristina Rubio1, Alberto Moreno2, Isabel Nepomuceno1, Jesús Moreno2, Jon Cacicedo3, Juan Manuel Praena-Fernandez4, German Antonio Escobar Rodriguez5, Carlos Parra5, Blas David Delgado León6,7, Eleonor Rivin del Campo8, Felipe Couñago9, Jose Riquelme1, Jose Luis Lopez Guerra6,7
Rep Pract Oncol Radiother 2021;26(6):839-848.


Background: A clinical decision support system (CDSS) was designed to predict outcome (overall survival) in lung cancer patients by extracting and analyzing information from routine clinical activity, as a complement to clinical guidelines.

Materials and methods: Prospective multicenter data from 543 consecutive lung cancer patients (2013–2017), comprising 1167 variables, were used to develop the CDSS. Data mining analyses were based on the XGBoost and Generalized Linear Models algorithms. The predictions obtained from the guidelines were compared with those of the proposed CDSS.

Results: Overall, the highest areas under the receiver operating characteristic curve (AUCs > 0.90) for predicting survival were obtained for small cell lung cancer patients. The AUCs for predicting survival using the basic items included in the guidelines were mostly below 0.70, while those obtained using the CDSS were mostly above 0.70. The vast majority of comparisons between the guideline and CDSS AUCs were statistically significant (p < 0.05). For instance, the AUC for predicting survival was 0.60 using the guidelines, while the CDSS raised it to 0.84 (p = 0.0009). In terms of histology, the only statistically significant difference was between the AUC for small cell lung cancer patients (0.96) and that for all lung cancer patients with longer (≥ 18 months) follow-up (0.80; p < 0.001).
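
For readers less familiar with the metric reported above, the AUC can be read as the probability that a model assigns a higher score to a randomly chosen positive case than to a randomly chosen negative one. The following is a minimal sketch of that definition only; it is not the authors' code, and the function name `auc` and the toy labels and scores are hypothetical values chosen for illustration, not study data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = event observed, 0 = no event.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.8, 0.4, 0.6, 0.2]  # made-up predicted risks
print(auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the guideline and CDSS values above are compared.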

Conclusions: The CDSS showed potential for enhancing the prediction of survival. It could assist physicians in formulating evidence-based management advice for patients with lung cancer, guiding an individualized discussion according to prognosis.



An addendum was enclosed with this article on 9 March 2022.

