Volume 29, Issue 133 (March & April 2021) | J Adv Med Biomed Res 2021, 29(133): 100-108



Shanbehzadeh M, Nopour R, Kazemi-Arpanahi H. Comparison of Four Data Mining Algorithms for Predicting Colorectal Cancer Risk. J Adv Med Biomed Res 2021; 29(133): 100-108
URL: http://journal.zums.ac.ir/article-1-6082-en.html
1- Dept. of Health Information Technology, School of Paramedical, Ilam University of Medical Sciences, Ilam, Iran.
2- Dept. of Health Information Technology, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran.
3- Dept. of Health Information Technology, Abadan Faculty of Medical Sciences, Abadan, Iran; Hadi.kazemi67@yahoo.com
Abstract:

 Background and Objective: Colorectal cancer (CRC) is one of the most prevalent malignancies in the world. Early detection of CRC is not only a simple process but also the key to its treatment. Given that data mining algorithms could be useful in cancer prognosis, diagnosis, and treatment, this study measures the performance of several data mining classification algorithms in predicting CRC and providing an early warning to high-risk groups.
 Materials and Methods: This study was performed on 468 subjects (194 CRC patients and 274 non-CRC cases). We used the CRC dataset from the Imam Hospital, Sari, Iran. The Chi-square feature selection method was used to analyze the risk factors. Four popular data mining algorithms were then compared on their performance in predicting CRC, and the best algorithm was identified.
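 The Chi-square feature selection step described above can be illustrated with a minimal sketch. It assumes binary risk factors tabulated against CRC status in 2x2 tables; the feature names, cell counts, and the 0.05 significance threshold are hypothetical and are not taken from the study's dataset or protocol.

```python
# Hypothetical sketch: chi-square scoring of binary risk factors against CRC status.
# All feature names and counts below are invented for illustration; they are not
# taken from the study's dataset.


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table

                    factor present   factor absent
        CRC               a                b
        non-CRC           c                d
    """
    n = a + b + c + d
    row_crc, row_non = a + b, c + d
    col_yes, col_no = a + c, b + d
    if 0 in (row_crc, row_non, col_yes, col_no):
        raise ValueError("every row and column total must be non-zero")
    return n * (a * d - b * c) ** 2 / (row_crc * row_non * col_yes * col_no)


def rank_features(tables):
    """Return (feature, statistic) pairs sorted from strongest to weakest association."""
    scores = {name: chi_square_2x2(*cells) for name, cells in tables.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Each made-up table sums to 468 subjects: 194 CRC rows and 274 non-CRC rows.
example_tables = {
    "factor_a": (120, 74, 90, 184),
    "factor_b": (97, 97, 137, 137),  # identical proportions in both groups
}

# Chi-square critical value for 1 degree of freedom at alpha = 0.05 (assumed threshold).
CRITICAL_VALUE = 3.841

ranked = rank_features(example_tables)
selected = [name for name, score in ranked if score > CRITICAL_VALUE]
print(ranked)
print(selected)
```

 In this sketch a factor is retained only when its statistic exceeds the critical value, so a factor that is equally common in both groups scores zero and is dropped.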
 Results: The best outcome was obtained by J-48 (F-measure = 0.826, ROC = 0.881, precision = 0.826, and sensitivity = 0.827). Bayesian Net was the second-best performer (F-measure = 0.718, ROC = 0.784, precision = 0.719, and sensitivity = 0.722). Random Forest ranked third (F-measure = 0.705, ROC = 0.758, precision = 0.719, and sensitivity = 0.712). Finally, the MLP technique performed the worst (F-measure = 0.702, ROC = 0.76, precision = 0.701, and sensitivity = 0.703).
 Conclusion: Based on these results, we concluded that J-48 could provide better insights for clinical applications than the other prediction models evaluated.
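 The precision, sensitivity, and F-measure reported in the Results are standard confusion-matrix quantities. The sketch below shows how they are computed for a single positive class. The counts are hypothetical (chosen only to match the cohort size of 194 CRC and 274 non-CRC subjects) and do not reproduce any reported figure; how the study averaged these metrics across classes is not stated in the abstract.

```python
# Hypothetical sketch: precision, sensitivity (recall), F-measure, and accuracy
# from a binary confusion matrix. The counts used below are invented.


def binary_metrics(tp, fp, fn, tn):
    """Compute metrics for the positive class (here: CRC).

    tp: CRC cases predicted as CRC        fn: CRC cases predicted as non-CRC
    fp: non-CRC cases predicted as CRC    tn: non-CRC cases predicted as non-CRC
    """
    if tp <= 0:
        raise ValueError("at least one true positive is needed for these ratios")
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    # F-measure is the harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Made-up outcome for 194 CRC and 274 non-CRC subjects.
metrics = binary_metrics(tp=160, fp=30, fn=34, tn=244)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

 The area under the ROC curve is not covered here, because it requires per-subject predicted scores rather than a single confusion matrix.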



Type of Study: Original Article | Subject: Life science
Received: 2020/06/23 | Accepted: 2020/07/16 | Published: 2020/12/4




Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2024 CC BY-NC 4.0 | Journal of Advances in Medical and Biomedical Research
