Background and Objective: Colorectal cancer (CRC) is one of the most prevalent malignancies worldwide. Early detection of CRC is not only a simple process but also the key to its treatment. Because data mining algorithms are potentially useful in cancer prognosis, diagnosis, and treatment, this study measures the performance of several data mining classification algorithms in predicting CRC and providing an early warning to high-risk groups.
Materials and Methods: This study was performed on 468 subjects (194 CRC patients and 274 non-CRC cases) drawn from the CRC dataset of the Imam Hospital, Sari, Iran. The Chi-square feature selection method was used to analyze the risk factors. Four popular data mining algorithms were then compared on their performance in predicting CRC, and the best-performing algorithm was identified.
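As an illustration of the feature selection step, the following is a minimal sketch of Chi-square scoring for categorical risk factors, assuming each factor and the class label are stored as plain lists. The function name `chi_square_score`, the factor names, and the toy values are hypothetical and are not taken from the study dataset or from the tooling the authors used.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between a categorical feature and class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))   # joint counts per (value, class) cell
    feature_totals = Counter(feature)          # row totals
    label_totals = Counter(labels)             # column totals
    score = 0.0
    for value, value_total in feature_totals.items():
        for cls, cls_total in label_totals.items():
            expected = value_total * cls_total / n
            score += (observed[(value, cls)] - expected) ** 2 / expected
    return score


# Toy data (hypothetical, NOT from the study): 4 CRC and 4 non-CRC subjects.
labels = ["crc"] * 4 + ["non_crc"] * 4
toy_features = {
    "factor_a": ["yes", "yes", "yes", "no", "no", "no", "no", "yes"],
    "factor_b": ["yes", "no", "yes", "no", "yes", "no", "yes", "no"],
}

# Rank factors from most to least associated with the class label.
ranking = sorted(
    toy_features,
    key=lambda name: chi_square_score(toy_features[name], labels),
    reverse=True,
)
print(ranking)
```

A higher score indicates a stronger association between a factor and the class label, so factors can be ranked and the weakest ones dropped before training the classifiers.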
Results: J-48 achieved the best outcome (F-measure = 0.826, ROC = 0.881, precision = 0.826, sensitivity = 0.827). Bayesian Net was the second-best performer (F-measure = 0.718, ROC = 0.784, precision = 0.719, sensitivity = 0.722). Random Forest ranked third (F-measure = 0.705, ROC = 0.758, precision = 0.719, sensitivity = 0.712). Finally, the multilayer perceptron (MLP) performed the worst (F-measure = 0.702, ROC = 0.76, precision = 0.701, sensitivity = 0.703).
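For readers unfamiliar with these metrics, the sketch below shows the standard single-class definitions of precision, sensitivity (recall), and F-measure computed from confusion-matrix counts. The function name `classification_metrics` and the counts are hypothetical. How the reported values were averaged across the CRC and non-CRC classes is not stated here, so this sketch is not expected to reproduce the figures above.

```python
def classification_metrics(tp, fp, fn):
    """Return (precision, sensitivity, f_measure) for one class.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp)        # share of predicted positives that are correct
    sensitivity = tp / (tp + fn)      # share of actual positives that are detected
    # F-measure is the harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, f_measure


# Hypothetical counts for illustration only (not from the study).
precision, sensitivity, f_measure = classification_metrics(tp=8, fp=2, fn=2)
print(precision, sensitivity, f_measure)
```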
Conclusion: The results indicate that J-48 could provide better insights for clinical applications than the other prediction models evaluated.