J Appl Biomed 16:165-174, 2018 | DOI: 10.1016/j.jab.2018.01.002

A survey on applying machine learning techniques for management of diseases

Enas M.F. El Houby*
National Research Centre, Systems and Information Department, Engineering Division, Cairo, Egypt

In recent years, the growth of scientific knowledge and massive data production have caused an exponential growth in databases and repositories. The biomedical domain is one of the richest data domains. An extensive amount of biomedical data is currently available, ranging from details of clinical symptoms to various types of biochemical data and outputs of imaging devices. Manually extracting biomedical patterns from data and transforming them into machine-understandable knowledge is difficult because the biomedical domain comprises huge, dynamic, and complicated knowledge. Data mining can improve the quality of extracting biomedical patterns.
In this research, an overview of the applications of data mining to the management of diseases is presented. The main focus is to investigate machine learning techniques (MLT) that are widely used to predict, prognose, and treat important, frequent diseases such as cancers, hepatitis, and heart diseases. The techniques, namely Artificial Neural Network, K-Nearest Neighbour, Decision Tree, and Associative Classification, are illustrated and analyzed. This survey provides a general analysis of the current status of disease management using MLT. The accuracy achieved by the various applications ranged from 70% to 100%, depending on the disease, the problem solved, and the data and technique used.
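To make one of the surveyed techniques concrete, the following is a minimal K-Nearest Neighbour sketch, assuming numeric patient features and Euclidean distance with a majority vote among the k closest training cases. The feature values and the "healthy"/"disease" labels are hypothetical toy data, not taken from any dataset in the survey. Accuracy is computed as the fraction of correctly classified cases, which is the kind of figure the 70% to 100% range above refers to.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two feature vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    # Indices of the k training cases closest to the query case.
    nearest = sorted(range(len(train_X)),
                     key=lambda i: euclidean(train_X[i], query))[:k]
    # Majority vote among the labels of those k neighbours.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy data: two numeric features per patient.
train_X = [(1.0, 1.2), (1.1, 0.9), (0.8, 1.0),
           (3.0, 3.2), (3.1, 2.9), (2.9, 3.1)]
train_y = ["healthy"] * 3 + ["disease"] * 3

test_X = [(1.0, 1.0), (3.0, 3.0), (0.9, 1.1)]
test_y = ["healthy", "disease", "healthy"]

predictions = [knn_predict(train_X, train_y, q) for q in test_X]
print(predictions, accuracy(test_y, predictions))
```

The choice of k and of the distance measure strongly affects results on real clinical data, so in practice both are usually tuned on held-out data.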

Keywords: Data mining; K-nearest neighbour; Decision tree; Artificial neural network; Associative classification

Received: September 30, 2017; Revised: December 2, 2017; Accepted: January 11, 2018; Published: August 1, 2018

El Houby EMF. A survey on applying machine learning techniques for management of diseases. J Appl Biomed. 2018;16(3):165-174. doi: 10.1016/j.jab.2018.01.002.
