
Network Biology, 2021, 11(3): 222-240

Article

Diagnosis of diabetes: A machine learning paradigm using optimized features

Rafid Mostafiz1,2, Khandaker Mohammad Mohi Uddin2, Mohammad Shorif Uddin3, Farhana Binte Hasan2, Mohammad Motiur Rahman1
1Department of Computer Science and Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
2Department of Computer Science and Engineering, Dhaka International University, Dhaka, Bangladesh
3Department of Computer Science and Engineering, Jahangirnagar University, Dhaka, Bangladesh

Received 24 April 2021; Accepted 29 May 2021; Published 1 September 2021
IAEES

Abstract
Diabetes, a disease characterized by hyperglycemia, is currently considered incurable. Modern healthcare identifies several factors, such as an uncontrolled lifestyle, unbalanced diet, genetic complexity, excessive mental fatigue, and obesity, that contribute to the rapid spread of diabetes. Diabetes is not an isolated disease: it also damages the nervous system, heart, kidneys, liver, eyes, and various metabolic processes. Clinical institutions now hold a huge amount of data for diagnosing diabetic patients. Machine learning algorithms can ease this tedious task by finding hidden patterns, discovering knowledge from databases, and predicting outcomes. This research proposes an efficient machine learning-based diagnosis methodology that outperforms existing similar methodologies. The experiment selects features from the working dataset using minimum Redundancy Maximum Relevance (mRMR) and then optimizes them with the recursive feature elimination (RFE) technique. The irregularity (class imbalance) problem in the dataset is addressed by the synthetic minority oversampling technique (SMOTE). Classification is performed on the selected optimized features using Decision Tree (C4.5 DT), K-Nearest Neighbors (KNN), Naive Bayes (NBs), Support Vector Machine (SVM), Logistic Regression (LGR), and Random Forest (RF) classifiers, of which RF produces the best results with the lowest false detection rate. The experiment uses 5-fold cross-validation to assess the reliability of the proposed model and obtains a final accuracy of 98.10%.
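To illustrate the oversampling step named in the abstract, the following is a minimal sketch of the core SMOTE idea: each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbours. It is not the authors' implementation. The function name `smote`, the neighbour count `k`, the fixed seed, and the toy data are all assumptions made for illustration, since the abstract does not state the settings used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority samples.

    minority: list of equal-length feature lists, all from the minority class.
    k: number of nearest minority neighbours considered (assumed value).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two hypothetical features (not data from the paper).
minority_class = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote(minority_class, n_synthetic=6, k=3)
```

Because every synthetic point is an interpolation between two real minority samples, it stays inside the region already occupied by that class. When combined with cross-validation, oversampling of this kind is usually applied to the training folds only, so that synthetic points do not leak into the evaluation data; the abstract does not say how the two steps were ordered.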

Keywords: diabetic; machine learning; minimum Redundancy Maximum Relevance; Recursive Feature Elimination; Random Forest Classifier.




