Computer Science
A Comparative Evaluation of Machine Learning Classifiers in the Diagnosis of Dementia using Clinical Datasets
Authors: Taofik Ajagbe1, Mba Odim2, John Akintayo2,1, Funmilayo Olopade3, Benjamin Aribisala1,3
Affiliations:
1. Department of Computer Science, Faculty of Computing and Information Technology, Lagos State University, Nigeria
2. Department of Computer Science, Mountain Top University, Ibafo, Ogun State, Nigeria
3. Department of Medicine, University of Chicago, Chicago, USA
Abstract
Aims: This study aimed to develop and compare machine learning (ML) models for dementia diagnosis using a clinical dataset.
Materials and Methods: The study used a publicly available Kaggle dataset comprising 2,149 patient records. Features included demographic information (age, gender, education), lifestyle factors (BMI, smoking, physical activity), medical history (diabetes, hypertension), vital signs (blood pressure, cholesterol), and cognitive assessments (MMSE, functional assessment, ADL). Pre-processing covered missing-value handling, outlier handling, normalization, and correction of class imbalance using SMOTE. Six machine learning classifiers were built for dementia diagnosis: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), Naive Bayes (NB), and Multi-Layer Perceptron (MLP). Each model was trained on 70% of the data and tested on the remaining 30%. Performance was assessed using accuracy, sensitivity (recall), specificity, precision, F1-score, and area under the receiver operating characteristic curve (AUC-ROC), and the metrics were compared across the six models.
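As an illustration of the class-balancing and splitting steps named above, the following is a minimal standard-library sketch of SMOTE-style oversampling combined with a 70/30 split. It is not the authors' code: the toy two-feature data, the function names `smote_oversample` and `train_test_split`, the neighbour count `k=5`, and the choice to oversample only the training portion are all assumptions made for the example.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a random
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle rows and return (train, test) with the given test fraction."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy data: 70 majority-class and 30 minority-class records.
data_rng = random.Random(42)
majority = [((data_rng.gauss(0, 1), data_rng.gauss(0, 1)), 0) for _ in range(70)]
minority = [((data_rng.gauss(2, 1), data_rng.gauss(2, 1)), 1) for _ in range(30)]

train, test = train_test_split(majority + minority, test_fraction=0.3)

# Assumption: oversample the training portion only, so the test set
# contains no synthetic records.
train_min = [x for x, y in train if y == 1]
train_maj = [x for x, y in train if y == 0]
new_points = smote_oversample(train_min, len(train_maj) - len(train_min))
balanced_train = train + [(x, 1) for x in new_points]
```

In this sketch the test records are left untouched, so the evaluation metrics are computed on real records only. Whether the study applied SMOTE before or after splitting is not stated in the abstract.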
Results: The RF classifier achieved the highest performance, with 88.32% accuracy, 87.41% sensitivity, 89.12% specificity, 88.32% F1-score, and 94.12% AUC-ROC. SVM and MLP followed closely, while KNN showed the lowest performance, which was attributed to its sensitivity to noise.
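For readers unfamiliar with how these metrics relate to one another, the sketch below derives sensitivity, specificity, precision, accuracy, and F1-score from the four cells of a binary confusion matrix. The counts passed in are hypothetical and are not taken from the study; the function name `classification_metrics` is likewise an illustration only.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall: share of true cases detected
    specificity = tn / (tn + fp)          # share of non-cases correctly rejected
    precision = tp / (tp + fp)            # share of positive predictions that are correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

AUC-ROC is not included here because it depends on the ranking of predicted scores across all thresholds rather than on a single confusion matrix.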
Conclusion: This work shows that ML models can predict dementia from a clinical dataset, with RF achieving the highest metrics. The findings support the use of ML tools in dementia diagnostics, potentially enhancing early detection and patient outcomes.