English Abstract
Diabetes mellitus is a public health challenge with social and economic consequences
for nations regardless of their economic stability. The Ministry of Health reports that
diabetes is rising among Omanis, to the extent that it is now the second most prevalent
non-communicable disease and a significant cause of health loss in Oman. Nevertheless,
diabetes can be prevented by adopting a healthier lifestyle. An appropriate prognostic
tool can therefore help doctors detect the disease early and recommend the lifestyle
changes needed to slow its progression. Clinics and healthcare centres accumulate large
volumes of patient data, which require sophisticated analysis tools and techniques to
extract valuable knowledge and transform it into understandable patterns for
decision-making. Modern data mining techniques, including ensemble algorithms, can
uncover hidden patterns in such voluminous datasets and improve the diagnosis of
diabetes at its initial, pre-diabetic stage. This research aims to construct a predictive
ensemble model that accurately classifies and predicts diabetes at the earliest stage,
using a dataset collected from the Sultan Qaboos University Hospital database between
January 2019 and October 2021. The ensemble model mitigates the high variance of
individual algorithms by combining the predictions of different algorithms into a
stronger overall prediction. The study applied five supervised classification algorithms
to build the Stacking and Voting ensemble models: Random Forest (RF), Decision Tree
(J48), K-Nearest Neighbours (KNN), Support Vector Machine (SVM), and Naïve Bayes (NB).
These algorithms were evaluated with several performance metrics, namely accuracy,
precision, recall, and the area under the ROC curve (AUC). They were validated using
cross-validation to reduce bias in the estimated performance of the individual
algorithms. The best-performing algorithm was then used to demonstrate how the model
can be applied and to discover hidden knowledge in the dataset. The results confirm that
the ensemble algorithms, namely the stacking and voting models, achieve promising
accuracy compared with other techniques for learning from complex datasets, since
they combine the different predictions of the individual algorithms. Because the
stacking algorithm learns how to combine the predictions of the individual classifiers,
it produced the most accurate results in predicting diabetes, with an accuracy of 86.9%
and an AUC of 86.9%. It can therefore serve as an automatic prognostic tool to improve
diagnosis in healthcare.
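Stacking, as summarised above, passes the base classifiers' outputs to a second-level meta-learner that learns how to weight them. The minimal Python sketch below illustrates that idea under stated assumptions and is not the study's implementation. The probabilities are invented toy values that stand in for out-of-fold predictions from three base models. The meta-learner is assumed to be logistic regression trained by plain gradient descent, because the abstract does not name the meta-learner that was used. The function names `train_meta_learner` and `stack_predict` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_meta_learner(base_probs, labels, lr=1.0, epochs=5000):
    """Assumption: a logistic-regression meta-learner, fitted on base-model
    probabilities with full-batch gradient descent on the mean log-loss."""
    n, d = len(base_probs), len(base_probs[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(base_probs, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def stack_predict(weights, bias, x, threshold=0.5):
    """Combine one sample's base-model probabilities into a final 0/1 label."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1 if p >= threshold else 0


# Hypothetical out-of-fold P(diabetes) from three base models per patient.
# Real stacking would produce these via cross-validation, so the meta-learner
# never sees predictions a base model made on its own training rows.
train_X = [
    [0.9, 0.8, 0.6], [0.7, 0.9, 0.4], [0.6, 0.7, 0.8], [0.8, 0.6, 0.7],
    [0.3, 0.2, 0.4], [0.4, 0.1, 0.6], [0.2, 0.3, 0.3], [0.5, 0.2, 0.2],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_meta_learner(train_X, labels)
print("meta-learner weights:", [round(w, 2) for w in weights], "bias:", round(bias, 2))
print("stacked predictions:", [stack_predict(weights, bias, x) for x in train_X])
```

Logistic regression is a common choice of meta-learner because its learned weights show how heavily the ensemble relies on each base model. Any other classifier could take its place.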
However, the model could be optimised further to improve prediction accuracy by
including further significant risk factors for diabetes. An improved diabetes
diagnosis system based on machine learning can raise awareness and accelerate
decision-making, helping physicians understand the disease better and provide proper
treatment. Consequently, it can reduce hospitalisations, improve clinical outcomes, and
minimise health expenditure for the government and individuals.
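The minimal Python sketch below illustrates two ideas from this abstract: majority (hard) voting and the accuracy, precision and recall metrics. The prediction lists are labelled after the five base classifiers (RF, J48, KNN, SVM, NB), but they are invented toy values, not results from the study. The abstract does not say whether the Voting model used hard or soft voting, so hard voting is assumed here. The helper names `majority_vote` and `evaluate` are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Hard voting (assumed): each inner list holds one model's 0/1 predictions
    for the same samples; the ensemble keeps the most common label per sample.
    With an odd number of models and binary labels, ties cannot occur."""
    n_samples = len(predictions_per_model[0])
    ensemble = []
    for i in range(n_samples):
        counts = Counter(preds[i] for preds in predictions_per_model)
        ensemble.append(counts.most_common(1)[0][0])
    return ensemble


def evaluate(y_true, y_pred):
    """Accuracy, precision and recall for a binary task (1 = diabetic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy labels and hypothetical base-model predictions (not study data).
y_true = [1, 1, 0, 0, 1, 0]
base_predictions = {
    "RF":  [1, 1, 0, 1, 1, 0],
    "J48": [1, 0, 0, 0, 1, 1],
    "KNN": [0, 1, 0, 0, 1, 0],
    "SVM": [1, 1, 1, 0, 0, 0],
    "NB":  [1, 0, 0, 1, 1, 0],
}

ensemble = majority_vote(list(base_predictions.values()))
for name, preds in base_predictions.items():
    print(name, evaluate(y_true, preds))
print("Voting", evaluate(y_true, ensemble))
```

In this toy example each base model makes one or two mistakes. Because those errors fall on different samples, the majority vote labels all six correctly. Real data will rarely be this clean.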