English abstract
Early diagnosis of type 2 diabetes mellitus (T2DM) enables early treatment
intervention to control disease progression and reduce premature death. This paper presents
artificial intelligence and machine learning prediction models for diagnosing T2DM in the Omani
population more accurately and with less processing time, using a specially created dataset. Seven
machine learning algorithms, namely K-nearest neighbours (K-NN), support vector machine (SVM), naive
Bayes (NB), decision tree, random forest (RF), linear discriminant analysis (LDA), and artificial neural
networks (ANN), were applied in MATLAB. All data used were clinical data collected manually from
a prediabetes register and the Al Shifa health system of South Al Batinah Province in Oman. The
results were compared with those for the most widely used Pima Indian Diabetes (PID) dataset. Eleven clinical
features were taken into consideration for predicting T2DM. The random forest and decision tree
models performed better than all the other algorithms, providing an accuracy of 98.38% for Oman
data. When the same model and number of features were used, the accuracy obtained with the Oman
dataset exceeded that obtained with the PID dataset by 9.1%. The analysis showed that the efficiency of T2DM diagnosis increased with
more features, which is helpful when many values are missing.