Applying supervised machine learning algorithms & ensemble models to enhance credit card fraud detection.

Source

Master's thesis

Author

Al-Balushiyah, Abrar Ahmed.

Country

Oman

City

Muscat

Publisher

Sultan Qaboos University

Gregorian

2024

English abstract


The alarming rise of credit card fraud poses a significant threat to individuals, financial 
institutions, businesses, and governments. Fraudsters employ phishing activities to 
commit fraud and cause significant annual economic loss. To address this challenge, 
efficient fraud detection systems are needed to identify fraudulent transactions. Yet 
identifying credit card fraud is a challenging task due to various factors, including the 
absence of straightforward techniques, imbalanced credit card datasets, and 
the lack of standard evaluation metrics to assess the performance of existing techniques. 
To overcome these challenges, a feasible solution is to leverage data 
mining and machine learning techniques to detect suspicious transactions. This research 
aims to enhance credit card fraud detection using machine learning algorithms and 
ensemble models. Various supervised machine learning algorithms were implemented,
including Decision Tree, Logistic Regression, Naïve Bayes, Random Forest, Artificial 
Neural Network, and XGBoost. Additionally, to tackle the problem of imbalanced 
datasets, several resampling methods, such as under-sampling and oversampling, were 
employed to balance the dataset. Moreover, various techniques were 
applied for feature selection. The models’ performance was assessed using 
diverse criteria, including Accuracy, Precision, Recall, F1-Score, and Area Under the 
Curve (AUC). Further models were developed by varying the classification threshold of the 
best-performing models (Random Forest and XGBoost). Ensemble learning models were 
used to further refine the models' predictions for fraud and non-fraud instances. Three 
ensemble techniques (Bagging, Boosting, and Stacking) were employed, leveraging the 
main base models to boost the final fraud predictions and improve the robustness of 
accuracy, recall, and precision metrics. Based on the additional work performed, the
ensemble model using the bagging technique gave the best performance results with 0.99 
accuracy, ~0.90 recall, and a precision of 0.77. The ensemble model employed Decision 
Tree, Random Forest, and Neural Network base models, each utilizing different
resampling techniques. This approach gave the ensemble model diversity and 
robustness, proving model effectiveness even when tested on an unseen dataset.
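
The abstract reports accuracy, precision, and recall and mentions threshold variation on the best-performing models. The thesis code is not part of this record, so the following is only a minimal sketch of how those metrics respond to a changing classification threshold. The toy labels and scores are hypothetical, the function names are illustrative, and nothing here reproduces the thesis results.

```python
from typing import Dict, List, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count outcomes, with 1 = fraud (positive) and 0 = legitimate (negative)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 computed from the confusion counts."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (c["tp"] + c["tn"]) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def apply_threshold(scores: Sequence[float], threshold: float) -> List[int]:
    """Label a transaction as fraud when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def sweep_thresholds(y_true, scores, thresholds) -> Dict[float, Dict[str, float]]:
    """Evaluate the same scores at several thresholds."""
    return {t: classification_metrics(y_true, apply_threshold(scores, t)) for t in thresholds}


# Hypothetical toy data: 16 legitimate and 4 fraudulent transactions.
y_true = [0] * 16 + [1] * 4
scores = [0.05, 0.10, 0.02, 0.20, 0.15, 0.08, 0.30, 0.12,
          0.03, 0.25, 0.07, 0.45, 0.18, 0.09, 0.60, 0.11,  # legitimate
          0.90, 0.70, 0.40, 0.55]                          # fraud

results = sweep_thresholds(y_true, scores, [0.35, 0.5, 0.65])
for threshold, metrics in results.items():
    print(threshold, {name: round(value, 3) for name, value in metrics.items()})
```

On this toy data, accuracy stays at 0.9 for all three thresholds while recall rises and precision falls as the threshold is lowered. This illustrates why accuracy alone says little on imbalanced data and why the abstract reports recall and precision alongside it.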

Language

English

https://www.shuaa.om/en/node/16332/printable/print
