English abstract
The alarming rise of credit card fraud poses a significant threat to individuals, financial
institutions, businesses, and governments. Fraudsters employ phishing activities to
commit fraud and cause significant annual economic loss. To address this challenge,
efficient fraud detection systems are needed to identify fraudulent transactions. Yet,
identifying credit card fraud is challenging due to several factors, including the
absence of straightforward techniques, imbalanced credit card datasets, and
the lack of standard evaluation metrics to assess the performance of existing techniques.
To overcome these challenges, a feasible solution is to leverage data
mining and machine learning techniques to detect suspicious transactions. This research
aims to enhance credit card fraud detection using machine learning algorithms and
ensemble models. Several supervised machine learning algorithms were implemented,
including Decision Tree, Logistic Regression, Naïve Bayes, Random Forest, Artificial
Neural Network, and XGBoost. Additionally, to tackle the problem of imbalanced
datasets, resampling methods such as under-sampling and oversampling were
employed to balance the dataset. Moreover, various feature selection techniques were
applied. The models’ performance was assessed using
diverse criteria, including Accuracy, Precision, Recall, F1-Score, and Area Under the
Curve (AUC). Further models were developed by varying the decision threshold of the
best-performing models (Random Forest and XGBoost). Ensemble learning models were
used to further refine the models' predictions for fraud and non-fraud instances. Three
ensemble techniques (Bagging, Boosting, and Stacking) were employed, leveraging the
main base models to improve the final fraud predictions and the robustness of the
accuracy, recall, and precision metrics. In this additional work, the
ensemble model using the bagging technique gave the best performance, with an accuracy
of 0.99, a recall of approximately 0.90, and a precision of 0.77. This ensemble model employed Decision
Tree, Random Forest, and Neural Network base models, each utilizing different
resampling techniques. This approach gave the ensemble model diversity and
robustness, and the model remained effective when tested on an unseen dataset.
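
The threshold variation mentioned above can be illustrated with a minimal sketch. It assumes a classifier that outputs a fraud probability for each transaction and shows how precision, recall, and F1-Score shift as the decision threshold moves. The function names (confusion_counts, precision_recall_f1), the thresholds, and the toy labels and scores are hypothetical and chosen for illustration only; they are not the data, models, or results of this research.

```python
# Illustrative sketch only: labels and scores below are made-up toy values,
# not data or results from this study.

def confusion_counts(y_true, y_score, threshold):
    """Count TP, FP, FN, TN when scores >= threshold are flagged as fraud."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_score, threshold):
    """Return (precision, recall, F1) for the fraud class at one threshold."""
    tp, fp, fn, _ = confusion_counts(y_true, y_score, threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: 1 = fraud, 0 = legitimate (imbalanced on purpose).
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.40, 0.60, 0.70, 0.90]

# Sweep a few assumed thresholds and report the metrics at each one.
for threshold in (0.3, 0.5, 0.7):
    p, r, f1 = precision_recall_f1(y_true, y_score, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy example, lowering the threshold catches more fraud cases (higher recall) at the cost of more false alarms (lower precision), which is the trade-off that threshold variation explores.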