English Abstract
The rapid growth of networks has significantly impacted people,
corporations, and governments. However, this expansion has also been
accompanied by a sharp rise in cybercrime events, underscoring the
necessity of putting strong security measures in place to protect electronic
data. Among the various techniques used, phishing is the leading approach
for unlawfully obtaining critical information from online users. This
deceptive practice commonly involves fraudulent attempts through emails
and counterfeit websites. Phishers employ diverse tactics and strategies
to execute sophisticated attacks in which they mimic official websites to
collect personal information from online users.
The objective of our study is to analyze the effectiveness of fully connected
networks and convolutional neural networks in phishing detection and to
improve reliability by reducing false positives and false negatives, which
can help cybersecurity specialists determine whether a URL is phishing or
legitimate.
We began by collecting data from EuRepoC. The gathered data then
underwent several stages, including cleaning and preparation.
Following this, we implemented an embedding layer to embed words and
characters into low-dimensional vectors in both models. Next, we
constructed fully connected network (FCN), long short-term memory
(LSTM), and one-dimensional convolutional neural network (1D-CNN)
models. Finally, we trained each model and evaluated its results.
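As a rough illustration of the preparation step that precedes an embedding layer, the sketch below turns URLs into fixed-length sequences of character indices. It is only a minimal example under stated assumptions: the helper names (`build_vocab`, `encode`), the reserved indices, the sample URLs, and the maximum length of 32 are hypothetical and are not taken from the study.

```python
# Hypothetical character-level encoding of URLs (not the study's actual code).
PAD, UNK = 0, 1  # assumed reserved indices for padding and unseen characters


def build_vocab(urls):
    """Map every character seen in the training URLs to an integer index."""
    chars = sorted({ch for url in urls for ch in url})
    return {ch: i + 2 for i, ch in enumerate(chars)}  # 0 and 1 are reserved


def encode(url, vocab, max_len):
    """Truncate or pad a URL to max_len character indices."""
    ids = [vocab.get(ch, UNK) for ch in url[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Made-up example URLs, used only to show the shape of the output.
train_urls = ["http://example.com/login", "https://example.org"]
vocab = build_vocab(train_urls)
encoded = [encode(u, vocab, max_len=32) for u in train_urls]
```

Each row of `encoded` could then be fed to an embedding layer, which maps every index to a low-dimensional vector that is learned during training.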
The data analysis reveals that fully connected networks (FCNs) achieve
82% accuracy and long short-term memory networks (LSTMs) achieve 86% in
identifying phishing URLs, whereas convolutional neural networks
demonstrate a higher accuracy of 98%. Regarding precision, fully connected
networks reach 82%, with 96% recall and an 88% F1-score, and long
short-term memory networks reach 99% precision, with 81% recall and an 89%
F1-score. Conversely, convolutional neural networks exhibit a precision of
99%, a recall of 100%, and an F1-score of 98%.
Examining false positives and false negatives, fully connected networks
exhibit a false negative (FN) rate of 4% and a false positive (FP) rate of
55%, while long short-term memory networks show 19% FN and 2% FP. Finally,
convolutional neural networks show 11% false positives and 0% false
negatives.
Additionally, the area under the ROC curve (AUC) is 71% for the FCN, 90%
for the LSTM, and 98% for the 1D-CNN.
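For readers who want to see how the reported quantities relate to one another, the following sketch computes accuracy, precision, recall, F1-score, and the false positive and false negative rates from confusion-matrix counts using their standard definitions. The function name and the counts are illustrative assumptions; they are not the study's data, and the study may define its percentages differently.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Share of legitimate URLs wrongly flagged as phishing.
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        # Share of phishing URLs that were missed (equals 1 - recall).
        "fnr": fn / (fn + tp) if fn + tp else 0.0,
    }


# Made-up counts for illustration only (phishing is the positive class).
m = classification_metrics(tp=90, fp=10, tn=80, fn=20)
```

Under these definitions the false negative rate is the complement of recall, which matches the pairing of the recall and FN figures reported above.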