English abstract
With the emergence of connected objects and the Internet of Things (IoT), millions of users
connected to the network produce massive network traffic datasets. These vast datasets of network
traffic (Big Data) are challenging to store, process, and analyze on a regular computer. In
addition, these large files contain millions of cyber-attacks, and reports published by CERT
institutions around the world show a steady increase in the number of cyber-attacks. Building an
efficient intrusion detection system that can handle this volume of data while detecting attacks
quickly and accurately is therefore essential.
Many intrusion detection systems have been proposed to deal with network traffic datasets. Among
these solutions are intrusion detection systems based on machine learning algorithms, which have
proven to be highly efficient in intrusion detection and have given promising results. Examples
of these algorithms include SVM, KNN, and K-means. However, conventional machine learning
algorithms suffer from slow training and testing when the dataset is large.
In this study, an intrusion detection system was built based on distributed parallel nonnegative
matrix factorization (NMF) and implemented on the high-performance computing system (Luban) of
Sultan Qaboos University.
We used the Message Passing Interface (MPI) for inter-processor communication. The algorithm is
built so that the A (input matrix), W, and H matrices are all held in memory, divided across the
processors; we distribute the matrices between the processors carefully to avoid unnecessary
communication. Our parallel NMF achieved excellent training speedup as we increased the number of
processors on large datasets. Two datasets were used to verify the proposed solution's
performance: KDD99 and CIC.
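
As a rough illustration of how a row-partitioned NMF can keep communication low, the sketch below
simulates the idea serially in NumPy. It is a minimal sketch under stated assumptions, not the
implementation used in this study: the abstract does not specify the update rule or the exact
partitioning, so the Lee-Seung multiplicative updates, the row-block split of A and W, the
replicated H, and the names nmf_row_blocks and n_workers are all assumptions made for
illustration. The two sums over blocks mark where an MPI all-reduce would occur in a real
distributed run.

```python
import numpy as np

def nmf_row_blocks(A, rank, n_workers=4, n_iter=200, eps=1e-9, seed=0):
    """Serial simulation of row-partitioned NMF (hypothetical helper).

    A is approximated by W @ H with W, H >= 0. Each simulated worker owns
    one row block of A and the matching row block of W; H is replicated
    here for simplicity (an assumption, not the scheme from the study).
    """
    rng = np.random.default_rng(seed)
    A_blocks = np.array_split(A, n_workers, axis=0)
    W_blocks = [rng.random((blk.shape[0], rank)) for blk in A_blocks]
    H = rng.random((rank, A.shape[1]))

    for _ in range(n_iter):
        # Each worker computes small local products; summing them is the
        # only global step (an all-reduce in a real MPI program).
        WtA = sum(Wb.T @ Ab for Wb, Ab in zip(W_blocks, A_blocks))
        WtW = sum(Wb.T @ Wb for Wb in W_blocks)

        # Multiplicative update of H, identical on every worker.
        H *= WtA / (WtW @ H + eps)

        # Multiplicative update of W: purely local, no communication.
        HHt = H @ H.T
        W_blocks = [
            Wb * (Ab @ H.T) / (Wb @ HHt + eps)
            for Wb, Ab in zip(W_blocks, A_blocks)
        ]

    return np.vstack(W_blocks), H
```

In this layout only the small rank-sized products are exchanged between workers, while the large
blocks of A never move, which is one common way to limit communication in distributed NMF.
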
Our experiments showed that the proposed solution gives better results than traditional ML-based
intrusion detection systems: we trained the model on datasets of one million samples in only
31 seconds and obtained a detection accuracy of 97%.