Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study

Critical National Infrastructures (CNIs) such as ports, water and gas distributors, hospitals, energy providers are becoming the main targets of cyber attacks. Supervisory Control and Data Acquisitions (SCADA) or Industrial Control Systems (ICS) in general are the core systems that CNIs rely on in order to manage their production. Protection of ICSs and CNIs has become an essential issue to be considered in an organizational, national and European level.

For instance, in order to cope with the increasing risk of CNIs, Europe has issued during the past years a number of directives and regulations that try to create a coherent framework for securing networks, information and electronic communications. Apart from regulations, directives and policies, specific security measures are also needed to cover all legal, organizational, capacity building and technical aspects of cyber security.

Abstract

In this paper, we present a survey of deep learning approaches for cyber security intrusion detection, the datasets used, and a comparative study. Specifically, we provide a review of intrusion detection systems based on deep learning approaches. The dataset plays an important role in intrusion detection, therefore we describe 35 well-known cyber datasets and provide a classification of these datasets into seven categories; namely, network traffic-based dataset, electrical network-based dataset, internet traffic-based dataset, virtual private network-based dataset, android apps-based dataset, IoT traffic-based dataset, and internet-connected devices-based dataset.

We analyze seven deep learning models including recurrent neural networks, deep neural networks, restricted Boltzmann machines, deep belief networks, convolutional neural networks, deep Boltzmann machines, and deep autoencoders. For each model, we study the performance in two categories of classification (binary and multiclass) under two new real traffic datasets, namely, the CSE-CIC-IDS2018 dataset and the Bot-IoT dataset. In addition, we use the most important performance indicators, namely, accuracy, false alarm rate, and detection rate for evaluating the efficiency of several methods.

Mohamed Amine Ferrag

Leandros Maglaras

Sotiris Moschoyiannis

Helge Janicke