Network Traffic Fingerprinting using Machine Learning and Evolutionary Computing

Authors

Aksoy, Ahmet

Issue Date

2019

Type

Dissertation

Keywords

evolutionary computing, inductive learning, IoT device fingerprinting, machine learning, network traffic fingerprinting, operating system fingerprinting

Abstract

The Internet has become essential to our daily life, especially with a multitude of IoT devices. However, the end hosts connected to the Internet are prone to compromise. An essential measure for protecting end hosts against attacks is to detect system characteristics and isolate vulnerable devices by restricting communication with them. Network traffic fingerprinting provides the ability to remotely and automatically gather information about the hosts within a network. Fingerprinting can support network management as well as the detection and isolation of vulnerable hosts. Automating this process makes fingerprinting more efficient and allows it to adapt to changes in the behavior of host devices and software. Network practitioners use classifier tools for fingerprinting, but these tools depend on an expert to select features/attributes and generate machine learning models. Hence, existing approaches need to be manually updated for each new Operating System (OS) or IoT device introduced to the network. This dissertation addresses automated methods and tools for fingerprinting OSes and IoT devices. It also presents SILEA, a new inductive learning algorithm, along with improvements to its numerical feature quantization that further improve classification accuracy. SILEA is a covering-method inductive learning algorithm that reliably extracts IF-THEN rules from a collection of examples/instances. The algorithm avoids exhaustive feature selection by reducing the number of features to be considered in each iteration of rule extraction. We also use a genetic algorithm (GA) to determine the maximum number of clusters to be considered for each numeric feature during quantization and observe how this contributes to classification accuracy. Once the number of clusters for each numeric feature is determined, we run the k-means algorithm on each feature with the number of clusters chosen by the GA to obtain ranges for the numeric features that are as close to optimal as possible. We analyze TCP/IP packet headers to automate OS classification. We utilize a GA to determine the relevant packet header features, which reduces classification complexity and increases accuracy by eliminating noisy features from the data. We use several machine learning algorithms to generate a set of rules and models that can differentiate OSes. We also investigate an automated system, called OSID, for classifying host OSes by analyzing the network packets that they generate, without relying on human experts. We introduce another automated system, called SysID, for classifying IoT device characteristics based on their network traffic. The system can use any single packet originating from a device to detect the device type. We utilize a GA to determine relevant features in different protocol headers and then deploy various machine learning algorithms to classify host device types by analyzing the features selected by the GA. SysID allows completely automated classification of IoT devices using their TCP/IP packets, without expert input. The code and trained models for SILEA, OSID, and SysID are available at https://github.com/netml/.
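
The quantization step described in the abstract (k-means run per numeric feature with a cluster count fixed in advance) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names, the deterministic centroid initialization, the midpoint rule for range boundaries, and the sample TTL values are all assumptions made for illustration, and the cluster count that a GA would supply is simply hard-coded here.

```python
from bisect import bisect_right


def kmeans_1d(values, k, iterations=100):
    """Lloyd's k-means on one numeric feature; returns sorted centroids."""
    data = sorted(values)
    # Assumed initialization: spread starting centroids evenly over the sorted data.
    if k == 1:
        centroids = [sum(data) / len(data)]
    else:
        centroids = [data[i * (len(data) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in data:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def cut_points(centroids):
    """Assumed boundary rule: midpoints between adjacent centroids."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


def quantize(value, cuts):
    """Index of the range that a raw feature value falls into."""
    return bisect_right(cuts, value)


# Hypothetical TTL values from captured packets (illustrative data only).
ttl = [64, 64, 63, 62, 128, 127, 128, 255, 254, 255]
k_for_ttl = 3  # stands in for the cluster count a GA would select
centroids = kmeans_1d(ttl, k_for_ttl)
cuts = cut_points(centroids)
labels = [quantize(v, cuts) for v in ttl]
```

Each raw value is replaced by the index of its range, so a rule learner can treat the feature as categorical. In a GA-driven setup, `k_for_ttl` would be one gene of a candidate solution, and the classification accuracy obtained with the resulting ranges would serve as the fitness score.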
