25. September 2024 By Matteo Flumini
Complexity vs. accuracy in CNN-based network classifiers
Nowadays, network traffic classification (NTC) is an essential tool for categorizing traffic into classes representing different network services. Categorizing services is fundamental in several fields such as network security, network management and QoS optimization.
Today, the widespread use of encryption to guarantee confidentiality and security for users makes traffic increasingly difficult to classify with traditional methods, so the field is moving towards classification based on Machine Learning.
My thesis, currently under review as a paper at IEEE GLOBECOM 2024, compares different convolutional neural network (CNN)-based models and evaluates the trade-off between complexity and accuracy.
Reference scenario
Service providers now rely on automated tools to meet the growing needs of users and to address the critical issues that affect their experience of the network.
As a result, NTC has become an indispensable tool for a variety of purposes, since it allows you to categorize traffic into classes representing different network services.
Differentiating traffic classes is essential in network security: recognizing anomalous traffic makes it possible to implement, for example, intrusion detection systems, malware detection systems and firewalls. The results are also used in network management and administration, QoS management and resource monitoring by internet service providers (ISPs).
NTC works by extracting statistical and/or behavioral information from the traffic and then using this information in appropriate classification algorithms.
NTC generally uses three basic approaches, which are:
- Port-based
- Payload-based
- Machine Learning (ML)-based
Port-based techniques associate services with registered port numbers and classify traffic accordingly. The widespread use of private and dynamic ports makes this approach unreliable in practice.
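As a minimal sketch of the port-based idea, the lookup below maps a handful of IANA-registered ports to service names. The function name `classify_by_port` and the small port table are illustrative assumptions, not part of the project described here.

```python
# Illustrative subset of IANA-registered ports (assumption: a real
# classifier would load the full registry instead of this small table).
WELL_KNOWN_PORTS = {22: "ssh", 25: "smtp", 53: "dns", 80: "http", 443: "https"}


def classify_by_port(src_port: int, dst_port: int) -> str:
    """Return the service registered for either port, or 'unknown'."""
    for port in (dst_port, src_port):
        service = WELL_KNOWN_PORTS.get(port)
        if service is not None:
            return service
    return "unknown"


print(classify_by_port(51514, 443))    # https
print(classify_by_port(51514, 50022))  # unknown (both ports are dynamic)
```

The second call shows the weakness: once both endpoints use dynamic ports, the lookup has nothing to match. The first call shows a second limitation, since every service carried over port 443 receives the same label.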
The payload-based approach inspects packet payloads, for example with deep packet inspection algorithms. However, encryption hides the payload from third parties, so the patterns these approaches rely on become ineffective.
To solve this problem, new classification approaches have been developed, including flow statistics-based ones, in which packets are aggregated into flows from which statistical information is extracted. This makes it possible to identify a pattern for the service class each flow belongs to without inspecting the contents of the payload.
The aim of the project was to analyze the performance of the main existing NTC methods based on convolutional neural networks (CNN) and then implement a classification model based on flows.
Starting from a reference project in the literature that provides a dataset of encrypted QUIC traffic, we set out to create an architecture for supervised ML models based on CNNs.
We followed all the design phases of a Machine Learning model: analyzing the dataset, pre-processing the data and features, and implementing the algorithms in Python with its relevant libraries. We then evaluated the results to measure the final performance of the designed models and to examine their particular characteristics.
Analysis and pre-processing
The analysis of the dataset shows the presence of 5 different classes (Google Hangout Chat, Google Hangout VoIP, Google Play Music, File Transfer and YouTube).
With pre-processing, the traffic was divided into flows (identified by the 5-tuple of source/destination IP address, source/destination port and protocol), for a total of 294062 flows. We then identified 16 flow features:
- Packet length
- Relative time elapsed since the first packet
- Delta time elapsed since the previous packet
- Percentage of large packets in a flow
- Percentage of small packets in a flow
- Flow size
- Flow duration
- and other statistical information related to these (standard deviation, kurtosis etc.)
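The flow construction and some of the features listed above can be sketched as follows. This is a minimal illustration under stated assumptions: the packet dictionary layout, the `SMALL_BYTES` and `LARGE_BYTES` thresholds, the function names, and the choice to count flow size in packets are all hypothetical, and each direction of a connection is treated as a separate flow.

```python
from collections import defaultdict
from statistics import mean, pstdev

SMALL_BYTES = 100    # assumed threshold for a "small" packet
LARGE_BYTES = 1000   # assumed threshold for a "large" packet


def group_into_flows(packets):
    """Group packet dicts by their 5-tuple (one flow per direction)."""
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt["src_ip"], pkt["dst_ip"],
               pkt["src_port"], pkt["dst_port"], pkt["proto"])
        flows[key].append(pkt)
    return flows


def flow_features(pkts):
    """Compute a few per-flow statistics from the packets of one flow."""
    pkts = sorted(pkts, key=lambda p: p["ts"])
    lengths = [p["len"] for p in pkts]
    times = [p["ts"] for p in pkts]
    # Delta time since the previous packet (0.0 for the first packet).
    deltas = [0.0] + [b - a for a, b in zip(times, times[1:])]
    n = len(pkts)
    return {
        "flow_size": n,                          # assumed: number of packets
        "flow_duration": times[-1] - times[0],   # relative time of last packet
        "mean_len": mean(lengths),
        "std_len": pstdev(lengths),
        "mean_delta": mean(deltas),
        "pct_small": 100.0 * sum(size < SMALL_BYTES for size in lengths) / n,
        "pct_large": 100.0 * sum(size > LARGE_BYTES for size in lengths) / n,
    }


def make_packet(ts, length, src_port):
    """Build a synthetic packet (documentation-range IP addresses)."""
    return {"ts": ts, "len": length, "src_ip": "192.0.2.10",
            "dst_ip": "198.51.100.20", "src_port": src_port,
            "dst_port": 443, "proto": "UDP"}


packets = [make_packet(0.0, 60, 50000), make_packet(0.5, 1200, 50000),
           make_packet(0.2, 80, 50001), make_packet(1.5, 1200, 50000)]
flows = group_into_flows(packets)
flow_key = ("192.0.2.10", "198.51.100.20", 50000, 443, "UDP")
features = flow_features(flows[flow_key])
print(len(flows), features["flow_size"], features["flow_duration"])
```

Higher-order statistics such as kurtosis would be added to the same dictionary in the same way.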
We then reformatted some of the classes and, above all, handled outliers, which reduced the number of flows to 231012. Subsequently, we normalized the data and encoded the labels of the dataset.
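The article does not say which outlier rule, normalization or label encoding was used, so the sketch below picks common defaults purely as assumptions: an interquartile-range (IQR) filter, min-max scaling and integer label encoding. All function names and the sample values are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Lower and upper bounds of the IQR outlier rule (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def min_max_scale(column):
    """Rescale a feature column to [0, 1] (assumed normalization)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def encode_labels(labels):
    """Map class names to integer ids, in sorted order of the names."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


# Synthetic flow durations with one obvious outlier (illustrative values).
durations = [1, 2, 3, 4, 5, 6, 7, 8, 1000]
labels = ["YouTube", "File Transfer", "YouTube", "Google Play Music",
          "File Transfer", "YouTube", "Google Play Music", "YouTube",
          "File Transfer"]

low, high = iqr_bounds(durations)
keep = [i for i, d in enumerate(durations) if low <= d <= high]
scaled = min_max_scale([durations[i] for i in keep])
encoded, classes = encode_labels([labels[i] for i in keep])
print(len(keep), scaled[0], scaled[-1], classes)
```

In a multi-feature dataset the same filter and scaling would be applied per column, with the scaling parameters fitted on the training portion only.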
NTC scheme
The reference model combines Random Forest and CNN and uses hundreds of features. We decided to focus on simplicity instead, using a CNN alone and, as noted above, only 16 features for the data processing part.
During the project, four valid architectures were identified.
They are all CNN-based, ranging from a complex architecture to progressively simpler ones. The models share a general scheme that contains the essential layers of a convolutional neural network: convolutional layers, pooling layers, dropout layers, flatten layers and dense layers.
The models differ in the number of layers and in the properties of the individual layers, but some architectural choices are common to all of them, including:
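One rough way to compare the complexity of such models is to count trainable parameters layer by layer. The sketch below traces output shapes and parameter counts for a hypothetical stack of the layer types named above. The layer sizes, the input shape of 16 features with one channel, and the use of parameter count as the complexity measure are assumptions for illustration, not the four models from the project.

```python
def trace_model(input_shape, layers):
    """Return (layer kind, output shape, trainable params) per layer.

    Assumes 1D convolutions with 'valid' padding and stride 1, and
    max pooling with stride equal to the pool size.
    """
    shape = tuple(input_shape)
    summary = []
    for layer in layers:
        kind = layer[0]
        params = 0
        if kind == "conv1d":
            _, filters, kernel = layer
            length, channels = shape
            params = kernel * channels * filters + filters  # weights + biases
            shape = (length - kernel + 1, filters)
        elif kind == "maxpool1d":
            _, pool = layer
            length, channels = shape
            shape = (length // pool, channels)
        elif kind == "dropout":
            pass  # no parameters, shape unchanged
        elif kind == "flatten":
            size = 1
            for dim in shape:
                size *= dim
            shape = (size,)
        elif kind == "dense":
            _, units = layer
            (inputs,) = shape
            params = inputs * units + units  # weights + biases
            shape = (units,)
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        summary.append((kind, shape, params))
    return summary


# Hypothetical model: 16 input features, 5 output classes.
hypothetical_model = [
    ("conv1d", 32, 3),
    ("maxpool1d", 2),
    ("dropout", 0.25),
    ("flatten",),
    ("dense", 64),
    ("dense", 5),
]
summary = trace_model((16, 1), hypothetical_model)
for kind, shape, params in summary:
    print(f"{kind:10s} {str(shape):10s} {params}")
total_params = sum(params for _, _, params in summary)
print("total trainable parameters:", total_params)  # 14853
```

Removing a layer or shrinking a dense layer changes the total directly, which makes it easy to see how much simpler one candidate is than another before training it.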
- An 80%-20% dataset split for training and testing
- Training for 20 epochs in batches of 32 samples
- Adam optimizer
- 10-fold cross-validation
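The split and the cross-validation scheme above can be sketched with index lists. How the two were combined in the project is not stated, so this sketch assumes the 10 folds are drawn from the 80% training portion. The function names, the seed and the sample count of 1000 are hypothetical.

```python
import math
import random


def split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=10):
    """Yield (train, validation) index lists for each of the k folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


train_idx, test_idx = split_indices(1000)
print(len(train_idx), len(test_idx))  # 800 200

fold_sizes = [(len(tr), len(va)) for tr, va in k_fold(train_idx, k=10)]
print(fold_sizes[0])  # (720, 80)

# With batches of 32 samples, each epoch over a 720-sample fold
# takes ceil(720 / 32) gradient steps.
steps_per_epoch = math.ceil(fold_sizes[0][0] / 32)
print(steps_per_epoch)  # 23
```

With imbalanced classes, a stratified variant that preserves class proportions in every fold is usually preferable to this plain split.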
The evaluation metrics used to compare the models are:
- Weighted accuracy
- Precision
- Recall
- F1 score
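These metrics can be computed from the true and predicted labels as sketched below. The sketch assumes that "weighted" means each class contributes in proportion to its number of true samples (support), which is one common convention. The article does not define the weighting, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    total = len(pairs)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in pairs) / total
    precision = recall = f1 = 0.0
    for cls, count in support.items():
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        p_cls = tp / (tp + fp) if tp + fp else 0.0
        r_cls = tp / (tp + fn) if tp + fn else 0.0
        f_cls = 2 * p_cls * r_cls / (p_cls + r_cls) if p_cls + r_cls else 0.0
        weight = count / total  # share of true samples in this class
        precision += weight * p_cls
        recall += weight * r_cls
        f1 += weight * f_cls
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for three of the five classes (illustrative only).
y_true = ["chat", "chat", "voip", "voip", "youtube", "youtube"]
y_pred = ["chat", "voip", "voip", "voip", "youtube", "chat"]
metrics = weighted_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under this weighting, recall always equals plain accuracy, so precision and F1 are the figures that add information when classes are imbalanced.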
After many rounds of trial and error on the CNN-based NTC architecture, we identified the best-performing models, summarized in the following table:
Results
The results in the table show that the average results are stable, staying in the range between 89% and 91%.
Model 4, which is also one of the simplest, has the best performance, showing that complexity is not always synonymous with high performance.
These results of approximately 92% are 7-8 percentage points lower than those of the models in the literature, but the simplicity of the architecture was the main focus of the models selected in this project. In fact, the models in the literature use hundreds of features, whereas ours use only 16, which suggests they are well suited to real-time scenarios.
Conclusion
During the project we evaluated CNN models to classify QUIC-based services (encrypted traffic) using flow-based classification. The proposed models leverage Machine Learning-based approaches to classify services such as Google Hangout chat, VoIP, file transfer, Google Play Music, and YouTube. Various architectures with different levels of complexity were considered and their accuracy was evaluated.
Our approach reaches an overall accuracy of approximately 91%, compared with 99% for the baseline NTC scheme. However, our model is characterized by its simplicity: it relies on a single-stage CNN fed by only 16 features, while the baseline model combines Random Forest with CNN and uses hundreds of features. Despite the lower accuracy, our approach still performs well while prioritizing low complexity. Our future work aims to further improve the model's accuracy while maintaining or even reducing the complexity of its architecture.