DIstributed Firewall and Flow-shaper Using Statistical Evidence (DIFFUSE)

Overview

Architecture

Machine Learning

Papers and Interim Results

Downloads

Usage Examples

Other Links

Overview

In recent years a body of research has emerged around the identification and classification of traffic flows based on statistical properties (features), and in particular the application of Machine Learning (ML) techniques to generate such classifiers. Statistical properties, such as distributions of packet size or inter-packet arrival times, may be calculated without accessing packet payloads (packet inspection). Such techniques help Internet Service Providers (ISPs) work within any legal or technical limitations on direct payload inspection. Potential new applications include characterising traffic for Lawful Interception, automated 'market research' or automated prioritisation of real-time traffic.

For many of these new applications a de-coupling between flow classification and treatment (the actions performed on flows, such as blocking or shaping) is highly desirable. For example, a single high performance classifier near the core of an ISP network may control multiple low-power nodes near the network edge (perhaps embedded within ADSL or Cable modem gateways) so that centralised traffic classification can automatically modify the Quality of Service (QoS) treatment experienced by packets at the network edge. This de-coupling also enables potentially computationally intensive per-flow statistics calculations to be offloaded from the packet forwarding path.

However, common open-source packet filters that combine firewall and traffic shaping (such as ipfw, pf, netfilter and similar) currently do not use traffic statistics, instead relying on direct inspection of packets passing through the filtering node's local interfaces. Furthermore, these filters tightly couple flow classification and treatment, i.e. once a flow is classified, the corresponding action is executed immediately on the same node.

In this project we will design and develop extensions for existing packet filters that provide ML-based classification based on statistical properties and de-couple flow classification from treatment, and we will analyse the accuracy, performance and scalability of such a distributed system. We will further explore whether automatic (re)training of classifiers may be practically achieved using live IP traffic going past particular points inside an ISP network, and the degree to which noise (packet loss and jitter) in the live traffic feed degrades the system's ability to recognise the same class of traffic in the future.
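As an illustration of the statistical properties mentioned in the overview, the following minimal Python sketch computes a few per-flow features (packet size and inter-packet arrival time statistics) from packet timestamps and lengths only, without touching payloads. It is not taken from the DIFFUSE code: the function name `flow_features`, the feature names and the input format are assumptions made for this example, and a real classifier would likely use a different or larger feature set.

```python
from statistics import mean, pstdev


def flow_features(packets):
    """Compute simple statistical features for one flow.

    `packets` is a hypothetical input format: a list of
    (timestamp_seconds, length_bytes) tuples in arrival order.
    No payload bytes are needed.
    """
    if not packets:
        raise ValueError("flow has no packets")

    times = [t for t, _ in packets]
    sizes = [n for _, n in packets]
    # Inter-packet arrival times: gaps between consecutive timestamps.
    gaps = [b - a for a, b in zip(times, times[1:])]

    return {
        "pkt_count": len(sizes),
        "size_min": min(sizes),
        "size_max": max(sizes),
        "size_mean": mean(sizes),
        "size_std": pstdev(sizes),
        # A single-packet flow has no gaps, so report 0.0 for both.
        "iat_mean": mean(gaps) if gaps else 0.0,
        "iat_std": pstdev(gaps) if gaps else 0.0,
    }


# Made-up sample: four packets roughly 20 ms apart.
example = [(0.00, 120), (0.02, 120), (0.04, 130), (0.06, 110)]
features = flow_features(example)
```

A feature vector of this kind is what an ML classifier would consume. Because it depends only on packet headers and timing, it can be computed on a node other than the one that enforces the resulting action.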



Figure 1: DIFFUSE allows the use of traffic statistics to augment traditional packet filtering and traffic shaping decisions

Project Goals

Design packet filter extensions that allow ML-based classification and the de-coupling of flow classification and treatment.

Design a protocol to transport information about flow classes and actions from classifiers to nodes enforcing actions.

Develop extensions for existing packet filters that implement the developed approach and can be used as a demonstrator.

Evaluate the accuracy, performance and scalability of a distributed classification system and characterise the various trade-offs.

Investigate methods for dynamic (re)training of classifiers and investigate the impact of noise on the performance of these methods.

As part of this project we will develop and publicly release software that allows the classification of flows based on statistical properties and de-couples the classification from the actions undertaken, and publish interim results and papers on our website. The links at the top will take you to additional information.



News

April 13th, 2012: We have released DIFFUSE for OpenWRT, a version of DIFFUSE that works on embedded devices such as home Internet gateways. Our current prototype is based on the DIFFUSE 0.4 distribution running on the Attitude Adjustment (r29537) version of OpenWRT (an embedded Linux operating system). DIFFUSE for OpenWRT enables automatic and dynamic QoS for home networks.

Project Members

Grenville Armitage

Sebastian Zander

Nigel Williams

This project began in June 2010 and has been made possible in part by a gift from The Cisco University Research Program Fund, a corporate advised fund of Silicon Valley Community Foundation, for a project titled "Exploring the efficacy of distributed statistical traffic classification using modified open source packet filters".