Efficient application identification and the temporal and spatial stability of classification schema

Wei, Li; Canini, M; Moore, A; Bolla, Raffaele

Motivated by the importance of accurate identification for a range of applications, this paper compares and contrasts the effective and efficient classification of network-based applications using behavioral observations of network traffic with classification using deep-packet inspection. Importantly, throughout our work we are able to compare against data possessing an accurate, independently determined ground truth that describes the actual applications causing the observed network traffic. In a unique study covering both the spatial domain (comparing across different network locations) and the temporal domain (comparing across a number of years of data), we illustrate the decay in classification accuracy across a range of application-classification mechanisms. Further, we document the accuracy of spatial classification without training data possessing spatial diversity. Finally, we illustrate the classification of UDP traffic, using the same classification approach for both stateful flows (TCP) and stateless flows based upon UDP. Importantly, we demonstrate high levels of accuracy: greater than 92% in the worst case, regardless of the application.

2009-01-01
