Traffic Anomaly Detection and Diagnosis on the Network Flow Level

Stöcklin, Marc Philippe

doi:10.5075/epfl-thesis-4997

doctoral thesis

2011

Monitoring traffic events in computer networks has become a critical task for operators to maintain an accurate view of a network's condition, to detect emerging security threats, and to safeguard the availability of resources. Conditions detrimental to a network's performance need to be detected promptly and accurately. Such conditions are observed as anomalies in the network traffic and may be caused by malicious attacks, abuse of resources, or failures of mission-critical servers and devices. Behavior-based anomaly detection techniques examine the traffic for patterns that significantly deviate from the network's normal activities. Such techniques provide a complementary layer of defense to identify undesired conditions which traditional, signature-based methods fail to detect. These conditions may, for example, emerge from zero-day exploits, outbreaks of new worms, unanticipated user behavior, or deficiencies in the network infrastructure. This thesis is concerned with the challenge of detecting traffic anomalies with behavior-based methods from flow-level network traffic measurements while providing interpretable alert information. We address the problem from two opposite perspectives by analyzing network behavior and individual host behavior. Learning the normal behavior of network activities and detecting relevant deviations from it is a complex task, since behavior changes may also occur under legitimate conditions and should not be reported as anomalies. Moreover, due to the absence of explicit detection rules, behavior-based methods provide less precise information as to the causes of aberrant events. Network operators, however, critically depend on meaningful detection results to react promptly to alerts by defining effective countermeasures or ruling out potential false alarms. The first part of this work introduces a novel detection scheme which mines for anomalies in the network behavior observed from traffic feature distributions.
We study how various types of anomalies may be detected while providing sufficient information to administrators for their characterization. Based on the observation that networks have multiple behavior modes, we propose a method to estimate and model the modes during an unsupervised learning phase. Observed network behavior is compared to the baseline models by means of a two-layered distance computation: fine-grained anomaly indices indicate suspicious behavior of individual components of traffic features, whereas collective anomaly scores for each feature enable effective detection of anomalies that affect multiple components. We show that the two detection layers reliably expose different types of anomalies. Compared with existing detection methods, the resulting alerts provide important additional information that enables administrators to draw early conclusions as to the anomaly causes. In the second part, we address the challenge of processing high-cardinality traffic information in behavior-based anomaly detection. We introduce an adaptive, locality-preserving method that pre-processes measurement data into histogram representations with a manageable number of variable-sized bins. Our technique iteratively adapts, with limited, tunable memory requirements, to the empirical distribution of observations in a data stream in order to evenly balance the observations over the histogram bins. As an important result, we show that our method approximates input distributions well and improves the level of detail in histograms compared to traditional methods. Applied to behavior-based anomaly detection, higher detection sensitivity is achieved while, thanks to preserving the locality of observations, the meaning of bins and the interpretability of detection results are retained.
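The idea of variable-sized bins that balance observations can be illustrated with a small sketch. This is not the thesis's iterative, memory-bounded streaming algorithm; it is a simplified batch version that assumes the whole sample fits in memory and derives bin edges from empirical quantiles. The function names `equal_frequency_edges` and `bin_counts` are hypothetical and chosen only for this illustration.

```python
from bisect import bisect_right


def equal_frequency_edges(sample, num_bins):
    """Return interior bin edges that split `sample` into bins holding
    roughly the same number of observations (batch simplification)."""
    if num_bins < 2 or not sample:
        return []
    ordered = sorted(sample)
    n = len(ordered)
    edges = []
    for k in range(1, num_bins):
        edge = ordered[(k * n) // num_bins]
        # Heavy ties can yield the same edge twice; keep edges strictly increasing.
        if not edges or edge > edges[-1]:
            edges.append(edge)
    return edges


def bin_counts(values, edges):
    """Count values per bin. Bins are contiguous value ranges, so
    neighbouring values land in the same or adjacent bins (locality)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return counts


if __name__ == "__main__":
    observations = list(range(100))  # stand-in for one traffic feature
    edges = equal_frequency_edges(observations, 4)
    print(edges, bin_counts(observations, edges))
```

Because each bin covers a contiguous value range, a bin that deviates from its baseline still points to a specific range of the feature, which is the interpretability property the abstract refers to.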
In the third part, we focus on the problem that many low-volume anomalies, emerging from individual hosts, are likely to evade detection because they are not reflected as significant deviations in the variability of aggregate behavior patterns of hosts on the network level. To address this problem, we examine the behavior of individual hosts from their network connection-level activities using an unobtrusive, passive monitoring approach. We develop an unsupervised method to track properties in the activities that recur over time and establish detailed behavior profiles. We propose three anomaly detectors that compare observed activities to the profiles in order to recognize suspicious changes in a host's activities, giving evidence of abnormal behavior. We demonstrate their effectiveness in revealing different types of anomalies, which are not detectable in aggregate network statistics, while providing meaningful alert information to administrators. In addition, we show that the profiles of individual hosts are stable over time and representative of their activities, and may even be used to identify hosts solely from their traffic behavior. In summary, the methods and algorithms presented in this thesis enable practical and interpretable detection of traffic anomalies on the network flow level with behavior-based methods.

Name

EPFL_TH4997.pdf

Access type

restricted

Size

2.16 MB

Format

Adobe PDF

Checksum (MD5)

56f63240bc677b723970e7104bab7dd6