
Scan statistics for the online detection of locally anomalous subgraphs


Please use this identifier to cite or link to this item: http://hdl.handle.net/1928/13885


Title: Scan statistics for the online detection of locally anomalous subgraphs
Author: Neil, Joshua
Advisor(s): Storlie, Curtis
Committee Member(s): Christensen, Ronald
Vander Weil, Scott
Lane, Terran
Department: University of New Mexico. Dept. of Mathematics and Statistics
Subject: Scan Statistics
Cyber Security
Anomaly Detection
Computer Networks
LC Subject(s): Computer networks--Security measures--Statistical methods.
Computer networks--Monitoring--Statistical methods.
Computer security--Statistical methods.
Degree Level: Doctoral
Abstract: Identifying anomalies in computer networks is a challenging and complex problem. Often, anomalies occur in extremely local areas of the network. Locality is complex in this setting, since we have an underlying graph structure. To identify local anomalies, we introduce a scan statistic for data extracted from the edges of a graph over time. In the computer network setting, the data on these edges are multivariate measures of the communications between two distinct machines over time. We describe two shapes for capturing locality in the graph: the star and the k-path. While the star shape is not new to the literature, the path shape, when used as a scan window, appears to be novel. Both of these shapes are motivated by hacker behaviors observed in real attacks. A hacker who is using a single central machine to examine other machines creates a star-shaped anomaly on the edges emanating from the central node. Paths represent traversal of a hacker through a network, using a set of machines in sequence. To identify local anomalies, these shapes are enumerated over the entire graph and over a set of sliding time windows. Local statistics in each window are compared with their historical behavior to capture anomalies within the window. These local statistics are model-based. To capture the communications between computers, we have applied two different models, observed and hidden Markov models, to each edge in the network. These models have been effective in handling various aspects of this type of data, but do not completely describe the data. Therefore, we also present ongoing work in the modeling of host-to-host communications in a computer network. Data speeds on larger networks require online detection to be nimble. We describe a full anomaly detection system, which has been applied to a corporate-sized network and achieves better than real-time analysis speed. We present results on simulated data whose parameters were estimated from real network data.
In addition, we present a result from our analysis of a real, corporate-sized network data set. These results are very encouraging, since the detection corresponded to exactly the type of behavior we hope to detect.
Graduation Date: July 2011
URI: http://hdl.handle.net/1928/13885
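
Illustrative sketch (not taken from the dissertation): the abstract describes enumerating two scan-window shapes, the star and the k-path, over a graph. The Python below shows one way such an enumeration might look for a single time window. Everything in it is an assumption made for illustration: the function names, the directed edge-list input, the rule that a path may not revisit a node, and the use of a plain sum of hypothetical per-edge scores in place of the model-based local statistics the abstract mentions.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each source node to the sorted list of nodes it has an edge to."""
    adj = defaultdict(set)
    for src, dst in edges:
        adj[src].add(dst)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def enumerate_stars(adj):
    """Yield one star per node: the list of edges emanating from that node."""
    for center in sorted(adj):
        yield [(center, nbr) for nbr in adj[center]]


def enumerate_k_paths(adj, k):
    """Yield every directed path of exactly k edges that repeats no node.

    The no-repeat rule is an assumption of this sketch.
    """
    def extend(path):
        if len(path) == k + 1:
            yield list(zip(path, path[1:]))
            return
        for nxt in adj.get(path[-1], []):
            if nxt not in path:
                yield from extend(path + [nxt])

    for start in sorted(adj):
        yield from extend([start])


def max_window_score(shapes, edge_scores):
    """Return (score, shape) for the shape with the largest summed edge score.

    Summing per-edge scores is a placeholder for a model-based statistic.
    Raises ValueError if `shapes` is empty.
    """
    scored = ((sum(edge_scores.get(e, 0.0) for e in shape), shape)
              for shape in shapes)
    return max(scored, key=lambda pair: pair[0])


# Hypothetical four-machine network and per-edge anomaly scores.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
edge_scores = {("a", "b"): 0.1, ("b", "c"): 2.0,
               ("a", "c"): 0.2, ("c", "d"): 1.5}

adj = build_adjacency(edges)
stars = list(enumerate_stars(adj))
two_paths = list(enumerate_k_paths(adj, 2))
best_score, best_path = max_window_score(two_paths, edge_scores)
```

In an online setting, the same enumeration would be repeated for each sliding time window, with the per-edge scores recomputed from that window's data.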

Files in this item

File: dissertationwithsig.pdf
Size: 5.688 MB
Format: PDF


