Computer Science ETDs

Publication Date

5-1-2007

Abstract

Network servers are vulnerable to attack, and this state of affairs shows no sign of abating. Therefore security measures to protect vulnerable software are an important part of keeping systems secure. Anomaly detection systems have the potential to improve the state of affairs, because they can independently learn a model of normal behavior from a set of training data, and then use the model to detect novel attacks. In most cases, this model represents more instances than were in the training data set; such generalization is necessary for accurate anomaly detection. This dissertation describes a framework for testing anomaly detection algorithms under identical conditions. Because quality test data representative of today's web servers is not available, this dissertation also describes the Hypertext Transfer Protocol (HTTP) request data collected from four web sites to use as training and test data representing normal HTTP requests. A collection of attacks against web servers and their applications did not exist either, so prior to testing it was necessary to also build a database of HTTP attacks, the largest publicly available one. These data were used to test nine algorithms. This testing was more rigorous than any performed previously, and it shows that the previously proposed algorithms (character distribution, a linear combination of six measures, and a Markov Model) are not accurate enough for production use on many of the web servers in use today, which might explain their lack of widespread adoption. Two newer algorithms (deterministic finite automaton induction and n-grams) show more promise. This dissertation shows that accurate anomaly detection requires carefully controlled generalization. Too much or too little generalization will produce inaccurate results. Calculating the growth rate of the set that describes the anomaly detector's model of normal behavior provides a means of comparing anomaly detection algorithms and predicting their accuracy.
Identification of undergeneralization locations can be automated, leading to more rapid discovery of the heuristics needed to allow an anomaly detection system to achieve the required accuracy for production use.

Language

English

Keywords

HTTP anomaly detection, Computer and network security

Document Type

Dissertation

Degree Name

Computer Science

Level of Degree

Doctoral

Department Name

Department of Computer Science

First Committee Member (Chair)

Maccabe, Barney

Second Committee Member

Lane, Terran
