Anomaly detection for HTTP intrusion detection : algorithm comparisons and the effect of generalization on accuracy

Please use this identifier to cite or link to this item: http://hdl.handle.net/1928/2874

Title: Anomaly detection for HTTP intrusion detection : algorithm comparisons and the effect of generalization on accuracy
Author: Ingham, Kenneth III
Advisor(s): Forrest, Stephanie
Committee Member(s): Maccabe, Barney
Lane, Terran
Department: University of New Mexico. Dept. of Computer Science
Subject: HTTP anomaly detection
Computer and network security
LC Subject(s): Anomaly detection (Computer security)
HTTP (Computer network protocol)
Web servers--Security measures
Degree Level: Doctoral
Abstract: Network servers are vulnerable to attack, and this state of affairs shows no sign of abating. Therefore, security measures to protect vulnerable software are an important part of keeping systems secure. Anomaly detection systems have the potential to improve the state of affairs, because they can independently learn a model of normal behavior from a set of training data and then use that model to detect novel attacks. In most cases, this model represents more instances than were in the training data set; such generalization is necessary for accurate anomaly detection. This dissertation describes a framework for testing anomaly detection algorithms under identical conditions. Because quality test data representative of today's web servers is not available, this dissertation also describes the Hypertext Transfer Protocol (HTTP) request data collected from four web sites to use as training and test data representing normal HTTP requests. A collection of attacks against web servers and their applications did not exist either, so prior to testing it was also necessary to build a database of HTTP attacks, the largest publicly available one. These data were used to test nine algorithms. This testing was more rigorous than any performed previously, and it shows that the previously proposed algorithms (character distribution, a linear combination of six measures, and a Markov model) are not accurate enough for production use on many of the web servers in use today, which might explain their lack of widespread adoption. Two newer algorithms (deterministic finite automaton induction and n-grams) show more promise. This dissertation shows that accurate anomaly detection requires carefully controlled generalization: too much or too little produces inaccurate results. Calculating the growth rate of the set that describes the anomaly detector's model of normal behavior provides a means of comparing anomaly detection algorithms and predicting their accuracy. Identification of undergeneralization locations can be automated, leading to more rapid discovery of the heuristics needed to allow an anomaly detection system to achieve the required accuracy for production use.
Graduation Date: May 2007
URI: http://hdl.handle.net/1928/2874
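The abstract above singles out n-grams (along with DFA induction) as the more promising of the algorithms tested. The Python sketch below is a hypothetical illustration of the general n-gram idea only, not the dissertation's implementation: every identifier, the n-gram length, and the anomaly threshold are assumptions chosen for clarity. It learns the set of character n-grams seen in normal HTTP request lines and flags a request whose fraction of previously unseen n-grams exceeds a threshold.

# Hypothetical illustration of n-gram-based HTTP anomaly detection.
# Not the dissertation's code; n-gram length, threshold, and all names
# here are assumptions chosen for clarity.

def char_ngrams(text, n=3):
    """Yield the overlapping character n-grams of an HTTP request line."""
    for i in range(len(text) - n + 1):
        yield text[i:i + n]

class NGramDetector:
    def __init__(self, n=3, threshold=0.05):
        self.n = n
        self.threshold = threshold  # tolerated fraction of unseen n-grams
        self.normal = set()         # the learned model of "normal"

    def train(self, requests):
        """Learn the set of n-grams occurring in normal training requests."""
        for request in requests:
            self.normal.update(char_ngrams(request, self.n))

    def is_anomalous(self, request):
        """Flag a request whose unseen-n-gram fraction exceeds the threshold."""
        grams = list(char_ngrams(request, self.n))
        if not grams:
            return True
        unseen = sum(1 for g in grams if g not in self.normal)
        return unseen / len(grams) > self.threshold

# Example with made-up requests:
detector = NGramDetector()
detector.train([
    "GET /index.html HTTP/1.1",
    "GET /images/logo.png HTTP/1.1",
])
print(detector.is_anomalous("GET /index.html HTTP/1.1"))       # False: all n-grams seen in training
print(detector.is_anomalous("GET /cgi-bin/../../etc/passwd"))  # True for this toy model: many unseen n-grams

Loosely, this also makes the abstract's point about controlled generalization concrete: a very small n or a generous threshold generalizes too much and accepts nearly anything, while a large n memorizes the training requests and flags benign variation.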


Files in this item

File: klinghamiii2007.pdf
Size: 1.599 MB
Format: PDF
