A Spatiotemporal Event Correlation Approach to Computer Security

6436 users shared this document! click Bookmark and Share
TAG:  virus removal 
Published Time: 1268-41
Filetype: pdf
Filesize: 2205858
Click Here To Download...
A Spatiotemporal Event Correlation Approach to Computer Security Yinglian Xie CMU-CS-05-175 August 2005 School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Submitted in partial ful?llment of the requirements for the degree of Doctor of Philosophy. Thesis Committee: David O’Hallaron, Co-chair Hui Zhang, Co-chair Michael K. Reiter Albert Greenberg, AT&T Labs-Research Copyright c 2005 Yinglian Xie This research was sponsored by the US Army Research O?ce under contract no. DAAD19-02-1-
0389, the US Air Force Research Laboratory under grant no. F30602-02-1-0139, and the National
Science Foundation under grant no. ANI-0326472 and ANI-9814929. The views and conclusions
contained in this document are those of the author and should not be interpreted as representing
the o?cial policies, either expressed or implied, of any sponsoring institution, the U.S. government
or any other entity. Keywords: Correlation, Computer security, Anomaly detection, Forensics, PCA, Wavelet analysis, File updates, Random walks, Spectral analysis, Sampling, Worm,
Network ?ows Abstract Correlation is a recognized technique in security to improve the e?ectiveness of threat identi?cation and analysis process. Existing correlation approaches mostly
focus on correlating temporally located events, or combining alerts from multiple in-
trusion detection systems. Such approaches either generate high false alarm rates due
to single host activity changes, or fail to detect stealthy attacks that evade detection
from local monitors. This thesis explores a new spatiotemporal event correlation approach to capture the abnormal patterns of a wide class of attacks, whose activities, when observed in-
dividually, may not seem suspicious or distinguishable from normal activity changes.
This approach correlates events across both space and time, identifying aggregated
abnormal event patterns to the host state updates. By exploring both the temporal
and spatial locality of host state changes, our approach identi?es malicious events that
are hard to detect in isolation, without foreknowledge of normal changes or system-
speci?c knowledge. To demonstrate the e?ectiveness of spatiotemporal event corre-
lation, we instantiate the approach in two example security applications: anomaly
detection and network forensics. For anomaly detection, we present a “pointillist” method to detect similar, coin- cident changes to the patterns of ?le updates that are shared across multiple hosts.
The correlation is performed by clustering points, each representing an individual
host state transition, in a multi-dimensional feature space. We implement this ap-
proach in a prototype system called Seurat and demonstrate its e?ectiveness using
a combination of real workstation traces, simulated attacks, and manually launched
real worms. For network forensics, we present a general forensics framework called Dragnet, and propose a “random moonwalk” technique that can determine both the host re-
sponsible for originating a worm attack and the set of attack ?ows that make up the
initial stages of the attack tree via which the worm infected successive generations
of victims. Our technique exploits the “wide tree” shape of a worm propagation by
performing random walks backward in time along paths of ?ows. Using analysis, sim-
ulation, and experiments with real world traces, we show how the technique works
against both today’s fast propagating worms and a wide class of stealthy worms that
attempt to hide their attack ?ows among background tra?c. While the high level idea is the same, the two applications use di?erent types of event data, di?erent data representations, and di?erent correlation algorithms,
suggesting that spatiotemporal event correlation will be a general solution to reliably
and e?ectively capture the global abnormal patterns for a wide variety of security
applications. Acknowledgements I am extremely grateful to my advisors David O’Hallaron and Hui Zhang. Without their guidance and tremendous support, this thesis would not be possible. They
guided me through every aspect of my graduate study. They not only teach me
how to identify problems, how to solve problems, and how to build systems, but
also help me improve my writing skills, speaking kills, and communication skills.
Numerous times, they helped me set milestones together, and guided me to make
progress steadily. When there are di?culties, their encouragement and optimism
help me to gain more con?dence, and to come up with solutions. I treasure my PhD
study as a privileged opportunity to learn from both great teachers and masters.
Their insights and suggestions greatly increased the quality of this thesis work. I would like to thank other members in my thesis committee - Mike Reiter and Albert Greenberg. Mike is a source of inspiration, and guided the work in this thesis
from the very beginning. His guidance, suggestions, and help greatly improved every
aspect of this thesis step by step. This thesis work would not be possible without him
either. I would also like to thank Albert for his valuable suggestions on the Dragnet
project that became part of this thesis. His comments and feedbacks on our overall
work has helped improve my thesis. I thank my collaborators Dave Maltz, Hyang-Ah Kim, and Vyas Sekar. Much of this thesis work is done in collaboration with them, who have been both team players
and friends. My work with them is both happy and fruitful. Dave actively discussed
the Dragnet work together with me and provided a lot of help to move the project
forward successfully. Hyang-Ah, Vyas, and I help each other and grow together. Thanks to my fellow students Tiankai Tu, Julio Lopez, Yanghua-Chu, Aditya Ganjam, Debabrata Dash, Andy Myers, Justin Weisz, Hong Yan, and friends Shuheng
Zhou, Jun Gao, Mukesh Agrawal, Ningning Hu, Suman Nath, Runting Shi, Jimeng
Sun, Yan Li, as well as many others, for making my student journey in CMU so
enjoyable. I thank my husband Qifa Ke for his love, encouragement, and support. Although we work in di?erent areas, we discuss di?erent aspects of research together, and move
forward together. Our discussion often results in useful suggestions that improved
not only this thesis work, but my research approaches and styles in general. Finally,
I really thank my parents for their great love and tremendous support in every aspect
of my life. They support me to pursue my own goals with con?dence, to progress
with sweat and hard work, and to face and overcome obstacles with courage. This
thesis is dedicated to my parents! Contents 1 Introduction 1 1.1 Existing Correlation Approaches . . . . . . . . . . . . . . . . . . . . . 2 1.2 Spatiotemporal Event Correlation . . . . . . . . . . . . . . . . . . . . 4 1.2.1 Why Spatiotemporal Event Correlation? . . . . . . . . . . . . 5 1.2.2 Challenges Involved . . . . . . . . . . . . . . . . . . . . . . . . 6 1.2.3 Thesis Approach . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.3 Two Example Applications . . . . . . . . . . . . . . . . . . . . . . . . 8 1.3.1 Spatiotemporal Event Correlation for Anomaly Detection . . . 9 1.3.2 Spatiotemporal Event Correlation for Network Forensics . . . 10 1.4 Contributions and Thesis Outline . . . . . . . . . . . . . . . . . . . . 11 2 Anomaly Detection Using Spatiotemporal Event Correlation 13 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.2 Attack Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.3 Correlation-based Anomaly Detection . . . . . . . . . . . . . . . . . . 17 2.3.1 A Binary Feature Vector Space . . . . . . . . . . . . . . . . . 17 2.3.2 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.3.3 Anomaly Detection by Clustering . . . . . . . . . . . . . . . . 22 2.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.4.1 False Positives on Linux Platforms . . . . . . . . . . . . . . . 24 2.4.2 False Positives on Microsoft Windows Platforms . . . . . . . . 27 2.4.3 False Negatives . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.4.4 Real Attacks on Linux Platforms . . . . . . . . . . . . . . . . 31 vii 2.4.5 Real Attacks on Microsoft Windows Platforms . . . . . . . . . 32 2.5 Anomaly Detection with More Attributes . . . . . . . . . . . . . . . . 34 2.5.1 A General Feature Vector Space . . . . . . . . . . . . . . . . . 35 2.5.2 Performance Impact . . . . . . . . . . . . . . . . . . . . . . . 37 2.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 2.7 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3 Network Forensics Using Spatiotemporal Event Correlation 43 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.3 A Framework For Investigating Attacks . . . . . . . . . . . . . . . . . 48 3.3.1 Network Auditing . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.3.2 Techniques for Attack Reconstruction . . . . . . . . . . . . . . 50 3.4 The Random Moonwalk Algorithm . . . . . . . . . . . . . . . . . . . 52 3.5 Evaluation Methodology . . . . . . . . . . . . . . . . . . . . . . . . . 53 3.6 Analytical Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.6.1 Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.6.2 Edge Probability Distribution . . . . . . . . . . . . . . . . . . 56 3.6.3 False Positives and False Negatives . . . . . . . . . . . . . . . 59 3.6.4 Parameter Selection . . . . . . . . . . . . . . . . . . . . . . . . 61 3.7 Real Trace Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 3.7.1 Detecting the Existence of an Attack . . . . . . . . . . . . . . 62 3.7.2 Identifying Causal Edges and Initial Infected Hosts . . . . . . 63 3.7.3 Reconstructing the Top Level Causal Tree . . . . . . . . . . . 65 3.7.4 Parameter Selection . . . . . . . . . . . . . . . . . . . . . . . . 67 3.7.5 Performance vs. Worm Scanning Rate . . . . . . . . . . . . . 68 3.7.6 Performance vs. Worm Scanning Method . . . . . . . . . . . . 70 3.7.7 Performance vs. Fraction of Hosts Vulnerable . . . . . . . . . 71 3.8 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 3.9 Random Moonwalks and Spectral Graph Analysis . . . . . . . . . . . 72 3.9.1 A Flow Graph Model . . . . . . . . . . . . . . . . . . . . . . . 73 viii 3.9.2 Markov Random Walks on the Flow Graph . . . . . . . . . . . 75 3.9.3 Deterministic Computation of Probabilities . . . . . . . . . . . 76 3.9.4 Sampling Based Approximation . . . . . . . . . . . . . . . . . 77 3.9.5 Revisit the E?ectiveness of Random Moonwalks . . . . . . . . 80 3.9.6 Impact of Attack Models on Performance . . . . . . . . . . . . 81 3.10 Distributed Network Forensics . . . . . . . . . . . . . . . . . . . . . . 88 3.10.1 Missing Data Impact on Performance . . . . . . . . . . . . . . 89 3.10.2 A Distributed Random Moonwalk Algorithm . . . . . . . . . . 97 3.11 Random Sampling vs. Biased Sampling . . . . . . . . . . . . . . . . . 100 3.12 Discussion and Future Work . . . . . . . . . . . . . . . . . . . . . . . 102 4 Related Work 105 4.1 Distributed Security Architectures . . . . . . . . . . . . . . . . . . . . 105 4.2 Alert Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 4.3 Related Work to Seurat . . . . . . . . . . . . . . . . . . . . . . . . . 107 4.4 Related Work to Dragnet . . . . . . . . . . . . . . . . . . . . . . . . . 108 5 Conclusions and Future Work 111 5.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 5.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 5.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 5.3.1 Distributed Data Collection and Correlations . . . . . . . . . 113 5.3.2 Privacy Preserving Data Access Mechanisms . . . . . . . . . . 114 5.3.3 Incorporation of Heterogenous Types of Data . . . . . . . . . 114 5.3.4 Other Applications . . . . . . . . . . . . . . . . . . . . . . . . 115 6 Appendices 129 6.1 Probability Estimation in Section 3.6.2 . . . . . . . . . . . . . . . . . 129 6.2 Host Communication Graph of Top Frequency Flows . . . . . . . . . 131 ix x List of Figures 1.1 Classi?cation of correlation approaches in security. . . . . . . . . . . . 3 1.2 File update events vs. host states. . . . . . . . . . . . . . . . . . . . . 4 1.3 Network ?ow events vs. host states. . . . . . . . . . . . . . . . . . . . 5 1.4 The high level summary of the two example applications. . . . . . . . 9 2.1 Pointillist approach to anomaly detection. Normal points are clustered by the dashed circle. The appearance of a new cluster consisting of
three points suggests anomalous events on host A, B, and D. . . . . . 15 2.2 Representing host ?le updates as feature vectors. F 1 , F 2 , F 3 , F 4 , F 5 are ?ve di?erent ?les (i.e., ?le names). Accordingly, the feature vector
space has 5 dimensions in the example. . . . . . . . . . . . . . . . . . 18 2.3 Detection window, comparison window, and correlation window. The detection window is day j. The comparison window is from day j ? t
to day j ? 1. The correlation window is from day j ? t to day j. . . . 18 2.4 Representing ?le update status with wavelet transformation. The orig- inal signal is S, which can be decomposed into a low frequency signal
cA re?ecting the long term update trend, and a high frequency signal
cD re?ecting the daily variations from the long-term trend. . . . . . 20 2.5 Wavelet-based feature selection. . . . . . . . . . . . . . . . . . . . . . 21 2.6 Wavelet transformation of ?le update status. (a) The original signal of the ?le update status (b) The residual signal after wavelet transformation 21 2.7 Feature selection and dimension reduction. (a) File update patterns. Files are sorted by the cumulative number of hosts that have updated
them throughout the 91 days. The darker the color is, the more hosts
updated the corresponding ?le. (b) The number of feature vector di-
mensions after wavelet-based selection and PCA consecutively. . . . . 25 xi 2.8 Clustering feature vectors for anomaly detection at Linux platforms. Each circle represents a cluster. The number at the center of the ?gure
shows the total number of clusters. The radius of a circle corresponds
to the number of points in the cluster, which is also indicated beside
the circle. The squared dots correspond to the new points generated
on the day under detection. New clusters are identi?ed by a thicker
circle. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.9 Feature selection and dimension reduction at Microsoft Windows Plat- forms. (a) File update patterns. Files are sorted by the cumulative
number of hosts that have updated them throughout the 46 days. The
darker the color is, the more hosts updated the corresponding ?le. (b)
The number of feature vector dimensions after wavelet-based selection
and PCA consecutively. . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.10 Detection rate of simulated attacks. We vary the number of hosts infected and the number of ?les inserted on each host by the simulated
attacks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.11 Detection rate of emulated worms. Percentage of experiments when Seurat detects the worms. For each experiment, we launch a worm
on a di?erent day. We vary the number of hosts compromised by the
attacks and the type of worms. . . . . . . . . . . . . . . . . . . . . . 30 2.12 Intrusion detection by Seurat. Seurat identi?ed a new cluster of three hosts on Feb 11, 2004, when we manually launched the Lion worm.
The number of clusters formed in each day varies due to an artifact of
the feature vector selection and the clustering algorithm. . . . . . . . 32 2.13 Suspicious ?les for the new cluster on Feb 11, 2004. . . . . . . . . . . 33 2.14 Detection rates of Windows worms and viruses. We merged the ?le update reports with reports from real deployment on 10 randomly se-
lected days. The detection rate is the percentage of the days when
Seurat detected the worms and viruses. The last row lists the number
of suspicious ?les involved with the new cluster identi?ed by Seurat. . 34 2.15 Suspicious ?les identi?ed to explain the cause of a new cluster detected after we launched the Blaster worm. . . . . . . . . . . . . . . . . . . . 35 2.16 The distribution of average one-dimensional di?erence across all di- mensions (each dimension corresponds to a unique ?le). (a) Average
di?erence between ?le size changes. (b) Average di?erence between
last modi?cation times. . . . . . . . . . . . . . . . . . . . . . . . . . 37 2.17 False alarms generated using feature vectors de?ned by ?le size changes. 38 xii 2.18 Details of the three events when a single new ?le was modi?ed for the ?rst time. ”Number of hosts” means the number of hosts involved in
each ?le update event. . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2.19 Seurat identi?es a new cluster of the exact three infected hosts by the Lion worm with the general feature vector space de?ned using ?le size
changes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2.20 False alarms generated using feature vectors de?ned by last modi?ca- tion times. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 3.1 A simpli?ed network topology showing routers, hosts, and ISP bound- aries. The arrows between hosts illustrate communications that spread
an attack through the network. . . . . . . . . . . . . . . . . . . . . . 46 3.2 Example of host contact graph showing the communication between hosts. Attack edges are shown as arrows in black (both solid and
dashed). Filled nodes correspond to hosts in an infected state. . . . . 46 3.3 Example showing the causal tree, which contain causal edges with timestamps from the host contact graph. . . . . . . . . . . . . . . . 47 3.4 Path coverage vs number of highest degree ASes selected. . . . . . . . 49 3.5 Concept behind attack reconstruction: the probability of being in- volved in an attack should be reinforced by network events. The frac-
tion of host ?lled in represents the probability that host is involved in
an attack, as computed by an intrusion detection system. Given the
probability of B and E’s involvement and the communication pattern
between the hosts, the likelihood of C, D, and F’s involvement should
be increased. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 3.6 An edge at time k in the host contact graph. . . . . . . . . . . . . . . 55 3.7 The parameters of a host contact graph with a fast propagating worm. 57 3.8 (a) Fraction of infected hosts as an attack advances. X-axis represents time in terms of units. Y-axis corresponds to the fraction of hosts that
are already infected in the host contact graph. The total fraction of
vulnerable hosts is 0.1. (b) Estimated probability of an edge being
traversed in one random moonwalk. . . . . . . . . . . . . . . . . . . . 58 3.9 False negative rate vs. false positive rate of ?nding k = 1 causal edges. 60 3.10 Estimation of the maximum false positive rate of identifying the attack source. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 3.11 False positive rate of ?nding k = 1 causal edges vs. maximum path length d. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 xiii 3.12 Description of traces with di?erent rate worm tra?c arti?cially added into a real tra?c trace collected from the backbone of a university
network. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 3.13 Stem plot of edge frequency counts with W = 10 4 walks on Trace-10. 63 3.14 Detection accuracy of causal edges and attack edges vs. number of top frequency edges (Z) returned for Trace-10 and Trace-50. Note there
are only 800 causal edges from among approximately 1.5-2 × 10 6 total ?ows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 3.15 (a) Fraction of initial causal edges among the actual returned causal edges. (b) The number of source hosts involved as suspect top level
hosts vs. number of top frequency edges (Z) returned. . . . . . . . . 65 3.16 Graph of the 60 top frequency ?ows returned by the random moonwalk algorithm when run on Trace-50. Note the graph is neither the host
contact graph, nor the causal tree. Hosts are represented by circles
annotated with the host ID. Flows are represented as directed arrows
between hosts, and are annotated with timestamps. Solid arrows de-
note causal edges, dashed arrows denote non-causal attack edges, and
dotted edges correspond to normal tra?c ?ows. . . . . . . . . . . . . 66 3.17 Impact of parameter selection on performance using both Trace-20 and Trace-50. (a) Detection accuracy vs. d (b) Detection accuracy vs. ?t
(c) Actual path length vs. ?t. . . . . . . . . . . . . . . . . . . . . . 67 3.18 Detection accuracy vs. worm scanning rate. The X-axis represents the worm inter-scan duration. For example, a window of x = 20 means an
infected host generates an infection ?ow every 20 seconds. . . . . . . 69 3.19 (a) Comparing detection accuracy using worms with di?erent scan- ning methods using Trace-20. (b) Comparing detection accuracy using
worms targeting di?erent fraction of vulnerable hosts F . . . . . . . . 70 3.20 Detection accuracy of causal edges using di?erent background tra?c models. “Power-law” means the controlled variable follows a power-
law distribution (with a coe?cient set to 3). “Uniform” means the
controlled variable follows a uniform distribution. |H| denotes the total
number of hosts in the network, and C is a constant number smaller
than |H|. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 3.21 Five example network ?ows. A, B, C, D, and E are ?ve di?erent hosts in the network. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 xiv 3.22 (a) The ?ow graph de?ned by the ?ve network ?ows listed in Fig- ure 3.21. (b) The host contact graph de?ned by the same ?ve network
?ows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 3.23 (a) The convergence rate of the power method on both the transformed ?ow graph and the original ?ow graph. (b) The ranks of initial causal
?ows in the largest eigenvector. The causal ?ows are sorted based on
the rank obtained from the original ?ow graph. . . . . . . . . . . . . 77 3.24 (a) Computed ?ow probability distribution with a worm attack. (b) Error margin parameter k vs. number of walks W . . . . . . . . . . . 79 3.25 The ranks of initial causal ?ows computed deterministically compared against the ranks computed using random moonwalk sampling. . . . . 79 3.26 The con?gurations of the two scanning worms. . . . . . . . . . . . . . 82 3.27 Probability distributions with and without random scanning worms. (a) The normalized attack rate is 7. (b) The normalized attack rate is 1. 82 3.28 The con?gurations of the three scanners. . . . . . . . . . . . . . . . . 83 3.29 Probability distributions with and without random scanning hosts. (a) An aggressive random scanner (b) Scanners that spread infection via
a pre-compiled hitlist. . . . . . . . . . . . . . . . . . . . . . . . . . . 83 3.30 The con?gurations of the two types of topological spreading attacks. . 85 3.31 Probability distributions with and without topological spreading worms. (a) Each infected host scans its neighbor set randomly. (b) Each in-
fected host scans its neighbors exactly once. . . . . . . . . . . . . . . 85 3.32 The con?gurations of the two hitlist spreading worms. . . . . . . . . . 86 3.33 Probability distributions with and without hitlist spreading worms. (a) Each infected host randomly scans a pre-compiled list of vulnerable
hosts. (b) Infected hosts scan in a coordinated way, with each one
performing a maximal two scans to only un-compromised hosts. . . . 87 3.34 (a) The distribution of the number of subnet hosts. (b) The distribu- tion of tra?c generated. . . . . . . . . . . . . . . . . . . . . . . . . . 90 3.35 Remove 4 subnets by tra?c. . . . . . . . . . . . . . . . . . . . . . . . 91 3.36 Remove 10 subnets by tra?c. . . . . . . . . . . . . . . . . . . . . . . 91 3.37 Detection accuracy with trace data missing from the subnets that gen- erated most amount of tra?c. . . . . . . . . . . . . . . . . . . . . . . 91 3.38 Remove 10 subnets by host. . . . . . . . . . . . . . . . . . . . . . . . 92 3.39 Remove 20 subnets by host. . . . . . . . . . . . . . . . . . . . . . . . 92 xv 3.40 Detection accuracy with trace data missing from the subnets that have the most number of hosts. . . . . . . . . . . . . . . . . . . . . . . . . 92 3.41 Remove 4 subnets by infection time. . . . . . . . . . . . . . . . . . . 93 3.42 Remove 10 subnets by infection time. . . . . . . . . . . . . . . . . . . 93 3.43 Detection accuracy with trace data missing from the subnets that con- tain hosts who were infected earliest in time. . . . . . . . . . . . . . . 93 3.44 Host communication graph with 42 top frequency ?ows after 10 4 walks on the trace where we remove 10 subnets selected based on the top
level infected hosts. Each circle represents a host with the number in
the circle indicating the host ID. The ”Missing subnets” node represent
all the hosts that belong to the subnets without tra?c auditing. Solid
arrows indicate successful infection ?ows (i.e. causal edges), while the
dashed arrows represent unsuccessful infection attempts. The dotted
arrows correspond to normal tra?c ?ows. The number beside each
arrow indicates the ?ow ?nish time. . . . . . . . . . . . . . . . . . . . 95 3.45 The causal ?ow detection accuracies with the decreasing amount of trace data available. (a) Performance vs. the number of subnets se-
lected. (b) Performance vs. the amount of tra?c removed. . . . . . . 96 3.46 Fraction of causal ?ows missing vs. the total amount of tra?c removed using the three di?erent selection methods. . . . . . . . . . . . . . . . 96 3.47 An AD graph illustrating the concept of incremental Attacker Identi- ?cation and Attack Reconstruction. . . . . . . . . . . . . . . . . . . . 99 3.48 The distributed random moonwalk algorithm. . . . . . . . . . . . . . 100 3.49 The causal edge detection accuracy of the top 100 frequency ?ows using the distributed random moonwalk algorithm. . . . . . . . . . . . . . 100 3.50 Tra?c statistics of the two sub-networks for distributed forensic analysis100 3.51 Detection accuracy achieved with di?erent heuristics. “Baseline” refers to the default random moonwalk algorithm without heuristics. . . . . 102 6.1 Host communication graph with 50 top frequency ?ows after 10 4 walks on Trace-20 (Figure 3.12). Each circle represents a host with the num-
ber in the circle indicating the host ID. Solid arrows indicate successful
infection ?ows (i.e. causal edges), while the dashed arrows represent
unsuccessful infection attempts. The dotted arrows correspond to nor-
mal tra?c ?ows. The number beside each arrow indicates the ?ow
?nish time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 xvi Chapter 1 Introduction Correlation, de?ned as “establishing or ?nding relationship between entities”, is a
recognized technique in security to improve the e?ectiveness of threat identi?cation
and analysis process by combining information from multiple sources. With the in-
creasing deployment of various security monitors and sensors, e?ectively analyzing
audit data for extracting only desired information is an important step to attack dis-
covery, attack response, forensic analysis, and prediction of future attacks. By looking
at collective information on a set of events rather than the individual ones, one can
identify more types of attacks with fewer false positives. The world of network security, however, is an arms race. While many correla- tion techniques today are e?ective at identifying a wide variety of existing attacks,
future attacks can be more stealthy, constantly changing their signatures to evade
the detection of individually deployed monitors. Nevertheless, in order to infect a
large number of hosts in a network and disrupt their normal functions, large scale
attacks will exhibit discernible patterns when locally generated events are viewed in
an aggregated way. Detecting and identifying these more sophisticated attacks re-
quires us to correlate the huge volume of events taking place at distributed locations
in the network. While the sheer quantity of event records could be overwhelming, the
increasing performance and capacities of computing devices, networks, and storage
devices make it possible to store, transfer, and analyze large quantities of audit data
in a global coordinated way. The topic of this thesis is a new spatiotemporal event correlation approach to capture the global abnormal patterns of a wide class of attacks, whose activities,
when observed individually, may not seem suspicious or distinguishable from normal
host activity changes. In this introductory chapter, we ?rst review existing correlation
approaches in computer security. We then more formally de?ne our approach, describe
the challenges, and present the methodology used in the thesis. 1 2 CHAPTER 1. INTRODUCTION 1.1 Existing Correlation Approaches Figure 1.1 shows the di?erent ways of classifying correlation approaches in computer
security. Based on the source of input data, we can classify the existing approaches
as temporal correlation and spatial correlation. The temporal correlation approach
relates and analyzes event sequences that span across a range of time period. Such
approaches either look for deviations from a normal event model de?ned through
learning, or specify a set of rules to encode sequences of events that known attacks
must follow. In the learning based temporal correlation, the system is usually trained to learn characteristics of normal events. Further investigation is carried when there are sig-
ni?cant deviations from the learned normal model. For example, Warraender et al.
have proposed a number of machine learning approaches to detect anomalous pro-
gram executions based on short sequences of run-time system calls [108]. Because
such approaches build normal models upon past known activities, any new unseen
event is suspicious, possibly resulting in a high false positive rate [26]. Rule-based temporal correlations usually de?ne a set of rules that will be used to match a sequence of events across di?erent times. Such event sequence can be
encoded as either normal activities (in which case, an alarm will be raised in case of
rule violation), or attacks (in which case, an alarm will be raised in case of rule match-
ing). Although rule-based temporal correlation generally raises fewer false alarms for
security investigation, it may not cope with new types of attacks. Furthermore, such
approaches often require system administrators to manually specify rules based on
accurate, speci?c knowledge about normal user activities or known attack patterns,
which is not only time consuming, but also di?cult. The spatial correlation approach focuses on analyzing events that take place across multiple locations for security diagnosis. Such events can also distribute across a
time range. The input to the correlation modules can be either low level observed
raw events or high level alert events generate from local audit data. Examples of
low level raw events include ?le system updates, network packets, and system call
invocations. High level alerts usually refer to ?ltered abnormal events or aggregated
event summaries output by IDS systems such as Snort [83] or Bro [75]. In order to
distinguish these two types of inputs in this document, we de?ne low level raw events
as events and de?ne high level alert events as alerts. Most of the existing spatial correlation techniques focus on correlating high level alerts. Since a single attack can often induce multiple di?erent alerts generated lo-
cally from raw event inputs, these alerts can be further aggregated to output attack
scenarios to system administrators. Both statistical methods and rule based methods
can be applied to spatial correlation of alerts. Example techniques include statisti- 1.1. EXISTING CORRELATION APPROACHES 3 Correlation Type of Input data Homogeneous Heterogeneous Level of Input data Alert based Event based Source of Input data Temporal Spatial Correlation method Statistical Rule-based Figure 1.1: Classi?cation of correlation approaches in security. cally clustering alerts based on similarity of alert attributes [103] for attack causality
analysis, and combining alerts using well de?ned logical rules [23]. Compared with
low level event records, the number of high level alerts is signi?cantly smaller. Hence
correlating these alerts generally has manageable complexity. The disadvantage, how-
ever, is its limited detection capability on stealthy attacks that do not induce alerts
based on only locally observed events [80], even though they may exhibit abnormal
patterns when viewed globally. As attackers get more sophisticated, they may try
to blend attack events gradually into normal activities to evade individual monitor
detection. Another orthogonal classi?cation of correlation approaches is based on whether the input events are of a same type or di?erent types. Correlating heterogenous
types of events can potentially improve the accuracy of alarms based on di?erent
views of system states. For example, Abad et al have proposed both a top-down
approach and a bottom-up approach to correlate anomalies from di?erent types of
logs for intrusion detection [3]. In the top-down approach, known attacks are analyzed
to determine attack signatures from various logs, while in the bottom-up approach,
anomalies from multiples types of logs are correlated to detect new attacks. Other
examples include [4], where the authors correlate alerts from heterogenous sensors
using a similarity measure based on overlapping features. Because many these existing
approaches still require local event ?ltering before correlation, their capability is, 4 CHAPTER 1. INTRODUCTION Normal Infected File updates
from normal activities File updates
from attacks File updates from
attack removal File updates from
attacks/normal activities Figure 1.2: File update events vs. host states. again, limited for identifying stealthy attack patterns that can evade the detection of
individual monitors. In summary, most of the existing correlation approaches have focused on either temporal correlation, or correlating alerts from individual monitors. They usually
generate high false positive rates, or fail to detect stealthy attacks that do not seem
abnormal based on local views. 1.2 Spatiotemporal Event Correlation In this thesis, we explore a new approach of correlation for computer security. This
approach correlates events across both space and time, identifying aggregated abnor-
mal event patterns to the host state updates. We focus on low level observed events,
such as host ?le system updates or network communication ?ows, instead of high level
IDS alerts. As such, we de?ne our approach as spatiotemporal event correlation. More formally, a state of a host is a well-de?ned logical or operational mode. An event is a sequence of actions that, directly or indirectly, cause the state of a host to
transit from one to another. Such transition can happen between the same states or
di?erent states. For the purpose of security applications, we consider only two types
of logical states of a host: “normal” and “infected”. A host is in the normal state if it
is in the lack of any existing or potential malicious code execution, otherwise, the host
is in the infected state. Example events that may cause a host’s state to transit from
“normal” to “infected” include ?le system updates, network communication ?ows,
keyboard inputs, or mouse clicks from human users. In the case of ?le system updates, as illustrated by Figure 1.2, normal ?le updates, generated by user activities or system routine maintenance, will transit the host state
between normal states. However, ?le creations, modi?cations, or deletions induced
by a virus or worm for the ?rst time, will change the state of a vulnerable host 1.2. SPATIOTEMPORAL EVENT CORRELATION 5 Normal Infected Normal flows Malicious flows
for the first time Normal flows /
Malicious flows Figure 1.3: Network ?ow events vs. host states. from “normal” to “infected”. Once a host is in the infected state, further ?le updates
caused by viruses or worms, will transit the host state between infected states without
changing it. Finally, after we detect the infection, those ?le updates associated with
the virus removal process will change the host state from “infected” back again to
“normal”. Similarly shown in Figure 1.3, the reception of a communication ?ow that
carries malicious payload for the ?rst time will turn a vulnerable host’s state from
“normal” to “infected”, while the reception of other types of ?ows will not change
the host state. Spatiotemporal event correlation is based on a key observation that events of inter- est in a network system often have both temporal and spatial locality. In particular,
events induced by malware propagation often exhibit spatial locality, in the sense that
similar updates or events tend to occur across many of the hosts in a network system.
If one host is compromised by an attack, other hosts with similar vulnerabilities in
the network are likely to be compromised as well, hence generating similar events or
updates. These events also exhibit temporal locality in the sense that they tend to
be clustered closely in time. If one host is compromised by a malicious attack, other
hosts are likely to be compromised by the same attack soon afterwards. Each such
event, when viewed individually, may not seem suspicious or abnormal. When they
are viewed collectively, their abnormal patterns may stand out due to their locality
across space and time. The goal of spatiotemporal event correlation is therefore to
identify atypical such aggregate events, or the lack of typical ones. 1.2.1 Why Spatiotemporal Event Correlation? The correlation capability across both space and time allows us to learn the patterns
of normal events at individual locations over time, and thus detect new abnormal
events. More importantly, we can combine and compare these events across multiple
locations, hence reliably identifying only aggregated abnormal event patterns that are
caused by malicious intrusions. Such approach can eliminate false alarms caused by 6 CHAPTER 1. INTRODUCTION normal activity pattern shifts at single locations, without foreknowledge of normal
changes and without system-speci?c knowledge for rules. Meanwhile, by globally correlating events across multiple locations, it is possible to identify stealthy attacks that may not be detectable by looking only at events at
individual locations. While many existing approaches such as [44] have focused on
identifying virulent, fast propagating attacks, future attacks can potentially be much
stealthier. According to reports on recent attack trends [19, 90], both the e?ectiveness
of infection and the level of attack sophistication have been increasing. On one hand,
the degree of automation in attack tools and the speed of discovering vulnerable hosts
have continued to advance, leaving less time for attack response and attack defense.
On the other hand, attacks are getting increasingly stealthier to mimic normal host
communication behaviors. They can propagate via various methods (e.g., hitlist scan,
peer-to-peer propagation), self-evolve to use di?erent signatures (e.g., metamorphic
worms, polymorphic worms), and exploit well-known protocols and ports (e.g., IRC,
HTTP). Such attack events, when viewed from a single location in isolation, may seem
subtle or invisible to trigger local alerts, their abnormal structures or patterns will
potentially stand out when viewed aggregately. By identifying these global abnormal
patterns or structures based on events instead of alerts, the spatiotemporal event
correlation approach can be agnostic to attack signatures or scanning rates, and
potentially be applicable to a wide range of attacks. In summary, by correlating events across both space (multiple locations) and time (past and present): • We can distinguish malicious behavior from normal activity changes more reli- ably to reduce false positive rates. • We can identify a wide class of attacks that do not trigger local alarms, but exhibit discernable global patterns. 1.2.2 Challenges Involved The spatiotemporal event correlation approach requires us to process low level raw
event inputs across both space and time. There are thus a larger volume of data
to be collected and analyzed, compared with analyzing events from single locations,
or correlating pre-?ltered alerts generated by local monitors. Furthermore, the data
to be collected and analyzed may belong to di?erent domains or entities, containing
private information about both administrative domains and end users. Consequently,
there will be a number of challenges for the approach to be e?ective in practice as
a result of the increased scale of data and the need for sharing data among di?erent
users or service providers. 1.2. SPATIOTEMPORAL EVENT CORRELATION 7 1. Compact event representation: We need compact representations of events in order to reduce both the amount of audit data to be processed and the com-
plexity of the correlation engine. A more compact and generic representation
is also more robust to various types of attacks, because there will be less room
for attackers to exploit in order to evade detection. 2. E?cient correlation algorithm: The key part of correlation lies in the algo- rithmic components that e?ectively correlate the events to extract “interesting”
global patterns for attack detection and analysis. Since the volume of audit data
to be correlated can be much larger than the number of pre-?ltered alerts, the
correlation algorithm itself should have low complexity for it to be used in prac-
tice. The correlation algorithms may need to operate over a centralized data
repository (in the case of one domain), or will need to interact with distributed
data retrieval and query mechanisms (in the case of multiple domains). 3. A framework for data storage and query: The distributed nature of audit events requires a framework to collect, store, and query these events for e?cient
correlation analysis. One challenging problem is how to deploy data monitors to
maximize coverage while minimizing costs. Once collected, these audit data may
need to be stored at di?erent repositories due to either the scale of the network
size, or the boundary of administrative domains. In such case, queries to the
events must be e?ciently routed to the correlation module for fast detection
and response. 4. Privacy protection: Since the events to be correlated relate with the states of di?erent hosts and users, issues of trust and cooperation raise challenges
with respect to privacy protection, especially when audit data distribute across
multiple independent domains. In addition, low level observed events tend to
contain more sensitive information about both ISP and user privacy than high
level alerts. It is important for the data query and correlation process to leak as
minimum information as possible about both the domain proprietaries and end
users, while still be able to e?ectively identify abnormal patterns for detection
and analysis. 1.2.3 Thesis Approach Spatiotemporal event correlation is a general approach that can be applicable to
various security problems. While the high level idea is the same, given a speci?c
problem, one needs to address each of the above challenges based on application
speci?c semantics. 8 CHAPTER 1. INTRODUCTION In the scope of this thesis, we explore the spatiotemporal event correlation ap- proach in the context of two important security applications. We focus on addressing
the ?rst two challenges mentioned above. We show that, in spite of the data com-
plexity, one can e?ectively reduce the volume of input data through feature selection
mechanisms and compact graph representations. We also present two di?erent al-
gorithms that e?ectively identify the global abnormal patterns or abnormal graph
structures by leveraging application domain knowledge. To address the third challenge, this thesis uses a centralized architecture to store and analyze events. A framework for distributed data retrieval and correlation is
perhaps more appealing in larger scaled systems, and left as future work. This thesis will also discuss and address certain issues related with privacy protec- tion in the two example applications based on application speci?c properties. Com-
pletely addressing this issue, however, is beyond the scope of this document, and
identi?ed as future work. We note that the spatiotemporal event correlation approach speci?es the sources of input data and the levels of input data (see Figure 1.1). It puts no requirement
about the correlation methods and the types of input data to use. The latter two
are thus orthogonal dimensions, and should be decided based on detailed application
requirements. In the context of the two applications in this thesis, we focus on
statistical correlation methods to eliminate the need of rule speci?cations by human
users or administrators. For the types of input data, this thesis considers mostly
homogeneous types of events during the correlation process, but is not limited to
them. In fact, we will show that, in both applications, the incorporation of multiple
types of events will enhance the e?ectiveness of attack detection and analysis. Hence
the algorithms that we explore are complementary to those that utilize heterogeneous
types of input events. The highlighted boxes in Figure 1.1 show the focus of this thesis
among the rami?cations. 1.3 Two Example Applications To demonstrate the viability and e?ectiveness of spatiotemporal event correlation
in security, this thesis focuses on two representative security applications: anomaly
detection and network forensics. In the application of anomaly detection, we present
a prototype system called Seurat that can be used to e?ectively detect attacks that
target at multiple hosts by correlating host ?le system updates across both space
and time [114]. In the application of network forensics, this thesis argues that it
is important for the network to support automatic forensic analysis abilities after
an attack has happened. We present such a general framework called Dragnet [91],
and propose a random moonwalk algorithm that determines the origin of epidemic 1.3. TWO EXAMPLE APPLICATIONS 9 Application Anomaly detection Network forensics Prototype / Framework Seurat Dragnet Event representation Feature vectors Host contact graphs Correlation algorithm Feature reduction & clustering Random moonwalks Data access Centralized/future work Centralized/future work Privacy protection Domain speci?c/Future work Domain speci?c/Future work Figure 1.4: The high level summary of the two example applications. spreading attacks by exploring the structure of host communication graphs through
correlation [117]. Figure 1.4 summarizes how we address the challenges for each
application in high level. We introduce both of them brie?y next. 1.3.1 Spatiotemporal Event Correlation for Anomaly Detec-
tion Anomaly detection is a widely used approach for detecting attacks in cyber security
analysis. Such techniques usually de?ne a model of normal host or network activi-
ties. Deviations from the normal model indicate anomalous events and should raise
an alarm. Compared with approaches that detect known attacks via pre-de?ned sig-
natures, anomaly detection identi?es new types of attacks that have not been seen
before. Existing techniques of anomaly detection focus on detecting anomalous events based on a normal model built from single host activity patterns. Because these tech-
niques explore only the temporal locality of events occurring on individual hosts, they
can potentially generate high false positive rates due to the di?culty of separating
malicious events from normal activity changes. In this thesis, we exploit both the spatial and temporal locality of events in a network system for anomaly detection. We focus on identifying aggregated abnormal
events to detect rapidly propagating Internet worms, virus, zombies, or other mali-
cious attacks that compromise multiple hosts in a network system at a time (e.g.,
one or two days). Once these automated attacks are launched, most of the vulner-
able hosts get compromised due to the propagation of the attacks and the scanning
preferences of the automated attack tools. Correlating events across multiple hosts
can therefore expose malicious activities and reduce those false alarms generated by
normal activity pattern changes at individual hosts. Our prototype system, Seurat, represents events that cause host state transitions as ?le system updates. The correlation is performed by clustering points, each rep- 10 CHAPTER 1. INTRODUCTION resenting an individual host state transition due to one or more ?le system changes,
in a multi-dimensional feature space. Each feature indicates the change of a ?le
attribute, with all features together describing the host state transitions of an indi-
vidual machine during a given period (e.g., one day). Over time, the abstraction of
point patterns inherently re?ects the aggregated host activities. For normal host state
changes, the points should follow some regular pattern by roughly falling into several
clusters. Abnormal changes, which are hard to detect, or distinguished from single
host normal pattern changes by monitoring that host alone, will stand out when they
are correlated with other normal host state changes. Such correlation method there-
fore shares some ?avor of pointillism – a style of painting that applies small dots onto
a surface so that from a distance the dots blend together into meaningful patterns,
and we call it the pointillist approach. In this application, the number of ?le updates at each host daily could be on the order of thousands. Our feature reduction mechanisms can successfully reduce the
input data complexity by orders of magnitude. The extensive experiment evaluation
shows that Seurat can e?ectively detect the propagation of well known worms and
viruses with a low false alarm rate. 1.3.2 Spatiotemporal Event Correlation for Network Foren-
sics While end-system based approaches to defend and respond to attacks show promise
in the short-term, future attackers are bound to come up with mechanisms that
outwit existing signature-based detection and analysis techniques. We believe the
Internet architecture should be extended to include auditing mechanisms that enable
the forensic analysis of network data, with a goal of identifying the true originator
of each attack — even if the attacker recruits innocent hosts as zombies or stepping
stones to propagate the attack. In this thesis, we outline a framework for network forensic analysis called Dragnet, with the promise to dramatically change investigations of Internet-based attacks. The
key components of this framework are Attacker Identi?cation and Attack Reconstruc-
tion. They together will provide accountability for attacks in both wide area networks
and intranets, to deter future attackers. As a ?rst step toward realizing the Dragnet framework, this thesis focuses on the speci?c problem of identifying the origin of epidemic spreading attacks such as
Internet worms. Our goal is not only to identify the “patient zero” of the epidemic, but
also to reconstruct the sequence of events during the initial spread of the attack and
identify which communications were the causal ?ows by which one host infected the
next. The notion of events in this application is thus denoted by the communication 1.4. CONTRIBUTIONS AND THESIS OUTLINE 11 ?ows between end hosts, and the causal ?ows correspond to those events that transit
host states from normal to infected. Given such ?ow notion of events, our correlation algorithm exploits one invariant across all epidemic-style attacks (present and future): for the attack to progress there
must be communication among attacker and the associated set of compromised hosts.
The communication ?ows that cause new hosts to become infected form a causal tree,
where a causal ?ow from one computer (the “parent”) to its victim (the “child”) forms
a directed “edge” in this tree. The algorithm works by repeatedly sampling paths on
the host communication graph with random walks. Each walk randomly traverses the
edges of the graph backwards in time, and is called a random moonwalk. By correlating
these communication events that take place at multiple locations in a network, the
overall tree structure of an attack’s propagation, especially those initial levels, stands
out after repeated random moonwalks to trace back the worm origin. Thus the
random moonwalk algorithm can be agnostic to attack signatures or scanning rates,
and potentially be applicable to all worm attacks. In this application, spatiotemporal event correlation enables us to identify abnor- mal patterns or structures that cannot be identi?ed by existing approaches. Complex-
ity reduction is achieved through both e?cient graph representations of communica-
tion events and statistical sampling method. We show that the random moonwalk
algorithm is both e?ective and robust. It can detect initial causal ?ows with high
accuracy for both fast propagating worms and a wide variety of stealthy attacks. 1.4 Contributions and Thesis Outline The main contribution of this thesis is a general solution for more reliably identifying
a wide range of attacks, whose activities, when viewed individually, may seem normal
or indistinguishable from normal pattern changes, but nevertheless exhibit global ab-
normal event patterns or structures. This same high level concept of spatiotemporal
event correlation guides our problem formulation and algorithm design in both ex-
ample applications we study. Next, we overview the thesis organization and discuss
the speci?c contributions made in each application. In Chapter 2, we present Seurat, a prototype system for anomaly detection by correlating ?le system updates across both space and time. Our primary contributions
in Seurat is a novel pointillist approach for detecting aggregated ?le update events
shared across multiple hosts. We start with a binary feature vector space, describing
how we de?ne the vectors and reduce feature dimensions for clustering. We then
present a more generalized feature vector space to incorporate additional information
for detecting more stealthy attacks. We show that despite the large volume of ?le
updates daily, Seurat can e?ectively detect various attacks and identify only relevant 12 CHAPTER 1. INTRODUCTION ?les involved. In Chapter 3, we present the Dragnet framework for network forensics. We make contributions in problem formulations to perform large scaled postmortem forensic
analysis, and identify its two key components — Attacker Identi?cation and Attack
Reconstruction. We then discuss in this chapter the required infrastructure support
to analyze network tra?c across space and time. Our major contribution in Dragnet
is the random moonwalk sampling algorithm, which is the ?rst known technique to
identify origins of epidemic attacks. This algorithm is extensively evaluated with a
wide class of attack scenarios to demonstrate its e?ectiveness and robustness using
analysis, simulation experiments, and real trace study. We show that our algorithm
is closely related with spectral analysis, a well known technique for analyzing graph
structures. Finally, we present how our algorithm can be elegantly adapted to dis-
tributed scenarios. In Chapter 4, we survey related work in two parts. In the ?rst part, we discuss various other e?orts in leveraging distributed information and correlation mechanisms
for security. In the second part, we present related work for each speci?c application
we study. Finally, Chapter 5 summarizes the thesis work, discusses limitations, and outlines future research directions. Chapter 2 Anomaly Detection Using
Spatiotemporal Event Correlation 2.1 Introduction Anomaly detection, together with misuse detection, are two major categories of in-
trusion detection methods, aiming at detecting any set of actions that attempt to
compromise the integrity, con?dentiality, or availability of information resources [40]. Anomaly detection approaches assume known knowledge of expected normal host or system states. Activities or changes that are di?erent from the pre-de?ned normal
model are anomalous and can potentially be associated with intrusive attempts. Its
objective is thus to determine whether observed events are “normal” or “abnormal”.
Such methods can detect new types of intrusions that have not been observed before.
Compared with this approach, misuse detection approaches capture the distinguishing
features of malicious attacks with a signature (e.g., a block of attack code). Intrusions
are detected via signature match. Such approach is widely used for detecting known
attacks, but cannot catch unknown attacks without a speci?c signature beforehand,
where the e?cacy of anomaly detection comes in. In this thesis, we explore the spatiotemporal event correlation approach for anomaly detection. The idea is to correlate host state transitions de?ned as ?le system up-
dates across both space (multiple hosts) and time (past and present), detecting similar
coincident changes to the patterns of host state updates that are shared across mul-
tiple hosts. Example causes of such coincident events include administrative updates
that modify ?les that have not been modi?ed before, and malware propagations that
cause certain log ?les, which are modi?ed daily, to cease being updated. In both
cases, host ?le system updates exhibit both the temporal and spatial locality that
can be exploited by the spatiotemporal event correlation approach. 13 14CHAPTER 2. ANOMALY DETECTION USING SPATIOTEMPORAL EVENT CORRELATION By exploring both the temporal and spatial locality of host state changes in a net- work system, spatiotemporal event correlation identi?es anomalies without foreknowl-
edge of normal changes and without system-speci?c knowledge. Existing approaches
focus on the temporal locality of host state transitions, overlooking the spatial lo-
cality among di?erent hosts in a network system. They either de?ne a model of
normal host state change patterns through learning, or specify detailed rules about
normal changes. Learning based approaches train the system to learn characteris-
tics of normal changes. Since they focus only on the temporal locality of single-host
state transitions, any signi?cant deviation from the normal model is suspicious and
should raise an alarm, possibly resulting in a high false positive rate [26]. Rule-based
approaches such as Tripwire [46] require accurate, speci?c knowledge of system con-
?gurations and daily user activity patterns on a speci?c host. Violation of rules then
suggests malicious intrusions. Although rule-based intrusion detection raises fewer
false alarms, it requires system administrators to manually specify a set of rules for
each host. The correlation capability cross both space and time allows us to learn the
patterns of normal state changes over time, and to detect those anomalous events cor-
related among multiple hosts due to malicious intrusions. This obviates the need for
speci?c rules while eliminating the false alarms caused by single host activity pattern
shifts. The correlation is performed by clustering points, each representing an individual host state transition, in a multi-dimensional feature space. Each feature indicates
the change of a ?le attribute, with all features together describing the host state
transitions of an individual machine during a given period (e.g., one day). Over time,
the abstraction of point patterns inherently re?ects the aggregated host activities.
For normal host state changes, the points should follow some regular pattern by
roughly falling into several clusters. Abnormal changes, which are hard to detect by
monitoring that host alone, will stand out when they are correlated with other normal
host state changes. Hence our approach shares some ?avor of pointillism – a style of
painting that applies small dots onto a surface so that from a distance the dots blend
together into meaningful patterns. Figure 2.1 illustrates the pointillist approach to anomaly detection. There are ?ve hosts in the network system. We represent state changes on each host daily as a point
in a 2-dimensional space in this example. On normal days, the points roughly fall into
the dash-circled region. The appearance of a new cluster consisting of three points
(indicated by the solid circle) suggests the incidence of an anomaly on host A, B, and
D, which may all have been compromised by the same attack. Furthermore, if we
know that certain hosts (e.g., host A) are already compromised (possibly detected by
other means such as a network based IDS), then we can correlate the state changes
of the compromised hosts with the state changes of all other hosts in the network
system to detect more infected hosts (e.g., host B and D). 2.1. INTRODUCTION 15 0 10 20 30 40 50 60 0 5 10 15 Feature 1 F
e
a
t
u
r
e

2
host-A
host-B
host-C
host-D
host-E
Abnormal cluster Normal cluster A C B E D X X X Figure 2.1: Pointillist approach to anomaly detection. Normal points are clustered by
the dashed circle. The appearance of a new cluster consisting of three points suggests
anomalous events on host A, B, and D. We have implemented a prototype system, called Seurat 1 , that uses ?le system updates to represent host state changes for anomaly detection. Seurat successfully
detects the propagation of a manually launched Linux worm and a list of well known
Windows worms and viruses on a number of hosts in an isolated cluster. Seurat has a
low false alarm rate when evaluated by a real deployment in both Linux and Windows
systems. These alarms are caused by either administrative updates or network wide
experiments. The false negative rate and detection latency, evaluated with both
simulated attacks and real attacks, are low for fast propagating attacks. For slowly
propagating attacks, there is a tradeo? between false negative rate and detection
latency. For each alarm, Seurat identi?es the list of hosts involved and the related
?les, which we expect will be extremely helpful for system administrators to examine
the root cause and dismiss false alarms. The rest of this chapter is organized as follows: Section 2.2 describes the Seurat threat model. Section 2.3 introduces the algorithm for correlating host ?le system
updates across both space and time. Section 2.4 evaluates the pointillist approach.
Section 2.5 describes an extension of Seurat to detect stealthy attacks by exploiting
more features of ?le updates. Section 2.6 discusses the limitations of our system and
Section 2.7 suggests possible improvements for future work. 1 Seurat is the 19th century founder of pointillism. 16CHAPTER 2. ANOMALY DETECTION USING SPATIOTEMPORAL EVENT CORRELATION 2.2 Attack Model In this thesis, we focus on anomaly detection in a single network system. A network
system is a collection of host computers connected by a network in a single admin-
istrative domain. We note that our technical approach of correlation is not limited
by the administrative domain boundaries. However, issues of trust and privacy may
raise concerns regarding distributed data collection and querying, which we will not
address completely in this thesis. For this reason, we limit our discussion to hosts
inside a single administrative domain and adopt a centralized architecture for event
storage and analysis. The goal of Seurat is to automatically identify anomalous events by correlating the state change events of all hosts in a network system. Hence Seurat de?nes an
anomalous event as an unexpected state change close in time across multiple hosts in
a network system. We focus on rapidly propagating Internet worms, viruses, or other malicious at- tacks that compromise multiple hosts in a network system at a time (e.g., one or
two days). We have observed that, once fast, automated attacks are launched, most
of the vulnerable hosts get compromised due to the rapid propagation of the attack
and the scanning preferences of the automated attack tools. According to CERT’s
analysis [20], the level of automation in attack tools continues to increase, making it
faster to search vulnerable hosts and propagate attacks. Recently, the Slammer worm
hit 90 percent of vulnerable systems in the Internet within 10 minutes [63]. Worse,
the lack of diversity in systems and software run by Internet-attached hosts enables
massive and fast attacks. Computer clusters tend to be con?gured with the same
operating systems and software. In such systems, host state changes due to attacks
have strong temporal and spatial locality that can be exploited by Seurat. Although Seurat will more e?ectively detect system changes due to fast propagat- ing attacks, it can be generalized to detect slowly propagating attacks as well. This
can be done by varying the time resolution of reporting and correlating the collective
host state changes. We will discuss this issue further in Section 2.6. However, Seu-
rat’s global correlation cannot detect abnormal state changes that are unique to only
a single host in the network system. Seurat represents events that cause host state changes using ?le system updates. [77] found that 83% of the intrusion tools and network worms they surveyed modify
one or more system ?les. These modi?cations would be noticed by monitoring ?le
system updates. There are many security tools such as Tripwire [46] and AIDE [57]
that rely on monitoring abnormal ?le system updates for intrusion detection. We use the ?le name, including its complete path, to identify a ?le in the network system. We regard di?erent instances of a ?le that correspond to a common path 2.3. CORRELATION-BASED ANOMALY DETECTION 17 name as a same ?le across di?erent hosts, since we are mostly interested in system
?les which tend to have canonical path names exploited by malicious attacks. We
treat ?les with di?erent path names on di?erent hosts as di?erent ?les, even when
they are identical in content. For the detection of anomalies caused by attacks, we have found that this repre- sentation of host state changes is e?ective and useful. However, we may need di?erent
approaches for other applications of Seurat such as ?le sharing detection, or for the
detection of more sophisticated future attacks that alter ?les at arbitrary locations
as they propagate. For example, we can investigate the use of ?le content digests
instead of ?le names as future work. 2.3 Correlation-based Anomaly Detection We de?ne a d-dimensional feature vector H ij = v 1 , v 2 , . . . , v d to represent the ?le system update attributes for host i during time period j. Each H ij can be plotted as a point in a d-dimensional feature space. Our pointillist approach is based on
correlating the feature vectors by clustering. Over time, for normal ?le updates, the
points follow some regular pattern (e.g., roughly fall into several clusters). From time
to time, Seurat compares the newly generated points against points from previous
time periods. The appearance of a new cluster, consisting only of newly generated
points, indicates abnormal ?le updates and Seurat raises an alarm. In the rest of this section, we ?rst present how we de?ne the feature vector space and the distances among points. We then describe the methods Seurat uses to reduce
feature vector dimensions for clustering to work most e?ectively. Finally, we discuss
how Seurat detects abnormal ?le updates by clustering. 2.3.1 A Binary Feature Vector Space Many attacks install new ?les on a compromised host, or modify ?les that are infre-
quently updated. Various information such as host ?le updates, ?le update times,
and ?le size changes can be used as indicators of the existence of those attacks. To
simplify exposition, we describe binary feature vectors for representing host ?le up-
dates ?rst. We will then explore a more general feature vector space that utilizes
other type of information Seurat collects in Section 2.5. Each dimension in the binary feature vector space corresponds to a unique ?le (indexed by the full-path ?le name). As such, the dimension d of the space is the
number of ?le names present on any machine in the network system. We de?ne
the detection window to be the period that we are interested in ?nding anomalies. 18CHAPTER 2. ANOMALY DETECTION USING SPATIOTEMPORAL EVENT CORRELATION 0 1, 0, 1, 1, 0 0, 1, 1, 1, 1 1, 0, 1, 1, F 5 F 4 F 3 F 2 F 1 V 3 = H 12 = < V 2 = H 21 = < V 1 = H 11 = < > > > Figure 2.2: Representing host ?le updates as feature vectors. F 1 , F 2 , F 3 , F 4 , F 5 are ?ve di?erent ?les (i.e., ?le names). Accordingly, the feature vector space has 5 dimensions
in the example. day j day j-1 day j-2 … … day j-t+1 day j-t Comparison Window Correlation Window Detection Window Figure 2.3: Detection window, comparison window, and correlation window. The
detection window is day j. The comparison window is from day j ? t to day j ? 1.
The correlation window is from day j ? t to day j. In the current prototype, the detection window is one day. For each vector H ij = v 1 , v 2 , . . . , v d , we set v k to 1 if host i has updated (added, modi?ed, or removed) the k-th ?le on day j, otherwise, we set v k to 0. The vectors generated in the detection window will be correlated with vectors generated on multiple previous days. We treat the generated feature vectors as a set
of independent points. The set can include vectors generated by the same host on
multiple days, and vectors generated by multiple hosts on the same day. In the rest
of the chapter, we use V = v 1 , v 2 , . . . , v d to denote a feature vector for convenience. Figure 2.2 shows how we represent the host ?le updates using feature vectors. The correlation is based on the distances among vectors. Seurat uses a cosine distance metric, which is a common similarity measure between binary vectors [10, 45].
We de?ne the distance D(V 1 , V 2 ) between two vectors V 1 and V 2 as their angle ? computed by the cosine value: D(V 1 , V 2 ) = ? = cos ?1 V 1 · V 2 |V 1 ||V 2 | For each day j (the detection window), Seurat correlates the newly generated vectors with vectors generated in a number of previous days j ? 1, j ? 2, . . .. We 2.3. CORRELATION-BASED ANOMALY DETECTION 19 de?ne the abnormal ?le update events on day j as the ?le update patterns that have
not occurred on previous days. We de?ne the comparison window of day j as the
days that we look back for comparison, and the correlation window of day j as the
inclusive period of day j and its comparison window. Vectors generated outside the
correlation window of day j are not used to identify abnormal ?le updates on day
j. Figure 2.3 illustrates the concepts of detection window, comparison window, and
correlation window. Since each vector generated during the comparison window serves as an example of normal ?le updates to compare against in the clustering process, we explore the tem-
poral locality of normal update events by choosing an appropriate comparison window
for each day. The comparison window size is a con?gurable parameter of Seurat. It
re?ects how far we look back into history to implicitly de?ne the model of normal
?le updates. For example, some ?les such as /var/spool/anacron/cron.weekly on
Linux platforms are updated weekly. In order to regard such weekly updates as nor-
mal updates, administrators have to choose a comparison window size larger than
a week. Similarly, the size of the detection window re?ects the degree of temporal
locality of abnormal update events. Since Seurat correlates ?le updates across multiple hosts, we are interested in only those ?les that have been updated by at least two di?erent hosts. Files that
have been updated by only one single host in the network system throughout the
correlation window are more likely to be user ?les. As such, we do not select them as
relevant dimensions to de?ne the feature vector space. 2.3.2 Feature Selection Most ?le updates are irrelevant to anomalous events even after we ?lter out the
?le updates reported by only a single host. Those ?les become noise dimensions
when we correlate the vectors (points) to identify abnormal updates, and increase the
complexity of the correlation process. We need more selective ways to choose relevant
?les and reduce feature vector dimensions. We have implemented two methods for this
purpose: (1) wavelet-based selection, and (2) principal component analysis (PCA). Wavelet-based Selection The wavelet-based selection method regards each individual ?le update status as a
discrete time series signal S. Given a ?le i, the value of the signal on day n, denoted
by S i (n), is de?ned as the total number of hosts that update ?le i on day n in the network system. Each such signal S i can be decomposed into a low frequency signal cA i re?ecting the long term update trend, and a high frequency signal cD i 20CHAPTER 2. ANOMALY DETECTION USING SPATIOTEMPORAL EVENT CORRELATION Number of host day Daily variations Long term update trend Figure 2.4: Representing ?le update status with wavelet transformation. The original
signal is S, which can be decomposed into a low frequency signal cA re?ecting the
long term update trend, and a high frequency signal cD re?ecting the daily variations
from the long-term trend. re?ecting the day-to-day variation from the long term trend (see Figure 2.4). If the
high frequency signal cD i shows a spike or a dip on a certain day, we know that a signi?cantly larger or smaller number of hosts updated ?le i than on a normal day,
respectively. We then select ?le i as a relevant feature dimension in de?ning the
feature vector space. Seurat detects signal spikes and dips using the residual signal of the long-term trend. The similar technique has been used to detect disease outbreaks [119] and
network tra?c anomalies [7]. To detect anomalies on day j, the algorithm takes as
input the list of ?les that have been updated by at least two di?erent hosts in the
correlation window of day j. Then, from these ?les the algorithm selects a subset
that will be used to de?ne the feature vector space. Figure 2.5 shows the steps to select features by wavelet-based method. Given a ?xed correlation window of day j, the algorithm starts with constructing a time
series signal S i for each ?le i, and decomposes S i into cA i and cD i using a single- level Daubechies wavelet transformation as described. Then we compute the residual
signal value R i (j) of day j by subtracting the trend value cA i (j ? 1) of day j ? 1 from the original signal value S i (j) of day j. If |R i (j)| exceeds a preset threshold ?, then the actual number of hosts who have updated ?le i on day j is signi?cantly larger
or smaller than the prediction cA i (j ? 1) based on the long term trend. Therefore, Seurat selects ?le i as an interesting feature dimension for anomaly detection on day
j. As an example, Figure 2.6 shows the original signal and the residual signal of a ?le
using a 32-day correlation window in a 22-host teaching cluster. Note the threshold 2.3. CORRELATION-BASED ANOMALY DETECTION 21 For each ?le i: 1. Construct a time series signal: S i = cA i + cD i 2. Compute the residual signal value of day j: R i (j) = S i (j) ? cA i (j ? 1) 3. if |R i (j)| > ?, then select ?le i as a feature dimension. Figure 2.5: Wavelet-based feature selection. 0 10 20 30 40 50 60 0 2 4 6 8 10 12 14 16 18 20 Days Number of hosts 0 10 20 30 40 50 60 ?20 ?15 ?10 ?5 0 5 10 15 Days R
e
si
d
u
a
l

S
i
g
n
a
l
,
R i (a) (b) Figure 2.6: Wavelet transformation of ?le update status. (a) The original signal of
the ?le update status (b) The residual signal after wavelet transformation value ? of each ?le is a parameter selected based on the statistical distribution of
historical residual values. PCA-based Dimension Reduction PCA is a statistical method to reduce data dimensionality without much loss of
information [43]. Given a set of d-dimensional data points, PCA ?nds a set of d -
dimensional vectors, called principal components, that account for the variance of the
input data as much as possible. Dimensionality reduction is achieved by projecting
the original d-dimensional data onto the subspace spanned by these d orthogonal
vectors. Most of the intrinsic information of the d-dimensional data is preserved in
the subspace. 22CHAPTER 2. ANOMALY DETECTION USING SPATIOTEMPORAL EVENT CORRELATION We note that the updates of di?erent ?les are usually correlated. For example, when a software package is updated on a host, many of the related ?les will be modi?ed
together. Thus we can perform PCA to identify the correlation of ?le updates. Given a d-dimensional feature space Z d 2 , and a list of m feature vectors V 1 , V 2 , . . ., V m ? Z d 2 , we perform the following steps using PCA to obtain a new list of feature vectors V 1 , V 2 , . . . , V m ? Z d 2 (d < d) with reduced number of dimensions: 1. Standardize each feature vector V k = v 1k , v 2k , . . . , v dk (1 ? k ? m) by sub- tracting each of its elements v ik by the mean value of the corresponding di- mension u i (1 ? i ? d). We use V k = v 1k , v 2k , . . . , v dk ? Z d 2 to denote the standardized vector for the original feature vector V k . Then, v ik = v ik ? u i (where u i = m
j =1 v ij m , 1 ? i ? d) 2. Use the standardized feature vectors V 1 , V 2 , . . . , V m as input data to PCA in order to identify a set of principal components that are orthogonal vectors
de?ning a set of transformed dimensions of the original feature space Z d 2 . Se- lect the ?rst d principal components that account for most of the input data
variances (e.g., 90% of data variances) to de?ne a subspace Z d 2 . 3. Project each standardized feature vector V k ? Z d 2 onto the PCA selected sub- space Z d 2 to obtain the corresponding reduced dimension vector V k ? Z d 2 . Note that PCA is complementary to wavelet-based selection. Once we ?x the correlation window of a particular day, we ?rst pick a set of ?les to de?ne the feature
vector space by wavelet-based selection. We then perform PCA to reduce the data
dimensionality further. 2.3.3 Anomaly Detection by Clustering Once we obtain a list of transformed feature vectors using feature selection, we cluster
the vectors based on the distance between every pair of them. We call the cluster a new cluster if it consists of multiple vectors only from the detection window. The appearance of a new cluster indicates a change in ?le update
patterns occurred during the detection window and should raise an alarm. There are many existing algorithms for clustering, for example, K-means [33, 34] or Single Linkage Hierarchical Clustering [45]. Seurat uses a simple iterative algorithm,
which is a common method for K-means initialization, to cluster vectors without prior
knowledge of the number of clusters [62]. The algorithm assumes each cluster has a 2.4. EXPERIMENTS 23 hub. A vector belongs to the cluster whose hub is closest to that vector compared
with the distances from other hubs to that vector. The algorithm starts with one
cluster whose hub is randomly chosen. Then, it iteratively selects a vector that has
the largest distance to its own hub as a new hub, and re-clusters all the vectors
based on their distances to all the selected hubs. This process continues until there
is no vector whose distance to its hub is larger than the half of the average hub-hub
distance. We choose this simple iterative algorithm because it runs much faster, and works equally well as the Single Linkage Hierarchical algorithm in our experiments. The
reason that even the simple clustering algorithm works well is that the ratio of inter-
cluster distance to intra-cluster distance signi?cantly increases after feature selection. Once we detect a new cluster and generate an alarm, we can further identify the involved hosts and the ?les from which the cluster resulted. The suspicious hosts are
just the ones whose ?le updates correspond to the feature vectors in the new cluster.
To determine which ?les possibly cause the alarm, we only focus on the ?les picked
by the wavelet-based selection to de?ne the feature vector space. For each of those
?les, if it is updated by all the hosts in the new cluster during the detection window,
but has not been updated by any host during the corresponding comparison window,
Seurat outputs this ?le as a candidate ?le. Similarly, Seurat also reports the set of
?les that have been updated during the comparison window, but are not updated by
any host in the new cluster during the detection window. Based on the suspicious hosts and the selected ?les for explaining root causes, system administrators can decide whether the updates are known administrative up-
dates that should be suppressed, or some abnormal events that should be further
investigated. If the updates are caused by malicious attacks, administrators can take
remedial counter measures for the new cluster. Furthermore, additional compromised
hosts can be identi?ed by checking if the new cluster expands later and if other hosts
have updated the same set of candidate ?les. Note that the alarms due to adminis-
trative updates can be suppressed if the administrator provides Seurat with the list
of involved ?les beforehand. 2.4 Experiments We have developed a multi-platform (Linux and Windows) prototype of Seurat that
consists of a lightweight data collection tool and a correlation module. The data
collection tool scans the ?le system of the host where it is running and generates a
daily summary of ?le update attributes. Seurat harvests the summary reports from
multiple hosts in a network system and the correlation module uses the reports for
anomaly detection. 24CHAPTER 2. ANOMALY DETECTION USING SPATIOTEMPORAL EVENT CORRELATION We have installed the Seurat data collection tool on a number of campus o?ce machines and a teaching cluster that are used by students daily. By default, the tool
scans the attributes of all system ?les on a host. The attributes of a ?le include the ?le
name, type, device number, permissions, size, inode number, important timestamps,
and a 16-byte MD5 checksum of ?le content. Each day, each host compares the
newly scanned disk snapshot against that from the previous day and generates a ?le
update summary report. In the current prototype, all the reports are uploaded daily
to a centralized server where system administrators can monitor and correlate the ?le
updates using the correlation module. In this section, we study the e?ectiveness of Seurat’s pointillist approach for de- tecting aggregated anomalous events. Using the daily ?le update reports from our
real deployment, we study the false positive rate and the corresponding causes in
Section 2.4.1 and Section 2.4.2. We evaluate the false negative rate with simulated
attacks in Section 2.4.3. In order to verify the e?ectiveness of our approach on real
malicious attacks, we launched real worms and viruses into isolated computer clusters.
We report the results in Section 2.4.4 and Section 2.4.5. 2.4.1 False Positives on Linux Platforms The best way to study the e?ectiveness of our approach is to test it with real data. We
have deployed the Seurat data collection tool on both Linux and Windows hosts, and
correlate the reports for daily anomaly detection. The results from both deployments
are similar. We ?rst illustrate the details of the experiment and the results with the
data from our Linux host deployment, where we have longer period of data collection
than for our Windows deployment. Then, we describe the results of the Windows
host deployment brie?y in Section 2.4.2. We deployed Seurat on a teaching cluster of 22 Linux hosts and have been collect- ing the daily ?le update reports since Nov 2003. The teaching cluster is mostly used
by students for their programming assignments. The cluster is also occasionally used
by a few graduate students for running network experiments. For privacy protection,
personal ?les under user home directories are not scanned for Linux platforms. In this experiment, we use the ?le update reports from Dec 1, 2003 until Feb 29, 2004 to evaluate the false positive rate. During this period, there are a few days
when a couple of hosts failed to generate or upload reports due to system failure or
recon?gurations. For those small number of missing reports, we simply ignore them
because they do not a?ect the aggregated ?le update patterns. We set the correlation window to 32 days in order to accommodate monthly ?le update patterns. That is, we correlate the update pattern from day 1 to day 32 to
identify abnormal events on day 32, and correlate the update pattern from day 2 to 2.4. EXPERIMENTS 25 day 33 to detect anomalies on day 33, etc. Thus, our detection starts from Jan 1,
2004, since we do not have 32-day correlation windows for the days in Dec 2003. Dimension Reduction Once we ?xed the correlation window of a particular day, we identify relevant ?les
using wavelet-based selection with a constant threshold ? = 2 to de?ne the feature
vector space for simplicity. We then perform PCA to reduce the data dimensionality
further by picking the ?rst several principal components that account for 98% of the
input data variance. Throughout the entire period of 91 days, 772 ?les with unique ?le names were updated by at least two di?erent hosts. Figure 2.7 (a) shows the number of hosts
that updated each ?le during the data collection period. We observe that only a
small number ?les (e.g.,/var/adm/syslog/mail.log) are updated regularly by all of
the hosts, while most other ?les (e.g., /var/run/named.pid) are updated irregularly,
depending on the system usage or the applications running. Figure 2.7 (b) shows the results of feature selection. There were, on average, over 2000 ?les updated by hosts in the cluster during each correlation window (i.e., one
day). Among those ?les, there were on average 140 ?les updated by at least two
di?erent hosts daily. After wavelet-based selection, the average number of feature
dimensions is 17. PCA further reduces the vector space dimension to an average of
2. Overall, we achieved about 3 orders of magnitude dimensionality reduction. Days since Dec 1, 2003 File ID (sorted) 10 20 30 40 50 60 70 80 90 100 200 300 400 500 600 700 10 20 30 40 50 60 10 0 10 1 10 2 10 3 Days since Jan 1, 2004 Number of Dimensions Total files updated
Files updated by multiple hosts
Dimensions after wavelet
Dimensions after PCA (a) (b) Figure 2.7: Feature selection and dimension reduction. (a) File update patterns. Files
are sorted by the cumulative number of hosts that have updated them throughout
the 91 days. The darker the color is, the more hosts updated the corresponding ?le.
(b) The number of feature vector dimensions after wavelet-based selection and PCA
consecutively. 26CHAPTER 2. ANOMALY DETECTION USING SPATIOTEMPORAL EVENT CORRELATION False Alarms 173 26 322 36 20 1 84 17 15 4 10 01?19?04 66 16 13 65 95 27 89 49 49 27 57 53 81 11 14 01?20?04 261 90 9 165 58 80 35 7 01?21?04 New Cluster! 436 21 241 3 01?22?04 165 343 6 2 159 22 6 01?23?04 New Cluster! 151 162 22 72 196 94 6 01?24?04 Figure 2.8: Clustering feature vectors for anomaly detection at Linux platforms. Each
circle represents a cluster. The number at the center of the ?gure shows the total
number of clusters. The radius of a circle corresponds to the number of points in the
cluster, which is also indicated beside the circle. The squared dots correspond to the
new points generated on the day under detection. New clusters are identi?ed by a
thicker circle. After dimension reduction, we perform clustering of feature vectors and identify new clusters for each day. Figure 2.8 illustrates the clustering results of 6 consecutive
days from Jan 19, 2004 to Jan 24, 2004. There are two new clusters identi?ed on Jan
21 and Jan 23, which involve 9 hosts and 6 hosts, respectively. Since Seurat outputs
a list of suspicious ?les as the cause of each alarm, system administrators can tell if
the new clusters are caused by malicious intrusions. Based on the list of ?



Download A Spatiotemporal Event Correlation Approach to Computer Security.pdf
Comments
Your talk will be first one...
Your Name:
Your Email:
Your Talk:
Google Search
Google