David Whyte          Evangelos Kranakis          P.C. van Oorschot

School of Computer Science
Carleton University
Ottawa, Ontario, Canada
{dlwhyte, kranakis, paulv}@scs.carleton.ca

Abstract

Worms are arguably the most serious security threat facing the Internet. Seeking a detection technique that is both sufficiently efficient and accurate to enable automatic containment of worm propagation at the network egress points, we propose a new technique for the rapid detection of worm propagation from an enterprise network. It relies on the correlation of Domain Name System (DNS) queries with outgoing connections from an enterprise network. Improvements over existing scanning worm detection techniques include: (1) the possibility to detect worm propagation after only a single infection attempt; (2) the capacity to detect zero-day worms; and (3) a low false positive rate. The precision of this first-mile detection technique supports the use of automated containment and suppression strategies to stop fast scanning worms before they leave the network boundary. We believe that this technique can be applied with the same precision to identify other forms of malicious behavior within an enterprise network, including: mass-mailing worms, network reconnaissance activity, and covert communications. Currently, it is unclear if our DNS-based detector will work for all network protocols. In some network environments, the DNS detection technique may need to be used as a secondary input to a more sophisticated anomaly detector.
1 Introduction

Recently, a multitude of high-profile worm epidemics has affected millions of networked computing devices. The Slammer worm that emerged in January 2003 exposed how quickly worm propagation could occur. It infected systems by exploiting a buffer overflow vulnerability in Microsoft SQL Server. Slammer's infected population doubled in size every 8.5 seconds [14], with 90% of vulnerable hosts infected in just 10 minutes. This worm achieved its full scanning rate (i.e. over 55 million scans per second) only 3 minutes after it was released. In August 2003, the SoBig worm caused an estimated $5 billion in damage and at the height of its infection was responsible for approximately 73% of all Internet email traffic [6]. Unfortunately, worm outbreaks of this scale are becoming commonplace. In March 2004, the Witty worm began to spread by exploiting a buffer overflow in Internet Security Systems (ISS) products that include firewalls and intrusion detection systems. Although the vulnerable population of Internet systems was an order of magnitude smaller than for previous worms, it spread very rapidly [19]. To achieve the rate of propagation observed, it is believed that this worm used a preprogrammed hitlist or a timed release of the worm on previously compromised systems. Witty was the first widely propagated worm to contain a malicious payload and signifies a disturbing new trend for worm writers, combining skill and malice [23].

Staniford et al. [21] hypothesized that a properly constructed worm could infect vulnerable systems on the Internet at an even greater speed. Worms are evolving, and they can employ a number of anti-detection techniques such as anti-forensics, dynamic behavior, and modularity of attack tools [16]. Furthermore, worms spread so quickly that traditional intrusion detection methods (i.e. generation and deployment of attack signatures) are not feasible [15]. In order to make automatic containment of fast scanning worms feasible, a rapid and accurate detection method is required.

Currently, most countermeasures used to mitigate these attacks include some form of human intervention. Routers can be configured to block network traffic, and vulnerable software can be patched. However, worms that propagate and infect the Internet in just minutes make these human-in-the-loop countermeasures impractical. The development of wide-scale automated countermeasures is required. Current worm propagation detection methods are limited in: (1) their speed of detection; (2) their inability to accurately detect zero-day worms; (3) their inability to detect slow scanning worms; and (4) their high false positive rate.

[This paper appears in the Proceedings of the 12th Annual Network and Distributed System Security Symposium, San Diego, USA, February 3-4, 2005. © ISOC.]

[GUEST TEK EXHIBIT 1005, Guest Tek v. Nomadix, IPR2019-01191]
Typically, scanning worms use a pseudo-random number generator (PRNG) to generate 32-bit random numbers that correspond to IPv4 addresses. The attacking system uses this numeric address as the target for its infection attempt. The use of a numeric IP address by the worm, instead of the qualified domain name of the system, obviates the need for a DNS query. In contrast, the vast majority of legitimate publicly available services are accessed through the use of the DNS protocol, which provides the mapping between alphanumeric names and the corresponding numeric IP addresses. The translation of a host name to a registered IP address is called resolving. While there exist valid exceptions (e.g. client-to-client applications, remote administration tools, etc.), typical user behavior should include some form of DNS activity before a new connection is initiated.
Our Contributions. We use DNS anomalies to detect scanning worm propagation, relying on the observation of DNS responses. If we do not observe DNS activity before a new connection is initiated, we consider the connection anomalous. This premise is based on our observation that whereas users tend to remember alphanumeric strings and use the network services provided (i.e. DNS), almost all scanning worms directly use numeric IP addresses. Behavioral signatures [9] are used to describe common aspects of worm behavior particular to a given worm that span infected systems in a temporal order. Our DNS-based detection technique can be used as a behavioral signature to detect scanning worms.
Those legitimate applications and services that generally do not rely on DNS are addressed through the use of whitelists (see Section 3.2). In an enterprise network with an open security policy (i.e. few or no user and service restrictions), the number of such applications and services may be so large as to make our detection technique prone to significant numbers of false positives and negatives (see Section 5.2). Even in this scenario, DNS-based detection may be a useful input to a more sophisticated anomaly detector. However, we believe the use of our DNS anomaly-based detection approach in an enterprise network that implements a conservative or restrictive security policy (i.e. more common in large financial organizations or government) is appealing for a number of reasons, including:
1. Speed: the possibility to detect an infected system after only a single infection attempt to the Internet.

2. Detection of zero-day worms: possible because our approach does not rely on matching existing worm signatures to identify suspicious traffic.

3. Scanning rate independence: our approach can detect both fast and slow (i.e. stealth) scanning worms.

4. Reduced training period: our approach includes the concept of a whitelist that can be quickly generated to reduce false positives.

5. Low false positive rate: our approach does not rely on modeling normal network and user behavior profiles that are prone to false positives.

6. Ease of implementation: our approach is network-based, runs on commodity hardware, and relies on the observation of a protocol found in every network (i.e. DNS).
We believe this new technique can both rapidly and accurately detect worm propagation within enterprise networks. The precision of this first-mile detection enables the use of automated containment and suppression strategies to stop scanning worms before they leave the network boundary.

Our detection technique can be used to detect scanning worm propagation both within an enterprise network and from the enterprise network to the Internet (i.e. local to remote). It does not detect worm propagation from the Internet to the enterprise network. It differs from existing scanning worm detection techniques in that it does not rely on having to observe and correlate multiple events to determine that a scan is occurring. There is no concept of a threshold; we only maintain in state a list of IP addresses of valid connection destinations and each individual connection attempt from the enterprise network as it occurs. Our approach enables the detection of an infected system after a single scan has been initiated, regardless of the time between scans, and thus compares very favorably to previous work (e.g. Weaver et al. [25]). Weaver et al. propose an algorithm based on the Threshold Random Walk (TRW) scan detector [11] that detects a scanning host within an enterprise environment after only 10 scans, and can detect scans as slow as 1 scan per minute.

The sequel is structured as follows. Section 2 presents the description of the DNS-based scanning worm propagation detection technique. Section 3 discusses our experimental platform. Section 4 discusses the analysis of our prototype. Section 5 presents detection circumvention and limitations. Section 6 discusses ideas for extended applications of our detection technique. Section 7 discusses related work. We conclude in Section 8 with a brief summary. Appendix A contains background information.
2 Basic Methodology and Approach
For an overview of worm propagation strategies and DNS, please refer to Appendix A. In this section we give a high-level overview of our DNS-based anomaly scanning worm detection approach. In larger enterprise networks, it is not unusual for network segments to be either logically or physically separated. In fact, an enterprise network may be comprised of several distinct subnets for a variety of reasons, including security, ease of administration, and geographical location. We can leverage this natural separation of networks to contain worm propagation within distinct network segments. As in Silicon Defense's CounterMalice solution [7], we purposely divide the enterprise network into segments called cells. Each cell contains a worm containment device to confine and contain worm infection. Our definition of a cell refers to all systems within the same subnet serviced by a distinct authoritative DNS server. Figure 1 illustrates how an enterprise network can be divided into cells.
The propagation of fast-scanning worms can be characterized as: local to local (L2L), local to remote (L2R), or remote to local (R2L). In L2L propagation, a scanning worm targets systems within the boundaries of the enterprise network in which it resides. Topological scanning worms employ this strategy. L2R propagation refers to a scanning worm within an enterprise network targeting systems outside of its network boundary. Finally, R2L propagation refers to worm scanning from the Internet into an enterprise network. In this paper, our worm propagation detection method detects L2R worm propagation and worm propagation between local cells, but not R2L propagation or worm propagation that occurs within an individual cell.
Systems that reside within the same cell typically do not use DNS to communicate. The Address Resolution Protocol (ARP) [17] is used when a system tries to communicate with another system in the same cell. ARP is used by the data link layer to provide a mapping between the physical hardware address of a system and its assigned IP address. L2L worm propagation can occur within a particular cell or span multiple cells depending on the scanning strategy of the worm.

[Figure 1. Network Cells: diagram (not reproduced) of an enterprise network divided into three cells, each served by its own DNS server, connected through routers, a switch, and a firewall.]

As noted above, in the present paper we handle L2L worm propagation only in the case that the propagation occurs between cells. In a related paper [26], we detail how we have adapted the DNS-based technique to an ARP-based implementation which detects L2L worm propagation within local cells. Figure 2 provides an example of how our prototype could be operationally deployed. Prototype A in Cell 1 monitors activity between Cell 1 and Cell 2. Cell 2 contains the sole ingress/egress point for the enterprise network. Prototype B, from its vantage point in Cell 2, monitors activity from all cells within the enterprise network to external systems. Finally, prototype C monitors activity between Cell 3 and Cell 2. Suppose a system in Cell 1 is infected with a scanning worm. The infected system begins scanning to locate susceptible systems both within Cell 2 and the Internet. The prototype device in Cell 1 will detect the scanning activity to Cell 2 and generate an alert. The prototype device in Cell 2, at the enterprise gateway, will detect scanning activity from Cell 1 to the Internet and generate an alert.
[Figure 2. DNS Anomaly-based Detection Deployment: diagram (not reproduced) showing prototypes A, B, and C deployed in Cells 1, 2, and 3 of an enterprise network, with Cell 2 hosting the firewall that connects to a remote server on the Internet.]
DNS Anomaly Detection Approach. In random scanning, the use of a numeric IP address by the worm, instead of the qualified domain name of the system, obviates the need for a DNS query. New connections from the network that cannot be associated with any DNS activity are considered anomalous. If we can observe and correlate all locally generated DNS activity and new connection attempts within an enterprise network, we have the means to detect L2L inter-cell or L2R worm propagation. The technique does not detect R2L or intra-cell (i.e. within the boundaries of a cell) worm propagation.
However, this approach must take into account valid instances where no DNS query is required to access a particular system or resource. Our analysis of DNS activity within a network reveals two instances where this occurs. The first results from accessing distributed application and content delivery services. The HTTP protocol allows URLs consisting of numeric IP addresses to be embedded within the data payload of an HTTP packet. It is common practice for busy websites to maintain or outsource their content to larger centralized image servers to allow for better web page retrieval performance. When a user accesses a website to retrieve a webpage, they may be retrieving the requested material from several geographically separated servers. It is not uncommon for the web page content to include an IP address of a centralized image server that the browser uses to retrieve an image or media file. In this instance, the browser uses this numeric IP address to retrieve the image and does not require a DNS resource record. Instead of having to perform a DNS request for the object, the numeric IP address is provided to the browser in the content of the web page. We consider this a valid connection attempt incidentally obtained by a previous DNS query.
The second instance includes those servers and services that are simply not accessed with DNS. An application may have the numeric IP addresses of systems it needs to access embedded in its configuration file. A user may specify connections to a server by entering an IP address from memory at a command line. In these instances, the application or user has a priori knowledge of the IP address of the server they wish to access. This can include, but is not limited to, network server communications, remote administration tools, and certain peer-to-peer (P2P) applications. DNS, applications, and users are all legitimate sources of numeric IP addresses that can enable access to services and systems. Legitimate use of numeric IP addresses by applications and users can be identified and added to a whitelist for exclusion from the detection algorithm. Taking these exceptions into consideration (see Whitelists in Section 3.2), we consider any system that tries to access another system without receiving a valid DNS response as a possible worm-infected system.
3 High-Level System Design

Our software system design uses the libpcap [5] library and is comprised of two logical components: the Packet Processing Engine (PPE) and the DNS Correlation Engine (DCE). The PPE is responsible for extracting the relevant features from live network activity or saved network trace files (see Section 3.1). The DCE maintains in state all relevant DNS information, a whitelist, and numeric IP addresses embedded in HTTP packets extracted by the PPE (see Section 3.2). This information is used to verify both outgoing TCP connections and UDP datagrams. In this context, verifying means ensuring that the destination IP address of an outgoing TCP connection or UDP datagram can be attributed to either a DNS query, an HTTP packet, or an entry in the whitelist. The software can process either live network traffic or saved network traces in the pcap [5] file format. To detect L2R worm propagation, the software system must be deployed at all external network egress/ingress points. To detect worm propagation between network cells, a system would need to be deployed in each cell at the internal ingress/egress points (see Figure 2).
[Figure 3. High-level System Design: diagram (not reproduced). The Packet Processing Engine extracts 5-tuples (DNS, TCP, HTTP) from network traffic and passes them to the DNS Correlation Engine, which maintains the connection candidate data structure and issues alerts for unmatched TCP and UDP connections.]
Figure 3 shows the high-level design of the prototype. In this example, the PPE extracts the relevant features from live network activity and bundles these into data tokens. The data tokens are comprised of the appropriate 5-tuple of features (see Section 3.1) based on the protocol extracted. These tokens are consumed by the DCE. The DCE uses the tokens to maintain a list of destination IP addresses it deems valid and checks any new connection attempts from within the enterprise network against this list. The DCE will generate an alert if it determines the new connection is being initiated to a destination IP that is not contained in its list.
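At its core, the DCE's validation step is a set-membership test against the learned connection candidates and the whitelist. The following minimal Python sketch illustrates that logic; the class and method names are ours, not the prototype's, and TTL handling is omitted here for brevity:

```python
class DNSCorrelationEngine:
    """Minimal sketch of the DCE check: a new outgoing connection is
    anomalous unless its destination IP was learned from DNS activity
    (or HTTP-embedded addresses) or appears in the whitelist."""

    def __init__(self, whitelist=()):
        self.candidates = set()          # valid destination IPs
        self.whitelist = set(whitelist)  # statically exempt IPs

    def learn(self, dst_ip):
        # Called for each DNS A record (or HTTP-embedded IP) observed.
        self.candidates.add(dst_ip)

    def validate(self, dst_ip):
        # True: connection looks legitimate; False: raise an alert.
        return dst_ip in self.candidates or dst_ip in self.whitelist
```

For example, after `learn("203.0.113.5")`, a connection to 203.0.113.5 validates, while a scan to an address that was never resolved via DNS does not.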
3.1 Packet Processing Engine

The PPE is responsible for processing packets of interest, from pcap files or live from the network, and extracts a variety of information from several protocols. Specifically, the software must extract relevant features from new connection attempts, embedded IP addresses within HTTP packets, and all DNS activity occurring within the network cell.

In order to discover new TCP connection attempts, all TCP packets with the SYN flag set are examined. TCP packets with only the SYN flag set indicate the start of the three-way handshake that signifies a new connection attempt. UDP is connectionless and does not have the concept of a session. Each UDP packet is treated as a discrete event and thus a potential new connection. Feature extraction for either new TCP connections or non-DNS UDP datagrams includes the 5-tuple of source IP, source port, destination IP, destination port, and timestamp.
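The SYN-only test and 5-tuple extraction described above can be sketched as follows. This is an illustration only, assuming raw IPv4 packets with no link-layer framing; the prototype itself uses libpcap, and the function name here is ours:

```python
import struct

SYN = 0x02  # TCP flags byte with only SYN set

def tcp_syn_tuple(packet, timestamp):
    """If `packet` (raw IPv4 bytes) is a TCP packet with only the SYN
    flag set, return the (src_ip, src_port, dst_ip, dst_port, timestamp)
    5-tuple; otherwise return None."""
    ihl = (packet[0] & 0x0F) * 4          # IPv4 header length in bytes
    if packet[9] != 6:                    # IP protocol 6 = TCP
        return None
    src_ip = ".".join(str(b) for b in packet[12:16])
    dst_ip = ".".join(str(b) for b in packet[16:20])
    tcp = packet[ihl:]                    # TCP header starts after IP header
    src_port, dst_port = struct.unpack("!HH", tcp[:4])
    if tcp[13] != SYN:                    # flags byte: SYN set, all else clear
        return None
    return (src_ip, src_port, dst_ip, dst_port, timestamp)
```

A SYN-ACK (flags 0x12), for instance, is rejected, so only handshake-initiating packets produce a 5-tuple.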
Packets that contain a source port of 80 or 8080 are captured and categorized as HTTP packets. All HTTP packets are decoded and the payload inspected for any embedded IP addresses. Any IP addresses discovered in the payload are extracted along with the previously defined 5-tuple.

DNS A record queries are generated when systems within the network wish to contact systems in other cells or external to the network. Any DNS requests originating from the network cells and any DNS replies coming into the network cells are extracted and decoded. Feature extraction for DNS datagrams includes the 5-tuple of DNS source IP, DNS source port, TTL, domain name, and resolved IPv4 address.
3.2 DNS Correlation Engine

The DNS correlation engine (DCE) is responsible for processing information passed by the PPE. The two major functions of the DCE are: (1) to create and maintain a data structure of IP addresses and associated features that are considered valid connection candidates; and (2) to validate all new TCP and UDP connection attempts between cells or to remote systems against the connection candidate data structure. A valid connection candidate data structure is produced by processing DNS A records, embedded IP addresses in HTTP packets, and the whitelist.

Connection Candidate Data Structure. All DNS A resource record 5-tuples are parsed and added to the connection candidate data structure. The TTL from each 5-tuple is used just as it is in the cache of a DNS server. Once the TTL expires, the resource record is purged from the DCE's connection candidate list. Although DNS activity provides the majority of IP addresses in the connection candidate data structure, numeric IP addresses within HTTP packets must also be considered.
As previously discussed, numeric IP addresses are regularly embedded within HTTP packets. All HTTP 5-tuples are parsed and added to the connection candidate data structure. Unlike an IP address provided by a DNS A record, these IP addresses do not have an associated TTL that can be used to discard the IP address entry from the connection candidate data structure. We can assume that a DNS query and response had to occur in order for the web site to be initially accessed. Therefore, we can use the TTL of the DNS A record of the original request as the TTL for the embedded IP address. All IP addresses harvested from HTTP decoding are then maintained in state. That is, the assigned TTL values are respected, and these addresses are valid only as long as the TTL has not expired.
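The TTL bookkeeping just described, including an HTTP-embedded address inheriting the TTL of the A record that resolved the web server itself, can be sketched as follows (a simplified illustration with our own naming, not the prototype's code):

```python
class CandidateStore:
    """Sketch of the connection candidate data structure: entries expire
    when their DNS TTL elapses, and IPs embedded in an HTTP payload
    inherit the TTL of the A record that resolved the serving host."""

    def __init__(self):
        self.expiry = {}      # dst_ip -> absolute expiry time
        self.record_ttl = {}  # dst_ip -> TTL of the A record that added it

    def add_a_record(self, ip, ttl, now):
        self.expiry[ip] = now + ttl
        self.record_ttl[ip] = ttl

    def add_embedded_ip(self, server_ip, embedded_ip, now):
        # server_ip is the web server the HTTP packet came from; the
        # embedded address is valid for that server's remaining A-record TTL.
        ttl = self.record_ttl.get(server_ip, 0)
        self.expiry[embedded_ip] = now + ttl

    def is_valid(self, ip, now):
        exp = self.expiry.get(ip)
        if exp is None:
            return False
        if now > exp:              # TTL expired: purge the stale entry
            del self.expiry[ip]
            return False
        return True
```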
Whitelists. To address those client applications that legitimately do not rely on DNS, a whitelist is generated. A whitelist provides a list of IP address and port combinations that are exempt from the detection algorithm. For example, in most networks there are systems that regularly communicate with one another by using IP addresses specified in configuration files rather than fetched from DNS records. Furthermore, specific applications and users (see further discussion below) may also use numeric IP addresses instead of DNS to access services or communicate with other systems.

In practice, internal network server communications are either well known or easily discovered. If a hard-coded IP address is contained in a network configuration parameter or file, it is easily confirmed. These server interactions can be modeled and the appropriate IP address and port combination added offline to the whitelist for exclusion. However, in the case of users, the use of numeric IP addresses may be more pervasive and more unpredictable. There are two cases worth discussing. In organizations which impose restrictive network security policies, end users are restricted to using a finite list of well-known services deemed permissible in the security policies. For instance, it may be permissible to access FTP and Telnet servers using numeric IP addresses. To accommodate this, the IP addresses of frequently accessed FTP servers could be added to the whitelist. Alternatively, so as not to weaken the security posture of the network, in such environments (e.g. financial and government) where an organization has tight control over its employees, users could be told to enter domain names instead of IP addresses. The second case, which may be more problematic for whitelists, involves end users who enjoy unrestricted or open network security policies. In this case, the number of whitelisted protocols may limit the effectiveness of the detector.
The whitelist is granular enough not only to exempt specific IP addresses but also to provide for IP address and port pairing. For instance, it is possible to specify that a communication must contain the correct source and destination IP addresses as well as the correct destination port in order to match the applicable whitelist entry. Over time this list will need to be updated in order to reflect changes to the network, user activity, and new technology. The more open a network security policy, the greater the amount of effort required to maintain the whitelist.
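A whitelist with this granularity can be sketched as follows. The structure and names are hypothetical, not the prototype's: an entry may pin the source IP and destination port, or leave either as a wildcard to exempt a destination outright:

```python
ANY = None  # wildcard: field is not constrained by this entry

class Whitelist:
    """Sketch of a whitelist whose entries match on destination IP
    alone, or on a (source IP, destination IP, destination port)
    pairing when stricter matching is desired."""

    def __init__(self):
        self.entries = []  # list of (src_ip, dst_ip, dst_port) patterns

    def add(self, dst_ip, src_ip=ANY, dst_port=ANY):
        self.entries.append((src_ip, dst_ip, dst_port))

    def matches(self, src_ip, dst_ip, dst_port):
        for s, d, p in self.entries:
            if (s is ANY or s == src_ip) and d == dst_ip \
                    and (p is ANY or p == dst_port):
                return True
        return False
```

For example, `add("192.168.5.21", src_ip="192.168.5.2", dst_port=21)` exempts only FTP traffic between that specific pair, while `add("192.168.5.20")` exempts any connection to that server.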
New Connection Validation. The PPE only extracts the relevant information from a single TCP packet for each new TCP connection attempt it detects. This includes TCP SYN packets addressed to systems outside the cell the prototype is monitoring. Once a new TCP connection attempt is detected, the destination IP address is compared with the addresses listed in the connection candidate data structure. If the address is not found and it does not match an entry in the whitelist, the connection is considered to be anomalous and an alert is generated.

UDP datagrams are regarded as discrete events. The PPE extracts the relevant information from the UDP datagrams and passes this information to the DCE. Once a new UDP datagram is detected, the destination IP address is processed as described in the previous paragraph.
Alerts. An alert is generated when a connection attempt to a system in another cell or to a remote system is detected for which there is no associated entry in the connection candidate data structure. Multiple connection attempts between the same two systems within a specified time window are regarded as a single alert. This alert grouping reduces the number of alerts generated without reducing the relevant warning information to the operator. It is not unusual for a new TCP session to require a number of connection attempts before an actual connection can be established. Systems may be busy, unable, or simply unwilling to establish a session. If a separate alert were generated for each unsuccessful connection attempt, a single communication between two systems could generate several alerts. With regard to UDP, the decision to consider each UDP datagram as a possible new connection could result in numerous alerts that could quickly overwhelm an operator. The important intelligence from these alerts is the identification of the potentially infected system and the intended victim. The fact that it took the worm multiple connection attempts or datagrams to infect the system does not aid in our propagation detection. The timestamp from the first TCP SYN packet or UDP datagram that generated an alert is used as the timestamp for the alert. The alert contains the time the activity was detected, the protocol, the source and destination IP addresses, and the source and destination port numbers.
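The alert-grouping rule can be sketched as follows. This is a simplified illustration with hypothetical names; a real alert would also carry the protocol and port fields described above:

```python
class AlertGrouper:
    """Sketch of alert grouping: repeated connection attempts between
    the same two systems within a time window raise a single alert,
    timestamped with the first triggering packet."""

    def __init__(self, window=60.0):
        self.window = window
        self.first_seen = {}  # (src_ip, dst_ip) -> time of first alert in window

    def report(self, src_ip, dst_ip, now):
        """Return an alert dict for a newly alerting (src, dst) pair,
        or None if this pair already alerted within the window."""
        key = (src_ip, dst_ip)
        first = self.first_seen.get(key)
        if first is not None and now - first < self.window:
            return None  # suppressed: grouped into the earlier alert
        self.first_seen[key] = now
        return {"time": now, "src": src_ip, "dst": dst_ip}
```

Retries from the same source to the same victim inside the window are suppressed, while a scan of a new victim immediately produces a fresh alert.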
4 Prototype Evaluation

4.1 Data Set

To validate our DNS-based detection approach, we developed and tested a fully functional software prototype. The software was installed on a commodity PC with a Linux operating system and a 10/100 network interface card. The prototype implements all features discussed in Section 3. To conduct our evaluation, one week of network traffic was collected at a firewall in front of one of our university's research labs. A Linux system using tcpdump was connected to a tap in front of the firewall to collect and archive the network traces. We monitored both incoming and outgoing network traffic to the lab. The lab router is connected to the university's Internet-accessible Class B network. The lab network consists of one quarter of a Class C network (i.e. 63 Internet-reachable IPv4 addresses).
The lab network contains one authoritative DNS server that all internal systems in the network are configured to use. The lab's DNS server has entries associated with the lab's mail server, web server, and Kerberos server. The firewall does not permit any inbound connections unless they were first established by an internal system. All systems within the lab can access the Internet directly through the firewall, which is the sole egress/ingress point for the network. Using the cell definition previously described, the lab can be considered one cell in the university's enterprise network. The lab analysis allowed us to test the prototype's ability to detect L2R worm propagation.

During the course of our network traffic collection in front of the lab firewall, network traffic from a separate internal university network was also captured. We will refer to this network as the Internal Departmental Network (IDN). The IDN has its own authoritative DNS server that all its internal systems are configured to use. The IDN can be considered another cell in the university's enterprise network. This incidental collection provided us with the opportunity to perform additional analysis. In addition to running the prototype against the lab network traces, we ran the prototype against a filtered version of the IDN network traces. To address privacy concerns, we restricted our inspection of the IDN's network traces to those packets that contained either a source or destination address that matched a lab network IP address. The IDN analysis allowed us to test the prototype's ability to detect worm propagation between cells.
At the start of our analysis, we flushed the lab DNS server's cache. This ensured that any new connections from lab systems would result in an external DNS query to retrieve the appropriate A record instead of accessing the lab DNS server's cache. From our vantage point at the network boundary, we are only able to detect DNS replies as they enter the lab network, not those generated internally from the DNS server's cache. Flushing the lab DNS cache ensures that the DCE will contain the same DNS information as the lab's DNS server. In our analysis, all IP addresses have been modified to keep the actual IP addresses anonymous. The university network's IP addresses are represented by the 192.168.0.0/16 IP address range.
Table 1. Network Data Set

    Network Protocol    Packet Count
    TCP packets         5,969,266
    TCP connections     18,634
    ICMP packets        4,955
    UDP packets         5,301,489
    Other               805,604
Network traffic was collected for a seven-day period from June 24th to June 30th, 2004. The network traces are comprised of all network activity that reached the lab's router from internal systems, systems in the IDN cell, and the Internet. During this period, over 5 million UDP packets were observed, as well as almost 6 million TCP packets. A total of 18,634 individual TCP connections occurred. Table 1 provides the observed protocols and their respective quantities.

Table 2. DNS Datagrams

    Date         Total Packets   DNS Request Datagrams   DNS Reply Datagrams
    06-24-2004   2,101,243       6,485                   6,264
    06-25-2004   2,491,663       5,525                   4,951
    06-26-2004   847,687         1,192                   658
    06-27-2004   889,251         2,231                   3,174
    06-28-2004   1,339,283       5,225                   4,752
    06-29-2004   1,382,642       6,121                   5,998
    06-30-2004   1,081,451       4,973                   4,164

DNS is transported mainly over UDP. DNS zone transfers use the TCP protocol, but it is standard accepted security practice to disallow this feature. Table 2 shows the number of DNS request and reply datagrams that were detected in the network traces. Overall, we observed that the total amount of DNS traffic is a small percentage of the total amount of network traffic. An individual DNS reply may contain multiple records. In fact, the 10,162 DNS replies we received in the network actually generated 99,994 individual DNS resource records.
4.2 Lab Monitoring Analysis

The lab deployment was used to test the prototype's ability to detect L2R worm propagation. Initially, we observed the network for a three-hour period the day prior to our data set to generate a whitelist. Section 5.2, Table 8 contains the seven entries that comprised the lab's whitelist. In order for network activity to be identified as complying with the whitelist, t