(12) United States Patent
Duffield et al.
(10) Patent No.: US 7,660,248 B1
(45) Date of Patent: Feb. 9, 2010

(54) STATISTICAL, SIGNATURE-BASED APPROACH TO IP TRAFFIC CLASSIFICATION

(76) Inventors: Nicholas G. Duffield, 101 W. 12th St., Apt. 7 S, New York, NY (US) 10011; Matthew Roughan, 15 Locust St., [illegible] Apt. H6, Chatham, NJ (US) 07928; Oliver [illegible] Rd.

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 776 days.

(21) Appl. No.: 10/764,001
(22) Filed: Jan. 23, 2004

(51) Int. Cl.
H04L 12/26 (2006.01)
(52) U.S. Cl. ........ 370/230.1; 370/229; 370/232; 370/2[illegible]; 370/235; 370/252
(58) Field of Classification Search ........ 370/229, [illegible], 237, 238, 241, 242, 244, 245, 250, 252, 253; [illegible]/231, 232
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
7,251,218 B2* 7/2007 Jorgensen ........ 370/235
7,302,682 B2* 11/2007 Turkoglu ........ 717/174
7,305,676 B1* 12/2007 Boll et al. ........ 718/107
7,359,320 B2* 4/2008 Klaghofer et al. ........ 370/230
7,433,943 B1* 10/2008 Ford ........ 709/223
7,441,267 B1* 10/2008 Elliott ........ 726/13

OTHER PUBLICATIONS
C. Dewes, et al., "An Analysis of Internet Chat Systems," Proceedings of ACM SIGCOMM Internet Measurement Conference, Oct. 2003.

* cited by examiner

Primary Examiner: Pankaj Kumar
Assistant Examiner: Mark Mais
(74) Attorney, Agent, or Firm: Henry Brendzel

(57) ABSTRACT
A signature-based traffic classification method maps traffic into preselected classes of service (CoS). By analyzing a known corpus of data that clearly belongs to identified ones of the preselected classes of service, in a training session the method develops statistics about a chosen set of traffic features. In an analysis session, relative to traffic of the network where QoS treatments are desired (target network), the method obtains statistical information relative to the same chosen set of features for values of one or more predetermined traffic attributes that are associated with connections that are analyzed in the analysis session, yielding a statistical features signature of each of the values of the one or more attributes. A classification process then establishes a mapping between values of the one or more predetermined traffic attributes and the preselected classes of service, leading to the establishment of QoS treatment rules.

1 Claim, 1 Drawing Sheet

EX1021
Palo Alto Networks v. Sable Networks
IPR2020-01712

FIG. 1 (flow chart)
Training session (on training network)
  10: Obtain statistical information relative to selected features for each of a chosen set of classes (statistical "features-class" mapping).
Analysis session (on target network)
  20: Obtain statistical information relative to the same selected features, for values of one or more connection attributes (statistical features signature of each value of the one or more attributes).
  30: Establish a classification: map each of the values of the one or more attributes having a features signature into a class.
  40: Assign packets arriving at the target network to a class based on the established classification; apply QoS based on the assigned class.

STATISTICAL, SIGNATURE-BASED APPROACH TO IP TRAFFIC CLASSIFICATION

BACKGROUND OF THE INVENTION

This invention relates to traffic classification and, more particularly, to statistical classification of IP traffic.

The past few years have witnessed a dramatic increase in the number and variety of applications running over the Internet and over enterprise IP networks. The spectrum includes interactive applications (e.g., telnet, instant messaging, games), bulk data transfer (e.g., ftp, P2P file downloads), corporate applications (e.g., Lotus Notes, database transactions), and real-time applications (e.g., voice, video streaming), to name just a few.

Network operators, particularly in enterprise networks, desire the ability to support different levels of Quality of Service (QoS) for different types of applications. This desire is driven by (i) the inherently different QoS requirements of different types of applications, e.g., low end-to-end delay for interactive applications and high throughput for file transfer applications; (ii) the different relative importance of different applications to the enterprise, e.g., Oracle database transactions are considered critical and therefore high priority, while traffic associated with browsing external web sites is generally less important; and (iii) the desire to optimize the usage of existing network infrastructure under finite capacity and cost constraints, while ensuring good performance for important applications.

Various approaches have been studied, and mechanisms developed, for providing different QoS in a network. See, for example, S. Blake, et al., RFC 2475, An architecture for differentiated services, December 1998, http://ww.faqs.org/rfcs/rfc2475.html; C. Gbaguidi, et al., A survey of differentiated services architectures for the Internet, March 1998, http://sscwww.epfl.ch/Pages/publications/ps_files/tr98i020.ps; and Y. Bernet, et al., A framework for differentiated services, Internet Draft (draft-ietf-diffserv-framework-02.txt), February 1999, http://search.ietf.org/internet-drafts/draft-ietf-diffserv-framework-02.txt.

Previous work has also examined the variation of flow characteristics according to application. M. Allman, et al., TCP congestion control, IETF Network Working Group RFC 2581, 1999, investigated the joint distribution of flow duration and number of packets, and its variation with flow parameters such as inter-packet timeout. Differences were observed between the distributions of some application protocols, although overlap was also clearly present between some applications. Most notably, the distribution of DNS transactions had almost no overlap with that of the other applications considered. However, the use of such distributions as a discriminator between different application types was not considered.

There also exists a wealth of research on characterizing and modeling workloads for particular applications; A. Krishnamurthy, et al., Web Protocols and Practice, Chapter 10, Web Workload Characterization, Addison-Wesley, 2001, and J. E. Pitkow, Summary of WWW characterizations, W3J, 223-13, 1999, are but two examples of such research.

An early work in this space, reported in V. Paxson, "Empirically derived analytic models of wide-area TCP connections," IEEE/ACM Transactions on Networking, vol. 2, no. 4, pp. 316-336, 1994, examines the distributions of flow bytes and packets for a number of different applications.

Interflow and intraflow statistics are another possible dimension along which application types may be distinguished, and research has been conducted along this dimension as well. V. Paxson, et al., "Wide-area traffic: The failure of Poisson modeling," IEEE/ACM Transactions on Networking, vol. 3, pp. 226-244, June 1995, for example, found that user-initiated events, such as telnet packets within flows or FTP-data connection arrivals, can be described well by a Poisson process, whereas other connection arrivals deviate considerably from Poisson.

Signature-based detection techniques have also been explored in the context of network security, attack and anomaly detection, where one typically seeks to find a signature for an attack; see, e.g., P. Barford, et al., Characteristics of Network Traffic Flow Anomalies, Proceedings of ACM SIGCOMM Internet Measurement Workshop, October 2001; and P. Barford, et al., A Signal Analysis of Network Traffic Anomalies, Proceedings of ACM SIGCOMM Internet Measurement Workshop, November 2002.

In practice, realizing a service differentiation capability requires (i) association of the traffic with the different applications, (ii) determination of the QoS to be provided to each, and finally (iii) mechanisms in the underlying network for providing the QoS, i.e., for controlling the traffic to achieve a particular quality of service.

While some of the above-mentioned studies assume that one can identify the application traffic unambiguously and then obtain statistics for that application, none of them has considered the dual problem of inferring the application from the traffic statistics. This type of approach has been suggested only in very limited contexts, such as identifying chat traffic in C. Dewes, et al., An analysis of Internet chat systems, Proceedings of ACM SIGCOMM Internet Measurement Conference, October 2003.

Still, in spite of a clear perceived need and the prior art work reported above, widespread adoption of QoS control of traffic has not come to pass. It is believed that the primary reason for the slow spread of QoS use is the absence of suitable mapping techniques that can aid operators in classifying the network traffic mix among the different QoS classes. We refer to this as the Class of Service (CoS) mapping problem, and believe that solving it would go a long way toward making QoS more accessible to operators.

SUMMARY

An advance in the art of providing specified QoS in an IP network is achieved with a signature-based traffic classification method that maps traffic into preselected classes of service (CoS). By analyzing, in a training session, a known corpus of data that clearly belongs to identified ones of the preselected classes of service, the method develops statistics about a chosen set of traffic features. In an analysis session, relative to traffic of the network where QoS treatments are desired (target network), the method obtains statistical information relative to the same chosen set of features for values of one or more predetermined traffic attributes that are associated with connections that are analyzed in the analysis session, yielding a statistical features signature of each of the values of the one or more attributes. A classification process then establishes a mapping between values of the one or more predetermined traffic attributes and the preselected classes of service, leading to the establishment of rules. Once the rules are established, traffic that is associated with particular values of the predetermined traffic attributes is mapped to classes of service, which leads to a designation of QoS.

Illustratively, the preselected classes of service may be interactive traffic, bulk data transfer traffic, streaming traffic and transactional traffic. The chosen set of traffic features may be packet-level features, flow-level features, connection-level features, intra-flow/connection features, and multi-flow features. The predetermined traffic attributes may be the server port and the server IP address. An illustrative rule might state that "a connection that specifies port x belongs to the class of interactive traffic." An administrator of the target network may choose to give the highest QoS level to such traffic.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 presents a flow chart of the IP traffic classification method disclosed herein.

DETAILED DESCRIPTION

In accord with the principles disclosed herein, QoS implementations are based on a mapping of traffic into classes of service. In principle, the division of traffic into CoS could be done by the end-points of the network, where traffic actually originates, for instance by end-user applications. However, for reasons of trust and scalability of administration and management, it is typically more practical to perform the CoS mapping within the network; for instance, at the router that connects the Local Area Network (LAN) to the Wide Area Network (WAN). Alternatively, appliances connected near the LAN-to-WAN transition point can perform packet marking for QoS.

CoS mapping inside the network is a non-trivial task. Ideally, a network system administrator would possess precise information on the applications running inside the administrator's network, along with simple and unambiguous mappings based on easily obtained traffic measurements (e.g., port numbers, or source and destination IP addresses). This information is vital not just for the implementation of CoS, but also for planning the capacity required for each class and for balancing the tradeoffs between cost and performance that arise in choosing class allocations. For instance, one might have an application whose inclusion in a higher-priority class is desirable but not cost-effective (based on traffic volumes and pricing), so difficult choices must be made. Good data is required for these choices to be informed ones.

In general, however, the required information is rarely up-to-date or complete, if it is available at all. The traditional ad-hoc growth of IP networks, the continuing rapid proliferation of new applications, the merger of companies with different networks, and the relative ease with which almost any user can add a new application to the traffic mix with no centralized registration all contribute to this "knowledge gap". Furthermore, in recent years it has become harder to identify network applications within IP traffic. Traditional techniques, such as port-based classification of applications, have become much less accurate.

One approach commonly used for identifying applications on an IP network is to associate the observed traffic (using flow-level data or a packet sniffer) with an application based on TCP or UDP port numbers. Alas, this method is inadequate.

The TCP/UDP port numbers are divided into three ranges: the Well Known Ports (0-1023), the Registered Ports (1024-49,151), and the Dynamic and/or Private Ports (49,152-65,535). A typical TCP connection starts with a SYN/SYN-ACK/ACK handshake between a client and a server. The client addresses its initial SYN packet to the well-known server port of a particular application, and typically chooses the source port number of the packet dynamically. UDP uses ports similarly to TCP, though without connection semantics. All subsequent packets of a TCP or UDP session use the same pair of ports to identify the client and server sides of the session. Therefore, in principle, the TCP or UDP server port number can be used to identify the higher-layer application, simply by identifying the server port in an incoming packet and mapping this port to an application using the IANA (Internet Assigned Numbers Authority) list of registered ports (http://www.iana.org/assignments/port-numbers). However, port-based application classification has limitations. First, the mapping from ports to applications is not always well defined. For instance:

Many implementations of TCP use client ports in the registered port range, which might cause the connection to be mistakenly classified as belonging to the application associated with that port. Similarly, some applications (e.g., old bind versions) use port numbers from the well-known ports to identify the client side of a session.

Ports are not defined with IANA for all applications, e.g., P2P applications such as Napster and Kazaa.

An application may use ports other than its well-known ports to circumvent operating system access control restrictions. E.g., non-privileged users often run WWW servers on ports other than port 80, which is restricted to privileged users on most operating systems.

There are some ambiguities in the port registrations, e.g., port 888 is used for both CDDBP (CD Database Protocol) and access-builder.

In some cases server ports are dynamically allocated as needed. For example, FTP allows the dynamic negotiation of the server port used for the data transfer. This server port is negotiated on an initial TCP connection, which is established using the well-known FTP control port.

The use of traffic control techniques like firewalls to block unauthorized and/or unknown applications from using a network has spawned many work-arounds that make port-based application authentication harder. For example, port 80 is being used by a variety of non-web applications to circumvent firewalls that do not filter port-80 traffic. In fact, available implementations of IP over HTTP allow the tunneling of all applications through TCP port 80.

Trojans and other security attacks generate a large volume of bogus traffic, which should not be associated with the applications of the port numbers those attacks use.

A second limitation of port-number-based classification is that a port can be used by a single application to transmit traffic with different QoS requirements. For example, (i) Lotus Notes transmits both email and database transaction traffic over the same ports, and (ii) scp (secure copy), a file transfer protocol, runs over ssh (secure shell), an interactive application using default TCP port 22. This use of the same port for traffic with different QoS requirements is quite legitimate, and yet a good classification must separate the different use cases of the same application. A clean QoS implementation is still possible by augmenting the classification rules to include IP-address-based disambiguation. Server lists exist in some networks but, again, in practice these lists are often incomplete, or a single server may be used to support a variety of different types of traffic, so port and IP address rules must be combined.

A possible alternative to port-based classification is a painstaking process involving the installation of packet sniffers and the parsing of packets for application-level information to identify the application class of each individual TCP connection or UDP session. However, this approach cannot be used with more easily collected flow-level data, and its collection is computationally expensive, limiting its application to lower-bandwidth links. This approach also requires precise prior knowledge of applications and their packet formats, something that may not always be possible. Furthermore, the introduction of payload encryption is increasingly limiting our ability to see inside packets for this type of information.

For the above reasons, a different approach is needed.
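To make the port-based scheme and two of its weak points concrete, here is a minimal Python sketch. `port_range` encodes the three IANA ranges listed above; `PORT_TABLE` is a hypothetical, hand-written excerpt whose entries are taken only from the examples in this section, not from the IANA registry, and `applications_for_port` is an illustrative name.

```python
def port_range(port: int) -> str:
    """Return the IANA range that a TCP/UDP port number falls into."""
    if not 0 <= port <= 65535:
        raise ValueError(f"not a valid port number: {port}")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic/private"


# Hypothetical excerpt of a port -> application table, built only from the
# examples discussed above (not the full IANA list).
PORT_TABLE = {
    21: ("ftp",),                      # control port; the data port is negotiated
    22: ("ssh",),                      # also carries scp bulk transfers
    80: ("http",),                     # also used to tunnel non-web traffic
    888: ("CDDBP", "access-builder"),  # ambiguous registration
}


def applications_for_port(server_port: int) -> tuple:
    """Naive port lookup: every candidate application, or () if unknown."""
    return PORT_TABLE.get(server_port, ())
```

A web server moved to port 8080 yields `()`, and port 888 yields two candidates; both illustrate why a lookup table alone cannot map traffic to classes reliably.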
In accord with the principles disclosed herein, CoS mapping is achieved using a statistical method. Advantageously, the disclosed method performs CoS mapping based on a simple and easily determined attribute, or attributes, of the traffic. Specifically, the disclosed method assigns traffic to classes based on the selected attribute or attributes, using a mapping derived from a statistical analysis that forms a signature for traffic having particular values of those attributes.

Thus, in accord with the principles disclosed herein, a three-stage process is undertaken, as depicted in FIG. 1:
1. statistics collection: blocks 10 and 20;
2. classification and rule creation: block 30; and
3. application of rules to active traffic: block 40.

Block 10 obtains statistical information, in a training session, relative to selected features for each of a chosen set of classes, by using training data that includes collections of traffic, where each collection clearly belongs to one of the chosen classes and there is a collection for each of the chosen classes. This may be termed statistical "features-class" mapping.

Specifically, the classes of traffic to which administrators of networks may wish to apply different QoS treatment are first selected or identified, and traffic from a network having a well-established set of applications that belong to the identified classes (the training network) is employed to obtain a set of statistics for a chosen set of features. The notion here is that if it is concluded, from the data of the training network, that feature A of class x applications is characterized by a narrow range in the neighborhood of value Y, then, at a later time, if one encounters traffic in a target network where feature A has the value Y, one may be able to conclude with a high level of confidence that the traffic belongs to class x.
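The "narrow range around Y" idea can be sketched as follows, assuming each class is summarized by the mean and standard deviation of a single feature, and that a target-network value is assigned to the class it lies the fewest standard deviations from. The z-score rule, the `max_z` cut-off, the function names and the sample numbers are all illustrative assumptions; this passage does not specify the classifier used later in block 30.

```python
from statistics import mean, stdev


def train(samples_by_class):
    """Summarize one feature per class as (mean, standard deviation).

    `samples_by_class` maps a class name to the feature values observed for
    that class on the training network (at least two values per class).
    """
    return {cls: (mean(xs), stdev(xs)) for cls, xs in samples_by_class.items()}


def classify(model, y, max_z=3.0):
    """Return the class whose training distribution lies closest to y, in
    standard deviations, or None if y is more than max_z away from all of them."""
    best_cls, best_z = None, float("inf")
    for cls, (mu, sigma) in model.items():
        if sigma > 0:
            z = abs(y - mu) / sigma
        else:
            z = 0.0 if y == mu else float("inf")
        if z < best_z:
            best_cls, best_z = cls, z
    return best_cls if best_z <= max_z else None


# Hypothetical training values of one feature (e.g., mean packet size, bytes).
model = train({
    "interactive": [60, 64, 70, 66, 62],
    "bulk": [1400, 1450, 1500, 1380, 1480],
})
```

With these numbers, `classify(model, 65)` returns `"interactive"`, `classify(model, 1460)` returns `"bulk"`, and a value such as 700, far from both training ranges, returns `None` rather than being forced into a class.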
With respect to class definitions, it makes sense to limit the set of selected classes to those that corporate network administrators might wish to employ for service differentiation. Today's corporate networks carry four broad application classes, described below, but it should be understood that additional, or other, classes can be selected. The four application classes are:

Interactive: The interactive class contains traffic that is required by a user to perform multiple real-time interactions with a remote system. This class includes such applications as remote login sessions or an interactive Web interface.

Bulk data transfer: The bulk data transfer class contains traffic that is required to transfer large data volumes over the network without any real-time constraints. This class includes applications such as FTP, software updates, and music or video downloads.

Streaming: The streaming class contains multimedia traffic with real-time constraints. This class includes such applications as media streaming and video conferencing.

Transactional: The transactional class contains traffic that is used in a small number of request-response pairs that can be combined to represent a transaction. DNS and Oracle transactions belong to this class.

In order to characterize each application class, a reference data set is clearly needed for each class. The problem is that one needs to identify the class before the statistics for the chosen features can be extracted, yet the features that ought to be chosen are those that characterize and disambiguate the classes. To break this circular dependency, in accord with the principles disclosed herein, one or more specific "reference" applications are selected for each class that, based on their typical use, have a low likelihood of being contaminated by traffic belonging to another class. To select those applications, it makes sense to select applications that:
are clearly within one class (to avoid mixing the statistics from two classes);
are widely used, so as to assure a good data set; and
have server ports in the well-known port range, to reduce the chance of misuse of these ports.

In a representative embodiment of the disclosed method, the reference applications selected for each application class are:
Interactive: Telnet;
Bulk data: FTP-data, Kazaa;
Streaming: RealMedia streaming;
Transactional: DNS, HTTPS.
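One way to turn these reference applications into a labeled training corpus is to tag each observed connection by its server port. The sketch below assumes the standard well-known ports for Telnet (23), FTP-data (20), DNS (53) and HTTPS (443); Kazaa and RealMedia are left out because the text does not give their ports. `REFERENCE_PORTS` and `label_training_set` are hypothetical names. Note that in active-mode FTP the server opens the data connection from port 20, so a real labeler would have to inspect both ends of a connection; this sketch only looks at the port it is given.

```python
from collections import defaultdict

# Assumed well-known server ports of the reference applications above.
# Kazaa and RealMedia are omitted: their ports are not given in the text.
REFERENCE_PORTS = {
    23: "interactive",     # Telnet
    20: "bulk",            # FTP-data
    53: "transactional",   # DNS
    443: "transactional",  # HTTPS
}


def label_training_set(connections):
    """Group per-connection feature values by reference class.

    `connections` yields (server_port, feature_value) pairs. Connections on
    non-reference ports are dropped so they cannot contaminate a class.
    """
    by_class = defaultdict(list)
    for server_port, value in connections:
        cls = REFERENCE_PORTS.get(server_port)
        if cls is not None:
            by_class[cls].append(value)
    return dict(by_class)
```

The resulting dictionary (class name to list of feature values) is the shape needed to compute the per-class feature statistics of block 10.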
As indicated above, the statistical information that is gathered for each class pertains to the chosen set of features. The list of possible features is very large, and the actual selection is left to the practitioner. It is useful to note, however, that the features can be broadly classified into the following categories:

1. Simple packet-level features, such as packet size and various moments thereof (variance, RMS (root mean square) size, etc.), are simple to compute and can be gleaned directly from packet-level information. One advantage of such features is that they offer a characterization of the application that is independent of the notion of flows, connections or other higher-level aggregations. Another advantage is that packet-level sampling, which is widely used in network data collection, has little impact on these statistics.

Another set of statistics that can be derived from simple packet data are time series, from which one can derive a number of statistics; for instance, statistics relating to correlations over time (e.g., parameters of long-range dependence such as the Hurst parameter). An example of this type of classification can be seen in Z. Liu, et al., Profile-based traffic characterization of commercial web sites, Proceedings of the 18th International Teletraffic Congress (ITC-18), volume 5a, pages 231-240, Berlin, Germany, 2003, where the authors use time-of-day traffic profiles to categorize web sites.

2. Flow-level statistics are summary statistics at the grain of network flows. A flow is defined to be a unidirectional sequence of packets that have some field values in common, typically the 5-tuple (source IP, destination IP, source port, destination port, IP protocol type). Example flow-level features include flow duration, data volume, number of packets, the variance of these metrics, etc. There are some more complex forms of information one can also glean from flow (or packet data) statistics; for instance, one may look at the proportion of internal versus external traffic within a category, since external traffic (traffic to the Internet) may have a lower priority within a corporate setting. These statistics can be obtained using flow-level data collected at routers using, e.g., Cisco NetFlow, described in White paper: NetFlow services and applications, http://www.cisco.com/warp/public/cc/pd/iosw/ioft/neflct/tech/napps_wp.htm. Such data does not require the more resource-intensive process of collecting finer-grain packet-level traces. A limitation is that flow collection may sometimes aggregate packets that belong to multiple application-level connections into a single flow, which would distort the flow-level features.

3. Connection-level statistics are required to trace some interesting behavior associated with connection-oriented transport-level connections, such as TCP connections. A typical TCP connection starts and ends with well-defined handshakes between a client and a server. The collection process needs to track the connection state in order to collect connection-level statistics. In addition to the features mentioned for the flow level, other features that are meaningful to compute at the TCP connection level are the amount of symmetry of a connection, advertised window sizes, and the throughput distribution. Connection-level data generally provides better quality data than flow-level information, but requires additional overhead, and would also be impacted by sampling or asymmetric routing at the collection point.

4. Intra-flow/connection features are features that are based on the notion of a flow or TCP connection, but require statistics about the packets within each flow. A simple example is the statistics of the inter-arrival times between packets in flows. This requires data collected at a packet level, but then grouped into flows. The relative variance of these inter-arrival times may be used as a measure of the burstiness of a traffic stream. Other intra-flow/connection features include loss rates, latencies, etc.

5. Multi-flow: Sometimes interesting characteristics can be captured only by considering statistics across multiple flows/connections. For instance, many peer-to-peer applications achieve the download of a large file by bulk downloads of smaller chunks from multiple machines, and the individual chunk downloads are typically performed close together in time. For some multimedia streaming protocols, the high-volume data connection is accompanied by a concurrent, separate connection between the same set of end-systems, containing low-volume, intermittent control data (e.g., RTSP; see H. Schulzrinne, et al., Real time streaming protocol (RTSP), Request for Comments 2326, April 1998, ftp://ftp.isi.edu/in-notes/rfc2326.txt). These multi-flow features are more complex and computationally more expensive to capture than flow or connection data alone.

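For the packet-level features of category 1 above, the size moments can be computed directly from a list of packet sizes. A minimal sketch, assuming sizes in bytes and using the population variance; `packet_size_features` is an illustrative name.

```python
from math import sqrt
from statistics import mean, pvariance


def packet_size_features(sizes):
    """Packet-level size statistics for one traffic aggregate.

    `sizes` is a non-empty sequence of packet sizes in bytes. The variance is
    the population variance, so rms**2 == mean**2 + variance.
    """
    if not sizes:
        raise ValueError("need at least one packet")
    return {
        "mean": mean(sizes),
        "variance": pvariance(sizes),
        "rms": sqrt(sum(s * s for s in sizes) / len(sizes)),
    }
```

For example, `packet_size_features([40, 40, 1500, 1500])` gives a mean of 770 bytes, a variance of 532900 and an RMS size of about 1061 bytes; none of these depends on how the packets are grouped into flows.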
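Category 1 also mentions the Hurst parameter. One standard way to estimate it, swapped in here because the text does not prescribe an estimator, is the aggregated-variance (variance-time) method: average the series over non-overlapping blocks of m samples; for a long-range-dependent series the variance of the block means decays like m^(2H-2), so the least-squares slope beta of log-variance against log m gives H = 1 + beta/2. The function name and the default block sizes are arbitrary choices for illustration.

```python
import math
from statistics import mean, pvariance


def hurst_aggregated_variance(series, block_sizes=(1, 2, 4, 8, 16, 32)):
    """Estimate the Hurst parameter of a time series (e.g., bytes per interval).

    For each block size m the series is cut into non-overlapping blocks of m
    samples, and the variance of the block means is computed. A least-squares
    fit of log(variance) against log(m) has slope beta = 2H - 2.
    """
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue  # not enough blocks to estimate a variance
        block_means = [mean(series[b * m:(b + 1) * m]) for b in range(n_blocks)]
        v = pvariance(block_means)
        if v > 0:
            log_m.append(math.log(m))
            log_var.append(math.log(v))
    if len(log_m) < 2:
        raise ValueError("series too short for the requested block sizes")
    # Ordinary least-squares slope of log_var on log_m.
    mx, my = mean(log_m), mean(log_var)
    beta = sum((x - mx) * (y - my) for x, y in zip(log_m, log_var)) / sum(
        (x - mx) ** 2 for x in log_m
    )
    return 1 + beta / 2
```

For independent samples the block-mean variance falls like 1/m, giving H near 0.5; values approaching 1 indicate strong long-range dependence. This simple estimator also reports H near 1 for a series with a trend, so detrending first is prudent.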
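For the flow-level statistics of category 2, the following minimal sketch groups packets into unidirectional 5-tuple flows and computes duration, packet count and volume per flow. The packet record layout and all names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class FlowStats:
    """Summary of one unidirectional flow."""
    first_ts: float
    last_ts: float
    packets: int = 0
    byte_count: int = 0

    @property
    def duration(self) -> float:
        return self.last_ts - self.first_ts


def aggregate_flows(packets):
    """Group packets into unidirectional flows keyed by the 5-tuple.

    Each packet is (timestamp, src_ip, dst_ip, src_port, dst_port, proto, size).
    Unlike real collectors such as NetFlow, this sketch applies no idle or
    active timeouts, so every packet with the same 5-tuple joins one flow.
    """
    flows = {}
    for ts, src, dst, sport, dport, proto, size in packets:
        key = (src, dst, sport, dport, proto)
        flow = flows.get(key)
        if flow is None:
            flow = flows[key] = FlowStats(first_ts=ts, last_ts=ts)
        flow.first_ts = min(flow.first_ts, ts)
        flow.last_ts = max(flow.last_ts, ts)
        flow.packets += 1
        flow.byte_count += size
    return flows
```

Because flows are unidirectional, the two directions of one TCP connection produce two separate entries; joining them is exactly the extra state tracking that connection-level statistics require.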
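Category 3 names the symmetry of a connection and its throughput as connection-level features. The text does not define symmetry precisely, so the sketch below assumes one simple definition: the ratio of the smaller to the larger per-direction byte count, which is 1.0 for perfectly balanced traffic and near 0 for one-way transfers. The record layout and function name are hypothetical.

```python
def connection_features(packets):
    """Connection-level features from (timestamp, direction, payload_bytes) records.

    `direction` is "c2s" (client to server) or "s2c". Symmetry is an assumed
    definition: min(bytes per direction) / max(bytes per direction).
    """
    c2s = sum(size for _, d, size in packets if d == "c2s")
    s2c = sum(size for _, d, size in packets if d == "s2c")
    larger = max(c2s, s2c)
    symmetry = min(c2s, s2c) / larger if larger else 0.0
    times = [t for t, _, _ in packets]
    duration = max(times) - min(times) if times else 0.0
    throughput = (c2s + s2c) / duration if duration > 0 else 0.0  # bytes/second
    return {"bytes_c2s": c2s, "bytes_s2c": s2c,
            "symmetry": symmetry, "throughput": throughput}
```

An interactive session tends to keep both directions busy, while a download is dominated by one direction, so even this crude ratio can help separate classes.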
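The burstiness measure of category 4 can be sketched as the relative variance (variance divided by the squared mean, i.e., the squared coefficient of variation) of the inter-arrival times within one flow. For reference, Poisson arrivals have exponentially distributed gaps with a relative variance of 1, perfectly periodic arrivals give 0, and values well above 1 indicate bursts. The function name is illustrative.

```python
from statistics import mean, pvariance


def interarrival_relative_variance(timestamps):
    """Relative variance (variance / mean**2) of packet inter-arrival times in one flow."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three packets")
    m = mean(gaps)
    if m == 0:
        raise ValueError("all packets share one timestamp")
    return pvariance(gaps) / (m * m)
```

Evenly spaced packets, e.g. timestamps `[0, 1, 2, 3, 4]`, give 0.0, while two tight bursts separated by a long pause give a value well above 1.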
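For the multi-flow features of category 5, one illustrative (assumed) feature is how many distinct remote hosts a client contacts in a burst of closely spaced connections, which would be large for chunked peer-to-peer downloads. The `window` threshold, the input layout and the function name are hypothetical.

```python
from collections import defaultdict


def download_bursts(connections, window=2.0):
    """Group each client's connections into bursts of closely spaced starts.

    `connections` holds (start_time, client_ip, server_ip) tuples. A new burst
    begins whenever the gap since the client's previous connection start
    exceeds `window` seconds. Returns {client_ip: [set_of_servers, ...]}.
    """
    by_client = defaultdict(list)
    for start, client, server in connections:
        by_client[client].append((start, server))
    bursts = {}
    for client, conns in by_client.items():
        conns.sort()  # order this client's connections by start time
        groups, last = [], None
        for start, server in conns:
            if last is None or start - last > window:
                groups.append(set())
            groups[-1].add(server)
            last = start
        bursts[client] = groups
    return bursts
```

The size of each set is then a per-burst statistic; a client that repeatedly opens connections to many peers within a few seconds has a multi-flow signature quite unlike a client making one long download.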
Turning attention to block 20 of FIG. 1, in accord with the principles disclosed herein, statistical information is collected relative to traffic that is identified by one or more predetermined attributes. More specifically, block 20 obtains statistical information, in an analysis session that employs traffic of the target network, relative to the same selected features that were analyzed in block 10, for one or more predetermined attributes that are associated with connections that are analyzed in the analysis session. Block 20 yields a statistical features signature of each of the analyzed values of the one or more predetermined attributes. That is, for each value of any one of the predetermined attributes, statistical information is gathered regarding the aggregate traffic that is accumulated in the analysis session. For illustrative purposes, the traffic attributes considered herein are the server port $P_i$ and the server IP address $I_i$. The traffic aggregates are the collections of traffic relative to a particular server port, or relative to a particular IP address.

Thus, in accord with the principles of this disclosure, a vector of statistics $S^C(i)$, whose elements are the chosen features, is formed for each connection $i$ and used to update the statistics of each aggregate in which connection $i$ is involved; for instance, statistics $S^P(P_i)$ for port aggregates and $S^I(I_i)$ for server aggregates. To illustrate, for statistics collected on TCP connections, the procedure might be as in the following pseudocode:

foreach packet
    if (packet represents a new TCP connection)
        assign the connection index i++
        determine the aggregates for connection i:
            server port P_i = dst port of SYN
            server IP address I_i = dst IP of SYN
        initialize a set of statistics S^C(i)
    elseif (packet represents the end of TCP connection i)
        update connection statistics S^C(i)
        update statistics for each aggregate:
            by server port: S^P(P_i)
            by server IP address: S^I(I_i)
    elseif (packet belongs to an existing TCP connection i)
        update connection statistics S^C(i)
    endif
end foreach

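A runnable Python rendering of the pseudocode above, under simplifying assumptions: packets are pre-tagged with a connection identifier and a flag ("SYN" opens a connection, "FIN" closes it, "" otherwise), the per-connection statistics S^C(i) are just a packet count and a byte count, and each aggregate keeps the list of finished-connection vectors rather than a running summary. The record layout and the name `aggregate_by_server` are hypothetical.

```python
from collections import defaultdict


def aggregate_by_server(packets):
    """Roll per-connection statistics up by server port and server IP.

    Each packet is (conn_id, flag, size, dst_ip, dst_port); conn_id stands in
    for the 5-tuple matching a real collector would perform.
    Returns (by_port, by_ip): finished-connection vectors per aggregate.
    """
    conns = {}                   # open connections: conn_id -> state, i.e. S^C(i)
    by_port = defaultdict(list)  # S^P(P_i)
    by_ip = defaultdict(list)    # S^I(I_i)
    for conn_id, flag, size, dst_ip, dst_port in packets:
        if flag == "SYN":
            # New connection: the SYN's destination identifies the server side.
            conns[conn_id] = {"port": dst_port, "ip": dst_ip,
                              "packets": 1, "bytes": size}
        elif conn_id in conns:
            state = conns[conn_id]
            state["packets"] += 1
            state["bytes"] += size
            if flag == "FIN":
                # End of connection: fold S^C(i) into both aggregates.
                vector = (state["packets"], state["bytes"])
                by_port[state["port"]].append(vector)
                by_ip[state["ip"]].append(vector)
                del conns[conn_id]
    return dict(by_port), dict(by_ip)
```

Checking for the closing packet inside the existing-connection branch guarantees that the end-of-connection update is reached; in a real deployment, the per-aggregate lists would be replaced by recursive summaries of the kind discussed next, so memory stays proportional to the number of aggregates.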
The update procedure for connections depends on the statistic in question. Ideally, statistics should be chosen that can be updated on-line in a streaming fashion, i.e., recursively, because that allows the method to store a small amount of state per connection rather than data for each packet. For example, it is desirable to employ an algorithm like

$$S_k^C(i) \leftarrow f\big(X_j^i(k),\, S_k^C(i),\, \phi(i)\big) \tag{1}$$

where $X_j^i(k)$ is the measurement for packet $j$, relative to statistic (feature) $k$, in connection $i$; $S_k^C(i)$ is the $k$th statistic (feature) for connection $i$; and $\phi(i)$ is some (small) set of state information (e.g., the packet number $j$) for connection $i$. With an update algorithm as specified by equation (1), the memory required to store the state depends on the number of connections, not on the number of packets. The following are a number of specific examples that comport with equation (1):

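1. Average:

$$\bar{X}_{j+1} = \frac{1}{j+1}\,X_{j+1} + \frac{j}{j+1}\,\bar{X}_j \tag{2}$$

2. Variance:

$$\operatorname{var}(X_{j+1}) = \frac{1}{j}\,X_{j+1}^2 + \frac{j-1}{j}\,\operatorname{var}(X_j) + \bar{X}_j^2 - \frac{j+1}{j}\,\bar{X}_{j+1}^2 \tag{3}$$
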
where $\bar{X}_j$ and $\operatorname{var}(X_j)$ are the mean and (sample) variance, respectively, of the first $j$ samples (e.g., packets) of data. Even for more difficult statistics, such as quantiles, there are a number of approximation algorithms that can be used to approximate the statistic on-line; see A. C. Gilbert, et al., "Fast, small-space algorithms for approximate histogram maintenance," STOC, 2002. For clarity, equations (2) and (3) write $X$ without the index that designates the feature being measured, but that index is implied. That is, the variables $X_j$ could represent packet size, inter-arrival time, or other features.
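Equations (2) and (3) can be checked with a small running-statistics object that stores only the sample count, the mean and the sample variance, matching the state-per-connection form of equation (1). The class name and interface are illustrative.

```python
class RunningStats:
    """On-line mean and sample variance, updated one sample at a time
    with the recursions of equations (2) and (3)."""

    def __init__(self):
        self.j = 0        # number of samples seen so far
        self.mean = 0.0   # mean of the first j samples
        self.var = 0.0    # sample variance (divisor j - 1) of the first j samples

    def update(self, x: float) -> None:
        j = self.j
        if j == 0:
            self.mean, self.var = float(x), 0.0
        else:
            new_mean = x / (j + 1) + j / (j + 1) * self.mean          # eq. (2)
            self.var = (x * x / j + (j - 1) / j * self.var
                        + self.mean ** 2 - (j + 1) / j * new_mean ** 2)  # eq. (3)
            self.mean = new_mean
        self.j = j + 1
```

Feeding in 2, 4, 4, 4, 5, 5, 7, 9 yields a mean of 5 and a sample variance of 32/7, the same as a batch computation. Because equation (3) subtracts squared means of similar size, it can lose precision when the mean is large relative to the spread; Welford's algorithm computes the same sample variance with a numerically stabler update.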
It is noted that some statistics need o
