`
`
`
`
`
`
`
(12) United States Patent
Duffield et al.

(10) Patent No.: US 7,660,248 B1
(45) Date of Patent: Feb. 9, 2010

US007660248B1

(54) STATISTICAL, SIGNATURE-BASED APPROACH TO IP TRAFFIC CLASSIFICATION

(76) Inventors: Nicholas G. Duffield, 101 W. 12th St., Apt. 7S, New York, NY (US) 10011; Matthew Roughan, 15 Locust St., [illegible]; [illegible], Apt H6, Chatham, NJ (US) 07928; Oliver [remainder illegible]

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 776 days.

(21) Appl. No.: 10/764,001
(22) Filed: Jan. 23, 2004

(51) Int. Cl.: [illegible] (2006.01)
(52) U.S. Cl.: 370/230.1; 370/229; 370/232; [remainder illegible]
(58) Field of Classification Search: [class list illegible]
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

7,251,218 B2*    7/2007   Jorgensen          370/235
7,302,682 B2*   11/2007   Turkoglu           [illegible]
7,305,676 B1*   12/2007   Boll et al.        718/107
7,359,320 B2*    4/2008   Klaghofer et al.   370/230
733,943 B1*     10/2008   Ford               709/223
7,441,267 B1*   10/2008   Elliott            726/13

OTHER PUBLICATIONS

Dewes, et al., An Analysis of Internet Chat Traffic, Oct. 2003, Proceedings of ACM SIGCOMM Internet Measurement Conference.

* cited by examiner

Primary Examiner: Pankaj Kumar
Assistant Examiner: Mark Mais
(74) Attorney, Agent, or Firm: Henry Brendzel

(57) ABSTRACT

A signature-based traffic classification method maps traffic into preselected classes of service (CoS). By analyzing a known corpus of data that clearly belongs to identified ones of the preselected classes of service, in a training session the method develops statistics about a chosen set of traffic features. In an analysis session, relative to traffic of the network where QoS treatments are desired (target network), the method obtains statistical information relative to the same chosen set of features for values of one or more predetermined traffic attributes that are associated with connections that are analyzed in the analysis session, yielding a statistical features signature of each of the values of the one or more attributes. A classification process then establishes a mapping between values of the one or more predetermined traffic attributes and the preselected classes of service, leading to the establishment of QoS treatment rules.

1 Claim, 1 Drawing Sheet
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Splunk Inc. Exhibit 1021 Page 1
`
`
`
`
FIG. 1 (flow chart)

Block 10. TRAINING SESSION (ON TRAINING NETWORK): Obtain statistical information relative to selected features for each of a chosen set of classes. Result: statistical "features-class" mapping.

Block 20. ANALYSIS SESSION (ON TARGET NETWORK): Obtain statistical information relative to the same selected features, for values of one or more connection attributes. Result: statistical features signature of each value of the one or more attributes.

Block 30. Establish a classification: mapping each of the values of the one or more attributes having a features signature into a class.

Block 40. Assign packets arriving at the target network to a class based on the established classification. Apply QoS based on the assigned class.
`
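The flow of FIG. 1 can be pictured with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the two features (mean packet size, flow duration), the sample values, the QoS labels, and above all the nearest-mean comparison used for block 30, since the flow chart does not prescribe a particular classifier. A practical version would also normalise the features so that no single one dominates the distance.

```python
from math import dist
from statistics import mean


def mean_vector(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return tuple(mean(column) for column in zip(*vectors))


def train(labelled):
    """Block 10: per-class statistics from traffic whose class is known."""
    return {cls: mean_vector(vectors) for cls, vectors in labelled.items()}


def signatures(observations):
    """Block 20: per-attribute-value statistics gathered on the target network."""
    grouped = {}
    for attribute_value, features in observations:
        grouped.setdefault(attribute_value, []).append(features)
    return {value: mean_vector(vectors) for value, vectors in grouped.items()}


def build_rules(sigs, class_stats):
    """Block 30: map each attribute value to the class with the nearest statistics.

    Nearest-mean (Euclidean) matching is an assumption of this sketch.
    """
    return {
        value: min(class_stats, key=lambda cls: dist(sig, class_stats[cls]))
        for value, sig in sigs.items()
    }


def qos_for(attribute_value, rules, qos_by_class, default="best-effort"):
    """Block 40: find the class for a packet's attribute value, then its QoS."""
    return qos_by_class.get(rules.get(attribute_value), default)


# Hypothetical feature vectors: (mean packet size in bytes, flow duration in seconds).
class_stats = train({
    "interactive": [(80.0, 300.0), (100.0, 500.0)],
    "bulk": [(1400.0, 40.0), (1300.0, 60.0)],
})

# Hypothetical target-network connections, keyed by server port.
sigs = signatures([
    (23, (70.0, 350.0)),
    (23, (110.0, 450.0)),
    (20, (1450.0, 30.0)),
])

rules = build_rules(sigs, class_stats)
qos_by_class = {"interactive": "high", "bulk": "low"}  # illustrative labels only

print(rules)
print(qos_for(23, rules, qos_by_class))
```

Keeping the four blocks as separate functions mirrors the figure: the class statistics are computed once on the training network, while the signatures and rules can be rebuilt whenever the target network's traffic mix changes.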
`
`
`
`
STATISTICAL, SIGNATURE-BASED APPROACH TO IP TRAFFIC CLASSIFICATION

BACKGROUND OF THE INVENTION

This invention relates to traffic classification and, more particularly, to statistical classification of IP traffic.

The past few years have witnessed a dramatic increase in the number and variety of applications running over the Internet and over enterprise IP networks. The spectrum includes interactive (e.g., telnet, instant messaging, games, etc.), bulk data transfer (e.g., ftp, P2P file downloads), corporate (e.g., Lotus Notes, database transactions), and real-time applications (voice, video streaming, etc.), to name just a few.

Network operators, particularly in enterprise networks, desire the ability to support different levels of Quality of Service (QoS) for different types of applications. This desire is driven by (i) the inherently different QoS requirements of different types of applications, e.g., low end-end delay for interactive applications, high throughput for file transfer applications etc.; (ii) the different relative importance of different applications to the enterprise, e.g., Oracle database transactions are considered critical and therefore high priority, while traffic associated with browsing external web sites is generally less important; and (iii) the desire to optimize the usage of their existing network infrastructures under finite capacity and cost constraints, while ensuring good performance for important applications.

Various approaches have been studied, and mechanisms developed for providing different QoS in a network. See, for example, S. Blake, et al., RFC 2475: An architecture for differentiated service, December 1998, http://www.faqs.org/rfcs/rfc2475.html; and C. Gbaguidi, et al., A survey of differentiated services architectures for the Internet, March 1998, http://ssewww.epfl.ch/Pages/publications/ps_files/tr98__020.ps; and Y. Bernet, et al., A framework for differentiated services, Internet Draft (draft-ietf-diffserv-framework-02.txt), February 1999, http://search.ietf.org/internet-drafts/draft-ietf-diffserv-framework-02.txt.

Previous work also has examined the variation of flow characteristics according to applications. M. Allman, et al., TCP congestion control, IETF Network Working Group RFC 2581, 1999, investigated the joint distribution of flow duration and number of packets, and its variation with flow parameters such as inter-packet timeout. Differences were observed between the distributions of some application protocols, although overlap was clearly also present between some applications. Most notably, the distribution of DNS transactions had almost no overlap with that of other applications considered. However, the use of such distributions as a discriminator between different application types was not considered.

There also exists a wealth of research on characterizing and modeling workloads for particular applications, with A. Krishnamurthy, et al., Web Protocols and Practice, Chapter 10, Web Workload Characterization, Addison-Wesley, 2001; and J. E. Pitkow, Summary of WWW characterizations, W3J, 2:3-13, 1999 being but two examples of such research.

An early work in this space, reported in V. Paxson, "Empirically derived analytic models of wide-area TCP connections," IEEE/ACM Transactions on Networking, vol. 2, no. 4, pp. 316-336, 1994, examines the distributions of flow bytes and packets for a number of different applications.

Interflow and intraflow statistics are another possible dimension along which application types may be distinguished and research has been conducted. V. Paxson, et al., "Wide-area traffic: The failure of Poisson modeling," IEEE/ACM Transactions on Networking, vol. 3, pp. 226-244, June 1995, for example, found that user initiated events (such as telnet packets within flows or FTP-data connection arrivals) can be described well by a Poisson process, whereas other connection arrivals deviate considerably from Poisson.

Signature-based detection techniques have also been explored in the context of network security, attack and anomaly detection; e.g., P. Barford, et al., Characteristics of Network Traffic Flow Anomalies, Proceedings of ACM SIGCOMM Internet Measurement Workshop, October 2001; and P. Barford, et al., A Signal Analysis of Network Traffic Anomalies, Proceedings of ACM SIGCOMM Internet Measurement Workshop, November 2002, where one typically seeks to find a signature for an attack.

Actually, realization of a service differentiation capability requires (i) association of the traffic with the different applications, (ii) determination of the QoS to be provided to each, and finally, (iii) mechanisms in the underlying network for providing the QoS; i.e., for controlling the traffic to achieve a particular quality of service.

While some of the above-mentioned studies assume that one can identify the application traffic unambiguously and then obtain statistics for that application, none of them have considered the dual problem of inferring the application from the traffic statistics. This type of approach has been suggested in very limited contexts such as identifying chat traffic in C. Dewes, et al., An analysis of Internet chat systems, Proceedings of ACM SIGCOMM Internet Measurement Conference, October 2003.

Still, in spite of a clear perceived need, and the prior art work reported above, widespread adoption of QoS control of traffic has not come to pass. It is believed that the primary reason for the slow spread of QoS-use is the absence of suitable mapping techniques that can aid operators in classifying the network traffic mix among the different QoS classes. We refer to this as the Class of Service (CoS) mapping problem, and perceive that solving this would go a long way in making the use of QoS more accessible to operators.

SUMMARY

An advance in the art of providing specified QoS in an IP network is achieved with a signature-based traffic classification method that maps traffic into preselected classes of service (CoS). By analyzing, in a training session, a known corpus of data that clearly belongs to identified ones of the preselected classes of service, the method develops statistics about a chosen set of traffic features. In an analysis session, relative to traffic of the network where QoS treatments are desired (target network), the method obtains statistical information relative to the same chosen set of features for values of one or more predetermined traffic attributes that are associated with connections that are analyzed in the analysis session, yielding a statistical features signature of each of the values of the one or more attributes. A classification process then establishes a mapping between values of the one or more predetermined traffic attributes and the preselected classes of service, leading to the establishment of rules. Once the rules are established, traffic that is associated with particular values of the predetermined traffic attributes is mapped to classes of service, which leads to a designation of QoS.

Illustratively, the preselected classes of service may be interactive traffic, bulk data transfer traffic, streaming traffic and transactional traffic. The chosen set of traffic features may be packet-level features, flow-level features, connection-level features, intra-flow/connection features, and multi-flow features. The predetermined traffic attributes may be the server port, and the server IP address. An illustrative rule might state that "a connection that specifies port x belongs to the class of interactive traffic." An administrator of the target network may choose to give the highest QoS level to such traffic.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 presents a flow chart of the IP traffic classification method disclosed herein.

DETAILED DESCRIPTION
`
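As a concrete picture of the rule form quoted in the Summary ("a connection that specifies port x belongs to the class of interactive traffic"), the following Python sketch keys rules on the two attributes named there, server port and server IP address. The `Rule` type, the port and address values, and the numeric QoS levels are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    server_port: int
    server_ip: Optional[str]  # None means "any server address"
    cos: str                  # class of service assigned by the rule


# Hypothetical rule set; more specific rules are listed first so they win.
RULES = [
    Rule(22, "10.0.0.5", "bulk"),
    Rule(22, None, "interactive"),
    Rule(23, None, "interactive"),
    Rule(53, None, "transactional"),
]

# Hypothetical administrator policy: 1 is the highest QoS level.
QOS_LEVEL = {"interactive": 1, "transactional": 2, "bulk": 3}


def class_of(server_port: int, server_ip: str) -> Optional[str]:
    """Return the class of the first rule matching the connection, if any."""
    for rule in RULES:
        if rule.server_port == server_port and rule.server_ip in (None, server_ip):
            return rule.cos
    return None


print(class_of(23, "10.0.0.9"), QOS_LEVEL[class_of(23, "10.0.0.9")])
```

Connections that match no rule return `None`, which leaves the default treatment to the administrator rather than guessing a class.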
`
`
In accord with the principles disclosed herein QoS implementations are based on mapping of traffic into classes of service. In principle the division of traffic into CoS could be done by end-points of the network, where traffic actually originates, for instance by end-user applications. However, for reasons of trust and scalability of administration and management, it is typically more practical to perform the CoS mapping within the network; for instance, at the router that connects the Local Area Network (LAN) to the Wide Area Network (WAN). Alternatively, there might be appliances connected near the LAN to WAN transition point that can perform packet marking for QoS.

CoS mapping inside the network is a non-trivial task. Ideally, a network system administrator would possess precise information on the applications running inside the administrator's network, along with simple and unambiguous mappings, which information is based on easily obtained traffic measurements (e.g., by port numbers, or source and destination IP addresses). This information is vital not just for the implementation of CoS, but also in planning the capacity required for each class, and balancing tradeoffs between cost and performance that might occur in choosing class allocations. For instance, one might have an application whose inclusion in a higher priority class is desirable but not cost effective (based on traffic volumes and pricing), and so some difficult choices must be made. Good data is required for these to be informed choices.

In general, however, the required information is rarely up-to-date, or complete, if it is available at all. The traditional ad-hoc growth of IP networks, the continuing rapid proliferation of new applications, the merger of companies with different networks, and the relative ease with which almost any user can add a new application to the traffic mix with no centralized registration are all factors that contribute to this "knowledge gap". Furthermore, over recent years it has become harder to identify network applications within IP traffic. Traditional techniques such as port-based classification of applications, for example, have become much less accurate.

One approach that is commonly used for identifying applications on an IP network is to associate the observed traffic (using flow level data, or a packet sniffer) with an application based on TCP or UDP port numbers. Alas, this method is inadequate.

The TCP/UDP port numbers are divided into three ranges: the Well Known Ports (0-1023), the Registered Ports (1024-49,151), and the Dynamic and/or Private ports (49,152-65,535). A typical TCP connection starts with a SYN/SYN-ACK/ACK handshake from a client to a server. The client addresses its initial SYN packet to the well-known server port of a particular application. The client typically chooses the source port number of the packet dynamically. UDP uses ports similarly to TCP, though without connection semantics.

All future packets of a session, in either a TCP or UDP session, use the same pair of ports to identify the client and server side of the session. Therefore, in principle, the TCP or UDP server port number can be used to identify the higher layer application by simply identifying in an incoming packet the server port and mapping this port to an application using the IANA (Internet Assigned Numbers Authority) list of registered ports (http://www.iana.org/assignments/port-numbers). However, port-based application classification has limitations. First, the mapping from ports to applications is not always well defined. For instance:

  Many implementations of TCP use client ports in the registered port range. This might mistakenly classify the connection as belonging to the application associated with this port. Similarly, some applications (e.g., old bind versions), use port numbers from the well-known ports to identify the client site of a session.

  Ports are not defined with IANA for all applications, e.g., P2P applications such as Napster and Kazaa.

  An application may use ports other than its well-known ports to circumvent operating system access control restrictions. E.g., non-privileged users often run WWW servers on ports other than port 80, which is restricted to privileged users on most operating systems.

  There are some ambiguities in the port registrations, e.g., port 888 is used for CDDBP (CD Database Protocol) and access-builder.

  In some cases server ports are dynamically allocated as needed. For example, FTP allows the dynamic negotiation of the server port used for the data transfer. This server port is negotiated on an initial TCP connection, which is established using the well-known FTP control port.

The use of traffic control techniques like firewalls to block unauthorized, and/or unknown applications from using a network has spawned many work-arounds which make port based application authentication harder. For example, port 80 is being used by a variety of non-web applications to circumvent firewalls which do not filter port-80 traffic. In fact, available implementations of IP over HTTP allow the tunneling of all applications through TCP port 80.

Trojans and other security attacks generate a large volume of bogus traffic which should not be associated with the applications of the port numbers those attacks use.

A second limitation of port-number based classification is that a port can be used by a single application to transmit traffic with different QoS requirements. For example, (i) Lotus Notes transmits both email and database transaction traffic over the same ports, (ii) scp (secure copy), a file transfer protocol, runs over ssh (secure shell), an interactive application using default TCP port 22. This use of the same port for traffic requiring different QoS requirements is quite legitimate, and yet a good classification must separate different use cases for the same application. A clean QoS implementation is still possible through augmenting the classification rules to include IP address-based disambiguation. Server lists exist in some networks but, again, in practice these lists are often incomplete, or a single server could be used to support a variety of different types of traffic, so we must combine port and IP address rules.

A possible alternative to port based classification is to use a painstaking process involving installation of packet sniffers and parsing packets for application-level information to identify the application class of each individual TCP connection or UDP session. However, this approach cannot be used with more easily collected flow level data, and its collection is computationally expensive, limiting its application to lower
bandwidth links. Also this approach requires precise prior knowledge of applications and their packet formats, something that may not always be possible. Furthermore, the introduction of payload encryption is increasingly limiting our ability to see inside packets for this type of information.

For the above reasons, a different approach is needed.

In accord with the principles disclosed herein CoS mapping is achieved using a statistical method. Advantageously, the disclosed method performs CoS mapping based on a simply and easily determined attribute, or attributes, of the traffic. Specifically, the disclosed method assigns traffic to classes based on a selected attribute or attributes, using a mapping derived from a statistical analysis that forms a signature for traffic having particular values for those attributes.

Thus, in accord with the principles disclosed herein, a three-stage process is undertaken, as depicted in FIG. 1; to wit,

1. statistics collection (blocks 10 and 20),
2. classification and rule creation (block 30), and
3. application of rules to active traffic (block 40).

Block 10 obtains statistical information, in a training session, relative to selected features for each of a chosen set of classes by using training data that includes collections of traffic, where each collection clearly belongs to one of the chosen classes, and there is found a collection for each of the chosen set of classes. This may be termed statistical "features-class" mapping.

Specifically, first the classes of traffic are selected/identified to which administrators of networks may wish to apply different QoS treatment, and traffic from a network having a well-established set of applications that belong to the identified classes (training network) is employed to obtain a set of statistics for a chosen set of features. The notion here is that if it is concluded, from the data of the training network, that feature A of class x applications is characterized by a narrow range in the neighborhood of value Y, then, at a later time, if one encounters traffic in a target network where feature A has the value Y one may be able to conclude with a high level of confidence that the traffic belongs to class x.

With respect to class definitions, it makes sense to limit the set of selected classes to those that corporate network administrators might wish to employ for service differentiation. It is noted that today's corporate networks carry four broad application classes, which are described below, but it should be understood that additional, or other, classes can be selected. The four application classes are:

  Interactive: The interactive class contains traffic that is required by a user to perform multiple real-time interactions with a remote system. This class includes such applications as remote login sessions or an interactive Web interface.

  Bulk data transfer: The bulk data transfer class contains traffic that is required to transfer large data volumes over the network without any real-time constraints. This class includes applications such as FTP, software updates, and music or video downloads.

  Streaming: The streaming class contains multimedia traffic with real-time constraints. This class includes such applications as streaming and video conferencing.

  Transactional: The transactional class contains traffic that [...]

[...] features that ought to be chosen should be ones that characterize and disambiguate the classes. To break this circular dependency, in accord with the principles disclosed herein one or more specific "reference" applications are selected for each class that, based on their typical use, have a low likelihood of being contaminated by traffic belonging to another class. To select those applications, it makes sense to select applications that:

  are clearly within one class (to avoid mixing the statistics from two classes);

  are widely used, so as to assure we get a good data-set;

  have server ports in the well-known port range to reduce the chance of mis-usage of these ports.

In a representative embodiment of the disclosed method, the reference applications selected for each application class are:

  Interactive: Telnet,
  Bulk data: FTP-data, Kazaa,
  Streaming: RealMedia streaming,
  Transactional: DNS, HTTPS.

As indicated above, the statistical information that is gathered for each class pertains to the chosen set of features. As for the features that one might consider, it is realized that the list of possible features is very large, and that the actual selection is left to the practitioner. However, it is beneficial to note that one can broadly classify those features into categories:

1. Simple packet-level features such as packet size and various moments thereof, such as variance, RMS (root mean square) size etc., are simple to compute, and can be gleaned directly from packet-level information. One advantage of such features is that they offer a characterization of the application that is independent of the notion of flows, connections or other higher-level aggregations. Another advantage of such features is that packet-level sampling is widely used in network data collection and has little impact on these statistics.

Another set of statistics that can be derived from simple packet data are time series, from which one can derive a number of statistics; for instance, statistics relating to correlations over time (e.g., parameters of long-range dependence such as the Hurst parameter). An example of this type of classification can be seen in Z. Liu, et al., Profile-based traffic characterization of commercial web sites, Proceedings of the 18th International Teletraffic Congress (ITC-18), volume 5a, pages 231-240, Berlin, Germany, 2003, where the authors use time-of-day traffic profiles to categorize web sites.

2. Flow-level statistics are summary statistics at the grain of network flows. A flow is defined to be a unidirectional sequence of packets that have some field values in common, typically, the 5-tuple (source IP, destination IP, source port, destination port, IP Protocol type). Example flow-level features include flow duration, data volume, number of packets, variance of these metrics etc. There are some more complex forms of information one can also glean from flows (or packet data) statistics; for instance, one may look at the proportion of internal versus external traffic within a category; external traffic (traffic to the Internet) may have a lower priority within a corporate setting. These statistics can be obtained using flow-level data collected at routers using, e.g.,