`
`(12) United States Patent
`Olesinski et al.
`
(10) Patent No.: US 7,782,793 B2
(45) Date of Patent: Aug. 24, 2010
`
`(54) STATISTICAL TRACE-BASED METHODS
`FOR REAL-TIME TRAFFIC
`CLASSIFICATION
`
`(75) Inventors: Wladyslaw Olesinski, Kanata (CA);
`Peter Rabinovitch, Kanata (CA)
`(73) Assignee: Alcatel Lucent, Paris (FR)
(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 899 days.
`(21) Appl. No.: 11/226,328
`
(22) Filed: Sep. 15, 2005
`
(65) Prior Publication Data
US 2007/0076606 A1    Apr. 5, 2007

(51) Int. Cl.
H04L 2/26 (2006.01)
H04L 2/56 (2006.01)
H04L 2/28 (2006.01)
`(52) U.S. Cl. ................... 370/253; 370/391; 370/395.43
`(58) Field of Classification Search .................. 370/252
`See application file for complete search history.
`
(56) References Cited

U.S. PATENT DOCUMENTS

6,873,600 B1 *     3/2005   Duffield et al. ......... 370/252
7,080,136 B2 *     7/2006   Duffield et al. ......... 709/223
7,286,535 B2 *    10/2007   Ishikawa et al. ......... 370/392
7,313,100 B1 *    12/2007   Turner et al. ........... 370/253
7,376,085 B2 *     5/2008   Yazaki et al. ........... 370/235
7,376,731 B2 *     5/2008   Khan et al. ............. 709/224
2003/0012197 A1    1/2003   Yazaki et al. ........... 370/392
2007/0214504 A1 *  9/2007   Comparetti et al. ....... 726/23

FOREIGN PATENT DOCUMENTS

WO    WO 96/38955    12/1996

OTHER PUBLICATIONS

Sun et al., "Statistical Identification of Encrypted Web Browsing Traffic", Microsoft Research for Proc. IEEE Symposium on Security and Privacy, IEEE, May 2002.*
Zhang et al., "Detecting Backdoors", Proceedings of the 9th USENIX Security Symposium, Denver, Colorado, Aug. 2000, pp. 1-11.
M. Roughan, S. Sen, O. Spatscheck, N. Duffield, "Class-of-service mapping for QoS: a statistical signature-based approach to IP traffic classification", 2004, pp. 135-148.

(Continued)

Primary Examiner: Daniel J. Ryman
Assistant Examiner: Cassandra Decker
(74) Attorney, Agent, or Firm: Kramer & Amado P.C.

(57) ABSTRACT

Apparatus and methods for real-time traffic classification based on off-line determined traffic classification rules are provided. Traces of real traffic are obtained and subjected to statistical analysis. The statistical analysis identifies the multidimensional domain space of characteristic traffic parameters. Classification rules associated with the identified domains are derived and provided to traffic classification points for real-time traffic classification. Traffic classification points, typically edge network nodes, sample packets in aggregate streams with a predetermined probability. Statistical information regarding the sampled flows is tracked in a table, the number of times a flow was sampled providing a probabilistic measure of the flow's duration before the flow terminates. The table entries, which predominantly track high bandwidth flows, are subjected to the classification rules for real-time classification of the sampled flows. Optionally, rules include an action to be taken in respect of flows having characteristics matching thereof. Advantages are derived from low overhead on-line real-time classification of high bandwidth flows before flow termination.

32 Claims, 2 Drawing Sheets

[Representative drawing. Recoverable labels: Notifications; Flow info. Requests; Flow Information; Outgoing Controlled Sampled Flow; Classification Rule Distribution; Incoming Sampled Flow; Packet Sampling Module; Packet Inspection Module; Statistical Information Tracking Module; Rules; flow table entries 102 with columns for flow ID, number of sampled packets, cumulative amount sampled and last sampling time.]

`Splunk Inc. Exhibit 1045 Page 1
`
`
`
`OTHER PUBLICATIONS
Konstantinos Psounis, Arpita Ghosh, and Balaji Prabhakar: "SIFT: A Low-complexity Scheduler for Reducing Flow Delays in the Internet", 2004, pp. 1-13.
Duffield N. et al.: "Estimating Flow Distributions From Sampled Flow Statistics", vol. 33, No. 4, October 2003, pp. 325-336.
Karagiannis, T., et al., Transport Layer Identification of P2P Traffic, ACM, 2004.
Prabhakar, B., Network Processor Algorithms: Design and Analysis, Stochastic Networks Conference, 2004.
Roughan, M., et al., Class-of-Service Mapping for QoS: A Statistical Signature-Based Approach to IP Traffic Classification, ACM, 2004.
Sen, S., et al., Accurate, Scalable, In-Network Identification of P2P Traffic Using Application Signatures, ACM, 2004.
`
`* cited by examiner
`
`
`
`
U.S. Patent    Aug. 24, 2010    Sheet 1 of 2    US 7,782,793 B2

[FIG. 1. Recoverable labels: Data Communications Network; Aggregate Flow; Incoming Sampled Flow; Packet Sampling Module; Packet Inspection Module 152; Statistical Information Tracking Module 154; flow table entries 102; Classifier 250; Rules; Traffic Control Point; 252; 254; Notifications; Flow info. Requests; Flow Information; Outgoing Controlled Sampled Flow; Classification Rule Distribution.]

U.S. Patent    Aug. 24, 2010    Sheet 2 of 2    US 7,782,793 B2
`STATISTICAL TRACE-BASED METHODS
`FOR REAL-TIME TRAFFIC
`CLASSIFICATION
`
`FIELD OF THE INVENTION
`
The invention relates to content delivery at the edge of communications networks, and in particular to methods and apparatus providing real-time trace-based traffic classification.
`
`
Traffic classification is important for many reasons in delivering content to customers at the edge of communications networks. For example, Quality of Service (QoS) requires the traffic to be segregated first in order to assign packets to particular Classes of Service (CoS). A network operator can provide a different level of service to each class as well as a pricing structure.

Knowledge of traffic characteristics can help optimize the usage of the communications network infrastructure employed, and can help ensure a desired level of performance for applications/services important to the customers. The intention has always been that application requirements be considered in offering a level of service. Traditional methods of traffic detection and classification rely on monitoring logical port specifications typically carried in packet headers as, in the past, applications and/or services were, in a sense, assigned well known logical ports.

A large percentage of the traffic conveyed by communications networks today consists of peer-to-peer (P2P) traffic. Because peer-to-peer traffic is conveyed between pairs of customer network nodes, it is not necessary that a well known logical port be allocated, reserved, and assigned to traffic generated by applications generating peer-to-peer traffic and/or applications retrieving peer-to-peer content. Therefore known approaches to traffic classification are no longer valid, as logical ports are undefined for peer-to-peer applications and/or logical ports may be dynamically allocated as needed, such as in the case of the standard File Transfer Protocol (FTP) and others.

Peer-to-peer content exchange techniques are increasingly being used to convey, without permission, content subject to intellectual property protection, such as music and movies. Network operators are under increasing regulatory pressure to detect peer-to-peer traffic and to control illicit peer-to-peer traffic, while rogue users are seeking ways to defy traffic classification to avoid detection.

Besides peer-to-peer traffic detection, means and methods are being sought on a continual basis for detecting short duration traffic flows to help identify possible intrusions such as, but not limited to, Denial of Service (DoS) attacks.

Statistical billing is another domain in which knowledge of traffic characteristics is necessary. Network operators increasingly employ resource utilization measurements as a component in determining customer charges.

Returning to peer-to-peer traffic detection, not all peer-to-peer traffic is illicit: in view of the high levels of resource utilization demanded by peer-to-peer traffic, network operators may want to charge customers generating peer-to-peer traffic and retrieving peer-to-peer content more for their high bandwidth usage. Resource utilization alone is not always an adequate traffic characteristic differentiator, as in many instances content conveyed to, and received from, multiple customers is aggregated at the managed edge and within the managed transport communications network.
`
Attempts to characterize traffic, to detect traffic types, with a view to classifying traffic, include Deep Packet Inspection (DPI) techniques. Deep packet inspection techniques are described by Sen S., Spatscheck O. and Wang D. in "Accurate, Scalable In-Network Identification of P2P Traffic Using Application Signatures", Proceedings of the 13th international conference on World Wide Web, New York, N.Y., 2004; and by Karagiannis T., Broido A., Faloutsos M. and Claffy K. in "Transport layer identification of P2P traffic", Proceedings of the 4th ACM SIGCOMM Conference on Internet Measurement, Taormina, Sicily, Italy, 2004.

Proposed deep packet inspection techniques, as the name suggests, assume the availability of unlimited resources to inspect entire packets to perform the packet characterization. Therefore deep packet inspection incurs high processing overheads and is subject to high costs. Deep packet inspection also suffers from a complexity associated with the requirement of inspecting packet payloads at high line rates. For certainty, deep packet inspection is not suited at all for typical high throughput communications network nodes deployed in current communications networks. Deep packet inspection also suffers from a high maintenance overhead as the detection techniques rely on signatures. Peer-to-peer applications, especially, are known for concealing their identities: a deep packet inspection detection signature that provides conclusive detection now may not work in the future, and another conclusive signature would have to be found and coded therein.

Traffic classification means and methods are being actively sought by network operators in order to determine the types of traffic present in a managed communications network for traffic and network engineering purposes, on-line marking of packets, quality of service assessment/assurance, billing, etc. In view of impending regulatory pressures, efficient detection and classification of peer-to-peer traffic is especially desired, as peer-to-peer traffic consumes large, disproportionate percentages of bandwidth and other communication network resources. Network operators have to employ a combination of: peer-to-peer traffic control in order to reserve network resources for other types of traffic, charging peer-to-peer users different rates to curb behavior, and/or even blocking peer-to-peer traffic completely in accordance with regulations imposed on network operators. There is therefore a need to solve the above mentioned issues to provide traffic classification means and methods which avoid the complexities of deep packet inspection and the pitfalls of logical port based packet classification.
`
`SUMMARY OF THE INVENTION
`
In accordance with an aspect of the invention, a packet flow classification apparatus for on-line real-time traffic flow classification at a communications network node is provided. Packet sampling means randomly selects packets from an aggregate flow with a pre-determined sampling probability. Packet inspection means determines the packet size of each sampled packet and obtains the flow identification of the sampled traffic flow with which the sampled packet is associated. A sampled flow information table has sampled flow table entries for storing real-time sampled flow statistical information. Flow information tracking means maintain the flow information table in real-time. And, a packet classifier classifies sampled traffic flows on-line in real-time based on a group of classification rules trained off-line on statistical trace traffic flow information.

In accordance with another aspect of the invention, a packet flow classification system for on-line real-time traffic flow classification of a plurality of traffic flows conveyed through a network node of a communications network is provided. The packet flow classification system includes: a group of classification rules trained off-line on statistical trace traffic flow information, at least one traffic flow monitor at the communications network node, and a packet classifier for classifying sampled traffic flows on-line in real-time based on the group of off-line trained classification rules. The traffic flow monitor includes: packet sampling means for randomly selecting packets from an aggregate flow with a pre-determined sampling probability, packet inspection means for determining the packet size of each sampled packet and for obtaining flow identification of the sampled traffic flow with which the sampled packet is associated, a sampled flow information table having sampled flow table entries for storing real-time sampled flow statistical information, and flow information tracking means for maintaining the flow information table in real-time.

In accordance with yet another aspect of the invention, a method of classifying traffic flows on-line in real-time based on at least one classification rule trained off-line on statistical traffic flow information is provided. Packets are randomly sampled from an aggregate flow with a predetermined sampling probability. The packet size of each sampled packet is extracted. A flow identifier of the sampled traffic flow with which the sampled packet is associated is obtained. Information regarding the sampled flow is tracked in a sampled flow information table entry. And, the sampled traffic flow is classified in real-time by subjecting the information tracked in the flow table entry to the at least one classification rule.

Advantages are derived from simple, inexpensive, low overhead on-line real-time classification of high-bandwidth flows before flow termination.
`
BRIEF DESCRIPTION OF THE DRAWINGS

The features and advantages of the invention will become more apparent from the following detailed description of the exemplary embodiment(s) with reference to the attached diagrams wherein:

FIG. 1 is a schematic diagram showing an edge communications network element sampling traffic sporadically and tracking statistical flow information in a table in accordance with the exemplary embodiment of the invention;

FIG. 2 is a flow diagram showing process steps in sampling traffic sporadically and tracking statistical flow information in a table in accordance with the exemplary embodiment of the invention;

FIG. 3 is a flow diagram showing process steps of a clean-up process ensuring that the statistical information in a flow information table is current, in accordance with the exemplary embodiment of the invention; and

FIG. 4 is a flow diagram showing process steps of an on-line real-time classification process classifying high bandwidth sampled flows in accordance with the exemplary embodiment of the invention.

It will be noted that in the attached diagrams like features bear similar labels.

DETAILED DESCRIPTION OF THE EMBODIMENTS

The realities of Internet service provisioning to customers are such that service level agreements are described in terms of expected aggregate traffic characteristics with the assumption that most of the user traffic is highly bursty and relatively low bandwidth, such as the occasional email, intermittent web page download followed by a reading period, and the infrequent electronic bank transaction. Although the equipment is prevalent, video conferencing is relatively rare. Service level agreements include enough long-term transport bandwidth for comparatively higher bandwidth netradio audio streaming. Customers are assumed to be nice and occasional transgressions rarely translate into higher bills at the end of the month. It is assumed that nice customers do not listen to netradio, nor download MP3s from traceable and reputable sources, 24/7. At the same time, in view of the intense competition in communications, the available transport bandwidth in the core of the managed communications network is oversold.

Problems arise when customers engage in illicit/regulated activities such as exchanging large amounts of content subject to intellectual property protection. Such rogue customers do not want to pay higher fees for levels of service which would provide them with increased bandwidth, in order not to attract attention to themselves. Sophisticated rogue customers are willing to put up with sending and/or receiving content at transfer rates well below aggregate service level agreement limits for long periods of time. Network operators are faced with the conundrum that: rogue customers do not violate their service level agreements; because the rogue customers have not signed up for a higher level of service, the resources are overused; and because the bandwidth is oversold, services provisioned to nice customers are being impacted. Therefore, given that input traffic from multiple customers and traffic output to multiple customers is typically aggregated at the edge, aggregate traffic metrics only point out that network resources are utilized to a very high degree and that the average and the typical customer is nice.

With the prevalence of peer-to-peer activity, it does not make business sense for network operators to deny services to any user engaging in peer-to-peer file sharing; after all, some peer-to-peer traffic such as a video conference is legitimate. What is desired is to identify regulated peer-to-peer traffic in order to reduce the allocation of network resources thereto, thereby making the network unfriendly only to questionable/undesirable peer-to-peer traffic.

Current traffic characterization techniques, in order to be effective, employ a determination of particular traffic flows and their duration to determine the amount of resources expended. Because of the willingness of rogue users to make peer-to-peer traffic compliant over the short term with service level agreements subscribed to, at the cost of long duration uploads/downloads, knowing the duration of a traffic flow has been found to be very important in characterizing traffic. The trouble is that, using current techniques, the duration of a traffic flow can only be found after the traffic flow ends, when it is too late to effect any control over the traffic flow. For this very reason, current deep packet inspection techniques, which are designed to detect the initiation and the termination of a traffic flow, are inadequate as a trigger to real-time peer-to-peer traffic control, as the traffic classification is provided after the termination of the monitored activity.

Having identified the problem, real-time traffic identification is needed to provide a measure of how the available network resources are utilized by, and partitioned to, rogue customers in real-time. From a business point of view, any solution has to assume that customers are nice. This is necessary both from the customer relations point of view and because assuming the converse would require a prohibitive amount of resources to be devoted to traffic monitoring.

In "Class-of-Service Mapping for QoS: A statistical signature-based approach to IP traffic classification", ACM SIGCOMM Internet Measurement Workshop, Taormina, Sicily, Italy, 2004, which is incorporated herein by reference, Roughan M., Sen S., Spatscheck O. and Duffield N. address the fact that although various mechanisms exist for providing QoS, QoS has yet to be widely deployed. Roughan et al. are of the opinion that employing previously known techniques to map traffic to QoS classes is prohibitive due to the high overheads incurred. Falling short of providing real-time traffic classification, Roughan et al. predicate their off-line traffic classification solution on the fact that it would be unrealistic for effective traffic classification to inspect every packet, and propose an off-line trace-based statistical method of traffic characterization. Roughan et al. confirm that flow durations are essential in characterizing and distinguishing between different types of traffic. However, just like deep packet inspection techniques, the trace-based methods proposed by Roughan et al. only obtain the flow duration after each flow terminates and are therefore inadequate for real-time traffic characterization.
Other relevant research in the art of packet queuing includes a proposal by Psounis K., Ghosh A. and Prabhakar B. entitled "SIFT: A low-complexity scheduler for reducing flow delays in the Internet", Technical report, CENG-2004-01, USC, which is incorporated herein by reference, and describes an algorithm for identifying high bandwidth flows in order to queue packets of identified high bandwidth flows in a special queue. The SIFT proposal includes sampling packets with a pre-selected low probability. With the assumption that most low bandwidth flows consist of few packets, very few low bandwidth flows would be sampled. Conversely, high bandwidth flows would be sampled with greater certainty. The flow identifier of each sampled flow is provided to a packet classifier/queue manager which queues every subsequent packet bearing one of the provided flow identifiers into the special queue. The proposed use of the special queue(s) would be prohibitive in terms of the necessary resources for high throughput deployments.
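
The bias of low-probability sampling toward flows with many packets can be illustrated numerically. The following is a minimal sketch, assuming that every packet is sampled independently with the same probability p; the function name and the example values are illustrative assumptions and are not taken from the SIFT report.

```python
def prob_flow_sampled(num_packets: int, p: float) -> float:
    """Probability that at least one packet of a flow is sampled.

    Assumes each packet is sampled independently with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return 1.0 - (1.0 - p) ** num_packets


if __name__ == "__main__":
    p = 0.01  # hypothetical sampling probability
    for n in (5, 100, 10_000):
        # Short flows are rarely seen; long flows are almost always seen.
        print(n, round(prob_flow_sampled(n, p), 4))
```

Under this assumption a five-packet flow is sampled less than 5% of the time at p = 0.01, whereas a flow of ten thousand packets is sampled almost surely.
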
In accordance with the exemplary embodiment of the invention, information is gathered about real typical traffic patterns for statistically relevant periods of time. The outcome of the information gathering step is a trace of traffic (packet headers and, possibly, portions of payloads) and perhaps a collection of gathered and/or derived statistics. The trace traffic information is gathered with the intent of subjecting it to off-line traffic characterization training.

Because the proposed traffic characterization is performed off-line, information about known and/or determinable traffic flow types may also be used as inputs to a rule creation process referred to as off-line training. Using the trace information, different applications generating/consuming the traffic are identified. Diverse off-line methods can be used for this purpose, for example, the methods described in the above referenced prior art by Sen et al. and Karagiannis et al., and other deep packet inspection methods to the extent that portions of payloads have been obtained. This step associates traffic flows with the applications.
Classes of traffic, or traffic types, are defined. For example, Roughan et al. propose the following exemplary traffic classes:
interactive, including: remote login sessions, interactive Web content access, etc.;
bulk data transfer, including: FTP, music downloads, peer-to-peer traffic, etc.;
streaming, including: video conferencing, netradio, etc.; and
transactional, including: distributed database access.
Traffic characterization means are employed off-line to characterize the traffic flows based on the gathered trace information and the associated statistics, to create off-line rules based on statistical properties of the gathered information and the application associativity. The outcome of the characterization step is a relation between traffic statistics (packet sizes, flow durations, etc.), application associativity and the traffic classes the traffic flows belong to. In creating the rules, Roughan et al. propose subjecting the traffic flow statistical parameters to statistical analysis in order to define domains in the multi-dimensional space of statistical flow parameters. Subjecting traffic flow statistical parameters to statistical analysis is referred to herein as training. The domains correspond to statistically distinct traffic classes. Statistical analysis methods include, but are not limited to, Nearest Neighbour (NN), Linear Discriminant Analysis (LDA), etc. Given traffic classes, gathered statistics, trace information, and application associativity, a set of rules for classifying future data is trained.
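
As an illustration of how labelled trace statistics can partition the space of flow parameters, the following is a minimal nearest-neighbour sketch. It is a simplification: it classifies a flow directly from labelled examples rather than deriving explicit rules, and the two features, the training values and all function names are hypothetical assumptions chosen only for the example.

```python
import math

# Hypothetical labelled trace statistics: (average packet size in bytes,
# flow duration in seconds) -> traffic class. Values are illustrative only.
TRAINING_FLOWS = [
    ((120.0, 300.0), "interactive"),
    ((90.0, 600.0), "interactive"),
    ((1400.0, 900.0), "bulk"),
    ((1300.0, 1200.0), "bulk"),
    ((700.0, 1800.0), "streaming"),
]


def feature_ranges(samples):
    """Return the (min, max) of every feature over the training samples."""
    dims = list(zip(*(features for features, _ in samples)))
    return [(min(d), max(d)) for d in dims]


def scale(features, ranges):
    """Normalise each feature to [0, 1] so that no dimension dominates."""
    return tuple((x - lo) / (hi - lo) for x, (lo, hi) in zip(features, ranges))


def nearest_neighbour(features, samples):
    """Return the class of the training flow closest to `features`."""
    ranges = feature_ranges(samples)
    query = scale(features, ranges)
    best = min(samples, key=lambda s: math.dist(query, scale(s[0], ranges)))
    return best[1]
```

For instance, `nearest_neighbour((100.0, 400.0), TRAINING_FLOWS)` falls in the region dominated by the interactive examples. The sketch assumes every feature takes at least two distinct values in the training set, otherwise the normalisation would divide by zero.
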
For example, assuming that only two statistics, average packet size and session duration, are needed to characterize traffic, a rule can indicate that if the average duration of a traffic flow is greater than 40 seconds, and the average packet size of that traffic flow does not exceed 300 bytes, then the traffic flow most likely belongs to the interactive class (e.g., a Telnet session). As another example, peer-to-peer flows, based on the number of packets conveyed, statistically fall into bulk data transfer flows and streaming flows; however, average packet sizes would characterize peer-to-peer flows as bulk data transfer flows regardless of the application or logical port used to convey the content. Besides flow durations, traffic flows can be characterized based on statistical traffic flow parameters such as: average/median packet size, packet size variance, root-mean-square packet size, largest packet sampled so far, shortest packet sampled so far, average/median inter-packet arrival delay, inter-packet arrival delay variance, bytes per flow, packets per flow, etc.
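
The two-statistic rule in the example above can be written as a simple threshold test. The following is a minimal sketch; the `Rule` class, its field names and the `classify` helper are hypothetical, and only the thresholds (40 seconds, 300 bytes) and the interactive class come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rule:
    """A threshold rule over two flow statistics (illustrative only)."""
    min_duration_s: float
    max_avg_packet_bytes: float
    traffic_class: str

    def matches(self, duration_s: float, avg_packet_bytes: float) -> bool:
        return (duration_s > self.min_duration_s
                and avg_packet_bytes <= self.max_avg_packet_bytes)


# Rule from the text: longer than 40 seconds with an average packet size
# of at most 300 bytes suggests the interactive class.
INTERACTIVE_RULE = Rule(40.0, 300.0, "interactive")


def classify(duration_s: float, avg_packet_bytes: float,
             rules: Iterable[Rule]) -> Optional[str]:
    """Return the class of the first matching rule, or None."""
    for rule in rules:
        if rule.matches(duration_s, avg_packet_bytes):
            return rule.traffic_class
    return None
```

A flow lasting 60 seconds with 120-byte average packets matches the rule, while a 60-second flow with 1200-byte average packets matches nothing and returns `None`.
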
`In accordance with an exemplary implementation of the
`exemplary embodiment of the invention, classification rules
`pertaining to expected/uninteresting traffic patterns may be
`deleted from the off-line determined set of classification
`rules. Reducing the number of classification rules provides
`desirable overhead reductions.
`Proposed real-time methods include real-time statistics
`collection and real-time traffic classification.
In accordance with the exemplary embodiment of the invention, random packet sampling techniques are employed on a monitored aggregated flow irrespective of the individual aggregated flows contained therein for the purpose of estimating flow durations of individual sampled flows. Random packet sampling techniques conform to the desired assumption that initially all customers' traffic flows are nice.
In accordance with the exemplary embodiment of the invention, network elements shown in FIG. 1 such as, but not limited to, edge switching equipment and routing equipment, include packet selection means 150, packet inspection means 152, statistical information tracking means 154, and a flow information table 100 having sampled flow table entries 102. Each sampled flow entry 102 exemplarily includes fields for the flow identifier 104, the number of packets sampled 106, the cumulative amount of content sampled 108, and the sampling time of the last packet 110. All information necessary to populate and update the fields 104, 106, 108 and 110 can either be extracted by the packet inspection means 152 or derived by the statistical information tracking means 154 from information extracted from sampled packet headers. For certainty, dependent on the implementation, the packet selection means 150, the packet inspection means 152, the statistical information tracking means 154, and the flow information table 100, without limiting the invention, can be associated with a physical port, a logical port, a group of physical/logical ports, all ports, etc. Also, depending on the implementation, the proposed traffic monitoring may only be performed on best-effort traffic and/or available bit rate traffic conveyed by the implementing equipment.
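
The layout of a sampled flow table entry 102 can be sketched as a small record type. This is an assumption-laden illustration: the text does not fix the format of the flow identifier, so a conventional 5-tuple is assumed, and the type and field names are hypothetical labels for fields 104 to 110.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Assumed flow identifier format (not specified in the text):
# (source address, destination address, source port, destination port,
# protocol number).
FlowId = Tuple[str, str, int, int, int]


@dataclass
class FlowEntry:
    """One sampled flow table entry 102."""
    flow_id: FlowId            # field 104: flow identifier
    packets_sampled: int       # field 106: number of packets sampled
    bytes_sampled: int         # field 108: cumulative amount sampled
    last_sample_time: float    # field 110: time of last sample, seconds


# Flow information table 100, keyed by flow identifier.
FlowTable = Dict[FlowId, FlowEntry]
```

Keying the table by flow identifier keeps the existence check and the update of an entry to a single lookup per sampled packet.
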
In accordance with the exemplary embodiment of the invention, a real-time traffic monitoring process 200 shown in FIG. 2 maintains the flow information table 100. Based on a preset sampling probability, the traffic monitoring process 200 determines 202 whether to sample the next packet upon arrival. If the next packet is to be sampled, then the process 200 waits for the next packet to arrive, and inspects 204 the received packet to extract information such as, but not limited to, flow identification and packet size. The packet inspection 204 is typically limited to reading at most the packet header; application layer connection information hidden in packet payloads is typically not searched for. Existing packet inspection means otherwise used in packet processing may be reused.

If a flow table entry 102 for the identified flow does not exist 206 in the flow information table 100, then a flow table entry 102 is created 208 and the flow table entry is initialized 210. Initializing 210 the flow table entry 102 includes, but is not limited to, filling in the flow identifier field 104, setting the number of sampled packets 106 to 1, setting the cumulative amount of content sampled 108 to the size of the sampled packet, and writing the sampling time in field 110.

If a flow table entry 102 exists for the identified flow, then the fields of the table entry 102 are updated 212 by: incrementing the number of sampled packets 106 by 1, adding the size of the sampled packet to the cumulative amount of content sampled 108, and overwriting field 110 with the sampling time.
`Having created 208 or updated 212 the flow table entry
`102, the monitoring process 200 resumes from step 202. The
`resumption may include the issuance 220 of a notification that
`the particular flow entry 102 has been updated.
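
The monitoring steps described above can be sketched as a single function called once per arriving packet. This is a minimal illustration under stated assumptions: the sampling decision is drawn when the packet arrives rather than beforehand, which is statistically equivalent; the table is a plain dictionary keyed by flow identifier; and the names `FlowEntry` and `monitor_packet` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class FlowEntry:
    packets_sampled: int      # field 106
    bytes_sampled: int        # field 108
    last_sample_time: float   # field 110


def monitor_packet(table, flow_id, size, now, p, rng=random):
    """Process one arriving packet; return its entry if it was sampled.

    `p` is the preset sampling probability and `rng` any object with a
    random() method returning a float in [0, 1).
    """
    if rng.random() >= p:                  # step 202: packet not selected
        return None
    entry = table.get(flow_id)             # step 206: does an entry exist?
    if entry is None:
        entry = FlowEntry(1, size, now)    # steps 208 and 210
        table[flow_id] = entry
    else:                                  # step 212: update the fields
        entry.packets_sampled += 1
        entry.bytes_sampled += size
        entry.last_sample_time = now
    return entry                           # caller may issue notification 220
```

Returning the touched entry lets the caller decide whether to issue a notification, so the sketch does not presume how notifications are delivered.
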
In accordance with the exemplary embodiment of the invention, the number of times a traffic flow is sampled, regardless of the number of actual packets associated with the corresponding traffic flow being conveyed, is representative of the duration of the traffic flow so far. In view of the off-line training, the number of packets sampled so far, perhaps together with the other statistical values tracked in the flow table entry 102, is representative of the actual duration of the traffic flow. Therefore the duration of a sampled flow can be forecast, within statistical certainty, before the traffic flow terminates when the field values in the corresponding flow table entry match a rule.
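
One way to see why the sample count tracks duration is the following sketch, which rests on assumptions not stated in the text: packets are sampled independently with probability p, and a flow conveys packets at an average rate r over an elapsed time T, so that it has conveyed N = rT packets. The symbols n, N, p, r and T are introduced here for illustration only.

```latex
\[
  n \sim \mathrm{Binomial}(N, p), \qquad
  \mathbb{E}[n] = pN = p\,r\,T, \qquad
  \hat{N} = \frac{n}{p}, \qquad
  \hat{T} = \frac{n}{p\,r}.
\]
```

Under these assumptions the expected number of samples n grows linearly with the elapsed time T for a given rate. Since r is not known for an individual flow, the relation between the tracked values and the actual duration is left to the rules trained off-line.
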
Tracking the cumulative amount of sampled content and the number of sampled packets is equivalent to tracking the average packet size, without requiring a division to be performed each time a packet is sampled in real-time, which provides a substantial overhead reduction.
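
A threshold on the average packet size can likewise be tested without a division by cross-multiplying. The following is a minimal sketch; the function and parameter names are hypothetical.

```python
def avg_size_at_most(bytes_sampled: int, packets_sampled: int,
                     limit_bytes: int) -> bool:
    """True if bytes_sampled / packets_sampled <= limit_bytes.

    The comparison is cross-multiplied, so no division is performed.
    Assumes packets_sampled is at least 1, as it is for any table entry.
    """
    return bytes_sampled <= limit_bytes * packets_sampled
```

For example, three sampled packets totalling 900 bytes satisfy a 300-byte limit, while three packets totalling 901 bytes do not.
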
An exemplary parallel flow information table clean-up process 300, shown in FIG. 3, inspects 304 sampling times specified in flow table entries 102 and discards 306 stale entries 102. The clean-up process 300 executes 302 in accordance with a clean-up discipline, for example periodically.
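
A clean-up pass of this kind can be sketched as follows. The text does not define when an entry becomes stale, so the sketch assumes a fixed idle threshold `max_idle_s`; the table layout (a dictionary of dictionaries with a `last_sample_time` key standing in for field 110) and the function name are likewise hypothetical.

```python
def clean_up(table, now, max_idle_s):
    """Discard entries not sampled within the last max_idle_s seconds.

    `table` maps a flow identifier to a dict with a 'last_sample_time'
    key (field 110). Returns the identifiers that were discarded.
    """
    # Collect first, then delete, to avoid mutating during iteration.
    stale = [flow_id for flow_id, entry in table.items()
             if now - entry["last_sample_time"] > max_idle_s]   # step 304
    for flow_id in stale:
        del table[flow_id]                                      # step 306
    return stale
```

Such a function could be invoked from a periodic timer, which corresponds to the periodic clean-up discipline mentioned above.
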
The monitoring process 200 and the clean-up process 300 identify the existence of, and monitor, suspect traffic flows on-line and in real-time. The number of sampled packets of each flow entry 102 in the flow table 100 provides an estimation of the duration of each flow. And, the clean-up process 300 makes up for the lack of a determination of the exact time when monitored flows terminate. Having real-time statistical information regarding current high bandwidth flows presents the network operator with the most plausible flows to consider in identifying illicit/regulated content transfers regardless of the rogue customers' attempts to foil detection.
`In accordance with a