throbber
Class-of-Service Mapping for QoS: A Statistical
`Signature-based Approach to IP Traffic Classification
`
`Matthew Roughanx Subhabrata Sen? Oliver Spatscheck? Nick Duffield?
`School of Mathematical Sciences, University of Adelaide,SA 5005, Australiax
`AT&T Labs – Research, Florham Park, NJ 07932-0971, USA?
`
`matthew.roughan@adelaide.edu.au fsen,spatscheck,duffieldg@research.att.com
`
`ABSTRACT
`The ability to provide different Quality of Service (QoS) guaran-
`tees to traffic from different applications is a highly desired feature
`for many IP network operators, particularly for enterprise networks.
`Although various mechanisms exist for providing QoS in the net-
`work, QoS is yet to be widely deployed. We believe that a key
`factor holding back widespread QoS adoption is the absence of suit-
`able methodologies/processes for appropriately mapping the traffic
`from different applications to different QoS classes. This is a chal-
`lenging task, because many enterprise network operators who are
`interested in QoS do not know all the applications running on their
`network, and furthermore, over recent years port-based application
`classification has become problematic. We argue that measurement
`based automated Class of Service (CoS) mapping is an important
`practical problem that needs to be studied.
`In this paper we describe the requirements and associated chal-
`lenges, and outline a solution framework for measurement based
`classification of traffic for QoS based on statistical application sig-
`natures. In our approach the signatures are chosen in such as way
`as to make them insensitive to the particular application layer pro-
`tocol, but rather to determine the way in which an application is
`used – for instance is it used interactively, or for bulk-data trans-
`port. The resulting application signature can then be used to derive
`the network layer signatures required to determine the CoS class
`for individual IP datagrams. Our evaluations using traffic traces
`from a variety of network locations, demonstrate the feasibility and
`potential of the approach.
`
`Categories and Subject Descriptors
`C.2.3 [Computer-Communications Networks]: Network Opera-
`tions—Network Management, Network Monitoring
`
`General Terms
`Algorithms, Management, Measurement
`
`Permission to make digital or hard copies of all or part of this work for
`personal or classroom use is granted without fee provided that copies are
`not made or distributed for profit or commercial advantage and that copies
`bear this notice and the full citation on the first page. To copy otherwise, to
`republish, to post on servers or to redistribute to lists, requires prior specific
`permission and/or a fee.
`IMC’04, October 25–27, 2004, Taormina, Sicily, Italy.
`Copyright 2004 ACM 1-58113-821-0/04/0010 ...$5.00.
`
`Keywords
`Statistical Signatures, Traffic Classification, Class of Service (CoS),
`Quality of Service (QoS)
`
`1.
`
`INTRODUCTION
`The past few years have witnessed a dramatic increase in the
`number and variety of applications running over the Internet and
`over enterprise IP networks. The spectrum includes interactive (e.g.,
`telnet, instant messaging, games etc.), bulk data transfer (e.g., ftp,
`P2P file downloads), corporate (e.g., Lotus Notes, database trans-
`actions), and real-time applications (voice, video streaming, etc.),
`to name just a few.
`Network operators (particularly in enterprise networks) are ac-
`tively seeking the ability to support different levels of Quality of
`Service (QoS) for different types of applications. The need is driven
`by (i) the inherently different QoS requirements of different types
`of applications (e.g., low end-end delay for interactive applications,
`high throughput for file transfer applications etc.); (ii) the different
`relative importance of different applications to the enterprise: e.g.,
`Oracle database transactions may be considered critical and there-
`fore high priority, while traffic associated with browsing external
`web sites is generally less important; and (iii) the desire to opti-
`mize the usage of their existing network infrastructures under finite
`capacity and cost constraints, while ensuring good performance for
`important applications. In spite of a clear perceived need, and the
`fact that various mechanisms (diffserv, traffic prioritization, etc. [8,
`17, 7]) have been developed for providing different service quality
`guarantees in the network, their adoption has not been widespread.
`A pertinent question then is: what ails QoS?
`Realization of service differentiation capabilities requires associ-
`ation of the traffic with the different applications, determination of
`the QoS to be provided to each, and finally, mechanisms in the un-
`derlying network for providing the QoS. Based on interactions with
`large enterprise network operators, we believe that a key issue be-
`hind the slow spread of QoS-use is not the lack of interest or need,
`but rather, the absence of suitable mapping techniques that can aid
`operators in classifying the network traffic mix among the different
`QoS classes. We refer to this as the Class of Service (CoS) mapping
`problem, and hypothesize that solving this would go a long way in
`making the use of QoS more accessible to operators.
`In principle the division into CoS could be done by end-points
`— for instance by end-user applications. However, for reasons of
`trust, and scalability of administration and management it is typ-
`
`135
`
`Cloudflare - Exhibit 1034, page 135
`
`

`

`ically more practical to do this within the network, for instance
`at the router that connects the Local Area Network (LAN) to the
`Wide Area Network (WAN). Alternatively there might be appli-
`ances which sit near the LAN to WAN transition point performing
`packet marking for QoS.
`CoS mapping inside the network is a non-trivial task. Ideally,
`a network system administrator would possess precise information
`on the applications running inside their network, along with simple,
`unambiguous mappings from easily obtained traffic measurements
`to applications (e.g. by port numbers, or source and destination IP
`addresses). This information is vital not just for the implementa-
`tion of CoS (e.g. via diffserv), but also in planning the capacity
`required for each class, and balancing tradeoffs between cost and
`performance that might occur in choosing class allocations. For in-
`stance, one might have an application whose inclusion in a higher
`priority class is desirable, but not cost effective (based on traffic
`volumes and pricing), and so some difficult choices must be made.
`Good data is required for these to be informed choices.
`However, in general, the required information is rarely up-to-
`date or complete, if it is available at all. The traditional ad-hoc
`growth of IP networks, the continuing rapid proliferation of new
`applications, the merger of companies with different networks, and
`the relative ease with which almost any user can add a new appli-
`cation to the traffic mix with no centralized registration are some
`factors contributing to this “knowledge gap”. Furthermore, over
`recent years it has become harder to identify network applications
`within IP traffic. Traditional techniques such as port-based classi-
`fication of applications have become much less accurate (details in
`Section 2).
`This paper presents a signature-based traffic classification frame-
`work as a candidate solution for the CoS mapping problem, and
`demonstrates the feasibility and potential of the approach using
`large traffic traces collected from multiple Internet locations. Our
`classification method is based on utilizing the statistics of particu-
`lar applications in order to form signatures. By choosing signatures
`that are determined by the way in which an application is used (e.g.
`is it used interactively, or for bulk data transport), we should ob-
`tain signatures that are insensitive to the particular application layer
`protocol. We can therefore perform CoS categorization without de-
`tailed knowledge of the specific application protocol, or usage case
`(some applications may be used for multiple tasks with different
`QoS requirements). The method would be used off-line to form a
`set of port, or IP address based rules for CoS assignment that would
`then be applied on-line in the QoS implementation. Our evaluations
`indicate that the technique has relatively low error rates. For exam-
`ple, our cross-validation tests using 3-Nearest Neighbor (details in
`Section 5) yield classification error of 2:5% and 5:1% respectively
`when the traffic is categorized into three and four classes, using two
`features. We found that even better results (for instance 0.0% errors
`were possible) with three features. The evaluations even showed
`relatively low error rates even for fine grain traffic classes. The
`last suggests that such statistical classification techniques may be
`good candidates for identifying even individual applications. Such
`identification of the different applications and their associated net-
`work traffic has a number of important usages for network opera-
`tions and management (outside of QoS implementation), including
`application-specific traffic engineering, capacity planning, provi-
`sioning, and security.
`
`The remainder of this paper is organized as follows. Section 2
`overviews existing techniques for identifying IP traffic and their
`limitations. Section 3 presents a three-phase framework for realiz-
`ing CoS mapping. Section 4 presents our techniques for CoS classi-
`fication from network traffic. Section 5 presents evaluations of our
`techniques using real traffic traces. Finally, Section 6 concludes the
`paper.
`
`2.
`
`IP TRAFFIC CLASSIFICATION
`One approach commonly used for identifying applications on an
`IP network is to associate the observed traffic (using flow level data,
`or a packet sniffer) with an application based on TCP or UDP port
`numbers. We argue here that this method (described below) is in-
`adequate.
`The TCP/UDP port numbers are divided into three ranges: the
`Well Known Ports (0-1023), the Registered Ports (1024-49,151),
`and the Dynamic and/or Private ports (49,152-65,535). A typi-
`cal TCP connection starts with a SYN/SYN–ACK/ACK handshake
`from a client to a server. The client addresses its initial SYN packet
`to the well known server port of a particular application. The source
`port number of the packet is typically chosen dynamically by the
`client. UDP uses ports similarly to TCP, though without connec-
`tion semantics. All future packets in either a TCP or UDP session
`use the same pair of ports to identify the client and server side of the
`session. Therefore, in principle the TCP or UDP server port num-
`ber can be used to identify the higher layer application, by simply
`identifying which port is the server port and mapping this port to an
`application using the IANA (Internet Assigned Numbers Author-
`ity) list of registered ports [20]. However, port-based application
`classification has limitations. First, the mapping from ports to ap-
`plications is not always well defined. For instance,
`
`(cid:15) Many implementations of TCP use client ports in the registered
`port range. This might mistakenly classify the connection as be-
`longing to the application associated with this port. Similarly,
`some applications (e.g. old bind versions), use port numbers
`from the well-known ports to identify the client site of a ses-
`sion.
`(cid:15) Ports are not defined with IANA for all applications, e.g. P2P
`applications such as Napster and Kazaa.
`(cid:15) An application may use ports other than its well-known ports
`to circumvent operating system access control restrictions, e.g.,
`non-privileged users often run WWW servers on ports other
`than port 80, which is restricted to privileged users on most
`operating systems.
`(cid:15) There are some ambiguities in the port registrations, e.g. port
`888 is used for CDDBP (CD Database Protocol) and access-
`builder.
`(cid:15) In some cases server ports are dynamically allocated as needed.
`For example, FTP allows the dynamic negotiation of the server
`port used for the data transfer. This server port is negotiated on
`an initial TCP connection which is established using the well-
`known FTP control port.
`(cid:15) The use of traffic control techniques like firewalls to block unau-
`thorized, and/or unknown applications from using a network
`has spawned many work-arounds which make port based appli-
`cation authentication harder. For example port 80 is being used
`
`136
`
`Cloudflare - Exhibit 1034, page 136
`
`

`

`by a variety of non-web applications to circumvent firewalls
`which do not filter port-80 traffic. In fact available implemen-
`tations of IP over HTTP allow the tunneling of all applications
`through TCP port 80.
`
`(cid:15) Trojans and other security (e.g. DoS) attacks generate a large
`volume of bogus traffic which should not be associated with the
`applications of the port numbers those attacks use.
`
`A second limitation of port-number based classification is that
`a port can be used by a single application to transmit traffic with
`different QoS requirements. For example, (i) Lotus Notes transmits
`both email and database transaction traffic over the same ports, and
`(ii) scp (secure copy), a file transfer protocol, runs over ssh (se-
`cure shell), which is also used interactively on the same port (TCP
`port 22). This use of the same port for traffic requiring different
`QoS requirements is quite legitimate, and yet a good classification
`must separate different use cases for the same application. In prac-
`tice, one use case may dominate on a particular VPN (Virtual Pri-
`vate Network), or use cases will have other discriminating factors
`such as the servers’ IP addresses. Thus a clean QoS implementa-
`tion is still possible through augmenting the classification rules to
`include IP address-based disambiguation. Server lists exist in some
`networks, but again, in practice these are often incomplete, or a
`single server could be used to support a variety of different types of
`traffic, so we must combine port and IP address rules.
`A possible alternative to port based classification is to use a painstak-
`ing process involving installation of packet sniffers and parsing
`packets for application-level information to identify the application
`class of each individual TCP connection or UDP session. However,
`this approach cannot be used with more easily collected flow level
`data, and its collection is computationally expensive, limiting its
`application to lower bandwidth links. Also this approach requires
`precise prior knowledge of applications and their packet formats
`– something that may not always be possible. We use it here to
`identify Kazaa and Gnutella traffic from their application layer pro-
`tocols, but this requires substantial effort for each application, and
`in some cases, each application version. Furthermore, the introduc-
`tion of payload encryption is increasingly limiting our ability to see
`inside packets for this type of information.
`In this paper we explore the potential of signatures derived from
`measured traffic for CoS categorization. In practice, this approach,
`in conjunction with the above techniques and the partial knowledge
`available for most corporate networks can be used to bear on the
`problem of application identification and traffic classification.
`2.1 Related work
`Previous related work has examined the variation of flow charac-
`teristics according to application. Claffy [10] investigated the joint
`distribution of flow duration and number of packets, and its varia-
`tion with flow parameters such as inter-packet timeout. Differences
`were observed between the support of the distributions of some ap-
`plication protocols, although overlap was clearly present between
`some applications. Most notably, the support of the distribution of
`DNS transactions had almost no overlap with that of other appli-
`cations considered. The use of such distributions as a discrimina-
`tor between different application types was not considered. There
`exists a wealth of other research on characterizing and modeling
`workloads for particular applications, e.g., [22, 31, 4, 2, 9, 34]. An
`
`early work in this space, [29], examines the distributions of flow
`bytes and packets for a number of different applications. Interflow
`and intraflow statistics are another possible dimension along which
`application types may be distinguished. [30] found that user initi-
`ated events—such as telnet packets within flows or ftp-data
`connection arrivals—can be described well by a Poisson process,
`whereas other connection arrivals deviate considerably from Pois-
`son.
`All these studies assume that one can identify the application
`traffic unambiguously and then obtain statistics for that applica-
`tion. In contrast, we are considering the dual problem of inferring
`the application from the traffic statistics. This type of approach has
`been suggested in very limited contexts such as identifying chat
`traffic [12]. Our work extends this idea while providing a rigor-
`ous mathematical framework for performing classification based
`on signatures. Signature-based detection techniques have also been
`explored in the context of network security, attack and anomaly de-
`tection (e.g. [6, 5, 36, 26]) where one typically seeks to find a signa-
`ture for an attack. However, we apply our classification techniques
`to identify everyday traffic.
`
`3. REALIZING COS MAPPING
`
`The constraints under which CoS mapping must operate are prin-
`cipally related to computational complexity. At high speeds, the
`rules that can be used for this task are rather limited. Typically, one
`can use simple criteria such as port numbers or IP source/destination
`addresses, but not details from the higher layer protocols. Our task
`is to use statistical signature based classification (trained on prior
`networks’ traffic, and applied to traffic measurements from the cur-
`rent VPN) to form a set of local rules based on port, or IP addresses,
`which would then be applied on-line for CoS assignment.
`We propose to realize this type of CoS mapping using a three
`stage process.
`
`1. statistics collection,
`
`2. classification,
`
`3. rule creation.
`
`The first stage — statistics collection — involves placing monitors
`in the network, and collecting appropriate statistics of the traffic
`from certain aggregates. In this case, the aggregates we consider
`are the server port Pi, and server IP address1 Ii of a connection i.
`One could choose these aggregates in a more flexible manner, with
`the proviso that the aggregates used must be easily implementable
`as rules (see below). There are a number of techniques available
`for efficiently identifying the interesting aggregates, for instance,
`see [13]. We then form a vector of statistics SC (i) for each con-
`nection i, and use this to update the statistics of each aggregate that
`connection is involved in, for instance statistics SP (Pi) for port
`aggregates, and SI (Ii) for server aggregates.
`
`

`

`For statistics collected on TCP connections, the procedure would
`be:
`
`foreach packet
`if new TCP connection (give it index i++)
`determine the aggregates for i
`server port Pi = dst port of SYN
`server IP address Ii = dst IP of SYN
`...
`initialize a set of statistics SC(i)
`else if part of existing TCP connection i
`update connection statistics SC (i)
`else if end TCP connection i
`update connection statistics SC (i)
`update statistics foreach aggregate
`by server port: SP (Pi)
`by server IP address: SI (Ii)
`
`The update procedure for connections depends on the statistic in
`question. Ideally, we should choose statistics that can be updated
`on-line in a streaming fashion, i.e. recursively. This means that we
`do not need to store data per packet, but rather per connection, for
`instance, assuming an update algorithm like
`
`
`
`
`
`
`
`SCk (i) f (X ij ; SCk (i); (cid:30)(i));
`
`where X i
`j is the measurement for the jth packet in connection i,
`and SC
`k (i) is the kth statistic for connection i, and (cid:30)(i) is some
`(small) set of state information (e.g. the packet number j) for con-
`nection i, then the memory required to store the state depends on
`the number of connections, not number of packets. For example,
`for a series of real-valued data Xj, the following statistics may be
`easily computed recursively:
`
`1. average:
`
`(cid:22)Xj+1 =
`
`1
`
`

`

`to map the classes/rules to known traffic and servers, and ensure
`that the known policy based rules were enforced in priority to those
`derived above. Think of this as our disgruntled business class pas-
`senger updating his administrator’s list of important people, so that
`he receives the air transport he deserves in the future. In essence,
`the technique above is intended to fill in the unknown gaps in the
`best possible way.
`A side benefit of the above approach is that the collected statis-
`tics can be used in other ways. For instance, the traffic volume per
`aggregate could be useful in planning the required capacity for each
`QoS class, or the network in general. In fact, given a set of prices
`per class, the measured volumes, and a set of utilities per applica-
`tion class, one could create the mapping from the application class
`Cj to QoS class Qk through an optimization process using traffic
`data, rather than as a set of fixed rules.
`Once monitors are installed, there is no reason one could not use
`them in an on-going manner. One could continue to make measure-
`ments (as above) of the statistics per aggregate, and if something
`changes significantly, then one could change the rules used. An
`example might be the introduction of a new application with differ-
`ent QoS characteristics, requiring a rule update. It is not, however,
`envisioned that these updates would occur often.
`
`4. COS CLASSIFICATION
`We next present the different components in our classification
`approach: identifying different application classes, selecting candi-
`date classification features, and finally specific classification meth-
`ods.
`
`4.1 Class Definitions
`In practice, service differentiation mechanisms like diffserv to-
`day only allow a relatively small number of application classes.
`For our initial study, we focus on the following 4 broad applica-
`tion classes, commonly found in corporate networks.
`
`(cid:15) Interactive: The interactive class contains traffic which is re-
`quired by a user to perform multiple real-time interactions with
`a remote system. This class includes such applications as remote
`login sessions or an interactive Web interface.
`
`(cid:15) Bulk data transfer: The bulk data transfer class contains traffic
`which is required to transfer large data volumes over the net-
`work without any realtime constraints. This class includes ap-
`plications such as FTP, software updates, and music or video
`downloads (say through an application such as Kazaa).
`
`(cid:15) Streaming: The streaming class contains multimedia traffic with
`realtime constraints. This class includes such applications as
`streaming and video conferencing.
`
`(cid:15) Transactional: The transactional class contains traffic which is
`used in a small number of request response pairs which can be
`combined to represent a transaction. DNS, and Oracle transac-
`tions belong to this class.
`
`The choices were motivated by the need to select a small num-
`ber of classes that would be simple, intuitive and still adequately
`represent the different QoS requirements of commonly used appli-
`cations. We view these 4 classes as a starting point, with the actual
`choice of application classes being a topic for research.
`
`To characterize each application class we need a reference data
`set for each class from which we can extract a set of representative
`features. Acquiring such a reference data set is made difficult by
`precisely the problems described in Section 2. Selecting the net-
`work traffic based on port numbers may not yield reliable statistics
`that are representative of any particular class, however, to classify
`them otherwise requires a reference data set. To break this circular
`dependency we selected some applications per class, which based
`on their typical use, have a low likelihood of being contaminated
`by traffic belonging to another application class. In particular we
`focus on applications which:
`
`(cid:15) are clearly within one class (to avoid mixing the statistics
`from two classes);
`
`(cid:15) are widely used, so as to assure we get a good data-set;
`(cid:15) have server ports in the well known port range to reduce the
`chance of mis-use of these ports.
`
`These reference applications will then be used to estimate a num-
`ber of statistics, from which we will select features for use in CoS
`categorization. Based on the criterion established in the previous
`section, the reference applications selected for each application class
`are:
`
`(cid:15) Interactive: Telnet,
`(cid:15) Bulk data: FTP-data, Kazaa,
`(cid:15) Streaming: RealMedia streaming,
`(cid:15) Transactional: DNS, HTTPS.
`
`We include HTTPS in the transactional class, because a large
`proportion of HTTPS interactions involve users filling out a form,
`for instance to conduct secure credit card purchases over the WWW.
`In our current study, we do not use web traffic to train the classi-
`fication. However we do include WWW traffic statistics (as cap-
`tured using port 80) in some plots as an example application, as it
`provides some interesting intuition into the results. Also, although
`Kazaa does not have registered ports, in particular data sets we have
`reliable ways of identifying Kazaa traffic (as describe below).
`4.2 Candidate Features
`The list of possible features one could consider is very large. We
`can broadly classify these into five categories:
`1. Simple packet-level features such as mean packet size and
`various moments such as variance, RMS (root mean square) size,
`etc., are simple to compute, and can be gleaned directly from packet-
`level information. They offer a characterization of the application
`that is independent of the notion of flows, connections or other
`higher level aggregations. Another advantage is that packet-level
`sampling, which is widely used in network data collection, has lit-
`tle impact on these statistics. Other statistics that can be derived
`from simple packet data are time series, from which we could de-
`rive a number of features, for instance relating to correlations over
`time (e.g. parameters of long-range dependence such as the Hurst
`parameter). An example of this type of classification can be seen in
`[23], where the authors use time-of-day traffic profiles to categorize
`web sites.
`2. Flow-level statistics are summary statistics at the grain of net-
`work flows. A flow is defined to be an unidirectional sequence
`
`139
`
`Cloudflare - Exhibit 1034, page 139
`
`

`

`set of general rules for classifying future data on the basis of a
`feature vector. Each training data point consists of a pair Xj 2
`IRM ; Gj 2 f1; : : : ; Kg, where Xj is the feature vector, and Gj is
`the class of the jth data point, and our rule should provide a map-
`ping ^G : IRM ! f1; : : : ; Kg.
`One very simple method of classification is NN. In this classifi-
`cation method we assign a new data point to the class of its nearest
`(in this paper we shall use Euclidean distance as the metric for dis-
`tance) neighbor from the training data. That is we take ^G(X) to
`be the class Gi of the data point Xi which minimizes the distance
`jjXi (cid:0) Xjj. NN methods can be generalized to k-NN (where the
`k nearest neighbors essentially ‘vote’ on the class of the observa-
`tion), to enhance its robustness. k-NN methods are generally very
`good on low-dimensional data (small M), but are less effective on
`high-dimensional data, and give little insight into the most useful
`features.
`An alternative set of approaches can be drawn from statistical
`decision theory, in which we choose ^G(X) = Gi if
`
`Pr(Gijx = X) = max
`g2f1;:::;Kg
`
`Pr(gjx = X):
`
`That is, we choose the class with maximal conditional probability,
`given the feature vector X. This approach is known as the Bayes
`classifier. In general we can call (cid:14)g = Pr(gjx = X) a discrimi-
`nant function, and we choose the class with the maximum discrim-
`inant. A k-NN approach can be seen as an approximation to the
`Bayes classifier above, where we are approximating the probabili-
`ties Pr(gjx = X) by the proportion of the k nearest neighbors of
`class g.
`There are many alternative approaches for estimating the dis-
`criminant functions. LDA [19] is a simple method that can be de-
`rived via a number of approaches. Perhaps the most famous deriva-
`tion is Fisher’s — he posed the problem “Find the linear combina-
`tion Z = aT x such that the between class variance is maximized
`relative to the within-class variance”. When one considers this lin-
`ear combination it forms a natural discriminant between the classes,
`that minimizes the overlap between them. Another simple method
`of derivation of LDA is to assume that each class g has a Gaussian
`distribution with mean (cid:22)g and with the same intra-class covariance
`(cid:6) for each class. In this case the distribution of features vectors
`within each class can be drawn from
`
`fk(x) =
`
`1
`
`of packets that have some field values in common, typically, the
`5-tuple (source IP, destination IP, source port, destination port, IP
`Protocol type). Example flow-level features include mean flow du-
`ration, mean data volume per flow, mean number of packets per
`flow, and variance of these metrics, etc. There are some more com-
`plex forms of information one can also glean from flows (or packet
`data) statistics, for instance, we may look at the proportion of inter-
`nal versus external traffic within a category – external traffic (traffic
`to the Internet) may have a lower priority within a corporate set-
`ting. These statistics can be obtained using flow-level data collected
`at routers using, e.g. Cisco NetFlow [27]. They do not require the
`more resource intensive process of finer grain packet-level traces. A
`limitation is that flow-collection may sometimes aggregate packets
`that belong to multiple application-level connections into a single
`flow, which would distort the flow-level features.
`3. Connection-level statistics are required to trace some inter-
`esting behavior associated with connection oriented transport-level
`connections such as TCP connections. A typical TCP connection
`starts and ends with well defined handshakes from a client to a
`server. The collection process needs to track the connection state
`in order to collect connection level statistics.
`In addition to the
`features mentioned for the flow-level, other features that are mean-
`ingful to compute at the TCP connection level are the symmetry
`of a connection, advertised window sizes and the throughput distri-
`bution. The connection-level data generally provides better quality
`data than the flow-level information, but requires additional over-
`head, and would also be impacted by sampling or asymmetric rout-
`ing at the collection point.
`4. Intra-flow/connection features There are features we might
`wish to collect which are based on the notion of a flow or TCP
`connection, but require statistics about the packets within each flow.
`A simple example is the statistics of the inter-arrival times between
`packets in flows. This requires data collected at a packet level, but
`then grouped into flows. We use the relative variance of these inter-
`arrival times as a measure of the burstiness of a traffic stream. Intra-
`flow/connection features include loss rates, latencies, etc.
`5. Multi-flow: Sometimes interesting characteristics can be cap-
`tured only by considering statistics across multiple flows/connections.
`For instance, many peer-to-peer applications achieve the download
`of a large file by bulk downloads of smaller chunks from multi-
`ple machines — the individual chunk downloads are typically per-
`formed close together in time. For some multimedia streaming pro-
`tocols, the high volume data connection is accompanied by a con-
`current, separate connection between the same set of end-systems,
`containing low volume, intermittent control data (e.g. RTSP [32]).
`These multi-flow features are more complex and computationally
`more expensive to capture than flow or connection data alone.
`Table 1 gives a summary of the types of features, and which types
`of measurements are needed to collect them. At the conclusion of
`our ongoing work we would like to have identified a minimal set of
`statistical features that can be used to classify traffic into application
`classes reliably and independently of the actual application.
`
`4.3 Classification methods
`In this paper we test two simple, but common methods for classi-
`fication: Nearest Neighbor (NN) and Linear Discriminant Analysis
`(LDA). The general problem of classification is: given K classes,
`M features, and N training data points, we wish to determine a
`
`

`

`

`

`connection-level statistics per application for this trace. Note that
`the numbers corresponding to domain (i.e., DNS) only consider
`DNS communications over TCP.
`
`(cid:15) Gigascop

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket