IPR2022-00228, No. 1034-42 Exhibit - Exhibit 1034 (P.T.A.B. Nov. 24, 2021)

Class-of-Service Mapping for QoS: A Statistical
`Signature-based Approach to IP Trafﬁc Classiﬁcation
`
`Matthew Roughanx Subhabrata Sen? Oliver Spatscheck? Nick Dufﬁeld?
`School of Mathematical Sciences, University of Adelaide,SA 5005, Australiax
`AT&T Labs – Research, Florham Park, NJ 07932-0971, USA?
`
`matthew.roughan@adelaide.edu.au fsen,spatscheck,dufﬁeldg@research.att.com
`
`ABSTRACT
`The ability to provide different Quality of Service (QoS) guaran-
`tees to trafﬁc from different applications is a highly desired feature
`for many IP network operators, particularly for enterprise networks.
`Although various mechanisms exist for providing QoS in the net-
`work, QoS is yet to be widely deployed. We believe that a key
`factor holding back widespread QoS adoption is the absence of suit-
`able methodologies/processes for appropriately mapping the trafﬁc
`from different applications to different QoS classes. This is a chal-
`lenging task, because many enterprise network operators who are
`interested in QoS do not know all the applications running on their
`network, and furthermore, over recent years port-based application
`classiﬁcation has become problematic. We argue that measurement
`based automated Class of Service (CoS) mapping is an important
`practical problem that needs to be studied.
`In this paper we describe the requirements and associated chal-
`lenges, and outline a solution framework for measurement based
`classiﬁcation of trafﬁc for QoS based on statistical application sig-
`natures. In our approach the signatures are chosen in such as way
`as to make them insensitive to the particular application layer pro-
`tocol, but rather to determine the way in which an application is
`used – for instance is it used interactively, or for bulk-data trans-
`port. The resulting application signature can then be used to derive
`the network layer signatures required to determine the CoS class
`for individual IP datagrams. Our evaluations using trafﬁc traces
`from a variety of network locations, demonstrate the feasibility and
`potential of the approach.
`
`Categories and Subject Descriptors
`C.2.3 [Computer-Communications Networks]: Network Opera-
`tions—Network Management, Network Monitoring
`
`General Terms
`Algorithms, Management, Measurement
`
`Permission to make digital or hard copies of all or part of this work for
`personal or classroom use is granted without fee provided that copies are
`not made or distributed for proﬁt or commercial advantage and that copies
`bear this notice and the full citation on the ﬁrst page. To copy otherwise, to
`republish, to post on servers or to redistribute to lists, requires prior speciﬁc
`permission and/or a fee.
`IMC’04, October 25–27, 2004, Taormina, Sicily, Italy.
`Copyright 2004 ACM 1-58113-821-0/04/0010 ...$5.00.
`
`Keywords
`Statistical Signatures, Trafﬁc Classiﬁcation, Class of Service (CoS),
`Quality of Service (QoS)
`
`1.
`
`INTRODUCTION
`The past few years have witnessed a dramatic increase in the
`number and variety of applications running over the Internet and
`over enterprise IP networks. The spectrum includes interactive (e.g.,
`telnet, instant messaging, games etc.), bulk data transfer (e.g., ftp,
`P2P ﬁle downloads), corporate (e.g., Lotus Notes, database trans-
`actions), and real-time applications (voice, video streaming, etc.),
`to name just a few.
`Network operators (particularly in enterprise networks) are ac-
`tively seeking the ability to support different levels of Quality of
`Service (QoS) for different types of applications. The need is driven
`by (i) the inherently different QoS requirements of different types
`of applications (e.g., low end-end delay for interactive applications,
`high throughput for ﬁle transfer applications etc.); (ii) the different
`relative importance of different applications to the enterprise: e.g.,
`Oracle database transactions may be considered critical and there-
`fore high priority, while trafﬁc associated with browsing external
`web sites is generally less important; and (iii) the desire to opti-
`mize the usage of their existing network infrastructures under ﬁnite
`capacity and cost constraints, while ensuring good performance for
`important applications. In spite of a clear perceived need, and the
`fact that various mechanisms (diffserv, trafﬁc prioritization, etc. [8,
`17, 7]) have been developed for providing different service quality
`guarantees in the network, their adoption has not been widespread.
`A pertinent question then is: what ails QoS?
`Realization of service differentiation capabilities requires associ-
`ation of the trafﬁc with the different applications, determination of
`the QoS to be provided to each, and ﬁnally, mechanisms in the un-
`derlying network for providing the QoS. Based on interactions with
`large enterprise network operators, we believe that a key issue be-
`hind the slow spread of QoS-use is not the lack of interest or need,
`but rather, the absence of suitable mapping techniques that can aid
`operators in classifying the network trafﬁc mix among the different
`QoS classes. We refer to this as the Class of Service (CoS) mapping
`problem, and hypothesize that solving this would go a long way in
`making the use of QoS more accessible to operators.
`In principle the division into CoS could be done by end-points
`— for instance by end-user applications. However, for reasons of
`trust, and scalability of administration and management it is typ-
`
`135
`
`Splunk Inc. Exhibit 1034 Page 1
`
`

`ically more practical to do this within the network, for instance
`at the router that connects the Local Area Network (LAN) to the
`Wide Area Network (WAN). Alternatively there might be appli-
`ances which sit near the LAN to WAN transition point performing
`packet marking for QoS.
`CoS mapping inside the network is a non-trivial task. Ideally,
`a network system administrator would possess precise information
`on the applications running inside their network, along with simple,
`unambiguous mappings from easily obtained trafﬁc measurements
`to applications (e.g. by port numbers, or source and destination IP
`addresses). This information is vital not just for the implementa-
`tion of CoS (e.g. via diffserv), but also in planning the capacity
`required for each class, and balancing tradeoffs between cost and
`performance that might occur in choosing class allocations. For in-
`stance, one might have an application whose inclusion in a higher
`priority class is desirable, but not cost effective (based on trafﬁc
`volumes and pricing), and so some difﬁcult choices must be made.
`Good data is required for these to be informed choices.
`However, in general, the required information is rarely up-to-
`date or complete, if it is available at all. The traditional ad-hoc
`growth of IP networks, the continuing rapid proliferation of new
`applications, the merger of companies with different networks, and
`the relative ease with which almost any user can add a new appli-
`cation to the trafﬁc mix with no centralized registration are some
`factors contributing to this “knowledge gap”. Furthermore, over
`recent years it has become harder to identify network applications
`within IP trafﬁc. Traditional techniques such as port-based classi-
`ﬁcation of applications have become much less accurate (details in
`Section 2).
`This paper presents a signature-based trafﬁc classiﬁcation frame-
`work as a candidate solution for the CoS mapping problem, and
`demonstrates the feasibility and potential of the approach using
`large trafﬁc traces collected from multiple Internet locations. Our
`classiﬁcation method is based on utilizing the statistics of particu-
`lar applications in order to form signatures. By choosing signatures
`that are determined by the way in which an application is used (e.g.
`is it used interactively, or for bulk data transport), we should ob-
`tain signatures that are insensitive to the particular application layer
`protocol. We can therefore perform CoS categorization without de-
`tailed knowledge of the speciﬁc application protocol, or usage case
`(some applications may be used for multiple tasks with different
`QoS requirements). The method would be used off-line to form a
`set of port, or IP address based rules for CoS assignment that would
`then be applied on-line in the QoS implementation. Our evaluations
`indicate that the technique has relatively low error rates. For exam-
`ple, our cross-validation tests using 3-Nearest Neighbor (details in
`Section 5) yield classiﬁcation error of 2:5% and 5:1% respectively
`when the trafﬁc is categorized into three and four classes, using two
`features. We found that even better results (for instance 0.0% errors
`were possible) with three features. The evaluations even showed
`relatively low error rates even for ﬁne grain trafﬁc classes. The
`last suggests that such statistical classiﬁcation techniques may be
`good candidates for identifying even individual applications. Such
`identiﬁcation of the different applications and their associated net-
`work trafﬁc has a number of important usages for network opera-
`tions and management (outside of QoS implementation), including
`application-speciﬁc trafﬁc engineering, capacity planning, provi-
`sioning, and security.
`
`The remainder of this paper is organized as follows. Section 2
`overviews existing techniques for identifying IP trafﬁc and their
`limitations. Section 3 presents a three-phase framework for realiz-
`ing CoS mapping. Section 4 presents our techniques for CoS classi-
`ﬁcation from network trafﬁc. Section 5 presents evaluations of our
`techniques using real trafﬁc traces. Finally, Section 6 concludes the
`paper.
`
`2.
`
`IP TRAFFIC CLASSIFICATION
`One approach commonly used for identifying applications on an
`IP network is to associate the observed trafﬁc (using ﬂow level data,
`or a packet sniffer) with an application based on TCP or UDP port
`numbers. We argue here that this method (described below) is in-
`adequate.
`The TCP/UDP port numbers are divided into three ranges: the
`Well Known Ports (0-1023), the Registered Ports (1024-49,151),
`and the Dynamic and/or Private ports (49,152-65,535). A typi-
`cal TCP connection starts with a SYN/SYN–ACK/ACK handshake
`from a client to a server. The client addresses its initial SYN packet
`to the well known server port of a particular application. The source
`port number of the packet is typically chosen dynamically by the
`client. UDP uses ports similarly to TCP, though without connec-
`tion semantics. All future packets in either a TCP or UDP session
`use the same pair of ports to identify the client and server side of the
`session. Therefore, in principle the TCP or UDP server port num-
`ber can be used to identify the higher layer application, by simply
`identifying which port is the server port and mapping this port to an
`application using the IANA (Internet Assigned Numbers Author-
`ity) list of registered ports [20]. However, port-based application
`classiﬁcation has limitations. First, the mapping from ports to ap-
`plications is not always well deﬁned. For instance,
`
`(cid:15) Many implementations of TCP use client ports in the registered
`port range. This might mistakenly classify the connection as be-
`longing to the application associated with this port. Similarly,
`some applications (e.g. old bind versions), use port numbers
`from the well-known ports to identify the client site of a ses-
`sion.
`(cid:15) Ports are not deﬁned with IANA for all applications, e.g. P2P
`applications such as Napster and Kazaa.
`(cid:15) An application may use ports other than its well-known ports
`to circumvent operating system access control restrictions, e.g.,
`non-privileged users often run WWW servers on ports other
`than port 80, which is restricted to privileged users on most
`operating systems.
`(cid:15) There are some ambiguities in the port registrations, e.g. port
`888 is used for CDDBP (CD Database Protocol) and access-
`builder.
`(cid:15) In some cases server ports are dynamically allocated as needed.
`For example, FTP allows the dynamic negotiation of the server
`port used for the data transfer. This server port is negotiated on
`an initial TCP connection which is established using the well-
`known FTP control port.
`(cid:15) The use of trafﬁc control techniques like ﬁrewalls to block unau-
`thorized, and/or unknown applications from using a network
`has spawned many work-arounds which make port based appli-
`cation authentication harder. For example port 80 is being used
`
`136
`
`Splunk Inc. Exhibit 1034 Page 2
`
`

`by a variety of non-web applications to circumvent ﬁrewalls
`which do not ﬁlter port-80 trafﬁc. In fact available implemen-
`tations of IP over HTTP allow the tunneling of all applications
`through TCP port 80.
`
`(cid:15) Trojans and other security (e.g. DoS) attacks generate a large
`volume of bogus trafﬁc which should not be associated with the
`applications of the port numbers those attacks use.
`
`A second limitation of port-number based classiﬁcation is that
`a port can be used by a single application to transmit trafﬁc with
`different QoS requirements. For example, (i) Lotus Notes transmits
`both email and database transaction trafﬁc over the same ports, and
`(ii) scp (secure copy), a ﬁle transfer protocol, runs over ssh (se-
`cure shell), which is also used interactively on the same port (TCP
`port 22). This use of the same port for trafﬁc requiring different
`QoS requirements is quite legitimate, and yet a good classiﬁcation
`must separate different use cases for the same application. In prac-
`tice, one use case may dominate on a particular VPN (Virtual Pri-
`vate Network), or use cases will have other discriminating factors
`such as the servers’ IP addresses. Thus a clean QoS implementa-
`tion is still possible through augmenting the classiﬁcation rules to
`include IP address-based disambiguation. Server lists exist in some
`networks, but again, in practice these are often incomplete, or a
`single server could be used to support a variety of different types of
`trafﬁc, so we must combine port and IP address rules.
`A possible alternative to port based classiﬁcation is to use a painstak-
`ing process involving installation of packet sniffers and parsing
`packets for application-level information to identify the application
`class of each individual TCP connection or UDP session. However,
`this approach cannot be used with more easily collected ﬂow level
`data, and its collection is computationally expensive, limiting its
`application to lower bandwidth links. Also this approach requires
`precise prior knowledge of applications and their packet formats
`– something that may not always be possible. We use it here to
`identify Kazaa and Gnutella trafﬁc from their application layer pro-
`tocols, but this requires substantial effort for each application, and
`in some cases, each application version. Furthermore, the introduc-
`tion of payload encryption is increasingly limiting our ability to see
`inside packets for this type of information.
`In this paper we explore the potential of signatures derived from
`measured trafﬁc for CoS categorization. In practice, this approach,
`in conjunction with the above techniques and the partial knowledge
`available for most corporate networks can be used to bear on the
`problem of application identiﬁcation and trafﬁc classiﬁcation.
`2.1 Related work
`Previous related work has examined the variation of ﬂow charac-
`teristics according to application. Claffy [10] investigated the joint
`distribution of ﬂow duration and number of packets, and its varia-
`tion with ﬂow parameters such as inter-packet timeout. Differences
`were observed between the support of the distributions of some ap-
`plication protocols, although overlap was clearly present between
`some applications. Most notably, the support of the distribution of
`DNS transactions had almost no overlap with that of other appli-
`cations considered. The use of such distributions as a discrimina-
`tor between different application types was not considered. There
`exists a wealth of other research on characterizing and modeling
`workloads for particular applications, e.g., [22, 31, 4, 2, 9, 34]. An
`
`early work in this space, [29], examines the distributions of ﬂow
`bytes and packets for a number of different applications. Interﬂow
`and intraﬂow statistics are another possible dimension along which
`application types may be distinguished. [30] found that user initi-
`ated events—such as telnet packets within ﬂows or ftp-data
`connection arrivals—can be described well by a Poisson process,
`whereas other connection arrivals deviate considerably from Pois-
`son.
`All these studies assume that one can identify the application
`trafﬁc unambiguously and then obtain statistics for that applica-
`tion. In contrast, we are considering the dual problem of inferring
`the application from the trafﬁc statistics. This type of approach has
`been suggested in very limited contexts such as identifying chat
`trafﬁc [12]. Our work extends this idea while providing a rigor-
`ous mathematical framework for performing classiﬁcation based
`on signatures. Signature-based detection techniques have also been
`explored in the context of network security, attack and anomaly de-
`tection (e.g. [6, 5, 36, 26]) where one typically seeks to ﬁnd a signa-
`ture for an attack. However, we apply our classiﬁcation techniques
`to identify everyday trafﬁc.
`
`3. REALIZING COS MAPPING
`
`The constraints under which CoS mapping must operate are prin-
`cipally related to computational complexity. At high speeds, the
`rules that can be used for this task are rather limited. Typically, one
`can use simple criteria such as port numbers or IP source/destination
`addresses, but not details from the higher layer protocols. Our task
`is to use statistical signature based classiﬁcation (trained on prior
`networks’ trafﬁc, and applied to trafﬁc measurements from the cur-
`rent VPN) to form a set of local rules based on port, or IP addresses,
`which would then be applied on-line for CoS assignment.
`We propose to realize this type of CoS mapping using a three
`stage process.
`
`1. statistics collection,
`
`2. classiﬁcation,
`
`3. rule creation.
`
`The ﬁrst stage — statistics collection — involves placing monitors
`in the network, and collecting appropriate statistics of the trafﬁc
`from certain aggregates. In this case, the aggregates we consider
`are the server port Pi, and server IP address1 Ii of a connection i.
`One could choose these aggregates in a more ﬂexible manner, with
`the proviso that the aggregates used must be easily implementable
`as rules (see below). There are a number of techniques available
`for efﬁciently identifying the interesting aggregates, for instance,
`see [13]. We then form a vector of statistics SC (i) for each con-
`nection i, and use this to update the statistics of each aggregate that
`connection is involved in, for instance statistics SP (Pi) for port
`aggregates, and SI (Ii) for server aggregates.
`
`1Notice that server address are usually statically allocated, rather
`than allocated via DHCP, and so we do not need to worry about
`constantly shifting IP addresses, at least on the time scales of mea-
`surements considered here.
`
`137
`
`Splunk Inc. Exhibit 1034 Page 3
`
`

`We may also ﬁnalize any extant TCP connections at the end of
`data collection in one of two ways: by including them in our statis-
`tics, or excluding them. Either approach biases the results — for
`instance, if we exclude the connections we naturally exclude any
`connections longer than our measurement interval, but if we include
`them we underestimate the duration of the connections. These edge
`effects will be minimized by having a longer data collection inter-
`val, so in this work we propose using one day worth of data, though
`it may be practical to use longer data sets. Ideally, the collection
`intervals for training data should be the same as those for test data,
`so that both data sets are subject to the same biases.
`After statistics collection, unsurprisingly, one has a collection of
`statistics indexed by aggregate (in our case server port, and server
`IP address). The next step is to classify the trafﬁc on each ag-
`gregate. We do so using the classiﬁcation algorithms described in
`Section 4 (or alternative algorithms if these are shown to be more
`accurate), in conjunction with a pre-existing set of training data,
`carefully collected on well understood networks of the type under
`study.
`Once we have completed classiﬁcation we will have associated
`each aggregate with a class. Assuming the classes map directly to
`CoS we can immediately construct the rules required in the QoS
`implementation. That is, assume that aggregate Ai has been deter-
`mined to belong to class Cj , which requires QoS Qk. We now have
`a mapping
`
`Ai ! Cj ! Qk;
`
`which can be instantiated as a rule. For instance, if Ai (cid:17) Pi corre-
`sponds to a particular server port Pi, our rule is
`place traffic to/from port Pi in class Qk.
`One can obviously create a large set of such rules, and in gen-
`eral it might be non-trivial to reduce the size of this rule set to
`something manageable. However, in practice, differential pricing
`between classes means that only trafﬁc speciﬁcally requiring high
`priority should go into high priority classes, and so the majority
`of application will most likely be placed in a lower priority class.
`Hence there are typically only a few classes, and the majority of
`aggregates will go into a default class, so few rules will be needed.
`Once a set of rules has been created, these would then be imple-
`mented on (for example) the access router, which would use them
`to mark the packets appropriately, and place them in appropriate
`queues for forwarding. Thus the classiﬁcation process described so
`far is an off-line process, to be applied before the fact to create a set
`of simple rules that would be used in the actual on-line QoS imple-
`mentation. Note that the error rate of the classiﬁcation algorithm is
`in forming these rules, not in classiﬁcation of packets in the actual
`QoS implementation. This is an important distinction: the error rate
`does result in packets being placed in suboptimal classes, however,
`these packets only pay for the class in which they are placed. This
`is not the same, for example, as paying a business class airfare and
`being bumped to economy. It is simply a case of an administrator
`occasionally booking the wrong class of ticket because he has in-
`correctly categorized the importance of some of his administratees.
`Furthermore, the above process (as described) is automated. How-
`ever, it is desirable for a human to “double-check” the assignment
`rules. The method described here cannot be made 100% foolproof
`simply because a particular network operator may have speciﬁc re-
`quirements which deviate from the norm. A human would be able
`
`For statistics collected on TCP connections, the procedure would
`be:
`
`foreach packet
`if new TCP connection (give it index i++)
`determine the aggregates for i
`server port Pi = dst port of SYN
`server IP address Ii = dst IP of SYN
`...
`initialize a set of statistics SC(i)
`else if part of existing TCP connection i
`update connection statistics SC (i)
`else if end TCP connection i
`update connection statistics SC (i)
`update statistics foreach aggregate
`by server port: SP (Pi)
`by server IP address: SI (Ii)
`
`The update procedure for connections depends on the statistic in
`question. Ideally, we should choose statistics that can be updated
`on-line in a streaming fashion, i.e. recursively. This means that we
`do not need to store data per packet, but rather per connection, for
`instance, assuming an update algorithm like
`
`
`
`
`
`
`
`SCk (i) f (X ij ; SCk (i); (cid:30)(i));
`
`where X i
`j is the measurement for the jth packet in connection i,
`and SC
`k (i) is the kth statistic for connection i, and (cid:30)(i) is some
`(small) set of state information (e.g. the packet number j) for con-
`nection i, then the memory required to store the state depends on
`the number of connections, not number of packets. For example,
`for a series of real-valued data Xj , the following statistics may be
`easily computed recursively:
`
`1. average:
`
`2. variance:
`
`(cid:22)Xj+1 =
`
`1
`j + 1
`
`Xj+1 +
`
`j
`j + 1
`
`(cid:22)Xj ;
`
`var (Xj)+
`
`j
`j (cid:0) 1
`
`(cid:22)X 2
`j (cid:0)
`
`j + 1
`j
`
`(cid:22)X 2
`j+1:
`
`j (cid:0) 1
`j
`
`Xj+1+
`
`1 j
`
`var (Xj+1) =
`
`where (cid:22)Xj and var (Xj ) are the mean and variance, respectively,
`of the ﬁrst j samples of data. However, even for more difﬁcult
`statistics, such as quantiles, there are a number of approximation
`algorithms [18] that can be used to approximate the statistic on-line.
`The variables Xj could represent packet size, or inter-arrival time,
`or other features, and so we can generate a moderately large number
`of statistics even with the limitation of on-line computation. Some
`statistics need only be computed at the start and end of the TCP
`connection — for instance, the duration, which we may compute
`by including the start time of connection i in the state variables
`(cid:30)(i).
`Likewise, it is appealing to be able to update the statistics of each
`aggregate recursively, but this is not necessary, as it is much easier
`to store one set of statistics per connection than per packet. If the
`statistics for each connection are stored, then we could alternatively
`compute the statistics per aggregate off-line, after the data collec-
`tion.
`
`138
`
`Splunk Inc. Exhibit 1034 Page 4
`
`

`to map the classes/rules to known trafﬁc and servers, and ensure
`that the known policy based rules were enforced in priority to those
`derived above. Think of this as our disgruntled business class pas-
`senger updating his administrator’s list of important people, so that
`he receives the air transport he deserves in the future. In essence,
`the technique above is intended to ﬁll in the unknown gaps in the
`best possible way.
`A side beneﬁt of the above approach is that the collected statis-
`tics can be used in other ways. For instance, the trafﬁc volume per
`aggregate could be useful in planning the required capacity for each
`QoS class, or the network in general. In fact, given a set of prices
`per class, the measured volumes, and a set of utilities per applica-
`tion class, one could create the mapping from the application class
`Cj to QoS class Qk through an optimization process using trafﬁc
`data, rather than as a set of ﬁxed rules.
`Once monitors are installed, there is no reason one could not use
`them in an on-going manner. One could continue to make measure-
`ments (as above) of the statistics per aggregate, and if something
`changes signiﬁcantly, then one could change the rules used. An
`example might be the introduction of a new application with differ-
`ent QoS characteristics, requiring a rule update. It is not, however,
`envisioned that these updates would occur often.
`
`4. COS CLASSIFICATION
`We next present the different components in our classiﬁcation
`approach: identifying different application classes, selecting candi-
`date classiﬁcation features, and ﬁnally speciﬁc classiﬁcation meth-
`ods.
`
`4.1 Class Deﬁnitions
`In practice, service differentiation mechanisms like diffserv to-
`day only allow a relatively small number of application classes.
`For our initial study, we focus on the following 4 broad applica-
`tion classes, commonly found in corporate networks.
`
`(cid:15) Interactive: The interactive class contains trafﬁc which is re-
`quired by a user to perform multiple real-time interactions with
`a remote system. This class includes such applications as remote
`login sessions or an interactive Web interface.
`
`(cid:15) Bulk data transfer: The bulk data transfer class contains trafﬁc
`which is required to transfer large data volumes over the net-
`work without any realtime constraints. This class includes ap-
`plications such as FTP, software updates, and music or video
`downloads (say through an application such as Kazaa).
`
`(cid:15) Streaming: The streaming class contains multimedia trafﬁc with
`realtime constraints. This class includes such applications as
`streaming and video conferencing.
`
`(cid:15) Transactional: The transactional class contains trafﬁc which is
`used in a small number of request response pairs which can be
`combined to represent a transaction. DNS, and Oracle transac-
`tions belong to this class.
`
`The choices were motivated by the need to select a small num-
`ber of classes that would be simple, intuitive and still adequately
`represent the different QoS requirements of commonly used appli-
`cations. We view these 4 classes as a starting point, with the actual
`choice of application classes being a topic for research.
`
`To characterize each application class we need a reference data
`set for each class from which we can extract a set of representative
`features. Acquiring such a reference data set is made difﬁcult by
`precisely the problems described in Section 2. Selecting the net-
`work trafﬁc based on port numbers may not yield reliable statistics
`that are representative of any particular class, however, to classify
`them otherwise requires a reference data set. To break this circular
`dependency we selected some applications per class, which based
`on their typical use, have a low likelihood of being contaminated
`by trafﬁc belonging to another application class. In particular we
`focus on applications which:
`
`(cid:15) are clearly within one class (to avoid mixing the statistics
`from two classes);
`
`(cid:15) are widely used, so as to assure we get a good data-set;
`(cid:15) have server ports in the well known port range to reduce the
`chance of mis-use of these ports.
`
`These reference applications will then be used to estimate a num-
`ber of statistics, from which we will select features for use in CoS
`categorization. Based on the criterion established in the previous
`section, the reference applications selected for each application class
`are:
`
`(cid:15) Interactive: Telnet,
`(cid:15) Bulk data: FTP-data, Kazaa,
`(cid:15) Streaming: RealMedia streaming,
`(cid:15) Transactional: DNS, HTTPS.
`
`We include HTTPS in the transactional class, because a large
`proportion of HTTPS interactions involve users ﬁlling out a form,
`for instance to conduct secure credit card purchases over the WWW.
`In our current study, we do not use web trafﬁc to train the classi-
`ﬁcation. However we do include WWW trafﬁc statistics (as cap-
`tured using port 80) in some plots as an example application, as it
`provides some interesting intuition into the results. Also, although
`Kazaa does not have registered ports, in particular data sets we have
`reliable ways of identifying Kazaa trafﬁc (as describe below).
`4.2 Candidate Features
`The list of possible features one could consider is very large. We
`can broadly classify these into ﬁve categories:
`1. Simple packet-level features such as mean packet size and
`various moments such as variance, RMS (root mean square) size,
`etc., are simple to compute, and can be gleaned directly from packet-
`level information. They offer a characterization of the application
`that is independent of the notion of ﬂows, connections or other
`higher level aggregations. Another advantage is that packet-level
`sampling, which is widely used in network data collection, has lit-
`tle impact on these statistics. Other statistics that can be derived
`from simple packet data are time series, from which we could de-
`rive a number of features, for instance relating to correlations over
`time (e.g. parameters of long-range dependence such as the Hurst
`parameter). An example of this type of classiﬁcation can be seen in
`[23], where the authors use time-of-day trafﬁc proﬁles to categorize
`web sites.
`2. Flow-level statistics are summary statistics at the grain of net-
`work ﬂows. A ﬂow is deﬁned to be an unidirectional sequence
`
`139
`
`Splunk Inc. Exhibit 1034 Page 5
`
`

`set of general rules for classifying future data on the basis of a
`feature vector. Each training data point consists of a pair Xj 2
`IRM ; Gj 2 f1; : : : ; Kg, where Xj is the feature vector, and Gj is
`the class of the jth data point, and our rule should provide a map-
`ping ^G : IRM ! f1; : : : ; Kg.
`One very simple method of classiﬁcation is NN. In this classiﬁ-
`cation method we assign a new data point to the class of its nearest
`(in this paper we shall use Euclidean distance as the metric for dis-
`tance) neighbor from the training data. That is we take ^G(X) to
`be the class Gi of the data point Xi which minimizes the distance
`jjXi (cid:0) Xjj. NN methods can be generalized to k-NN (where the
`k nearest neighbors essentially ‘vote’ on the class of the observa-
`tion), to enhance its robustness. k-NN methods are generally very
`good on low-dimensional data (small M ), but are less effective on
`high-dimensional data, and give little insight into the most useful
`features.
`An alternative set of approaches can be drawn from statistical
`decision theory, in which we choose ^G(X) = Gi if
`
`Pr(Gijx = X) = max
`g2f1;

This document is available on Docket Alarm but you must sign up to view it.

Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

Up-to-date information for this case.
Email alerts whenever there is an update.
Full text search for other cases.
Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.

Access Government Site

We are redirecting you
to a mobile optimized page.

Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket

Supplemental Search

Search for PTAB Motions

PTAB Analytics

TTAB Analytics

Basic Search

Filters

Party Search

Advanced

Selected Courts

Recently Selected Courts

Find PTAB Decisions

PTAB Analytics

Special PTAB Alerts

Orange Book

Directly Search Federal Courts

Search Trademark ...

This document is available on Docket Alarm but you must sign up to view it.

Accessing this document will incur an additional charge of $.

Still Working On It

A few More Minutes ... Still Working

This document could not be displayed.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

One Moment Please

Your document is on its way!

Sealed Document

We are redirecting youto a mobile optimized page.

Document Unreadable or Corrupt

We are unable to display this document.

STEP 2 of 2

Choose your membership type

Flat-Fee

Pay-As-You-Go

Add your payment information

Login or Join

Enter your corporate Email

Thousands of your peers are saving time and gaining a competitive advantage with Docket Alarm.

Join Docket Alarm to perform smarter legal research.

Download this document and millions of others instantly with a Docket Alarm membership.

Join Docket Alarm and start performing smarter legal research.

Start tracking this docket instantly with a Docket Alarm membership.

Join thousands of your peers and start performing smarter legal research.

STEP 1 of 2

Millions of Documents | 15 Seconds to Signup

Hi !

Welcome to Docket Alarm

Welcome to Docket Alarm!

Explore Litigation Insights andManage Your Cases

Reset Password

What is PACER?

Why do I need it?

What will I be charged?

Do other courts have fees?

Basic Free Access

Welcome

Thank you

Check Firm Account

We are redirecting you
to a mobile optimized page.

Explore Litigation Insights and
Manage Your Cases