`
`(12) United States Patent
`Wei et al.
`
(10) Patent No.: US 8,693,348 B1
(45) Date of Patent: Apr. 8, 2014
`
`(54) SYSTEMS AND METHODS FOR CONTENT
`TYPE CLASSIFICATION
`
`(71) Applicant: Fortinet, Inc., Sunnyvale, CA (US)
`(72) Inventors: Shaohong Wei, Sunnyvale, CA (US);
`Zhong Qiang Chen, Sunnyvale, CA
`(US); Ping Ng, Milpitas, CA (US); Gang
`Duan, San Jose, CA (US)
(73) Assignee: Fortinet, Inc., Sunnyvale, CA (US)
`(*) Notice:
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 0 days.
`(21) Appl. No.: 14/087,847
`
(22) Filed: Nov. 22, 2013
`Related U.S. Application Data
(63) Continuation of application No. 13/795,283, filed on
Mar. 12, 2013, which is a continuation of application
No. 13/409,141, filed on Mar. 1, 2012, now Pat. No.
8,639,752, which is a continuation of application No.
12/503,100, filed on Jul. 15, 2009, now Pat. No.
8,204,933, which is a continuation of application No.
11/357,654, filed on Feb. 16, 2006, now Pat. No.
7,580,974.
`
(51) Int. Cl.
H04L 12/26 (2006.01)
(52) U.S. Cl.
CPC ...................................... H04L 43/18 (2013.01)
USPC .......................................................... 370/241
`(58) Field of Classification Search
`None
`See application file for complete search history.
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
5,361,379 A        11/1994  White
6,157,955 A        12/2000  Narad et al.
6,608,816 B1        8/2003  Nichols
6,665,725 B1       12/2003  Dietz et al.
7,006,502 B2        2/2006  Lin
7,082,102 B1        7/2006  Wright
7,095,715 B2        8/2006  Buckman et al.
7,242,681 B1*       7/2007  Van Bokkelen et al. ...... 370/389
7,420,992 B1        9/2008  Fang et al.
7,580,974 B2        8/2009  Wei et al.
7,945,522 B2        5/2011  Mcgovern et al.
8,204,933 B2        6/2012  Wei et al.
8,639,752 B2        1/2014  Wei et al.
2003/0012147 A1     1/2003  Buckman et al.
2004/0002930 A1     1/2004  Oliver et al.
2004/0261016 A1    12/2004  Glass et al.
2005/024925 A1*    11/2005  Yoon et al. .................... 370/252
2006/0112043 A1     5/2006  Oliver et al.
2006/0229902 A1    10/2006  Mcgovern et al.
2006/0239273 A1    10/2006  Buckman et al.
2007/0192481 A1     8/2007  Wei et al.
2008/0052326 A1     2/2008  Evanchik et al.
2009/0268617 A1    10/2009  Wei et al.
`(Continued)
`
`OTHER PUBLICATIONS
`
“U.S. Appl. No. 11/357,654, 312 Amendment filed Jun. 23, 2009”, 5
pgs.
`
`(Continued)
`
`Primary Examiner — Kevin C Harper
`(74) Attorney, Agent, or Firm — Schwegman Lundberg &
`Woessner, P.A.
`
`(57)
`
`ABSTRACT
`
Various embodiments illustrated and described herein
include systems, methods and software for content type
classification. Some such embodiments include determining a
potential state of classification for packets associated with a
session based at least in part on a packet associated with the
session that is a packet other than the first packet of the
session.
`
`20 Claims, 7 Drawing Sheets
`
[Representative drawing: timeline (TIME axis) listing packets P11-P15 and P21 by session, each with its state of classification; the remaining labels are only partly legible.]
`Splunk Inc. Exhibit 1046 Page 1
`
`
`
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
2012/0023557 A1*    1/2012  Bevan et al. ...................... 726/4
2012/0163186 A1     6/2012  Wei et al.
2013/0258863 A1    10/2013  Wei et al.
`
`OTHER PUBLICATIONS
`
“U.S. Appl. No. 11/357,654, Non-Final Office Action mailed Sep. 18,
2008”, 10 pgs.
“U.S. Appl. No. 11/357,654, Notice of Allowance mailed Mar. 23,
2009”, 10 pgs.
“U.S. Appl. No. 11/357,654, Response filed Dec. 18, 2008 to Non-Final
Office Action mailed Sep. 18, 2008”, 8 pgs.
“U.S. Appl. No. 11/357,654, Response to Rule 312 Communication
mailed Jul. 23, 2009”, 2 pgs.
“U.S. Appl. No. 12/503,100, Final Office Action mailed Jun. 9,
2011”, 12 pgs.
“U.S. Appl. No. 12/503,100, Non-Final Office Action mailed Oct. 18,
2010”, 12 pgs.
“U.S. Appl. No. 12/503,100, Notice of Allowance mailed Feb. 17,
2012”, 13 pgs.
“U.S. Appl. No. 12/503,100, Response Filed Feb. 9, 2012 to Final
Office Action Jun. 9, 2011”, 8 pgs.
“U.S. Appl. No. 12/503,100, Response filed Mar. 18, 2011 to Non-Final
Office Action mailed Oct. 18, 2010”, 10 pgs.
“U.S. Appl. No. 12/503,100, Supplemental Notice of Allowability
mailed Apr. 25, 2012”, 9 pgs.
“U.S. Appl. No. 13/409,141, Non Final Office Action mailed Aug. 8,
2013”, 6 pgs.
“U.S. Appl. No. 13/409,141, Notice of Allowance mailed Sep. 18,
2013”, 9 pgs.
“U.S. Appl. No. 13/409,141, Response filed Sep. 3, 2013 to Office
Action mailed Aug. 8, 2013”, 5 pgs.
“Application U.S. Appl. No. 13/409,141, Supplemental Notice of
Allowability mailed Dec. 20, 2013”, 5 pgs.
`* cited by examiner
`
`
`
`
[Sheet 1 of 7. FIG. 1: block diagram of a system that includes a data classification module. FIG. 2: flowchart of the classification method, with boxes reading "Determine whether session has been classified", "Classify packet to be the same type as that for the session", "Type 'unknown'", and "Classify packet as the determined type".]
`
`
`
`
`
`
`
`
`
`
`
[Sheet 2 of 7. FIGS. 3A-3C: tables of example classification criteria; the table contents are not legible in the extracted text.]
`
`
`
[Sheet 3 of 7. FIGS. 3D-3E: tables of example classification criteria (Examples 4 and 5); the table contents are not legible in the extracted text.]
`
`
`
[Sheet 4 of 7. Drawing content not present in the extracted text.]
`
`
`
[Sheet 5 of 7. Drawing content not legible in the extracted text.]
`
`
`
[Sheet 6 of 7. Tables of example classification criteria (Examples 8 and 9), including FIG. 3I; the table contents are not legible in the extracted text.]
`
`
`
[Sheet 7 of 7. FIG. 4: timeline (TIME axis) listing packets P11-P15 and P21-P23 by session, each with its state of classification. FIG. 5: computer hardware system with blocks labeled "Display", "Cursor Control", "Equipment", and "Communications Interface"; reference numerals only partly legible.]
`
`
`
`SYSTEMS AND METHODS FOR CONTENT
`TYPE CLASSIFICATION
`
`RELATED APPLICATIONS
`
This application is a Continuation of U.S. application Ser.
No. 13/795,283, filed Mar. 12, 2013; which was a Continuation
of U.S. application Ser. No. 13/409,141, filed Mar. 1,
2012; which was a Continuation of U.S. application Ser. No.
12/503,100, filed Jul. 15, 2009, and issued as U.S. Pat. No.
8,204,933 on Jun. 19, 2012; and which was a Continuation of
U.S. application Ser. No. 11/357,654, filed Feb. 16, 2006, and
issued as U.S. Pat. No. 7,580,974 on Aug. 25, 2009, to each of
which priority is claimed and each of which are incorporated
herein by reference in their entirety.
`
`
`FIELD
`
`The field of the invention relates to computer systems and
`computer networks, and more particularly, to systems and
`methods for classifying content of computer and network
`traffic.
`
BACKGROUND

Many data processing systems require a content type of
data to be determined before the data can be further processed.
For example, in malicious content detection systems,
such as anti-virus systems and anti-spam systems, a received
data generally needs to be classified before it can be scanned
for malicious content. Intrusion detection/prevention systems,
application-based traffic shaping devices or load balancers,
IM proxies, and application accelerators may also
require data to be classified. If the data is classified to be a
skype data, then a content detection module may apply a set
of algorithms to scan the data for malicious content. On the
other hand, if the data is classified to be a bittorrent data, then
the content detection module may apply a different set of
algorithms to scan the data for malicious content. As such,
determining content type of data is an important step before
the data is scanned.
Existing systems determine content type by using port
number of a port at which data is transmitted. For example,
well-known port for HTTP protocol is “80,” well-known port
for SMTP protocol is “25,” and well-known port for POP3
protocol is “110.” In such systems, data belonging to a certain
type is transmitted to a dedicated port. As such, by determining
the port number of the port at which data is transmitted,
and knowing the content type that is associated with the port
number, a system can determine the content type for the data.
However, use of a port to transmit only one type of data is
restrictive. Sometimes, it may be desirable to allow a port to
transmit more than one type of data. Existing systems do not
allow a content type to be determined if data is transmitted
through a port that is not data type specific (i.e., port that is
allowed to transmit more than one type of data).
Also, some type of data, such as IM data and P2P data, may
not go to any specific port, and can be transmitted through
different ports. In such cases, existing systems may not be
able to classify IM data and P2P data using port number.

SUMMARY

In accordance with some embodiments, a method for determining
a type of content includes receiving a first packet,
determining a state of classification for the first packet or for
a session with which the first packet is associated, receiving a
second packet, and determining a content type for the second
packet based at least in part on the determined state. As used
in this specification, the term “first packet” refers to any one
of the packets in a session (e.g., it can be the first, second,
third, fourth, etc. packet in a session), and does not necessarily
mean the very first packet in a session (although it could be
used to refer to the very first packet in a session). Similarly, as
used in this specification, the term “second packet” refers to
any one of the packets in a session (e.g., it can be the first,
second, third, fourth, etc. packet in a session) that is different
from the first packet.
In accordance with other embodiments, a computer product
includes a computer-readable medium, the computer-readable
medium having a set of stored instructions, an
execution of which causes a process to be performed, the
process includes receiving a first packet, determining a state
of classification for the first packet or for a session with which
the first packet is associated, receiving a second packet, and
determining a content type for the second packet based at
least in part on the determined state.
In accordance with other embodiments, a system for determining
a type of content includes means for receiving a first
packet and a second packet, means for determining a state of
classification for the first packet or for a session with which
the first packet is associated, and means for determining a
content type for the second packet based at least in part on the
determined state.
In accordance with other embodiments, a method for determining
a type of content includes receiving a packet associated
with a session, determining whether a content type has
been determined for the session or for an other packet associated
with the session, and classifying the packet to be the
content type based at least in part on a result from the act of
determining.
In accordance with other embodiments, a computer product
includes a computer-readable medium, the computer-readable
medium having a set of stored instructions, an
execution of which causes a process to be performed, the
process includes receiving a packet associated with a session,
determining whether a content type has been determined for
the session or for an other packet associated with the session,
and classifying the packet to be the content type based at least
in part on a result from the act of determining.
In accordance with other embodiments, a system for determining
a type of content includes means for receiving a
packet associated with a session, means for determining
whether a content type has been determined for the session or
for an other packet associated with the session, and means for
classifying the packet to be the content type based at least in
part on a result from the act of determining.
In accordance with other embodiments, a method for determining
a type of content includes receiving a first packet from
a first port, the first port adapted for receiving at least two
types of content, and determining a content type for the first
packet or for a session with which the first packet is associated.
In accordance with other embodiments, a computer product
includes a computer-readable medium, the computer-readable
medium having a set of stored instructions, an
execution of which causes a process to be performed, the
process includes receiving a first packet from a first port, the
first port adapted for receiving at least two types of content,
and determining a content type for the first packet or for a
session with which the first packet is associated.
In accordance with other embodiments, a system for determining
a type of content includes means for receiving a first
packet from a first port, the first port adapted for receiving at
least two types of content, and means for determining a content
type for the first packet or for a session with which the
first packet is associated.
In accordance with other embodiments, a method for determining
a type of content includes receiving a packet associated
with a session, and determining a state of classification
for the packet or the session.
In accordance with other embodiments, a computer product
includes a computer-readable medium, the computer-readable
medium having a set of stored instructions, an
execution of which causes a process to be performed, the
process includes receiving a packet associated with a session,
and determining a state of classification for the packet or the
session.
In accordance with other embodiments, a system for determining
a type of content includes means for receiving a
packet associated with a session, and means for determining
a state of classification for the packet or the session.
Other aspects and features will be evident from reading the
following detailed description of the preferred embodiments,
which are intended to illustrate, not limit, the invention.
`
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
The drawings illustrate the design and utility of various
embodiments, in which similar elements are referred to by
common reference numerals. More particular descriptions
will be rendered by reference to specific embodiments, which
are illustrated in the accompanying drawings. Understanding
that these drawings are not to be considered limiting in scope,
the embodiments will be described and explained with additional
specificity and detail through the use of the accompanying
figures.
FIG. 1 illustrates a block diagram representing a system
that includes a module for classifying data in accordance with
some embodiments;
FIG. 2 illustrates a method for classifying data in accordance
with some embodiments;
FIGS. 3A-3I illustrate examples of criteria that may be
used to determine content type in accordance with some
embodiments;
FIG. 4 illustrates examples of results obtained using the
method of FIG. 2 in accordance with some embodiments; and
FIG. 5 is a diagram of a computer hardware system with
which embodiments described herein can be implemented.
`
`
`DETAILED DESCRIPTION
`
Various embodiments are described hereinafter with reference
to the figures. It should be noted that the figures are not
drawn to scale and that elements of similar structures or
functions are represented by like reference numerals throughout
the figures. It should also be noted that the figures are only
intended to facilitate the description of specific embodiments,
and are not intended as an exhaustive description of the invention,
or as a limitation on the scope of the invention. In
addition, an illustrated embodiment need not have all the
aspects or advantages of the invention shown. An aspect or an
advantage described in conjunction with a particular embodiment
is not necessarily limited to that embodiment and can be
practiced in any other embodiments even if not so illustrated.
`FIG. 1 illustrates a block diagram of a system 100, which
`includes a data classification module 110 for classifying data
`into content type in accordance with some embodiments.
`Sender 102 transmits data associated with network traffic
`content to module 110. Module 110 receives the transmitted
`data, determines a type of content to which the network traffic
data belongs (i.e., classifies the data), and causes a result, such
as a message, to be sent to a receiver 104. The message sent to
receiver 104 notifies the receiver 104 that the received data
belongs to a content type. As used in this specification, the
term “receiver” should not be limited to a human receiver, and
can include a server or other types of devices that can receive
information. For example, in some embodiments, the receiver
104 can be a malicious content detection module, such as an
anti-virus module, which detects malicious content based on
a content type as determined by the module 110. Also, as used
in this specification, the term “sender” should not be limited
to a human sender, and can include a server or other types of
devices that can transmit information.
In some embodiments, module 110 can be implemented
using software. For example, module 110 can be implemented
using software that is loaded onto a user's computer,
a server, or other types of memory, such as a disk or a CD-ROM.
In some cases, module 110 can be implemented as web
applications. In alternative embodiments, module 110 can be
implemented using hardware. For example, in some embodiments,
module 110 includes an application-specific integrated
circuit (ASIC), such as a semi-custom ASIC processor
or a programmable ASIC processor. ASICs, such as those
described in Application-Specific Integrated Circuits by
Michael J. S. Smith, Addison-Wesley Pub Co. (1st Edition,
June 1997), are well known in the art of circuit design, and
therefore will not be described in further detail herein. In
other embodiments, module 110 can also be any of a variety
of circuits or devices that are capable of performing the functions
described herein. For example, in alternative embodiments,
module 110 can include a general purpose processor,
such as a Pentium processor. In other embodiments, module
110 can be implemented using a combination of software and
hardware. In some embodiments, module 110 may be implemented
as a firewall, a component of a firewall, or a component
that is configured to be coupled to a firewall.
FIG. 2 illustrates a method 200 for classifying data in
accordance with some embodiments. First, module 110
receives network traffic data in a form of a packet (Step 202).
Next, module 110 determines a session S for the received
packet (Step 204). A session is an interaction or a series of
interactions between two communication end points. Various
techniques may be used to determine a session. For example,
module 110 can be configured to determine one or more of a
source IP address, a destination IP address, a source port, a
destination port, and a protocol, to thereby determine a session
S for the received packet. Techniques for determining a
session are known in the art, and will not be described in
detail.
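As an illustration only, the session lookup of Step 204 might be sketched in Python as follows. The names `SessionKey` and `session_key` are hypothetical and do not appear in the specification, and sorting the two endpoints so that both directions of a flow share one key is an assumption of this sketch rather than a stated requirement.

```python
from typing import NamedTuple


class SessionKey(NamedTuple):
    """Hypothetical session identifier built from the five values named in the text."""
    protocol: str
    ip_a: str
    port_a: int
    ip_b: str
    port_b: int


def session_key(src_ip: str, src_port: int,
                dst_ip: str, dst_port: int, protocol: str) -> SessionKey:
    # Sort the two endpoints so that client-to-server and server-to-client
    # packets yield the same key (an assumption of this sketch).
    (ip_a, port_a), (ip_b, port_b) = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return SessionKey(protocol, ip_a, port_a, ip_b, port_b)
```

A key of this kind can index a table that holds the state of classification for each session.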
Next, module 110 determines whether a content type has
already been determined for the session S (and therefore, for
the packet associated with the session S) (Step 205). In the
illustrated embodiments, each session (e.g., session S) being
processed by module 110 is automatically assigned an initial
state of classification, “unknown” (i.e., the content type is
initially determined as “unknown”). In such cases, in step
205, module 110 determines whether a content type other
than “unknown” has been determined for session S. If a
content type other than “unknown” type has already been
determined for the session S, then module 110 classifies the
packet to be the same type as that for the classified session S,
and the method returns to step 202 to process additional
packet, if any (Step 206). In some embodiments, the classifying
of the packet (determining the content type for the
packet) may be implemented by associating the packet with
the classified session S.
On the other hand, if a content type (other than “unknown”
type) has not been previously determined for the session S,
module 110 then analyzes the received packet to attempt to
determine a content type (Step 207). If a content type (other
than “unknown” type) is determined for the received packet,
module 110 then classifies the received packet (or its associated
session S) as having the determined content type (Step
208). If there is an additional packet, module 110 then
receives the additional packet, and repeats the process 200 to
process the additional packet.
Alternatively, if after step 207, the content type remains
“unknown” (e.g., because the analysis of the packet provides
an inconclusive result), then module 110 receives additional
packet that is associated with the same session S, and analyzes
the additional packet to attempt to determine a content
type for data being transmitted in the session S (repeating
Steps 202-207, or Steps 202 and 207), until a content type
other than “unknown” is determined for the session S.
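The flow of Steps 205-208 can be sketched as a small per-session table, offered here as a hedged illustration rather than as the implementation of module 110. The class `SessionClassifier` and the `analyze` callback are hypothetical names, and the session identifier is assumed to have been obtained already in Step 204.

```python
UNKNOWN = "unknown"


class SessionClassifier:
    """Hypothetical sketch of Steps 205-208 of method 200."""

    def __init__(self, analyze):
        # analyze(session_id, packet) returns a content type or UNKNOWN;
        # it stands in for the packet analysis of Step 207.
        self.analyze = analyze
        self.session_type = {}  # session id -> content type

    def classify(self, session_id, packet):
        # Every session starts in the initial state "unknown".
        current = self.session_type.setdefault(session_id, UNKNOWN)
        if current != UNKNOWN:
            # Steps 205-206: the session is already classified, so the
            # packet takes the session's type without further analysis.
            return current
        result = self.analyze(session_id, packet)   # Step 207
        if result != UNKNOWN:
            self.session_type[session_id] = result  # Step 208
        return result
```

In this sketch, a packet that yields an inconclusive result leaves the session in the "unknown" state, so the next packet of the same session is analyzed again.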
As shown in the above embodiments, module 110 receives
packet(s) in step 202, analyzes the packet(s) in step 207, and
repeats these two steps until it determines a content type for
the session in which the packet(s) is being transmitted.
Examples of content type that may be determined by module
110 include skype, gnutella, kazaa, edonkey, bittorrent, aim,
yahoo, msn, icq, qq, http, smtp, pop3, imap, ftp, bo2k, bo, tfn,
tfn2k, ldap, radius, msrpc, snmp, mssql, mysql, and oracle.
Various techniques may be used to analyze received packet(s)
to determine a content type for the packet(s) being transmitted
in a session. For example, module 110 may be configured to
examine one or more characteristics of a packet, such as a
packet size, a port number of a port from which the packet is
received, whether a proxy is used to transmit the packet, a
direction in which the packet travels (e.g., from client to
server, or from server to client), a string pattern, order of
packets, and/or other protocol characteristics.
Several examples of techniques for analyzing packets to
determine content type will now be discussed. However, it
should be understood that module 110 is not limited to using
the examples of techniques described herein, and that module
110 can use other algorithms, techniques, and criteria to perform
the functions described herein. FIG. 3A illustrates an
example of criteria that may be used to determine whether
data transmitted at a normal port is skype data. As shown in
the example, module 110 is configured to examine the first
packet that is transmitted from client to server, and determine
whether the first packet has a prescribed packet size (in the
example, prescribed packet size=14). If the payload size
matches the prescribed packet size, then module 110 determines
that the session is a candidate of skype type. However,
module 110 still classifies the session as “unknown” because
the result is inconclusive. The module 110 then determines
the payload size of a second packet from server to client, and
determines whether the size satisfies the prescribed criteria
(in the example, the prescribed size criteria is 28-36). If there
is no match (i.e., the size of the second packet does not match
the prescribed size criteria), then module 110 determines that
the session is not a skype type, and any further packets
received in the same session would not be considered as a
skype type. However, if there is a match, the module 110 still
determines that the session is a candidate of skype type, and
maintains the state of classification as “unknown.” Module
110 next determines the payload size of a third packet from
client to server, and determines whether the packet size
matches the prescribed size criteria (in the example, the prescribed
size criteria=14). If there is a match, then module 110
determines that data transmitted in the session are skype data.
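The three-packet check described for FIG. 3A can be sketched as a small sequence matcher. This is a minimal illustration under stated assumptions: the class `PacketSequenceMatcher` and its constants are hypothetical names, packets are counted in arrival order, and a mismatch on the first or third packet is treated as ruling the type out, which the text states explicitly only for the second packet.

```python
UNKNOWN, REJECTED = "unknown", "rejected"
C2S, S2C = "c2s", "s2c"  # client-to-server, server-to-client

# (direction, minimum payload size, maximum payload size) per packet, using the
# sizes quoted in the text for the FIG. 3A example: 14, then 28-36, then 14.
SKYPE_NORMAL_PORT = [(C2S, 14, 14), (S2C, 28, 36), (C2S, 14, 14)]


class PacketSequenceMatcher:
    """Hypothetical matcher that confirms a type after a run of matching packets."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = steps
        self.index = 0          # next step to check
        self.ruled_out = False

    def feed(self, direction, payload_size):
        if self.ruled_out:
            return REJECTED     # later packets are no longer considered
        if self.index >= len(self.steps):
            return self.name    # already confirmed
        want_dir, lo, hi = self.steps[self.index]
        if direction != want_dir or not (lo <= payload_size <= hi):
            self.ruled_out = True
            return REJECTED
        self.index += 1
        # Still only a candidate until every step has matched.
        return self.name if self.index == len(self.steps) else UNKNOWN
```

While the matcher reports "unknown", the session remains a candidate; only the final matching packet changes the result to the type name.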
`
FIG. 3B illustrates an example of criteria that may be used
to determine whether data transmitted at an http port is skype
data. FIG. 3C illustrates an example of criteria that may be
used to determine whether data transmitted at an ssl port is
skype data. FIG. 3D illustrates an example of criteria that may
be used to determine whether UDP traffic data is skype data.
FIG. 3E illustrates an example of criteria that may be used
to identify Yahoo! messenger traffic (through normal Yahoo
login). In such cases, instead of determining a size of the
packet, module 110 is configured to examine the content and
string pattern. As shown in the example, two packets are used
to determine that the traffic data is Yahoo! messenger traffic.
FIG. 3F illustrates an example of criteria that may be used to
identify Yahoo! messenger traffic (through http proxy). As
shown in the example, three packets are used to determine
that the traffic data is Yahoo! messenger traffic.
FIG. 3G illustrates an example of criteria that may be used
to identify msn messenger. In such cases, module 110 is
configured to examine payload size and pattern in the payload.
As shown in the example, six packets are used to determine
that the traffic data is msn messenger traffic.
FIG. 3H illustrates an example of criteria that may be used
to identify bittorrent data. As shown in the example, two
packets are used to determine that the traffic data is bittorrent
data.
FIG. 3I illustrates an example of criteria that may be used
to identify kazaa data. As shown in the example, two packets
are used to determine that the traffic data is kazaa traffic.
As shown in the above examples, examining more than one
packet within a session is advantageous in that it greatly
increases accuracy, and eliminates false detection.
It should be noted that module 110 is not limited to using
the examples of criteria described previously, and that module
110 can use other criteria for determining content type in
other embodiments. In some embodiments, a user interface
can be provided that allows an administrator to select criteria
or parameters for determining content type. For example,
module 110 can allow an administrator to input packet size,
port number, prescribed string pattern, classifier, and other
parameters that may be used to determine a content type. In
some embodiments, the user interface also allows an administrator
to create customized criteria to detect certain content
type.
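One way to picture administrator-supplied criteria is as plain data records that the classifier evaluates against each packet. The sketch below is an assumption about how such parameters might be stored; the class `PacketCriterion`, its field names, and its defaults are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PacketCriterion:
    """Hypothetical record of administrator-entered parameters for one packet."""
    direction: str                    # "c2s" or "s2c"
    min_size: int = 0                 # smallest acceptable payload size
    max_size: int = 65535             # largest acceptable payload size
    pattern: Optional[bytes] = None   # byte string that must occur in the payload
    port: Optional[int] = None        # None means any port

    def matches(self, direction: str, payload: bytes, port: int) -> bool:
        if direction != self.direction:
            return False
        if not (self.min_size <= len(payload) <= self.max_size):
            return False
        if self.port is not None and port != self.port:
            return False
        return self.pattern is None or self.pattern in payload
```

A customized content type could then be expressed as an ordered list of such records, one per packet to be examined.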
FIG. 4 illustrates an example of results obtained using the
process 200 of FIG. 2. After module 110 receives packet P11
(Step 202), module 110 determines that the packet P11 is
associated with session S1 (Step 204). Packet P11 is the first
packet in session S1, which has not been previously classified.
Module 110 processes the packet P11 in an attempt to
determine a content type (i.e., in an attempt to classify the
session/packet) (Step 207). In the illustrated example, the
analysis of packet P11 indicates that the packet P11 could be
one of three content types T1, T2, and T3, and therefore,
provides a result that is inconclusive. As a result, the content
type of the session S1 remains “unknown” (i.e., the state of
classification is “unknown” with T1, T2, and T3 being possible
candidates).
`Module 110 next receives another packet P12 (Step 202),
`and determines that the packet P12 is associated with the
`same session S1 (Step 204). Module 110 determines that the
`session S1 has not been classified (Step 205), and processes
`the packet P12 in an attempt to determine a content type (Step
`207). In the illustrated example, the analysis of packet P12
`indicates that data transmitted in session S1 does not belong
`to content type T2 (e.g., a packet size of P12 may not match a
`prescribed criteria for type T2), and is therefore, one of two
`remaining content types T1 and T3. Because the content type
determination is inconclusive, the content type of the session
S1 remains “unknown” (i.e., the state of classification is
“unknown” with T1 and T3 being possible candidates).
Module 110 next receives another packet P13 (Step 202),
and determines that the packet P13 is associated with the
same session S1 (Step 204). Module 110 determines that the
session S1 has not been classified (Step 205), and processes
the packet P13 in an attempt to determine a content type (Step
207). In the illustrated example, the analysis of packet P13
indicates that data transmitted in session S1 is content type
T3. As a result, module 110 classifies the session S1 (and
therefore, its associated data) to be content type T3 (Step
208). In such cases, the state of classification for the session
S1 is changed from “unknown” to “classified,” with the classified
content type being T3.
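The progression for packets P11 through P13 can be modeled as a set of candidate types that shrinks with each packet. The sketch below is one possible reading, not the patented method itself: the helper names are hypothetical, the three size tests are toy predicates invented so that the run reproduces the T1/T2/T3 narrative, and treating a single remaining candidate as "classified" is an assumption of this model.

```python
# Toy acceptance tests keyed by content type; the predicates are invented
# purely so the example follows the P11, P12, P13 narrative.
ACCEPTS = {
    "T1": lambda size: size % 2 == 0,
    "T2": lambda size: size < 20,
    "T3": lambda size: size > 10,
}


def narrow(candidates, packet_size, accepts):
    """Keep only the candidate types whose test accepts this packet."""
    return frozenset(t for t in candidates if accepts[t](packet_size))


def state_of(candidates):
    """Return the state of classification and, once classified, the type."""
    if len(candidates) == 1:
        (content_type,) = candidates
        return ("classified", content_type)
    return ("unknown", None)


candidates = frozenset(ACCEPTS)
history = []
for size in (14, 30, 31):  # stand-ins for packets P11, P12, P13
    candidates = narrow(candidates, size, ACCEPTS)
    history.append((sorted(candidates), state_of(candidates)))
```

In this run the first packet leaves all three types possible, the second removes T2, and the third leaves only T3, at which point the state changes from "unknown" to "classified".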
Module 110 next receives another packet P14 (Step 202),
and determines that the packet P14 is associated with the
same session S1 (Step 204). Module 110 determines that the
session S1 has already been classified (Step 205), and therefore,
classifies the packet P14 to be type T3 (Step 206). As
shown in the example, after the session S1 has been classified,
all subsequent received p