`(12) United States Patent
`Wei et al.
`
(10) Patent No.: US 8,693,348 B1
(45) Date of Patent: Apr. 8, 2014
`
`(54) SYSTEMS AND METHODS FOR CONTENT
`TYPE CLASSIFICATION
`
`(71) Applicant: Fortinet, Inc., Sunnyvale, CA (US)
`
(72) Inventors: Shaohong Wei, Sunnyvale, CA (US);
Zhong Qiang Chen, Sunnyvale, CA (US);
Ping Ng, Milpitas, CA (US);
Gang Duan, San Jose, CA (US)
`
(73) Assignee: Fortinet, Inc., Sunnyvale, CA (US)
`
(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 0 days.
`
`(21) Appl. No.: 14/087,847
`
(22) Filed: Nov. 22, 2013
`
`Related U.S. Application Data
`
`(63) Continuation of application No. 13/795,283, filed on
`Mar. 12, 2013, which is a continuation of application
`No. 13/409,141, filed on Mar. 1, 2012, now Pat. No.
`8,639,752, which is a continuation of application No.
`12/503,100, filed on Jul. 15, 2009, now Pat. No.
`8,204,933, which is a continuation of application No.
`11/357,654, filed on Feb. 16, 2006, now Pat. No.
`7,580,974.
`
(51) Int. Cl.
     H04L 12/26 (2006.01)
(52) U.S. Cl.
     CPC: H04L 43/18 (2013.01)
     USPC: 370/241
(58) Field of Classification Search
     None
     See application file for complete search history.
`
(56) References Cited

U.S. PATENT DOCUMENTS

5,361,379 A         11/1994  White
6,157,955 A         12/2000  Narad et al.
6,608,816 B1         8/2003  Nichols
6,665,725 B1        12/2003  Dietz et al.
7,006,502 B2         2/2006  Lin
7,082,102 B1         7/2006  Wright
7,095,715 B2         8/2006  Buckman et al.
7,242,681 B1 *       7/2007  Van Bokkelen et al.   370/389
7,420,992 B1         9/2008  Fang et al.
7,580,974 B2         8/2009  Wei et al.
7,945,522 B2         5/2011  McGovern et al.
8,204,933 B2         6/2012  Wei et al.
8,639,752 B2         1/2014  Wei et al.
2003/0012147 A1      1/2003  Buckman et al.
2004/0002930 A1      1/2004  Oliver et al.
2004/0261016 A1     12/2004  Glass et al.
2005/0249125 A1 *   11/2005  Yoon et al.           370/252
2006/0112043 A1      5/2006  Oliver et al.
2006/0229902 A1     10/2006  McGovern et al.
2006/0239273 A1     10/2006  Buckman et al.
2007/0192481 A1      8/2007  Wei et al.
2008/0052326 A1      2/2008  Evanchik et al.
2009/0268617 A1     10/2009  Wei et al.

(Continued)

OTHER PUBLICATIONS

`
`"U.S. Appl. No. 11/357,654, 312 Amendment filed Jun. 23, 2009", 5
`pgs.
`
`(Continued)
`
`Primary Examiner — Kevin C Harper
(74) Attorney, Agent, or Firm: Schwegman Lundberg &
Woessner, P.A.
`
(57) ABSTRACT

Various embodiments illustrated and described herein
include systems, methods and software for content type
classification. Some such embodiments include determining a
potential state of classification for packets associated with a
session based at least in part on a packet associated with the
session that is a packet other than the first packet of the
session.
`
`
`
`20 Claims, 7 Drawing Sheets
`
[Front-page drawing: reproduction of FIG. 4, a timeline of packets
P11-P15 and P21-P23 in two sessions, each labeled with its state of
classification.]
`
`Juniper - Exhibit 1029, page 1
`
`Cloudflare - Exhibit 1046, page 1
`
`
`
(56) References Cited (continued)

U.S. PATENT DOCUMENTS

2012/0023557 A1 *    1/2012  Bevan et al.          726/4
2012/0163186 A1      6/2012  Wei et al.
2013/0258863 A1     10/2013  Wei et al.
`
`OTHER PUBLICATIONS
`
`"U.S. Appl. No. 11/357,654, Non-Final Office Action mailed Sep. 18,
`2008", 10 pgs.
`"U.S. Appl. No. 11/357,654, Notice of Allowance mailed Mar. 23,
`2009", 10 pgs.
`"U.S. Appl. No. 11/357,654, Response filed Dec. 18, 2008 to Non-
`Final Office Action mailed Sep. 18, 2008", 8 pgs.
`"U.S. Appl. No. 11/357,654, Response to Rule 312 Communication
`mailed Jul. 23, 2009", 2 pgs.
`"U.S. Appl. No. 12/503,100, Final Office Action mailed Jun. 9,
`2011", 12 pgs.
`
`"U.S. Appl. No. 12/503,100, Non-Final Office Action mailed Oct. 18,
`2010", 12 pgs.
`"U.S. Appl. No. 12/503,100, Notice of Allowance mailed Feb. 17,
`2012", 13 pgs.
`"U.S. Appl. No. 12/503,100, Response Filed Feb. 9, 2012 to Final
`Office Action Jun. 9, 2011", 8 pgs.
`"U.S. Appl. No. 12/503,100, Response filed Mar. 18, 2011 to Non
`Final Office Action mailed Oct. 18, 2010", 10 pgs.
`"U.S. Appl. No. 12/503,100, Supplemental Notice of Allowability
`mailed Apr. 25, 2012", 9 pgs.
`"U.S. Appl. No. 13/409,141, Non Final Office Action mailed Aug. 8,
`2013", 6 pgs.
`"U.S. Appl. No. 13/409,141, Notice of Allowance mailed Sep. 18,
`2013", 9 pgs.
`"U.S. Appl. No. 13/409,141, Response filed Sep. 3, 2013 to Office
`Action mailed Aug. 8, 2013", 5 pgs.
`"Application U.S. Appl. No. 13/409,141, Supplemental Notice of
`Allowability mailed Dec. 20, 2013", 5 pgs.
`
`* cited by examiner
`
`
`
`
`U.S. Patent
`
`Apr. 8, 2014
`
`Sheet 1 of 7
`
`US 8,693,348 B1
`
[Block diagram of system 100: SENDER 102 -> DATA CLASSIFICATION
MODULE 110 -> RECEIVER 104]

FIG. 1
`
[Flowchart of method 200:
 202  RECEIVE PACKET
 204  DETERMINE SESSION FOR PACKET
 205  DETERMINE WHETHER SESSION HAS BEEN CLASSIFIED
      YES: 206  CLASSIFY PACKET TO BE THE SAME TYPE AS THAT FOR THE SESSION
      otherwise: 207  ANALYZE PACKET
           TYPE "UNKNOWN": return to 202
           TYPE DETERMINED: 208  CLASSIFY PACKET AS THE DETERMINED TYPE]

FIG. 2
`
`
`
`
U.S. Patent    Apr. 8, 2014    Sheet 2 of 7    US 8,693,348 B1

Example 1: TCP traffic
#  direction  packet_size  port      using proxy  (p,s) matches skype
1  C --> S    14           !80/!443  no           unknown
2  S --> C    28-36        !80/!443  no           unknown
3  C --> S    14           !80/!443  no           yes (confirm)

FIG. 3A
`
Example 2: TCP traffic
#  direction  packet_size  port  using proxy  (p,s) matches skype
1  C --> S    16           80    no           unknown
2  S --> C    14           80    no           unknown
3  C --> S    28-36        80    no           unknown
4  S --> C    14           80    no           yes

FIG. 3B
`
Example 3: TCP traffic
#  direction  packet_size  port  using proxy  pattern                 (p,s) matches skype
1  C --> S    72           443   no           |80 46 01 03 01 00 2d|  unknown
2  S --> C    93           443   no           |16 03 01 00 4a 02|     unknown
3  C --> S    14           443   no           N/A                     yes (confirm)

FIG. 3C
`
`
`
`
U.S. Patent    Apr. 8, 2014    Sheet 3 of 7    US 8,693,348 B1

Example 4: UDP traffic
#  direction  packet_size  port   pattern (3rd byte)  (p,s) matches skype
1  C --> S    18/27        >1024  0x02                unknown
2  S --> C    >10          >1024  0x02/0x07           yes

FIG. 3D
`
Example 5: normal Yahoo login, through port 80
(protocol: TCP; client (C): 192.168.5.186:1734; server (S): 216.155.193.142:80)
#  direction  pattern                 classifier(p,s, yahoo)
1  C --> S    |YMSG| + ver_number     Session is Yahoo! candidate. However,
              + pkt_len test          without server-side packet, the
                                      classifier returns "unknown".
2  S --> C    |YMSG| + ver_num        Confirmed. The content type is marked
                                      as Yahoo!

FIG. 3E
`
`
`
`
U.S. Patent    Apr. 8, 2014    Sheet 4 of 7    US 8,693,348 B1

Example 6: (fragment packet, through http proxy 192.168.5.158:8007)
(protocol: TCP; client (C): 192.168.5.186:40324; server (S): 192.168.5.158:8007)
#  direction  pattern                 classifier(p,s, yahoo)
1  C --> S    |POST| + |YMSG|         Session is Yahoo! candidate. However,
              + ver_num               without server-side packet, the
              + pkt_len test          classifier returns "unknown".
2  S --> C    |HTTP/1.0 200 OK|       Get proxy server reply. Classifier
                                      returns "unknown". Waiting for the
                                      next server packet.
3  S --> C    |YMSG| + ver_num        Confirmed. The content type is marked
              + pkt_len test          as Yahoo!

FIG. 3F
`
`
`
`
U.S. Patent    Apr. 8, 2014    Sheet 5 of 7    US 8,693,348 B1

Example 7: (using socks5 server: 192.168.5.158:1080)
(protocol: TCP; client (C): 192.168.5.186:39594; server (S): 192.168.5.158:1080)
#  direction  pkt_size  pattern        classifier(p,s, msn)
1  C --> S    4         |05 02 00 02|  Session is Socks5, classifier returns
                                       "unknown".
2  S --> C    2         |05 00|        Socks reply, no auth, classifier
                                       returns "unknown". Wait for real data.
3  C --> S    >10       |05 01 00 03|  Socks5 connection request, return
                                       "unknown", wait for real data.
4  S --> C    10        |05 00 00 01|  Socks5 connection success, return
                                       "unknown", wait for real data.
5  C --> S    >16       msn command    MSN candidate, classifier returns
                                       "unknown", wait for server reply.
6  S --> C    N/A       msn command    Session confirmed, session's content
                                       type is marked as "MSN".

FIG. 3G
`
`
`
`
U.S. Patent    Apr. 8, 2014    Sheet 6 of 7    US 8,693,348 B1

Example 8: For UDP Tracker protocol
(protocol: UDP; client (C): 192.168.5.186:7934; server (S): 218.83.154.39:8080)
#  direction  pattern           classifier(p,s, Bittorrent)
1  P1 --> P2  0x00417271019801  Session is Bittorrent candidate. However,
                                without peer packet, the classifier will
                                return "unknown" and wait for the server
                                packet to confirm.
2  P2 --> P1  Test packet len   If the length requirement is matched, the
                                classifier will return "YES". The content
                                type will be marked as Bittorrent.

FIG. 3H
`
Example 9: (protocol: UDP; client (C): 192.168.5.186:1837; server (S): 66.75.54.201:11.82;
First Packet (C --> S))
#  direction  pkt_size  pattern                   classifier(p,s, KaZaA)
1  P1 --> P2  12        |27 00 00 00|             KaZaA candidate, return
                        + enc_type check          "unknown", waiting for peer
                        + network-name ("KaZaA")  reply to confirm.
2  P2 --> P1  n/a       |28 00 00 00|             Get pong reply from peer,
                        + enc_type check          session confirmed, return
                        + network-name ("KaZaA")  "YES". Session marked as
                                                  content type KaZaA.

note: enc_type could be \xa9, \x29, %xhf etc.

FIG. 3I
`
`
`
`
`U.S. Patent
`
`Apr. 8, 2014
`
`Sheet 7 of 7
`
`US 8,693,348 B1
`
[Timeline (TIME axis) of packets in two sessions and the state of
classification after each packet:

SESSION S1
  P11  STATE UNKNOWN (T1, T2, T3)
  P12  STATE UNKNOWN (T1, T3)
  P13  STATE CLASSIFIED (T3)
  P14  STATE CLASSIFIED (T3)
  P15  STATE CLASSIFIED (T3)

SESSION S2
  P21  STATE CLASSIFIED (T2)
  P22  STATE CLASSIFIED (T2)
  P23  STATE CLASSIFIED (T2)]

FIG. 4
`
[Block diagram of computer system 500: BUS 502 connecting PROCESSOR 504,
MAIN MEMORY 506, ROM 508, STORAGE DEVICE 510, DISPLAY 512, INPUT
DEVICE 514, CURSOR CONTROL 516, and COMMUNICATIONS INTERFACE 518;
NETWORK LINK 520 connects the communications interface to LOCAL
NETWORK 522, HOST 524, and EQUIPMENT 526]

FIG. 5
`
`SYSTEMS AND METHODS FOR CONTENT
`TYPE CLASSIFICATION
`
`RELATED APPLICATIONS
`
This application is a continuation of U.S. application Ser.
No. 13/795,283, filed Mar. 12, 2013; which was a continuation
of U.S. application Ser. No. 13/409,141, filed Mar. 1, 2012;
which was a continuation of U.S. application Ser. No.
12/503,100, filed Jul. 15, 2009, and issued as U.S. Pat. No.
8,204,933 on Jun. 19, 2012; which was a continuation of
U.S. application Ser. No. 11/357,654, filed Feb. 16, 2006, and
issued as U.S. Pat. No. 7,580,974 on Aug. 25, 2009, to each of
which priority is claimed and each of which is incorporated
herein by reference in its entirety.
`
`FIELD
`
`The field of the invention relates to computer systems and
`computer networks, and more particularly, to systems and
`methods for classifying content of computer and network
`traffic.
`
`BACKGROUND
`
Many data processing systems require the content type of
data to be determined before the data can be further processed.
For example, in malicious content detection systems, such as
anti-virus systems and anti-spam systems, received data
generally needs to be classified before it can be scanned for
malicious content. Intrusion detection/prevention systems,
application-based traffic shaping devices or load balancers,
IM proxies, and application accelerators may also require data
to be classified. If the data is classified as skype data, then
a content detection module may apply one set of algorithms to
scan the data for malicious content. On the other hand, if the
data is classified as bittorrent data, then the content
detection module may apply a different set of algorithms to
scan the data for malicious content. As such, determining the
content type of data is an important step before the data is
scanned.
Existing systems determine content type by using the port
number of the port at which data is transmitted. For example,
the well-known port for the HTTP protocol is "80," the
well-known port for the SMTP protocol is "25," and the
well-known port for the POP3 protocol is "110." In such
systems, data belonging to a certain type is transmitted to a
dedicated port. As such, by determining the port number of the
port at which data is transmitted, and knowing the content type
that is associated with the port number, a system can determine
the content type for the data. However, use of a port to
transmit only one type of data is restrictive. Sometimes, it
may be desirable to allow a port to transmit more than one type
of data. Existing systems do not allow a content type to be
determined if data is transmitted through a port that is not
data type specific (i.e., a port that is allowed to transmit
more than one type of data).
Also, some types of data, such as IM data and P2P data, may
not go to any specific port, and can be transmitted through
different ports. In such cases, existing systems may not be
able to classify IM data and P2P data using the port number.
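The limitation of port-based classification can be illustrated with a minimal sketch. The table below uses only the three well-known ports named above; the function name `classify_by_port` and the sample port 5050 are hypothetical and serve only to show the behavior.

```python
# Port-to-type table built from the well-known ports named in the text.
WELL_KNOWN_PORTS = {80: "http", 25: "smtp", 110: "pop3"}


def classify_by_port(dst_port: int) -> str:
    """Return the content type implied by the destination port alone."""
    return WELL_KNOWN_PORTS.get(dst_port, "unknown")


print(classify_by_port(25))    # smtp
print(classify_by_port(80))    # http, whatever the payload actually carries
print(classify_by_port(5050))  # unknown: no dedicated port to key on
```

A lookup of this kind cannot tell apart two content types sharing port 80, and it returns nothing useful for IM or P2P traffic that moves between ports.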
`
`SUMMARY
`
In accordance with some embodiments, a method for determining
a type of content includes receiving a first packet,
determining a state of classification for the first packet or for
a session with which the first packet is associated, receiving a
second packet, and determining a content type for the second
packet based at least in part on the determined state. As used
in this specification, the term "first packet" refers to any one
of the packets in a session (e.g., it can be the first, second,
third, fourth, etc. packet in a session), and does not
necessarily mean the very first packet in a session (although it
could be used to refer to the very first packet in a session).
Similarly, as used in this specification, the term "second
packet" refers to any one of the packets in a session (e.g., it
can be the first, second, third, fourth, etc. packet in a
session) that is different from the first packet.
In accordance with other embodiments, a computer product
includes a computer-readable medium, the computer-readable
medium having a set of stored instructions, an execution of
which causes a process to be performed, the process including
receiving a first packet, determining a state of classification
for the first packet or for a session with which the first
packet is associated, receiving a second packet, and
determining a content type for the second packet based at least
in part on the determined state.
In accordance with other embodiments, a system for determining
a type of content includes means for receiving a first
packet and a second packet, means for determining a state of
classification for the first packet or for a session with which
the first packet is associated, and means for determining a
content type for the second packet based at least in part on the
determined state.
In accordance with other embodiments, a method for determining
a type of content includes receiving a packet associated
with a session, determining whether a content type has been
determined for the session or for another packet associated
with the session, and classifying the packet to be the content
type based at least in part on a result from the act of
determining.
In accordance with other embodiments, a computer product
includes a computer-readable medium, the computer-readable
medium having a set of stored instructions, an execution of
which causes a process to be performed, the process including
receiving a packet associated with a session, determining
whether a content type has been determined for the session or
for another packet associated with the session, and classifying
the packet to be the content type based at least in part on a
result from the act of determining.
In accordance with other embodiments, a system for determining
a type of content includes means for receiving a packet
associated with a session, means for determining whether a
content type has been determined for the session or for another
packet associated with the session, and means for classifying
the packet to be the content type based at least in part on a
result from the act of determining.
In accordance with other embodiments, a method for determining
a type of content includes receiving a first packet from
a first port, the first port adapted for receiving at least two
types of content, and determining a content type for the first
packet or for a session with which the first packet is
associated.
In accordance with other embodiments, a computer product
includes a computer-readable medium, the computer-readable
medium having a set of stored instructions, an execution of
which causes a process to be performed, the process including
receiving a first packet from a first port, the first port
adapted for receiving at least two types of content, and
determining a content type for the first packet or for a
session with which the first packet is associated.
In accordance with other embodiments, a system for determining
a type of content includes means for receiving a first
packet from a first port, the first port adapted for receiving at
least two types of content, and means for determining a content
type for the first packet or for a session with which the
first packet is associated.
In accordance with other embodiments, a method for determining
a type of content includes receiving a packet associated
with a session, and determining a state of classification
for the packet or the session.
In accordance with other embodiments, a computer product
includes a computer-readable medium, the computer-readable
medium having a set of stored instructions, an execution of
which causes a process to be performed, the process including
receiving a packet associated with a session, and determining a
state of classification for the packet or the session.
In accordance with other embodiments, a system for determining
a type of content includes means for receiving a
packet associated with a session, and means for determining
a state of classification for the packet or the session.
`Other aspects and features will be evident from reading the
`following detailed description of the preferred embodiments,
`which are intended to illustrate, not limit, the invention.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The drawings illustrate the design and utility of various
`embodiments, in which similar elements are referred to by
`common reference numerals. More particular descriptions
`will be rendered by reference to specific embodiments, which
`are illustrated in the accompanying drawings. Understanding
`that these drawings are not to be considered limiting in scope,
`the embodiments will be described and explained with addi-
`tional specificity and detail through the use of the accompa-
`nying figures.
`FIG. 1 illustrates a block diagram representing a system
`that includes a module for classifying data in accordance with
`some embodiments;
`FIG. 2 illustrates a method for classifying data in accor-
`dance with some embodiments;
`FIGS. 3A-3I illustrate examples of criteria that may be
`used to determine content type in accordance with some
`embodiments;
`FIG. 4 illustrates examples of results obtained using the
`method of FIG. 2 in accordance with some embodiments; and
`FIG. 5 is a diagram of a computer hardware system with
`which embodiments described herein can be implemented.
`
`DETAILED DESCRIPTION
`
`Various embodiments are described hereinafter with refer-
`ence to the figures. It should be noted that the figures are not
`drawn to scale and that elements of similar structures or
`functions are represented by like reference numerals through-
`out the figures. It should also be noted that the figures are only
`intended to facilitate the description of specific embodiments,
`and are not intended as an exhaustive description of the inven-
`tion, or as a limitation on the scope of the invention. In
`addition, an illustrated embodiment need not have all the
`aspects or advantages of the invention shown. An aspect or an
`advantage described in conjunction with a particular embodi-
`ment is not necessarily limited to that embodiment and can be
`practiced in any other embodiments even if not so illustrated.
`FIG. 1 illustrates a block diagram of a system 100, which
`includes a data classification module 110 for classifying data
`into content type in accordance with some embodiments.
`Sender 102 transmits data associated with network traffic
`content to module 110. Module 110 receives the transmitted
`data, determines a type of content to which the network traffic
`data belongs (i.e., classifies the data), and causes a result, such
`as a message, to be sent to a receiver 104. The message sent to
`receiver 104 notifies the receiver 104 that the received data
`belongs to a content type. As used in this specification, the
`term "receiver" should not be limited to a human receiver, and
`can include a server or other types of devices that can receive
`information. For example, in some embodiments, the receiver
`104 can be a malicious content detection module, such as an
`anti-virus module, which detects malicious content based on
`a content type as determined by the module 110. Also, as used
`in this specification, the term "sender" should not be limited
`to a human sender, and can include a server or other types of
`devices that can transmit information.
In some embodiments, module 110 can be implemented
using software. For example, module 110 can be implemented
using software that is loaded onto a user's computer,
a server, or other types of memory, such as a disk or a
CD-ROM. In some cases, module 110 can be implemented as web
applications. In alternative embodiments, module 110 can be
implemented using hardware. For example, in some embodiments,
module 110 includes an application-specific integrated
circuit (ASIC), such as a semi-custom ASIC processor
or a programmable ASIC processor. ASICs, such as those
described in Application-Specific Integrated Circuits by
Michael J. S. Smith, Addison-Wesley Pub Co. (1st Edition,
June 1997), are well known in the art of circuit design, and
therefore will not be described in further detail herein. In
other embodiments, module 110 can also be any of a variety
of circuits or devices that are capable of performing the
functions described herein. For example, in alternative
embodiments, module 110 can include a general purpose processor,
such as a Pentium processor. In other embodiments, module
110 can be implemented using a combination of software and
hardware. In some embodiments, module 110 may be implemented
as a firewall, a component of a firewall, or a component
that is configured to be coupled to a firewall.
FIG. 2 illustrates a method 200 for classifying data in
accordance with some embodiments. First, module 110
receives network traffic data in the form of a packet (Step 202).
Next, module 110 determines a session S for the received
packet (Step 204). A session is an interaction or a series of
interactions between two communication end points. Various
techniques may be used to determine a session. For example,
module 110 can be configured to determine one or more of a
source IP address, a destination IP address, a source port, a
destination port, and a protocol, to thereby determine a
session S for the received packet. Techniques for determining a
session are known in the art, and will not be described in
detail.
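As an illustration of Step 204, the sketch below derives a session key from the five header fields listed above. It is one possible approach, not the technique the specification relies on; the names `PacketHeader` and `session_key`, and the choice to sort the two endpoints so that both directions share a key, are assumptions made for this example.

```python
from typing import NamedTuple, Tuple


class PacketHeader(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # e.g. "TCP" or "UDP"


def session_key(h: PacketHeader) -> Tuple:
    """Build a direction-independent key from the five header fields."""
    a = (h.src_ip, h.src_port)
    b = (h.dst_ip, h.dst_port)
    # Sort the endpoints so C --> S and S --> C packets share one key.
    return (h.protocol,) + tuple(sorted((a, b)))


# Addresses taken from Example 5 (FIG. 3E).
request = PacketHeader("192.168.5.186", 1734, "216.155.193.142", 80, "TCP")
reply = PacketHeader("216.155.193.142", 80, "192.168.5.186", 1734, "TCP")
print(session_key(request) == session_key(reply))  # True
```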
Next, module 110 determines whether a content type has
already been determined for the session S (and therefore, for
the packet associated with the session S) (Step 205). In the
illustrated embodiments, each session (e.g., session S) being
processed by module 110 is automatically assigned an initial
state of classification, "unknown" (i.e., the content type is
initially determined as "unknown"). In such cases, in step
205, module 110 determines whether a content type other
than "unknown" has been determined for session S. If a
content type other than "unknown" has already been
determined for the session S, then module 110 classifies the
packet to be the same type as that for the classified session S
(Step 206), and the method returns to step 202 to process any
additional packet. In some embodiments, the classifying of the
packet (determining the content type for the packet) may be
implemented by associating the packet with the classified
session S.
On the other hand, if a content type (other than "unknown"
type) has not been previously determined for the session S,
module 110 then analyzes the received packet to attempt to
determine a content type (Step 207). If a content type (other
than "unknown" type) is determined for the received packet,
module 110 then classifies the received packet (or its
associated session S) as having the determined content type
(Step 208). If there is an additional packet, module 110 then
receives the additional packet, and repeats the process 200 to
process the additional packet.
Alternatively, if after step 207, the content type remains
"unknown" (e.g., because the analysis of the packet provides
an inconclusive result), then module 110 receives an additional
packet that is associated with the same session S, and analyzes
the additional packet to attempt to determine a content
type for data being transmitted in the session S (repeating
Steps 202-207, or Steps 202 and 207), until a content type
other than "unknown" is determined for the session S.
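The flow of method 200 can be sketched as a loop over incoming packets. This is a simplified illustration under stated assumptions: the function `run_method_200`, the representation of a packet as a `(session_key, payload)` pair, and the toy analyzer at the end are all hypothetical, and the real analysis of Step 207 is passed in as a callback rather than implemented here.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

UNKNOWN = "unknown"


def run_method_200(
    packets: Iterable[Tuple[Hashable, bytes]],
    analyze: Callable[[Hashable, bytes], str],
) -> List[str]:
    """Classify each (session_key, payload) pair, reusing per-session results."""
    session_type: Dict[Hashable, str] = {}
    results = []
    for key, payload in packets:                        # Steps 202 and 204
        current = session_type.setdefault(key, UNKNOWN)  # initial state
        if current != UNKNOWN:                          # Step 205
            results.append(current)                     # Step 206
            continue
        found = analyze(key, payload)                   # Step 207
        if found != UNKNOWN:
            session_type[key] = found                   # Step 208
        results.append(found)
    return results


# Toy analyzer (an assumption for the demo): only a "GET " prefix is conclusive.
toy = lambda key, p: "http" if p.startswith(b"GET ") else UNKNOWN
print(run_method_200([("s1", b"\x00"), ("s1", b"GET / HTTP/1.0"), ("s1", b"\xff")], toy))
# ['unknown', 'http', 'http']
```

Once the session is classified, the third packet is labeled without being analyzed, which mirrors the Step 205 to Step 206 branch.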
As shown in the above embodiments, module 110 receives
packet(s) in step 202, analyzes the packet(s) in step 207, and
repeats these two steps until it determines a content type for
the session in which the packet(s) is being transmitted.
Examples of content types that may be determined by module
110 include skype, gnutella, kazaa, edonkey, bittorrent, aim,
yahoo, msn, icq, qq, http, smtp, pop3, imap, ftp, bo2k, bo, tfn,
tfn2k, ldap, radius, ms, rpc, snmp, mssql, mysql, and oracle.
Various techniques may be used to analyze received packet(s)
to determine a content type for the packet(s) being transmitted
in a session. For example, module 110 may be configured to
examine one or more characteristics of a packet, such as a
packet size, a port number of a port from which the packet is
received, whether a proxy is used to transmit the packet, a
direction in which the packet travels (e.g., from client to
server, or from server to client), a string pattern, order of
packets, and/or other protocol characteristics.
`Several examples of techniques for analyzing packets to
`determine content type will now be discussed. However, it
`should be understood that module 110 is not limited to using
`the examples of techniques described herein, and that module
`110 can use other algorithms, techniques, and criteria to per-
`form the functions described herein. FIG. 3A illustrates an
`example of criteria that may be used to determine whether
`data transmitted at a normal port is skype data. As shown in
`the example, module 110 is configured to examine the first
`packet that is transmitted from client to server, and determine
`whether the first packet has a prescribed packet size (in the
`example, prescribed packet size=14). If the payload size
`matches the prescribed packet size, then module 110 deter-
`mines that the session is a candidate of skype type. However,
`module 110 still classifies the session as "unknown" because
`the result is inconclusive. The module 110 then determines
`the payload size of a second packet from server to client, and
`determines whether the size satisfies the prescribed criteria
`(in the example, the prescribed size criteria is 28-36). If there
`is no match (i.e., the size of the second packet does not match
`the prescribed size criteria), then module 110 determines that
`the session is not a skype type, and any further packets
`received in the same session would not be considered as a
`skype type. However, if there is a match, the module 110 still
`determines that the session is a candidate of skype type, and
`maintains the state of classification as "unknown." Module
`110 next determines the payload size of a third packet from
`client to server, and determines whether the packet size
`matches the prescribed size criteria (in the example, the pre-
`scribed size criteria=14). If there is a match, then module 110
`determines that data transmitted in the session are skype data.
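The three size checks of FIG. 3A can be sketched as a small per-session state machine. This is an illustrative reading, not a definitive implementation: the class name `SkypeSizeMatcher` and the direction labels are hypothetical, the port and proxy columns of the figure are left out, and treating a mismatch at any step as ruling skype out is an assumption (the text states this explicitly only for the second packet).

```python
UNKNOWN, SKYPE, NOT_SKYPE = "unknown", "skype", "not skype"

# (direction, min_size, max_size) for packets 1-3, taken from FIG. 3A.
FIG_3A_STEPS = [("C->S", 14, 14), ("S->C", 28, 36), ("C->S", 14, 14)]


class SkypeSizeMatcher:
    """Tracks one session through the three payload-size checks."""

    def __init__(self) -> None:
        self.step = 0
        self.state = UNKNOWN

    def feed(self, direction: str, payload_size: int) -> str:
        if self.state != UNKNOWN:
            return self.state          # already decided for this session
        want_dir, lo, hi = FIG_3A_STEPS[self.step]
        if direction != want_dir or not lo <= payload_size <= hi:
            self.state = NOT_SKYPE     # ruled out for the rest of the session
        else:
            self.step += 1
            if self.step == len(FIG_3A_STEPS):
                self.state = SKYPE     # third check passed: confirmed
        return self.state


m = SkypeSizeMatcher()
print(m.feed("C->S", 14), m.feed("S->C", 30), m.feed("C->S", 14))
# unknown unknown skype
```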
`
FIG. 3B illustrates an example of criteria that may be used
to determine whether data transmitted at an http port is skype
data. FIG. 3C illustrates an example of criteria that may be
used to determine whether data transmitted at an ssl port is
skype data. FIG. 3D illustrates an example of criteria that may
be used to determine whether UDP traffic data is skype data.
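The UDP criteria of FIG. 3D combine payload size, port range, and the value of the third payload byte. The sketch below is a hedged reading of that table: the function names are hypothetical, interpreting the "18/27" entry as "18 or 27 bytes" is an assumption, and a failed check simply leaves the result as "unknown".

```python
def is_candidate_request(payload: bytes, port: int) -> bool:
    """Packet 1 (C --> S): size 18 or 27, port above 1024, third byte 0x02."""
    return len(payload) in (18, 27) and port > 1024 and payload[2] == 0x02


def confirms_reply(payload: bytes, port: int) -> bool:
    """Packet 2 (S --> C): size above 10, port above 1024, third byte 0x02 or 0x07."""
    return len(payload) > 10 and port > 1024 and payload[2] in (0x02, 0x07)


def classify_udp_pair(request: bytes, reply: bytes, port: int) -> str:
    """Return "skype" only when both packets satisfy their criteria."""
    if is_candidate_request(request, port) and confirms_reply(reply, port):
        return "skype"
    return "unknown"


request = bytes([0x00, 0x00, 0x02]) + bytes(15)  # 18 bytes, third byte 0x02
reply = bytes([0x00, 0x00, 0x07]) + bytes(9)     # 12 bytes, third byte 0x07
print(classify_udp_pair(request, reply, 33000))  # skype
```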
FIG. 3E illustrates an example of criteria that may be used
to identify Yahoo! messenger traffic (through normal Yahoo
login). In such cases, instead of determining a size of the
packet, module 110 is configured to examine the content and
string pattern. As shown in the example, two packets are used
to determine that the traffic data is Yahoo! messenger traffic.
FIG. 3F illustrates an example of criteria that may be used to
identify Yahoo! messenger traffic (through http proxy). As
shown in the example, three packets are used to determine
that the traffic data is Yahoo! messenger traffic.
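The candidate-then-confirm pattern of FIGS. 3E and 3F can be sketched as follows. Only the string prefixes shown in the figures are checked; the version and packet-length tests are omitted because the figures do not give their details. The class name `YahooMatcher` and the direction labels are hypothetical.

```python
UNKNOWN, YAHOO = "unknown", "yahoo"


class YahooMatcher:
    """Marks a session as a candidate on a client packet, confirms on a server packet."""

    def __init__(self) -> None:
        self.candidate = False   # set once a client packet looks like YMSG
        self.state = UNKNOWN

    def feed(self, direction: str, payload: bytes) -> str:
        if self.state != UNKNOWN:
            return self.state
        if direction == "C->S":
            # Direct login starts with YMSG; via an HTTP proxy it follows POST.
            if payload.startswith(b"YMSG") or (
                payload.startswith(b"POST") and b"YMSG" in payload
            ):
                self.candidate = True
        elif self.candidate and payload.startswith(b"YMSG"):
            self.state = YAHOO   # server-side packet confirms the candidate
        return self.state


m = YahooMatcher()
print(m.feed("C->S", b"POST / HTTP/1.0\r\n\r\nYMSG"))  # unknown (candidate only)
print(m.feed("S->C", b"HTTP/1.0 200 OK\r\n"))          # unknown (proxy reply)
print(m.feed("S->C", b"YMSG\x00\x0b"))                 # yahoo
```

The proxy reply in the second step neither confirms nor rules out the candidate, so the session stays "unknown" until the next server packet arrives, as in FIG. 3F.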
FIG. 3G illustrates an example of criteria that may be used
to identify msn messenger. In such cases, module 110 is
configured to examine payload size and pattern in the payload.
As shown in the example, six packets are used to determine
that the traffic data is msn messenger traffic.
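In FIG. 3G the first four packets belong to the SOCKS5 handshake, and the classifier keeps returning "unknown" until real data arrives. The sketch below names the handshake message a payload resembles, using the field layout of SOCKS version 5 (version byte 0x05, then method or command fields). The function name and the returned labels are hypothetical, and the checks are deliberately loose.

```python
from typing import Optional


def socks5_handshake_kind(direction: str, payload: bytes) -> Optional[str]:
    """Name the SOCKS5 handshake message a payload looks like, or None."""
    if len(payload) < 2 or payload[0] != 0x05:
        return None
    if direction == "C->S":
        if len(payload) == 2 + payload[1]:
            return "greeting"            # VER, NMETHODS, METHODS...
        if len(payload) >= 4 and payload[2] == 0x00:
            return "connect request"     # VER, CMD, RSV, ATYP, ...
    else:
        if len(payload) == 2:
            return "method reply"        # VER, METHOD
        if len(payload) >= 4 and payload[2] == 0x00:
            return "connect reply"       # VER, REP, RSV, ATYP, ...
    return None


print(socks5_handshake_kind("C->S", bytes([0x05, 0x02, 0x00, 0x02])))  # greeting
print(socks5_handshake_kind("S->C", bytes([0x05, 0x00])))              # method reply
```

A session classifier could treat any payload for which this function returns a label as inconclusive and pass only later payloads to the msn pattern checks.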
FIG. 3H illustrates an example of criteria that may be used
to identify bittorrent data. As shown in the example, two
packets are used to determine that the traffic data is bittorrent
data.
FIG. 3I illustrates an example of criteria that may be used
to identify kazaa data. As shown in the example, two packets
are used to determine that the traffic data is kazaa traffic.
As shown in the above examples, examining more than one
packet within a session is advantageous in that it improves
accuracy and reduces false detections.
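One way to picture why additional packets help is to keep a set of candidate content types per session and discard candidates as packets arrive; FIG. 4 shows such a progression with candidate types T1, T2, and T3. The sketch below is illustrative only: the helper names and the three toy matchers are invented for the demonstration and do not correspond to real protocols.

```python
from typing import Callable, Dict, Set

Matcher = Callable[[bytes], bool]


def narrow(candidates: Set[str], matchers: Dict[str, Matcher], payload: bytes) -> Set[str]:
    """Keep only the candidate types whose matcher still accepts this payload."""
    return {t for t in candidates if matchers[t](payload)}


def state_of(candidates: Set[str]) -> str:
    """A single survivor is a classification; anything else stays unknown."""
    return next(iter(candidates)) if len(candidates) == 1 else "unknown"


# Toy matchers, invented for this example.
matchers: Dict[str, Matcher] = {
    "T1": lambda p: len(p) <= 4,
    "T2": lambda p: p.startswith(b"a"),
    "T3": lambda p: b"a" in p or b"b" in p,
}

candidates = set(matchers)
for payload in (b"ab", b"ba", b"bbbbbb"):
    candidates = narrow(candidates, matchers, payload)
    print(sorted(candidates), state_of(candidates))
# ['T1', 'T2', 'T3'] unknown
# ['T1', 'T3'] unknown
# ['T3'] T3
```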
It should be noted that module 110 is not limited to using
the examples of criteria described previously, and that module
110 can use other criteria for determining content type in
other embodiments. In some embodiments, a user interface
can be provided that allows an administrator to select criteria
or parameters for determining content type. For example,
module 110 can allow an administrator to input packet size,
port number, prescribed string pattern, classifier, and other
parameters that may be used to determine a content type. In
some embodiments, the user interface also allows an
administrator to create customized criteria to detect certain
content type.
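Administrator-supplied criteria of this kind could be held as plain data records. The sketch below is one hypothetical representation: the `Criterion` class, its field names, and the sample rule (port 9000, pattern `HELO`) are assumptions for illustration and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Criterion:
    """One administrator-entered check against a single packet."""
    content_type: str
    direction: str                         # "C->S" or "S->C"
    size_range: Optional[Tuple[int, int]] = None
    port: Optional[int] = None
    pattern: Optional[bytes] = None        # must appear at start of payload

    def matches(self, direction: str, port: int, payload: bytes) -> bool:
        if direction != self.direction:
            return False
        if self.port is not None and port != self.port:
            return False
        if self.size_range is not None:
            lo, hi = self.size_range
            if not lo <= len(payload) <= hi:
                return False
        if self.pattern is not None and not payload.startswith(self.pattern):
            return False
        return True


# Hypothetical custom rule entered through the user interface.
custom = Criterion("custom-app", "C->S", size_range=(4, 64), port=9000, pattern=b"HELO")
print(custom.matches("C->S", 9000, b"HELO world"))  # True
print(custom.matches("C->S", 80, b"HELO world"))    # False
```

Fields left as `None` are simply not checked, so an administrator can specify only the parameters that matter for a given content type.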
FIG. 4 illustrates an example of results obtained using the
process 200 of FIG. 2. After module 110 receives packet P11
(Step 202), module 110 determines that the packet P11 is
associated with session S1 (Step 204). Packet P11 is the first
packet in session S1, which has not been previously classified.
Module 110 processes the packet P11 in an attempt to
determine a content type (i.e., in an attempt to classify the
session/packet) (Step 207). In the illustrated example, the
analysis of packet P11 indicates that the packet P11 could be
one of three content types T1, T2, and T3, and therefore,
provides a result that is inconclusive. As a result, the content
type of the session S1 remains "unknown" (i.e., the state of
classification is "unknown" with T1, T2, and T3 being
possible candidat