Case 4:18-cv-07229-YGR Document 195-6 Filed 05/10/21 Page 1 of 25
`
`
`
`
`
`
`
`
`
`
`
`
`
`EXHIBIT 1
`
`
`
`
`US008225408B2
`
(12) United States Patent
Rubin et al.

(10) Patent No.: US 8,225,408 B2
(45) Date of Patent: Jul. 17, 2012
`
`(54) METHOD AND SYSTEM FOR ADAPTIVE
`RULE-BASED CONTENT SCANNERS
`
(75) Inventors: Moshe Rubin, Jerusalem (IL); Moshe
Matitya, Jerusalem (IL); Artem
Melnick, Beit Shemesh (IL); Shlomo
Touboul, Kefar-Haim (IL); Alexander
Yermakov, Beit Shemesh (IL); Amit
Shaked, Tel Aviv (IL)

(73) Assignee: Finjan, Inc., San Jose, CA (US)

(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 1298 days.
`
`(21) Appl. No.: 10/930,884
`
(22) Filed: Aug. 30, 2004

(65) Prior Publication Data

US 2005/0108554 A1 May 19, 2005

Related U.S. Application Data

(63) Continuation-in-part of application No. 09/539,667,
filed on Mar. 30, 2000, now Pat. No. 6,804,780, which
is a continuation of application No. 08/964,388, filed
on Nov. 6, 1997, now Pat. No. 6,092,194.

(51) Int. Cl.
H04L 29/06 (2006.01)
(52) U.S. Cl. ............................. 726/25; 713/153; 726/22
(58) Field of Classification Search ........................ None
See application file for complete search history.
`
(56) References Cited

U.S. PATENT DOCUMENTS
5,077,677 A     12/1991  Murphy et al.
5,359,659 A     10/1994  Rosenthal
5,361,359 A     11/1994  Tajalli et al.
5,414,833 A *    5/1995  Hershey et al. ................. 726/22
5,485,409 A      1/1996  Gupta et al.
5,485,575 A      1/1996  Chess et al.
5,572,643 A     11/1996  Judson
5,579,509 A     11/1996  Furtney et al.
5,606,668 A      2/1997  Shwed
5,623,600 A      4/1997  Ji et al.
5,638,446 A      6/1997  Rubin
5,675,711 A *   10/1997  Kephart et al. ................. 706/12
(Continued)
`
FOREIGN PATENT DOCUMENTS
EP   1091276 A1   4/2001
(Continued)
`
`OTHER PUBLICATIONS
`
`D Grune, C Jacobs, K Langendoen, H Bal-Parsing Techniques: A
`Practical Guide, 2000-John Wiley & Sons, Inc. New York, NY,
`USA, p. 1-326.*
`
`(Continued)
`
`Primary Examiner - Eleni Shiferaw
Assistant Examiner - Jeffery Williams
`(74) Attorney, Agent, or Firm - Dawn-Marie Bey; King &
`Spalding LLP
`
`(57)
`
`ABSTRACT
`
`A method for scanning content, including identifying tokens
`within an incoming byte stream, the tokens being lexical
`constructs for a specific language, identifying patterns of
`tokens, generating a parse tree from the identified patterns of
`tokens, and identifying the presence of potential exploits
within the parse tree, wherein said identifying tokens, identifying
patterns of tokens, and identifying the presence of
`potential exploits are based upon a set of rules for the specific
`language. A system and a computer readable storage medium
`are also described and claimed.
`
`35 Claims, 7 Drawing Sheets
`
[Representative drawing (FIG. 1). Labels: NETWORK GATEWAY, PRE-SCANNER, INTERNET, CONTENT SCANNER, CONTENT CACHE. Visible reference numerals: 110, 140.]
`
`
`
`
`U.S. PATENT DOCUMENTS
5,692,047 A          11/1997  McManis
5,692,124 A          11/1997  Holden et al.
5,720,033 A           2/1998  Deo
5,724,425 A           3/1998  Chang et al.
5,740,248 A           4/1998  Fieres et al.
5,740,441 A *         4/1998  Yellin et al.
5,761,421 A           6/1998  van Hoff et al.
5,765,205 A           6/1998  Breslau et al.
5,784,459 A           7/1998  Devarakonda et al.
5,796,952 A           8/1998  Davis et al.
5,805,829 A           9/1998  Cohen et al.
5,832,208 A          11/1998  Chen et al.
5,832,274 A          11/1998  Cutler et al.
5,850,559 A          12/1998  Angelo et al.
5,859,966 A           1/1999  Hayman et al.
5,864,683 A           1/1999  Boebert et al.
5,881,151 A *         3/1999  Yamamoto
5,884,033 A *         3/1999  Duvall et al.
5,892,904 A           4/1999  Atkinson et al.
5,951,698 A           9/1999  Chen et al.
5,956,481 A           9/1999  Walsh et al.
5,963,742 A          10/1999  Williams
5,974,549 A          10/1999  Golan
5,978,484 A          11/1999  Apperson et al.
5,983,348 A *        11/1999  Ji
5,987,611 A          11/1999  Freund
6,088,801 A *         7/2000  Grecsek
6,088,803 A *         7/2000  Tso et al.
6,092,194 A           7/2000  Touboul
6,154,844 A          11/2000  Touboul
6,167,520 A          12/2000  Touboul
6,339,829 B1          1/2002  Beadle et al.
6,425,058 B1          7/2002  Arimilli et al.
6,434,668 B1          8/2002  Arimilli et al.
6,434,669 B1          8/2002  Arimilli et al.
6,480,962 B1         11/2002  Touboul
6,487,666 B1         11/2002  Shanklin et al.
6,519,679 B2          2/2003  Devireddy et al.
6,598,033 B2          7/2003  Ross et al.
6,732,179 B1          5/2004  Brown et al.
6,804,780 B1         10/2004  Touboul
6,917,953 B2          7/2005  Simon et al.
7,058,822 B2          6/2006  Edery et al.
7,143,444 B2 *       11/2006  Porras et al.
7,210,041 B1 *        4/2007  Gryaznov et al.
7,308,648 B1 *       12/2007  Buchthal et al.
7,343,604 B2 *        3/2008  Grabarnik et al.
7,418,731 B2          8/2008  Touboul
2003/0014662 A1 *     1/2003  Gupta et al.
2003/0074190 A1 *     4/2003  Allison
2003/0101358 A1 *     5/2003  Porras et al.
2004/0073811 A1 *     4/2004  Sanin
2004/0088425 A1 *     5/2004  Rubinstein et al.
2005/0050338 A1 *     3/2005  Liang et al.
2005/0172338 A1 *     8/2005  Sandu et al.
2006/0031207 A1 *     2/2006  Bjarnestam et al.
2006/0048224 A1 *     3/2006  Duncan et al.
2008/0066160 A1 *     3/2008  Becker et al.
2010/0195909 A1       8/2010  Wasson et al.
`
717/134
`
`726/24
`709/206
`
`717/143
`
`726/13
`726/4
`726/1
`726/22
`
`711/134
`711/128
`711/128
`
`726/23
`711/114
`706/46
`709/229
`
`707/204
`726/22
`726/30
`713/188
`715/234
`719/313
`726/22
`713/200
`704/10
`713/201
`713/201
`709/230
`713/188
`726/22
`707/3
`726/22
`726/4
`382/176
`
FOREIGN PATENT DOCUMENTS
EP   1132796 A1       9/2001
WO   WO 2004/063948   7/2004
`
`OTHER PUBLICATIONS
`
`Power, James, "Notes on Formal Language Theory and Parsing",
`1999, National University of Ireland, p. 1-40.*
`Scott et al., "Abstracting Application-Level Web Security", 2002,
`ACM, p. 396-407.*
`U.S. Appl. No. 10/838,889, filed Oct. 26, 1999, Golan, G.
http://www.codeguru.com/Cpp/Cpp/cpp_mfc/parsing/article.php/c4093/.
http://www.cs.may.ie/~jpower/Courses/compilers/notes/lexical.pdf.
http://www.mail-archive.com/kragen-tol@canonical.org/msg00097.html.
http://www.owlnet.rice.edu/~comp412/Lectures/L06LexWrapup4.pdf.
http://www.cs.odu.edu/~toida/nerzic/390teched/regular/fa/min-fa.html.
http://rw4.cs.uni-sb.de/~ganimal/GANIFA/page16_e.htm.
http://www.cs.msstate.edu/~hansen/classes/3813fall01/slides/06Minimize.pdf.
http://www.win.tue.nl/~watson/2R870/downloads/madfa_algs.pdf.
http://www.cs.nyu.edu/web/Research/Theses/chang_chia-hsiang.pdf.
"Products" Article published on the Internet, "Revolutionary Security
for a New Computing Paradigm" regarding SurfinGate™ 7
pages.
`"Release Notes for the Microsoft ActiveX Development Kit", Aug.
`13, 1996, activex.adsp.or.jp/inetsdk/readme.txt, pp. 1-10.
Doyle et al., "Microsoft Press Computer Dictionary" 1993, Microsoft
`Press, 2nd Edition, pp. 137-138.
Finjan Software Ltd., "Powerful PC Security for the New World of
Java™ and Downloadables, SurfinShield™" Article published on
the Internet by Finjan Software Ltd., 1996, 2 pages.
Finjan Software Ltd., "Finjan Announces a Personal Java™ Firewall
for Web Browsers-the SurfinShield™ 1.6 (formerly known as
SurfinBoard)", Press Release of Finjan Releases SurfinShield 1.6,
Oct. 21, 1996, 2 pages.
`Finjan Software Ltd., "Finjan Announces Major Power Boost and
`New Features for SurfinShield™ 2.0" Las Vegas Convention Center/
`Pavilion 5 P5551, Nov. 18, 1996, 3 pages.
Finjan Software Ltd., "Finjan Software Releases SurfinBoard, Industry's
First JAVA Security Product for the World Wide Web", Article
published on the Internet by Finjan Software Ltd., Jul. 29, 1996, 1
page.
`Finjan Software Ltd., "Java Security: Issues & Solutions" Article
`published on the Internet by Finjan Software Ltd., 1996, 8 pages.
`Finjan Software Ltd., Company Profile "Finjan-Safe Surfing, The
`Java Security Solutions Provider" Article published on the Internet
`by Oct. 31, 1996, 3 pages.
`IBM Antivirus User's Guide Version 2.4, International Business
`Machines Corporation, Nov. 15, 1995, p. 6-7.
Khare, R. "Microsoft Authenticode Analyzed" Jul. 22, 1996,
xent.com/FoRK-archive/smmer96/0338.html, p. 1-2.
`LaDue, M., "Online Business Consultant: Java Security: Whose
`Business Is It?" Article published on the Internet, Home Page Press,
`Inc. 1996, 4 pages.
`Leach, Norvin et al., "IE 3.0 Applets Will Earn Certification", PC
`Week, vol. 13, No. 29, Jul. 22, 1996, 2 pages.
`Moritz, R., "Why We Shouldn't Fear Java" Java Report, Feb. 1997,
`pp. 51-56.
`Microsoft-"Microsoft ActiveX Software Development Kit" Aug.
`12, 1996, activex.adsp.or.jp/inetsdk/help/overview.htm, pp. 1-6.
Microsoft Corporation, Web Page Article "Frequently Asked Questions
About Authenticode", last updated Feb. 17, 1997, Printed Dec.
23, 1998. URL: http://www.microsoft.com/workshop/security/authcode/signfaq.asp#9,
pp. 1-13.
`Microsoft® Authenticode Technology, "Ensuring Accountability
`and Authenticity for Software Components on the Internet",
`Microsoft Corporation, Oct. 1996, including Abstract, Contents,
`Introduction and pp. 1-10.
`Okamoto, E. et al., "ID-Based Authentication System for Computer
`Virus Detection", IEEE/IEE Electronic Library online, Electronics
`Letters, vol. 26, Issue 15, ISSN 0013-5194, Jul. 19, 1990, Abstract
`and pp. 1169-1170. URL: http://iel.ihs.com:80/cgi-bin/iel_cgi?se ..
`2ehts%26ViewTemplate%3ddocview%5 fb%2ehts.
Omura, J. K., "Novel Applications of Cryptography in Digital Communications",
IEEE Communications Magazine, May 1990; pp. 21-29.
Schmitt, D.A., ".EXE files, OS-2 style" PC Tech Journal, v6, n11, p.
76 (13).
`Zhang, X.N., "Secure Code Distribution", IEEE/IEE Electronic
`Library online, Computer, vol. 30, Issue 6, Jun. 1997, pp. 76-79.
`International Search Report for Application No. PCT/IL05/00915, 4
`pp., dated Mar. 3, 2006.
`
`
`
`
`Zhong, et al., "Security in the Large: is Java's Sandbox Scalable,"
`Seventh IEEE Symposium on Reliable Distributed Systems, pp. 1-6,
`Oct. 1998.
`Rubin, et al., "Mobile Code Security," IEEE Internet, pp. 30-34, Dec.
`1998.
Schmid, et al. "Protecting Data From Malicious Software," Proceeding
of the 18th Annual Computer Security Applications Conference,
pp. 1-10, 2002.
`Corradi, et al., "A Flexible Access Control Service for Java Mobile
`Code," IEEE, pp. 356-365, 2000.
`
International Search Report for Application No. PCT/IB97/01626, 3
pp., May 14, 1998 (mailing date).
Written Opinion for Application No. PCT/IL05/00915, 5 pp., dated
Mar. 3, 2006 (mailing date).
International Search Report for Application No. PCT/IB01/01138, 4
pp., Sep. 20, 2002 (mailing date).
International Preliminary Examination Report for Application No.
PCT/IB01/01138, 2 pp., dated Dec. 19, 2002.
`
`* cited by examiner
`
`
`
[FIG. 1 (Sheet 1 of 7): simplified block diagram of the overall gateway security system that uses an ARB content scanner. The rotated drawing labels did not survive text extraction.]
`
`
[FIG. 2 (Sheet 2 of 7): simplified block diagram of the adaptive rule-based content scanner system. Recoverable labels: BYTE SOURCE, TOKENIZER 210, PARSER 220, ANALYZER 230, NORMALIZER 240, DECODER 250, PATTERN MATCHING ENGINE 260, SUB-SCANNER 270, PARSER RULES, PARSE TREE.]
`
[FIG. 3 (Sheet 3 of 7): simple finite state machine for detecting tokens "a" and "ab". Drawing not recoverable from text extraction.]

[FIG. 4 (Sheet 4 of 7): simple finite state machine for a pattern. Drawing not recoverable from text extraction.]
`
[FIG. 5 (Sheet 5 of 7): simplified flowchart of parser operation, with steps numbered 500 through 590 and decision branches labeled YES and NO. Recoverable step text:
CALL TOKENIZER TO RETRIEVE NEXT TOKEN
ADD TOKEN TO PARSE TREE
PERFORM ACTION ASSOCIATED WITH MATCHED PARSER RULE: CREATE A NEW NODE, CALLED [RULE-NAME], AND PLACE THE MATCHING NODES UNDER THE NEW NODE
CALL ANALYZER TO DETERMINE IF A POTENTIAL EXPLOIT IS PRESENT
PERFORM ACTION ASSOCIATED WITH MATCHED ANALYZER RULE: RECORD ANALYZER RULE AT CURRENT NODE, AS LEVEL 0
PROPAGATE ANALYZER RULE UPWARD THROUGH NODE PARENTS, AS SUCCESSIVELY INCREASING LEVEL]
`
[FIG. 6 (Sheet 6 of 7): simplified block diagram of the system for serializing ARB content scanners, transmitting them to a client site, and regenerating them. Recoverable labels: MAIN, RULE-TO-XML CONVERTER, BUILDER, ARB SCANNER FACTORY, SCANNER REPOSITORY, FACTORY OBJECT instance(), ARB SUB-THREAD, SERIALIZED RULE DATA. Visible reference numerals: 610, 640, 660, 670.]
`
[FIG. 7 (Sheet 7 of 7): representative hierarchy of objects created by a builder module. Recoverable labels: BUILDER; ARB SCANNER FACTORY; SCANNER REPOSITORY; ARB SCANNER JAVASCRIPT, ARB SCANNER URI and ARB SCANNER HTML, each with its own TOKENIZER, PARSER and ANALYZER.]
`
`METHOD AND SYSTEM FOR ADAPTIVE
`RULE-BASED CONTENT SCANNERS
`
`CROSS REFERENCES TO RELATED
`APPLICATIONS
`
`This application is a continuation-in-part of assignee's
`application U.S. Ser. No. 09/539,667, filed on Mar. 30, 2000,
`now U.S. Pat. No. 6,804,780, entitled "System and Method
`for Protecting a Computer and a Network from Hostile
`Downloadables," which is a continuation of assignee's patent
`application U.S. Ser. No. 08/964,388, filed on 6 Nov. 1997,
now U.S. Pat. No. 6,092,194, also entitled "System and
Method for Protecting a Computer and a Network from Hostile
Downloadables."
`
`FIELD OF THE INVENTION
`
`The present invention relates to network security, and in
`particular to scanning of mobile content for exploits.
`
`BACKGROUND OF THE INVENTION
`
Conventional anti-virus software scans a computer file system
by searching for byte patterns, referred to as signatures,
that are present within known viruses. If a virus signature is
discovered within a file, the file is designated as infected.
Content that enters a computer from the Internet poses
additional security threats, as such content executes upon
entry into a client computer, without being saved into the
computer's file system. Content such as JavaScript and
VBScript is executed by an Internet browser, as soon as the
content is received within a web page.
Conventional network security software also scans such
mobile content by searching for heuristic virus signatures.
However, in order to be as protective as possible, virus signatures
for mobile content tend to be over-conservative,
which results in significant over-blocking of content. Over-blocking
refers to false positives; i.e., in addition to blocking
of malicious content, prior art technologies also block a significant
amount of content that is not malicious.
Another drawback with prior art network security software
is that it is unable to recognize combined attacks, in which an
exploit is split among different content streams. Yet another
drawback is that prior art network security software is unable
to scan content containers, such as URI within JavaScript.
All of the above drawbacks with conventional network
security software are due to an inability to diagnose mobile
code. Diagnosis is a daunting task, since it entails understanding
incoming byte source code. The same malicious exploit
can be encoded in an endless variety of ways, so it is not
sufficient to look for specific signatures.
Nevertheless, in order to accurately block malicious code
with minimal over-blocking, a thorough diagnosis is
required.

SUMMARY OF THE DESCRIPTION

The present invention provides a method and system for
scanning content that includes mobile code, to produce a
diagnostic analysis of potential exploits within the content.
The present invention is preferably used within a network
gateway or proxy, to protect an intranet against viruses and
other malicious mobile code.
The content scanners of the present invention are referred
to as adaptive rule-based (ARB) scanners. An ARB scanner is
able to adapt itself dynamically to scan a specific type of
content, such as inter alia JavaScript, VBScript, URI, URL
and HTML. ARB scanners differ from prior art scanners that
are hard-coded for one particular type of content. In distinction,
ARB scanners are data-driven, and can be enabled to
scan any specific type of content by providing appropriate
rule files, without the need to modify source code. Rule files
are text files that describe lexical characteristics of a particular
language. Rule files for a language describe character
encodings, sequences of characters that form lexical constructs
of the language, referred to as tokens, patterns of
tokens that form syntactical constructs of program code,
referred to as parsing rules, and patterns of tokens that correspond
to potential exploits, referred to as analyzer rules.
Rule files thus serve as adaptors, to adapt an ARB content
scanner to a specific type of content.
The present invention also utilizes a novel description language
for efficiently describing exploits. This description
language enables an engineer to describe exploits as logical
combinations of patterns of tokens.
Thus it may be appreciated that the present invention is able
to diagnose incoming content. As such, the present invention
achieves very accurate blocking of content, with minimal
over-blocking as compared with prior art scanning technologies.
There is thus provided in accordance with a preferred
embodiment of the present invention a method for scanning
content, including identifying tokens within an incoming
byte stream, the tokens being lexical constructs for a specific
language, identifying patterns of tokens, generating a parse
tree from the identified patterns of tokens, and identifying the
presence of potential exploits within the parse tree, wherein
said identifying tokens, identifying patterns of tokens, and
identifying the presence of potential exploits are based upon
a set of rules for the specific language.
There is moreover provided in accordance with a preferred
embodiment of the present invention a system for scanning
content, including a tokenizer for identifying tokens within an
incoming byte stream, the tokens being lexical constructs for
a specific language, a parser operatively coupled to the tokenizer
for identifying patterns of tokens, and generating a parse
tree therefrom, and an analyzer operatively coupled to the
parser for analyzing the parse tree and identifying the presence
of potential exploits therewithin, wherein the tokenizer,
the parser and the analyzer use a set of rules for the specific
language to identify tokens, patterns and potential exploits,
respectively.
There is further provided in accordance with a preferred
embodiment of the present invention a computer-readable
storage medium storing program code for causing a computer
to perform the steps of identifying tokens within an incoming
byte stream, the tokens being lexical constructs for a specific
language, identifying patterns of tokens, generating a parse
tree from the identified patterns of tokens, and identifying the
presence of potential exploits within the parse tree, wherein
said identifying tokens, identifying patterns of tokens, and
identifying the presence of potential exploits are based upon
a set of rules for the specific language.
There is yet further provided in accordance with a preferred
embodiment of the present invention a method for scanning
content, including expressing an exploit in terms of patterns
of tokens and rules, where tokens are lexical constructs of a
specific programming language, and rules are sequences of
tokens that form programmatical constructs, and parsing an
incoming byte source to determine if an exploit is present
therewithin, based on said expressing.
There is additionally provided in accordance with a preferred
embodiment of the present invention a system for
scanning content, including a parser for parsing an incoming
byte source to determine if an exploit is present therewithin,
based on a formal description of the exploit expressed in
terms of patterns of tokens and rules, where tokens are lexical
constructs of a specific programming language, and rules are
sequences of tokens that form programmatical constructs.
There is moreover provided in accordance with a preferred
embodiment of the present invention a computer-readable
storage medium storing program code for causing a computer
to perform the steps of expressing an exploit in terms of
patterns of tokens and rules, where tokens are lexical constructs
of a specific programming language, and rules are
sequences of tokens that form programmatical constructs,
and parsing an incoming byte source to determine if an
exploit is present therewithin, based on said expressing.
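
The tokenizer, parser and analyzer stages summarized above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three rule tables, the toy token grammar, and the names `tokenize`, `parse`, `analyze` and `scan` are hypothetical and do not reproduce the rule-file syntax of Appendix A or the pattern matching used in the preferred embodiment.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rule data for a toy language (not the patent's rule-file format).
TOKEN_RULES = [                      # (token name, regular expression)
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_.]*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SEMI", r";"),
    ("SKIP", r"\s+"),
]
PARSER_RULES = {                     # rule name -> sequence of node kinds
    "CALL": ["IDENT", "LPAREN", "RPAREN"],
}
ANALYZER_RULES = {                   # exploit name -> (node kind, text pattern)
    "popup-creation": ("CALL", r"createPopup"),
}


@dataclass
class Node:
    kind: str
    text: str
    children: list = field(default_factory=list)


def tokenize(source):
    """Yield leaf nodes for every token rule match; unmatched characters are skipped."""
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_RULES)
    for match in re.finditer(pattern, source):
        if match.lastgroup != "SKIP":
            yield Node(match.lastgroup, match.group())


def parse(tokens):
    """Add tokens to the tree; when a parser rule matches the newest nodes,
    group them under a new node named after the rule."""
    root = Node("ROOT", "")
    for token in tokens:
        root.children.append(token)
        for name, sequence in PARSER_RULES.items():
            tail = root.children[-len(sequence):]
            if [node.kind for node in tail] == sequence:
                merged = Node(name, "".join(n.text for n in tail), tail)
                root.children[-len(sequence):] = [merged]
    return root


def analyze(node):
    """Walk the parse tree and collect the names of matching analyzer rules."""
    found = [name for name, (kind, rx) in ANALYZER_RULES.items()
             if node.kind == kind and re.search(rx, node.text)]
    for child in node.children:
        found.extend(analyze(child))
    return found


def scan(source):
    return analyze(parse(tokenize(source)))
```

In this sketch, adapting the scanner to another content type would mean swapping the three rule tables while leaving the functions unchanged, which is the data-driven property the summary attributes to ARB scanners.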
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
The present invention will be more fully understood and
appreciated from the following detailed description, taken in
conjunction with the drawings in which:
FIG. 1 is a simplified block diagram of an overall gateway
security system that uses an adaptive rule-based (ARB) content
scanner, in accordance with a preferred embodiment of
the present invention;
FIG. 2 is a simplified block diagram of an adaptive rule-based
content scanner system, in accordance with a preferred
embodiment of the present invention;
FIG. 3 is an illustration of a simple finite state machine for
detecting tokens "a" and "ab", used in accordance with a
preferred embodiment of the present invention;
FIG. 4 is an illustration of a simple finite state machine for
a pattern, used in accordance with a preferred embodiment of
the present invention;
FIG. 5 is a simplified flowchart of operation of a parser for
a specific content language within an ARB content scanner, in
accordance with a preferred embodiment of the present
invention;
FIG. 6 is a simplified block diagram of a system for serializing
binary instances of ARB content scanners, transmitting
them to a client site, and regenerating them back into
binary instances at the client site, in accordance with a preferred
embodiment of the present invention; and
FIG. 7 illustrates a representative hierarchy of objects created
by a builder module, in accordance with a preferred
embodiment of the present invention.
`
LIST OF APPENDICES

Appendix A is a source listing of an ARB rule file for the
JavaScript language, in accordance with a preferred embodiment
of the present invention.

DETAILED DESCRIPTION

The present invention concerns scanning of content that
contains mobile code, to protect an enterprise against viruses
and other malicious code.
Reference is now made to FIG. 1, which is a simplified
block diagram of an overall gateway security system that uses
an adaptive rule-based (ARB) content scanner, in accordance
with a preferred embodiment of the present invention. Shown
in FIG. 1 is a network gateway 110 that acts as a conduit for
content from the Internet entering into a corporate intranet,
and for content from the corporate intranet exiting to the
Internet. One of the functions of network gateway 110 is to
protect client computers 120 within the corporate intranet
from malicious mobile code originating from the Internet.
Mobile code is program code that executes on a client computer.
Mobile code can take many diverse forms, including
inter alia JavaScript, Visual Basic script, HTML pages, as
well as a Uniform Resource Identifier (URI).
Mobile code can be detrimental to a client computer.
Mobile code can access a client computer's operating system
and file system, can open sockets for transmitting data to and
from a client computer, and can tie up a client computer's
processing and memory resources. Such malicious mobile
code cannot be detected using conventional anti-virus scanners,
which scan a computer's file system, since mobile code
is able to execute as soon as it enters a client computer from
the Internet, before being saved to a file.
Many examples of malicious mobile code are known today.
Portions of code that are malicious are referred to as exploits.
For example, one such exploit uses JavaScript to create a
window that fills an entire screen. The user is then unable to
access any windows lying underneath the filler window. The
following sample code shows such an exploit.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<TITLE> BID-3469</TITLE>
<SCRIPT>
op=window.createPopup( );
s='<body>foobar</body>';
op.document.body.innerHTML=s;
function oppop( )
{
    if (!op.isOpen)
    {
        w = screen.width;
        h = screen.height;
        op.show(0,0,w,h,document.body);
    }
}
function doit ()
{
    oppop( );
    setInterval("window.focus( ); { oppop( );}",10);
}
</SCRIPT>
</HEAD>
<BODY>
<FORM>
<INPUT type="button" name="btnDoIt" value="Do It" onclick="doit( )">
</FORM>
</BODY>
</HTML>

Thus it may be appreciated that the security function of network
gateway 110 is critical to a corporate intranet.
In accordance with a preferred embodiment of the present
invention, network gateway 110 includes a content scanner
130, whose purpose is to scan mobile code and identify potential
exploits. Content scanner 130 receives as input content
containing mobile code in the form of byte source, and generates
a security profile for the content. The security profile
indicates whether or not potential exploits have been discovered
within the content, and, if so, provides a diagnostic list of
one or more potential exploits and their respective locations
within the content.
Preferably, the corporate intranet uses a security policy to
decide whether or not to block incoming content based on the
content's security profile. For example, a security policy may
block content that may be severely malicious, say, content
that accesses an operating system or a file system, and may
permit content that is less malicious, such as content that can
consume a user's computer screen as in the example above.
The diagnostics within a content security profile are compared
with the intranet security policy, and a decision is made
to allow or block the content. When content is blocked, one or
more alternative actions can be taken, such as replacing suspicious
portions of the content with innocuous code and
allowing the modified content, and sending a notification to
an intranet administrator.
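
The comparison of a security profile against a security policy can be pictured with a short Python sketch. The class names, the `category` field, and the category strings are hypothetical; the text specifies only that a profile lists potential exploits with their locations and that the policy decides whether to allow or block.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Finding:
    exploit: str      # name of the potential exploit
    offset: int       # location within the content
    category: str     # hypothetical severity class, e.g. "file-system"


@dataclass
class SecurityProfile:
    findings: list = field(default_factory=list)


@dataclass(frozen=True)
class SecurityPolicy:
    blocked_categories: frozenset


def decide(profile, policy):
    """Return "block" if any finding falls in a blocked category, else "allow"."""
    for finding in profile.findings:
        if finding.category in policy.blocked_categories:
            return "block"
    return "allow"


# Assumed policy: block operating system and file system access,
# tolerate screen-filling popups.
policy = SecurityPolicy(frozenset({"operating-system", "file-system"}))
popup_only = SecurityProfile([Finding("full-screen popup", 120, "screen")])
file_access = SecurityProfile([Finding("file write", 48, "file-system")])
```

With this assumed policy, `decide(popup_only, policy)` allows the content and `decide(file_access, policy)` blocks it, mirroring the severe versus less malicious distinction drawn in the text.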
Scanned content and their corresponding security profiles
are preferably stored within a content cache 140. Preferably,
network gateway 110 checks if incoming content is already
resident in cache 140, and, if so, bypasses content scanner
130. Use of cache 140 saves content scanner 130 the task of
re-scanning the same content.
`Alternatively, a hash value of scanned content, such as an
`MD5 hash value, can be cached instead of caching the content
`itself. When content arrives at scanner 130, preferably its hash
`value is computed and checked against cached hash values. If
`a match is found with a cached hash value, then the content
`does not have to be re-scanned and its security profile can be
`obtained directly from cache.
Consider, for example, a complicated JavaScript file that is
scanned and determined to contain a known exploit therewithin.
An MD5 hash value of the entire JavaScript file can be
stored in cache, together with a security profile indicating that
the JavaScript file contains the known exploit. If the same
JavaScript file arrives again, its hash value is computed and
found to already reside in cache. Thus, it can immediately be
determined that the JavaScript file contains the known
exploit, without re-scanning the file.
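
A minimal Python sketch of the hash-keyed lookup described above follows. The class `ProfileCache`, the function `scan_with_cache`, and the shape of the profile value are hypothetical; only the use of an MD5 hash of the whole content as the cache key comes from the text.

```python
import hashlib


class ProfileCache:
    """Maps the MD5 digest of scanned content to its security profile."""

    def __init__(self):
        self._profiles = {}

    @staticmethod
    def _key(content: bytes) -> str:
        return hashlib.md5(content).hexdigest()

    def lookup(self, content: bytes):
        return self._profiles.get(self._key(content))

    def store(self, content: bytes, profile) -> None:
        self._profiles[self._key(content)] = profile


def scan_with_cache(content: bytes, cache: ProfileCache, scanner):
    """Return the cached profile if the content was seen before, else scan and cache it."""
    profile = cache.lookup(content)
    if profile is None:
        profile = scanner(content)
        cache.store(content, profile)
    return profile


# Stand-in scanner (an assumption) that records how often it is invoked.
calls = []

def fake_scanner(content: bytes):
    calls.append(content)
    return {"exploits": ["known exploit"]}

cache = ProfileCache()
first = scan_with_cache(b"script body", cache, fake_scanner)
second = scan_with_cache(b"script body", cache, fake_scanner)
```

The second call returns the stored profile without invoking the scanner again, so `calls` holds a single entry.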
It may be appreciated by those skilled in the art that cache
140 may reside at network gateway 110. However, it is often
advantageous to place cache 140 as close as possible to the
corporate intranet, in order to transmit content to the intranet
as quickly as possible. However, in order for the security
profiles within cache 140 to be up to date, it is important that
network gateway 110 notify cache 140 whenever content
scanner 130 is updated. Updates to content scanner 130 can
occur inter alia when content scanner 130 is expanded (i) to
cover additional content languages; (ii) to cover additional
exploits; or (iii) to correct for bugs.
Preferably, when cache 140 is notified that content scanner
130 has been updated, cache 140 clears its cache, so that
content that was in cache 140 is re-scanned upon arrival at
network gateway 110.
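
The clear-on-update behavior can be sketched in Python as follows. The version string and the method name `notify_scanner_updated` are assumptions introduced for illustration; the text says only that the gateway notifies the cache and that the cache clears itself.

```python
class InvalidatingCache:
    """Cache of security profiles that is emptied when the scanner changes."""

    def __init__(self, scanner_version: str):
        self.scanner_version = scanner_version
        self._entries = {}

    def put(self, key: str, profile) -> None:
        self._entries[key] = profile

    def get(self, key: str):
        return self._entries.get(key)

    def notify_scanner_updated(self, new_version: str) -> None:
        # Profiles produced by an older scanner may miss newly covered
        # languages or exploits, so every entry is dropped.
        if new_version != self.scanner_version:
            self._entries.clear()
            self.scanner_version = new_version


cache = InvalidatingCache("rules-1")
cache.put("digest-a", {"exploits": []})
cache.notify_scanner_updated("rules-2")
```

After the notification, `cache.get("digest-a")` returns `None`, so the content would be re-scanned on its next arrival.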
Also shown in FIG. 1 is a pre-scanner 150 that uses conventional
signature technology to scan content. As mentioned
hereinabove, pre-scanner 150 can quickly determine if content
is innocuous, but over-blocks on the safe side. Thus
pre-scanner 150 is useful for recognizing content that poses
no security threat. Preferably, pre-scanner 150 is a simple
signature matching scanner, and processes incoming content
at a rate of approximately 100 mega-bits per second. ARB
scanner 130 performs much more intensive processing than
pre-scanner 150, and processes incoming content at a rate of
approximately 1 mega-bit per second.
In order to accelerate the scanning process, pre-scanner
150 acts as a first-pass filter, to filter content that can be
quickly recognized as innocuous. Content that is screened by
pre-scanner 150 as being potentially malicious is passed
along to ARB scanner 130 for further diagnosis. Content that
is screened by pre-scanner 150 as being innocuous bypasses
ARB scanner 130. It is expected that pre-scanner 150 filters
90% of incoming content, and that only 10% of the content
requires extensive scanning by ARB scanner 130. As such,
the combined effect of ARB scanner 130 and pre-scanner 150
provides an average scanning throughput of approximately 9
mega-bits per second.
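
The figure of approximately 9 mega-bits per second can be checked with a short Python calculation. It assumes that all content passes through the pre-scanner and that the two stages process a given piece of content one after the other; the function name `combined_throughput` is hypothetical.

```python
def combined_throughput(pre_rate: float, arb_rate: float, arb_fraction: float) -> float:
    """Average throughput in mega-bits per second of a two-stage scan.

    Every mega-bit costs 1/pre_rate seconds in the pre-scanner, and the
    fraction forwarded to the ARB scanner costs a further 1/arb_rate seconds.
    """
    seconds_per_megabit = 1.0 / pre_rate + arb_fraction / arb_rate
    return 1.0 / seconds_per_megabit


# Rates and the 10% forwarding fraction are taken from the text.
rate = combined_throughput(pre_rate=100.0, arb_rate=1.0, arb_fraction=0.10)
print(f"{rate:.2f} mega-bits per second")  # 9.09 mega-bits per second
```

Each mega-bit takes 0.01 s in the pre-scanner plus, on average, 0.10 s in the ARB scanner, for 0.11 s in total, which gives roughly 9.09 mega-bits per second.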
Use of security profiles, security policies and caching is
described in applicant's U.S. Pat. No. 6,092,194 entitled
SYSTEM AND METHOD FOR PROTECTING A COMPUTER
AND A NETWORK FROM HOSTILE DOWNLOADABLES,
in applicant's U.S. Pat. No. 6,804,780
entitled SYSTEM AND METHOD FOR PROTECTING A
COMPUTER AND A NETWORK FROM HOSTILE
DOWNLOADABLES, and in applicant's U.S. Pat. No.
7,418,731 entitled METHOD AND SYSTEM FOR CACHING
AT SECURE GATEWAYS.
Reference is now made to FIG. 2, which is a simplified
block diagram of an adaptive rule-based content scanner system
200, in accordance with a preferred embodiment of the
present invention. An ARB scanner system is preferably
designed as a generic architecture that is language-independent,
and is customized for a specific language through use of
a set of language-specific rules. Thus, a scanner system is
customized for JavaScript by means of a set of JavaScript
rules, and is customi