`
(12) United States Patent
`Gryaznov et al.
`
(10) Patent No.: US 7,210,041 B1
(45) Date of Patent: Apr. 24, 2007
`
`(54) SYSTEM AND METHOD FOR IDENTIFYING
`A MACRO VIRUS FAMILY USING A MACRO
`VIRUS DEFINITIONS DATABASE
`
(75) Inventors: Dmitry O. Gryaznov, Portland, OR (US); Viatcheslav Peternev, Bucks (GB); Igor Muttik, Herts (GB)
`
`(73) Assignee: McAfee, Inc., Santa Clara, CA (US)
`
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 738 days.
`
`(21) Appl. No.: 09/846,103
`
(22) Filed: Apr. 30, 2001
`
(51) Int. Cl.
G06F 11/00 (2006.01)
G06F 17/30 (2006.01)
(52) U.S. Cl. ........ 713/188; 714/38; 395/182; 395/183
(58) Field of Classification Search ........ 713/200, 713/201; 395/183, 575
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
5,414,833 A * 5/1995 Hershey et al. ........ 713/201
5,448,668 A * 9/1995 Perelson et al. ........ 714/21
5,452,442 A * 9/1995 Kephart ........ 714/38
5,485,575 A * 1/1996 Chess et al. ........ 714/38
5,951,698 A * 9/1999 Chen et al. ........ 714/38
5,960,170 A * 9/1999 Chen et al. ........ 714/38
6,016,546 A * 1/2000 Kephart et al. ........ 713/200
6,067,410 A * 5/2000 Nachenberg ........ 703/28
6,577,920 B1 * 6/2003 Hypponen et al. ........ 700/200
6,647,400 B1 * 11/2003 Moran ........ 707/205
6,721,721 B1 * 4/2004 Bates et al. ........ 707/1
6,748,534 B1 * 6/2004 Gryaznov et al. ........ 713/188
6,892,303 B2 * 5/2005 Le Pennec et al. ........ 713/188
6,963,978 B1 * 11/2005 Muttik et al. ........ 713/188
7,093,135 B1 * 8/2006 Radatti et al. ........ 713/188

OTHER PUBLICATIONS
Office Action Summary from U.S. Appl. No. 09/579,810 which was mailed on Feb. 25, 2005.

* cited by examiner
Primary Examiner: Nasser Moazzami
Assistant Examiner: Carl Colin
(74) Attorney, Agent, or Firm: Zilka-Kotab, PC; Christopher J. Hamaty
`
(57) ABSTRACT
`
A macro virus definitions database is maintained and includes a set of indices and associated macro virus definition data files. One or more of the macro virus definition data files are referenced by the associated index. Each macro virus definition data file defines macro virus attributes for known macro viruses. The sets of the indices and the macro virus definition data files are organized according to macro virus families. One or more strings stored in a suspect file are compared to the macro virus attributes defined in the one or more macro virus definition data files for each macro virus family in the macro virus definitions database. The macro virus family to which the suspect file belongs is determined from the indices for each of the macro virus definition data files at least partially containing the suspect file.
`
`16 Claims, 21 Drawing Sheets
`
[Representative drawing: suspect string 26 and suspect file supplied to parser 20, family finder 21, string finder, updater, checker 24, and lister 25, with macro virus definitions 28 (root 29; spreadsheet, presentation, and generic subdirectories 30-33) and report 35.]
`
`Blue Coat Systems - Exhibit 1009 Page 1
`
`
`
Figure 1. [Drawing: computing environment 10 with servers 12, clients 13, gateway 15, intranetwork 14, and macro virus checker (MVC) 16.]
`
`
`
`U.S. Patent
`
`Apr. 24, 2007
`
`Sheet 2 of 21
`
`US 7,210,041 B1
`
`
`
`
Figure 3. [Drawing: macro virus checker 16 with parser 20, family finder 21, string finder 22, updater 23, checker 24, and lister 25; inputs suspect string 26 and suspect file 27; macro virus definitions 28 (root 29 with word processor 30, spreadsheet 31, presentation 32, and generic 33 subdirectories); log file 34; report 35.]
`
`
`
`
`
`
Figure 5.

    typedef struct tagParseInfo {
        int            FilesNum;
        TStrings      *Strings;
        TStrings      *Lines;
        int            StringsNum;
        int            TopString;   // index of first string in chain
        unsigned long  ReplFlags;
        int            LinesNum;
        int            TopLine;     // index of first index in chain
    } TParseInfo;

Figure 6.

    typedef struct tagStrings {
        char          *String;
        unsigned char  Type;
        unsigned char  Use;
        int            Next;
    } TStrings;
`
`
`
Figure 7. [Drawing: parse tree 70 in which the parse information header 71 points to the string constants list Strings 72 (nodes 73a-73d) and the source code text list Lines 74 (nodes 75a-75e).]
`
`
`
Figure 8. [Flow diagram of method 80: start; open storage file (block 81); set parameters (block 84); full report (block 86); further operation blocks 85 and 87-91.]
`
`
`
Figure 9A. [Flow diagram, blocks 100-112: parse file; set log file; initialize found array; set search entry to first entry; while search entry is not null (/* search */): open index file, get list of strings, set current index to first string in chain; loop /* compare string */, incrementing the same string count.]

Figure 9B. [Flow diagram, blocks 113-122: set current index to next index in chain; end /* compare string */; decision "Level Text"; set current index to first index in chain; loop /* compare text */: find line, increment same text count, set current index to next index in chain; end /* compare text */.]

Figure 9C. [Flow diagram, blocks 123-126: save best results; set search entry to next entry; end /* search */; output report; return.]
`
`
`
Figure 10A. [Flow diagram, blocks 130-141: set log file; set search entry to first entry; while search entry is not null (/* search */): open file index, set found to first byte flag; while found (/* compare string */): open Dat file, find line, set found to next byte flag; end /* compare string */; set search entry to next entry.]

Figure 10B. [Flow diagram, blocks 142-143: end /* search */; output report; return.]
`
`
`
Figure 11A. [Flow diagram of the update routine, blocks 150-161: set log file; get first entry; while entry (/* process entries */): reset index file, find first scan item; while scan item (/* process item */): initialize parser, parse file, store item header, set current index to first index in chain; loop /* store strings */.]

Figure 11B. [Flow diagram, blocks 162-171: store string at Strings[current index] and set current index to next index in chain; end /* store strings */; set current index to first index in chain; loop /* store text */: store text at Lines[current index] and set current index to next index in chain; end /* store text */; get next scan item; end /* process item */.]

Figure 11C. [Flow diagram, blocks 172-174: close index file; get next entry; end /* process entries */.]
`
`
`
Figure 12A. [Flow diagram, blocks 180-191: get first entry; while entry (/* process entries */): open index file, find first scan item; while scan item (/* process item */): get first file object; while file object (/* process file */): initialize parser, parse file, set found to first byte flags; while found (/* process family */).]

Figure 12B. [Flow diagram, blocks 192-201: open Dat file; set current index to first index in chain; loop /* compare string */: find string, increment same string count, set current index to next index in chain; end /* compare string */; test "> 0?"; connector C.]

Figure 12C. [Flow diagram, blocks 202-212: from connector C, set current index to first index in chain; loop /* compare text */: find line, increment same text count, set current index to next index in chain; end /* compare text */; set found to next byte flags; end /* process family */; get next file object; end /* process file */.]

Figure 12D. [Flow diagram, blocks 213-217: find next scan item; end /* process item */; close index file; get next entry; end /* process entries */.]
`
`
`
Figure 13A. [Flow diagram, blocks 220-229: get first entry; while entry (/* process entries */): open index file, set found to first byte flags; while found (/* process items */): for i = 0 to Header Level, set Head to Headers[i] and print index offset, header level, name, replication flags, next sibling, cluster, and Dat offset.]

Figure 13B. [Flow diagram, blocks 230-235: end for; set found to next byte flags; end /* process items */; close index file; set entry to next entry; end /* process entries */.]
`
`
`
`US 7,210,041 B1
`
`1
`SYSTEM AND METHOD FOR IDENTIFYING
`A MACRO VIRUS FAMILY USING A MACRO
`VIRUS DEFINITIONS DATABASE
`
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or disclosure, as the patent document or disclosure appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
`
`FIELD OF THE INVENTION
`
The present invention relates in general to macro virus identification and, in particular, to a system and a method for identifying a macro virus family using a macro virus definitions database.
`
`BACKGROUND OF THE INVENTION
`
Computer viruses, or simply "viruses," continue to plague unsuspecting users worldwide with malicious and often destructive results. Computer viruses propagate through infected files or objects and are often disguised as application programs or are embedded in library functions, macro scripts, electronic mail (email) attachments, applets, and even within hypertext links. Typically, a user unwittingly downloads and executes the infected file, thereby triggering the virus.
By definition, a computer virus is executable program code that is self-replicating and almost universally unsanctioned. More precisely, computer viruses include any form of self-replicating computer code which can be stored, disseminated, and directly or indirectly executed. The earliest computer viruses infected boot sectors and files. Over time, computer viruses evolved into numerous forms and types, including cavity, cluster, companion, direct action, encrypting, multipartite, mutating, polymorphic, overwriting, self-garbling, and stealth viruses, such as described in "McAfee.com: Virus Glossary of Terms," Networks Associates Technology, Inc., Santa Clara, Calif. (2000), the disclosure of which is incorporated by reference.
In particular, macro viruses have become increasingly popular, due in part to the ease with which these viruses can be written. Macro viruses are written in widely available macro programming languages and can be attached to document templates or electronic mail. These viruses can be easily triggered by merely opening the template or attachment, as graphically illustrated by the recent "Love Bug" and "Anna Kournikova" macro virus attacks in May 2000 and February 2001, respectively. The "Love Bug" virus was extremely devastating, saturating email systems worldwide and causing an estimated tens of millions of dollars worth of damage.
Today, there are over 53,000 known computer viruses and new viruses are being discovered daily. The process of identifying and cataloging new viruses is manual and labor intensive. Anti-virus detection companies employ full-time staffs of professionals whose only job is to analyze suspect files and objects for the presence of viruses. On average, training an anti-virus specialist can take six months or longer. These professionals are hard pressed to keep up with the constant challenge of discovering and devising solutions to new viruses.
In the prior art, few automated tools for identifying new viruses exist. On the front line, the processes employed by anti-virus experts to discover new viruses are ad hoc and primarily reactive, rather than proactive. Typically, suspect files or objects are sent to the virus detection centers by concerned users who have often already suffered some adverse side effect from a possible virus. In times past, virus detection centers had more time during which to identify and analyze viruses, and to implement patches and anti-viral measures that could be disseminated before widespread infection occurred. Today, however, viruses often travel by e-mail and other forms of electronic communication and can infect entire networks at an alarming rate. As a result, the present manual processes for detecting new viruses are woefully slow and generally incapable of responding in a timely fashion.
Similarly, existing anti-virus software fails to provide an adequate solution for protecting against and defeating new viruses. These types of software are designed to pattern scan and search out those viruses already positively identified by anti-virus software vendors. Invidious writers of computer viruses constantly strive to create new forms of viruses that easily evade existing anti-virus measures.
Therefore, there is a need for an approach for automatically identifying new forms of computer viruses and, in particular, macro computer viruses. Preferably, such an approach would be capable of identifying candidate virus families when presented with a suspect string or a particular virus family when presented with a suspect file or object. Moreover, such an approach would be capable of identifying a macro virus within a range of given search parameters.
`
`SUMMARY OF THE INVENTION
`
`The present invention provides an automated system and
`method for maintaining and accessing a database of macro
`virus definitions. The database is organized by macro virus
`families, as characterized by replication method. In addition,
`the database stores string constants and source code text
`representative of and further characterizing macro families.
`A suspect string can be compared to the macro virus
`definitions maintained in the database to determine those
`macro virus families to which the string likely belongs.
`Similarly, a suspect file or object can be compared to the
`macro virus definitions in the database to determine the
`likely family to which the suspect file or object belongs.
Thresholds specifying the percentage of common string constants and common text lines, as well as the minimum length of string constants, can be specified.
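The thresholds described above can be illustrated with a short C++ sketch. It assumes that string constants shorter than a minimum length are discarded before comparison, and that a family qualifies only when both the string-constant overlap and the text-line overlap reach their percentage thresholds. The structure `Thresholds` and the functions `percent`, `filterByLength`, and `meetsThresholds` are hypothetical names chosen for illustration, not taken from the Appendix, and the default values are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical search parameters; names and default values are illustrative only.
struct Thresholds {
    double minStringPercent = 50.0;   // % of a family's string constants that must match
    double minLinePercent = 50.0;     // % of a family's text lines that must match
    std::size_t minStringLength = 4;  // shorter string constants are ignored
};

// Share of `total` represented by `common`, as a percentage (0 if total is 0).
inline double percent(std::size_t common, std::size_t total) {
    if (total == 0) return 0.0;
    return 100.0 * static_cast<double>(common) / static_cast<double>(total);
}

// Keeps only the string constants long enough to count as evidence.
inline std::vector<std::string> filterByLength(const std::vector<std::string>& strings,
                                               std::size_t minLength) {
    std::vector<std::string> kept;
    for (const std::string& s : strings)
        if (s.size() >= minLength) kept.push_back(s);
    return kept;
}

// A family qualifies only if both overlaps reach their thresholds
// (requiring both, rather than either, is an assumption of this sketch).
inline bool meetsThresholds(std::size_t commonStrings, std::size_t familyStrings,
                            std::size_t commonLines, std::size_t familyLines,
                            const Thresholds& t) {
    assert(commonStrings <= familyStrings && commonLines <= familyLines);
    return percent(commonStrings, familyStrings) >= t.minStringPercent &&
           percent(commonLines, familyLines) >= t.minLinePercent;
}
```

For example, a family with four string constants and four text lines, of which three constants and two lines recur in the suspect file, passes 50% thresholds with overlaps of 75% and 50%.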
An embodiment of the present invention is a system and a method for identifying a macro virus family using a macro virus definitions database. A macro virus definitions database is maintained and includes a set of indices and macro virus definition data files. Each index references one or more of the macro virus definition data files. Each macro virus definition data file defines macro virus attributes for known macro viruses. The sets of the indices and the macro virus definition data files are organized according to macro virus families in each respective index and macro virus definition data file set. A suspect string is compared to the macro virus attributes defined in the one or more macro virus definition data files for each macro virus family in the macro virus definitions database. Each macro virus family to which the suspect string belongs is determined from the index for each macro virus definition data file at least partially containing the suspect string.
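The string lookup of this embodiment can be sketched in C++ as follows. The sketch replaces the on-disk indices and .dat files with a hypothetical in-memory map from family name to `FamilyDefinition`, and it reads "at least partially containing" as a plain substring match against each string constant and source line; both choices are assumptions made for illustration, and `findCandidateFamilies` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// In-memory stand-in for one family's macro virus definition data (.dat) file;
// the on-disk format is not described here, so this layout is an assumption.
struct FamilyDefinition {
    std::vector<std::string> strings;  // string constants common to all replicants
    std::vector<std::string> lines;    // comment-free source code lines
};

// True if `suspect` occurs anywhere inside one of `items`.
static bool containsSubstring(const std::vector<std::string>& items,
                              const std::string& suspect) {
    for (const std::string& item : items)
        if (item.find(suspect) != std::string::npos) return true;
    return false;
}

// Returns every family whose definition contains the suspect string inside
// one of its string constants or source lines (candidate families).
std::vector<std::string> findCandidateFamilies(
        const std::map<std::string, FamilyDefinition>& database,
        const std::string& suspect) {
    std::vector<std::string> candidates;
    if (suspect.empty()) return candidates;  // an empty string would match every family
    for (const auto& entry : database) {
        const FamilyDefinition& def = entry.second;
        if (containsSubstring(def.strings, suspect) || containsSubstring(def.lines, suspect))
            candidates.push_back(entry.first);
    }
    return candidates;
}
```

Because more than one family may contain the same fragment, the function returns a list of candidates rather than a single family, matching the description of the string lookup as identifying candidate families.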
A further embodiment is a system and a method for identifying a macro virus family using a macro virus definitions database. A macro virus definitions database is maintained and includes a set of indices and associated macro virus definition data files. One or more of the macro virus definition data files are referenced by the associated index. Each macro virus definition data file defines macro virus attributes for known macro viruses. The sets of the indices and the macro virus definition data files are organized according to macro virus families. One or more strings stored in a suspect file are compared to the macro virus attributes defined in the one or more macro virus definition data files for each macro virus family in the macro virus definitions database. The macro virus family to which the suspect file belongs is determined from the indices for each of the macro virus definition data files at least partially containing the suspect file.
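A minimal C++ sketch of the file-based lookup follows. It assumes that the parsed suspect file and each family definition are reduced to sets of string constants and source lines, and that the family with the highest fraction of its constants and lines found in the suspect file is reported, in the spirit of the "save best results" step of FIG. 9C. The types `Definition` and `ParsedFile` and the function `findFamily` are hypothetical names, and the scoring rule is an assumption rather than the patented method.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical in-memory view of one family's definition data.
struct Definition {
    std::set<std::string> strings;  // string constants common to the family
    std::set<std::string> lines;    // comment-free source lines common to the family
};

// String constants and source lines extracted from the suspect file.
struct ParsedFile {
    std::set<std::string> strings;
    std::set<std::string> lines;
};

// Number of items in `family` that also appear in `suspect`.
static std::size_t overlap(const std::set<std::string>& family,
                           const std::set<std::string>& suspect) {
    std::size_t n = 0;
    for (const std::string& item : family) n += suspect.count(item);
    return n;
}

// Returns the family whose constants and lines are best covered by the
// suspect file, or an empty string if no family shares any item with it.
std::string findFamily(const std::map<std::string, Definition>& database,
                       const ParsedFile& suspect) {
    std::string best;
    double bestScore = 0.0;
    for (const auto& entry : database) {
        const Definition& def = entry.second;
        std::size_t total = def.strings.size() + def.lines.size();
        if (total == 0) continue;  // nothing to compare against
        std::size_t common = overlap(def.strings, suspect.strings) +
                             overlap(def.lines, suspect.lines);
        double score = static_cast<double>(common) / static_cast<double>(total);
        if (score > bestScore) {   // keep only the best result seen so far
            bestScore = score;
            best = entry.first;
        }
    }
    return best;
}
```

Ties go to the family whose name sorts first, because `std::map` iterates in key order and only a strictly higher score replaces the current best.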
Still other embodiments of the present invention will become readily apparent to those skilled in the art from the following detailed description, wherein embodiments of the invention are described by way of illustrating the best mode contemplated for carrying out the invention. As will be realized, the invention is capable of other and different embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and the scope of the present invention. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not as restrictive.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1 is a functional block diagram of a distributed computing environment, including a system for identifying a macro virus family using a macro virus definitions database, in accordance with the present invention.
FIG. 2 is a block diagram of the system for identifying a macro virus family of FIG. 1.
FIG. 3 is a block diagram showing the software modules implemented in the system of FIG. 1.
FIG. 4 is a data structure diagram showing the cataloging of macro virus definitions.
FIG. 5 is a data structure diagram showing a parse tree header.
FIG. 6 is a data structure diagram showing a strings block.
FIG. 7 is a data structure diagram showing, by way of example, a parse tree constructed using the data structures of FIGS. 5 and 6.
FIG. 8 is a flow diagram showing a method for identifying a macro virus family using a macro virus definitions database in accordance with the present invention.
`FIGS. 9A-9C are flow diagrams showing the routine for
`finding a macro virus family for use in the method of FIG.
`8.
`
FIGS. 10A-10B are flow diagrams showing the routine for finding a string for use in the method of FIG. 8.
FIGS. 11A-11C are flow diagrams showing the routine for updating the virus definitions database for use in the method of FIG. 8.
FIGS. 12A-12D are flow diagrams showing the routine for checking the virus definitions database for use in the method of FIG. 8.
FIGS. 13A-13B are flow diagrams showing the routine for listing the macro virus definitions.

DETAILED DESCRIPTION

FIG. 1 is a functional block diagram showing a distributed computing environment 10, including a system for identifying a macro virus family using a macro virus definitions database, in accordance with the present invention. The networked computing environment 10 includes one or more servers 12 interconnected to one or more clients 13 over an internetwork 11, such as the Internet. Each server 12 provides client services, such as information retrieval and file serving. Alternatively, the clients could be interconnected with the server 12 using a direct connection, over a dial-up connection, via an intranetwork 14, by way of a gateway 15, or by a combination of the foregoing or with various other network configurations and topologies, as would be recognized by one skilled in the art.
A client 13, or alternatively a server 12, implements a macro virus checker (MVC) 16 for identifying macro virus attributes using a macro virus definitions database, as further described below with reference to FIG. 2. During operation, a user can submit a suspect string to the macro virus checker 16 to identify candidate virus families to which the suspect string may belong. Alternatively, the user can submit a file or object to the macro virus checker 16 to identify a candidate virus family to which the suspect file or object belongs.
The individual computer systems, including the servers 12 and clients 13, are general purpose, programmed digital computing devices consisting of a central processing unit (CPU), random access memory (RAM), non-volatile secondary storage, such as a hard drive or CD ROM drive, network interfaces, and peripheral devices, including user interfacing means, such as a keyboard and display. Program code, including software programs, and data are loaded into the RAM for execution and processing by the CPU and results are generated for display, output, transmittal, or storage.
FIG. 2 is a block diagram showing the system for identifying a macro virus family of FIG. 1. By way of example, the macro virus checker 16 executes on a client 13 coupled to a secondary storage device 17. The system is preferably implemented in software as a macro virus checker 16 operating on the client 13, or on the server 12 (shown in FIG. 1) or any similar general purpose programmed digital computing device. The storage device 17 includes a file system 18 within which files and related objects are persistently stored. In addition, the client 13 interfaces to other computing devices and resources via an intranetwork 14, an internetwork 11 (shown in FIG. 1), or other type of network or communications interface.
FIG. 3 is a block diagram showing the software modules implementing the macro virus checker 16 of the system of FIG. 1. Each module is a computer program, procedure or module written as source code in a conventional programming language, such as the C++ programming language, and is presented for execution by the CPU as object or byte code, as is known in the art. The various implementations of the source code and object and byte codes can be held on a computer-readable storage medium or embodied on a transmission medium in a carrier wave. The macro virus checker 16 operates in accordance with a sequence of process steps, as further described below beginning with reference to FIG. 8. The Appendix includes a source code listing for a computer program in the C++ programming language implementing the macro virus checker 16.
The macro virus checker 16 consists of six intercooperating modules: parser 20, family finder 21, string finder 22, updater 23, checker 24, and lister 25. Operationally, the macro virus checker 16 receives as an input either a suspect string 26 or a suspect file 27 or object (hereinafter simply "suspect file") for comparison to the database of macro virus definitions 28. The suspect string 26 or suspect file 27 is parsed by the parser 20 to identify individual tokens. In the described embodiment, the parser 20 removes comments and extraneous information from the suspect string 26 and suspect file 27. The parser 20 processes the suspect string 26 and suspect file 27 on a line-by-line basis and generates a hierarchical parse tree, as is known in the art.
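The line-by-line comment removal performed by the parser 20 might look like the following C++ sketch. The specification does not state which macro language is parsed, so this sketch assumes VBA-like syntax: an apostrophe outside a string literal, or a leading `Rem` keyword, starts a comment, and string constants are enclosed in double quotes with `""` standing for an embedded quote. `ParseResult` and `parseMacroSource` are hypothetical names, and the two flat vectors stand in for the hierarchical parse tree.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Simplified stand-in for the parse tree: comment-free source lines
// plus the string constants found in them.
struct ParseResult {
    std::vector<std::string> lines;
    std::vector<std::string> strings;
};

// Removes leading and trailing whitespace.
static std::string trim(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// True for a "Rem" comment line (case-insensitive), as assumed above.
static bool isRemLine(const std::string& t) {
    if (t.size() < 3) return false;
    std::string head = t.substr(0, 3);
    for (char& ch : head) ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    return head == "rem" && (t.size() == 3 || std::isspace(static_cast<unsigned char>(t[3])));
}

// Parses macro source line by line, dropping comments and blank lines and
// collecting string constants.
ParseResult parseMacroSource(const std::string& source) {
    ParseResult result;
    std::istringstream in(source);
    std::string raw;
    while (std::getline(in, raw)) {
        std::string t = trim(raw);
        if (t.empty() || isRemLine(t)) continue;  // blank or Rem comment line
        std::string code, literal;
        bool inString = false;
        for (std::size_t i = 0; i < t.size(); ++i) {
            char c = t[i];
            if (inString) {
                code += c;
                if (c == '"' && i + 1 < t.size() && t[i + 1] == '"') {
                    literal += '"';                // doubled quote inside a literal
                    code += t[++i];
                } else if (c == '"') {
                    inString = false;              // literal ends
                    result.strings.push_back(literal);
                    literal.clear();
                } else {
                    literal += c;
                }
            } else if (c == '\'') {
                break;                             // comment runs to end of line
            } else {
                code += c;
                if (c == '"') inString = true;     // literal begins
            }
        }
        std::string line = trim(code);
        if (!line.empty()) result.lines.push_back(line);
    }
    return result;
}
```

An apostrophe inside a string literal is kept as part of the constant, so a line such as `x = "it's" ' note` yields the source line `x = "it's"` and the string constant `it's`.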
During analysis, a suspect string 26 or suspect file 27 (shown in FIG. 3) is parsed into individual tokens stored in a parse tree. As further described below with reference to FIG. 7, the parse tree stores individual string constants and source code text as two linked lists rooted using a parse information header.
Once parsed, a number of operations can be performed on the parse tree. First, the macro virus family to which the suspect file 27 belongs can be identified using the family finder 21, as further described below with reference to FIGS. 9A-9C. Similarly, the candidate macro virus families to which the suspect string 26 belongs can be identified by the string finder 22, as further described below with reference to FIGS. 10A-10B. The macro virus definitions database 28 can be updated using the updater 23, as further described below with reference to FIGS. 11A-11C. Likewise, the macro virus definitions database 28 can be checked for cross-references using the checker 24, as further described below with reference to FIGS. 12A-12D. Finally, the file names of the macro virus definition families can be listed using the lister 25, as further described below with reference to FIGS. 13A-13B.
The macro virus definitions database 28 is hierarchically organized into macro virus families based on the type of application to which the macro applies. By way of example, the macro virus definitions database 28 can include a root directory 29, below which word processor 30, spreadsheet 31, presentation 32, and generic 33 subdirectories can contain individual indices and macro virus definition data (.dat) files, as further described below with reference to FIG. 4. The results of the operations performed by the macro virus checker 16 on the suspect string 26 or suspect file 27 are output in a report 35 and details of the analysis are provided in a log file 34.
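The hierarchical layout of the database 28 can be walked with `std::filesystem` (C++17), for instance to catalog each application-type subdirectory and the index and .dat files it holds. This sketch assumes a single level of subdirectories below the root directory 29 and the file extensions `.idx` and `.dat`; the alias `Catalog` and the function `catalogDefinitions` are hypothetical names, not part of the described lister 25.

```cpp
#include <cassert>
#include <filesystem>
#include <map>
#include <set>
#include <string>

namespace fs = std::filesystem;

// For each application-type subdirectory (e.g. word processor, spreadsheet),
// the names of the index and definition files it contains.
using Catalog = std::map<std::string, std::set<std::string>>;

// Walks the immediate subdirectories of `root` and records every regular file
// whose extension is ".idx" (index) or ".dat" (macro virus definition data).
Catalog catalogDefinitions(const fs::path& root) {
    Catalog catalog;
    for (const fs::directory_entry& sub : fs::directory_iterator(root)) {
        if (!sub.is_directory()) continue;
        // Create the entry even if the subdirectory holds no definitions yet.
        std::set<std::string>& files = catalog[sub.path().filename().string()];
        for (const fs::directory_entry& entry : fs::directory_iterator(sub.path())) {
            if (!entry.is_regular_file()) continue;  // skip nested directories
            std::string ext = entry.path().extension().string();
            if (ext == ".idx" || ext == ".dat")
                files.insert(entry.path().filename().string());
        }
    }
    return catalog;
}
```

Keeping one entry per subdirectory, even an empty one, mirrors the fixed per-application organization of the database rather than only the families that happen to be present.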
FIG. 4 is a data structure diagram 40 showing the indexing of a macro virus definitions family. An index maintained in index files, route.idx 41, stores pointers to locations in individual .dat files 000000001.dat 42, 000000002.dat 43, and 000000003.dat 44. Each of the .dat files 42-44 stores information describing a macro virus family, as characterized by the replication method used by the virus. In the described embodiment, the replication methods include types "organizer," "macro copy," "import," "replace line," "insert lines," "add from string," and "add from file."
In addition, each .dat file contains any string constants and lines of source code text, without comments, common to all replicants of the macro virus. The macro virus definition is assigned a name to aid in the understanding by the user. Macro viruses are further described in M. Ludwig, "The Giant Black Book of Computer Viruses," Ch. 14, American Eagle Pubs, Inc., Show Low, Ariz. (2nd ed. 1998), the disclosure of which is incorporated by reference.
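Because each family is characterized by its replication method, and the parse information header records that method in the byte flag ReplFlags, the seven replication types listed above lend themselves to a bit-flag representation. The bit positions below are an assumption for illustration only; the specification does not give them, and `ReplMethod` and `describeReplFlags` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical bit assignments for the seven replication methods named in
// the description; the actual values used by the checker are not specified.
enum ReplMethod : std::uint8_t {
    kOrganizer     = 1u << 0,
    kMacroCopy     = 1u << 1,
    kImport        = 1u << 2,
    kReplaceLine   = 1u << 3,
    kInsertLines   = 1u << 4,
    kAddFromString = 1u << 5,
    kAddFromFile   = 1u << 6,
};

// Returns the names of the replication methods recorded in a flags byte.
std::vector<std::string> describeReplFlags(std::uint8_t flags) {
    static const struct { ReplMethod bit; const char* name; } kNames[] = {
        {kOrganizer, "organizer"},      {kMacroCopy, "macro copy"},
        {kImport, "import"},            {kReplaceLine, "replace line"},
        {kInsertLines, "insert lines"}, {kAddFromString, "add from string"},
        {kAddFromFile, "add from file"},
    };
    std::vector<std::string> names;
    for (const auto& n : kNames)
        if (flags & n.bit) names.push_back(n.name);
    return names;
}
```

Seven methods fit in one byte, which is consistent with the description of ReplFlags as a byte flag even though FIG. 5 declares the field as `unsigned long`.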
FIG. 5 is a data structure diagram showing the structure of the header 50 TParseInfo for storing parse information. The header includes a count of the number of files FilesNum from which the suspect file 27 originates, pointers to the string constants Strings and source code text Lines, an index to the first string for the string constants TopString, an index to the first string for the source code text TopLine, and counts of the number of strings StringsNum and source code text lines LinesNum. Finally, the parse information header includes a byte flag ReplFlags storing an indication of the type of replication method used.
`FIG. 6 is a data structure diagram showing the structure
`of each node TStrings 60 in which each of the sets of parsed
`tokens for the string constants and source code text are
`stored. The actual token is stored as a character string String
`along with the type and use of the string. A pointer Next
`points to the next node in the linked list.
FIG. 7 is a data structure diagram showing, by way of example, a parse tree 70 for a suspect file 27 (shown in FIG. 3). The parse information header TParseInfo 71 points to the first node in each of the respective linked lists: nodes 73a-73d for the string constants Strings 72 and nodes 75a-75e for the source code text Lines 74. Each of the individual nodes in the strings linked list 72 and lines linked list 74 points to the next node in each list. The linked lists wrap back around such that each list forms a continuous chain. The first string (for string constants) or index (for source code text) in each chain is respectively identified by a counter TopString or TopLine, as further described above with reference to FIG. 5.
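The following C++ sketch mirrors the structures of FIGS. 5 and 6 and walks one circular chain as described for FIG. 7, treating Next as an array index as declared in FIG. 6. Two liberties are taken: String is declared `const char*` so the example can point at string literals, and the helper `walkChain` (a hypothetical name) also stops after StringsNum or LinesNum nodes, or at an out-of-range index, in case a chain is malformed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Node of a parsed-token chain, following FIG. 6 (String made const here).
struct TStrings {
    const char*   String;  // token text
    unsigned char Type;    // token type
    unsigned char Use;     // token use
    int           Next;    // array index of the next node in the chain
};

// Parse information header, following FIG. 5.
struct TParseInfo {
    int           FilesNum;
    TStrings*     Strings;     // string constant nodes
    TStrings*     Lines;       // source code text nodes
    int           StringsNum;
    int           TopString;   // index of first string in chain
    unsigned long ReplFlags;
    int           LinesNum;
    int           TopLine;     // index of first index in chain
};

// Collects the tokens of a circular chain, starting at `top` and stopping
// when the chain returns to `top` (or after `count` nodes, as a safeguard).
std::vector<std::string> walkChain(const TStrings* nodes, int count, int top) {
    std::vector<std::string> tokens;
    if (nodes == nullptr || top < 0 || top >= count) return tokens;
    int i = top;
    do {
        tokens.emplace_back(nodes[i].String);
        i = nodes[i].Next;
    } while (i != top && i >= 0 && i < count &&
             static_cast<int>(tokens.size()) < count);
    return tokens;
}
```

With TopString set to 1 and nodes linked 1 to 0 to 2 and back to 1, the walk visits the three string constants in chain order rather than array order, which is how the wrap-around lists of FIG. 7 are meant to be read.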
FIG. 8 is a flow diagram showing a method 80 for identifying macro virus attributes using the macro virus definitions database 28 (shown in FIG. 3) in accordance with the present invention. The method provides an environment in which the macro virus definitions database 28 can be maintained and accessed to determine macro virus attributes and family membership for a suspect string 26 or a suspect file 27.
The method 80 begins with the initialization of a working environment. First, the storage file, that is, the directory containing the macro family description data file, is opened (block 81). Next, the log file 34 (shown in FIG. 3) is set (block 82) and the initialization file is opened (block 83). Any parameters specified by the user are set, in addition to any default parameters (block 84). Processing then begins.
The macro virus checker 16 performs several operations based on a user-specified or automatically specified selection (blocks 85-92) as follows. First, a full report can be generated (block 86) to present the macro virus definition family stored in the macro virus de